-1

I wanted to try to parse this nested json using Pandas, and I'm confused when i wanted to extract the data from column "amount" and "items", and the data has so many rows like hundreds, this is one of the example

{
    "_id": "62eaa99b014c9bb30203e48a",
    "amount": {
      "product": 291000,
      "shipping": 75000,
      "admin_fee": 4500,
      "order_voucher_deduction": 0,
      "transaction_voucher_deduction": 0,
      "total": 366000,
      "paid": 366000
    },
    "status": 32,
    "items": [
      {
        "_id": "62eaa99b014c9bb30203e48d",
        "earning": 80400,
        "variants": [
          {
            "name": "Color",
            "value": "Black"
          },
          {
            "name": "Size",
            "value": "38"
          }
        ],
        "marketplace_price": 65100,
        "product_price": 62000,
        "reseller_price": 145500,
        "product_id": 227991,
        "name": "Heels",
        "sku_id": 890512,
        "internal_markup": 3100,
        "weight": 500,
        "image": "https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg",
        "quantity": 1,
        "supplier_price": 60140
      }

I've tried using this and it'd only shows the index

dfjson=pd.json_normalize(datasetjson)
dfjson.head(3)

enter image description here

##UPDATE enter image description here

I tried added the pd.Dataframe , yes it works to become dataframe, but i still haven't got to know how to extract the _id, earning, variants

4 Answers 4

1

Given:

data = {
 '_id': '62eaa99b014c9bb30203e48a',
 'amount': {'admin_fee': 4500,
            'order_voucher_deduction': 0,
            'paid': 366000,
            'product': 291000,
            'shipping': 75000,
            'total': 366000,
            'transaction_voucher_deduction': 0},
 'items': [{'_id': '62eaa99b014c9bb30203e48d',
            'earning': 80400,
            'image': 'https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg',
            'internal_markup': 3100,
            'marketplace_price': 65100,
            'name': 'Heels',
            'product_id': 227991,
            'product_price': 62000,
            'quantity': 1,
            'reseller_price': 145500,
            'sku_id': 890512,
            'supplier_price': 60140,
            'variants': [{'name': 'Color', 'value': 'Black'},
                         {'name': 'Size', 'value': '38'}],
            'weight': 500}],
 'status': 32
}

Doing:

df = pd.json_normalize(data, ['items'], ['amount'])
df = df.join(df.amount.apply(pd.Series))
df = df.join(df.variants.apply(pd.DataFrame)[0].set_index('name').T.reset_index(drop=True))
df = df.drop(['amount', 'variants'], axis=1)
print(df)

Output:

                        _id  earning  marketplace_price  product_price  reseller_price  product_id   name  sku_id  internal_markup  weight                                              image  quantity  supplier_price  product  shipping  admin_fee  order_voucher_deduction  transaction_voucher_deduction   total    paid  Color Size
0  62eaa99b014c9bb30203e48d    80400              65100          62000          145500      227991  Heels  890512             3100     500  https://product-asset.s3.ap-southeast-1.amazon...         1           60140   291000     75000       4500                        0                              0  366000  366000  Black   38

There's probably a better way to do some of this, but the sample provided wasn't even a valid json object, so I can't be sure what the real data actually looks like.

Sign up to request clarification or add additional context in comments.

Comments

0

Try pd.json_normalize(datasetjson, max_level=0)

1 Comment

it gave me an error of : AttributeError: 'str' object has no attribute 'values'
0

I guess you are confuse working with dictionaries or json format.

This line is the same sample you have but it's missed ]} at the end. I formatted removing blank spaces but it's the same:

dfjson = {"_id":"62eaa99b014c9bb30203e48a","amount":{"product":291000,"shipping":75000,"admin_fee":4500,"order_voucher_deduction":0,"transaction_voucher_deduction":0,"total":366000,"paid":366000},"status":32,"items":[{"_id":"62eaa99b014c9bb30203e48d","earning":80400,"variants":[{"name":"Color","value":"Black"},{"name":"Size","value":"38"}],"marketplace_price":65100,"product_price":62000,"reseller_price":145500,"product_id":227991,"name":"Heels","sku_id":890512,"internal_markup":3100,"weight":500,"image":"https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg","quantity":1,"supplier_price":60140}]}

Now, if you want to call amount:

dfjson['amount']
# Output
{'product': 291000,
 'shipping': 75000,
 'admin_fee': 4500,
 'order_voucher_deduction': 0,
 'transaction_voucher_deduction': 0,
 'total': 366000,
 'paid': 366000}

If you want to call items:

dfjson['items']
# Output
[{'_id': '62eaa99b014c9bb30203e48d',
  'earning': 80400,
  'variants': [{'name': 'Color', 'value': 'Black'},
   {'name': 'Size', 'value': '38'}],
  'marketplace_price': 65100,
  'product_price': 62000,
  'reseller_price': 145500,
  'product_id': 227991,
  'name': 'Heels',
  'sku_id': 890512,
  'internal_markup': 3100,
  'weight': 500,
  'image': 'https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg',
  'quantity': 1,
  'supplier_price': 60140}]

For getting the items, you can create a list:

list_items = []
for i in dfjson['items']:
    list_items.append(i)

2 Comments

yes i've tried this, but it seems like it isn't output-ing like table format? like when i used pd.read_json, it's output are in table format, but when i only do like that, it's more like a lists? and my goal is that i wanted to take the items.product_id to join later with my csv data source
You can make a for loop to save those items. I just update the answer.
0

For how to import the entire json data into a pandas DataFrame check out the answer given by BeRT2me

but if you are only after extracting _id, earning, variants to a Pandas DataFrame giving:

                        _id                    _id_id  earning  variants 
0  62eaa99b014c9bb30203e48a  62eaa99b014c9bb30203e48d    80400  [{'name': 'Color', 'value': 'Black'}, {'name':...  

as you state in your question:

but I still haven't got to know how to extract the _id, earning, variants

notice that the problem with extracting _id, earning, variants is that this values are 'hidden" within a list of one item. Resolving this issue with indexing it by [0] gives the required values:

json_text = """\
{'_id': '62eaa99b014c9bb30203e48a',
 'amount': {'admin_fee': 4500,
            'order_voucher_deduction': 0,
            'paid': 366000,
            'product': 291000,
            'shipping': 75000,
            'total': 366000,
            'transaction_voucher_deduction': 0},
 'items': [{'_id': '62eaa99b014c9bb30203e48d',
            'earning': 80400,
            'image': 'https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg',
            'internal_markup': 3100,
            'marketplace_price': 65100,
            'name': 'Heels',
            'product_id': 227991,
            'product_price': 62000,
            'quantity': 1,
            'reseller_price': 145500,
            'sku_id': 890512,
            'supplier_price': 60140,
            'variants': [{'name': 'Color', 'value': 'Black'},
                         {'name': 'Size', 'value': '38'}],
            'weight': 500}],
 'status': 32}

json_dict = eval(json_text)

print(f'{(_id      := json_dict["items"][0]["_id"])=}')
print(f'{(earning  := json_dict["items"][0]["earning"])=}')
print(f'{(variants := json_dict["items"][0]["variants"])=}')
print('---')
print(f'{_id=}')
print(f'{earning=}')
print(f'{variants=}')

gives:

(_id      := json_dict["items"][0]["_id"])='62eaa99b014c9bb30203e48d'
(earning  := json_dict["items"][0]["earning"])=80400
(variants := json_dict["items"][0]["variants"])=[{'name': 'Color', 'value': 'Black'}, {'name': 'Size', 'value': '38'}]
---
_id='62eaa99b014c9bb30203e48d'
earning=80400
variants=[{'name': 'Color', 'value': 'Black'}, {'name': 'Size', 'value': '38'}]```

If you want in addition a Pandas DataFrame with rows holding all this extracted values you can loop over all your json data files adding a row to a DataFrame using:

# Create an empty DataFrame: 
df = pd.DataFrame(columns=['_id', '_id_id', 'earning', 'variants'])
# Add rows to df in a loop processing the json data files: 
df_to_append = pd.DataFrame(
    [[json_dict['_id'], _id, earning, variants]], 
    columns=['_id', '_id_id', 'earning', 'variants']
    )
df = df.append(df_to_append)
pd.set_option('display.max_columns', None)
print(df.to_dict())

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.