JSON STR:

{
"PurchaseId": "Pur-001",
"Orders": [{
    "id": "154",
    "isOnline": false,
    "Store_location": {
        "Order-Date": "2019-06-04T07:35:00"
    },
    "Store_Network": [{
        "Network_Domain": "Food_Processing"
    }]
}],
"Sales": [{
    "id": "1856",
    "SalesLoads": [
        1000,
        3000,
        5000
    ],
    "Network": [{
        "id": "London_Store",
        "history": [
            0,
            1,
            2,
            0,
            0,
            0,
            0,
            0
        ],
        "Leads": {
            "From": "Mgmt-Dept",
            "time": "34hrs"
        }
    }]
}]

}

Expected Dataframe: [image not available]

My code so far:

import pandas.io.json as pd_json
data = pd_json.loads(json_str)
df=pd_json.json_normalize(data, record_path='loads')

I've tried json_normalize but was unable to load this JSON string into a DataFrame. Is it possible to do this with json_normalize, or is there a better approach?

  • json validator gives "Invalid JSON" on your json string Commented Jul 26, 2019 at 10:42
  • @RomanPerekhrest, I got it fixed. Updated the question with valid JSON . Commented Jul 26, 2019 at 10:54
  • unfortunately, I get _recursive_extract(obj[path[0]], path[1:], TypeError: string indices must be integers in the middle of json_normalize Commented Jul 26, 2019 at 11:46
  • No problem..Thanks for all your help. I will look for some work around. Commented Jul 26, 2019 at 11:47

1 Answer

This is pretty long, but it gets the job done. Hopefully someone will answer with a better, less verbose solution.

a = {
"PurchaseId": "Pur-001",
"Orders": [{
    "id": "154",
    "isOnline": False,
    "Store_location": {
        "Order-Date": "2019-06-04T07:35:00"
    },
    "Store_Network": [{
        "Network_Domain": "Food_Processing"
    }]
}],
"Sales": [{
    "id": "1856",
    "SalesLoads": [
        1000,
        3000,
        5000
    ],
    "Network": [{
        "id": "London_Store",
        "history": [
            0,
            1,
            2,
            0,
            0,
            0,
            0,
            0
        ],
        "Leads": {
            "From": "Mgmt-Dept",
            "time": "34hrs"
        }
    }]
}]
}

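import pandas as pd

# One row: the scalar PurchaseId is broadcast against the single-element lists.
b = pd.DataFrame.from_dict(a)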


# Pull each nested value out by hand into its own column, then drop the raw columns.
# List-valued fields are wrapped in an outer list so each is stored whole in the single row.
b = (b.assign(Orders_id = b.Orders[0]['id'],
              Orders_isOnline = b.Orders[0]['isOnline'],
              Orders_Store_Location_Number = pd.to_datetime(b.Orders[0]['Store_location']['Order-Date'].split('T')[0])
                                               .strftime('%m/%d/%Y'),
              Orders_Store_Network_Domain = b.Orders[0]['Store_Network'][0]['Network_Domain'],
              Sales_id = b.Sales[0]['id'],
              Sales_Load = [b.Sales[0]['SalesLoads']],
              Sales_Network_id = b.Sales[0]['Network'][0]['id'],
              Sales_Network_history = [b.Sales[0]['Network'][0]['history']],
              Sales_Leads_from = b.Sales[0]['Network'][0]['Leads']['From'],
              Sales_Lead_Time = b.Sales[0]['Network'][0]['Leads']['time']
            )
      .drop(['Orders','Sales'],axis=1)
     )

b    
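
For comparison, here is a rough sketch of the same flattening done with json_normalize, since the question asked whether that is possible. It assumes pandas 1.0 or later (where json_normalize is available as pd.json_normalize) and, like the code above, exactly one order, one sale and one network entry, because the pieces are simply placed side by side. The column names come from the JSON keys joined with "_", so they differ slightly from the hand-picked names above, and the date is left as the original string.

```python
import json
import pandas as pd

json_str = """
{
"PurchaseId": "Pur-001",
"Orders": [{
    "id": "154",
    "isOnline": false,
    "Store_location": {"Order-Date": "2019-06-04T07:35:00"},
    "Store_Network": [{"Network_Domain": "Food_Processing"}]
}],
"Sales": [{
    "id": "1856",
    "SalesLoads": [1000, 3000, 5000],
    "Network": [{
        "id": "London_Store",
        "history": [0, 1, 2, 0, 0, 0, 0, 0],
        "Leads": {"From": "Mgmt-Dept", "time": "34hrs"}
    }]
}]
}
"""
data = json.loads(json_str)

# Flatten each level separately; sep="_" joins nested dict keys.
# Lists of dicts are dropped here and normalized on their own via record_path.
orders = pd.json_normalize(data["Orders"], sep="_").drop(columns="Store_Network")
store_network = pd.json_normalize(data["Orders"], record_path="Store_Network")
sales = pd.json_normalize(data["Sales"], sep="_").drop(columns="Network")
network = pd.json_normalize(data["Sales"], record_path="Network", sep="_")

# Assumption: every piece has exactly one row, so a side-by-side concat is safe.
df = pd.concat(
    [
        orders.add_prefix("Orders_"),
        store_network.add_prefix("Orders_Store_"),
        sales.add_prefix("Sales_"),
        network.add_prefix("Sales_Network_"),
    ],
    axis=1,
)
df.insert(0, "PurchaseId", data["PurchaseId"])

print(df.T)
```

Lists of plain values (SalesLoads, history) stay as list objects in a single cell, the same as in the assign version. With more than one order or sale, the side-by-side concat would no longer line rows up meaningfully, and the pieces would need to be merged on a key instead.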