0

I have the following df sample:

{'id_user': {0: -8884522802746938515,
  1: -8884522802746938515,
  2: -8884522802746938515},
 'time': {0: '2021-01-01 11:10:34',
  1: '2021-01-01 11:11:48',
  2: '2021-01-01 11:12:38'},
 'data': {0: '{"fat": 4, "type": "FOOD_GENERAL", "unit": "1 mug (8 fl oz)", "title": "Cappuccino", "amount": 1.0, "protein": 4, "calories": 74, "foodType": 4, "recipeId": 7350, "servings": 1.0, "timestamp": "1609499434205", "ingredient": true, "carbohydrates": 6, "nutrientsData": {"iron": 0.19, "fiber": 0.2, "sugar": 6.41, "sodium": 50.0, "calcium": 144.0, "protein": 4.08, "fatTotal": 3.98, "vitaminA": 34.0, "potassium": 233.0, "cholesterol": 12.0, "fatSaturated": 2.273, "carbohydrates": 5.81, "energyConsumed": 74.0, "fatMonounsaturated": 1.007, "fatPolyunsaturated": 0.241}}',
  1: '{"fat": 1, "type": "FOOD_BRANDED", "unit": "1/2 cup prepared", "title": "Stove Top Stuffing Mix For Turkey (Kraft)", "amount": 1.0, "protein": 3, "calories": 110, "foodType": 5, "recipeId": 4072396, "servings": 1.0, "mealIndex": 2, "timestamp": "1609499508328", "ingredient": true, "carbohydrates": 21, "nutrientsData": {"iron": 1.3, "fiber": 1.0, "sugar": 2.0, "sodium": 370.0, "protein": 3.0, "fatTotal": 1.0, "potassium": 100.0, "carbohydrates": 21.0, "energyConsumed": 110.0}}',
  2: '{"fat": 1, "type": "FOOD_BRANDED", "unit": "1/2 cup prepared", "title": "Stove Top Stuffing Mix For Turkey (Kraft)", "amount": 1.0, "protein": 3, "calories": 110, "foodType": 5, "recipeId": 4072396, "servings": 1.0, "timestamp": "1609499558606", "ingredient": true, "carbohydrates": 21, "nutrientsData": {"iron": 1.3, "fiber": 1.0, "sugar": 2.0, "sodium": 370.0, "protein": 3.0, "fatTotal": 1.0, "potassium": 100.0, "carbohydrates": 21.0, "energyConsumed": 110.0}}'}}

I am doing the following on data column:

pd.json_normalize(df.data.apply(json_loads))

And as a result I get what I need but I want it to be glued to the original df. Should I just merge the dataframes on the index? Is there another approach to do it in one line or at once?

2
  • 1
    What's json_loads? And what do you get from that pd.json_normalize? Commented Jan 28, 2021 at 17:36
  • @QuangHoang json_loads parses the JSONs in the data column, json_normalize creates a row per parsed JSON and final result is a dataframe. I want this dataframe to be "glued" to my original df. I can merge on indexes but maybe there is even simpler solution. Commented Jan 28, 2021 at 17:56

1 Answer 1

4

The data column in df should be converted from json to dict first.

Then use:

  • method1. use pd.json_normalize when df tranform to dict
  • method2. convert the df['data'] to dataframe, and merge to the origin df.
df['data'] = df['data'].map(json.loads)

# method1
dfn = pd.json_normalize(df.to_dict(orient='records'))

# method2
obj = df['data']
dfn = df.merge(pd.DataFrame(obj.tolist(), index = obj.index),
               left_index=True,
               right_index=True)
Sign up to request clarification or add additional context in comments.

2 Comments

I have used something similar to method 2. I have converted the column of JSONs to dataframe using json.load and apply and merged on index the both dataframes.
Great answer. After much searching, method2 is the one that exactly fits my use case. Kudos.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.