How to parse JSON column in pandas dataframe and concat the new dataframe to the original one?

Question

I have the following df sample:

{'id_user': {0: -8884522802746938515,
  1: -8884522802746938515,
  2: -8884522802746938515},
 'time': {0: '2021-01-01 11:10:34',
  1: '2021-01-01 11:11:48',
  2: '2021-01-01 11:12:38'},
 'data': {0: '{"fat": 4, "type": "FOOD_GENERAL", "unit": "1 mug (8 fl oz)", "title": "Cappuccino", "amount": 1.0, "protein": 4, "calories": 74, "foodType": 4, "recipeId": 7350, "servings": 1.0, "timestamp": "1609499434205", "ingredient": true, "carbohydrates": 6, "nutrientsData": {"iron": 0.19, "fiber": 0.2, "sugar": 6.41, "sodium": 50.0, "calcium": 144.0, "protein": 4.08, "fatTotal": 3.98, "vitaminA": 34.0, "potassium": 233.0, "cholesterol": 12.0, "fatSaturated": 2.273, "carbohydrates": 5.81, "energyConsumed": 74.0, "fatMonounsaturated": 1.007, "fatPolyunsaturated": 0.241}}',
  1: '{"fat": 1, "type": "FOOD_BRANDED", "unit": "1/2 cup prepared", "title": "Stove Top Stuffing Mix For Turkey (Kraft)", "amount": 1.0, "protein": 3, "calories": 110, "foodType": 5, "recipeId": 4072396, "servings": 1.0, "mealIndex": 2, "timestamp": "1609499508328", "ingredient": true, "carbohydrates": 21, "nutrientsData": {"iron": 1.3, "fiber": 1.0, "sugar": 2.0, "sodium": 370.0, "protein": 3.0, "fatTotal": 1.0, "potassium": 100.0, "carbohydrates": 21.0, "energyConsumed": 110.0}}',
  2: '{"fat": 1, "type": "FOOD_BRANDED", "unit": "1/2 cup prepared", "title": "Stove Top Stuffing Mix For Turkey (Kraft)", "amount": 1.0, "protein": 3, "calories": 110, "foodType": 5, "recipeId": 4072396, "servings": 1.0, "timestamp": "1609499558606", "ingredient": true, "carbohydrates": 21, "nutrientsData": {"iron": 1.3, "fiber": 1.0, "sugar": 2.0, "sodium": 370.0, "protein": 3.0, "fatTotal": 1.0, "potassium": 100.0, "carbohydrates": 21.0, "energyConsumed": 110.0}}'}}

I am doing the following on data column:

pd.json_normalize(df.data.apply(json_loads))

And as a result I get what I need but I want it to be glued to the original df. Should I just merge the dataframes on the index? Is there another approach to do it in one line or at once?

What's json_loads? And what do you get from that pd.json_normalize?
– Quang Hoang, Commented Jan 28, 2021 at 17:36
@QuangHoang json_loads parses the JSONs in the data column, json_normalize creates a row per parsed JSON, and the final result is a dataframe. I want this dataframe to be "glued" to my original df. I can merge on indexes, but maybe there is an even simpler solution.
– SteveS, Commented Jan 28, 2021 at 17:56

Ferris · Accepted Answer · 2021-01-29 09:14:39Z

4

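The data column in df should first be converted from JSON strings to dicts.

Then use one of the following:

method1. Convert df to a list of records and pass it to pd.json_normalize, which also flattens the nested dicts (the new columns get a "data." prefix).
method2. Convert df['data'] to a dataframe and merge it into the original df on the index.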

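import json
import pandas as pd

# parse each JSON string into a dict
df['data'] = df['data'].map(json.loads)

# method1: flatten every record, including the nested dicts
dfn = pd.json_normalize(df.to_dict(orient='records'))

# method2: build a dataframe from the dicts and merge on the index
obj = df['data']
dfn = df.merge(pd.DataFrame(obj.tolist(), index=obj.index),
               left_index=True,
               right_index=True)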
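
Since the question asks for a shorter way to glue the result back on, here is a minimal sketch of a variant that uses DataFrame.join instead of merge. It is not part of the original answer: the tiny df below is made-up sample data (a cut-down version of the rows in the question), and the names parsed and out are only illustrative. It assumes every value in the data column is a valid JSON string.

```python
import json
import pandas as pd

# Made-up sample data, shortened from the question for illustration only.
df = pd.DataFrame({
    "id_user": [1, 1],
    "time": ["2021-01-01 11:10:34", "2021-01-01 11:11:48"],
    "data": [
        '{"fat": 4, "title": "Cappuccino", "nutrientsData": {"iron": 0.19}}',
        '{"fat": 1, "title": "Stuffing", "mealIndex": 2, "nutrientsData": {"iron": 1.3}}',
    ],
})

# Parse the JSON strings and flatten them, including the nested nutrientsData dict.
parsed = pd.json_normalize(df["data"].map(json.loads).tolist())

# json_normalize returns a fresh 0..n-1 index, so align it with df explicitly
# in case df does not have the default index.
parsed.index = df.index

# Drop the raw JSON column and join the flattened columns on the index.
out = df.drop(columns="data").join(parsed)
print(out)
```

Keys missing from a row (mealIndex in the first row here) come out as NaN, and nested keys become dotted column names such as nutrientsData.iron.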


2 Comments

SteveS Over a year ago

I used something similar to method 2: I converted the column of JSON strings to a dataframe using json.loads and apply, then merged the two dataframes on the index.

Ricky McMaster Over a year ago

Great answer. After much searching, method2 is the one that exactly fits my use case. Kudos.
