Pandas - Break nested json into multiple rows

Question

My DataFrame has the structure below. I would like to split each row into multiple rows, one per nested object in the details column.

cust_id, name, details
101, Kevin, [{"id":1001,"country":"US","state":"OH"}, {"id":1002,"country":"US","state":"GA"}]
102, Scott, [{"id":2001,"country":"US","state":"OH"}, {"id":2002,"country":"US","state":"GA"}]

Expected output

cust_id, name, id, country, state
101, Kevin, 1001, US, OH
101, Kevin, 1002, US, GA
102, Scott, 2001, US, OH
102, Scott, 2002, US, GA

score 8 · Accepted Answer · 2021-11-08 19:36:34Z

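import pandas as pd
# Explode to one row per list element, then expand each dict into its own columns.
df = df.explode('details').reset_index(drop=True)
df = df.merge(pd.json_normalize(df['details']), left_index=True, right_index=True).drop('details', axis=1)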

df.explode("details") repeats each row once per item in that row's details list, so a row whose list holds N items becomes N rows, each holding a single item (an empty list yields one row with NaN)
Since explode copies the original row's index label onto each new row, the indices become 0, 0, 1, 1, which would break the index-based merge later. reset_index() replaces them with a fresh index starting at 0. drop=True is used because by default pandas would keep the old index as a new column; this discards it.
pd.json_normalize(df['details']) converts the column (where each row now contains one JSON object) to a new dataframe in which each unique key across all the JSON objects becomes a column
df.merge() merges the new dataframe into the original one
left_index=True and right_index=True tell pandas to join the two dataframes on their indices, so the first row of one is matched with the first row of the other, and so on
.drop('details', axis=1) gets rid of the old details column containing the old objects
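As a minimal sketch, the steps above can be run end to end on the sample data from the question. The intermediate names exploded, flat and result are illustrative only, and the frame is built by hand on the assumption that details already holds Python lists of dicts rather than JSON strings.

```python
import pandas as pd

# Sample frame from the question; 'details' is assumed to hold real lists of dicts.
df = pd.DataFrame({
    "cust_id": [101, 102],
    "name": ["Kevin", "Scott"],
    "details": [
        [{"id": 1001, "country": "US", "state": "OH"},
         {"id": 1002, "country": "US", "state": "GA"}],
        [{"id": 2001, "country": "US", "state": "OH"},
         {"id": 2002, "country": "US", "state": "GA"}],
    ],
})

# Step 1: one row per list element; the index would be 0, 0, 1, 1 without the reset.
exploded = df.explode("details").reset_index(drop=True)

# Step 2: turn each dict into a row of a new frame with columns id, country, state.
flat = pd.json_normalize(exploded["details"])

# Step 3: join the two frames on their (now matching) indices and drop the old column.
result = exploded.merge(flat, left_index=True, right_index=True).drop("details", axis=1)

print(result)
```

Skipping reset_index here would leave duplicate index labels on the left side, so the index-based merge would no longer pair rows one to one.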

edited Nov 8, 2021 at 19:36

answered Nov 8, 2021 at 19:21

user17242583

13 Comments

Kevin Nash Over a year ago

Thanks for responding. However, I am getting an error, AttributeError: 'str' object has no attribute 'values', when I execute the second line.

user17242583 Over a year ago

Oh weird. The dataframe you're running this on must not have the same structure as the sample one in your question. What does it look like?

Kevin Nash Over a year ago

Though the number of columns might differ, the structure is exactly the same. I see that the explode method (step 1) does not change the DataFrame. I also checked the pandas version, and it is 1.0.5.

Kevin Nash Over a year ago

@user17242583 I was able to get this working by first running df["details"] = df["details"].apply(json.loads). Appreciate all the help from your end. Thanks.

Kevin Nash Over a year ago

@user17242583 no worries appreciate all your help
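The fix from the comments suggests that the details column held JSON text rather than Python lists, which would also explain why explode appeared to do nothing. The sketch below assumes that situation: the string values are a hand-built stand-in for the asker's real data, and the names unchanged and parsed_rows are illustrative.

```python
import json

import pandas as pd

# Assumed situation: each 'details' cell is a JSON string, not a list of dicts.
df = pd.DataFrame({
    "cust_id": [101, 102],
    "name": ["Kevin", "Scott"],
    "details": [
        '[{"id":1001,"country":"US","state":"OH"}, {"id":1002,"country":"US","state":"GA"}]',
        '[{"id":2001,"country":"US","state":"OH"}, {"id":2002,"country":"US","state":"GA"}]',
    ],
})

# explode treats a string as a scalar, so the frame keeps its two rows.
unchanged = df.explode("details")
print(len(unchanged))

# Parse each JSON string into a list of dicts first, as in the comment above.
df["details"] = df["details"].apply(json.loads)

# Now explode produces one row per dict.
parsed_rows = df.explode("details").reset_index(drop=True)
print(len(parsed_rows))
```

A quick check such as df["details"].map(type).unique() shows whether the column holds str or list values before choosing which path to take.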
