I am trying to unpack nested JSON in the following pandas dataframe:

           id                                                              info
0           0  [{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]
1           1  [{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]
2           2  [{u'a': u'good', u'b': u'type1'}, {u'a': u'good', u'b': u'type2'}]

My expected outcome is:

           id        type1    type2
0           0        good     bad
1           1        bad      bad
2           2        good     good

I've looked at other solutions, including json_normalize, but unfortunately it does not work for me. Should I treat the JSON as a string to get what I want, or is there a more straightforward way to do this?

1 Answer

  1. Use json_normalize to handle the list of dictionaries, setting info as the record path so that each dict ends up in its own cell. Then unstack and apply pd.Series to expand every dict into its own a and b columns, with the rows stacked downwards by level.

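import pandas as pd

# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
df_info = pd.json_normalize(df.to_dict('list'), ['info']).unstack().apply(pd.Series)
df_info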

  2. Pivot the DF, with an optional aggfunc to handle duplicate index entries:

DF = df_info.pivot_table(index=df_info.index.get_level_values(1), columns=['b'], 
                         values=['a'], aggfunc=' '.join)

DF

  3. Finally, concatenate sideways:

pd.concat([df[['ID']], DF.xs('a', axis=1).rename_axis(None, axis=1)], axis=1)


Starting DF used:

df = pd.DataFrame(dict(ID=[0,1,2], info=[[{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}], 
                                        [{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}],
                                        [{u'a': u'good', u'b': u'type1'}, {u'a': u'good', u'b': u'type2'}]]))
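
On pandas 0.25 or newer, the same reshaping can probably be done without json_normalize by exploding the list column first. This is only a sketch of an alternative, not the approach above: it assumes every info cell is a list of dicts with keys a and b, and the names long, expanded and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "info": [
        [{"a": "good", "b": "type1"}, {"a": "bad", "b": "type2"}],
        [{"a": "bad", "b": "type1"}, {"a": "bad", "b": "type2"}],
        [{"a": "good", "b": "type1"}, {"a": "good", "b": "type2"}],
    ],
})

# One row per dict, with the ID repeated alongside (explode needs pandas >= 0.25).
long = df.explode("info").reset_index(drop=True)

# Expand each dict into its own 'a' and 'b' columns.
expanded = pd.DataFrame(long["info"].tolist())
long = pd.concat([long[["ID"]], expanded], axis=1)

# Spread the 'b' labels into columns, using the 'a' values as cells.
# ' '.join mirrors the aggfunc above and joins values if an ID/b pair repeats.
result = (
    long.pivot_table(index="ID", columns="b", values="a", aggfunc=" ".join)
        .rename_axis(None, axis=1)
        .reset_index()
)
print(result)
```

Keeping ID in the frame while exploding avoids the final concat step, since the pivot can use ID as its index directly.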

1 Comment

Worked like a charm. Just had to change the rows' type from str to list first. Nice little hack. Many thanks!
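
For anyone hitting the same str-versus-list issue, one way to do that conversion might look like the sketch below. It assumes the cells hold Python-style literals (single quotes, u prefixes) as in the question; if they are valid JSON with double quotes, json.loads would be the more natural choice. The frame df_str is a made-up example.

```python
import ast
import pandas as pd

# Hypothetical frame where each 'info' cell is a string rather than a list.
df_str = pd.DataFrame({
    "ID": [0, 1],
    "info": [
        "[{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]",
        "[{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]",
    ],
})

# literal_eval parses Python literals (lists, dicts, strings) without
# executing arbitrary code, which makes it safer than eval.
df_str["info"] = df_str["info"].apply(ast.literal_eval)

print(type(df_str["info"].iloc[0]))
```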
