How do I flatten JSON Array Elements in a pandas DataFrame

Question

I have an input DataFrame `df` as follows (the `id` values are strings, not 1, 2, 3):

| id    | name                                                                                  |
|-------|---------------------------------------------------------------------------------------|
| a1xy  | [  {  "event": "sports",   "start": "100"},  {  "event": "lunch",  "start": "121" } ] |
| a7yz  | [  {  "event": "lunch",   "start": "109"},  {  "event": "movie",  "start": "97" } ]   |
| bx4y  | [  {  "event": "dinner",   "start": "78"},  {  "event": "sleep",  "start": "25" } ]   |

I want to flatten the JSON array elements so that the output is:

| id    | name.event | name.start |
|-------|------------|------------|
| a1xy  | sports     | 100        |
| a1xy  | lunch      | 121        |
| a7yz  | lunch      | 109        |
| a7yz  | movie      | 97         |
| bx4y  | dinner     | 78         |
| bx4y  | sleep      | 25         |

How can I do this in Python?

Can you provide the source of the data, so it becomes easy to recreate this? – skt7, Commented Apr 20, 2018 at 19:01
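To make the example reproducible, here is a minimal sketch that rebuilds the sample input. It assumes the `name` column holds each row's events as a JSON string; the question does not say whether the cells are strings or already-parsed Python lists.

```python
import pandas as pd

# Sample input from the question. Assumption: each `name` cell is a JSON
# array encoded as a string (not an already-parsed Python list).
df = pd.DataFrame({
    "id": ["a1xy", "a7yz", "bx4y"],
    "name": [
        '[{"event": "sports", "start": "100"}, {"event": "lunch", "start": "121"}]',
        '[{"event": "lunch", "start": "109"}, {"event": "movie", "start": "97"}]',
        '[{"event": "dinner", "start": "78"}, {"event": "sleep", "start": "25"}]',
    ],
})
print(df)
```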

skt7 · Accepted Answer · 2018-04-20 20:22:49Z

1

You can use Python's `json` library together with pandas' `apply` to parse each JSON string into a small DataFrame, collect those DataFrames in a list, combine them with `pd.concat`, and then fix the index of the result.

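```python
import json
import pandas as pd

# Parse each JSON string in `name` into a small DataFrame, one row per event.
ll = df.name.apply(lambda row: pd.DataFrame(json.loads(row))).tolist()
new_df = pd.concat(ll)
# Relabel the 0, 1, 0, 1, ... index as 1, 1, 2, 2, 3, 3 (assumes two events per row).
new_df.index = pd.Series(new_df.index).shift(-1).fillna(0).cumsum()

print(new_df)
```

```
      event start
1.0  sports   100
1.0   lunch   121
2.0   lunch   109
2.0   movie    97
3.0  dinner    78
3.0   sleep    25
```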
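The shift/cumsum index trick above only produces correct labels when every row has exactly two events, and it replaces the string ids with 1, 2, 3. A minimal sketch of a variant, assuming the same JSON-string input, passes the ids to `pd.concat` through its `keys` argument instead, so each parsed block is labelled with its own `id` however many events it contains:

```python
import json

import pandas as pd

# Sample input; assumption: `name` cells are JSON strings.
df = pd.DataFrame({
    "id": ["a1xy", "a7yz", "bx4y"],
    "name": [
        '[{"event": "sports", "start": "100"}, {"event": "lunch", "start": "121"}]',
        '[{"event": "lunch", "start": "109"}, {"event": "movie", "start": "97"}]',
        '[{"event": "dinner", "start": "78"}, {"event": "sleep", "start": "25"}]',
    ],
})

# One small DataFrame per row, one row per event.
frames = [pd.DataFrame(json.loads(s)) for s in df["name"]]

# keys= tags each block with its id; names= labels the two index levels.
# "pos" is a hypothetical name for the per-block 0, 1, ... position.
new_df = (
    pd.concat(frames, keys=df["id"].tolist(), names=["id", "pos"])
    .droplevel("pos")   # drop the per-block position
    .reset_index()      # turn the `id` level back into a column
)
print(new_df)
```

Because the labels come from `keys` rather than from arithmetic on the index, rows with one, two, or ten events are all handled the same way.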

answered Apr 20, 2018 at 20:22

skt7



rafaelc · Accepted Answer · 2018-04-20 19:09:45Z

0

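Use `apply(pd.DataFrame)`. This works when the `name` column already holds Python lists of dicts; if the cells are JSON strings, parse them with `json.loads` first.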

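```python
import pandas as pd

# Each `name` cell must already be a list of dicts for pd.DataFrame to accept it.
k = df.name.apply(pd.DataFrame).tolist()

final_df = pd.concat(k)
# Relabel the 0, 1, 0, 1, ... index as 1, 1, 2, 2, 3, 3 (assumes two events per row).
final_df.index = pd.Series(final_df.index).shift(-1).fillna(0).cumsum().astype(int)

print(final_df)
```

```
    event start
1  sports   100
1   lunch   121
2   lunch   109
2   movie    97
3  dinner    78
3   sleep    25
```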
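To get exactly the `name.event` / `name.start` columns requested in the question, another option (not taken from either answer) is to parse the strings, `explode` the lists so each event gets its own row, and flatten the dicts with `pd.json_normalize`. This sketch assumes JSON-string input and pandas 1.1 or later (for `explode(..., ignore_index=True)`):

```python
import json

import pandas as pd

# Sample input; assumption: `name` cells are JSON strings.
df = pd.DataFrame({
    "id": ["a1xy", "a7yz", "bx4y"],
    "name": [
        '[{"event": "sports", "start": "100"}, {"event": "lunch", "start": "121"}]',
        '[{"event": "lunch", "start": "109"}, {"event": "movie", "start": "97"}]',
        '[{"event": "dinner", "start": "78"}, {"event": "sleep", "start": "25"}]',
    ],
})

# Parse each string into a list of dicts, then give every dict its own row.
exploded = (
    df.assign(name=df["name"].map(json.loads))
    .explode("name", ignore_index=True)
)

# Turn each dict into columns and prefix them with "name." as in the question.
flat = pd.json_normalize(exploded["name"].tolist()).add_prefix("name.")

# Both frames share the same 0..n-1 index, so they line up side by side.
result = pd.concat([exploded[["id"]], flat], axis=1)
print(result)
```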

edited Apr 20, 2018 at 19:09

answered Apr 20, 2018 at 19:02

rafaelc


4 Comments

Symphony Over a year ago

The values in the input DataFrame column `id` will not be 1, 2, 3; they will be strings such as a1xy, a7yz, bx4y. Can the code be modified to reflect this?

Chiel Over a year ago

I don't understand the use of `name` in `df.name.apply(...)`. I'm getting an AttributeError.

Chiel Over a year ago

AttributeError: 'DataFrame' object has no attribute 'name'

rafaelc Over a year ago

@Chiel In this specific example, `name` is the name of the column; replace it with the actual name of your column. For example, if your column is called `name`, then `df['name']` or `df.name` works fine, but if it is called `age` or `sales`, use `df['age']` (or `df.age`) or `df['sales']` (or `df.sales`).
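To illustrate the point about column access, here is a small sketch with hypothetical column names. Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute, while bracket access works for any name. Asking for a column that does not exist through attribute access raises the same AttributeError reported above.

```python
import pandas as pd

# Hypothetical columns: one valid identifier, one containing a space.
df = pd.DataFrame({"sales": [10, 20], "unit price": [1.5, 2.0]})

# Attribute access: fine for a plain identifier such as `sales`.
print(df.sales.tolist())

# Bracket access: works for any column name, including ones with spaces.
print(df["unit price"].tolist())

# No column called `name` here, so attribute access raises AttributeError.
try:
    df.name
except AttributeError as exc:
    print("AttributeError:", exc)
```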
