
I have an input DataFrame df that looks like this (the id values are strings, not 1, 2, 3):

| id    | name                                                                                  |
|-------|---------------------------------------------------------------------------------------|
| a1xy  | [  {  "event": "sports",   "start": "100"},  {  "event": "lunch",  "start": "121" } ] |
| a7yz  | [  {  "event": "lunch",   "start": "109"},  {  "event": "movie",  "start": "97" } ]   |
| bx4y  | [  {  "event": "dinner",   "start": "78"},  {  "event": "sleep",  "start": "25" } ]   |
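To make the example reproducible, here is a minimal sketch that rebuilds the input frame. It assumes each name cell holds a JSON-encoded string, as the table suggests, rather than an already-parsed Python list; the name parsed is only an illustrative sanity check.

```python
import json

import pandas as pd

# Assumption: every 'name' cell is a JSON-encoded string, not a parsed list
df = pd.DataFrame({
    'id': ['a1xy', 'a7yz', 'bx4y'],
    'name': [
        '[{"event": "sports", "start": "100"}, {"event": "lunch", "start": "121"}]',
        '[{"event": "lunch", "start": "109"}, {"event": "movie", "start": "97"}]',
        '[{"event": "dinner", "start": "78"}, {"event": "sleep", "start": "25"}]',
    ],
})

# Sanity check: each cell parses into a list of event dicts
parsed = df['name'].map(json.loads)
```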

I want to flatten the JSON array elements so that my result output is:

| id    | name.event | name.start |
|-------|------------|------------|
| a1xy  | sports     | 100        |
| a1xy  | lunch      | 121        |
| a7yz  | lunch      | 109        |
| a7yz  | movie      | 97         |
| bx4y  | dinner     | 78         |
| bx4y  | sleep      | 25         |

How can I do this in Python?

  • Can you provide the source of the data, so it becomes easy to recreate this? Commented Apr 20, 2018 at 19:01

2 Answers


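You can use Python's json library together with pandas' apply to parse each JSON string into its own DataFrame, collect those DataFrames in a list, and combine them with pandas' concat. Passing the id values as keys to concat labels each block with the row it came from, so the real ids are kept even when rows have different numbers of array elements; reset_index then turns that label into an ordinary column.

import json
import pandas as pd

# Parse each JSON string into a small DataFrame (one row per array element)
frames = df['name'].apply(lambda s: pd.DataFrame(json.loads(s))).tolist()
# Label each block with its source id, then turn that label into a column
new_df = pd.concat(frames, keys=df['id'].tolist(), names=['id', None])
new_df = new_df.reset_index(level='id').reset_index(drop=True)

new_df

     id   event start
0  a1xy  sports   100
1  a1xy   lunch   121
2  a7yz   lunch   109
3  a7yz   movie    97
4  bx4y  dinner    78
5  bx4y   sleep    25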
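If you want the exact headers from the question (name.event and name.start), pd.json_normalize can do the flattening and the renaming in one call. A minimal sketch, assuming pandas 1.0 or later (where json_normalize is exposed as pd.json_normalize) and JSON-string cells as in the question; records and result are illustrative names:

```python
import json

import pandas as pd

df = pd.DataFrame({
    'id': ['a1xy', 'a7yz', 'bx4y'],
    'name': [
        '[{"event": "sports", "start": "100"}, {"event": "lunch", "start": "121"}]',
        '[{"event": "lunch", "start": "109"}, {"event": "movie", "start": "97"}]',
        '[{"event": "dinner", "start": "78"}, {"event": "sleep", "start": "25"}]',
    ],
})

# Turn each row into a nested record: {'id': ..., 'name': [{...}, {...}]}
records = [{'id': i, 'name': json.loads(s)} for i, s in zip(df['id'], df['name'])]

# record_path walks into the 'name' array (one output row per element),
# meta copies the parent id onto every row, record_prefix renames the columns
result = pd.json_normalize(records, record_path='name', meta=['id'], record_prefix='name.')
result = result[['id', 'name.event', 'name.start']]
```

The start values stay strings because they are quoted in the JSON; convert them with astype(int) if you need numbers.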

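Use apply(pd.DataFrame). This works when each cell in the name column already holds a Python list of dicts; if the cells are JSON strings, parse them first (for example with json.loads), because pd.DataFrame cannot build a frame from a plain string.

import pandas as pd

# Build one DataFrame per row from its list of dicts
k = df['name'].apply(pd.DataFrame).tolist()

# Keep the original ids as a column instead of renumbering 1, 2, 3
final_df = pd.concat(k, keys=df['id'].tolist(), names=['id', None])
final_df = final_df.reset_index(level='id').reset_index(drop=True)

final_df
     id   event start
0  a1xy  sports   100
1  a1xy   lunch   121
2  a7yz   lunch   109
3  a7yz   movie    97
4  bx4y  dinner    78
5  bx4y   sleep    25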
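Another option is to explode the lists into one row per event and then expand the dicts into columns. This is a minimal sketch under two assumptions: each name cell already holds a non-empty Python list of dicts, and pandas is 1.1 or later (for explode's ignore_index). The names exploded, flat and result are illustrative.

```python
import pandas as pd

# Assumption: 'name' cells are already parsed lists of dicts (not JSON strings)
df = pd.DataFrame({
    'id': ['a1xy', 'a7yz', 'bx4y'],
    'name': [
        [{'event': 'sports', 'start': '100'}, {'event': 'lunch', 'start': '121'}],
        [{'event': 'lunch', 'start': '109'}, {'event': 'movie', 'start': '97'}],
        [{'event': 'dinner', 'start': '78'}, {'event': 'sleep', 'start': '25'}],
    ],
})

# One row per list element; the id is repeated for each element of its list
exploded = df.explode('name', ignore_index=True)

# Expand each dict into columns and prefix them to match the question's headers
flat = pd.json_normalize(exploded['name'].tolist()).add_prefix('name.')

# Both pieces share a fresh 0..n-1 index, so they line up row by row
result = pd.concat([exploded[['id']], flat], axis=1)
```

Because explode handles each list independently, rows with different numbers of events need no special index arithmetic.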

4 Comments

The values in the input DataFrame's id column will not be 1, 2, 3; they will be varchar strings such as a1xy, a7yz, bx4y. Can the code be modified to reflect this?
I don't understand the use of 'name' in 'df.name.apply...'. I'm getting an AttributeError:
AttributeError: 'DataFrame' object has no attribute 'name'
@Chiel In this specific example, name is the name of the column, so you should replace it with the actual name of your column. If your column is called name, then df['name'] or df.name works fine; but if it is called age or sales, use df['age'] (or df.age) or df['sales'] (or df.sales).
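To illustrate the comment above: attribute access such as df.name only works when a column with that exact name exists, while bracket access with a variable accepts any column label. A small sketch with a hypothetical column called events standing in for your real column:

```python
import pandas as pd

# Hypothetical frame whose JSON column is named 'events' instead of 'name'
df = pd.DataFrame({
    'id': ['a1xy'],
    'events': ['[{"event": "sports", "start": "100"}]'],
})

col = 'events'       # set this to the real column name in your data
values = df[col]     # bracket access works for any column label

try:
    df.name          # there is no 'name' column, so attribute lookup fails
    error = None
except AttributeError as exc:
    error = str(exc)
```

Bracket access also reaches labels that are not valid Python identifiers (for example 'start time'), which attribute access cannot.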
