split one string column to multiple columns in Python

Question

I have the following dataframe:

df = pd.DataFrame({'scene':[{"living":"0.515","kitchen":"0.297"}, {"kitchen":"0.401","study":"0.005"}, {"study":"0.913"}, {}, {"others":"0"}], 'id':[1, 2, 3 ,4, 5]}) 

id        scene
01      {"living":"0.515","kitchen":"0.297"}
02      {"kitchen":"0.401","study":"0.005"}
03      {"study":"0.913"}
04      {}
05      {"others":"0"}

I want to create a new dataframe as shown below. Can someone help me do this using pandas?

id      living     kitchen     study     others
01      0.515       0.297        0         0 
02        0         0.401      0.005       0
03        0           0        0.913       0
04        0           0          0         0 
05        0           0          0         0

Lev Zakharov · Accepted Answer · 2018-08-24 12:10:14Z

4

A simple solution is to convert the scene column to a list of dictionaries and build a new DataFrame with the default constructor:

pd.DataFrame(df.scene.tolist()).fillna(0)

Result:

  kitchen living others  study
0   0.297  0.515      0      0
1   0.401      0      0  0.005
2       0      0      0  0.913
3       0      0      0      0
4       0      0      0      0

One of the standard ways to create a DataFrame is from a list of dictionaries. Each dictionary in the list becomes a separate row, and each dictionary key becomes a column heading. Note that the result contains only the expanded columns, so id has to be joined back if it is needed.
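As a minimal sketch of that behaviour, the snippet below builds the frame from the list of dicts, reuses the original index so that id can be joined back, and converts the string values to floats. The names `expanded` and `result` are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'scene': [{"living": "0.515", "kitchen": "0.297"},
              {"kitchen": "0.401", "study": "0.005"},
              {"study": "0.913"}, {}, {"others": "0"}],
    'id': [1, 2, 3, 4, 5],
})

# Each dict becomes one row; each distinct key becomes one column.
# Keys missing from a dict show up as NaN in that row.
expanded = pd.DataFrame(df['scene'].tolist(), index=df.index)

# The values are strings, so cast to float before filling the gaps with 0.
expanded = expanded.astype(float).fillna(0)

# Join the expanded columns back onto id using the shared index.
result = df[['id']].join(expanded)
print(result)
```

Passing `index=df.index` only matters when the original frame does not have the default 0..n-1 index, but it keeps the join safe in that case.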

edited Aug 24, 2018 at 12:10

answered Aug 24, 2018 at 4:22

Lev Zakharov


1 Comment

ah bon Over a year ago

you mean the method below?

Raunaq Jain · Accepted Answer · 2018-08-25 06:53:34Z

2

On your data,

df = pd.DataFrame({'scene':[{"living":"0.515","kitchen":"0.297"}, {"kitchen":"0.401","study":"0.005"}, 
                        {"study":"0.913"}, {}, {"others":"0"}], 
               'id':[1, 2, 3 ,4,5], 's': ['a','b','c','d','e']})

df:
    id  s   scene
0   1   a   {'kitchen': '0.297', 'living': '0.515'}
1   2   b   {'kitchen': '0.401', 'study': '0.005'}
2   3   c   {'study': '0.913'}
3   4   d   {}
4   5   e   {'others': '0'}

There are two ways to do this.

In a single line, where you pass every column name except 'scene' to set_index:

df = df.set_index(['id', 's'])['scene'].apply(pd.Series).fillna(0).reset_index()

which will output:

   id   s   kitchen living  study   others
0   1   a   0.297   0.515   0       0
1   2   b   0.401   0       0.005   0
2   3   c   0       0       0.913   0
3   4   d   0       0       0       0
4   5   e   0       0       0       0
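If typing out every column other than 'scene' is inconvenient, the list can be derived from the frame itself. This is only a sketch of that idea; `key_cols` and `result` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'scene': [{"living": "0.515", "kitchen": "0.297"},
              {"kitchen": "0.401", "study": "0.005"},
              {"study": "0.913"}, {}, {"others": "0"}],
    'id': [1, 2, 3, 4, 5],
    's': ['a', 'b', 'c', 'd', 'e'],
})

# Collect every column except 'scene' so none has to be typed by hand.
key_cols = [c for c in df.columns if c != 'scene']

# Same chain as the one-liner above, with the derived column list.
result = (
    df.set_index(key_cols)['scene']
      .apply(pd.Series)
      .fillna(0)
      .reset_index()
)
print(result)
```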

In two lines, where you create the expected result and concat it to the original dataframe:

df1 = df.scene.apply(pd.Series).fillna(0)
df = pd.concat([df, df1], axis=1)

which gives,

   id   s                                    scene  kitchen living  study others
0   1   a   {'kitchen': '0.297', 'living': '0.515'} 0.297   0.515   0     0
1   2   b    {'kitchen': '0.401', 'study': '0.005'} 0.401   0    0.005    0
2   3   c                        {'study': '0.913'} 0       0   0.913     0
3   4   d                                        {} 0       0      0      0
4   5   e                           {'others': '0'} 0       0      0      0
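The two-line version keeps the original scene column in the output. If that column is not wanted, one option is to remove it with `pop` before concatenating. This is a sketch under that assumption, and `scene`, `expanded` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'scene': [{"living": "0.515", "kitchen": "0.297"},
              {"kitchen": "0.401", "study": "0.005"},
              {"study": "0.913"}, {}, {"others": "0"}],
    'id': [1, 2, 3, 4, 5],
    's': ['a', 'b', 'c', 'd', 'e'],
})

# pop removes 'scene' from df in place and returns it as a Series.
scene = df.pop('scene')

# Expand each dict into columns, then fill the missing keys with 0.
expanded = scene.apply(pd.Series).fillna(0)

# Place the remaining original columns and the new ones side by side.
result = pd.concat([df, expanded], axis=1)
print(result)
```

Note that `pop` mutates `df`, so work on a copy (`df.copy()`) if the original frame is still needed afterwards.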

edited Aug 25, 2018 at 6:53

answered Aug 24, 2018 at 4:32

Raunaq Jain


21 Comments

ah bon Over a year ago

Thanks, but what I want is not just fillna. I also want to transform the JSON-style dataframe into the one you mentioned above.

Raunaq Jain Over a year ago

Can you clarify your question? Your original dataframe is the first one and you want to transform it to the second one, right?

ah bon Over a year ago

Yes, exactly. I want to transform the first one into the second one.

Raunaq Jain Over a year ago

and my answer doesn't transform the first to second?

ah bon Over a year ago

Sorry, no. It needs to import json and use a list of dictionaries.


ah bon · Accepted Answer · 2018-08-25 07:12:25Z

0

Updated. This one works for me. Suggestions to make it more concise are welcome.

import json
import pandas as pd

# Assumption: scene holds JSON strings (json.loads raises TypeError on dicts).
df = pd.DataFrame({'scene': ['{"living":"0.515","kitchen":"0.297"}', '{"kitchen":"0.401","study":"0.005"}', '{"study":"0.913"}', '{}', '{"others":"0"}'], 'id': [1, 2, 3, 4, 5], 's': ['a', 'b', 'c', 'd', 'e']})

def test(scene, key):
    # Parse the JSON string and return the value for key, or "" if it is absent.
    scene = json.loads(scene)
    return scene.get(key, "")

cols = ['living', 'kitchen', 'study', 'others']
for col in cols:
    # The column is named 'scene' (lowercase), matching the DataFrame above.
    df[col] = df['scene'].map(lambda scene: test(scene, col))

# Replace the empty-string placeholders with 0, then convert each column to numbers.
df[cols] = df[cols].replace({'': 0})
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

edited Aug 25, 2018 at 7:12

answered Aug 24, 2018 at 9:01

ah bon


ah bon · Accepted Answer · 2018-08-27 03:48:14Z

0

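The one-line solution is below, thanks for all the help. It assumes json is imported and that the scene column holds JSON strings, since json.loads does not accept dicts: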

df.join(df['scene'].apply(json.loads).apply(pd.Series))
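For reference, here is a self-contained sketch of the same idea on JSON-string input. It swaps `apply(pd.Series)` for the DataFrame constructor over a list of dicts (as in the first answer), which is a substitution rather than the exact one-liner above. The sample strings and the names `parsed`, `expanded` and `result` are assumptions made for illustration.

```python
import json
import pandas as pd

# Assumption: 'scene' holds JSON text rather than Python dicts.
df = pd.DataFrame({
    'scene': ['{"living":"0.515","kitchen":"0.297"}',
              '{"kitchen":"0.401","study":"0.005"}',
              '{"study":"0.913"}', '{}', '{"others":"0"}'],
    'id': [1, 2, 3, 4, 5],
})

# Parse each JSON string into a dict.
parsed = df['scene'].apply(json.loads)

# Build the expanded columns from the list of dicts, keeping df's index.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

# Convert the string values to numbers and fill missing keys with 0.
expanded = expanded.apply(pd.to_numeric).fillna(0)

# Drop the raw JSON column and attach the expanded ones.
result = df.drop(columns='scene').join(expanded)
print(result)
```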

answered Aug 27, 2018 at 3:48

ah bon
