Split pandas dataframe nested list into new named columns

Question

I have a dataframe (df) of the form:

name  alias  col3
mark  david  ['3109892828','[email protected]','123 main st']
john  twixt  ['5468392873','[email protected]','345 grand st']

What is a concise way to split col3 into new, named columns? (perhaps using lambda and apply)

EdChum · Accepted Answer · 2015-09-18 15:21:44Z

2

You could join the list elements into a comma-separated string and then call the vectorised str.split with the same separator and expand=True to create the new columns. Note that this breaks if any element itself contains a comma:

In [12]:
df[['UserID', 'email', 'address']] = df['col3'].apply(','.join).str.split(',', expand=True)
df

Out[12]:
   alias                                        col3  name      UserID  \
0  david   [3109892828, [email protected], 123 main st]  mark  3109892828   
1  twixt  [5468392873, [email protected], 345 grand st]  john  5468392873   

            email       address  
0  [email protected]   123 main st  
1  [email protected]  345 grand st

A cleaner method is to apply the pd.Series constructor, which turns each list into a Series:

In [15]:
df[['UserID', 'email', 'address']] = df['col3'].apply(pd.Series)
df

Out[15]:
   alias                                        col3  name      UserID  \
0  david   [3109892828, [email protected], 123 main st]  mark  3109892828   
1  twixt  [5468392873, [email protected], 345 grand st]  john  5468392873   

            email       address  
0  [email protected]   123 main st  
1  [email protected]  345 grand st
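
As an alternative sketch (not part of the answer above), the lists can also be handed to the DataFrame constructor through tolist(), which avoids building one Series per row. This assumes every list has exactly three elements. The sample frame and the email strings are placeholders, since the addresses in the question are redacted.

```python
import pandas as pd

# Sample data mirroring the question; the email strings are placeholders.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [['3109892828', 'mark@example.com', '123 main st'],
             ['5468392873', 'john@example.com', '345 grand st']],
})

# Build a three-column frame from the lists, reusing the original index
# so the rows line up when the two frames are joined.
expanded = pd.DataFrame(
    df['col3'].tolist(),
    index=df.index,
    columns=['UserID', 'email', 'address'],
)

# Drop the nested column and attach the expanded ones.
df = df.drop(columns='col3').join(expanded)
print(df)
```

Because the elements are never joined into a single string, commas inside an address do not matter with this approach.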

edited Sep 18, 2015 at 15:21

answered Sep 18, 2015 at 15:01


6 Comments

Jon Clements Over a year ago

This might cause difficulties if the "columns" legitimately contain commas. Maybe something like df[['id', 'email', 'address']] = df.col3.apply(pd.Series), then drop col3?

EdChum Over a year ago

Hmm. True, but unless the OP has this in their data I did not consider it an issue. Still, applying the Series constructor is cleaner and sufficient here. Will update, thanks.

DNburtonguster Over a year ago

Normally this would be a great solution, but it seems my array does not have the same number of elements in each row. What can I do if the nested list does not have the same number of fields per record? Here is the error I get: ValueError: Columns must be same length as key

DNburtonguster Over a year ago

Here is the error I get when using split(): TypeError: split() got an unexpected keyword argument 'expand'

EdChum Over a year ago

Well, with an inconsistent number of elements you can't create the new columns; the lengths have to be the same. What version of pandas are you using?
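
One possible way around the unequal-length problem raised in these comments, sketched under assumptions: the DataFrame constructor pads shorter lists with missing values, so the expansion can be done first and the columns named afterwards. The data below is hypothetical (the second record has no address, and the emails are placeholders).

```python
import pandas as pd

# Hypothetical data: the second list is missing its address.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'col3': [['3109892828', 'mark@example.com', '123 main st'],
             ['5468392873', 'john@example.com']],
})

# The DataFrame constructor pads shorter lists with missing values,
# so the result has as many columns as the longest list.
expanded = pd.DataFrame(df['col3'].tolist(), index=df.index)

# Name only as many columns as actually exist.
names = ['UserID', 'email', 'address']
expanded.columns = names[:expanded.shape[1]]

result = df.drop(columns='col3').join(expanded)
print(result)
```

This only makes sense when the missing fields are always the trailing ones; if a record omits a field in the middle, the values would land in the wrong columns.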


DNburtonguster · Accepted Answer · 2015-09-18 21:42:03Z

0

Here's what I came up with. It includes a bit of scrubbing of the raw file, and a conversion to a dictionary.

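import pandas as pd

# Read the raw file as text so the str methods below work on Python 3
with open('/path/to/file', 'r') as f:
    data = f.readlines()

# Split each line into records on the closing brace
data = [line.split('}') for line in data]
data_df = pd.DataFrame(data)
data_dfn = data_df.transpose()
# Strip leading brackets, commas and braces, remove quotes, then split each record into fields
data_new = data_dfn[0].map(lambda x: x.lstrip('[,{)').replace("'", "").split(','))

# One Series per record, so records of unequal length are padded with NaN
d = dict(data_new)
D = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
# Transpose so each record becomes a row
D = D.transpose()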
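
The padding step at the end of this answer can be tried without the raw file. The following is a minimal sketch on hypothetical in-memory records (placeholder emails, second record one field short); it shows only the dict-of-Series idea, not the file scrubbing.

```python
import pandas as pd

# Hypothetical records of unequal length, keyed by row label.
records = {
    0: ['3109892828', 'mark@example.com', '123 main st'],
    1: ['5468392873', 'john@example.com'],
}

# Each list becomes a Series; the constructor aligns them on their
# integer index and fills the gaps with NaN.
wide = pd.DataFrame({k: pd.Series(v) for k, v in records.items()})

# Transpose so each record is a row and each field a column.
table = wide.transpose()
print(table)
```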

answered Sep 18, 2015 at 21:42
