I have a dataframe (df) of the form:

name alias col3
mark david ['3109892828','[email protected]','123 main st']
john twixt ['5468392873','[email protected]','345 grand st']

What is a concise way to split col3 into new, named columns? (perhaps using lambda and apply)

2 Answers

You could apply a join to the list elements to make a comma-separated string and then call the vectorised str.split on that comma with expand=True to create the new columns:

In [12]:
df[['UserID', 'email', 'address']] = df['col3'].apply(','.join).str.split(',', expand=True)
df

Out[12]:
   alias                                        col3  name      UserID  \
0  david   [3109892828, [email protected], 123 main st]  mark  3109892828   
1  twixt  [5468392873, [email protected], 345 grand st]  john  5468392873   

            email       address  
0  [email protected]   123 main st  
1  [email protected]  345 grand st  
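
For reference, here is a minimal self-contained sketch of the join-then-split approach. The e-mail addresses are made-up placeholders (the originals are not shown in the question), and `parts` is just an illustrative name. Note that `str.split` splits on whitespace when no separator is given, so the comma is passed explicitly.

```python
import pandas as pd

# Sample data mirroring the question; the e-mail addresses are made-up placeholders.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [
        ['3109892828', 'mark@example.com', '123 main st'],
        ['5468392873', 'john@example.com', '345 grand st'],
    ],
})

# Join each list into one comma-separated string, then split on the comma.
# Without an explicit separator, str.split would split on whitespace instead.
parts = df['col3'].apply(','.join).str.split(',', expand=True)
parts.columns = ['UserID', 'email', 'address']
df = df.join(parts)

print(df[['UserID', 'email', 'address']])
```

This only works when no field contains a comma of its own.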

A cleaner method would be to apply the pd.Series constructor, which turns each list into a Series (and therefore each list element into its own column):

In [15]:
df[['UserID', 'email', 'address']] = df['col3'].apply(pd.Series)
df

Out[15]:
   alias                                        col3  name      UserID  \
0  david   [3109892828, [email protected], 123 main st]  mark  3109892828   
1  twixt  [5468392873, [email protected], 345 grand st]  john  5468392873   

            email       address  
0  [email protected]   123 main st  
1  [email protected]  345 grand st  
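
As a possible variation, the new columns can also be built directly from the lists with the DataFrame constructor, which avoids creating one Series per row and can be faster on large frames. This is a sketch that assumes every list has exactly three elements; the e-mail addresses are placeholders, and `new_cols` and `result` are illustrative names. It also drops col3 afterwards.

```python
import pandas as pd

# Sample data mirroring the question; the e-mail addresses are made-up placeholders.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [
        ['3109892828', 'mark@example.com', '123 main st'],
        ['5468392873', 'john@example.com', '345 grand st'],
    ],
})

# Turn the column of lists into a frame with one column per list position.
# Reusing df.index keeps the rows aligned for the join below.
new_cols = pd.DataFrame(
    df['col3'].tolist(),
    index=df.index,
    columns=['UserID', 'email', 'address'],
)

# Replace the original list column with the named columns.
result = df.drop(columns='col3').join(new_cols)
print(result)
```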

Comments

This might cause difficulties if the "columns" legitimately contain commas. Maybe something like df[['id', 'email', 'address']] = df.col3.apply(pd.Series), then drop col3?
Hmm, true, but unless the OP has commas in their data I did not consider it an issue. Still, applying the Series constructor is cleaner and sufficient here. Will update, thanks.
Normally this would be a great solution, but my lists do not have the same number of elements in each row. What can I do if the nested list does not have the same number of fields per record? Here is the error I get: ValueError: Columns must be same length as key
Here is the error I get when using split(): TypeError: split() got an unexpected keyword argument 'expand'
Well, with an inconsistent number of elements you can't create the new columns; the lengths need to be the same. What version of pandas are you using?
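
One possible workaround for lists of unequal length, not taken from the thread: build a frame from the lists and let pandas pad the short rows with missing values. This is only a sketch. The data is hypothetical, `names` and `expanded` are illustrative names, and it assumes no list is longer than the list of column names.

```python
import pandas as pd

names = ['UserID', 'email', 'address']

# Hypothetical data where the second list is missing its address.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'col3': [
        ['3109892828', 'mark@example.com', '123 main st'],
        ['5468392873', 'john@example.com'],
    ],
})

# Building a frame from the lists pads short rows with missing values.
expanded = pd.DataFrame(df['col3'].tolist(), index=df.index)

# Name only as many columns as were actually produced
# (assumes no list has more elements than `names`).
expanded.columns = names[:expanded.shape[1]]

df = df.join(expanded)
print(df)
```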

Here's what I came up with. It includes a bit of scrubbing of the raw file, and a conversion to a dictionary.

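import pandas as pd

# Open in text mode so the str methods below work on Python 3.
with open('/path/to/file', 'r') as f:
    data = f.readlines()

# Split every line into records on '}'; each line becomes one row.
data = [line.split('}') for line in data]
data_df = pd.DataFrame(data)
data_dfn = data_df.transpose()
# Scrub the records of the first line: strip leading brackets, drop quotes, split into fields.
data_new = data_dfn[0].map(lambda x: x.lstrip('[,{)').replace("'", "").split(','))

s = pd.DataFrame(data_new)  # intermediate frame, not used below
d = dict(data_new)
# Wrapping each list in a Series pads records of unequal length with NaN.
D = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
D = D.transpose()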
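
The last two steps are what handle records of unequal length. A minimal sketch of that step in isolation, using hypothetical records in place of the scrubbed file contents (`records` and `padded` are illustrative names):

```python
import pandas as pd

# Hypothetical records of unequal length, keyed by row number.
records = {
    0: ['3109892828', 'mark@example.com', '123 main st'],
    1: ['5468392873', 'john@example.com'],
}

# Each list becomes a Series; pandas aligns them on their index
# and fills the positions a shorter record lacks with NaN.
padded = pd.DataFrame({k: pd.Series(v) for k, v in records.items()})

# Transpose so each record is a row rather than a column.
padded = padded.transpose()
print(padded)
```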
