Split dataframe string (when string can hold n values of that cell variable), into multiple columns

Question

I am currently working on a dataset with a lot of contact data, with Emails being one of the variables.

A cell in the Emails column can have more than one email (1 to n) and they are all separated by a comma and a space.

For contacts with only two emails, the process is quite straightforward. One can split the string and create a new column for the secondary email as follows:

email_df[['Emails', 'SecondaryEmail']] = email_df['Emails'].str.split(', ', expand=True)

However, this won't work with more than 2 emails. What is the most efficient way to split the emails into columns holding only one email each (each with a different name) when the number of emails per cell can range from 1 to n? In this case n is limited to around 10, but that won't always be the case.

jezrael · Accepted Answer · 2022-03-03 12:50:10Z


Use Series.str.split with DataFrame.pop to remove the Emails column after processing:

df = email_df.join(email_df.pop('Emails').str.split(', ', expand=True).add_prefix('Email'))
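To see what the one-liner above produces, here is a minimal sketch on made-up data. The sample rows and the "Name" column are assumptions for illustration only; they are not from the original dataset.

```python
import pandas as pd

# Hypothetical sample data: "Name" and the addresses are illustrative assumptions.
email_df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Emails": [
        "a@x.com",
        "b1@x.com, b2@x.com",
        "c1@x.com, c2@x.com, c3@x.com",
    ],
})

# pop() removes "Emails" from email_df in place and returns it as a Series.
# str.split(..., expand=True) yields one column per email (labelled 0, 1, 2, ...),
# and add_prefix("Email") renames those columns to Email0, Email1, Email2, ...
parts = email_df.pop("Emails").str.split(", ", expand=True).add_prefix("Email")

# Join the new columns back onto the remaining columns by index.
df = email_df.join(parts)

print(df.columns.tolist())
```

The number of new columns follows the row with the most emails, and rows with fewer emails get missing values in the trailing columns. Note that pop() mutates email_df, so work on a copy if the original frame is still needed.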

edited Mar 3, 2022 at 12:50

answered Mar 3, 2022 at 12:24


1 Comment

Gonçalo Peres Over a year ago

Thank you @jezrael, this is exactly it. One note for those who haven't tested it: after this, one might want to drop the "Emails" column. If that is the case: df.drop('Emails', axis=1, inplace=True)
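Whether the extra drop is needed depends on whether the split was done with or without pop. The sketch below compares the two variants on made-up data; the sample frame is an assumption, and drop(columns=...) is used here in place of the inplace form simply so that each result can be inspected separately.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
email_df = pd.DataFrame({"Emails": ["a@x.com", "b1@x.com, b2@x.com"]})

# Variant without pop(): the original "Emails" column survives the join.
kept = email_df.join(
    email_df["Emails"].str.split(", ", expand=True).add_prefix("Email")
)

# Dropping it explicitly; drop(columns=...) returns a new DataFrame.
dropped = kept.drop(columns="Emails")

# Variant with pop(): "Emails" is removed from email_df before the join runs,
# so there is nothing left to drop afterwards.
popped = email_df.join(
    email_df.pop("Emails").str.split(", ", expand=True).add_prefix("Email")
)

print(list(kept.columns), list(dropped.columns), list(popped.columns))
```

With the pop-based version, calling drop on "Emails" afterwards would raise a KeyError, because the column no longer exists.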
