4

I am trying to transform a Pandas DataFrame into a new one with every item from a certain column given its own row. For example:

Before:

   ID             Name        Date   Location
0   0       John, Dave  01/01/1992     Mexico
1   1              Tim  07/07/1997  Australia
2   2       Mike, John  12/24/2012     Zambia
3   3  Bob, Rick, Tony  05/17/2007       Cuba
4   4            Roger  04/05/2000    Iceland
5   5           Carlos  05/24/1995       Guam

Current Solution:

new_df = pd.DataFrame(columns = df.columns)
for index,row in df.iterrows():
    new_row = pd.DataFrame(df.loc[index]).transpose()
    target_info = df.loc[index,'Name']
    if (len(target_info.split(',')) > 1):
        for item in target_info.split(','):
            new_row.loc[index,'Name'] = item
           new_df = new_df.append(new_row)
    else:
        new_df = new_df.append(new_row)  

After:

  ID    Name        Date   Location
0  0    John  01/01/1992     Mexico
0  0    Dave  01/01/1992     Mexico
1  1     Tim  07/07/1997  Australia
2  2    Mike  12/24/2012     Zambia
2  2    John  12/24/2012     Zambia
3  3     Bob  05/17/2007       Cuba
3  3    Rick  05/17/2007       Cuba
3  3    Tony  05/17/2007       Cuba
4  4   Roger  04/05/2000    Iceland
5  5  Carlos  05/24/1995       Guam

Surely there is something more elegant?

2 Answers 2

2

You could get the split names as a Series, drop your existing Name column, then join the split names.

# Split the 'Name' column as a Series, setting the appropriate name and index.
split_names = df['Name'].str.split(', ', expand=True).stack()
split_names.name = 'Name'
split_names.index = split_names.index.get_level_values(0)

# Drop the existing 'Name' column, and join the split names.
df.drop('Name', axis=1, inplace=True)
df = df.join(split_names)

The resulting output is the same as in your example, but with the Name column last. You can reorder the columns if you want the original order.

   ID        Date   Location    Name
0   0  01/01/1992     Mexico    John
0   0  01/01/1992     Mexico    Dave
1   1  07/07/1997  Australia     Tim
2   2  12/24/2012     Zambia    Mike
2   2  12/24/2012     Zambia    John
3   3  05/17/2007       Cuba     Bob
3   3  05/17/2007       Cuba    Rick
3   3  05/17/2007       Cuba    Tony
4   4  04/05/2000    Iceland   Roger
5   5  05/24/1995       Guam  Carlos
Sign up to request clarification or add additional context in comments.

Comments

1

you can do it this way:

nm = df.Name.str.split(',\s*', expand=True)
cols=list(set(df.columns) - set(['Name']))

pd.melt(df[cols].join(nm),
        id_vars=cols,
        value_vars=nm.columns.tolist(),
        value_name='Name') \
  .dropna() \
  .drop(['variable'], axis=1) \
  .sort_values('ID')

Step by step:

In [128]: nm = df.Name.str.split(',\s*', expand=True)

In [129]: nm
Out[129]:
        0     1     2
0    John  Dave  None
1     Tim  None  None
2    Mike  John  None
3     Bob  Rick  Tony
4   Roger  None  None
5  Carlos  None  None

In [130]: cols=list(set(df.columns) - set(['Name']))

In [131]: cols
Out[131]: ['Date', 'ID', 'Location']

In [133]: pd.melt(df[cols].join(nm),
   .....:         id_vars=cols,
   .....:         value_vars=nm.columns.tolist(),
   .....:         value_name='Name') \
   .....:   .dropna() \
   .....:   .drop(['variable'], axis=1) \
   .....:   .sort_values('ID')
Out[133]:
          Date  ID   Location    Name
0   01/01/1992   0     Mexico    John
6   01/01/1992   0     Mexico    Dave
1   07/07/1997   1  Australia     Tim
2   12/24/2012   2     Zambia    Mike
8   12/24/2012   2     Zambia    John
3   05/17/2007   3       Cuba     Bob
9   05/17/2007   3       Cuba    Rick
15  05/17/2007   3       Cuba    Tony
4   04/05/2000   4    Iceland   Roger
5   05/24/1995   5       Guam  Carlos

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.