How to combine strings in dataframe to list and break up column?

Question

I'm a newbie in Python and pandas. Could you give me advice on how to do the following manipulation with a DataFrame? I have DataFrame_1:

  id id_name  revenue
0  a  name_a       65
1  a  name_b       65
2  a  name_a       70
3  a  name_b       70
4  a  name_a      121
5  a  name_b      121

and I want to produce the following DataFrame_2:

  id           id_name  revenue
0  a    name_a, name_b       65
1  a    name_a, name_b       70
2  a    name_a, name_b      121

and then the following DataFrame_3:

  id id_name1 id_name2  revenue
0  a   name_a   name_b       65
1  a   name_a   name_b       70
2  a   name_a   name_b      121

So, in the first step I want to combine the strings that share the same 'revenue', and in the second step split the 'id_name' column into separate columns.
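A minimal sketch of both steps, assuming the column names and values shown above: step 1 joins the names into one comma-separated string per (id, revenue) group, and step 2 splits that string back into numbered columns. It joins and splits on ', ', so it assumes no name itself contains ', '.

```python
import pandas as pd

# DataFrame_1 from the question
df = pd.DataFrame({
    'id': ['a'] * 6,
    'id_name': ['name_a', 'name_b'] * 3,
    'revenue': [65, 65, 70, 70, 121, 121],
})

# Step 1: join all id_name strings that share the same (id, revenue)
df2 = df.groupby(['id', 'revenue'])['id_name'].agg(', '.join).reset_index()
df2 = df2[['id', 'id_name', 'revenue']]  # restore the question's column order

# Step 2: split the joined string back into one column per name
parts = df2['id_name'].str.split(', ', expand=True)
parts.columns = [f'id_name{i + 1}' for i in range(parts.shape[1])]
df3 = pd.concat([df2[['id']], parts, df2[['revenue']]], axis=1)

print(df2)
print(df3)
```

Grouping on both id and revenue (rather than revenue alone) keeps rows with different ids apart.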

What's with the id variable? If you're only grouping on revenue, what should happen when the revenue is the same but the id is different?
– ALollz, Commented May 13, 2018 at 22:07
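To make the comment's point concrete, here is a small sketch with hypothetical data in which revenue 65 occurs under two different ids (the id 'b' and the name 'name_c' are invented for illustration). Grouping on revenue alone merges names across ids; grouping on (id, revenue) keeps them apart.

```python
import pandas as pd

# Hypothetical data: revenue 65 appears under two different ids
df = pd.DataFrame({
    'id': ['a', 'a', 'b'],
    'id_name': ['name_a', 'name_b', 'name_c'],
    'revenue': [65, 65, 65],
})

# Grouping on revenue alone merges names from different ids into one row
by_revenue = df.groupby('revenue')['id_name'].agg(', '.join).reset_index()

# Grouping on (id, revenue) keeps each id in its own row
by_id_revenue = df.groupby(['id', 'revenue'])['id_name'].agg(', '.join).reset_index()

print(by_revenue)
print(by_id_revenue)
```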

BENY · Accepted Answer · 2018-05-13 22:51:14Z

2

Use groupby and cumcount to create an additional key, then unstack:

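# number the occurrences of each (id, id_name) pair: 0, 0, 1, 1, 2, 2
s = df.groupby(['id','id_name']).cumcount()
# count again within each occurrence number to get the column key 1, 2, 1, 2, ...
df['NewId'] = s.groupby(s).cumcount() + 1
df.set_index(['id','revenue','NewId'])['id_name'].unstack().add_prefix('id_name').reset_index()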
Out[137]: 
NewId id  revenue id_name1 id_name2
0      a       65   name_a   name_b
1      a       70   name_a   name_b
2      a      121   name_a   name_b
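The same steps with the question's data built inline, printing the intermediate keys so the role of each cumcount is visible. The `alt` key is a swapped-in variant, not part of the answer: numbering rows within each (id, revenue) group gives the same key for this data and does not rely on the names repeating in a fixed pattern across revenue groups.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a'] * 6,
    'id_name': ['name_a', 'name_b'] * 3,
    'revenue': [65, 65, 70, 70, 121, 121],
})

# Occurrence number of each (id, id_name) pair
s = df.groupby(['id', 'id_name']).cumcount()
print(s.tolist())            # [0, 0, 1, 1, 2, 2]

# Position of each row among the rows that share the same occurrence number
df['NewId'] = s.groupby(s).cumcount() + 1
print(df['NewId'].tolist())  # [1, 2, 1, 2, 1, 2]

# Variant key (not from the answer): number rows within each (id, revenue) group
alt = df.groupby(['id', 'revenue']).cumcount() + 1
print(alt.tolist())          # [1, 2, 1, 2, 1, 2]

# Move NewId into the columns and name them id_name1, id_name2, ...
wide = (df.set_index(['id', 'revenue', 'NewId'])['id_name']
          .unstack()
          .add_prefix('id_name')
          .reset_index())
print(wide)
```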

answered May 13, 2018 at 22:51

BENY


jpp · Accepted Answer · 2018-05-13 22:52:30Z

2

This is one solution. The first part is identical to @ALollz's answer, but the second uses a list comprehension after calculating the maximum number of id_names per group.

import pandas as pd

# groupby to list of id_names
df2 = df.groupby(['id', 'revenue'])['id_name'].apply(list).reset_index()

# copy df2
df3 = df2.copy()

# calculate max number of id_names
lens = max(map(len, df3['id_name'].values))

# split columns
df3[['id_name'+str(i) for i in range(1, lens+1)]] = df2['id_name'].apply(pd.Series)

# drop unsplit column (keyword form; passing the axis positionally no longer works in pandas 2.x)
df3 = df3.drop(columns='id_name')

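print(df3)

This output was produced from input that also contains a row (a, name_c, 121), which is why an id_name3 column appears; with the question's data only id_name1 and id_name2 are created.

  id  revenue id_name1 id_name2 id_name3
0  a       65   name_a   name_b      NaN
1  a       70   name_a   name_b      NaN
2  a      121   name_a   name_b   name_c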
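A self-contained sketch of the uneven case, assuming the question's data plus the extra (a, name_c, 121) row implied by the output above. It swaps `apply(pd.Series)` for building the wide columns directly with `pd.DataFrame(list_of_lists)`, a different technique from the answer's that pads shorter lists with missing values.

```python
import pandas as pd

# Question's data plus one extra row (assumption) so that groups have unequal sizes
df = pd.DataFrame({
    'id': ['a'] * 7,
    'id_name': ['name_a', 'name_b'] * 3 + ['name_c'],
    'revenue': [65, 65, 70, 70, 121, 121, 121],
})

df2 = df.groupby(['id', 'revenue'])['id_name'].apply(list).reset_index()

# The widest group decides how many columns are created
lens = max(map(len, df2['id_name']))

# Build the wide columns from the lists; shorter lists are padded with None
wide = pd.DataFrame(df2['id_name'].tolist(),
                    columns=[f'id_name{i}' for i in range(1, lens + 1)])
df3 = pd.concat([df2.drop(columns='id_name'), wide], axis=1)
print(df3)
```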

answered May 13, 2018 at 22:52

jpp


ALollz · Accepted Answer · 2018-05-13 22:40:26Z

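You can get essentially the second DataFrame with groupby; the only difference is that the id_name values come out as lists rather than comma-separated strings: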

df2 = df1.groupby(['id', 'revenue']).id_name.apply(list).reset_index()

  id  revenue           id_name
0  a       65  [name_a, name_b]
1  a       70  [name_a, name_b]
2  a      121  [name_a, name_b]

For the third DataFrame, apply pandas.Series to the lists created above. You don't need to know in advance how many columns you'll end up with, although the rename step below only handles up to 10 of them.

import pandas as pd

# expand each list into its own set of columns (labelled 0, 1, ...)
df3 = pd.concat([df2[['id', 'revenue']], df2['id_name'].apply(pd.Series)], axis=1)
# rename columns 0..9 to id_name1..id_name10
df3.rename(columns={i: 'id_name' + str(i + 1) for i in range(10)}, inplace=True)

  id  revenue id_name1 id_name2
0  a       65   name_a   name_b
1  a       70   name_a   name_b
2  a      121   name_a   name_b
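A sketch of one way to lift the 10-column limit, assuming the same df1 as in the question: instead of a fixed rename dictionary, derive the new names from whatever integer labels `apply(pd.Series)` produced.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['a'] * 6,
    'id_name': ['name_a', 'name_b'] * 3,
    'revenue': [65, 65, 70, 70, 121, 121],
})
df2 = df1.groupby(['id', 'revenue']).id_name.apply(list).reset_index()

# Expand the lists; the new columns are labelled 0, 1, ...
expanded = df2['id_name'].apply(pd.Series)
# Shift labels to 1, 2, ... and prefix them, however many columns there are
expanded.columns = [f'id_name{c + 1}' for c in expanded.columns]

df3 = pd.concat([df2[['id', 'revenue']], expanded], axis=1)
print(df3)
```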
