Python pandas groupby conditional concatenate strings into multiple columns

Question

I am trying to group a DataFrame by one column, keeping several columns from one row in each group and concatenating strings from the other rows into multiple columns, based on the value of another column. Here is an example:

import pandas as pd

df = pd.DataFrame({'test' : ['a','a','a','a','a','a','b','b','b','b'],
     'name' : ['aa','ab','ac','ad','ae','ba','bb','bc','bd','be'],
     'amount' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
     'role' : ['x','y','y','x','x','z','y','y','z','y']})

df

      amount    name    role    test
0        1.0    aa      x       a
1        2.0    ab      y       a
2        3.0    ac      y       a
3        4.0    ad      x       a
4        5.0    ae      x       a
5        6.0    ba      z       a
6        7.0    bb      y       b
7        8.0    bc      y       b
8        9.0    bd      z       b
9        9.5    be      y       b

I would like to group by test, retain name and amount from the row where role = 'z', and create two columns: X, which concatenates the values of name where role = 'x', and Y, which concatenates the values of name where role = 'y' (concatenated values separated by '; '). Each value of test can have zero or more rows with role = 'x', zero or more rows with role = 'y', and exactly one row with role = 'z'. X and Y can be null when a test group has no rows for that role. The amount values of rows with role = 'x' or 'y' are dropped. The desired output would be something like:

     test   name     amount        X              Y
0    a      ba          6.0        aa; ad; ae     ab; ac
1    b      bd          9.0        None           bb; bc; be

For the concatenating part, I found x.ix[x.role == 'x', X] = "{%s}" % '; '.join(x['name']), which I might be able to repeat for y. I tried a few things along the lines of name = x[x.role == 'z'].name.first() for name and amount. I also tried going down both paths of a defined function and a lambda function without success. Appreciate any thoughts.

akuiper · Accepted Answer · 2016-11-10 04:27:19Z

2

You can build custom columns in the function passed to apply after groupby. Here g can be thought of as the sub-DataFrame holding a single value of the test column. Because you want several columns returned, the function should return a Series for each group, whose index labels become the column headers of the result:

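# One row per group: name/amount come from the role == 'z' row, X and Y are the joined names.
# Note: '; '.join(...) over an empty selection gives '' rather than None.
df.groupby('test').apply(lambda g: pd.Series({'name': g['name'][g.role == 'z'].iloc[0],
                                              'amount': g['amount'][g.role == 'z'].iloc[0],
                                              'X': '; '.join(g['name'][g.role == 'x']),
                                              'Y': '; '.join(g['name'][g.role == 'y'])
                                             })).reset_index()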
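
For reference, here is a self-contained sketch of the same groupby/apply idea, written with named functions instead of a lambda. The helper names join_names and summarize are made up for illustration. It assumes exactly one role == 'z' row per group (as the question states) and, as a further assumption about the desired null handling, turns an empty join into None so that X or Y shows up as a missing value.

```python
import pandas as pd

df = pd.DataFrame({
    'test': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'name': ['aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be'],
    'amount': [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
    'role': ['x', 'y', 'y', 'x', 'x', 'z', 'y', 'y', 'z', 'y'],
})


def join_names(g, role):
    """Join the names for one role with '; ' (hypothetical helper).

    '; '.join of an empty selection is '', which is falsy, so `or None`
    converts "no rows for this role" into None.
    """
    names = g.loc[g['role'] == role, 'name']
    return '; '.join(names) or None


def summarize(g):
    """Collapse one group into a single row (hypothetical helper)."""
    # Assumption: exactly one row with role == 'z' exists in every group.
    z = g.loc[g['role'] == 'z'].iloc[0]
    return pd.Series({
        'name': z['name'],
        'amount': z['amount'],
        'X': join_names(g, 'x'),
        'Y': join_names(g, 'y'),
    })


# Select the non-grouping columns explicitly so only they are passed to summarize.
result = (df.groupby('test')[['name', 'amount', 'role']]
            .apply(summarize)
            .reset_index())
print(result)
```

If a group could lack a 'z' row, the .iloc[0] lookup would raise an IndexError, so that case would need its own check.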

edited Nov 10, 2016 at 4:27

answered Nov 10, 2016 at 4:12

akuiper


piRSquared · Accepted Answer · 2016-11-10 05:40:47Z

1

# set the index and take the cross-section where role is 'z'
z = df.set_index(['test', 'role']).xs('z', level='role')
# drop the 'z' rows, then group by 'test' and 'role' and join the names with '; '
xy = df.query('role != "z"').groupby(['test', 'role'])['name'].apply('; '.join).unstack()
# make the column names of xy upper case
xy.columns = xy.columns.str.upper()

# put both pieces side by side, aligned on the 'test' index
pd.concat([z, xy], axis=1).reset_index()
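
A related way to get the same shape, shown here only as a sketch and not as part of the answer above, swaps in pivot_table for the groupby/unstack step and a left merge for the concat. It assumes the question's example data and one role == 'z' row per value of test. Roles with no rows in a group come out as missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'test': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'name': ['aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be'],
    'amount': [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
    'role': ['x', 'y', 'y', 'x', 'x', 'z', 'y', 'y', 'z', 'y'],
})

# The 'z' row of each group carries the name and amount to keep.
z = df.loc[df['role'] == 'z', ['test', 'name', 'amount']]

# One column per remaining role, with the names in each cell joined by '; '.
xy = (df[df['role'] != 'z']
      .pivot_table(index='test', columns='role', values='name', aggfunc='; '.join))
xy.columns = xy.columns.str.upper()

# A left merge keeps every 'z' row even if a group has no 'x' or 'y' rows.
result = z.merge(xy.reset_index(), on='test', how='left')
print(result)
```

The merge also leaves the columns in the order test, name, amount, X, Y, matching the layout asked for in the question.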

answered Nov 10, 2016 at 5:40

piRSquared
