Instead of calculating e.g. the sum with groupby, I would like to concatenate all rows within the same group. That is, instead of sum(), the code below should simply combine (concatenate) the rows. With 5 rows per group, the new data frame would have 5 times as many columns (each column repeated 5 times).

Example: This is the data frame I have right now.

Index    Pool   B          C         D           E
70       Pool1  8.717402   7.873173  16.029238   8.533174   
71       Pool1  7.376365   6.228181  9.272679    7.498993   
72       Pool2  8.854857   10.340896 9.218947    8.670379   
73       Pool2  11.509130  8.571492  19.363829   14.605199   
74       Pool3  14.780578  7.405982  9.279374    13.551686   
75       Pool3  7.448860   11.952275 8.239564    12.264440
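
For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: the variable name t is taken from the groupby call in this question, and treating "Index" as the frame's index (rather than a regular column) is an assumption.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: "Index" is the frame's index, not a regular column.
t = pd.DataFrame(
    {
        "Pool": ["Pool1", "Pool1", "Pool2", "Pool2", "Pool3", "Pool3"],
        "B": [8.717402, 7.376365, 8.854857, 11.509130, 14.780578, 7.448860],
        "C": [7.873173, 6.228181, 10.340896, 8.571492, 7.405982, 11.952275],
        "D": [16.029238, 9.272679, 9.218947, 19.363829, 9.279374, 8.239564],
        "E": [8.533174, 7.498993, 8.670379, 14.605199, 13.551686, 12.264440],
    },
    index=pd.Index(range(70, 76), name="Index"),
)
print(t)
```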

I want to have it like this:

Index    Pool   B1         C1        D1          E1        B2         C2        D2          E2
70       Pool1  8.717402   7.873173  16.029238   8.533174  7.376365   6.228181  9.272679    7.498993  
71       Pool2  8.854857   10.340896 9.218947    8.670379  11.509130  8.571492  19.363829   14.605199  
72       Pool3  14.780578  7.405982  9.279374    13.551686 7.448860   11.952275 8.239564    12.264440  

I would provide sample code, but I have no idea where to start. If I just wanted to sum the rows, I would use:

t.groupby(['Pool']).sum()

But I do not want to collapse the rows while keeping the column structure; I want to concatenate the rows that belong to the same group.

  • Please provide some sample code to work with. Commented Jan 6, 2016 at 13:22
  • I could not provide sample code, but I added an example that is hopefully helpful to you guys. Commented Jan 6, 2016 at 14:07
  • @Jamona in your desired output, e.g. df['B'] would essentially be an ambiguous statement. Such non-unique columns seem somewhat odd to me. Commented Jan 6, 2016 at 14:10
  • The column names don't need to be the same; B1 and B2 would also be fine, or something else. Commented Jan 6, 2016 at 14:12

1 Answer

You could try:

import pandas as pd

df1 = pd.DataFrame({'Pool': ['a', 'a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4, 5], 'C': [1, 2, 3, 4, 5]})
# Group only the value columns, so 'Pool' is never passed to comb2
# and does not have to be dropped afterwards.
gd = df1.groupby('Pool')[['B', 'C']]

def comb2(x):
    # Collect each column of the group into a single list.
    rslt = dict()
    for col in x.columns:
        rslt[col] = x[col].tolist()
    return pd.Series(rslt)

rslt = gd.apply(comb2)

# Expand every list column into numbered columns (B1, B2, ...).
parts = []
for col in rslt.columns:
    tempdf = rslt[col].apply(lambda x: pd.Series(x))
    tempdf.columns = [col + str(i + 1) for i in range(len(tempdf.columns))]
    parts.append(tempdf)
finaldf = pd.concat(parts, axis=1)

print(finaldf)

Output:
      B1  B2  C1  C2
Pool                
a      1   2   1   2
b      3   4   3   4
c      5 NaN   5 NaN
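
As a possible alternative (a sketch, not necessarily better), the same reshaping can be done without a custom function by numbering the rows inside each group with cumcount() and then unstacking that number into the columns. The names pos and wide are made up for this example, and the sort_index step is only there to get the B1 C1 B2 C2 ordering shown in the question.

```python
import pandas as pd

df1 = pd.DataFrame({'Pool': ['a', 'a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4, 5], 'C': [1, 2, 3, 4, 5]})

# Number the rows inside each group: 1, 2, ...
pos = df1.groupby('Pool').cumcount() + 1

# Index by (Pool, row number), then move the row number into the columns.
# Groups with fewer rows than the largest group are padded with NaN.
wide = df1.set_index(['Pool', pos]).unstack()

# Optional: order the columns by row number first (B1, C1, B2, C2).
wide = wide.sort_index(axis=1, level=1)

# Flatten the two-level column index into names like B1, C1, ...
wide.columns = [f"{col}{i}" for col, i in wide.columns]

print(wide)
```

Dropping the sort_index line leaves the columns grouped by original column (B1, B2, C1, C2), as in the output above.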