0

Say I have the following dataframe:

import pandas as pd

df = pd.DataFrame()
df['ID'] = [1, 1, 1, 2, 2]
df['Type'] = ['A', 'B', 'Q', 'B', 'R']
df['Status'] = [0, 0, 1, 0, 1]
>>> df
   ID Type  Status
0   1    A       0
1   1    B       0
2   1    Q       1
3   2    B       0
4   2    R       1
>>> 

I want to group this dataframe by 'ID' and reshape it so that I have a "Type" variable and a "Status" variable for each item within the group. See below:

   Type1 Type2 Type3  Status1  Status2  Status3
ID                                             
1      A     B     Q        0        0        1
2      B     R   NaN        0        1      NaN

The number of rows in my output dataframe will depend on the max number of records in any one group of IDs.

I believe that the pivot function is what I want to use here. However, it calls for a "columns" parameter which I believe should be an ID of each item within each group. I have a very clunky way of calculating this, which I appreciate advice on a better way of doing this:

>>> g=df.groupby('ID')
>>> df['IDinGroup']=[item for sublist in [range(1,len(i[1])+1) for i in g] for item in sublist]
>>> df
   ID Type  Status  IDinGroup
0   1    A       0          1
1   1    B       0          2
2   1    Q       1          3
3   2    B       0          1
4   2    R       1          2
>>> 

Then I can loop through the 'Type' and 'Status' variables and pivot each and then merge them back together:

>>> ListOfValues=[]
>>> for ValueCol in ['Type','Status']:
...     f=df.pivot(index='ID',columns='IDinGroup',values=ValueCol)
...     f.columns=[ValueCol+str(Col) for Col in f.columns]
...     f.columns.name=None
...     ListOfValues.append(f)
... 
>>> pd.concat(ListOfValues,1)
   Type1 Type2 Type3  Status1  Status2  Status3
ID                                             
1      A     B     Q        0        0        1
2      B     R   NaN        0        1      NaN
>>> 

Am I taking the correct approach here? And if so, what is a better way to calculate the "columns" parameter for the pivot function? (the id of each item within each group)

1 Answer 1

1

Try this:

import pandas as pd

df = pd.DataFrame()
df['ID'] = [1, 1, 1, 2, 2]
df['Type'] = ['A', 'B', 'Q', 'B', 'R']
df['Status'] = [0, 0, 1, 0, 1]

g = df.groupby("ID")[["Type","Status"]]
df2 = g.apply(pd.DataFrame.reset_index, drop=True).unstack()

Here is the result:

   Type         Status       
      0  1    2      0  1   2
ID                           
1     A  B    Q      0  0   1
2     B  R  NaN      0  1 NaN

The columns is a MultiIndex, if you want to flat it:

df2.columns = [l0 + str(l1+1) for l0, l1 in df2.columns]

the output:

   Type1 Type2 Type3  Status1  Status2  Status3
ID                                             
1      A     B     Q        0        0        1
2      B     R   NaN        0        1      NaN
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.