0

INPUT TABLE

pcd INCOME Education age1to_20 TG
a1001 INCOME_1 Education_1 1 1
a1003 INCOME_2 Education_2 0 2
a1001 INCOME_3 Education_2 5 2
a1002 INCOME_2 Education_2 1 5
a1003 INCOME_1 Education_2 3 4

REQUIRED OUTPUT

pcd INCOME_1 INCOME_2 INCOME_3 Education_1 Education_2 age1to_20 TG
a1001 1 0 1 1 1 6 1.5
a1002 0 1 0 0 1 1 5
a1003 1 1 0 0 2 3 3

pcd is index and income1,income2,income3,education1,education2,age are aggregated to sum while TG is aggregated to average.

pd.pivot_table(df, index=['pcd', 'age1to_20'],
           aggfunc={'INCOME':sum,'Education'=sum,'age1to_20'=sum,'TG':avg},fill_value=0)

Tried above code but finding no success

3 Answers 3

2

You can first melt, then pivot_table to reshape, and finally groupby.agg to combine the 'pcd':

agg_funcs = {'TG': 'mean', 'pcd': 'first'}

out = (df
   .melt(['pcd', 'age1to_20', 'TG'])
   .assign(v=1)
   .pivot_table(index=['pcd', 'age1to_20', 'TG'], columns='value',
                values='v', fill_value=0)
   .reset_index().rename_axis(columns=None)
   .pipe(lambda d: d.groupby('pcd', as_index=False)
                    .agg({c: agg_funcs.get(c, 'sum') for c in d.columns}))
)

Output:

     pcd  age1to_20   TG  Education_1  Education_2  INCOME_1  INCOME_2  INCOME_3
0  a1001          6  1.5            1            1         1         0         1
1  a1002          1  5.0            0            1         0         1         0
2  a1003          3  3.0            0            2         1         1         0
Sign up to request clarification or add additional context in comments.

Comments

1

As an alternative, you can use crosstab and groupby:

x=pd.crosstab(df['pcd'],columns=df['INCOME'])
print(x)
'''
INCOME  INCOME_1  INCOME_2  INCOME_3
pcd                                 
a1001          1         0         1
a1002          0         1         0
a1003          1         1         0
'''

y=pd.crosstab([df['pcd']],columns=[df['Education']])
z=df.groupby('pcd').agg({'age1to_20':'sum','TG':'mean'})
final=x.join([y,z])
print(final)
'''
       INCOME_1  INCOME_2  INCOME_3  Education_1  Education_2  age1to_20   TG
pcd                                                                          
a1001         1         0         1            1            1          6  1.5
a1002         0         1         0            0            1          1  5.0
a1003         1         1         0            0            2          3  3.0
'''

1 Comment

what if I want one of the crosstab column to be mean over sum.
0

You also need to specify where the values are going to come from and which columns will specify your new column names.

Also, it's not very clear to me where your values are coming from.

But it would look like something like this, e.g.,

pd.pivot_table(df, index=['pcd', 'age1to_20'], 
           values=['age1to_20']
           columns=['income', 'education']
           aggfunc={'INCOME':sum,'TG':avg},fill_value=0)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.