Python Pandas grouping columns

Question

This is a Pandas question - my brain is too tired to figure this out today. Could someone please help me? I have a dataframe with many columns with one column as a category:

Category B C D .... Z 
1        2 11 1.0 'HOME' ....
1        3 21 1.0 'HOME' ....
1        1 33 .9 'GOPHER' ....
2        4 34 0.6  'HUMM'  ...
2        1 72 1.4  'VEEE'   ...
3        5 23  2.3  'ETC '  ....
4        3 99  3.141 'PI'  ...
4        4 1  2.634 'PI'   ...

And I want to get this (the text columns are irrelevant):

Category B C D .... Z 
1        6 11 2.9 'HOME' ....
2        5 34 2.6  'HUMM'  ...
3        5 23  2.3  'ETC '  ....
4        7 100  5.775 'PI'  ...

How do I go about doing this in Python Pandas? Do I use a group()?

If df is my DataFrame and ndf is the resulting DataFrame (A here is the Category column), then ndf would have one row with ndf['A'] = 1, and ndf['B'] in that row would be the sum of the values in df['B'] for all rows where df['A'] was 1.
For the next category, ndf would have one row with ndf['A'] = 2, and ndf['B'] would be the sum of the values in df['B'] for all rows where df['A'] was 2,

and so on.

I am trying to aggregate the sum of the columns based on the category in column A. For each category in A, I want to sum the rest of the columns with the same category.

I hope I have explained it properly. Manually, this would be similar to

ndf['B'] = df.loc[df['A'] == 1, 'B'].sum()
ndf['C'] = df.loc[df['A'] == 1, 'C'].sum()

Basically, can I use something like this:

import pandas as pd

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):  # only sum numeric columns
        ndf[col] = df.loc[df['A'] == 1, col].sum()

and repeat for each category in A:

ndf['B'] = df.loc[df['A'] == 2, 'B'].sum()
ndf['C'] = df.loc[df['A'] == 2, 'C'].sum()

I would then have to loop for each value in the category for A.

Is this the right way to approach the problem?
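
For reference, the manual loop described above could be sketched roughly as follows. This is only an illustration under assumptions: the sample data is typed in from the table in the question with the elided columns left out, the key column is named Category (called A in the prose), and the numeric check uses is_numeric_dtype from pandas.api.types. The names numeric_cols, rows and mask are made up for the example.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample data from the question; the elided columns ("....") are omitted.
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 3, 4, 4],
    "B": [2, 3, 1, 4, 1, 5, 3, 4],
    "C": [11, 21, 33, 34, 72, 23, 99, 1],
    "D": [1.0, 1.0, 0.9, 0.6, 1.4, 2.3, 3.141, 2.634],
    "Z": ["HOME", "HOME", "GOPHER", "HUMM", "VEEE", "ETC", "PI", "PI"],
})

# Numeric columns other than the grouping key (hypothetical helper name).
numeric_cols = [
    col for col in df.columns
    if col != "Category" and is_numeric_dtype(df[col])
]

# One output row per category: select that category's rows with a boolean
# mask, then sum each numeric column.
rows = []
for category in df["Category"].unique():
    mask = df["Category"] == category
    sums = df.loc[mask, numeric_cols].sum()
    rows.append({"Category": category, **sums.to_dict()})

ndf = pd.DataFrame(rows)
print(ndf)
```

This works, but it rebuilds by hand what a groupby on the category column is designed to do, so the loop is mainly useful for checking the result.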

So you want B and D summed in each group, but C you just want the first value from each group? Could you be more explicit with your requirements here?
– alkasm, Commented Oct 5, 2018 at 3:37
Your question is not clear, please reframe it and explain it properly with examples.
– user8403237, Commented Oct 5, 2018 at 6:26
I want to group by category by summing up all the columns which have numbers. I can safely ignore columns which do not have numbers or are empty.
– old_guy, Commented Oct 5, 2018 at 12:30

jpp · Accepted Answer · 2018-10-05 12:51:53Z

1

You can use GroupBy + agg to specify a different function for each series. I have linked C and Z series to 'first', i.e. extract the first value from each group, as this is consistent with your desired output.

agg_rules = {'B': 'sum', 'C': 'first', 'D': 'sum', 'Z': 'first'}
res = df.groupby('Category').agg(agg_rules).reset_index()

print(res)

   Category  B   C      D       Z
0         1  6  11  2.900  'HOME'
1         2  5  34  2.000  'HUMM'
2         3  5  23  2.300   'ETC'
3         4  7  99  5.775    'PI'
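
If the goal is instead to sum every numeric column, as the asker's last comment suggests, a variant along these lines might be closer. It is a sketch under assumptions, not part of the original answer: the DataFrame is rebuilt from the question's sample with the elided columns omitted, and the names numeric_sums, value_cols and numeric_cols are invented for the example. Note that C is summed here, whereas the rules above take its first value.

```python
import pandas as pd

# Sample data from the question; the elided columns ("....") are omitted.
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 3, 4, 4],
    "B": [2, 3, 1, 4, 1, 5, 3, 4],
    "C": [11, 21, 33, 34, 72, 23, 99, 1],
    "D": [1.0, 1.0, 0.9, 0.6, 1.4, 2.3, 3.141, 2.634],
    "Z": ["HOME", "HOME", "GOPHER", "HUMM", "VEEE", "ETC", "PI", "PI"],
})

# Option 1: sum all numeric columns per category and drop the text columns.
numeric_sums = df.groupby("Category").sum(numeric_only=True).reset_index()

# Option 2: derive the agg rules from the dtypes, using 'sum' for numeric
# columns and 'first' for everything else, so text columns are kept.
value_cols = df.columns.drop("Category")
numeric_cols = df[value_cols].select_dtypes(include="number").columns
agg_rules = {
    col: ("sum" if col in numeric_cols else "first") for col in value_cols
}
res = df.groupby("Category").agg(agg_rules).reset_index()

print(numeric_sums)
print(res)
```

Deriving the rules from the dtypes avoids listing each column by hand, which may help when the frame has many columns as in the question.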

