This is a pandas question, and my brain is too tired to figure it out today. Could someone please help? I have a DataFrame with many columns, one of which is a category:

Category  B  C   D      .... Z
1         2  11  1.0    'HOME' ....
1         3  21  1.0    'HOME' ....
1         1  33  0.9    'GOPHER' ....
2         4  34  0.6    'HUMM' ...
2         1  72  1.4    'VEEE' ...
3         5  23  2.3    'ETC ' ....
4         3  99  3.141  'PI' ...
4         4  1   2.634  'PI' ...

I want to get this (the text columns are irrelevant):

Category  B  C    D      .... Z
1         6  11   2.9    'HOME' ....
2         5  34   2.6    'HUMM' ...
3         5  23   2.3    'ETC ' ....
4         7  100  5.775  'PI' ...

How do I go about doing this in Python Pandas? Do I use a group()?

If df is my DataFrame and ndf is the resulting DataFrame, then ndf would have one row with ndf['A'] = 1, where ndf['B'] is the sum of the values in df['B'] for all rows where df['A'] was 1.
For the next category, ndf would have one row with ndf['A'] = 2, where ndf['B'] is the sum of the values in df['B'] for all rows where df['A'] was 2,

and so on.

I am trying to aggregate the sum of the columns based on the category in column A. For each category in A, I want to sum the rest of the columns with the same category.

I hope I have explained it properly. Manually, this would be similar to

ndf['B'] = df.loc[df['A'] == 1, 'B'].sum()  # sum of B where A is 1
ndf['C'] = df.loc[df['A'] == 1, 'C'].sum()  # sum of C where A is 1

Basically, can I use something like this:

import pandas as pd

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):  # numeric columns only
        ndf[col] = df.loc[df['A'] == 1, col].sum()

and then repeat for each category in A:

ndf['B'] = df.loc[df['A'] == 2, 'B'].sum()
ndf['C'] = df.loc[df['A'] == 3, 'C'].sum()

I would then have to loop over each category value in A.

Is this the right way to approach the problem?

  • So you want B and D summed in each group, but C you just want the first value from each group? Could you be more explicit with your requirements here? Commented Oct 5, 2018 at 3:37
  • Your question is not clear; please reframe it and explain it properly with examples. Commented Oct 5, 2018 at 6:26
  • I want to group by category by summing up all the columns which have numbers. I can safely ignore columns which do not have numbers or are empty. Commented Oct 5, 2018 at 12:30

1 Answer

You can use GroupBy + agg to specify a different function for each series. I have mapped the C and Z series to 'first', i.e. take the first value from each group, as this is consistent with your desired output.

agg_rules = {'B': 'sum', 'C': 'first', 'D': 'sum', 'Z': 'first'}
res = df.groupby('Category').agg(agg_rules).reset_index()

print(res)

   Category  B   C      D       Z
0         1  6  11  2.900  'HOME'
1         2  5  34  2.000  'HUMM'
2         3  5  23  2.300   'ETC'
3         4  7  99  5.775    'PI'
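
If, as the follow-up comment on the question suggests, every numeric column should simply be summed, the rules do not have to be typed out by hand. The following is only a sketch under assumptions: the sample frame is rebuilt from the rows shown in the question, 'Z' stands in for all of the text columns, and the names numeric_sums, value_cols and mixed are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; 'Z' stands in for the text columns.
df = pd.DataFrame({
    'Category': [1, 1, 1, 2, 2, 3, 4, 4],
    'B': [2, 3, 1, 4, 1, 5, 3, 4],
    'C': [11, 21, 33, 34, 72, 23, 99, 1],
    'D': [1.0, 1.0, 0.9, 0.6, 1.4, 2.3, 3.141, 2.634],
    'Z': ['HOME', 'HOME', 'GOPHER', 'HUMM', 'VEEE', 'ETC', 'PI', 'PI'],
})

# Option 1: sum every numeric column per category and drop the text columns.
numeric_sums = df.groupby('Category').sum(numeric_only=True).reset_index()

# Option 2: derive the agg rules from the dtypes, so numeric columns are
# summed and every other column keeps its first value per group.
value_cols = df.columns.drop('Category')
agg_rules = {
    col: 'sum' if pd.api.types.is_numeric_dtype(df[col]) else 'first'
    for col in value_cols
}
mixed = df.groupby('Category').agg(agg_rules).reset_index()

print(numeric_sums)
print(mixed)
```

Note that both options sum C as well, which differs from the 'first' rule used above; which one is right depends on what you actually want for that column.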