This is a pandas question; my brain is too tired to figure it out today. Could someone please help? I have a DataFrame with many columns, one of which (column A) holds a category:
A  B  C    D      ...       Z
1  2  11   1.0    'HOME'    ...
1  3  21   1.0    'HOME'    ...
1  1  33   0.9    'GOPHER'  ...
2  4  34   0.6    'HUMM'    ...
2  1  72   1.4    'VEEE'    ...
3  5  23   2.3    'ETC '    ...
4  3  99   3.141  'PI'      ...
4  4  1    2.634  'PI'      ...
I want to get this (the text columns are irrelevant):
A  B  C    D      ...       Z
1  6  65   2.9    'HOME'    ...
2  5  106  2.0    'HUMM'    ...
3  5  23   2.3    'ETC '    ...
4  7  100  5.775  'PI'      ...
How do I do this in pandas? Should I use groupby()?
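One way to build this kind of table is a group-by sum. The sketch below assumes only columns A to D plus one text column, named E here purely for illustration (the real frame continues up to Z); the numeric columns to sum are listed explicitly so the text column is left out.

```python
import pandas as pd

# Sample data from the question; the text column is called "E" only for
# illustration, since the question does not name it.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 3, 4, 4],
    "B": [2, 3, 1, 4, 1, 5, 3, 4],
    "C": [11, 21, 33, 34, 72, 23, 99, 1],
    "D": [1.0, 1.0, 0.9, 0.6, 1.4, 2.3, 3.141, 2.634],
    "E": ["HOME", "HOME", "GOPHER", "HUMM", "VEEE", "ETC ", "PI", "PI"],
})

# Group the rows by the category in A and sum the numeric columns B, C, D.
# The categories become the index of the result, sorted ascending.
newdf = df.groupby("A")[["B", "C", "D"]].sum()
print(newdf)
```

Column D holds floats, so its sums may differ from the hand-written values in the last few bits.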
If df is my DataFrame and newdf is the resulting DataFrame, then newdf would have one row with newdf['A'] == 1, and that row's newdf['B'] value would be the sum of df['B'] over all rows where df['A'] is 1.
For the next category, newdf would have one row with newdf['A'] == 2, and its newdf['B'] value would be the sum of df['B'] over all rows where df['A'] is 2,
and so on.
In other words, I want to sum the columns grouped by the category in column A: for each category in A, sum every other numeric column over the rows that share that category.
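When there are many columns (up to Z), listing them by hand gets tedious. A minimal sketch, assuming every numeric column other than A should be summed and using the same hypothetical text column E: select_dtypes keeps only numeric columns, and as_index=False keeps A as an ordinary column, so newdf['A'] holds each category once.

```python
import pandas as pd

# Same sample data; "E" is a hypothetical name for the text column.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 3, 4, 4],
    "B": [2, 3, 1, 4, 1, 5, 3, 4],
    "C": [11, 21, 33, 34, 72, 23, 99, 1],
    "D": [1.0, 1.0, 0.9, 0.6, 1.4, 2.3, 3.141, 2.634],
    "E": ["HOME", "HOME", "GOPHER", "HUMM", "VEEE", "ETC ", "PI", "PI"],
})

# Keep only numeric columns; this drops text columns such as "E".
# "A" is numeric too, so it survives and serves as the grouping key.
numeric = df.select_dtypes(include="number")

# as_index=False returns "A" as a regular column instead of the index,
# with every other column holding the per-category sum.
newdf = numeric.groupby("A", as_index=False).sum()
print(newdf)
```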
I hope I have explained it properly. Manually, for category 1 this would be something like:
newdf = pd.DataFrame()
newdf.loc[1, 'B'] = df.loc[df['A'] == 1, 'B'].sum()
newdf.loc[1, 'C'] = df.loc[df['A'] == 1, 'C'].sum()
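Basically, can I use something like this:
for col in df.columns:
    # skip the category column itself and any non-numeric (text) columns
    if col != 'A' and pd.api.types.is_numeric_dtype(df[col]):
        newdf.loc[1, col] = df.loc[df['A'] == 1, col].sum()
and then repeat it for each category in A, e.g. for category 2:
newdf.loc[2, 'B'] = df.loc[df['A'] == 2, 'B'].sum()
newdf.loc[2, 'C'] = df.loc[df['A'] == 2, 'C'].sum()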
I would then have to loop over every category value in A.
Is this the right way to approach the problem?
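For comparison, here is a runnable version of the loop idea above, checked against a single groupby call. This is a sketch under the same assumptions (columns A to D plus a hypothetical text column E); the loop works, but the groupby version does the same job without manual iteration over categories and columns.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Same sample data; "E" is a hypothetical name for the text column.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 3, 4, 4],
    "B": [2, 3, 1, 4, 1, 5, 3, 4],
    "C": [11, 21, 33, 34, 72, 23, 99, 1],
    "D": [1.0, 1.0, 0.9, 0.6, 1.4, 2.3, 3.141, 2.634],
    "E": ["HOME", "HOME", "GOPHER", "HUMM", "VEEE", "ETC ", "PI", "PI"],
})

# Loop version: one output row per category, one sum per numeric column.
rows = []
for category in sorted(df["A"].unique()):
    mask = df["A"] == category  # rows that belong to this category
    row = {"A": category}
    for col in df.columns:
        if col != "A" and is_numeric_dtype(df[col]):
            row[col] = df.loc[mask, col].sum()
    rows.append(row)
loop_df = pd.DataFrame(rows)

# Vectorized version: the same table from a single groupby call.
grouped = df.groupby("A", as_index=False)[["B", "C", "D"]].sum()

# Raises AssertionError if the two tables differ (floats compared with tolerance).
pd.testing.assert_frame_equal(loop_df, grouped, check_dtype=False)
print(grouped)
```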