Adding sum row to specific columns in dataframe

Question

I've got a dataframe,

df = pd.DataFrame([
    {'project': 123456, 'date': '08/07/2019', 'total': 123, 'count': 12},
    {'project': 123457, 'date': '08/07/2019', 'total': 124, 'count': 13},
    {'project': 123458, 'date': '08/07/2019', 'total': 125, 'count': 14},
])

I'd like to add a total row to the bottom of only the total and count columns. I know I can do

df.loc['Total'] = df.sum(numeric_only=True)

But my project column is numeric and I do not want the word Total at the bottom row, only the sums for those two columns. Is there any way to remove the word and ensure that only those two columns get summed?

Not sure what you want, but to avoid the word Total, just do df.loc[len(df)] = df.sum(numeric_only=True)
– rafaelc, Commented Aug 7, 2019 at 20:12
@rafaelc How can I avoid my project column getting summed, since it's numeric as well?
– AGH_TORN, Commented Aug 7, 2019 at 20:13
Slice your columns: df[cols_to_add].sum(), where cols_to_add = ['col1', 'col2', ...]
– rafaelc, Commented Aug 7, 2019 at 20:14
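
The two suggestions in the comments above can be combined into one assignment. This is a minimal sketch, assuming the columns to sum are total and count as in the question; cols_to_add is the placeholder name used in the comment.

```python
import pandas as pd

df = pd.DataFrame([
    {'project': 123456, 'date': '08/07/2019', 'total': 123, 'count': 12},
    {'project': 123457, 'date': '08/07/2019', 'total': 124, 'count': 13},
    {'project': 123458, 'date': '08/07/2019', 'total': 125, 'count': 14},
])

# Columns to sum; every other column in the new row is left as NaN
cols_to_add = ['total', 'count']

# len(df) is the next free integer label, so no 'Total' label is introduced.
# The Series of sums is aligned on column names when it is assigned.
df.loc[len(df)] = df[cols_to_add].sum()

print(df)
```

The new row holds NaN for project and date, so the project column no longer holds plain integers and will typically display as floats.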

Celius Stingher · Accepted Answer · 2019-08-07 20:16:02Z

8

I believe each project has a unique ID, so I'm not sure this will be a valid solution for you. Since the question states no other constraints, I propose using the project column as the index. The project IDs are then left out of the sum, you can easily add further project IDs with their information, and the final row will sum them all up!

import pandas as pd
df = pd.DataFrame([
    {'project': 123456, 'date': '08/07/2019', 'total': 123, 'count': 12},
    {'project': 123457, 'date': '08/07/2019', 'total': 124, 'count': 13},
    {'project': 123458, 'date': '08/07/2019', 'total': 125, 'count': 14},
])
df1 = df.set_index('project')
df1.loc['Total'] = df1.sum(numeric_only=True)
print(df1)

And I get this result, which is what I believe you want:

               date  total  count
project
123456   08/07/2019  123.0   12.0
123457   08/07/2019  124.0   13.0
123458   08/07/2019  125.0   14.0
Total           NaN  372.0   39.0
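
To illustrate the point about adding further project IDs, here is a small sketch under stated assumptions: project 123459 and its values are invented for this example and are not part of the question's data. The Total row is added last so that it covers every project row, and the sum is restricted to the two columns of interest.

```python
import pandas as pd

df = pd.DataFrame([
    {'project': 123456, 'date': '08/07/2019', 'total': 123, 'count': 12},
    {'project': 123457, 'date': '08/07/2019', 'total': 124, 'count': 13},
    {'project': 123458, 'date': '08/07/2019', 'total': 125, 'count': 14},
])
df1 = df.set_index('project')

# Hypothetical extra project (made up for illustration);
# the list follows the column order: date, total, count
df1.loc[123459] = ['08/07/2019', 126, 15]

# Add the Total row after all project rows exist, summing only the chosen columns
df1.loc['Total'] = df1[['total', 'count']].sum()

print(df1)
```

If more projects are added after the Total row exists, the Total row is not updated automatically and would need to be recomputed.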


3 Comments

piRSquared Over a year ago

That's twice today I've deleted an answer of mine because I liked another answer so much more.

Celius Stingher Over a year ago

Excuse me, I'm not sure I understood the point of the reply (no sarcasm meant, just 0 skills when it comes to communication and emotional intelligence).

piRSquared Over a year ago

I'm saying that this is a very good answer. So good that I decided to delete my answer because I want the OP to focus on this one. AND it happens to be the second time I've done that today (-:

Benoit Drogou · Accepted Answer · 2019-08-07 20:20:30Z

2

I believe you are looking for something like this:

In [1]:
import pandas as pd

df = pd.DataFrame([
    {'project': 123456, 'date': '08/07/2019', 'total': 123, 'count': 12},
    {'project': 123457, 'date': '08/07/2019', 'total': 124, 'count': 13},
    {'project': 123458, 'date': '08/07/2019', 'total': 125, 'count': 14},
])

df.append(df[['count', 'total']].sum(numeric_only=True), ignore_index=True)

Out [1]:
    count   date         project    total
0   12.0    08/07/2019   123456.0   123.0
1   13.0    08/07/2019   123457.0   124.0
2   14.0    08/07/2019   123458.0   125.0
3   39.0    NaN          NaN        372.0


1 Comment

OCa Over a year ago

Deprecated since pandas version 1.4.0
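
The deprecation noted in this comment refers to DataFrame.append, which was subsequently removed in pandas 2.0. A rough equivalent of the answer's one-liner can be sketched with pd.concat; the names total_row and result are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {'project': 123456, 'date': '08/07/2019', 'total': 123, 'count': 12},
    {'project': 123457, 'date': '08/07/2019', 'total': 124, 'count': 13},
    {'project': 123458, 'date': '08/07/2019', 'total': 125, 'count': 14},
])

# Sum only the two columns of interest, then turn the resulting
# Series into a one-row DataFrame whose columns are 'count' and 'total'
total_row = df[['count', 'total']].sum().to_frame().T

# pd.concat stands in for DataFrame.append; ignore_index renumbers the rows,
# and columns missing from total_row (project, date) are filled with NaN
result = pd.concat([df, total_row], ignore_index=True)

print(result)
```

Unlike the original append call, this leaves df untouched and returns a new DataFrame, so the result has to be assigned to a name.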

Dimitris Thomas · Accepted Answer · 2019-08-08 07:43:25Z

I would have done it like this:

import pandas as pd
import numpy as np

df = pd.DataFrame([
    {'project': 123456, 'date': '08/07/2019', 'total': 123, 'count': 12},
    {'project': 123457, 'date': '08/07/2019', 'total': 124, 'count': 13},
    {'project': 123458, 'date': '08/07/2019', 'total': 125, 'count': 14},
])

# Append an empty (all-NaN) row at the bottom of the df
df.loc[len(df)] = np.nan

# Write the sums of the columns you want (here positions 2 and 3,
# i.e. total and count) into the last row
df.iloc[-1, [2, 3]] = df.iloc[:, [2, 3]].sum(axis=0)

Output:

     project          date  total   count
0   123456.0    08/07/2019  123.0   12.0
1   123457.0    08/07/2019  124.0   13.0
2   123458.0    08/07/2019  125.0   14.0
3        NaN           NaN  372.0   39.0

This way you can calculate the sums of any columns you want and append them at the last row, no matter how many rows or columns your df has.
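
The positions 2 and 3 above depend on the column order. One possible variant, sketched here as an assumption rather than as part of the original answer, looks the positions up by name with Index.get_indexer so the same code keeps working if columns are added or reordered. The names cols_to_sum and pos are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'project': 123456, 'date': '08/07/2019', 'total': 123, 'count': 12},
    {'project': 123457, 'date': '08/07/2019', 'total': 124, 'count': 13},
    {'project': 123458, 'date': '08/07/2019', 'total': 125, 'count': 14},
])

# Hypothetical list of column names to sum
cols_to_sum = ['total', 'count']

# Translate the names into integer positions (get_indexer returns -1 for unknown names)
pos = df.columns.get_indexer(cols_to_sum)

# Append an all-NaN row, then fill only the chosen positions with the sums;
# sum() skips NaN by default, so the empty row does not affect the result
df.loc[len(df)] = np.nan
df.iloc[-1, pos] = df.iloc[:, pos].sum(axis=0).to_numpy()

print(df)
```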
