Pandas df.sum() creating only 0s

Question

I have a large df of information that I want to add a 'Total' column to. However, when I use the sum() method the resulting column is populated only with 0 values.

Here is a slice of my df:

    COL NAME0  COL NAME1  COL NAME2  COL NAME3  COL NAME4
0     Alabama   4.099099   4.090001   2.042345   NaN
1      Alaska   1.396396   1.390001   1.000000   1.000000
2     Arizona   4.189189   NaN        2.003257   1.537777
3    Arkansas   2.927928   2.920001   2.208723   NaN
4  California   3.378378   3.780001   1.754930   2.012395

To add the Total column, I did the following:

df['Total'] = df.sum(axis=1)

which created the following df:

    COL NAME0  COL NAME1  COL NAME2  COL NAME3  COL NAME4  Total
0     Alabama   4.099099   4.090001   2.042345   NaN        0.0
1      Alaska   1.396396   1.390001   1.000000   1.000000   0.0
2     Arizona   4.189189   NaN        2.003257   1.537777   0.0
3    Arkansas   2.927928   2.920001   2.208723   NaN        0.0
4  California   3.378378   3.780001   1.754930   2.012395   0.0

I then tried a different approach, to add each column to the Total column one at a time:

for col in df:
    df['Total'] = df['Total'] + df[col]

However, this results in the Total column being populated with only NaN values.

My guess is the existing NaN values in df is causing this behavior in the Total column. This seems to be a simple task so if there's something I'm overlooking let me know. Any suggestions/ solution would be greatly appreciated.

jezrael · Accepted Answer · 2019-08-02 14:13:14Z

I suggest filter out first column with strings and convert all another columns to floats:

df['Total'] = df.iloc[:, 1:].astype(float).sum(axis=1)

print (df.dtypes)
COL NAME0    object
COL NAME1    object
COL NAME2    object
COL NAME3    object
COL NAME4    object
dtype: object

df['Total'] = df.sum(axis=1)
print (df)
    COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4  Total
0     Alabama  4.099099  4.090001  2.042345       NaN    0.0
1      Alaska  1.396396  1.390001  1.000000  1.000000    0.0
2     Arizona  4.189189       NaN  2.003257  1.537777    0.0
3    Arkansas  2.927928  2.920001  2.208723       NaN    0.0
4  California  3.378378  3.780001  1.754930  2.012395    0.0

df['Total'] = df.iloc[:, 1:].astype(float).sum(axis=1)
print (df)
    COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4      Total
0     Alabama  4.099099  4.090001  2.042345       NaN  10.231445
1      Alaska  1.396396  1.390001  1.000000  1.000000   4.786397
2     Arizona  4.189189       NaN  2.003257  1.537777   7.730223
3    Arkansas  2.927928  2.920001  2.208723       NaN   8.056652
4  California  3.378378  3.780001  1.754930  2.012395  10.925704

If need processing columns later:

df =  df.astype(dict.fromkeys(df.columns[1:], 'float'))
print (df.dtypes)
COL NAME0     object
COL NAME1    float64
COL NAME2    float64
COL NAME3    float64
COL NAME4    float64
dtype: object


df['Total'] = df.sum(axis=1)
print (df)
    COL NAME0  COL NAME1  COL NAME2  COL NAME3  COL NAME4      Total
0     Alabama   4.099099   4.090001   2.042345        NaN  10.231445
1      Alaska   1.396396   1.390001   1.000000   1.000000   4.786397
2     Arizona   4.189189        NaN   2.003257   1.537777   7.730223
3    Arkansas   2.927928   2.920001   2.208723        NaN   8.056652
4  California   3.378378   3.780001   1.754930   2.012395  10.925704

Umar.H · Accepted Answer · 2019-08-02 14:00:36Z

2

you could also select the columns by dtypes :

df['Total'] = df.select_dtypes(exclude=['object']).sum(axis=1)


print(df)

    COL_NAME0   COL_NAME1   COL_NAME2   COL_NAME3   COL_NAME4   Total
0   Alabama     4.099099    4.090001    2.042345    NaN         10.231445
1   Alaska      1.396396    1.390001    1.000000    1.000000    4.786397
2   Arizona     4.189189    NaN         2.003257    1.537777    7.730223
3   Arkansas    2.927928    2.920001    2.208723    NaN         8.056652
4   California  3.378378    3.780001    1.754930    2.012395    10.925704

answered Aug 2, 2019 at 14:00

Umar.H

23.1k7 gold badges50 silver badges94 bronze badges

4 Comments

jezrael Over a year ago

I thinking about this solution too, but here working nice OP solution.

Umar.H Over a year ago

I like yours but I think his data frame might have more columns so this works a little better I think (: (also it might have bools and category datatype so mine may error too)

jezrael Over a year ago

No, I think sometjing else, df['Total'] = df.select_dtypes(exclude=['object']).sum(axis=1) and df['Total'] = df.sum(axis=1) working with sample data same way.

Umar.H Over a year ago

oh, you are right, strange he must have a different df to the sample one then. we will have to wait for OP to see if either work

Collectives™ on Stack Overflow

Pandas df.sum() creating only 0s

2 Answers 2

Comments

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related