3

I have a large df of information that I want to add a 'Total' column to. However, when I use the sum() method the resulting column is populated only with 0 values.

Here is a slice of my df:

    COL NAME0  COL NAME1  COL NAME2  COL NAME3  COL NAME4
0     Alabama   4.099099   4.090001   2.042345   NaN
1      Alaska   1.396396   1.390001   1.000000   1.000000
2     Arizona   4.189189   NaN        2.003257   1.537777
3    Arkansas   2.927928   2.920001   2.208723   NaN
4  California   3.378378   3.780001   1.754930   2.012395

To add the Total column, I did the following:

df['Total'] = df.sum(axis=1)

which created the following df:

    COL NAME0  COL NAME1  COL NAME2  COL NAME3  COL NAME4  Total
0     Alabama   4.099099   4.090001   2.042345   NaN        0.0
1      Alaska   1.396396   1.390001   1.000000   1.000000   0.0
2     Arizona   4.189189   NaN        2.003257   1.537777   0.0
3    Arkansas   2.927928   2.920001   2.208723   NaN        0.0
4  California   3.378378   3.780001   1.754930   2.012395   0.0

I then tried a different approach, to add each column to the Total column one at a time:

for col in df:
    df['Total'] = df['Total'] + df[col] 

However, this results in the Total column being populated with only NaN values.

My guess is the existing NaN values in df is causing this behavior in the Total column. This seems to be a simple task so if there's something I'm overlooking let me know. Any suggestions/ solution would be greatly appreciated.

0

2 Answers 2

5

I suggest filter out first column with strings and convert all another columns to floats:

df['Total'] = df.iloc[:, 1:].astype(float).sum(axis=1)

print (df.dtypes)
COL NAME0    object
COL NAME1    object
COL NAME2    object
COL NAME3    object
COL NAME4    object
dtype: object

df['Total'] = df.sum(axis=1)
print (df)
    COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4  Total
0     Alabama  4.099099  4.090001  2.042345       NaN    0.0
1      Alaska  1.396396  1.390001  1.000000  1.000000    0.0
2     Arizona  4.189189       NaN  2.003257  1.537777    0.0
3    Arkansas  2.927928  2.920001  2.208723       NaN    0.0
4  California  3.378378  3.780001  1.754930  2.012395    0.0

df['Total'] = df.iloc[:, 1:].astype(float).sum(axis=1)
print (df)
    COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4      Total
0     Alabama  4.099099  4.090001  2.042345       NaN  10.231445
1      Alaska  1.396396  1.390001  1.000000  1.000000   4.786397
2     Arizona  4.189189       NaN  2.003257  1.537777   7.730223
3    Arkansas  2.927928  2.920001  2.208723       NaN   8.056652
4  California  3.378378  3.780001  1.754930  2.012395  10.925704

If need processing columns later:

df =  df.astype(dict.fromkeys(df.columns[1:], 'float'))
print (df.dtypes)
COL NAME0     object
COL NAME1    float64
COL NAME2    float64
COL NAME3    float64
COL NAME4    float64
dtype: object


df['Total'] = df.sum(axis=1)
print (df)
    COL NAME0  COL NAME1  COL NAME2  COL NAME3  COL NAME4      Total
0     Alabama   4.099099   4.090001   2.042345        NaN  10.231445
1      Alaska   1.396396   1.390001   1.000000   1.000000   4.786397
2     Arizona   4.189189        NaN   2.003257   1.537777   7.730223
3    Arkansas   2.927928   2.920001   2.208723        NaN   8.056652
4  California   3.378378   3.780001   1.754930   2.012395  10.925704
Sign up to request clarification or add additional context in comments.

Comments

2

you could also select the columns by dtypes :

df['Total'] = df.select_dtypes(exclude=['object']).sum(axis=1)


print(df)

    COL_NAME0   COL_NAME1   COL_NAME2   COL_NAME3   COL_NAME4   Total
0   Alabama     4.099099    4.090001    2.042345    NaN         10.231445
1   Alaska      1.396396    1.390001    1.000000    1.000000    4.786397
2   Arizona     4.189189    NaN         2.003257    1.537777    7.730223
3   Arkansas    2.927928    2.920001    2.208723    NaN         8.056652
4   California  3.378378    3.780001    1.754930    2.012395    10.925704

4 Comments

I thinking about this solution too, but here working nice OP solution.
I like yours but I think his data frame might have more columns so this works a little better I think (: (also it might have bools and category datatype so mine may error too)
No, I think sometjing else, df['Total'] = df.select_dtypes(exclude=['object']).sum(axis=1) and df['Total'] = df.sum(axis=1) working with sample data same way.
oh, you are right, strange he must have a different df to the sample one then. we will have to wait for OP to see if either work

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.