I have a DataFrame that looks like this (with many additional columns):

      age1  age2  age3  age4
Id#
1001     5     6     2     8
1002     7     6     1     0
1003    10     9     7     5
1004     9    12     5     9

I am trying to write a loop that sums each column with all of the columns before it and stores the result in a new DataFrame. I have started out, simply, with this:

New = pd.DataFrame()
New[0] = SFH2.ix[:,0]
for x in SFH2:
    ls = [x,x+1]
    B = SFH2[ls].sum(axis=1)
    New[x] = B

print(New)  

The error I get is:

    ls = [x,x+1]

TypeError: Can't convert 'int' object to str implicitly

I know that int and str are different types, but how can I overcome this? Or is there a different way to iterate through the columns? Thanks!

  • Can you clarify exactly what you want the output to be? Commented Aug 3, 2016 at 9:12
  • In other words, do you want each column to be the sum of all the columns to the left, or simply that column and a single column to the left (right?). Commented Aug 3, 2016 at 9:26
  • I want each column to be the sum of all the columns to the left. Commented Aug 3, 2016 at 9:27
  • @cmf05 - I think it is best to add the desired output to the question; maybe you can do that in your next question ;) Commented Aug 3, 2016 at 9:33

2 Answers

It sounds like cumsum is what you are looking for:

In [5]: df
Out[5]: 
      age1  age2  age3  age4
Id#                         
1001     5     6     2     8
1002     7     6     1     0
1003    10     9     7     5
1004     9    12     5     9

In [6]: df.cumsum(axis=1)
Out[6]: 
      age1  age2  age3  age4
Id#                         
1001     5    11    13    21
1002     7    13    14    14
1003    10    19    26    31
1004     9    21    26    35
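
As a rough sketch of why the original loop failed and how it relates to cumsum: iterating over a DataFrame yields its column labels, which here are strings, so `x + 1` tries to add an int to a str. The snippet below rebuilds the sample data under the name `df` (an assumption; the question calls the frame `SFH2`), shows a positional loop that avoids the error, and compares it with `cumsum(axis=1)`. The name `running` is purely illustrative.

```python
import pandas as pd

# Sample data copied from the question (the real frame has more columns).
df = pd.DataFrame(
    {
        "age1": [5, 7, 10, 9],
        "age2": [6, 6, 9, 12],
        "age3": [2, 1, 7, 5],
        "age4": [8, 0, 5, 9],
    },
    index=pd.Index([1001, 1002, 1003, 1004], name="Id#"),
)

# Iterating over a DataFrame yields column labels, not integer positions.
labels = [x for x in df]  # 'age1', 'age2', ... so `x + 1` raises TypeError

# A positional loop: sum columns 0..i for each column position i.
running = pd.DataFrame(index=df.index)
for i, label in enumerate(df.columns):
    running[label] = df.iloc[:, : i + 1].sum(axis=1)

# The vectorized equivalent suggested in this answer.
result = df.cumsum(axis=1)
```

Both `running` and `result` should hold the same values; the loop is shown only to explain the error, and the vectorized call is usually preferable.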

3 Comments

Ah thank you! Clearly I need to get a bit more familiar with pandas.
@piRSquared Well, the OP was a bit ambiguous. The code seemed to imply a rolling sum with a window of 2, but the description of the desired output implied cumsum.
@cmf05 If you find yourself writing for-loops to work with pandas objects, then there is almost always a better way.

You can use add with a shifted DataFrame:

print (df.shift(-1,axis=1))
      age1  age2  age3  age4
Id#                         
1001   6.0   2.0   8.0   NaN
1002   6.0   1.0   0.0   NaN
1003   9.0   7.0   5.0   NaN
1004  12.0   5.0   9.0   NaN

print (df.add(df.shift(-1,axis=1), fill_value=0))
      age1  age2  age3  age4
Id#                         
1001  11.0   8.0  10.0   8.0
1002  13.0   7.0   1.0   0.0
1003  19.0  16.0  12.0   5.0
1004  21.0  17.0  14.0   9.0

If you need to shift by 1 instead (the default, so the parameter is omitted):

print (df.shift(axis=1))
      age1  age2  age3  age4
Id#                         
1001   NaN   5.0   6.0   2.0
1002   NaN   7.0   6.0   1.0
1003   NaN  10.0   9.0   7.0
1004   NaN   9.0  12.0   5.0

print (df.add(df.shift(axis=1), fill_value=0))
      age1  age2  age3  age4
Id#                         
1001   5.0  11.0   8.0  10.0
1002   7.0  13.0   7.0   1.0
1003  10.0  19.0  16.0  12.0
1004   9.0  21.0  17.0  14.0
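
The sum of each column with the one to its left can also be written as a rolling sum with a window of 2, as one of the comments hints. This is a minimal sketch, assuming the sample frame is rebuilt as `df`; it transposes the frame so the window runs over the original columns, and `rolled` is an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "age1": [5, 7, 10, 9],
        "age2": [6, 6, 9, 12],
        "age3": [2, 1, 7, 5],
        "age4": [8, 0, 5, 9],
    },
    index=pd.Index([1001, 1002, 1003, 1004], name="Id#"),
)

# Transpose so the original columns become rows, roll a window of 2 over
# them, then transpose back. min_periods=1 keeps the first column as-is
# instead of turning it into NaN.
rolled = df.T.rolling(window=2, min_periods=1).sum().T

# The shift-based version from this answer, for comparison.
shifted = df.add(df.shift(axis=1), fill_value=0)
```

On this data the two approaches should agree; the rolling form extends to wider windows by changing `window`.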
