Python pandas dataframe add previous row values

Question

I have a pandas dataframe that looks like this:

                     AAPL   IBM  GOOG  XOM
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00     0     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00     0     0     0    0
2011-01-18 16:00:00     0     0     0    0

My goal is to fill the rows by adding the previous row values. The result would look like this:

                     AAPL   IBM  GOOG  XOM
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00  1500     0     0    0
2011-01-13 16:00:00     0  4000     0    0
2011-01-14 16:00:00     0  4000     0    0
2011-01-18 16:00:00     0  4000     0    0

I tried to iterate through the dataframe index with

    for date in df.index:

and to increment dates with

    dt_nextDate = date + dt.timedelta(days=1)

but there are gaps in the dataframe index that stand for weekends.

Can I iterate through the index from the second row to the end, refer back to the previous row and add the values?

Looks like a Georgia Tech "Machine Learning for Trading" homework assignment.
– joshmcode, Commented Oct 22, 2018 at 16:11

Viktor Kerkez · Accepted Answer · 2013-09-29 10:56:52Z

Your example result is not the output of the algorithm you described, so I'm not sure exactly what you are asking for.

The desired result you showed is a cumulative sum, which you can get using:

>>> df.cumsum()
                     AAPL   IBM  GOOG  XOM
index                                     
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00  1500     0     0    0
2011-01-13 16:00:00     0  4000     0    0
2011-01-14 16:00:00     0  4000     0    0
2011-01-18 16:00:00     0  4000     0    0
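
For a version that can be run end to end, here is a minimal sketch. How df was originally built is not shown in the question, so the construction below is an assumption; the name holdings is also just illustrative.

```python
import pandas as pd

# Assumed reconstruction of the question's frame: daily order quantities
# indexed by timestamp (note the weekend gap before 2011-01-18).
index = pd.to_datetime([
    "2011-01-10 16:00:00",
    "2011-01-11 16:00:00",
    "2011-01-12 16:00:00",
    "2011-01-13 16:00:00",
    "2011-01-14 16:00:00",
    "2011-01-18 16:00:00",
])
df = pd.DataFrame(
    {
        "AAPL": [1500, 0, 0, -1500, 0, 0],
        "IBM": [0, 0, 0, 4000, 0, 0],
        "GOOG": [0, 0, 0, 0, 0, 0],
        "XOM": [0, 0, 0, 0, 0, 0],
    },
    index=index,
)

# Running total down each column; every row is the sum of itself and
# all rows above it, regardless of gaps in the dates.
holdings = df.cumsum()
print(holdings)
```

Because cumsum works on row position rather than on the dates themselves, the missing weekend days in the index need no special handling.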

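But what you described, and the loop you sketched, look more like a rolling sum with a window size of 2, that is, each row plus the row before it. Older pandas versions spelled this pd.rolling_sum(df, 2); that function has since been removed in favor of the rolling method:

>>> result = df.rolling(2).sum()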
>>> result
                     AAPL   IBM  GOOG  XOM
index                                     
2011-01-10 16:00:00   NaN   NaN   NaN  NaN
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00 -1500  4000     0    0
2011-01-18 16:00:00     0     0     0    0

The first row has no previous row, so it comes out as NaN. To fix that, copy it over from the original frame:

>>> result.iloc[0,:] = df.iloc[0,:]
>>> result
                     AAPL   IBM  GOOG  XOM
index                                     
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00 -1500  4000     0    0
2011-01-18 16:00:00     0     0     0    0
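
As a possible alternative to patching the first row by hand, the sketch below uses the rolling method with min_periods=1, and also shows the "current row plus previous row" idea written directly with shift. The small two-column frame and the names rolled and prev_sum are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Assumed two-column subset of the question's data, enough to show the idea.
index = pd.to_datetime([
    "2011-01-10 16:00:00",
    "2011-01-11 16:00:00",
    "2011-01-12 16:00:00",
    "2011-01-13 16:00:00",
    "2011-01-14 16:00:00",
    "2011-01-18 16:00:00",
])
df = pd.DataFrame(
    {
        "AAPL": [1500, 0, 0, -1500, 0, 0],
        "IBM": [0, 0, 0, 4000, 0, 0],
    },
    index=index,
)

# Window of 2 rows; min_periods=1 lets the first row be summed on its own
# instead of producing NaN.
rolled = df.rolling(window=2, min_periods=1).sum()

# The same thing written as "this row plus the previous row":
# shift(1) moves every row down by one position, and fill_value=0
# treats the missing row above the first one as zero.
prev_sum = df.add(df.shift(1), fill_value=0)

print(rolled)
```

Both results are float columns, since NaN appears as an intermediate value. Neither approach needs a Python loop over df.index or any date arithmetic, because both work on row positions.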

Georges Cunty Over a year ago

The cumsum thing is precisely what I was looking for. Thank you very much for your help!
