I am trying to calculate an index value over a time series in a pandas dataframe. After the first row, each row's index value depends on the previous row's result. I've attempted to do this by iterating over the dataframe's rows, and I find that the first two rows of the calculation are correct, but the third and subsequent rows are inaccurate.
I think this is because, after the initial value, one index calculation goes wrong and every later calculation then inherits the error.
What is causing this inaccuracy? Is there a better approach than the one I've taken?
A sample of the output looks like this:
ticket_cat  Sector  Year      factor       Incorrect_index_value  correct_index_value  prev_row
Revenue     LSE     Jan 2004               100.00                 100.00
Revenue     LSE     Jan 2005  4.323542894  104.3235               104.3235             100.00
Revenue     LSE     Jan 2006  3.096308080  98.823                 107.5537                       <-- incorrect row
Revenue     LSE     Jan 2007  6.211666     107.476                114.2345                       <-- incorrect row
Revenue     LD      Jan 2004               100.00                 100.0000
Revenue     LD      Jan 2005  3.5218       103.5218               103.5218
Revenue     LD      Jan 2006  2.7417       99.2464                106.3602                       <-- incorrect row
Revenue     LD      Jan 2007  3.3506       104.1353               109.9239                       <-- incorrect row
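As a quick arithmetic check on the LSE rows, the correct_index_value column appears consistent with compounding (previous index multiplied by 1 + factor/100), while the Incorrect_index_value column appears consistent with dividing by the previous index instead. The sketch below is only an illustration of that reading; the helper names `compound` and `divide_by_previous` are hypothetical and not part of the original code.

```python
# Factors copied from the LSE rows of the sample output (Jan 2005 to Jan 2007).
factors = [4.323542894, 3.096308080, 6.211666]


def compound(start, factors):
    """Hypothetical helper: grow `start` by each percentage factor in turn."""
    values = [start]
    for f in factors:
        values.append(values[-1] * (1 + f / 100))
    return values


def divide_by_previous(start, factors):
    """Hypothetical helper: (100 + factor) divided by the previous index, times 100."""
    values = [start]
    for f in factors:
        values.append((100 + f) / values[-1] * 100)
    return values


expected = compound(100.0, factors)            # compare with correct_index_value
observed = divide_by_previous(100.0, factors)  # compare with Incorrect_index_value

for year, e, o in zip(["Jan 2004", "Jan 2005", "Jan 2006", "Jan 2007"], expected, observed):
    print(f"{year}  compounding={e:.4f}  dividing={o:.4f}")
```

Both rules agree for Jan 2005 because the previous index is exactly 100, which would explain why only the third and later rows look wrong.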
The code snippet I have is as follows (stpassrev is the dataframe):
import numpy as np
import pandas as pd

# insert initial value for index
stpassrev['index_value'] = np.where(
    (stpassrev['Year'] == 'Jan 2004') & (stpassrev['Ticket_cat'] == 'Revenue'),
    100.00,
    np.nan,
)

# set up initial values for prev_row column
stpassrev['prev_row'] = np.where(
    # only have relevant row impacted
    (stpassrev['Year'] == 'Jan 2005') & (stpassrev['Ticket_cat'] == 'Revenue'),
    100.00,
    np.nan,
)

# calculate the index_value
for i in range(1, len(stpassrev)):
    stpassrev.loc[i, 'index_value'] = np.where(
        (stpassrev.loc[i, 'Ticket_cat'] == 'Revenue') & pd.notna(stpassrev.loc[i, 'factor']),
        ((100 + stpassrev.loc[i, 'factor']) / stpassrev.loc[i - 1, 'index_value']) * 100,
        stpassrev.loc[i, 'index_value'],
    )
    stpassrev.loc[i, 'prev_row'] = stpassrev.loc[i - 1, 'index_value']
factor only null at the beginning?
ticket_cat, Year and factor? And you're trying to derive the index_value based on the factor and the previous index_value?
factor is only NULL at the beginning. The first row is the initial index value, which is 100.
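Given that clarification (factor is null only in the first row, and each index_value is derived from the factor and the previous index_value), one possible loop-free sketch is a grouped cumulative product. It assumes the index compounds as previous index times (1 + factor/100), that each Ticket_cat/Sector group restarts at 100, and that rows are already sorted by year within each group. The small `df` below is a stand-in for stpassrev built from the sample values, not the real data.

```python
import numpy as np
import pandas as pd

# Stand-in for stpassrev, using the values shown in the sample output.
df = pd.DataFrame({
    "Ticket_cat": ["Revenue"] * 8,
    "Sector": ["LSE"] * 4 + ["LD"] * 4,
    "Year": ["Jan 2004", "Jan 2005", "Jan 2006", "Jan 2007"] * 2,
    "factor": [np.nan, 4.323542894, 3.096308080, 6.211666,
               np.nan, 3.5218, 2.7417, 3.3506],
})

# Growth multiplier per row; a null factor (the first year) means no change.
growth = 1 + df["factor"].fillna(0) / 100

# Compound within each Ticket_cat/Sector group, starting from 100.
df["index_value"] = 100 * growth.groupby([df["Ticket_cat"], df["Sector"]]).cumprod()

# Previous row's index_value within the same group (NaN for the first year).
df["prev_row"] = df.groupby(["Ticket_cat", "Sector"])["index_value"].shift()

print(df.round(4))
```

Because the cumulative product is computed per group, the LD rows do not pick up the last LSE value, which a plain row-by-row loop using `i - 1` would need to guard against separately.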