I have a dataframe which is date sequenced and has 'x' values in one column when there is new information on a particular date.
I want to get the index value of the row for the date before the most recent new-information date, so that I can reference that row's data in further operations.
So my dataframe looks like this:
original_df
index  date          value  newinfo
0      '2007-12-01'  75     NaN
1      '2007-12-02'  75     NaN
2      '2007-12-03'  83     x
3      '2007-12-04'  83     NaN
4      '2007-12-05'  83     NaN
5      '2007-12-06'  47     x
6      '2007-12-07'  47     NaN
7      '2007-12-08'  47     NaN
8      '2007-12-09'  47     NaN
So I'm interested in referencing the row where original_df.index == 4 for some further operations.
The only way I can think of doing it is very 'clunky'. Basically, I create another dataframe by filtering the original for rows where newinfo == 'x', take the index value of its last row, subtract 1, and use the result with iloc to access columns in that row of the original dataframe. The code looks like this:
interim_df = original_df[original_df['newinfo']=='x']
index_ref_value = interim_df.index[-1] - 1
This returns an index_ref_value of 4.
I can then access the value column in original_df as follows:
original_df.iloc[index_ref_value,1]
In other words, I'm accessing the value for 2007-12-05, the day before the most recent newinfo.
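For anyone who wants to reproduce this, here is a minimal self-contained sketch of the steps above. The way original_df is built is only a reconstruction of the sample data shown earlier, and it assumes the default integer index (0 to 8), which is the only reason an index label can be reused as a position in iloc.

```python
import numpy as np
import pandas as pd

# Reconstruction of the sample data (assumed: default RangeIndex 0..8,
# dates stored as plain strings, missing newinfo stored as NaN).
original_df = pd.DataFrame({
    "date": [f"2007-12-{day:02d}" for day in range(1, 10)],
    "value": [75, 75, 83, 83, 83, 47, 47, 47, 47],
    "newinfo": [np.nan, np.nan, "x", np.nan, np.nan, "x", np.nan, np.nan, np.nan],
})

# Keep only the rows flagged with new information.
interim_df = original_df[original_df["newinfo"] == "x"]

# Index label of the most recent flagged row, minus one.
index_ref_value = interim_df.index[-1] - 1

# Column position 1 is 'value'; position 0 is 'date'.
value_before = original_df.iloc[index_ref_value, 1]
date_before = original_df.iloc[index_ref_value, 0]

print(index_ref_value, date_before, value_before)
```

With a non-default index (for example, if date were the index), subtracting 1 from a label would no longer make sense, so this sketch only covers the case shown in the question.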
This gets the job done but strikes me as complicated and sloppy. Is there a cleaner, easier, more Pythonic way to find the index_ref_value I'm looking for?