I have a dataframe that is ordered by date and has an 'x' in one column wherever there is new information on that date.

I want to get the index value of the row for the date before the most recent new-information date, so I can reference that row's data in further operations.

So my dataframe looks like this:

original_df

index      date        value   newinfo
  0     '2007-12-01'     75      NaN
  1     '2007-12-02'     75      NaN
  2     '2007-12-03'     83       x
  3     '2007-12-04'     83      NaN
  4     '2007-12-05'     83      NaN
  5     '2007-12-06'     47       x
  6     '2007-12-07'     47      NaN
  7     '2007-12-08'     47      NaN
  8     '2007-12-09'     47      NaN

So I'm interested in referencing the row where original_df.index == 4 for some further operations.

The only way I can think of doing it is very 'clunky'. Basically I create another dataframe by filtering my original for rows where newinfo == 'x', take the index value of the last row, subtract 1, and use that value to access various columns in that row of the original dataframe using iloc. Code looks like this:

interim_df = original_df[original_df['newinfo']=='x']
index_ref_value = interim_df.index[-1] - 1

This returns an index_ref_value of 4.

I can then access value in original_df as follows:

original_df.iloc[index_ref_value,1]

In other words, I'm accessing the value for 2007-12-05, the day before the most recent newinfo.
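For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame and runs the two steps above. It assumes that `index` is the default integer index rather than a regular column, and that the missing entries are real NaN values. Both are my reading of the printed table.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data (assumption: default integer index, real NaN values).
original_df = pd.DataFrame({
    "date": pd.date_range("2007-12-01", periods=9).strftime("%Y-%m-%d"),
    "value": [75, 75, 83, 83, 83, 47, 47, 47, 47],
    "newinfo": [np.nan, np.nan, "x", np.nan, np.nan, "x", np.nan, np.nan, np.nan],
})

# Keep only the rows flagged with 'x', then step back one from the last of them.
interim_df = original_df[original_df["newinfo"] == "x"]
index_ref_value = interim_df.index[-1] - 1

print(index_ref_value)                       # 4
print(original_df.iloc[index_ref_value, 1])  # 83
```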

This gets the job done but strikes me as complicated and sloppy. Is there a cleaner, easier, more Pythonic way to find the index_ref_value I'm looking for?

1 Answer

You can combine iloc and loc into one statement:

original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index-1]

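The loc call selects the rows where the condition holds (newinfo is 'x'), and .index returns their index labels. Subtracting 1 shifts each label to the preceding row, and iloc then uses those integers as positions to return the rows you are looking for. This relies on the dataframe having the default integer index, so that labels and positions coincide.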

Judging from your question, you may need a list of these values in the future. Try df1.iloc[df1.loc[df1['newinfo'] == 'x'].index-1].index.tolist()
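As a rough illustration of that list version, here is a self-contained sketch. It assumes `df1` is the same frame as the question's `original_df`, with the default integer index, and the sample data is rebuilt by hand from the printed table.

```python
import numpy as np
import pandas as pd

# Assumption: df1 mirrors the question's sample frame with a default integer index.
df1 = pd.DataFrame({
    "date": pd.date_range("2007-12-01", periods=9).strftime("%Y-%m-%d"),
    "value": [75, 75, 83, 83, 83, 47, 47, 47, 47],
    "newinfo": [np.nan, np.nan, "x", np.nan, np.nan, "x", np.nan, np.nan, np.nan],
})

# Rows immediately before every 'x' row, then their index labels as a list.
prev_rows = df1.iloc[df1.loc[df1["newinfo"] == "x"].index - 1]
prev_index_list = prev_rows.index.tolist()
print(prev_index_list)  # [1, 4]
```

One thing to watch: if the very first row were flagged with 'x', the subtraction would produce -1, which iloc interprets as the last row rather than raising an error.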

Edit to get the desired output:

original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index[-1]-1]

# appending .iloc[0] returns the first field of that row (4 if `index` is stored as a regular column)
original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index[-1]-1].iloc[0]
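If the dataframe might not have the default integer index, a position-based variant may be safer. The sketch below is an alternative to the expressions above, not the same technique: it uses `np.flatnonzero` to find the positions of the 'x' rows and guards against the case where there is no preceding row. The sample frame is rebuilt by hand from the question, and the names `positions` and `prev_row` are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: sample data rebuilt from the question's printed table.
original_df = pd.DataFrame({
    "date": pd.date_range("2007-12-01", periods=9).strftime("%Y-%m-%d"),
    "value": [75, 75, 83, 83, 83, 47, 47, 47, 47],
    "newinfo": [np.nan, np.nan, "x", np.nan, np.nan, "x", np.nan, np.nan, np.nan],
})

# Integer positions (not labels) of the rows where newinfo is 'x'.
positions = np.flatnonzero((original_df["newinfo"] == "x").to_numpy())

# Take the row just before the last 'x', if such a row exists.
if positions.size and positions[-1] > 0:
    prev_row = original_df.iloc[positions[-1] - 1]
else:
    prev_row = None

if prev_row is not None:
    print(prev_row.name)      # 4
    print(prev_row["value"])  # 83
```

Because this works with positions throughout, it should give the same row even if the index is later changed to dates or other labels.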
5 Comments

Breaking it down for myself so I understand the parts better, I just ran the following subsets of your answer: original_df.loc[original_df['newinfo'] == 'x'].index-1 and original_df[original_df['newinfo'] == 'x'].index-1 (note the second one doesn't have the loc). I'm not exactly clear on what the loc does here. Could you explain a bit further? Thanks.
.loc in this example returns a dataframe of the rows where the condition is true. .index then returns the index labels of this new dataframe, and those labels are passed into iloc.
Since I'm looking for the index value in the original dataframe preceding the most recent 'x' date, I think your answer should be edited as follows: original_df.iloc[original_df.loc[original_df['newinfo'] == 'x'].index[-1]-1]. (note the addition of the [-1] after index). Did I get that right?
@Windstorm1981, I'm slightly confused about the why, but yes! If that gives you the answer you are looking for, I edited my answer to adjust for it.
Yes. My need is for the row just before the most recent update :)
