Index Value of Last Matching Row Python Panda DataFrame

Question

I have a dataframe which has a value of either 0 or 1 in a "column 2", and either a 0 or 1 in "column 1", I would somehow like to find and append as a column the index value for the last row where Column1 = 1 but only for rows where column 2 = 1. This might be easier to see than read:

d = {'C1' : pd.Series([1, 0, 1,0,0], index=[1,2,3,4,5]),'C2' : pd.Series([0, 0,0,1,1], index=[1,2,3,4,5])}
df = pd.DataFrame(d)
print(df)

   C1  C2 
1   1   0   
2   0   0   
3   1   0   
4   0   1  
5   0   1

#I've left out my attempts as they don't even get close
df['C3'] = IF C2 = 1: Call Function that gives Index Value of last place where C1 = 1 Else 0 End

This would result in this result set:

   C1  C2  C3
1   1   0   0
2   0   0   0
3   1   0   0
4   0   1   3
5   0   1   3

I was trying to get a function to do this as there are roughly 2million rows in my data set but only ~10k where C2 =1.

Thank you in advance for any help, I really appreciate it - I only started programming with python a few weeks ago.

firelynx · Accepted Answer · 2015-07-06 07:55:54Z

2

It is not so straight forward, you have to do a few loops to get this result. The key here is the fillna method which can do forwards and backwards filling.

It is often the case that pandas methods does more than one thing, this makes it very hard to figure out what methods to use for what.

So let me talk you through this code.

First we need to set C3 to nan, otherwise we cannot use fillna later.

Then we set C3 to be the index but only where C1 == 1 (the mask does this)

After this we can use fillna with method='ffill' to propagate the last observation forwards.

Then we have to mask away all the values where C2 == 0, same way we set the index earlier, with a mask.

df['C3'] = pd.np.nan
mask = df['C1'] == 1
df['C3'].loc[mask] = df.index[mask].copy()
df['C3'] = df['C3'].fillna(method='ffill')
mask = df['C2'] == 0
df['C3'].loc[mask] = 0
df

    C1  C2  C3
1   1   0   0
2   0   0   0
3   1   0   0
4   0   1   3
5   0   1   3

EDIT:

Added a .copy() to the index, otherwise we overwrite it and the index gets all full of zeroes.

edited Jul 6, 2015 at 7:55

answered Jul 6, 2015 at 7:49

firelynx

32.5k10 gold badges94 silver badges104 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

SomeGuy30145 Over a year ago

So this works, so thanks! Although why the mask index only where it's true? df.index[mask] = Int64Index([1, 3], dtype='int64')

firelynx Over a year ago

@SomeGuy30145 I saw this error too and added a fix (see EDIT in the bottom of my post). This is because in the previous version of the code I set C3 to be the index, then I modified C3 (and thereby the index too). In the version posted now I resolve this by adding a .copy() to the allocation of the C3 column, this solves the issue

Collectives™ on Stack Overflow

Index Value of Last Matching Row Python Panda DataFrame

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related