1

I have a dataframe which has a value of either 0 or 1 in a "column 2", and either a 0 or 1 in "column 1", I would somehow like to find and append as a column the index value for the last row where Column1 = 1 but only for rows where column 2 = 1. This might be easier to see than read:

d = {'C1' : pd.Series([1, 0, 1,0,0], index=[1,2,3,4,5]),'C2' : pd.Series([0, 0,0,1,1], index=[1,2,3,4,5])}
df = pd.DataFrame(d)
print(df)

   C1  C2 
1   1   0   
2   0   0   
3   1   0   
4   0   1  
5   0   1

#I've left out my attempts as they don't even get close
df['C3'] = IF C2 = 1: Call Function that gives Index Value of last place where C1 = 1 Else 0 End 

This would result in this result set:

   C1  C2  C3
1   1   0   0
2   0   0   0
3   1   0   0
4   0   1   3
5   0   1   3

I was trying to get a function to do this as there are roughly 2million rows in my data set but only ~10k where C2 =1.

Thank you in advance for any help, I really appreciate it - I only started programming with python a few weeks ago.

1 Answer 1

2

It is not so straight forward, you have to do a few loops to get this result. The key here is the fillna method which can do forwards and backwards filling.

It is often the case that pandas methods does more than one thing, this makes it very hard to figure out what methods to use for what.

So let me talk you through this code.

First we need to set C3 to nan, otherwise we cannot use fillna later.

Then we set C3 to be the index but only where C1 == 1 (the mask does this)

After this we can use fillna with method='ffill' to propagate the last observation forwards.

Then we have to mask away all the values where C2 == 0, same way we set the index earlier, with a mask.

df['C3'] = pd.np.nan
mask = df['C1'] == 1
df['C3'].loc[mask] = df.index[mask].copy()
df['C3'] = df['C3'].fillna(method='ffill')
mask = df['C2'] == 0
df['C3'].loc[mask] = 0
df

    C1  C2  C3
1   1   0   0
2   0   0   0
3   1   0   0
4   0   1   3
5   0   1   3

EDIT:

Added a .copy() to the index, otherwise we overwrite it and the index gets all full of zeroes.

Sign up to request clarification or add additional context in comments.

2 Comments

So this works, so thanks! Although why the mask index only where it's true? df.index[mask] = Int64Index([1, 3], dtype='int64')
@SomeGuy30145 I saw this error too and added a fix (see EDIT in the bottom of my post). This is because in the previous version of the code I set C3 to be the index, then I modified C3 (and thereby the index too). In the version posted now I resolve this by adding a .copy() to the allocation of the C3 column, this solves the issue

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.