Consider the following dataframe:

A |  B |  C
_____________
a |  1 |  1
a |  5 |  NaN
b |  3 |  1
b |  4 |  NaN
c |  2 |  1
c |  2 |  NaN
a |  1 |  NaN
b |  3 |  NaN
c |  4 |  NaN

My goal is to update column C based on a rule that also involves the previous row within each group. As an example: if the value in column B is smaller than the previous B value in the same group, C should be 0; otherwise C keeps the value of the previous C.

So this would give me the following:

A |  B |  C
_____________
a |  1 |  1
a |  5 |  1
b |  3 |  1
b |  4 |  1
c |  2 |  1
c |  2 |  1
a |  1 |  0
b |  3 |  0
c |  4 |  1


I was thinking of using something like

df.groupby(A).apply(lambda x: x['C'].shift(1) if x['B'].shift(1) >= x['B'] else 0)

but obviously this does not work, as apply cannot access previous rows (I think).

If all else fails, I would build an individual DataFrame for each group and modify each one separately, so that no other group's rows are included in the result, but I believe there must be a more elegant solution that uses the original dataframe.

Any suggestions?

  • "Keep the value from the previous C" means 4 (the last row) should be 0. What's the reason for it being 1? Commented May 5, 2021 at 13:02
  • @sammywemmy, in my example the last row has a B value of 4 and belongs to the "c" group. That is bigger than the previous B value for that group, so C keeps the previous value, which is 1. If B had been 1, C would have been 0 in that case. Commented May 5, 2021 at 13:07

1 Answer

Try:

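import numpy as np

def fill(x):
    x = x.copy()  # work on a copy rather than mutating the group in place
    # Carry the last known C forward within the group.
    x['C'] = x['C'].ffill()
    # Set C to 0 where B is strictly smaller than the previous B in the group.
    x['C'] = np.where(x['B'] < x['B'].shift(1), 0, x['C'])
    return x

# group_keys=False keeps the original row index on the result.
df = df.groupby('A', group_keys=False).apply(fill)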

The idea is to first forward-fill the NaN values in C with the previous value within each group, then replace C with 0 wherever the condition on B is satisfied.
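The same two steps can also be written without apply, using the groupby shift and ffill methods directly. This is only a minimal sketch on the sample data from the question; the names prev_b, carried_c and result are illustrative. It assumes "smaller" means strictly smaller, and that "previous C" means the last non-missing C in the input, so a 0 written by the rule is not propagated to later rows.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": list("aabbccabc"),
    "B": [1, 5, 3, 4, 2, 2, 1, 3, 4],
    "C": [1, np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, np.nan],
})

grouped = df.groupby("A")

# Previous B within each group (NaN for the first row of a group).
prev_b = grouped["B"].shift(1)

# Last known C within each group, carried forward.
carried_c = grouped["C"].ffill()

# 0 where B dropped relative to the previous row of the same group,
# otherwise the carried-forward C. Comparisons against NaN are False,
# so the first row of each group is never zeroed.
result = df.assign(C=carried_c.mask(df["B"] < prev_b, 0))
print(result)
```

If the first C of a group is NaN, the forward fill has nothing to carry and that NaN stays in the output, so real data may need an explicit starting value per group.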


3 Comments

Thanks for the np.where solution; I had never used it before. I am still getting some odd results on my actual data: some NaNs appear after applying the function, even though I ffill first as you suggested. I will check and see where the problem is.
np.where is very fast when you want to apply an if-else condition.
I'm going to mark your answer as correct. It didn't work for my actual dataset, probably because I didn't manage to fit all the context into one question, but it's a start. Thanks.
