I have a pandas dataframe as below:

import pandas as pd
df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})
df

    X
0   1
1   1
2   1
3   0
4   0

Now I want to create another column 'Y' whose values are based on the following conditions:

If X = 1, Y = 1
If X = 0 and previous X = 1, Y = 2
If X = 0 and previous X = 0, Y = 0

So, my final output should look like below:

    X    Y
0   1    1
1   1    1
2   1    1
3   0    2
4   0    0

This can be achieved by iterating over the rows and tracking the current and previous row with iloc, but I am looking for a faster, more efficient way to do it.

2 Answers

You can try using np.where and shift:

import pandas as pd
import numpy as np
df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})
# Y is 1 where X is 1; otherwise 2 if the previous X was 1, else 0
df['Y'] = np.where(df['X'] == 1, 1, np.where(df['X'].shift(periods=1) == 1, 2, 0))
print(df)

Output:

   X  Y
0  1  1
1  1  1
2  1  1
3  0  2
4  0  0
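
To see why this works, it can help to put the shifted column next to the original one. The following is only a sketch: the helper column 'X_prev' is a hypothetical name added for illustration and is not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})

# shift() moves every value down by one row. The first row has no
# predecessor, so it becomes NaN and the shifted column is float.
# 'X_prev' is a hypothetical helper column used only for inspection.
df['X_prev'] = df['X'].shift(periods=1)

# A comparison with NaN is False, so a row without a predecessor
# that has X == 0 falls through to the final value 0.
df['Y'] = np.where(df['X'] == 1, 1, np.where(df['X_prev'] == 1, 2, 0))
print(df)
```

Because the comparison against NaN is False, a frame that starts with X = 0 would get Y = 0 in its first row under this approach.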

Celius provided an answer with nested calls to np.where. That becomes unwieldy as the number of conditions grows. You can use np.select instead to achieve the same result:

import numpy as np
import pandas as pd


df = pd.DataFrame({
    'X': [1, 1, 1, 0, 0]
})
conditions = [
    df["X"] == 1,
    (df["X"] == 0) & (df["X"].shift() == 1),
    (df["X"] == 0) & (df["X"].shift() == 0)
]
values = [1, 2, 0]
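# Rows matching no condition (e.g. a leading 0 with no previous row) get NaN;
# note that default=np.nan makes Y a float column
df['Y'] = np.select(conditions, values, default=np.nan)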
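
If an integer column is preferred, one possible variation is to pass an integer default. This is a sketch under the assumption that rows matching none of the conditions should get 0; the names prev, y_float and y_int are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})
prev = df['X'].shift()  # previous row's X, NaN for the first row

conditions = [
    df['X'] == 1,
    (df['X'] == 0) & (prev == 1),
    (df['X'] == 0) & (prev == 0),
]
values = [1, 2, 0]

# With default=np.nan the result is promoted to float
y_float = np.select(conditions, values, default=np.nan)

# Assumption: unmatched rows should be 0, which keeps the result integer
y_int = np.select(conditions, values, default=0)

df['Y'] = y_int
print(df)
print(y_float.dtype, y_int.dtype)
```

Keeping default=np.nan is still useful when unmatched rows should stay visible as missing values instead of being silently set to 0.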
