I have a DataFrame df with a column X. I would like to create a new column Y from X.

Y should be 1 if X in the same row is 1 and the run of 0s directly above it (also in X) is at least n rows long (n is a variable). If there are fewer than n zeros above, Y should be "" (empty). I spent hours trying np.where without success. I think I need a lambda function, but I have no idea how to start or what to search for.

Example with n = 4:

On 2018-01-25, the result is 1 because X is 1 and there are more than 4 zeros directly above it.

On 2018-02-02, the result is "" because there are only 3 zeros above it (not 4).

Dates         X  Y (expected)
2018-01-02    0
2018-01-03    0
2018-01-04    0
2018-01-05    0
2018-01-08    0
2018-01-09    0
2018-01-10    0
2018-01-11    0
2018-01-12    0
2018-01-15    0
2018-01-16    0
2018-01-17    0
2018-01-18    0
2018-01-19    0
2018-01-22    0
2018-01-23    0
2018-01-24    0
2018-01-25    1  1
2018-01-29    0  
2018-01-30    0  
2018-01-31    0  
2018-02-02    1  
2018-02-05    0  
2018-02-06    0
2018-02-07    0
2018-02-08    0
2018-02-09    1  1
2018-02-12    1
2018-02-13    0
  • Can you post your expected output? I assume that if Signal_F is 1 you want to add a new column where Signal_X = 1, but only if the four rows above are 0? So IIUC, 2018-02-01 will have 1 in Signal_X? Commented Nov 8, 2019 at 21:35
  • No, I expect 0 for 2018-02-02, because there are just three 0s above, not 4. I tried to post the expected column, but I changed the names of the columns. Commented Nov 8, 2019 at 21:49

1 Answer


We can build a temporary grouping key with a conditional cumsum, then use groupby + cumcount to get run lengths for the matching.

s = df['X'].ne(df['X'].shift()).cumsum()
# temporary key: it increases by 1 every time X changes, so each run of equal values gets its own id.

df['Count']=df.groupby([df.X,s]).cumcount()+1 # add a Count column.

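prev = df.loc[df['X'] == 1].index - 1 # row just above each 1; assumes the default integer index (0, 1, 2, ...)
prev = prev[prev >= 0] # a 1 in the very first row has nothing above it
matches = prev[(df.loc[prev, 'X'].eq(0) & df.loc[prev, 'Count'].ge(4)).to_numpy()]
# keep only the rows above that are a 0 closing a run of 4 or more zeros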

df.loc[matches + 1,'Y'] = 1 # Create your Y column.

df.drop('Count',axis=1,inplace=True) # Drop the Count Column. 

print(df[df['Y'] == 1]) # print df
         Dates  X    Y
17  2018-01-25  1  1.0
26  2018-02-09  1  1.0
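
To see what the cumsum and cumcount steps produce, here is a minimal sketch on a made-up series. The values and the names run_id and run_len are only for illustration and are not part of the answer's code.

```python
import pandas as pd

# Made-up toy values, not the asker's data.
x = pd.Series([0, 0, 1, 1, 0, 0, 0, 1])

# True wherever the value differs from the row above; the cumulative sum
# of those flags gives every run of equal values its own id.
run_id = x.ne(x.shift()).cumsum()

# Position inside the current run, starting at 1.
run_len = x.groupby(run_id).cumcount() + 1

print(pd.DataFrame({"x": x, "run_id": run_id, "run_len": run_len}))
```

The run_len value on the row just above a 1 is the number of identical values that precede that 1, which is what gets compared against n.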

2 Comments

Hi, thank you for your solution and time. I tried the code but I got: NullFrequencyError: Cannot shift with no freq
Then you have nulls in your dataframe that you need to handle before you can use the code.
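
Another possible cause of that error (an assumption, since the asker's actual frame is not shown) is that Dates is the index: subtracting 1 from a DatetimeIndex that has no frequency tries to shift the dates instead of moving up one row. A minimal sketch that avoids index arithmetic altogether and takes n as a parameter could look like the following. The function name flag_ones_after_zero_run is hypothetical, and the data is the example from the question with Dates set as the index.

```python
import pandas as pd


def flag_ones_after_zero_run(x: pd.Series, n: int) -> pd.Series:
    """Return 1 where x is 1 and at least n zeros sit directly above it, else ""."""
    is_zero = x.eq(0)
    # Every non-zero row starts a new block, so zeros are counted per run.
    block = (~is_zero).cumsum()
    zeros_so_far = is_zero.astype(int).groupby(block).cumsum()
    # Series.shift moves the values down one row by position, so it does
    # not depend on the index type or on a date frequency.
    zeros_above = zeros_so_far.shift(fill_value=0)
    hit = x.eq(1) & zeros_above.ge(n)
    return hit.map({True: 1, False: ""})


# Example data from the question, with Dates as a DatetimeIndex (assumption).
dates = pd.to_datetime([
    "2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05", "2018-01-08",
    "2018-01-09", "2018-01-10", "2018-01-11", "2018-01-12", "2018-01-15",
    "2018-01-16", "2018-01-17", "2018-01-18", "2018-01-19", "2018-01-22",
    "2018-01-23", "2018-01-24", "2018-01-25", "2018-01-29", "2018-01-30",
    "2018-01-31", "2018-02-02", "2018-02-05", "2018-02-06", "2018-02-07",
    "2018-02-08", "2018-02-09", "2018-02-12", "2018-02-13",
])
values = [0] * 17 + [1] + [0] * 3 + [1] + [0] * 4 + [1, 1] + [0]
df = pd.DataFrame({"X": values}, index=dates)

df["Y"] = flag_ones_after_zero_run(df["X"], n=4)
print(df[df["Y"] == 1])
```

Because the count resets at every non-zero row, a 1 that directly follows another 1 never qualifies, and a 1 in the first row simply gets "".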
