I am unsure how best to create a row-based Boolean mask from a column-based Boolean mask.

I am trying to extract positive (or negative) runs of at least a defined length (e.g. 1, 2, 3) from 'B' for each 'SN' into a new mask.

I have implemented a simple mask (below) and, on top of that, a complicated for-loop with several if-statements to do this. Is there a more elegant way to build a mask on top of a mask in pandas?

import pandas as pd

df = pd.DataFrame({
    "SN": ["66", "66", "77", "77", "77", "77", "77"],
    "B": [-1, 1, 2, 3, 1, -1, 1]
})
# Simple column-based mask: True where 'B' is positive
mask = df['B'] > 0

The data frame and the output of the simple mask are

   SN  B
0  66 -1
1  66  1
2  77  2
3  77  3
4  77  1
5  77 -1
6  77  1

0    False
1     True
2     True
3     True
4     True
5    False
6     True

The desired output is

defined_min_length = 2

0    False
1    False
2     True
3     True
4     True
5    False
6    False

defined_min_length = 3

0    False
1    False
2     True
3     True
4     True
5    False
6    False

defined_min_length = 4

0    False
1    False 
2    False
3    False
4    False
5    False
6    False

Edit: An attempt to fix the question's ambiguity. The key point is the "defined length". For example, a defined length of 4 would yield all False, as the data frame contains no positive run of length 4 for any equipment (rows with the same 'SN').

Edit 2: I have reformulated the original question, adding the variable defined_min_length to indicate the desired run length.

1 Answer

Use:

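def ExtractPositiveSequence(df, defined_min_length):
    # Every negative value opens a new block within each 'SN', so a block
    # is an optional leading negative plus the values that follow it.
    group_s = df.groupby(['SN', df['B'].lt(0).cumsum()])['B']

    return (group_s.transform('size')               # rows in the block
                   .sub(group_s.transform('first')  # minus 1 if the block
                               .lt(0)               # opens with a negative
                               .astype(int))
                   .ge(defined_min_length)          # run is long enough
                   .mul(df['B'].gt(0))              # keep positive rows only
           )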

ExtractPositiveSequence(df,2)
0    False
1    False
2     True
3     True
4     True
5    False
6    False
Name: B, dtype: bool

ExtractPositiveSequence(df,3)

0    False
1    False
2     True
3     True
4     True
5    False
6    False
Name: B, dtype: bool

ExtractPositiveSequence(df,4)

0    False
1    False
2    False
3    False
4    False
5    False
6    False
Name: B, dtype: bool
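
To see why this works, it can help to look at the intermediate Series one step at a time. The snippet below is only a sketch of that breakdown on the sample data from the question; the names block_id, block_size, starts_negative, run_length and steps are made up here for illustration and are not part of the function above.

```python
import pandas as pd

df = pd.DataFrame({
    "SN": ["66", "66", "77", "77", "77", "77", "77"],
    "B": [-1, 1, 2, 3, 1, -1, 1],
})

# Each negative value bumps the counter, so it opens a new block.
block_id = df["B"].lt(0).cumsum()
group_s = df.groupby(["SN", block_id])["B"]

# Illustrative table of the intermediate results (hypothetical column names).
steps = pd.DataFrame({
    "B": df["B"],
    "block_id": block_id,                                  # 1 1 1 1 1 2 2
    "block_size": group_s.transform("size"),               # 2 2 3 3 3 2 2
    "starts_negative": group_s.transform("first").lt(0),   # T T F F F T T
})

# A block that opens with a negative has one row that is not part of the
# positive run, so subtract it to get the run length.
steps["run_length"] = steps["block_size"] - steps["starts_negative"].astype(int)

print(steps)  # run_length column: 1 1 3 3 3 1 1
```

Comparing run_length with defined_min_length and then keeping only the rows where 'B' is positive gives the masks shown above.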

Note:

  • Remove .mul(df['B'].gt(0)) if you want to include the initial negative before a positive sequence.

  • To search for negative sequences, flip the sign of 'B' first: ExtractPositiveSequence(df.assign(B=df['B'].mul(-1)), n), where n is the defined minimum length.
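
Both notes are easiest to see with a minimum length of 1 on the sample data. The sketch below rewrites the function with a hypothetical keep_leading_negative flag (an assumption added here, not part of the original function) that stands in for removing the .mul(...) step; it also uses & instead of .mul, which is equivalent for two Boolean Series.

```python
import pandas as pd

df = pd.DataFrame({
    "SN": ["66", "66", "77", "77", "77", "77", "77"],
    "B": [-1, 1, 2, 3, 1, -1, 1],
})

def extract_positive_sequence(df, defined_min_length, keep_leading_negative=False):
    """Same logic as above; keep_leading_negative is a hypothetical switch."""
    group_s = df.groupby(["SN", df["B"].lt(0).cumsum()])["B"]
    run_length = group_s.transform("size").sub(
        group_s.transform("first").lt(0).astype(int)
    )
    mask = run_length.ge(defined_min_length)
    if keep_leading_negative:
        return mask                # the leading negative of each run stays True
    return mask & df["B"].gt(0)    # positive rows only

with_filter = extract_positive_sequence(df, 1)
without_filter = extract_positive_sequence(df, 1, keep_leading_negative=True)

# Negative runs: flip the sign of 'B' and reuse the same function.
negative_runs = extract_positive_sequence(df.assign(B=df["B"].mul(-1)), 1)

print(with_filter.tolist())     # [False, True, True, True, True, False, True]
print(without_filter.tolist())  # [True, True, True, True, True, True, True]
print(negative_runs.tolist())   # [True, False, False, False, False, True, False]
```

On this data the filter makes no visible difference for a minimum length of 2 or more, because every block that opens with a negative contains only one positive value.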

6 Comments

This works for defined length = 1, but I don't understand how it would work for e.g. defined length = 4. In that case all mask2 values should be False.
Great! I had to reformulate, as I understood that the original question was not good.
I think I now understand your question. I have updated my solution
This is it! I managed to figure out the same for negative values from this.
Just one minor detail: what's the logic behind this statement: "remove .mul(df['B'].gt(0)), if you want to include the initial negative before a positive sequence."? Either way it has no effect on the output [F,F,T,T,T,F,F].