I need to go through a large pandas DataFrame and select runs of consecutive rows that share the same value in a column, restricted to values I specify. For example, in the DataFrame below, using column x, I want only the consecutive runs of 3 and of 5:

col row x   y
1   1   1   1
5   7   3   0
2   2   2   2
6   3   3   8
9   2   3   4
5   3   3   9
4   9   4   4
5   5   5   1
3   7   5   2
6   6   6   6
5   8   6   2
3   7   6   0

The expected output would be:

col row x   y   consecutive-count
6   3   3   8          1
9   2   3   4          1 
5   3   3   9          1
5   5   5   1          2
3   7   5   2          2 

I tried

m = df['x'].eq(df['x'].shift())
df[m|m.shift(-1, fill_value=False)]

But that also includes the consecutive 6s, which I don't want.

I also tried:

df.query( 'x in [3,5]') 

That returns every row where x is 3 or 5, whether or not the values are consecutive.

  • What should happen if you have other groups of 3 or 5? Commented Aug 11, 2022 at 21:56
  • Also, did you delete the other question? I cannot find it anymore (you should keep it, it was probably useful to others) Commented Aug 11, 2022 at 21:59
  • @mozway I accidentally deleted it. I have restored it. Commented Aug 12, 2022 at 0:27
  • @mozway I am keeping all consecutive 3s or 5s, so no single 3s or 5s. Commented Aug 12, 2022 at 0:31
  • Have you tested my answer? Does it work as you want? If not, please provide a counterexample with an explanation. Commented Aug 12, 2022 at 3:30

2 Answers

IIUC, use masks for boolean indexing. Check for 3 or 5, then use a cummax and a reversed cummax to keep only the rows from the first 3 through the last 5:

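# boolean masks for the two target values
m1 = df['x'].eq(3)
m2 = df['x'].eq(5)

# keep 3s and 5s located at or after the first 3 and at or before the last 5
out = df[(m1|m2)&(m1.cummax()&m2[::-1].cummax())]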

Output:

   col  row  x  y
2    6    3  3  8
3    9    2  3  4
4    5    3  3  9
6    5    5  5  1
7    3    7  5  2
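
To make the two cumulative masks easier to follow, here is a minimal, self-contained sketch that rebuilds the question's data as posted and names each intermediate mask. The names `after_first_3` and `before_last_5` are illustrative and not part of the answer; the reversed mask is flipped back explicitly rather than relying on index alignment. On this data the window between the first 3 and the last 5 also contains the isolated 3 in the second row (index 1), so that row is returned together with the consecutive runs.

```python
import pandas as pd

# The question's example data, as posted
df = pd.DataFrame({
    "col": [1, 5, 2, 6, 9, 5, 4, 5, 3, 6, 5, 3],
    "row": [1, 7, 2, 3, 2, 3, 9, 5, 7, 6, 8, 7],
    "x":   [1, 3, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6],
    "y":   [1, 0, 2, 8, 4, 9, 4, 1, 2, 6, 2, 0],
})

m1 = df["x"].eq(3)
m2 = df["x"].eq(5)

# True from the first 3 onward
after_first_3 = m1.cummax()
# True up to and including the last 5 (reverse, accumulate, reverse back)
before_last_5 = m2[::-1].cummax()[::-1]

# Rows equal to 3 or 5 that fall inside that window
out = df[(m1 | m2) & after_first_3 & before_last_5]
print(out)
```

If isolated values must be excluded, a run-length check would be needed in addition to this window.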

You can create a group id for each run of consecutive values, then filter by run length and by the value of x:

# create unique ids for consecutive groups, then get group length:
group_num = (df.x.shift() != df.x).cumsum()
group_len = group_num.groupby(group_num).transform("count")

# filter main df (copy so the column assignment below does not
# trigger SettingWithCopyWarning):
df2 = df[(df.x.isin([3,5])) & (group_len > 1)].copy()

# add new group num col
df2['consecutive-count'] = (df2.x != df2.x.shift()).cumsum()

output:

   col  row  x  y  consecutive-count
3    6    3  3  8                  1
4    9    2  3  4                  1
5    5    3  3  9                  1
7    5    5  5  1                  2
8    3    7  5  2                  2
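
The steps above can be wrapped into a small reusable function. This is only a sketch: `select_runs`, `min_len`, and `run_id` are hypothetical names introduced here, not part of the answer. One deliberate difference is that the run number is derived from the original run ids with `pd.factorize`, so two separate runs of the same value that become adjacent after filtering (for example 3, 3, 4, 3, 3) would still get different numbers.

```python
import pandas as pd

def select_runs(df, col, values, min_len=2):
    """Hypothetical helper: keep runs of `values` in `col` with at least `min_len` rows."""
    # A new run starts wherever the value differs from the previous row
    run_id = (df[col] != df[col].shift()).cumsum()
    # Length of the run each row belongs to
    run_len = run_id.groupby(run_id).transform("count")
    mask = df[col].isin(values) & (run_len >= min_len)
    out = df[mask].copy()
    # Number the surviving runs 1, 2, ... in order of appearance
    out["consecutive-count"] = pd.factorize(run_id[mask])[0] + 1
    return out

# Column x from the question
df = pd.DataFrame({"x": [1, 3, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6]})
result = select_runs(df, "x", [3, 5])
print(result)
```

Calling `select_runs(df, "x", [3, 5], min_len=3)` would keep only the run of three 3s.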

5 Comments

This worked. Any idea how to add another column that counts the consecutive 3s and 5s? Like this: stackoverflow.com/questions/73327208/…
You just want a column that increments its value at each group?
If so, you can just use the same trick with cumcount()
Yes, I need a column that increments the value at each group. I can't seem to make it work with cumcount(). Any example, please?
I already added the example above.
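
The comments mention cumcount(), while the column added in the answer is built with cumsum(). As a small illustration of how the two differ (the names `run_id` and `pos_in_run` are made up for this sketch): the cumulative sum of the run-start flags increments once per group, whereas `groupby(...).cumcount()` counts rows inside each group.

```python
import pandas as pd

# Filtered values, as in the answer's output
x = pd.Series([3, 3, 3, 5, 5], name="x")

# Increments once per run of equal values: 1, 1, 1, 2, 2
run_id = (x != x.shift()).cumsum()

# Position of each row inside its run, starting at 1: 1, 2, 3, 1, 2
pos_in_run = x.groupby(run_id).cumcount() + 1

demo = pd.DataFrame({"x": x, "run_id": run_id, "pos_in_run": pos_in_run})
print(demo)
```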
