Pandas: Get rows with consecutive column values

Question

I need to go through a large DataFrame and select consecutive rows that share the same value in a column. For example, in the DataFrame below, using column x, I want to specify which values to look for and keep only the rows where one of those values repeats consecutively. Say I want consecutive values of 3 and 5 only:

col row x   y
1   1   1   1
5   7   3   0
2   2   2   2
6   3   3   8
9   2   3   4
5   3   3   9
4   9   4   4
5   5   5   1
3   7   5   2
6   6   6   6
5   8   6   2
3   7   6   0

The expected output would be:

col row x   y   consecutive-count
6   3   3   8          1
9   2   3   4          1 
5   3   3   9          1
5   5   5   1          2
3   7   5   2          2

I tried

m = df['x'].eq(df['x'].shift())
df[m|m.shift(-1, fill_value=False)]

But that also includes the consecutive 6s, which I don't want.

I also tried:

df.query( 'x in [3,5]')

That returns every row where x is 3 or 5, including the isolated 3.
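
The two attempts above can be combined: the neighbour mask keeps rows that belong to a run of at least two equal values, and an `isin` mask restricts those runs to the wanted values. A minimal sketch, assuming the table above is loaded as `df` with integer columns (the name `wanted` is introduced here for illustration only):

```python
import pandas as pd

# The table from the question, rebuilt by hand (integer dtypes assumed).
df = pd.DataFrame({
    "col": [1, 5, 2, 6, 9, 5, 4, 5, 3, 6, 5, 3],
    "row": [1, 7, 2, 3, 2, 3, 9, 5, 7, 6, 8, 7],
    "x":   [1, 3, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6],
    "y":   [1, 0, 2, 8, 4, 9, 4, 1, 2, 6, 2, 0],
})

wanted = [3, 5]  # hypothetical name for the values of interest

# True where a row equals the row directly above it.
m = df["x"].eq(df["x"].shift())
# Extend to the first row of each run, so every row in a run of 2+ is True.
in_run = m | m.shift(-1, fill_value=False)

# Keep only the runs made of the wanted values.
out = df[in_run & df["x"].isin(wanted)]
print(out)
```

On this data the isolated 3 (second row) and the run of 6s are both dropped.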

Also, did you delete the other question? I cannot find it anymore (you should keep it, it was probably useful to others).
– mozway, Commented Aug 11, 2022 at 21:59
@mozway I am keeping all consecutive 3s or 5s, so no single 3s or 5s.
– code_error, Commented Aug 12, 2022 at 0:31
Have you tested my answer? Does it work as you want? If not, please provide a counterexample with an explanation.
– mozway, Commented Aug 12, 2022 at 3:30

mozway · Accepted Answer · 2022-08-11 22:04:15Z

2

IIUC, use masks for boolean indexing. Check for 3 or 5, then use a cummax and a reversed cummax so that only rows from the first 3 through the last 5 are kept, which enforces the order:

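# boolean masks for the two target values
m1 = df['x'].eq(3)
m2 = df['x'].eq(5)

# keep 3s and 5s located between the first 3 and the last 5
out = df[(m1|m2)&(m1.cummax()&m2[::-1].cummax())]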

Output:

   col  row  x  y
2    6    3  3  8
3    9    2  3  4
4    5    3  3  9
6    5    5  5  1
7    3    7  5  2
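
To see what each mask contributes, the sketch below applies the same idea to column x exactly as printed in the question (rebuilt by hand, integer dtype assumed). The reversed cummax is flipped back explicitly so that the masks line up by position; that is a choice made for this sketch, not part of the answer. On this version of the column the isolated 3 at index 1 sits between the first 3 and the last 5, so it passes the mask as well. The index labels in the output above (2, 3, 4, 6, 7) appear to correspond to a table without that row.

```python
import pandas as pd

# Column x as printed in the question.
x = pd.Series([1, 3, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6], name="x")

m1 = x.eq(3)
m2 = x.eq(5)

# True from the first 3 onwards.
after_first_3 = m1.cummax()
# True up to and including the last 5 (reverse, cummax, reverse back).
before_last_5 = m2[::-1].cummax()[::-1]

keep = (m1 | m2) & after_first_3 & before_last_5
print(x[keep])
```

If isolated values must be excluded, a run-length check would be needed in addition to these masks.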

answered Aug 11, 2022 at 22:04

anon01 · Accepted Answer · 2022-08-12 22:44:27Z

1

You can create a group column for consecutive values, then filter by the group length and the value of x:

# create unique ids for consecutive groups, then get group length:
group_num = (df.x.shift() != df.x).cumsum()
group_len = group_num.groupby(group_num).transform("count")

# filter main df (copy so the new column is not written to a view):
df2 = df[(df.x.isin([3,5])) & (group_len > 1)].copy()

# add new group num col
df2['consecutive-count'] = (df2.x != df2.x.shift()).cumsum()

output:

   col  row  x  y  consecutive-count
3    6    3  3  8                  1
4    9    2  3  4                  1
5    5    3  3  9                  1
7    5    5  5  1                  2
8    3    7  5  2                  2
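
The same steps can be wrapped in a small helper. This is only a sketch: the function name `consecutive_runs` and its parameters are made up for illustration. It numbers the runs from the run ids with `pd.factorize` rather than by comparing neighbours after filtering, so that two separate runs of the same value that become adjacent once other rows are dropped still get different numbers.

```python
import pandas as pd

def consecutive_runs(df, column, values, min_len=2):
    """Keep rows that sit in a run of at least `min_len` equal values
    in `column`, restricted to `values`, and number the surviving runs."""
    s = df[column]
    run_id = s.ne(s.shift()).cumsum()                     # new id at every change
    run_len = run_id.groupby(run_id).transform("count")   # length of each run
    mask = s.isin(values) & run_len.ge(min_len)
    out = df[mask].copy()
    # factorize maps the surviving run ids to 0, 1, 2, ... in order of appearance
    out["consecutive-count"] = pd.factorize(run_id[mask])[0] + 1
    return out

# The table from the question, rebuilt by hand (integer dtypes assumed).
df = pd.DataFrame({
    "col": [1, 5, 2, 6, 9, 5, 4, 5, 3, 6, 5, 3],
    "row": [1, 7, 2, 3, 2, 3, 9, 5, 7, 6, 8, 7],
    "x":   [1, 3, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6],
    "y":   [1, 0, 2, 8, 4, 9, 4, 1, 2, 6, 2, 0],
})
result = consecutive_runs(df, "x", [3, 5])
print(result)

# Two runs of 3 separated by a single 1 stay distinct after filtering.
split = consecutive_runs(pd.DataFrame({"x": [3, 3, 1, 3, 3]}), "x", [3])
print(split)
```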

edited Aug 12, 2022 at 22:44

answered Aug 12, 2022 at 1:23

5 Comments

code_error Over a year ago

This worked. Any idea how to add another column that counts the consecutive 3s and 5s? Like this: stackoverflow.com/questions/73327208/…

anon01 Over a year ago

you just want a column that increments value at each group?

anon01 Over a year ago

if so, you can just use the same trick with cumcount()

code_error Over a year ago

Yes, I need a column that increments the value at each group. I can't seem to make it work with cumcount(). Any example, please?

anon01 Over a year ago

i already added the example above
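
For readers unsure which of the two operations the comments refer to: `cumsum` over a change marker gives one number per run (this is what the answer's `consecutive-count` column uses), while `groupby(...).cumcount()` gives the position of each row inside its run. A short sketch of the difference on a toy series (the variable names are illustrative only):

```python
import pandas as pd

x = pd.Series([3, 3, 3, 5, 5])

# One id per run: increments whenever the value changes.
run_number = x.ne(x.shift()).cumsum()

# Position of each row within its run, starting at 0.
position_in_run = x.groupby(run_number).cumcount()

print(pd.DataFrame({
    "x": x,
    "run_number": run_number,
    "position_in_run": position_in_run,
}))
```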
