
Suppose I have a pandas DataFrame that looks like below:

  account   have  
  A         1     
  A         2     
  A         1     
  A         1     
  A         1     
  A         1     
  A         1     
  A         1     
  A         1     
  B         1     
  B         1     
  B         1     
  B         2     
  B         1     
  B         1     
  B         1     
  B         1     
  B         1     
  B         1  

I want the results look like below:

  account   want  
  A         NaN   
  A         NaN   
  A         1     
  A         2     
  A         3     
  A         3     
  A         3     
  A         3     
  A         3     
  B         NaN   
  B         NaN   
  B         3     
  B         2     
  B         1     
  B         2     
  B         3     
  B         3     
  B         3     
  B         3  

The idea behind this is that, given a rolling window of 3, I want to find the longest consecutive count of values equal to 1. For example, in account A, the longest consecutive count of 1s at index 2 is 1, because the window contains [1, 2, 1]. At index 3 the result is 2, because the window contains [2, 1, 1].

Following the same logic for account B, the results are as shown.
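The rule described above can be sketched directly: mark which entries in each rolling window equal 1, then take the length of the longest consecutive run of marks. A minimal pandas sketch of this idea (the helper name `longest_run` is my own, not from any answer below), checked against account A:

```python
import pandas as pd

def longest_run(window):
    """Length of the longest consecutive run of 1s in the window."""
    best = cur = 0
    for v in window:
        cur = cur + 1 if v == 1 else 0
        best = max(best, cur)
    return best

df = pd.DataFrame({
    "account": ["A"] * 9,
    "have":    [1, 2, 1, 1, 1, 1, 1, 1, 1],
})
df["want"] = (df.groupby("account")["have"]
                .transform(lambda s: s.rolling(3).apply(longest_run)))
print(df["want"].tolist())
# first two windows are incomplete (NaN), then 1, 2, 3, 3, 3, 3, 3
```

This is the naive per-window version; the answers below discuss the same computation and its performance.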

Any suggestions on how to do this?

Thanks a lot!

  • Could you explain a bit better why the count at index 2 is 1? Commented Dec 22, 2020 at 8:27
  • Because the longest consecutive count of 1s there is just 1. With a rolling window of 3, the window contains [1, 2, 1]. There are no consecutive 1s in that window, so it returns the longest run available, which is 1. Commented Dec 22, 2020 at 8:31

3 Answers


Use:

# if the middle value is not 1, no two 1s in the window can be adjacent, so the longest run is at most 1;
# if the middle value is 1, every 1 in the window is contiguous with it, so counting the 1s gives the run length
f = lambda x: 1 if x.iat[1] != 1 else (x == 1).sum()
df['new'] = df.groupby('account')['have'].rolling(3).apply(f).reset_index(level=0, drop=True)
print(df)
   account  have  new
0        A     1  NaN
1        A     2  NaN
2        A     1  1.0
3        A     1  2.0
4        A     1  3.0
5        A     1  3.0
6        A     1  3.0
7        A     1  3.0
8        A     1  3.0
9        B     1  NaN
10       B     1  NaN
11       B     1  3.0
12       B     2  2.0
13       B     1  1.0
14       B     1  2.0
15       B     1  3.0
16       B     1  3.0
17       B     1  3.0
18       B     1  3.0

9 Comments

I had a very similar idea, but I found it is super slow when applied to a million rows of data.
@SasiwutChaiyadecha - ya, agreed. I have one idea, need some time.
@SasiwutChaiyadecha - I think if working with millions of rows, then you need a pure NumPy or Numba solution instead of the .rolling function (because it is slow).
Any suggestions using NumPy or Numba?
@SasiwutChaiyadecha - I tried something and failed, unfortunately. :(
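Along the lines these comments suggest, one possible vectorized pure-NumPy sketch (my own, using `numpy.lib.stride_tricks.sliding_window_view`, available in NumPy ≥ 1.20) replaces the per-window Python callback with a short loop over the window width only:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def longest_runs(values, window=3, val=1):
    """For each rolling window, the longest consecutive run of `val`.
    Returns an array aligned to the input, NaN for incomplete windows."""
    mask = (np.asarray(values) == val).astype(np.int64)
    wins = sliding_window_view(mask, window)   # shape (n - window + 1, window)
    # running run-length within each window, reset to 0 wherever the mask is 0;
    # the loop is over the window width (3), not over the rows, so it stays vectorized
    runs = np.zeros_like(wins)
    runs[:, 0] = wins[:, 0]
    for j in range(1, window):
        runs[:, j] = wins[:, j] * (runs[:, j - 1] + 1)
    out = np.full(len(mask), np.nan)
    out[window - 1:] = runs.max(axis=1)
    return out

print(longest_runs([1, 2, 1, 1, 1, 1, 1, 1, 1]))
# windows yield: nan, nan, 1, 2, 3, 3, 3, 3, 3
```

This would still need a per-account split (e.g. via `groupby`), but the inner computation avoids Python-level work per window.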

One approach could be:

import numpy as np


def compute_max_run(window):
    """Longest run of True values. Based on this answer https://stackoverflow.com/a/43986888/4001592"""
    # pad with zeros so runs touching either edge are counted
    diffs = np.diff(window, prepend=0, append=0)

    # +1 marks the start of a run, -1 the position just past its end
    run_starts, = np.where(diffs == 1)
    run_ends, = np.where(diffs == -1)

    if len(run_starts) and len(run_ends):
        return (run_ends - run_starts).max()
    return 0


def compute(s, w=3, val=1):
    return s.eq(val).rolling(w).apply(compute_max_run)


df['want'] = df.groupby('account')['have'].transform(compute)
print(df)

Output

   account  have  want
0        A     1   NaN
1        A     2   NaN
2        A     1   1.0
3        A     1   2.0
4        A     1   3.0
5        A     1   3.0
6        A     1   3.0
7        A     1   3.0
8        A     1   3.0
9        B     1   NaN
10       B     1   NaN
11       B     1   3.0
12       B     2   2.0
13       B     1   1.0
14       B     1   2.0
15       B     1   3.0
16       B     1   3.0
17       B     1   3.0
18       B     1   3.0

4 Comments

Getting the error: zero-size array to reduction operation maximum which has no identity
@SasiwutChaiyadecha With the same input?
@SasiwutChaiyadecha Updated the answer.
It works, but I am trying to apply it to my dataset, which has a million rows, and it seems to be slow.

Simple:

df1.assign(want=df1.groupby('account').rolling(3)
           .apply(lambda ss: ss.diff().eq(0).sum() + 1).droplevel(0))

out:

account  have  want
0        A     1   NaN
1        A     2   NaN
2        A     1   1.0
3        A     1   2.0
4        A     1   3.0
5        A     1   3.0
6        A     1   3.0
7        A     1   3.0
8        A     1   3.0
9        B     1   NaN
10       B     1   NaN
11       B     1   3.0
12       B     2   2.0
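Worth noting: this lambda counts adjacent equal values of any kind (number of zero differences in the window, plus one), not runs of 1 specifically, so it matches the expected output on this data but can differ on other inputs. A quick standalone check of the window logic:

```python
import pandas as pd

# the lambda from the answer above, applied to single windows for illustration
count = lambda ss: ss.diff().eq(0).sum() + 1

print(count(pd.Series([1, 2, 1])))  # 1: no equal adjacent pair
print(count(pd.Series([2, 1, 1])))  # 2: one equal adjacent pair
print(count(pd.Series([2, 2, 1])))  # 2: counts runs of any value, though the longest run of 1s here is 1
```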

