
Suppose I have a pandas DataFrame that looks like below:

  account   have  
  A         1     
  A         2     
  A         1     
  A         1     
  A         1     
  A         1     
  A         1     
  A         1     
  A         1     
  B         1     
  B         1     
  B         1     
  B         2     
  B         1     
  B         1     
  B         1     
  B         1     
  B         1     
  B         1  

I want the results look like below:

  account   want  
  A         NaN   
  A         NaN   
  A         1     
  A         2     
  A         3     
  A         3     
  A         3     
  A         3     
  A         3     
  B         NaN   
  B         NaN   
  B         3     
  B         2     
  B         1     
  B         2     
  B         3     
  B         3     
  B         3     
  B         3  

The idea behind this is that, given a rolling window of 3, I want to find the longest consecutive count of values equal to 1. For example, in account A, the longest consecutive count of 1s at index 2 is 1, because the window contains [1, 2, 1]. At index 3 the result is 2, because the window contains [2, 1, 1].

Following the same logic for account B, the results are as shown.
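The rule described above can be sketched directly: mark which entries in each rolling window equal 1, then take the length of the longest consecutive run of marks. A minimal pandas sketch of this idea (the helper name `longest_run` is my own, not from any answer below), checked against account A:

```python
import pandas as pd

def longest_run(window):
    """Length of the longest consecutive run of 1s in the window."""
    best = cur = 0
    for v in window:
        cur = cur + 1 if v == 1 else 0
        best = max(best, cur)
    return best

df = pd.DataFrame({
    "account": ["A"] * 9,
    "have":    [1, 2, 1, 1, 1, 1, 1, 1, 1],
})
df["want"] = (df.groupby("account")["have"]
                .transform(lambda s: s.rolling(3).apply(longest_run)))
print(df["want"].tolist())
# first two windows are incomplete (NaN), then 1, 2, 3, 3, 3, 3, 3
```

This is the naive per-window version; the answers below discuss the same computation and its performance.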

Any suggestions on how to do this?

Thanks a lot!

  • Could you explain a bit better why the count at index 2 is 1? Commented Dec 22, 2020 at 8:27
  • Because the longest consecutive count of 1s there is just 1. With a rolling window of 3, the window contains [1, 2, 1]. There are no consecutive 1s in that window, so it returns the longest run available, which is 1. Commented Dec 22, 2020 at 8:31

3 Answers


Use:

# if the middle value is not 1, no two 1s in the window can be adjacent, so the longest run is at most 1;
# if the middle value is 1, every 1 in the window is contiguous with it, so counting the 1s gives the run length
f = lambda x: 1 if x.iat[1] != 1 else (x == 1).sum()
df['new'] = df.groupby('account')['have'].rolling(3).apply(f).reset_index(level=0, drop=True)
print(df)
   account  have  new
0        A     1  NaN
1        A     2  NaN
2        A     1  1.0
3        A     1  2.0
4        A     1  3.0
5        A     1  3.0
6        A     1  3.0
7        A     1  3.0
8        A     1  3.0
9        B     1  NaN
10       B     1  NaN
11       B     1  3.0
12       B     2  2.0
13       B     1  1.0
14       B     1  2.0
15       B     1  3.0
16       B     1  3.0
17       B     1  3.0
18       B     1  3.0

9 Comments

I had a very similar idea, but I found it is super slow when applied to a million rows of data.
@SasiwutChaiyadecha - ya, agreed. I have one idea, need some time.
@SasiwutChaiyadecha - I think if working with millions of rows, then you need a pure NumPy or Numba solution instead of the .rolling function (because it is slow).
Any suggestions using NumPy or Numba?
@SasiwutChaiyadecha - I tried something and failed, unfortunately. :(
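Along the lines these comments suggest, one possible vectorized pure-NumPy sketch (my own, using `numpy.lib.stride_tricks.sliding_window_view`, available in NumPy ≥ 1.20) replaces the per-window Python callback with a short loop over the window width only:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def longest_runs(values, window=3, val=1):
    """For each rolling window, the longest consecutive run of `val`.
    Returns an array aligned to the input, NaN for incomplete windows."""
    mask = (np.asarray(values) == val).astype(np.int64)
    wins = sliding_window_view(mask, window)   # shape (n - window + 1, window)
    # running run-length within each window, reset to 0 wherever the mask is 0;
    # the loop is over the window width (3), not over the rows, so it stays vectorized
    runs = np.zeros_like(wins)
    runs[:, 0] = wins[:, 0]
    for j in range(1, window):
        runs[:, j] = wins[:, j] * (runs[:, j - 1] + 1)
    out = np.full(len(mask), np.nan)
    out[window - 1:] = runs.max(axis=1)
    return out

print(longest_runs([1, 2, 1, 1, 1, 1, 1, 1, 1]))
# windows yield: nan, nan, 1, 2, 3, 3, 3, 3, 3
```

This would still need a per-account split (e.g. via `groupby`), but the inner computation avoids Python-level work per window.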

One approach could be:

import numpy as np


def compute_max_run(window):
    """Longest run of True values. Based on this answer https://stackoverflow.com/a/43986888/4001592"""
    # pad with zeros so runs touching either edge are counted
    diffs = np.diff(window, prepend=0, append=0)

    # +1 marks the start of a run, -1 the position just past its end
    run_starts, = np.where(diffs == 1)
    run_ends, = np.where(diffs == -1)

    if len(run_starts) and len(run_ends):
        return (run_ends - run_starts).max()
    return 0


def compute(s, w=3, val=1):
    return s.eq(val).rolling(w).apply(compute_max_run)


df['want'] = df.groupby('account')['have'].transform(compute)
print(df)

Output

   account  have  want
0        A     1   NaN
1        A     2   NaN
2        A     1   1.0
3        A     1   2.0
4        A     1   3.0
5        A     1   3.0
6        A     1   3.0
7        A     1   3.0
8        A     1   3.0
9        B     1   NaN
10       B     1   NaN
11       B     1   3.0
12       B     2   2.0
13       B     1   1.0
14       B     1   2.0
15       B     1   3.0
16       B     1   3.0
17       B     1   3.0
18       B     1   3.0

4 Comments

Getting the error: zero-size array to reduction operation maximum which has no identity
@SasiwutChaiyadecha With the same input?
@SasiwutChaiyadecha Updated the answer.
It works, but I am trying to apply it to my dataset, which has a million rows, and it seems to be slow.

Simple:

df1.assign(want=df1.groupby('account').rolling(3)
           .apply(lambda ss: ss.diff().eq(0).sum() + 1).droplevel(0))

out:

account  have  want
0        A     1   NaN
1        A     2   NaN
2        A     1   1.0
3        A     1   2.0
4        A     1   3.0
5        A     1   3.0
6        A     1   3.0
7        A     1   3.0
8        A     1   3.0
9        B     1   NaN
10       B     1   NaN
11       B     1   3.0
12       B     2   2.0
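Worth noting: this lambda counts adjacent equal values of any kind (number of zero differences in the window, plus one), not runs of 1 specifically, so it matches the expected output on this data but can differ on other inputs. A quick standalone check of the window logic:

```python
import pandas as pd

# the lambda from the answer above, applied to single windows for illustration
count = lambda ss: ss.diff().eq(0).sum() + 1

print(count(pd.Series([1, 2, 1])))  # 1: no equal adjacent pair
print(count(pd.Series([2, 1, 1])))  # 2: one equal adjacent pair
print(count(pd.Series([2, 2, 1])))  # 2: counts runs of any value, though the longest run of 1s here is 1
```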

