Pandas DataFrame rolling count

Question

I have the following pandas dataframe (just an example):

import pandas as pd
df = pd.DataFrame(pd.Series(['a','a','a','b','b','c','c','c','c','b','c','a']), columns = ['Data'])

The goal is to add another column, Stats, that counts consecutive elements of the Data column as follows:

   Data Stats
0     a      
1     a      
2     a    a3
3     b      
4     b    b2
5     c      
6     c      
7     c      
8     c    c4
9     b    b1
10    c    c1
11    a    a1

Here, for example, a3 means "three consecutive a elements", c4 means "four consecutive c elements", and so on.

Thank you in advance for your help

jpp · Accepted Answer · 2018-07-26 10:12:53Z

2

Here's one way using groupby:

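import numpy as np

# Label each run of consecutive equal values, then number the rows within each run from 1
counts = df.groupby((df['Data'] != df['Data'].shift()).cumsum()).cumcount() + 1

# Keep the label only on the last row of each run, where the next value differs
df['Stats'] = np.where(df['Data'] != df['Data'].shift(-1),
                       df['Data'] + counts.astype(str), '')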

print(df)

   Data Stats
0     a      
1     a      
2     a    a3
3     b      
4     b    b2
5     c      
6     c      
7     c      
8     c    c4
9     b    b1
10    c    c1
11    a    a1
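
To make the one-liner above easier to follow, here is a minimal sketch that splits it into named intermediate steps. The names starts, run_id, pos and is_last are only illustrative and do not appear in the answer; the logic is meant to be the same shift/cumsum/cumcount idea.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Data': list('aaabbccccbca')})

# True wherever a value differs from the previous row, i.e. a new run starts
starts = df['Data'] != df['Data'].shift()

# The cumulative sum of the start flags gives every run its own integer id
run_id = starts.cumsum()

# 1-based position of each row inside its run
pos = df.groupby(run_id).cumcount() + 1

# True on the last row of each run (the next value differs or is missing)
is_last = df['Data'] != df['Data'].shift(-1)

# On the last row of a run the position equals the run length
df['Stats'] = np.where(is_last, df['Data'] + pos.astype(str), '')
print(df)
```

Because the position is only used on the last row of a run, it coincides there with the run length, which is why cumcount() + 1 is enough and no separate size computation is needed.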


1 Comment

Gilberto Over a year ago

Thank you @jpp, I will study both solutions, yours and the one from jezrael.

jezrael · Accepted Answer · 2018-07-26 10:16:54Z

1

Create a helper Series s that labels runs of consecutive values in column Data, get the size of each group with GroupBy.transform, and finally replace all but the last value of each run with an empty string:

import numpy as np

s = df['Data'].ne(df['Data'].shift()).cumsum()
a = df.groupby(s)['Data'].transform('size')

df['Stats'] = np.where(~s.duplicated(keep='last'), df['Data'] + a.astype(str), '')
print(df)
   Data Stats
0     a      
1     a      
2     a    a3
3     b      
4     b    b2
5     c      
6     c      
7     c      
8     c    c4
9     b    b1
10    c    c1
11    a    a1

Detail:

print(s)
0     1
1     1
2     1
3     2
4     2
5     3
6     3
7     3
8     3
9     4
10    5
11    6
Name: Data, dtype: int32

print(a)
0     3
1     3
2     3
3     2
4     2
5     4
6     4
7     4
8     4
9     1
10    1
11    1
Name: Data, dtype: int64
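
As a small illustration of the masking step, the sketch below shows what ~s.duplicated(keep='last') evaluates to for the sample data. The name last_of_run is an assumption made here for readability and is not part of the answer.

```python
import pandas as pd

data = pd.Series(list('aaabbccccbca'), name='Data')

# Run ids: increment every time the value changes from the previous row
s = data.ne(data.shift()).cumsum()

# duplicated(keep='last') flags every occurrence of a run id except the final one,
# so negating it leaves True only on the last row of each run
last_of_run = ~s.duplicated(keep='last')
print(last_of_run.tolist())
```

This only works because s holds run ids rather than the raw values: the letter b occurs in two separate runs, but each run has its own id.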

Without removing duplicates, the solution is simpler:

df['Stats'] = df['Data'] + a.astype(str)
print(df)

   Data Stats
0     a    a3
1     a    a3
2     a    a3
3     b    b2
4     b    b2
5     c    c4
6     c    c4
7     c    c4
8     c    c4
9     b    b1
10    c    c1
11    a    a1
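
For a quick sanity check that does not depend on pandas, the same labels can be rebuilt with itertools.groupby from the standard library. This is only a sketch for comparison, not part of either answer, and the helper name run_labels is hypothetical.

```python
from itertools import groupby

data = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'b', 'c', 'a']

def run_labels(values):
    """Return '' for every element except the last of each consecutive run,
    which gets '<value><run length>'."""
    out = []
    # groupby without a key groups consecutive equal elements
    for value, run in groupby(values):
        n = len(list(run))
        out.extend([''] * (n - 1) + [f'{value}{n}'])
    return out

stats = run_labels(data)
print(stats)
```

Assigning the result with df['Stats'] = run_labels(df['Data']) should give the same column as the vectorized versions, although it is likely to be slower on large frames.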


3 Comments

Gilberto Over a year ago

Thank you @jezrael for the solution, I will study it to improve my knowledge of Python.

jezrael Over a year ago

@Gilberto - I only pointed it out because I saw my solution was accepted and then unaccepted ;)

Gilberto Over a year ago

I wanted to give both the check mark. Both solutions solve the problem, and I find both very interesting (I'm quite new to Python).
