Count values in column using pandas

Question

I have the following data:

member_id  device_id                       
19404      dfbc9d3230304cdfb0316cc32c41b67f    [2016-04-28, 2016-04-27, 2016-04-26, 2016-04-22]
19555      176e307bd8714a00ac2b99276123f0a7    [2016-04-29, 2016-04-28, 2016-04-27, 2016-04-23]
19632      a6d4b631e09a4b31afef4c93472c7da3                 [2016-04-29, 2016-04-28, 2016-04-27]
19792      0146b09048ce4c47af4bbc69e7999137    [2016-04-23, 2016-04-22, 2016-04-21, 2016-04-20]
20258      1510f9b4efc14183ad412eb54c9e058f                                         [2016-04-09]
           5f42f4d02d38456689e58d6a1b9a3e16    [2016-04-29, 2016-04-28, 2016-04-25, 2016-04-22]

I need to count the values in each list in the third column. I tried len(), which I thought would return the length of each list, but the result is wrong. The data comes from new = data.groupby(['member_id', 'device_id'])['event_date'].unique(). count() returns the sum of all values.

MaxU - stand with Ukraine · Accepted Answer · 2016-05-05 15:26:05Z

1

Assuming your last column, l, holds a list of values:

In [113]: df.l.map(len)
Out[113]:
0    4
1    4
2    3
3    4
4    1
5    4
Name: l, dtype: int64
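A minimal, self-contained sketch of the same idea, assuming the cells really are Python lists. The frame below is a shortened copy of the question's data, and the column name l follows this answer; .str.len() is shown as an alternative that also accepts list-valued cells.

```python
import pandas as pd

# Shortened sample; the column name "l" follows the answer above.
df = pd.DataFrame({
    "l": [
        ["2016-04-28", "2016-04-27", "2016-04-26", "2016-04-22"],
        ["2016-04-29", "2016-04-28", "2016-04-27"],
        ["2016-04-09"],
    ]
})

# map(len) calls Python's built-in len on every cell.
counts_map = df["l"].map(len)

# The .str accessor's len() also works when cells are lists, not only strings.
counts_str = df["l"].str.len()

print(counts_map.tolist())  # [4, 3, 1]
```

Both calls return one integer per row, so the result can be assigned back as a new column if needed.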

If your last column holds strings, you can convert them to lists first:

df.l.str.replace(r'[\[\]]', '', regex=True).str.split(r'\s*,\s*').map(len)
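As a rough illustration of the string case, the sketch below assumes each cell is text such as "[2016-04-28, 2016-04-27]". It swaps the regex replace for a plain .str.strip("[]") and splits on a literal comma, which is enough when only the count matters. The Series s is made up for the example.

```python
import pandas as pd

# Hypothetical string-valued column: each cell is the text form of a list.
s = pd.Series([
    "[2016-04-28, 2016-04-27, 2016-04-26, 2016-04-22]",
    "[2016-04-29, 2016-04-28, 2016-04-27]",
    "[2016-04-09]",
])

# Drop the surrounding brackets, then split on commas to get real lists.
# Items keep their leading space, which does not affect the count.
as_lists = s.str.strip("[]").str.split(",")

# Count the items in each list.
counts = as_lists.map(len)

print(counts.tolist())  # [4, 3, 1]
```

Note that a cell containing "[]" would be counted as one (empty) item with this approach, so empty lists would need separate handling.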


pmaniyan · Accepted Answer · 2016-05-05 15:35:57Z

Is this what you are looking for?

import pandas as pd

df = pd.DataFrame(columns=('member_id','device_id','event_date'),data=[
[19404,'dfbc9d3230304cdfb0316cc32c41b67f',['2016-04-28', '2016-04-27', '2016-04-26', '2016-04-22']],
[19555,'176e307bd8714a00ac2b99276123f0a7',['2016-04-29', '2016-04-28', '2016-04-27', '2016-04-23']],
[19632,'a6d4b631e09a4b31afef4c93472c7da3',['2016-04-29', '2016-04-28', '2016-04-27']],
[19792,'0146b09048ce4c47af4bbc69e7999137',['2016-04-23', '2016-04-22', '2016-04-21', '2016-04-20']],
[20258,'1510f9b4efc14183ad412eb54c9e058f',['2016-04-09']],
[20258,'5f42f4d02d38456689e58d6a1b9a3e16',['2016-04-29', '2016-04-28', '2016-04-25', '2016-04-22']]
])

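# Group by member and device; each group holds one row whose value is the list
new = df.groupby(['member_id', 'device_id'])['event_date']

for each_n in new:
    # each_n is a (group key, Series) pair; print the key and the list length
    print(each_n[0], len(each_n[1].values[0]))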

Output

(19404, 'dfbc9d3230304cdfb0316cc32c41b67f') 4
(19555, '176e307bd8714a00ac2b99276123f0a7') 4
(19632, 'a6d4b631e09a4b31afef4c93472c7da3') 3
(19792, '0146b09048ce4c47af4bbc69e7999137') 4
(20258, '1510f9b4efc14183ad412eb54c9e058f') 1
(20258, '5f42f4d02d38456689e58d6a1b9a3e16') 4

Alexander · Accepted Answer · 2016-05-05 16:25:26Z

1

You can apply the len function to the grouped column. The .iat[0] gets the first item in the group, which in this case is your list.

>>> df.groupby(['member_id', 'device_id'])['event_date'].agg(
        event_count=lambda group: len(group.iat[0]))
                                            event_count
member_id device_id                                    
19404     dfbc9d3230304cdfb0316cc32c41b67f            4
19555     176e307bd8714a00ac2b99276123f0a7            4
19632     a6d4b631e09a4b31afef4c93472c7da3            3
19792     0146b09048ce4c47af4bbc69e7999137            4
20258     1510f9b4efc14183ad412eb54c9e058f            1
          5f42f4d02d38456689e58d6a1b9a3e16            4
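The question builds its lists with groupby(...).unique(), so the count can also be taken on that result directly, or computed with nunique() without building the lists at all. The sketch below assumes the underlying data has one row per event; the frame raw and its shortened device IDs are hypothetical.

```python
import pandas as pd

# Hypothetical raw event log: one row per event, duplicates allowed.
raw = pd.DataFrame({
    "member_id": [19404, 19404, 19404, 20258, 20258, 20258],
    "device_id": ["dfbc", "dfbc", "dfbc", "1510", "5f42", "5f42"],
    "event_date": ["2016-04-28", "2016-04-28", "2016-04-27",
                   "2016-04-09", "2016-04-29", "2016-04-28"],
})

grouped = raw.groupby(["member_id", "device_id"])["event_date"]

# The question's approach: a Series whose cells are arrays of unique dates.
unique_dates = grouped.unique()
via_len = unique_dates.map(len)

# The same count without materialising the arrays.
via_nunique = grouped.nunique()

print(via_nunique.tolist())  # [2, 1, 2]
```

By default nunique() ignores missing values, while unique() keeps them, so the two counts can differ when event_date contains NaN.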

edited May 5, 2016 at 16:25

answered May 5, 2016 at 15:28


1 Comment

user6241246 Over a year ago

It's strange, but it returns an incorrect result for my data.
