I have the following data:

member_id  device_id                       
19404      dfbc9d3230304cdfb0316cc32c41b67f    [2016-04-28, 2016-04-27, 2016-04-26, 2016-04-22]
19555      176e307bd8714a00ac2b99276123f0a7    [2016-04-29, 2016-04-28, 2016-04-27, 2016-04-23]
19632      a6d4b631e09a4b31afef4c93472c7da3                 [2016-04-29, 2016-04-28, 2016-04-27]
19792      0146b09048ce4c47af4bbc69e7999137    [2016-04-23, 2016-04-22, 2016-04-21, 2016-04-20]
20258      1510f9b4efc14183ad412eb54c9e058f                                         [2016-04-09]
           5f42f4d02d38456689e58d6a1b9a3e16    [2016-04-29, 2016-04-28, 2016-04-25, 2016-04-22]

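I need to count the values in each list in the third column. I tried len(), expecting it to return the length of each list, but the result is wrong. The column was built with:

new = data.groupby(['member_id', 'device_id'])['event_date'].unique()

count() does not help either: it returns the sum of all values instead of one count per list.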

3 Answers

Assuming the last column, l, holds a list of values:

In [113]: df.l.map(len)
Out[113]:
0    4
1    4
2    3
3    4
4    1
5    4
Name: l, dtype: int64

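If the last column holds strings instead, convert them to lists first (regex=True is needed on recent pandas versions, where str.replace treats the pattern as a literal by default):

df.l.str.replace(r'[\[\]]', '', regex=True).str.split(r'\s*,\s*').map(len)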
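
As a self-contained sketch of the string case (the two sample strings below are made up for illustration, and the bracketed, comma-separated format is an assumption about how the column is stored), the brackets can also be stripped with str.strip before splitting. Series.str.len() then gives the same result as map(len):

```python
import pandas as pd

# Hypothetical string column; assumes each value looks like "[a, b, c]"
df = pd.DataFrame({"l": ["[2016-04-28, 2016-04-27, 2016-04-26]", "[2016-04-09]"]})

# Drop the surrounding brackets, then split on commas to get real lists
as_lists = df["l"].str.strip("[]").str.split(",")

# .str.len() also works element-wise on lists, like map(len)
counts = as_lists.str.len()
print(counts.tolist())
```

Note that an empty "[]" string would split into [''] and be counted as 1, so empty lists need separate handling.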

Is this what you are looking for?

import pandas as pd

df = pd.DataFrame(columns=('member_id','device_id','event_date'),data=[
[19404,'dfbc9d3230304cdfb0316cc32c41b67f',['2016-04-28', '2016-04-27', '2016-04-26', '2016-04-22']],
[19555,'176e307bd8714a00ac2b99276123f0a7',['2016-04-29', '2016-04-28', '2016-04-27', '2016-04-23']],
[19632,'a6d4b631e09a4b31afef4c93472c7da3',['2016-04-29', '2016-04-28', '2016-04-27']],
[19792,'0146b09048ce4c47af4bbc69e7999137',['2016-04-23', '2016-04-22', '2016-04-21', '2016-04-20']],
[20258,'1510f9b4efc14183ad412eb54c9e058f',['2016-04-09']],
[20258,'5f42f4d02d38456689e58d6a1b9a3e16',['2016-04-29', '2016-04-28', '2016-04-25', '2016-04-22']]
])

new = df.groupby(['member_id', 'device_id'])['event_date']

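# Iterating a GroupBy yields (group_key, Series) pairs;
# each Series here holds a single list, so take its first value.
for key, group in new:
    print(key, len(group.values[0]))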

Output

(19404, 'dfbc9d3230304cdfb0316cc32c41b67f') 4
(19555, '176e307bd8714a00ac2b99276123f0a7') 4
(19632, 'a6d4b631e09a4b31afef4c93472c7da3') 3
(19792, '0146b09048ce4c47af4bbc69e7999137') 4
(20258, '1510f9b4efc14183ad412eb54c9e058f') 1
(20258, '5f42f4d02d38456689e58d6a1b9a3e16') 4

You can apply the len function to the grouped column. The .iat[0] gets the first item in the group, which in this case is your list.

>>> df.groupby(['member_id', 'device_id'])['event_date'].agg(
...     event_count=lambda group: len(group.iat[0]))
                                            event_count
member_id device_id                                    
19404     dfbc9d3230304cdfb0316cc32c41b67f            4
19555     176e307bd8714a00ac2b99276123f0a7            4
19632     a6d4b631e09a4b31afef4c93472c7da3            3
19792     0146b09048ce4c47af4bbc69e7999137            4
20258     1510f9b4efc14183ad412eb54c9e058f            1
          5f42f4d02d38456689e58d6a1b9a3e16            4
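
If event_date actually holds one date per row, as the groupby(...).unique() call in the question suggests, .iat[0] only sees the first row of each group. A minimal sketch under that assumption (the rows and shortened device IDs below are invented for illustration): either take the length of each array returned by unique(), or skip the arrays and call nunique() directly.

```python
import pandas as pd

# Hypothetical long-form data: one event per row, device IDs shortened
data = pd.DataFrame({
    "member_id": [19404, 19404, 19404, 20258, 20258],
    "device_id": ["dfbc", "dfbc", "dfbc", "1510", "5f42"],
    "event_date": ["2016-04-28", "2016-04-27", "2016-04-27",
                   "2016-04-09", "2016-04-29"],
})

grouped = data.groupby(["member_id", "device_id"])["event_date"]

# As in the question: one array of unique dates per (member, device)
new = grouped.unique()

# Option 1: length of each array
counts_from_arrays = new.map(len)

# Option 2: count distinct dates without building the arrays
counts_direct = grouped.nunique()

print(counts_direct)
```

Both options return a Series indexed by (member_id, device_id), so the counts stay aligned with the groups.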

1 Comment

It's strange, but it returns an incorrect result for my data.
