
I have a DataFrame. Here is part of it:

       member_id          event_time                       event_path  event_duration
    0    2333678 2016-12-27 04:17:16  youtube.com/watch?v=w5ZIb05NO58              12
    1    2333678 2016-12-27 04:17:26  youtube.com/watch?v=w5ZIb05NO58              12
    2    2333678 2016-12-27 04:17:36  youtube.com/watch?v=w5ZIb05NO58              10
    3    2333678 2016-12-27 04:17:40  youtube.com/watch?v=w5ZIb05NO58              35
    4    5611206 2016-12-30 17:16:01  youtube.com/watch?v=qZrQWA5IsKA              35
    5    5611206 2016-12-30 17:16:10  youtube.com/watch?v=qZrQWA5IsKA              12
    6    5611206 2016-12-30 17:16:27  youtube.com/watch?v=6YM5UhnElcE              10
    7    5611206 2016-12-30 17:16:37  youtube.com/watch?v=6YM5UhnElcE              10
    8    5611206 2016-12-30 17:16:47  youtube.com/watch?v=6YM5UhnElcE              10

Desired output:

       member_id          event_time                       event_path  event_duration
    0    2333678 2016-12-27 04:17:16  youtube.com/watch?v=w5ZIb05NO58              69
    4    5611206 2016-12-30 17:16:01  youtube.com/watch?v=qZrQWA5IsKA              47
    6    5611206 2016-12-30 17:16:27  youtube.com/watch?v=6YM5UhnElcE              30

I use

g = (df.event_path != df.event_path.shift()).cumsum()
df = (df.groupby([df.member_id, df.event_time, g], sort=False).agg({'event_duration':'sum', 'event_path':'first'})
     .reset_index(level='event_path', drop=True)
     .reset_index()
     .reindex(columns=df.columns))

But it doesn't combine all the rows into their groups.
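For what it's worth, one likely fix (a sketch, with the sample frame rebuilt from the question): since every row has a distinct `event_time`, including it in the grouping keys makes each row its own group. Grouping by `member_id` and the consecutive-run id `g` only, and taking the first `event_time` per group, sums the runs as desired:

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    'member_id': [2333678] * 4 + [5611206] * 5,
    'event_time': pd.to_datetime([
        '2016-12-27 04:17:16', '2016-12-27 04:17:26',
        '2016-12-27 04:17:36', '2016-12-27 04:17:40',
        '2016-12-30 17:16:01', '2016-12-30 17:16:10',
        '2016-12-30 17:16:27', '2016-12-30 17:16:37',
        '2016-12-30 17:16:47',
    ]),
    'event_path': (['youtube.com/watch?v=w5ZIb05NO58'] * 4
                   + ['youtube.com/watch?v=qZrQWA5IsKA'] * 2
                   + ['youtube.com/watch?v=6YM5UhnElcE'] * 3),
    'event_duration': [12, 12, 10, 35, 35, 12, 10, 10, 10],
})

# id consecutive runs of the same event_path
g = (df.event_path != df.event_path.shift()).cumsum()

# group by member and run id only -- event_time stays out of the keys
out = (df.groupby([df.member_id, g], sort=False)
         .agg({'event_duration': 'sum',
               'event_time': 'first',
               'event_path': 'first'})
         .reset_index(level='event_path', drop=True)  # drop the run-id level
         .reset_index()                               # member_id back as a column
         .reindex(columns=df.columns))
print(out)
```

With `sort=False` the groups come out in order of first appearance, which matches the desired output's row order.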

  • You have different event_time values. Do you want to take the first one in each group? Commented Apr 19, 2017 at 10:10

2 Answers


If you want the first event_time in each group, you can take it with 'first', the same way you already take the first event_path:

>>> (df.groupby([df.member_id, df.event_path])
...    .agg({'event_duration': 'sum', 'event_time': 'first'})
...    .reset_index()
...    .reindex(columns=df.columns))

   member_id          event_time                       event_path  event_duration
0    2333678 2016-12-27 04:17:16  youtube.com/watch?v=w5ZIb05NO58              69
1    5611206 2016-12-30 17:16:27  youtube.com/watch?v=6YM5UhnElcE              30
2    5611206 2016-12-30 17:16:01  youtube.com/watch?v=qZrQWA5IsKA              47
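A self-contained version of that snippet (sample frame rebuilt from the question). One caveat worth noting: grouping by member_id and event_path merges all rows with the same path, not just consecutive runs — which happens to coincide with the desired output for this sample:

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    'member_id': [2333678] * 4 + [5611206] * 5,
    'event_time': pd.to_datetime([
        '2016-12-27 04:17:16', '2016-12-27 04:17:26',
        '2016-12-27 04:17:36', '2016-12-27 04:17:40',
        '2016-12-30 17:16:01', '2016-12-30 17:16:10',
        '2016-12-30 17:16:27', '2016-12-30 17:16:37',
        '2016-12-30 17:16:47',
    ]),
    'event_path': (['youtube.com/watch?v=w5ZIb05NO58'] * 4
                   + ['youtube.com/watch?v=qZrQWA5IsKA'] * 2
                   + ['youtube.com/watch?v=6YM5UhnElcE'] * 3),
    'event_duration': [12, 12, 10, 35, 35, 12, 10, 10, 10],
})

# Sum durations per (member, path); keep the first timestamp of each group
out = (df.groupby([df.member_id, df.event_path])
         .agg({'event_duration': 'sum', 'event_time': 'first'})
         .reset_index()
         .reindex(columns=df.columns))
print(out)
```

The default `sort=True` orders groups by key, so the two paths for member 5611206 come out alphabetically rather than in order of appearance.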


    df.groupby(['member_id','event_path']).agg({'event_time':'min','event_duration':'sum'}).reset_index()

Output:

  member_id                       event_path           event_time  event_duration
0   2333678  youtube.com/watch?v=w5ZIb05NO58  2016-12-27 04:17:16              69
1   5611206  youtube.com/watch?v=6YM5UhnElcE  2016-12-30 17:16:27              30
2   5611206  youtube.com/watch?v=qZrQWA5IsKA  2016-12-30 17:16:01              47
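On pandas 0.25 and later, the same aggregation can also be spelled with named aggregation, which avoids the dict and names the output columns explicitly (a sketch over the question's sample data):

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    'member_id': [2333678] * 4 + [5611206] * 5,
    'event_time': pd.to_datetime([
        '2016-12-27 04:17:16', '2016-12-27 04:17:26',
        '2016-12-27 04:17:36', '2016-12-27 04:17:40',
        '2016-12-30 17:16:01', '2016-12-30 17:16:10',
        '2016-12-30 17:16:27', '2016-12-30 17:16:37',
        '2016-12-30 17:16:47',
    ]),
    'event_path': (['youtube.com/watch?v=w5ZIb05NO58'] * 4
                   + ['youtube.com/watch?v=qZrQWA5IsKA'] * 2
                   + ['youtube.com/watch?v=6YM5UhnElcE'] * 3),
    'event_duration': [12, 12, 10, 35, 35, 12, 10, 10, 10],
})

# Named aggregation: one output_column=(input_column, func) pair each
out = (df.groupby(['member_id', 'event_path'], as_index=False)
         .agg(event_time=('event_time', 'min'),
              event_duration=('event_duration', 'sum')))
print(out)
```

`as_index=False` keeps the grouping keys as ordinary columns, so no `reset_index` is needed afterwards.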

