Aggregate rows in Pandas DataFrame

Question

I have a pandas DataFrame with the following columns:

VisitorID (unique for each user - cookie based)

VisitNumber (1 = first visit, 2 = second visit, and so on)

TimeSpentOnVist (visit duration in seconds)

Channel (the referrer of the visit, e.g. Facebook, Google, or Bing)

Media type (paid or organic)

The visitor ID repeats for each visit (1, 2, 3). I would like to aggregate the rows so that the channel and media type come from the last visit, while the time spent is summed across all visits. My goal is to group by VisitorID so that each visitor appears only once.

What is the most efficient way to perform this aggregation in pandas?

BENY · Accepted Answer · 2017-11-03 16:38:38Z

3

If I understand correctly:

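# Sort by visit number so 'last' picks the most recent visit, then collapse to one row per visitor
df.sort_values('VisitNumber').groupby('VisitorID').agg(
    {'TimeSpentOnVist': 'sum', 'Channel': 'last', 'Media type': 'last'})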
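For anyone who wants to try this end to end, here is a minimal sketch on made-up data. The sample values are hypothetical; only the column names come from the question. Note that `'last'` in a groupby returns the last non-null value in each group, so a missing Channel on the final visit would fall back to an earlier one.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "VisitorID": [1, 1, 1, 2, 2],
    "VisitNumber": [2, 1, 3, 1, 2],
    "TimeSpentOnVist": [30, 10, 20, 5, 15],
    "Channel": ["Google", "Facebook", "Bing", "Bing", "Google"],
    "Media type": ["paid", "organic", "organic", "paid", "organic"],
})

# Sort by visit number so that 'last' refers to the most recent visit,
# then reduce each visitor's rows to a single row.
result = (
    df.sort_values("VisitNumber")
      .groupby("VisitorID")
      .agg({"TimeSpentOnVist": "sum", "Channel": "last", "Media type": "last"})
)
print(result)
```

The result is indexed by VisitorID; append `.reset_index()` if you would rather have it back as a regular column.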


Sebastian Mendez · Answer · 2017-11-03 16:49:30Z

0

Wen's answer covers the aggregation question, but I'd also create a MultiIndex to organize the DataFrame:

 df.set_index(['VisitorID','VisitNumber']).sort_index()
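A rough sketch of what that MultiIndex gives you, again on hypothetical sample data. `set_index` returns a new DataFrame rather than modifying `df` in place, so the result needs to be assigned; the names `indexed`, `visits_of_1`, and `last_visits` are just illustrative.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "VisitorID": [1, 1, 1, 2, 2],
    "VisitNumber": [2, 1, 3, 1, 2],
    "TimeSpentOnVist": [30, 10, 20, 5, 15],
    "Channel": ["Google", "Facebook", "Bing", "Bing", "Google"],
    "Media type": ["paid", "organic", "organic", "paid", "organic"],
})

# Index by (VisitorID, VisitNumber) and sort so visits appear in order.
indexed = df.set_index(["VisitorID", "VisitNumber"]).sort_index()

# All visits of visitor 1, indexed by VisitNumber.
visits_of_1 = indexed.loc[1]

# The last row per visitor; because the index is sorted, this is the latest visit.
last_visits = indexed.groupby(level="VisitorID").tail(1)

print(visits_of_1)
print(last_visits)
```

This does not replace the aggregation, but it makes per-visitor lookups and ordered slicing straightforward.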

