I have a pandas DataFrame with the following columns:

  1. VisitorID (unique for each user, cookie-based)
  2. VisitNumber (1 = first visit, 2 = second visit, and so on)
  3. TimeSpentOnVist (visit duration in seconds)
  4. Channel (the referrer of the visit, e.g. Facebook, Google, or Bing)
  5. Media type (paid or organic)

Each VisitorID repeats once per visit (1, 2, 3, ...). I would like to aggregate the data so that Channel and Media type are taken from the last visit, while TimeSpentOnVist is summed across all visits. The goal is to group by VisitorID so that each visitor appears only once.

What is the most efficient way to perform this aggregation in pandas?

2 Answers

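IIUC (if I understand correctly), you can sort by VisitNumber, then group by VisitorID and aggregate each column with its own function: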

df.sort_values(['VisitNumber']).groupby('VisitorID').\
     agg({'TimeSpentOnVist':'sum','Channel':'last','Media type':'last'})
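
To see what this does, here is a minimal, self-contained sketch on made-up data. The sample rows and the `result` variable name are assumptions for illustration only; the column names are the ones from the question.

```python
import pandas as pd

# Hypothetical sample data; rows are deliberately out of visit order.
df = pd.DataFrame({
    "VisitorID": ["a", "a", "a", "b", "b"],
    "VisitNumber": [2, 1, 3, 1, 2],
    "TimeSpentOnVist": [30, 10, 20, 5, 15],
    "Channel": ["Google", "Facebook", "Bing", "Google", "Facebook"],
    "Media type": ["paid", "organic", "organic", "paid", "paid"],
})

# Sort so that the highest VisitNumber comes last within each visitor,
# then sum the time spent and keep the last Channel / Media type.
result = (
    df.sort_values("VisitNumber")
      .groupby("VisitorID")
      .agg({"TimeSpentOnVist": "sum", "Channel": "last", "Media type": "last"})
)

print(result)
```

With this data, visitor `a` ends up with 60 seconds in total and the Channel and Media type of visit 3, and visitor `b` with 20 seconds and the values of visit 2. One caveat worth knowing: the groupby `'last'` aggregation returns the last non-null value, so if the final visit has a missing Channel, an earlier one is used instead.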

Wen's answer covers the aggregation question, but I'd also create a MultiIndex on VisitorID and VisitNumber to organize the DataFrame:

 df.set_index(['VisitorID','VisitNumber']).sort_index()
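
As a rough sketch of how the MultiIndex can be used afterwards (the sample data and the names `indexed`, `visits_a`, and `summary` are assumptions for illustration): `set_index(...).sort_index()` returns a new DataFrame, so the result has to be assigned, and once it is sorted the same aggregation can be run on the index level.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "VisitorID": ["a", "a", "a", "b", "b"],
    "VisitNumber": [2, 1, 3, 1, 2],
    "TimeSpentOnVist": [30, 10, 20, 5, 15],
    "Channel": ["Google", "Facebook", "Bing", "Google", "Facebook"],
    "Media type": ["paid", "organic", "organic", "paid", "paid"],
})

# Build the (VisitorID, VisitNumber) MultiIndex and sort by it.
indexed = df.set_index(["VisitorID", "VisitNumber"]).sort_index()

# All visits of one visitor, indexed by VisitNumber.
visits_a = indexed.loc["a"]

# The index is already sorted by VisitNumber within each visitor,
# so 'last' picks the values of the most recent visit.
summary = indexed.groupby(level="VisitorID").agg(
    {"TimeSpentOnVist": "sum", "Channel": "last", "Media type": "last"}
)

print(visits_a)
print(summary)
```

The sorted MultiIndex makes per-visitor lookups straightforward and removes the need for a separate `sort_values` call before aggregating.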
