Problem Statement:
I have a file as below.
name | date | count
John | 201406 | 1
John | 201410 | 2
Mary | 201409 | 180
Mary | 201410 | 154
Mary | 201411 | 157
Mary | 201412 | 153
Mary | 201501 | 223
Mary | 201502 | 166
Mary | 201503 | 163
Mary | 201504 | 169
Mary | 201505 | 157
Tara | 201505 | 2
The file shows count data for three people (John, Mary and Tara) over several months. I would like to analyze this data and come up with a status tag for each person, i.e. active, inactive or new.
A person is active if they have entries for 201505 and other previous months - like Mary
A person is inactive if they do not have entries for 201505 - like John
A person is new if they ONLY have 1 entry for 201505 - like Tara.
Furthermore, if a person is active, I would like to get the mean of their last 5 counts. For example, for Mary, I would like to get the mean as (157 + 169 + 163 + 166 + 223) / 5 = 175.6.
Question:
I would like to understand how I should read this file in Python 2.7 in order to fulfill my requirements. I started with the following but was not sure how I could get previous entries (i.e. previous lines in file) for a particular person.
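for line in data:
    # The sample above is separated by '|', so split on that and trim the spaces
    col = [field.strip() for field in line.split('|')]
    name = col[0]
    date = col[1]
    count = col[2]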
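One way to keep track of a person's previous lines using only the standard library is to collect every (date, count) pair into a dict keyed by name and classify each person once the whole file has been read. The following is a minimal sketch, not a definitive solution. It assumes the file is pipe-delimited as shown above, that 201505 is the latest month in the data, and that an active person with fewer than 5 entries is averaged over whatever entries exist. The names SAMPLE_LINES, CURRENT_MONTH, read_rows and summarize are made up for illustration.

```python
from __future__ import division, print_function
from collections import defaultdict

CURRENT_MONTH = '201505'  # assumption: the most recent month in the file

SAMPLE_LINES = """name | date | count
John | 201406 | 1
John | 201410 | 2
Mary | 201409 | 180
Mary | 201410 | 154
Mary | 201411 | 157
Mary | 201412 | 153
Mary | 201501 | 223
Mary | 201502 | 166
Mary | 201503 | 163
Mary | 201504 | 169
Mary | 201505 | 157
Tara | 201505 | 2""".splitlines()


def read_rows(lines):
    """Yield (name, date, count) tuples, skipping the header and malformed lines."""
    for line in lines:
        fields = [f.strip() for f in line.split('|')]
        if len(fields) != 3 or fields[0] == 'name':
            continue
        yield fields[0], fields[1], int(fields[2])


def summarize(lines, current=CURRENT_MONTH):
    """Return {name: (status, mean_of_last_5_or_None)}."""
    history = defaultdict(list)
    for name, date, count in read_rows(lines):
        history[name].append((date, count))

    result = {}
    for name, entries in history.items():
        entries.sort()  # YYYYMM strings sort in chronological order
        dates = [d for d, _ in entries]
        if current not in dates:
            result[name] = ('inactive', None)
        elif len(entries) == 1:
            result[name] = ('new', None)
        else:
            last = [c for _, c in entries[-5:]]  # up to the 5 most recent counts
            result[name] = ('active', sum(last) / len(last))
    return result


for name, (status, mean) in sorted(summarize(SAMPLE_LINES).items()):
    print(name, status, mean)
```

To run this against a real file, the list SAMPLE_LINES can be replaced by an open file object, since summarize only needs something it can iterate over line by line.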
If you use pandas, then you can use the .groupby('name') function to look at each person individually.
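As a rough illustration of the groupby idea, the sketch below loads the sample data into a DataFrame and loops over the per-person groups. It is only one possible approach and makes a few assumptions: the file is pipe-delimited with spaces around the separators, 201505 is the latest month, and an active person with fewer than 5 rows is averaged over the rows available. SAMPLE, CURRENT_MONTH and the mean_last5 column are illustrative names, not part of the original data.

```python
from __future__ import print_function
import io

import pandas as pd

CURRENT_MONTH = 201505  # assumption: the most recent month in the file

SAMPLE = u"""name | date | count
John | 201406 | 1
John | 201410 | 2
Mary | 201409 | 180
Mary | 201410 | 154
Mary | 201411 | 157
Mary | 201412 | 153
Mary | 201501 | 223
Mary | 201502 | 166
Mary | 201503 | 163
Mary | 201504 | 169
Mary | 201505 | 157
Tara | 201505 | 2
"""

# A regex separator swallows the spaces around each '|'; it needs the python engine.
# For a real file, pass its path instead of io.StringIO(SAMPLE).
df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s*\|\s*', engine='python')

rows = []
for name, group in df.groupby('name'):
    group = group.sort_values('date')  # oldest first, newest last
    mean_last5 = float('nan')
    if not (group['date'] == CURRENT_MONTH).any():
        status = 'inactive'
    elif len(group) == 1:
        status = 'new'
    else:
        status = 'active'
        mean_last5 = group['count'].tail(5).mean()  # up to the 5 latest counts
    rows.append({'name': name, 'status': status, 'mean_last5': mean_last5})

summary = pd.DataFrame(rows, columns=['name', 'status', 'mean_last5']).set_index('name')
print(summary)
```

For the sample data, Mary comes out as active with a mean of 175.6, John as inactive and Tara as new. The explicit loop is used here for readability; the same logic could also be passed to groupby(...).apply.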