
I have a Python program that does the following:

  • reads in a .csv
  • creates a dataframe with values from specific columns of the csv
  • converts the timestamp from unix timestamp
  • groups the data by hour and then finds the average of certain columns within that hour.

code:

import pandas as pd

# read the csv and keep only the columns of interest
df = pd.read_csv(files, parse_dates=True)
df2 = df[['timestamp', 'avg_hr', 'avg_rr', 'emfit_sleep_summary_id']].copy()

# convert the unix timestamp to datetime and use it as the index
df2['timestamp'] = pd.to_datetime(df2['timestamp'].astype(int), unit='s')
df2 = df2.set_index('timestamp')

# per-hour means
df3 = df2.groupby(df2.index.map(lambda t: t.hour))['avg_hr'].mean()
df4 = df2.groupby(df2.index.map(lambda t: t.hour))['avg_rr'].mean()

print(df3)
print(df4)

sample output:

       timestamp         avg_hr  avg_rr    emfit_sleep_summary_id
0 2015-01-28 08:14:50     101     6.4                      78
1 2015-01-28 08:14:52      98     6.4                      78
2 2015-01-28 00:25:00      60     0.0                      78 
3 2015-01-28 00:25:02      63     0.0                      78
4 2015-01-28 07:24:06      79    11.6                      78
5 2015-01-28 07:24:08      79    11.6                      78
0    99.5
7    61.5
8    78.5
Name: avg_hr, dtype: float64
0     0.000
7    11.725
8     6.400
Name: avg_rr, dtype: float64

I'm now trying to combine df3 and df4 into df2 so the result will look something like this:

       timestamp         avg_hr  avg_rr    emfit_sleep_summary_id   AVG_HR    AVG_RR
0 2015-01-28 08:14:50     101     6.4                      78        99.5       6.4 
1 2015-01-28 08:14:52      98     6.4                      78        99.5       6.4
2 2015-01-28 00:25:00      60     0.0                      78        61.5       0.0
3 2015-01-28 00:25:02      63     0.0                      78        61.5       0.0
4 2015-01-28 07:24:06      79    11.6                      78        78.5       11.6
5 2015-01-28 07:24:08      79    11.6                      78        78.5       11.6

I tried doing the following

df2['AVG_HR'] = df2.groupby(df2.index.map(lambda t: t.hour))['avg_hr'].mean()

But when I ran it, the entire column came back as NaN.
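For context, the all-NaN column falls out of index alignment: `groupby(...).mean()` returns a Series indexed by the hour number, while df2 is indexed by timestamp, so nothing lines up when the Series is assigned back. A minimal reproduction with two made-up rows:

```python
import pandas as pd

# Two made-up rows in hour 8, indexed by timestamp as in the question.
idx = pd.to_datetime([1422432890, 1422432892], unit='s')
df2 = pd.DataFrame({'avg_hr': [101, 98]}, index=idx)

# The grouped mean is indexed by the *hour number*, not the timestamp...
hourly = df2.groupby(df2.index.hour)['avg_hr'].mean()
print(hourly.index.tolist())  # [8]

# ...so assigning it back aligns on nothing and every row becomes NaN.
print(df2.assign(AVG_HR=hourly)['AVG_HR'].isna().all())  # True
```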

EDIT: I'd also like to know how to reduce the data to a single row for each hour, instead of having two per hour.

       timestamp         avg_hr  avg_rr    emfit_sleep_summary_id   AVG_HR    AVG_RR
0 2015-01-28 08:14:50     101     6.4                      78        99.5       6.4 
1 2015-01-28 00:25:00      60     0.0                      78        61.5       0.0
2 2015-01-28 07:24:06      79    11.6                      78        78.5       11.6
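A minimal sketch of this reduction, using made-up rows shaped like the data above (the standard transform/first pattern, labeled as an illustration rather than the accepted approach):

```python
import pandas as pd

# Made-up rows: two per hour, indexed by timestamp.
idx = pd.to_datetime([1422432890, 1422432892, 1422404700, 1422404702], unit='s')
df2 = pd.DataFrame({'avg_hr': [101, 98, 60, 63],
                    'avg_rr': [6.4, 6.4, 0.0, 0.0]}, index=idx)

# Broadcast the per-hour means onto every row...
hours = df2.index.hour
df2['AVG_HR'] = df2.groupby(hours)['avg_hr'].transform('mean')
df2['AVG_RR'] = df2.groupby(hours)['avg_rr'].transform('mean')

# ...then keep only the first row of each hour; the AVG_* columns
# already carry the means, so nothing is lost by dropping the rest.
one_per_hour = df2.reset_index().groupby(hours).first()
print(one_per_hour)
```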
  • I think what you want is this: df2['AVG_HR'] = df2.groupby(df2.index.map(lambda t: t.hour))['avg_hr'].transform('mean') — can you confirm? Commented Apr 9, 2015 at 14:55
  • Also you don't need a lambda to groupby the hour this should work: df3 = df2.groupby(df2.index.hour)['avg_hr'].mean() Commented Apr 9, 2015 at 14:56
  • @EdChum that worked, if you could post that as the answer, I'll accept it. Also, wondering, is there any way to reduce the rows? instead of having 2 of each timestamp, can I have just one? Commented Apr 9, 2015 at 15:07
  • So you want to reduce df2 to a single row per hour? In which case are you wanting the average of the aggregated columns or the sum? df2.groupby(df2.index.hour).mean().reset_index() should squeeze the df to an hourly one, also you could resample Commented Apr 9, 2015 at 15:12
  • yes, instead of 2 timesstamps, I need one per hour. I want the average to remain as is. Please see the edit. Commented Apr 9, 2015 at 15:20

1 Answer

To add an aggregated column from a groupby, use transform; this will return a Series aligned with the original df:

df2['AVG_HR'] = df2.groupby(df2.index.map(lambda t: t.hour))['avg_hr'].transform('mean')

Also, it's unnecessary to use a lambda to group by the hour: if the index is a DatetimeIndex, the datetime attributes can be accessed directly, so the above simplifies to:

df2['AVG_HR'] = df2.groupby(df2.index.hour)['avg_hr'].transform('mean')
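A runnable sketch of the transform alignment, using made-up rows shaped like the question's data:

```python
import pandas as pd

# Made-up rows: two timestamps in hour 8, two in hour 0.
idx = pd.to_datetime([1422432890, 1422432892, 1422404700, 1422404702], unit='s')
df2 = pd.DataFrame({'avg_hr': [101, 98, 60, 63],
                    'avg_rr': [6.4, 6.4, 0.0, 0.0]}, index=idx)

# transform('mean') returns a Series aligned with df2's original index,
# so the per-hour mean is broadcast back onto every row of that hour.
df2['AVG_HR'] = df2.groupby(df2.index.hour)['avg_hr'].transform('mean')
df2['AVG_RR'] = df2.groupby(df2.index.hour)['avg_rr'].transform('mean')
print(df2)
```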

If you want to resample by hour you could just groupby the hour and then call reset_index:

In [17]:

df.groupby(df.index.hour).mean().reset_index()
Out[17]:
   index  avg_hr  avg_rr  emfit_sleep_summary_id
0      0    61.5     0.0                      78
1      7    79.0    11.6                      78
2      8    99.5     6.4                      78
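If the datetime labels should survive the aggregation (rather than being reduced to bare hour numbers as above), resampling by hour is one alternative — a sketch with made-up rows:

```python
import pandas as pd

# Made-up rows spanning two hours, indexed by timestamp.
idx = pd.to_datetime([1422432890, 1422432892, 1422404700, 1422404702], unit='s')
df2 = pd.DataFrame({'avg_hr': [101, 98, 60, 63]}, index=idx)

# resample keeps a DatetimeIndex; empty hours come back as NaN rows,
# so drop them to get one row per populated hour.
hourly = df2.resample('h').mean().dropna()
print(hourly)
```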

6 Comments

I suppose there is no way to make the datetime stay? instead of 0, 7, 8
Not using groupby; the alternative would be to call drop_duplicates on your df after you've added the average columns, but that doesn't average the other duplicate columns
I did this df2 = df2.drop_duplicates(subset='AVG_HR',take_last=True) and that worked :)
But that won't average out say 'avg_hr' though if that's what you want then that's fine, I would've posted that but thought you wanted to average all values, you can upvote too ;-)
I got the avg_hr using df2['AVG_HR'] = df2.groupby(df2.index.hour)['avg_hr'].transform('mean') this would add an extra column, I used that column to filter the hours. same with avg_rr
