I have a Python program that does the following:
- reads in a .csv
- creates a dataframe with values from specific columns of the csv
- converts the timestamp column from a unix timestamp to a datetime
- groups the data by hour and then finds the average of certain columns within each hour
code:
df = pd.read_csv(files,parse_dates=True)
df2 = df[['timestamp','avg_hr','avg_rr','emfit_sleep_summary_id']]
df2['timestamp'] = df2['timestamp'].astype(int)
df2['timestamp'] = pd.to_datetime(df2['timestamp'],unit='s')
df2 = df2.set_index('timestamp')
df3 = df2.groupby(df2.index.map(lambda t: t.hour))['avg_hr'].mean()
df4 = df2.groupby(df2.index.map(lambda t: t.hour))['avg_rr'].mean()
print df3
print df4
sample output:
             timestamp  avg_hr  avg_rr  emfit_sleep_summary_id
0  2015-01-28 08:14:50     101     6.4                      78
1  2015-01-28 08:14:52      98     6.4                      78
2  2015-01-28 00:25:00      60     0.0                      78
3  2015-01-28 00:25:02      63     0.0                      78
4  2015-01-28 07:24:06      79    11.6                      78
5  2015-01-28 07:24:08      79    11.6                      78
0 99.5
7 61.5
8 78.5
Name: avg_hr, dtype: float64
0 0.000
7 11.725
8 6.400
Name: avg_rr, dtype: float64
I'm now trying to combine df3 and df4 into df2 so the result will look something like this:
             timestamp  avg_hr  avg_rr  emfit_sleep_summary_id  AVG_HR  AVG_RR
0  2015-01-28 08:14:50     101     6.4                      78    99.5     6.4
1  2015-01-28 08:14:52      98     6.4                      78    99.5     6.4
2  2015-01-28 00:25:00      60     0.0                      78    61.5     0.0
3  2015-01-28 00:25:02      63     0.0                      78    61.5     0.0
4  2015-01-28 07:24:06      79    11.6                      78    78.5    11.6
5  2015-01-28 07:24:08      79    11.6                      78    78.5    11.6
I tried doing the following:
df2['AVG_HR'] = df2.groupby(df2.index.map(lambda t: t.hour))['avg_hr'].mean()
But when I ran it, the entire column came back as NaN.
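A likely explanation, sketched on the six sample rows shown above: the grouped mean is indexed by hour (0, 7, 8), while df2 is indexed by timestamp, so the assignment finds no matching index labels and fills the column with NaN. Using transform('mean') returns one value per original row instead. This sketch assumes a recent pandas on Python 3 and builds the datetimes directly, since the raw unix timestamps are not shown; the names hourly_mean and misaligned are only for illustration.

```python
import pandas as pd

# Six sample rows from the question; datetimes are built directly because
# the original unix timestamps are not shown (an assumption).
df2 = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2015-01-28 08:14:50", "2015-01-28 08:14:52",
                "2015-01-28 00:25:00", "2015-01-28 00:25:02",
                "2015-01-28 07:24:06", "2015-01-28 07:24:08",
            ]
        ),
        "avg_hr": [101, 98, 60, 63, 79, 79],
        "avg_rr": [6.4, 6.4, 0.0, 0.0, 11.6, 11.6],
        "emfit_sleep_summary_id": [78] * 6,
    }
).set_index("timestamp")

# The aggregated result has one entry per hour, indexed by the hour number.
hourly_mean = df2.groupby(df2.index.hour)["avg_hr"].mean()

# Aligning that hour-indexed Series to the timestamp index matches nothing,
# which is what a plain column assignment does under the hood.
misaligned = hourly_mean.reindex(df2.index)

# transform broadcasts each group's mean back onto the rows of that group,
# so the result shares df2's index and can be assigned directly.
df2["AVG_HR"] = df2.groupby(df2.index.hour)["avg_hr"].transform("mean")
df2["AVG_RR"] = df2.groupby(df2.index.hour)["avg_rr"].transform("mean")

print(misaligned.isna().all())
print(df2)
```

The means here are computed from the six sample rows only, so they need not match figures produced from the full file.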
EDIT: I'd also like to know how to reduce the result to a single row for each hour, instead of 2 per hour:
             timestamp  avg_hr  avg_rr  emfit_sleep_summary_id  AVG_HR  AVG_RR
0  2015-01-28 08:14:50     101     6.4                      78    99.5     6.4
1  2015-01-28 00:25:00      60     0.0                      78    61.5     0.0
2  2015-01-28 07:24:06      79    11.6                      78    78.5    11.6
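One possible way to get that one-row-per-hour layout, as a sketch rather than a definitive answer: group by hour and use named aggregation to keep the first row's values alongside the hourly means. It assumes pandas 0.25 or later (for named aggregation) and that "the first row of each hour" is the row to keep; the names hour and per_hour are hypothetical.

```python
import pandas as pd

# Six sample rows from the question, with timestamp kept as a normal column.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2015-01-28 08:14:50", "2015-01-28 08:14:52",
                "2015-01-28 00:25:00", "2015-01-28 00:25:02",
                "2015-01-28 07:24:06", "2015-01-28 07:24:08",
            ]
        ),
        "avg_hr": [101, 98, 60, 63, 79, 79],
        "avg_rr": [6.4, 6.4, 0.0, 0.0, 11.6, 11.6],
        "emfit_sleep_summary_id": [78] * 6,
    }
)

# Group key: hour of day, renamed so it does not clash with the column name.
hour = df["timestamp"].dt.hour.rename("hour")

# sort=False keeps the groups in order of first appearance (8, 0, 7),
# matching the row order in the desired output.
per_hour = (
    df.groupby(hour, sort=False)
    .agg(
        timestamp=("timestamp", "first"),
        avg_hr=("avg_hr", "first"),
        avg_rr=("avg_rr", "first"),
        emfit_sleep_summary_id=("emfit_sleep_summary_id", "first"),
        AVG_HR=("avg_hr", "mean"),
        AVG_RR=("avg_rr", "mean"),
    )
    .reset_index(drop=True)
)

print(per_hour)
```

If only the hourly averages matter, the "first" columns can simply be dropped from the agg call.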
Try df2['AVG_HR'] = df2.groupby(df2.index.map(lambda t: t.hour))['avg_hr'].transform('mean'). Can you confirm whether that works?
You can also group on the hour directly: df3 = df2.groupby(df2.index.hour)['avg_hr'].mean()
Do you want to reduce df2 to a single row per hour? In which case, do you want the average of the aggregated columns or the sum?
df2.groupby(df2.index.hour).mean().reset_index() should squeeze the df down to an hourly one; you could also resample.
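For the resample route mentioned in the comments, a minimal sketch on the six sample rows might look like the following. One difference worth knowing: grouping by df2.index.hour pools the same hour of day across all dates, while resample makes one bucket per calendar hour, so different days stay separate. The frequency alias "h" is assumed here; older pandas releases document it as "H". The name hourly is only for illustration.

```python
import pandas as pd

# Six sample rows from the question, indexed by timestamp.
df2 = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2015-01-28 08:14:50", "2015-01-28 08:14:52",
                "2015-01-28 00:25:00", "2015-01-28 00:25:02",
                "2015-01-28 07:24:06", "2015-01-28 07:24:08",
            ]
        ),
        "avg_hr": [101, 98, 60, 63, 79, 79],
        "avg_rr": [6.4, 6.4, 0.0, 0.0, 11.6, 11.6],
    }
).set_index("timestamp")

hourly = (
    df2.sort_index()          # put the timestamps in chronological order
    .resample("h")            # one bucket per calendar hour
    .mean()
    .dropna(how="all")        # drop the empty hours between 00:00 and 07:00
)

print(hourly)
```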