I have purchasing data and want to label each purchase with a new column that indicates the time of day it was made. For that I'm using the hour of each purchase's timestamp.
Labels should work like this:
hour 4 - 7 => 'morning'
hour 8 - 11 => 'before midday'
...
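As a minimal sketch of the labeling rule above (not the asker's code), the hour ranges can be expressed as right-closed bins for `pd.cut`. Only the two periods spelled out in the list are encoded. The remaining ranges are elided ("..."), so any other hour comes out as NaN here. The `hours` values are made-up demo data.

```python
import pandas as pd

# Made-up demo hours, including some outside the two known ranges
hours = pd.Series([3, 4, 6, 7, 8, 11, 12, 13])

# Right-closed bins: (3, 7] covers hours 4-7, (7, 11] covers hours 8-11.
# Hours outside every bin (e.g. 3, 12, 13) become NaN.
periods = pd.cut(hours, bins=[3, 7, 11], labels=['morning', 'before midday'])
print(periods.tolist())
```

Because `hour` is an integer, right-closed bins `(3, 7]` and `(7, 11]` match the inclusive ranges 4-7 and 8-11 exactly.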
I've already extracted the hour from each timestamp. Now I have a DataFrame with 50 million records that looks like this:
```
   user_id            timestamp  hour
0       11  2015-08-21 06:42:44     6
1       11  2015-08-20 13:38:58    13
2       11  2015-08-20 13:37:47    13
3       11  2015-08-21 06:59:05     6
4       11  2015-08-20 13:15:21    13
```
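The question doesn't say how the `hour` column was produced. One common vectorized way, assuming `timestamp` is (or has been converted to) a `datetime64` column, is the `.dt.hour` accessor. The sketch below rebuilds the five sample rows shown above and derives `hour` that way.

```python
import pandas as pd

# The five sample rows shown above
basket_times = pd.DataFrame({
    'user_id': [11, 11, 11, 11, 11],
    'timestamp': pd.to_datetime([
        '2015-08-21 06:42:44',
        '2015-08-20 13:38:58',
        '2015-08-20 13:37:47',
        '2015-08-21 06:59:05',
        '2015-08-20 13:15:21',
    ]),
})

# Extract the hour for the whole column at once via the .dt accessor
basket_times['hour'] = basket_times['timestamp'].dt.hour
print(basket_times)
```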
At the moment my approach is to use six `.iterrows()` loops, each with a different condition:
```python
for index, row in basket_times[(basket_times['hour'] >= 4) & (basket_times['hour'] < 8)].iterrows():
    # set the label only on the current row
    basket_times.loc[index, 'periode'] = 'morning'
```
then:
```python
for index, row in basket_times[(basket_times['hour'] >= 8) & (basket_times['hour'] < 12)].iterrows():
    # set the label only on the current row
    basket_times.loc[index, 'periode'] = 'before midday'
```
and so on.
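For comparison, here is a minimal sketch of the same two conditions applied without `iterrows()`. Each boolean mask is evaluated over the whole column at once, and `.loc` assigns the label to every selected row in a single step. Only the two periods given above are covered. Rows in the other four (unlisted) ranges are left as `None`. The small DataFrame is made-up demo data, not the real 50 million records.

```python
import pandas as pd

# Made-up demo data standing in for the real basket_times
basket_times = pd.DataFrame({'hour': [6, 13, 13, 6, 13, 9, 4, 11, 2]})

# Same conditions as the loops above, computed for all rows at once
is_morning = (basket_times['hour'] >= 4) & (basket_times['hour'] < 8)
is_before_midday = (basket_times['hour'] >= 8) & (basket_times['hour'] < 12)

# Start from an object column of None, then label each masked subset in one assignment
basket_times['periode'] = None
basket_times.loc[is_morning, 'periode'] = 'morning'
basket_times.loc[is_before_midday, 'periode'] = 'before midday'
print(basket_times)
```

The per-row Python work disappears: each condition costs one vectorized comparison and one bulk assignment, regardless of how many rows match.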
However, on 50 million records a single one of those six loops already takes about an hour. Is there a better way to do this?