I have a DataFrame df with 100,000 rows and a DatetimeIndex. Taking January as an example, I would like to create a new column named 'Experiment' that identifies when each experiment starts and ends. There are 10 experiments in total.
df=
                  Place
Time
2021-01-01 00:00  home
2021-01-01 00:01  home
2021-01-01 00:02  home
2021-01-01 00:03  home
................  ....
................  ....
2021-01-31 23:57  home
2021-01-31 23:58  home
2021-01-31 23:59  home
For example, experiment A runs from 2021-01-01 00:00 to 2021-01-01 00:02, and experiment J runs from 2021-01-31 23:57 to 2021-01-31 23:59. The expected result looks like this.
df=
                  Place  Experiment
Time
2021-01-01 00:00  home   A
2021-01-01 00:01  home   A
2021-01-01 00:02  home   A
2021-01-01 00:03  home
................  ....
................  ....
2021-01-31 23:57  home   J
2021-01-31 23:58  home   J
2021-01-31 23:59  home   J
My approach is like this.
df["Experiment"] = ""
df["Experiment"] = np.where(df.between_time('2021-01-01 00:00','2021-01-01 00:02'),'A',np.nan)
df["Experiment"] = np.where(df.between_time('2021-01-31 23:57','2021-01-31 23:59'),'J',np.nan)
I have just realised that between_time does not work when the arguments include a date. I am also getting the error "Length of values does not match length of index".
Thank you!
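One way this might be approached, as a minimal sketch: between_time selects rows by time of day only, and it returns the matching rows rather than a full-length boolean mask, which would explain both problems described above. Comparing the index against full timestamps gives a mask with the same length as the frame. The sample frame and the windows dictionary below are assumptions made for illustration, and only experiments A and J are filled in because the other eight windows are not given in the question.

```python
import pandas as pd

# Hypothetical sample data: one row per minute for January 2021.
idx = pd.date_range("2021-01-01 00:00", "2021-01-31 23:59", freq="min", name="Time")
df = pd.DataFrame({"Place": "home"}, index=idx)

# Hypothetical lookup: experiment label -> (start, end), both ends inclusive.
# Only A and J come from the question; the remaining windows are unknown.
windows = {
    "A": ("2021-01-01 00:00", "2021-01-01 00:02"),
    "J": ("2021-01-31 23:57", "2021-01-31 23:59"),
}

# Start with an empty label everywhere, then fill one window at a time
# so that a later experiment does not overwrite an earlier one.
df["Experiment"] = ""
for label, (start, end) in windows.items():
    # Full-length boolean mask built from the DatetimeIndex itself.
    mask = (df.index >= pd.Timestamp(start)) & (df.index <= pd.Timestamp(end))
    df.loc[mask, "Experiment"] = label

print(df.head(4))
print(df.tail(3))
```

Rows outside every window keep the empty string, and further experiments could be added as extra entries in windows.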