How to select a datetime index range and use if conditions in Pandas

Question

I have a DataFrame df with 100,000 rows and a DatetimeIndex. Take January as an example. I would like to create a new column named 'Experiment' that identifies when each experiment starts and ends. There are 10 experiments in total.

 df=
                            Place      
        Time               
        2021-01-01 00:00    home         
        2021-01-01 00:01    home       
        2021-01-01 00:02    home        
        2021-01-01 00:03    home     
        ................    ....  
        ................    ....
        2021-01-31 23:57    home
        2021-01-31 23:58    home
        2021-01-31 23:59    home

For example, experiment A runs from 2021-01-01 00:00 to 2021-01-01 00:02, and experiment J runs from 2021-01-31 23:57 to 2021-01-31 23:59. The expected result looks like this:

df=
                            Place  Experiment
        Time               
        2021-01-01 00:00    home      A   
        2021-01-01 00:01    home      A 
        2021-01-01 00:02    home      A  
        2021-01-01 00:03    home     
        ................    ....  
        ................    ....
        2021-01-31 23:57    home      J
        2021-01-31 23:58    home      J
        2021-01-31 23:59    home      J

Here is my approach:

df["experiment"] = ""
df["experiment"] = np.where(df.between_time('2021-01-01 00:00','2021-01-01 00:02'),'A',np.nan)
df["experiment"] = np.where(df.between_time('2021-01-31 23:57','2021-01-31 23:59'),'J',np.nan)

I just realized that between_time does not work when the range includes a date. I also get the error "Length of values does not match length of index".
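A minimal sketch of why this attempt fails, assuming a hypothetical frame with one row per minute for January 2021. `between_time` only accepts times of day (such as '00:00'), applies them to every date in the index, and returns a filtered DataFrame rather than a boolean mask, so its length no longer matches `df`. Comparing the index against `pd.Timestamp` bounds gives a mask with exactly `len(df)` entries. The sketch uses `None` as the fallback because `np.where(mask, 'A', np.nan)` coerces `np.nan` to the string 'nan'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per minute for January 2021
idx = pd.date_range("2021-01-01 00:00", "2021-01-31 23:59", freq="min", name="Time")
df = pd.DataFrame({"Place": "home"}, index=idx)

# between_time filters by time of day on every date and returns a smaller frame
subset = df.between_time("00:00", "00:02")
print(len(df), len(subset))  # 44640 93

# A mask built from the index itself always has one entry per row of df
mask_a = (df.index >= pd.Timestamp("2021-01-01 00:00")) & (
    df.index <= pd.Timestamp("2021-01-01 00:02")
)
# None (not np.nan) keeps unmatched rows missing instead of the string 'nan'
df["experiment"] = np.where(mask_a, "A", None)
```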

Thank you!

Quang Hoang · Accepted Answer · 2021-02-11 03:45:41Z


Using np.where as you do right now overwrites what you already created: each call reassigns the whole column, so the second call replaces the 'A' labels with NaN.

For multiple conditions, use .loc to update:

# the experiment time
list_starts = ['2021-01-01 00:00','2021-01-31 23:57']
list_ends = ['2021-01-01 00:02', '2021-01-31 23:59']
list_names = ['A','J']

for start_time, end_time, name in zip(list_starts, list_ends, list_names):
    df.loc[start_time:end_time, 'experiment'] = name
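If you would rather stay with the NumPy approach from the question, `np.select` is one alternative (swapped in here, not what this answer uses) that avoids the overwrite problem by evaluating all windows in a single call. A sketch, assuming the same hypothetical minute-level January data; the `exp_times` layout mirrors the dictionary shown further down.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per minute for January 2021
idx = pd.date_range("2021-01-01 00:00", "2021-01-31 23:59", freq="min", name="Time")
df = pd.DataFrame({"Place": "home"}, index=idx)

# name: (start, end), both ends inclusive
exp_times = {
    "A": ("2021-01-01 00:00", "2021-01-01 00:02"),
    "J": ("2021-01-31 23:57", "2021-01-31 23:59"),
}

# One full-length boolean mask per experiment window
conditions = [
    (df.index >= pd.Timestamp(start)) & (df.index <= pd.Timestamp(end))
    for start, end in exp_times.values()
]
# np.select takes the label of the first matching condition for each row;
# rows that match no window get the default (None)
df["experiment"] = np.select(conditions, list(exp_times.keys()), default=None)
```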

Another (better) way to organize your experiment time can be:

# name: (start, end)
exp_times = {
    'A': ('2021-01-01 00:00', '2021-01-01 00:02'),
    'J': ('2021-01-31 23:57', '2021-01-31 23:59')
}

for name, (start_time, end_time) in exp_times.items():
    df.loc[start_time:end_time, 'experiment'] = name

Output:

                    Place experiment
Time                                
2021-01-01 00:00:00  home          A
2021-01-01 00:01:00  home          A
2021-01-01 00:02:00  home          A
2021-01-01 00:03:00  home        NaN
2021-01-31 23:57:00  home          J
2021-01-31 23:58:00  home          J
2021-01-31 23:59:00  home          J

Note: As you may have noticed, you can use strings to slice or index a DataFrame with a DatetimeIndex, and such label slices include both endpoints.
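A short sketch of that string indexing on a hypothetical minute-level January frame; the row counts below assume exactly one row per minute with no gaps.

```python
import pandas as pd

# Hypothetical sample data: one row per minute for January 2021
idx = pd.date_range("2021-01-01 00:00", "2021-01-31 23:59", freq="min", name="Time")
df = pd.DataFrame({"Place": "home"}, index=idx)

# A .loc slice between two timestamp strings includes both endpoints
window = df.loc["2021-01-31 23:57":"2021-01-31 23:59"]
print(len(window))  # 3

# A partial string (here a date only) selects every row within that day
one_day = df.loc["2021-01-31"]
print(len(one_day))  # 1440
```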

edited Feb 11, 2021 at 3:45

answered Feb 11, 2021 at 3:26

Quang Hoang


2 Comments

soso Over a year ago

Thank you! The first way does not work for me, but the second way does. By the way, the first way in your answer was missing the 's' in list_ends and list_names.

Quang Hoang Over a year ago

@ahsojai thanks, I updated the answer. I think the two solutions differ only in how you organize the data. But I'm glad that at least one of them works for you.
