Split DataFrame rows by DateTime in pandas

Question

I have a DataFrame containing events like this:

location  start_time   end_time     some_value1   some_value2
LECP      00:00        01:30        25            nice info
LECP      02:00        04:00        10            other info
LECS      02:00        03:00         5            lorem
LIPM      02:55        03:15         9            ipsum

and I want to split the rows so that no interval is longer than 1 hour, e.g. if an event has a duration of 01:30, I want to get a row of length 01:00 and another of 00:30. If an event has a length of 02:30, I want to get three rows. And if an event has a duration of an hour or less, it should simply stay as one row. Like so:

location  start_time   end_time   some_value1   some_value2
LECP      00:00        01:00      25            nice info
LECP      01:00        01:30      25            nice info

LECP      02:00        03:00      10            other info
LECP      03:00        04:00      10            other info

LECS      02:00        03:00       5            lorem
LIPM      02:55        03:15       9            ipsum

It does not matter if the remainder is at the beginning or the end. It would not even matter if the duration were distributed equally across the rows, as long as no row has a duration of more than 1 hour.

What I tried:
- reading through Time Series / Date functionality and not understanding anything
- searching Stack Overflow

This is because these are independent events. Several events may occur at the same or different places, at the same or different times.
– Ulu83, Commented Oct 11, 2017 at 20:54
Uh... I am sorry. My question is about your expected results: should the second record start with 01:00 instead of 00:00?
– Scott Boston, Commented Oct 11, 2017 at 20:57

Ulu83 · Accepted Answer · 2017-10-15 15:17:22Z

I adapted this answer to implement hourly and not daily splits. This code works in a while loop, so it will re-iterate as long as there are rows with durations still > 1 hour.
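Before the loop itself, note that it expects a frame named dfob with datetime columns 'start' and 'end' and a timedelta column 'duration', whereas the table in the question uses start_time and end_time holding plain times. One way to prepare the sample data could look like the sketch below. The date prefix and the renaming step are my assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Sample rows copied from the question.
raw = pd.DataFrame({
    "location": ["LECP", "LECP", "LECS", "LIPM"],
    "start_time": ["00:00", "02:00", "02:00", "02:55"],
    "end_time": ["01:30", "04:00", "03:00", "03:15"],
    "some_value1": [25, 10, 5, 9],
    "some_value2": ["nice info", "other info", "lorem", "ipsum"],
})

# Hypothetical date prefix: the question only gives times of day,
# so an arbitrary date is attached to get real datetimes.
day = "2017-10-11 "

# Rename to the column names the loop uses.
dfob = raw.rename(columns={"start_time": "start", "end_time": "end"})
dfob["start"] = pd.to_datetime(day + dfob["start"], format="%Y-%m-%d %H:%M")
dfob["end"] = pd.to_datetime(day + dfob["end"], format="%Y-%m-%d %H:%M")

# The loop compares this column against a one-hour Timedelta.
dfob["duration"] = dfob["end"] - dfob["start"]
```

With dfob prepared like this, two of the four sample rows are longer than one hour and will be split by the loop.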

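import pandas as pd

# dfob needs datetime columns 'start' and 'end' and a timedelta column 'duration'
mytimedelta = pd.Timedelta('1 hour')

#create boolean mask of the rows that are longer than 1 hour
split_rows = (dfob['duration'] > mytimedelta)

while split_rows.any():
    #get new rows to append and adjust start time to 1 hour later.
    new_rows = dfob[split_rows].copy()
    new_rows['start'] = new_rows['start'] + mytimedelta

    #update the end time of old rows; they end one second before the new
    #rows start, so consecutive pieces do not share a boundary
    dfob.loc[split_rows, 'end'] = dfob.loc[split_rows, 'start'] + \
        pd.DateOffset(hours=1, seconds=-1)
    #DataFrame.append was removed in pandas 2.0, pd.concat replaces it
    dfob = pd.concat([dfob, new_rows])

    #update the duration of all rows
    dfob['duration'] = dfob['end'] - dfob['start']

    #create an updated boolean mask
    split_rows = (dfob['duration'] > mytimedelta)

#when job is done: a stable sort on the (duplicated) original index keeps
#the pieces of each event together and in order; assign the result back
dfob = dfob.sort_index(kind='mergesort').reset_index(drop=True)
dfob['duration'] = dfob['end'] - dfob['start']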
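For comparison, the same split can probably be done without a loop by repeating each row once per piece and shifting the start times. The following is only a sketch of that alternative, not the method of the answer above: the function name split_hourly and its parameters are hypothetical, it assumes datetime columns named 'start' and 'end' with no missing values, and unlike the loop it lets each piece end exactly where the next one starts (as in the expected output of the question).

```python
import numpy as np
import pandas as pd


def split_hourly(df, start="start", end="end", step=pd.Timedelta("1 hour")):
    """Split each row into consecutive pieces no longer than `step`.

    Hypothetical helper; assumes `start` and `end` are datetime columns.
    """
    df = df.reset_index(drop=True)
    duration = df[end] - df[start]

    # Pieces per row: ceil(duration / step), but never fewer than one.
    n_pieces = np.ceil(duration / step).astype(int).clip(lower=1)

    # Repeat every row once per piece (index labels repeat as well).
    out = df.loc[df.index.repeat(n_pieces.to_numpy())].copy()

    # 0, 1, 2, ... within each original row.
    piece = out.groupby(level=0).cumcount()

    # Shift each piece's start, then cap its end at the original end.
    out[start] = out[start] + piece * step
    new_end = out[start] + step
    out[end] = new_end.where(new_end < out[end], out[end])

    return out.reset_index(drop=True)


# Usage with the sample events from the question (the date is arbitrary).
events = pd.DataFrame({
    "location": ["LECP", "LECP", "LECS", "LIPM"],
    "start": pd.to_datetime(["2017-10-11 00:00", "2017-10-11 02:00",
                             "2017-10-11 02:00", "2017-10-11 02:55"]),
    "end": pd.to_datetime(["2017-10-11 01:30", "2017-10-11 04:00",
                           "2017-10-11 03:00", "2017-10-11 03:15"]),
    "some_value1": [25, 10, 5, 9],
})
result = split_hourly(events)
result["duration"] = result["end"] - result["start"]
```

On the sample data this yields six rows, with the remainder of each event in its last piece. Whether it is preferable to the loop depends on taste; the loop is easier to follow step by step, while this version avoids growing the frame repeatedly.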
