I have a DataFrame containing events like this:

location  start_time   end_time     some_value1   some_value2
LECP      00:00        01:30        25            nice info
LECP      02:00        04:00        10            other info
LECS      02:00        03:00         5            lorem
LIPM      02:55        03:15         9            ipsum

and I want to split the rows so that no resulting interval is longer than 1 hour. For example, if an event has a duration of 01:30, I want a row of length 01:00 and another of 00:30. If an event has a duration of 02:30, I want three rows. If an event lasts an hour or less, it should remain a single row. Like so:

location  start_time   end_time   some_value1   some_value2
LECP      00:00        01:00      25            nice info
LECP      01:00        01:30      25            nice info

LECP      02:00        03:00      10            other info
LECP      03:00        04:00      10            other info

LECS      02:00        03:00       5            lorem
LIPM      02:55        03:15       9            ipsum

It does not matter whether the remainder is at the beginning or the end. It would not even matter if the duration were distributed equally across the rows, as long as no row has a duration of > 1 hour.

What I tried:
- reading through Time Series / Date functionality and not understanding anything
- searching StackOverflow

  • This is because these are independent events. Several events may occur at the same or different places, at the same or different times. Commented Oct 11, 2017 at 20:54
  • Uh... I am sorry. My question is: in your expected results, should the second record start with 01:00 instead of 00:00? Commented Oct 11, 2017 at 20:57
  • My bad. Yes, your interpretation is right. Edited the OP. Commented Oct 11, 2017 at 20:59

1 Answer

I adapted this answer to implement hourly rather than daily splits. The code runs in a while loop, so it keeps iterating as long as any row still has a duration > 1 hour.

import pandas as pd

mytimedelta = pd.Timedelta('1 hour')

# assumes dfob has datetime columns 'start' and 'end'
dfob['duration'] = dfob['end'] - dfob['start']

# create boolean mask of rows longer than 1 hour
split_rows = (dfob['duration'] > mytimedelta)

while split_rows.any():
    # get new rows to append and adjust start time to 1 hour later
    new_rows = dfob[split_rows].copy()
    new_rows['start'] = new_rows['start'] + mytimedelta

    # cut the old rows down to exactly 1 hour, so the pieces stay contiguous
    dfob.loc[split_rows, 'end'] = dfob.loc[split_rows, 'start'] + mytimedelta
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    dfob = pd.concat([dfob, new_rows])

    # update the duration of all rows
    dfob['duration'] = dfob['end'] - dfob['start']

    # create an updated boolean mask
    split_rows = (dfob['duration'] > mytimedelta)

# when job is done: a stable sort keeps the pieces of each event in order
dfob = dfob.sort_index(kind='mergesort').reset_index(drop=True)
dfob['duration'] = dfob['end'] - dfob['start']
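
For reference, here is a minimal, self-contained sketch of the same loop applied to the sample data from the question. It is only an illustration under a few assumptions: the helper name split_events is hypothetical, the column names start_time and end_time are taken from the question rather than from the code above, and the date 2017-10-11 is arbitrary (it is only there so the HH:MM strings can be parsed into timestamps).

```python
import pandas as pd

# Sample events from the question. The date is an arbitrary assumption,
# added only so that the HH:MM strings parse into full timestamps.
events = pd.DataFrame({
    "location": ["LECP", "LECP", "LECS", "LIPM"],
    "start_time": ["00:00", "02:00", "02:00", "02:55"],
    "end_time": ["01:30", "04:00", "03:00", "03:15"],
    "some_value1": [25, 10, 5, 9],
    "some_value2": ["nice info", "other info", "lorem", "ipsum"],
})
for col in ["start_time", "end_time"]:
    events[col] = pd.to_datetime("2017-10-11 " + events[col])


def split_events(df, max_len=pd.Timedelta("1 hour")):
    """Hypothetical helper: split rows so that none is longer than max_len."""
    df = df.copy()
    # Plain NumPy boolean mask, so selection is purely positional.
    too_long = ((df["end_time"] - df["start_time"]) > max_len).to_numpy()
    while too_long.any():
        # The tail of each long event becomes a new row starting max_len later.
        tails = df[too_long].copy()
        tails["start_time"] = tails["start_time"] + max_len
        # Cut the original rows down to exactly max_len. Values are assigned
        # positionally because the index holds duplicates after the first pass.
        new_end = (df.loc[too_long, "start_time"] + max_len).to_numpy()
        df.loc[too_long, "end_time"] = new_end
        df = pd.concat([df, tails])
        too_long = ((df["end_time"] - df["start_time"]) > max_len).to_numpy()
    # A stable sort keeps the pieces of one event in chronological order.
    return df.sort_index(kind="mergesort").reset_index(drop=True)


result = split_events(events)
print(result)
```

With the four sample events this gives six rows: the 01:30 event and the 02:00 event are each split in two, and the two short events are left untouched. The remainder always ends up in the last piece of an event.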