
I am trying to duplicate the rows of my pandas DataFrame and add an extra column containing a minute-by-minute time sequence from column FROM to column TO.

For example, I have this data frame.

ID  FROM    TO
A   15:30   15:33
B   16:40   16:44
C   15:20   15:22

The output I want is:

ID  FROM    TO  time
A   15:30   15:33   15:30
A   15:30   15:33   15:31
A   15:30   15:33   15:32
A   15:30   15:33   15:33
B   16:40   16:44   16:40
B   16:40   16:44   16:41
B   16:40   16:44   16:42
B   16:40   16:44   16:43
B   16:40   16:44   16:44
C   15:20   15:22   15:20
C   15:20   15:22   15:21
C   15:20   15:22   15:22

In R with data.table, I could do this: new_df = setDT(df)[, .(ID, FROM, TO, time=seq(FROM,TO,by="mins")), by=1:nrow(df)], but I am having trouble finding the pandas equivalent in Python.

Thank you in advance!

2 Answers


Two steps to solve your problem:

pd.date_range with apply and strftime

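import pandas as pd

df = pd.DataFrame({'ID': ['A', 'B', 'C'],
                   'FROM': ['15:30', '16:40', '15:20'],
                   'TO': ['15:33', '16:44', '15:22']})

# For each row, build the minute range FROM..TO (both ends included)
# and format every timestamp back to an 'HH:MM' string
df['duration'] = df.apply(
    lambda row: [
        i.strftime('%H:%M')
        for i in pd.date_range(row['FROM'], row['TO'], freq='60s')
    ],
    axis=1)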

  ID   FROM     TO                             duration
0  A  15:30  15:33         [15:30, 15:31, 15:32, 15:33]
1  B  16:40  16:44  [16:40, 16:41, 16:42, 16:43, 16:44]
2  C  15:20  15:22                [15:20, 15:21, 15:22]
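The list lengths above (4, 5 and 3 entries) follow from pd.date_range including both endpoints by default. A minimal check, assuming the times are 'HH:MM' strings as in the question (pandas attaches today's date when it parses them, and strftime discards it again):

```python
import pandas as pd

# Both endpoints are included, so 15:30..15:33 gives four minute stamps
rng = pd.date_range('15:30', '15:33', freq='60s')
labels = [t.strftime('%H:%M') for t in rng]
print(len(rng))
print(labels)
```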

apply with stack

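# Spread each list into columns, stack them back into rows,
# then drop the extra index level that stack() adds
result = (df.set_index(['ID', 'FROM', 'TO'])
            .duration.apply(pd.Series)
            .stack()
            .reset_index(level=3, drop=True)
            .reset_index()
            .set_index('ID'))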

# Result

     FROM     TO      0
ID
A   15:30  15:33  15:30
A   15:30  15:33  15:31
A   15:30  15:33  15:32
A   15:30  15:33  15:33
B   16:40  16:44  16:40
B   16:40  16:44  16:41
B   16:40  16:44  16:42
B   16:40  16:44  16:43
B   16:40  16:44  16:44
C   15:20  15:22  15:20
C   15:20  15:22  15:21
C   15:20  15:22  15:22
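On pandas 0.25 or later, DataFrame.explode performs the list-to-rows step in a single call, avoiding the intermediate wide frame of NaN-padded columns that apply(pd.Series) creates. A sketch of that variant (not part of the original answer), naming the new column time as the question asked:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A', 'B', 'C'],
                   'FROM': ['15:30', '16:40', '15:20'],
                   'TO': ['15:33', '16:44', '15:22']})

# Step 1 as in the answer: one list of 'HH:MM' strings per row
df['time'] = df.apply(
    lambda row: [t.strftime('%H:%M')
                 for t in pd.date_range(row['FROM'], row['TO'], freq='60s')],
    axis=1)

# Step 2 alternative: explode gives each list element its own row,
# repeating ID/FROM/TO; reset_index replaces the repeated labels with 0..n-1
result = df.explode('time').reset_index(drop=True)
print(result)
```

The columns come out in the order ID, FROM, TO, time, matching the layout in the question.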

1 Comment

Good one, sir; date_range is the way.

Here's an approach similar to @chrisz's, using concat and iterrows along with date_range, but done in a single step:

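import pandas as pd

# One small DataFrame per row: the scalar ID/FROM/TO values are broadcast
# against the TIME series. .time gives datetime.time objects, which become
# 'HH:MM:SS' strings and are sliced to 'HH:MM'. All pieces are concatenated.
df = pd.concat([pd.DataFrame({
                'ID': row.ID,
                'FROM': row.FROM,
                'TO': row.TO,
                'TIME': pd.Series(pd.date_range(row.FROM, row.TO, freq='60s').time).astype(str).str[:5]
                }) for _, row in df.iterrows()])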

      TIME   FROM ID     TO
0    15:30  15:30  A  15:33
1    15:31  15:30  A  15:33
2    15:32  15:30  A  15:33
3    15:33  15:30  A  15:33
0    16:40  16:40  B  16:44
1    16:41  16:40  B  16:44
2    16:42  16:40  B  16:44
3    16:43  16:40  B  16:44
4    16:44  16:40  B  16:44
0    15:20  15:20  C  15:22
1    15:21  15:20  C  15:22
2    15:22  15:20  C  15:22
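Both answers loop over rows in Python (apply or iterrows). As an alternative that neither answer uses, the expansion can be done with vectorized operations: repeat each row once per minute with Index.repeat, then add a per-copy minute offset taken from groupby().cumcount(). A sketch, assuming FROM and TO are 'HH:MM' strings on the same day with TO not earlier than FROM (a negative count would make repeat fail):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A', 'B', 'C'],
                   'FROM': ['15:30', '16:40', '15:20'],
                   'TO': ['15:33', '16:44', '15:22']})

# Parse 'HH:MM' (pandas fills in 1900-01-01 as the date; only the time is kept)
start = pd.to_datetime(df['FROM'], format='%H:%M')
end = pd.to_datetime(df['TO'], format='%H:%M')

# Number of minute stamps per row, both endpoints included
n = ((end - start) // pd.Timedelta(minutes=1) + 1).astype(int)

# Repeat each row n times; the index labels still point at the source row
out = df.loc[df.index.repeat(n.to_numpy())]

# 0, 1, 2, ... within each source row, used as a minute offset
offset = out.groupby(level=0).cumcount().to_numpy()
base = pd.to_datetime(start.loc[out.index].to_numpy())
times = base + pd.to_timedelta(offset, unit='min')

out = out.reset_index(drop=True)
out['time'] = times.strftime('%H:%M')
print(out)
```

This produces one row per minute with a clean 0..n-1 index, and it avoids building a separate DataFrame for every input row.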
