How to add rows to a pandas dataframe based on dates?

Question

I have a dataframe like this:

import pandas as pd

data = pd.DataFrame({'ID': [1,2,3], 'Dep':[4,5,6], 'Start Date':['2020-01-01', '2020-01-01', '2020-01-01'], 'End Date':['2020-01-03', '2020-01-01', '2020-01-04']})

   ID  Dep  Start Date  End Date
0   1   4   2020-01-01  2020-01-03
1   2   5   2020-01-01  2020-01-01
2   3   6   2020-01-01  2020-01-04

I would like to expand each row into one row per day between Start Date and End Date (inclusive), with a new column holding that day's date. Something like this:

    ID  Dep Start Date  End Date    New Date  
0   1   4   2020-01-01  2020-01-03  2020-01-01   
1   1   4   2020-01-01  2020-01-03  2020-01-02 
2   1   4   2020-01-01  2020-01-03  2020-01-03   
3   2   5   2020-01-01  2020-01-01  2020-01-01    
4   3   6   2020-01-01  2020-01-04  2020-01-01    
5   3   6   2020-01-01  2020-01-04  2020-01-02       
6   3   6   2020-01-01  2020-01-04  2020-01-03
7   3   6   2020-01-01  2020-01-04  2020-01-04

Thank you.

can't understand your logic?

– deadshot, Commented Mar 25, 2021 at 4:56

Mayank Porwal · Accepted Answer · 2021-03-25 05:16:05Z

3

Use pd.date_range with df.explode:

In [392]: data['New date'] = data.apply(lambda x: pd.date_range(x['Start Date'], x['End Date']), axis=1)

In [395]: data = data.explode('New date')

In [396]: data
Out[396]: 
   ID  Dep  Start Date    End Date   New date
0   1    4  2020-01-01  2020-01-03 2020-01-01
0   1    4  2020-01-01  2020-01-03 2020-01-02
0   1    4  2020-01-01  2020-01-03 2020-01-03
1   2    5  2020-01-01  2020-01-01 2020-01-01
2   3    6  2020-01-01  2020-01-04 2020-01-01
2   3    6  2020-01-01  2020-01-04 2020-01-02
2   3    6  2020-01-01  2020-01-04 2020-01-03
2   3    6  2020-01-01  2020-01-04 2020-01-04
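
The output above keeps the original row labels (0, 0, 0, 1, 2, ...), while the question shows a fresh 0..7 index. A minimal, self-contained sketch of the same date_range + explode idea that also resets the index could look like this. The names `expanded` and the `New Date` spelling are choices made for this example, not part of the answer above:

```python
import pandas as pd

data = pd.DataFrame({
    'ID': [1, 2, 3],
    'Dep': [4, 5, 6],
    'Start Date': ['2020-01-01', '2020-01-01', '2020-01-01'],
    'End Date': ['2020-01-03', '2020-01-01', '2020-01-04'],
})

# One DatetimeIndex per row, covering Start Date..End Date inclusive.
data['New Date'] = data.apply(
    lambda row: pd.date_range(row['Start Date'], row['End Date']), axis=1
)

# explode() turns each element of the per-row range into its own row;
# reset_index(drop=True) replaces the repeated labels with 0..n-1.
expanded = data.explode('New Date').reset_index(drop=True)

# Make sure the new column has a datetime dtype rather than object.
expanded['New Date'] = pd.to_datetime(expanded['New Date'])

print(expanded)
```

Dropping the old index is only appropriate if the original row labels are not needed afterwards.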


8 Comments

jezrael Over a year ago

I think explode is bottleneck here.

Mayank Porwal Over a year ago

Oh. What would be better?

Mayank Porwal Over a year ago

Can you also please add timings for lesser data?

Mayank Porwal Over a year ago

Maybe 1000 rows.

jezrael Over a year ago

190 ms ± 874 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

jezrael · Accepted Answer · 2021-03-25 05:42:01Z

If performance is important, you can use this faster solution:

#convert columns to datetimes
data["Start Date"] = pd.to_datetime(data["Start Date"])
data["End Date"] = pd.to_datetime(data["End Date"])

#subtract values and convert to days
s = data["End Date"].sub(data["Start Date"]).dt.days + 1

#repeat index
df = data.loc[data.index.repeat(s)].copy()

#add days by timedeltas
add = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='d')
df['New Date'] = df["Start Date"].add(add)

print(df)
   ID  Dep Start Date   End Date   New Date
0   1    4 2020-01-01 2020-01-03 2020-01-01
0   1    4 2020-01-01 2020-01-03 2020-01-02
0   1    4 2020-01-01 2020-01-03 2020-01-03
1   2    5 2020-01-01 2020-01-01 2020-01-01
2   3    6 2020-01-01 2020-01-04 2020-01-01
2   3    6 2020-01-01 2020-01-04 2020-01-02
2   3    6 2020-01-01 2020-01-04 2020-01-03
2   3    6 2020-01-01 2020-01-04 2020-01-04
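
For reuse, the steps above can be wrapped in a small helper. This is only a sketch: the function name `expand_by_day` and its parameters are hypothetical, and it assumes the input has a unique index and that every End Date is on or after its Start Date (`Index.repeat` rejects negative counts).

```python
import pandas as pd

def expand_by_day(frame, start="Start Date", end="End Date", new="New Date"):
    """Return one row per calendar day between `start` and `end`, inclusive."""
    out = frame.copy()
    out[start] = pd.to_datetime(out[start])
    out[end] = pd.to_datetime(out[end])

    # Number of days each row spans, counting both endpoints.
    n_days = out[end].sub(out[start]).dt.days + 1

    # Repeat each row label n_days times (assumes a unique index).
    out = out.loc[out.index.repeat(n_days)].copy()

    # cumcount() numbers the repeats 0, 1, 2, ... within each original row,
    # which is exactly the day offset to add to the start date.
    offsets = pd.to_timedelta(out.groupby(level=0).cumcount(), unit="D")
    out[new] = out[start] + offsets

    # Drop the repeated labels to get a clean 0..n-1 index.
    return out.reset_index(drop=True)

data = pd.DataFrame({
    "ID": [1, 2, 3],
    "Dep": [4, 5, 6],
    "Start Date": ["2020-01-01", "2020-01-01", "2020-01-01"],
    "End Date": ["2020-01-03", "2020-01-01", "2020-01-04"],
})
result = expand_by_day(data)
print(result)
```

Working on a copy leaves the caller's frame untouched, at the cost of one extra allocation.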

Timings for 3k rows:

data = pd.concat([data] * 1000, ignore_index=True)

In [12]: %%timeit
    ...: data["Start Date"] = pd.to_datetime(data["Start Date"])
    ...: data["End Date"] = pd.to_datetime(data["End Date"])
    ...: 
    ...: s = data["End Date"].sub(data["Start Date"]).dt.days + 1
    ...: 
    ...: df = data.loc[data.index.repeat(s)].copy()
    ...: 
    ...: df['New Date'] = df["Start Date"].add(pd.to_timedelta(df.groupby(level=0).cumcount(), unit='d'))
    ...: 
10.4 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


# Mayank Porwal's answer is about 56 times slower on this sample data
In [13]: %%timeit
    ...: data['New date'] = data.apply(lambda x: pd.date_range(x['Start Date'], x['End Date']), 1)
    ...: 
    ...: data.explode('New date')
    ...: 
    ...: 
590 ms ± 67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
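
Note that the %%timeit cells above modify `data` in place, so later loops do not start from the same input as the first one. Outside IPython, a comparison that gives both approaches identical, already-converted input might be sketched with the standard `timeit` module as below. The function names, the 300-row size, and `number=3` are arbitrary choices for this example, and the absolute numbers will differ by machine and pandas version:

```python
import timeit
import pandas as pd

base = pd.DataFrame({
    "ID": [1, 2, 3],
    "Dep": [4, 5, 6],
    "Start Date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
    "End Date": pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-04"]),
})
big = pd.concat([base] * 100, ignore_index=True)  # 300 rows, arbitrary size

def with_explode(frame):
    # date_range per row, then explode; works on a copy of the input.
    out = frame.copy()
    out["New Date"] = out.apply(
        lambda row: pd.date_range(row["Start Date"], row["End Date"]), axis=1
    )
    return out.explode("New Date")

def with_repeat(frame):
    # Repeat rows by day count, then add a per-row running day offset.
    n_days = frame["End Date"].sub(frame["Start Date"]).dt.days + 1
    out = frame.loc[frame.index.repeat(n_days)].copy()
    offsets = pd.to_timedelta(out.groupby(level=0).cumcount(), unit="D")
    out["New Date"] = out["Start Date"] + offsets
    return out

t_explode = timeit.timeit(lambda: with_explode(big), number=3)
t_repeat = timeit.timeit(lambda: with_repeat(big), number=3)
print(f"explode: {t_explode / 3:.4f} s per run")
print(f"repeat:  {t_repeat / 3:.4f} s per run")
```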
