Creating multiple DataFrame rows based on two date columns

Question

I have a DataFrame of power outages with several columns, including a start date column and an end date column.

What I would like to be able to do:

Scan the "start date" column for the earliest date.
Scan the "end date" column for the latest date.
Build a date index with all dates between those two dates.
For each row, create one row per date from its start date to its end date, which removes the need for both date columns.

So if my df looked as follows:

start date    mw outage    end date     location
01/01/2000    1000         01/04/2000   merica
01/01/2000    2000         01/03/2000   canadia

I'd want it to look like this instead:

date        mw outage       location
01/01/2000  1000            merica
01/01/2000  2000            canadia
01/02/2000  1000            merica
01/02/2000  2000            canadia
01/03/2000  1000            merica
01/03/2000  2000            canadia
01/04/2000  1000            merica

I think I can use reindex to add the missing dates, but I'm not sure how to identify the earliest and latest dates, and I don't know how to create the rows in this manner.

BENY · Accepted Answer · 2020-08-13 15:00:42Z


We need to create a column holding the date range for each row, then explode it so that each date gets its own row:

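import pandas as pd
# Parse both date columns, build one date range per row, then explode it into rows
df['startdate'] = pd.to_datetime(df['startdate'])
df['enddate'] = pd.to_datetime(df['enddate'])
df['date'] = [pd.date_range(x, y) for x, y in zip(df['startdate'], df['enddate'])]
df = df.explode('date')
df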
Out[169]: 
   startdate  mwoutage    enddate location       date
0 2000-01-01      1000 2000-01-04   merica 2000-01-01
0 2000-01-01      1000 2000-01-04   merica 2000-01-02
0 2000-01-01      1000 2000-01-04   merica 2000-01-03
0 2000-01-01      1000 2000-01-04   merica 2000-01-04
1 2000-01-01      2000 2000-01-03  canadia 2000-01-01
1 2000-01-01      2000 2000-01-03  canadia 2000-01-02
1 2000-01-01      2000 2000-01-03  canadia 2000-01-03
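
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame and the names `long_df`, `startdate`, `mwoutage` and `enddate` are assumptions for illustration (the question's columns contain spaces), and the explicit `format` assumes the dates are month/day/year strings. It also drops the two original date columns and sorts by date, which should give the layout the question asks for.

```python
import pandas as pd

# Sample data mirroring the question; column names follow the answer's spelling.
df = pd.DataFrame({
    "startdate": ["01/01/2000", "01/01/2000"],
    "mwoutage": [1000, 2000],
    "enddate": ["01/04/2000", "01/03/2000"],
    "location": ["merica", "canadia"],
})

# Assumption: the strings are month/day/year.
df["startdate"] = pd.to_datetime(df["startdate"], format="%m/%d/%Y")
df["enddate"] = pd.to_datetime(df["enddate"], format="%m/%d/%Y")

# One date range per row, then one row per date in that range.
df["date"] = [pd.date_range(s, e) for s, e in zip(df["startdate"], df["enddate"])]
long_df = (
    df.explode("date")
      .drop(columns=["startdate", "enddate"])
      .reset_index(drop=True)
)

# explode leaves an object column, so convert it back to datetimes.
long_df["date"] = pd.to_datetime(long_df["date"])

# Order rows by date and put the columns in the order shown in the question.
long_df = long_df.sort_values(["date", "mwoutage"], ignore_index=True)
long_df = long_df[["date", "mwoutage", "location"]]
print(long_df)
```

Note that `DataFrame.explode` needs pandas 0.25 or later, and `ignore_index` in `sort_values` needs pandas 1.0 or later.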


1 Comment

visualnotsobasic Over a year ago

So sorry for the late reply here - this was a fantastic solution. If I were to want to fill dates without outages as an entry but with the mwoutage as 0, is that something that would greatly add to the complexity of this?
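
One possible way to handle the follow-up in this comment, offered as a sketch rather than as the answerer's own method: build the full calendar from the earliest start date to the latest end date, then reindex the exploded data onto every (date, location) pair with `fill_value=0`. The sample data below is hypothetical (it has a gap on 2000-01-03), and the names `all_dates`, `full_index` and `filled` are made up for illustration. It also assumes that overlapping outages at the same location should be summed.

```python
import pandas as pd

# Hypothetical data with a gap: nothing is out on 2000-01-03.
df = pd.DataFrame({
    "startdate": pd.to_datetime(["2000-01-01", "2000-01-04"]),
    "mwoutage": [1000, 2000],
    "enddate": pd.to_datetime(["2000-01-02", "2000-01-06"]),
    "location": ["merica", "canadia"],
})

# Same explode step as in the answer.
df["date"] = [pd.date_range(s, e) for s, e in zip(df["startdate"], df["enddate"])]
long_df = df.explode("date")[["date", "mwoutage", "location"]].reset_index(drop=True)
long_df["date"] = pd.to_datetime(long_df["date"])

# Every calendar day between the earliest start and the latest end.
all_dates = pd.date_range(df["startdate"].min(), df["enddate"].max())

# Every (date, location) combination that should appear in the result.
full_index = pd.MultiIndex.from_product(
    [all_dates, long_df["location"].unique()], names=["date", "location"]
)

# Sum outages per (date, location); combinations with no outage become 0.
filled = (
    long_df.groupby(["date", "location"])["mwoutage"]
           .sum()
           .reindex(full_index, fill_value=0)
           .reset_index()
)
print(filled)
```

So it adds a few lines but not much complexity: the only new pieces are the `min`/`max` lookup for the calendar and the `reindex` call.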
