3

I have a pandas dataframe like the following:

id, date, add_days
1, 2017-01-01, 3
2, 2017-03-05, 5
3, 2017-02-27, 3
.
.
.

I want to repeat the ids and increase date by given add_days:

id, date, add_days
1, 2017-01-01, 3
1, 2017-01-02, 3
1, 2017-01-03, 3
2, 2017-03-05, 5
2, 2017-03-06, 5
2, 2017-03-07, 5
2, 2017-03-08, 5
2, 2017-03-09, 5
3, 2017-02-27, 3
3, 2017-02-28, 3
3, 2017-03-01, 3
.
.
.

Is there a panda-oic way of doing this? I'm looking for an efficient solution since the initial dataframe can have millions of rows.

1 Answer 1

3

You can use melt with groupby and resample:

cols = df.columns

#add end date by timedelta, only substract one day
df['end'] = df.date + pd.to_timedelta(df.add_days.sub(1), unit='d')
print (df)
   id       date  add_days        end
0   1 2017-01-01         3 2017-01-03
1   2 2017-03-05         5 2017-03-09
2   3 2017-02-27         3 2017-03-01

df1 = pd.melt(
    df, ['id', 'add_days'],
    ['date', 'end'],
    value_name='date'
).drop('variable', 1).set_index('date')
print (df1)
            id  add_days
date                    
2017-01-01   1         3
2017-03-05   2         5
2017-02-27   3         3
2017-01-03   1         3
2017-03-09   2         5
2017-03-01   3         3
df2=df1.groupby('id').resample('D').ffill().reset_index(0, drop=True).reset_index()
#if order of columns is important
df2 = df2.reindex_axis(cols, axis=1)
print (df2)
    id       date  add_days
0    1 2017-01-01         3
1    1 2017-01-02         3
2    1 2017-01-03         3
3    2 2017-03-05         5
4    2 2017-03-06         5
5    2 2017-03-07         5
6    2 2017-03-08         5
7    2 2017-03-09         5
8    3 2017-02-27         3
9    3 2017-02-28         3
10   3 2017-03-01         3

Another solution with concat Series created by date_range and last join to original df:

df1 = pd.concat([pd.Series(r.Index, pd.date_range(r.date, r.end)) 
                                       for r in df.itertuples()]).reset_index()
df1.columns = ['date','idx']
print (df1)
         date  idx
0  2017-01-01    0
1  2017-01-02    0
2  2017-01-03    0
3  2017-03-05    1
4  2017-03-06    1
5  2017-03-07    1
6  2017-03-08    1
7  2017-03-09    1
8  2017-02-27    2
9  2017-02-28    2
10 2017-03-01    2

df2 = df1.set_index('idx').join(df[['id','add_days']]).reset_index(drop=True)
print (df2)
         date  id  add_days
0  2017-01-01   1         3
1  2017-01-02   1         3
2  2017-01-03   1         3
3  2017-03-05   2         5
4  2017-03-06   2         5
5  2017-03-07   2         5
6  2017-03-08   2         5
7  2017-03-09   2         5
8  2017-02-27   3         3
9  2017-02-28   3         3
10 2017-03-01   3         3
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.