How to duplicate and modify date rows in a pandas dataframe Python

Question

I am dealing with an enormous dataframe with multiple date columns. Here is a sample:

import pandas as pd
import numpy as np
rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({'Arrive': rng, 'Dept': rng2, 'Val': np.random.randn(len(rng))})

print(df)
      Arrive       Dept       Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26  0.803651
2 2015-02-26 2015-02-27  0.166160

Now I duplicate the rows twice using this:

dupli_df = pd.concat([df]*3, ignore_index=True)
print(dupli_df)
      Arrive       Dept       Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26  0.803651
2 2015-02-26 2015-02-27  0.166160
3 2015-02-24 2015-02-25 -1.576528
4 2015-02-25 2015-02-26  0.803651
5 2015-02-26 2015-02-27  0.166160
6 2015-02-24 2015-02-25 -1.576528
7 2015-02-25 2015-02-26  0.803651
8 2015-02-26 2015-02-27  0.166160

What I am trying to do is add one day to both df['Arrive'] and df['Dept'] in the first set of duplicated rows, and subtract one day from both columns in the second set. So basically, I am trying to get a dataframe like this:


      Arrive       Dept       Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26  0.803651
2 2015-02-26 2015-02-27  0.166160
3 2015-02-25 2015-02-26 -1.576528
4 2015-02-26 2015-02-27  0.803651
5 2015-02-27 2015-02-28  0.166160
6 2015-02-23 2015-02-24 -1.576528
7 2015-02-24 2015-02-25  0.803651
8 2015-02-25 2015-02-26  0.166160

I was thinking of creating two separate dataframes and concatenating them, but I am not sure whether this is the most efficient way.

Thanks in advance for any suggestions.
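For reference, the two-dataframe approach the question considers might look like the following. This is a minimal sketch, not taken from either answer: it rebuilds the sample frame, makes a +1 day copy and a -1 day copy of `Arrive` and `Dept` using `pd.Timedelta`, and concatenates all three. The names `plus`, `minus`, `one_day` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (Val is random).
rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({'Arrive': rng, 'Dept': rng2, 'Val': np.random.randn(len(rng))})

cols = ['Arrive', 'Dept']
one_day = pd.Timedelta(days=1)

# Copy with both date columns shifted forward by one day.
plus = df.copy()
plus[cols] = plus[cols] + one_day

# Copy with both date columns shifted back by one day.
minus = df.copy()
minus[cols] = minus[cols] - one_day

# Original rows first, then the +1 day block, then the -1 day block.
result = pd.concat([df, plus, minus], ignore_index=True)
print(result)
```

This allocates both shifted copies before concatenating them, so it temporarily holds more data than concatenating once and shifting slices in place.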

ALollz · Accepted Answer · 2019-08-23 18:07:40Z


You can concat with `keys` set to the offsets in days, then add those offsets to the date columns as timedeltas.

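import pandas as pd

# keys label each copy with its offset in days: 0 (original), +1 and -1
res = pd.concat([df]*3, keys=[0, 1, -1])

cols = ['Arrive', 'Dept']
# turn the outer index level into Timedeltas and add them row by row to the date columns
res[cols] = res[cols].add(pd.to_timedelta(res.index.get_level_values(0), unit='d'), axis=0)
#res = res.reset_index(drop=True)  # If you want a RangeIndex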

         Arrive       Dept       Val
 0 0 2015-02-24 2015-02-25 -0.038529
   1 2015-02-25 2015-02-26 -0.025718
   2 2015-02-26 2015-02-27  1.037771
 1 0 2015-02-25 2015-02-26 -0.038529
   1 2015-02-26 2015-02-27 -0.025718
   2 2015-02-27 2015-02-28  1.037771
-1 0 2015-02-23 2015-02-24 -0.038529
   1 2015-02-24 2015-02-25 -0.025718
   2 2015-02-25 2015-02-26  1.037771


Ben.T · Accepted Answer · 2019-08-23 18:13:58Z

You can slice the relevant part of dupli_df after the concat and shift it with pd.DateOffset, for example:

dupli_df = pd.concat([df]*3, ignore_index=True)
# get all the datetime columns and the length of the original dataframe
l_col_datetime = dupli_df.select_dtypes('datetime').columns
len_df = len(df)
# add a day to the second copy and remove a day from the third copy
# (.loc label slices include the end label, hence 2*len_df-1)
dupli_df.loc[len_df:2*len_df-1, l_col_datetime] += pd.DateOffset(days=1)
dupli_df.loc[2*len_df:, l_col_datetime] -= pd.DateOffset(days=1)

print(dupli_df)
      Arrive       Dept       Val
0 2015-02-24 2015-02-25  1.450079
1 2015-02-25 2015-02-26 -1.478552
2 2015-02-26 2015-02-27 -0.596992
3 2015-02-25 2015-02-26  1.450079
4 2015-02-26 2015-02-27 -1.478552
5 2015-02-27 2015-02-28 -0.596992
6 2015-02-23 2015-02-24  1.450079
7 2015-02-24 2015-02-25 -1.478552
8 2015-02-25 2015-02-26 -0.596992
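A variant of the slicing approach that avoids computing label ranges by hand: build one offset per row with `np.repeat` and add it to each datetime column. This is a sketch that assumes the copies are stacked in the same order as `day_offsets`; the names `day_offsets` and `per_row` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (Val is random).
rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({'Arrive': rng, 'Dept': rng2, 'Val': np.random.randn(len(rng))})

day_offsets = [0, 1, -1]  # one entry per copy, in stacking order
dupli_df = pd.concat([df] * len(day_offsets), ignore_index=True)

# One offset per row: [0, 0, 0, 1, 1, 1, -1, -1, -1] for a 3-row df.
per_row = pd.to_timedelta(np.repeat(day_offsets, len(df)), unit='D').to_numpy()

for col in dupli_df.select_dtypes('datetime').columns:
    # Adding a timedelta64 array to a datetime Series works element by element, by position.
    dupli_df[col] = dupli_df[col] + per_row

print(dupli_df)
```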
