I am dealing with an enormous dataframe with multiple date columns. Here is a sample:

import pandas as pd
import numpy as np
rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({ 'Arrive': rng, 'Dept': rng2, 'Val' : np.random.randn(len(rng))})

print(df)
      Arrive       Dept       Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26  0.803651
2 2015-02-26 2015-02-27  0.166160

Now I duplicate the rows twice using this:

dupli_df = pd.concat([df]*3, ignore_index=True)
print(dupli_df)
      Arrive       Dept       Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26  0.803651
2 2015-02-26 2015-02-27  0.166160
3 2015-02-24 2015-02-25 -1.576528
4 2015-02-25 2015-02-26  0.803651
5 2015-02-26 2015-02-27  0.166160
6 2015-02-24 2015-02-25 -1.576528
7 2015-02-25 2015-02-26  0.803651
8 2015-02-26 2015-02-27  0.166160

What I am trying to do is add one day to both Arrive and Dept in one set of duplicated rows, and subtract one day from both columns in the other set. So basically, I am trying to get a dataframe like this:

      Arrive       Dept       Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26  0.803651
2 2015-02-26 2015-02-27  0.166160
3 2015-02-25 2015-02-26 -1.576528
4 2015-02-26 2015-02-27  0.803651
5 2015-02-27 2015-02-28  0.166160
6 2015-02-23 2015-02-24 -1.576528
7 2015-02-24 2015-02-25  0.803651
8 2015-02-25 2015-02-26  0.166160

I was thinking of creating two separate dataframes and concatenating them, but I am not sure if this is the most efficient way.
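
For reference, the two-dataframe idea could be sketched roughly as follows. This is only a minimal illustration of that approach, not a claim about its efficiency; the names date_cols, one_day, plus_one, minus_one and result are made up for the example.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({'Arrive': rng, 'Dept': rng2, 'Val': np.random.randn(len(rng))})

# Illustrative names (not from the original post)
date_cols = ['Arrive', 'Dept']
one_day = pd.Timedelta(days=1)

# Copy the frame, then shift only the date columns of each copy
plus_one = df.copy()
plus_one[date_cols] = plus_one[date_cols] + one_day

minus_one = df.copy()
minus_one[date_cols] = minus_one[date_cols] - one_day

# Stack the original, the +1 day copy and the -1 day copy under a fresh RangeIndex
result = pd.concat([df, plus_one, minus_one], ignore_index=True)
print(result)
```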

Thanks in advance for any suggestions.

2 Answers

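You can concat with keys set to the offsets in days. Each row's key then sits in level 0 of the resulting MultiIndex, so you can convert that level to timedeltas and add it to the date columns.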

import pandas as pd

res = pd.concat([df]*3, keys=[0, 1, -1])

cols = ['Arrive', 'Dept']
res[cols] = res[cols].add(pd.to_timedelta(res.index.get_level_values(0), unit='d'), axis=0)
#res = res.reset_index(drop=True)  # If you want a RangeIndex

         Arrive       Dept       Val
 0 0 2015-02-24 2015-02-25 -0.038529
   1 2015-02-25 2015-02-26 -0.025718
   2 2015-02-26 2015-02-27  1.037771
 1 0 2015-02-25 2015-02-26 -0.038529
   1 2015-02-26 2015-02-27 -0.025718
   2 2015-02-27 2015-02-28  1.037771
-1 0 2015-02-23 2015-02-24 -0.038529
   1 2015-02-24 2015-02-25 -0.025718
   2 2015-02-25 2015-02-26  1.037771
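
If more offsets than 0, +1 and -1 are needed, the same keys idea might be wrapped in a small function. The sketch below assumes integer day offsets; shifted_copies is a hypothetical helper name, not something pandas provides, and it adds the timedeltas column by column rather than with DataFrame.add.

```python
import numpy as np
import pandas as pd

def shifted_copies(df, cols, offsets):
    """Hypothetical helper: stack one copy of df per day offset in `offsets`."""
    # Tag every copy with its offset; the tag ends up in level 0 of the index
    res = pd.concat([df] * len(offsets), keys=offsets)
    # One timedelta per row, taken from the level-0 key of that row
    deltas = pd.to_timedelta(res.index.get_level_values(0), unit='D')
    for col in cols:
        # Addition with an Index is positional, so no label alignment happens here
        res[col] = res[col] + deltas
    # Drop the MultiIndex in favour of a plain RangeIndex
    return res.reset_index(drop=True)

rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({'Arrive': rng, 'Dept': rng2, 'Val': np.random.randn(len(rng))})

out = shifted_copies(df, ['Arrive', 'Dept'], [0, 2, -3])
print(out)
```

The offsets in the usage line (0, 2, -3) are arbitrary and only show that the keys do not have to be one day apart.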

You can slice the relevant part of dupli_df after the concat and shift it with pd.DateOffset, like this:

dupli_df = pd.concat([df]*3, ignore_index=True)
# Select every datetime column and record the length of the original frame
l_col_datetime = dupli_df.select_dtypes('datetime').columns
len_df = len(df)
# .loc slices by label and includes the end label, hence the -1 below
dupli_df.loc[len_df:2*len_df-1, l_col_datetime] += pd.DateOffset(days=1)
dupli_df.loc[2*len_df:, l_col_datetime] -= pd.DateOffset(days=1)

print(dupli_df)
      Arrive       Dept       Val
0 2015-02-24 2015-02-25  1.450079
1 2015-02-25 2015-02-26 -1.478552
2 2015-02-26 2015-02-27 -0.596992
3 2015-02-25 2015-02-26  1.450079
4 2015-02-26 2015-02-27 -1.478552
5 2015-02-27 2015-02-28 -0.596992
6 2015-02-23 2015-02-24  1.450079
7 2015-02-24 2015-02-25 -1.478552
8 2015-02-25 2015-02-26 -0.596992
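
A variation that avoids computing slice boundaries by hand is to build one offset per row with np.repeat and add it to each datetime column. This swaps pd.DateOffset for plain timedeltas, so treat it as a sketch under that assumption; offsets and row_offsets are names introduced only for this example.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({'Arrive': rng, 'Dept': rng2, 'Val': np.random.randn(len(rng))})

# Illustrative: one day offset per copy of df, in concat order
offsets = [0, 1, -1]
dupli_df = pd.concat([df] * len(offsets), ignore_index=True)

# Repeat each offset len(df) times so every row gets the offset of its copy
row_offsets = pd.to_timedelta(np.repeat(offsets, len(df)), unit='D')

# Shift every datetime column by the per-row offset (positional, no alignment)
for col in dupli_df.select_dtypes('datetime').columns:
    dupli_df[col] = dupli_df[col] + row_offsets

print(dupli_df)
```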
