I am working with a very large DataFrame that has multiple date columns. Here is a small sample:
import pandas as pd
import numpy as np
rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({'Arrive': rng, 'Dept': rng2, 'Val': np.random.randn(len(rng))})
print(df)
Arrive Dept Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26 0.803651
2 2015-02-26 2015-02-27 0.166160
Next, I duplicate the rows twice, so that each original row appears three times:
dupli_df = pd.concat([df]*3, ignore_index=True)
print(dupli_df)
Arrive Dept Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26 0.803651
2 2015-02-26 2015-02-27 0.166160
3 2015-02-24 2015-02-25 -1.576528
4 2015-02-25 2015-02-26 0.803651
5 2015-02-26 2015-02-27 0.166160
6 2015-02-24 2015-02-25 -1.576528
7 2015-02-25 2015-02-26 0.803651
8 2015-02-26 2015-02-27 0.166160
What I am trying to do is add one day to both the 'Arrive' and 'Dept' columns for one set of duplicated rows, and subtract one day from both columns for the other set. In other words, I want a DataFrame like this:
Arrive Dept Val
0 2015-02-24 2015-02-25 -1.576528
1 2015-02-25 2015-02-26 0.803651
2 2015-02-26 2015-02-27 0.166160
3 2015-02-25 2015-02-26 -1.576528
4 2015-02-26 2015-02-27 0.803651
5 2015-02-27 2015-02-28 0.166160
6 2015-02-23 2015-02-24 -1.576528
7 2015-02-24 2015-02-25 0.803651
8 2015-02-25 2015-02-26 0.166160
I was thinking of creating two separate DataFrames and concatenating them with the original, but I am not sure whether this is the most efficient way.
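For reference, here is a minimal sketch of the concat idea I have in mind. It assumes the date columns are already datetime64 (as in the sample above); `shift_dates` and `date_cols` are names I made up for illustration, and I have not measured how this performs on the full data.

```python
import numpy as np
import pandas as pd

# Same sample frame as above.
rng = pd.date_range('2015-02-24', periods=3)
rng2 = pd.date_range('2015-02-25', periods=3)
df = pd.DataFrame({'Arrive': rng, 'Dept': rng2, 'Val': np.random.randn(len(rng))})

date_cols = ['Arrive', 'Dept']  # hypothetical: the columns to shift


def shift_dates(frame, cols, days):
    """Return a copy of `frame` with every column in `cols` moved by `days` days."""
    shifted = frame.copy()
    delta = pd.Timedelta(days=days)
    for col in cols:
        shifted[col] = shifted[col] + delta
    return shifted


# Original rows, then the +1 day copy, then the -1 day copy.
result = pd.concat(
    [df, shift_dates(df, date_cols, 1), shift_dates(df, date_cols, -1)],
    ignore_index=True,
)
print(result)
```

This skips the intermediate dupli_df entirely: each copy is shifted as a whole before concatenation, so no row-position bookkeeping is needed.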
Thanks in advance for any suggestions.