Python Pandas: Inserting new rows for date gaps in data

Question

I have a Pandas DataFrame like this:

Date      Curr  Amount
1/1/2015  USD   100.00
1/2/2015  USD   125.00
1/5/2015  USD   110.00
1/6/2015  USD   115.00

1/1/2015  AUD   100.00
1/2/2015  AUD   125.00
1/5/2015  AUD   110.00
1/6/2015  AUD   115.00

The desired output

    Date  curr  Amount
1/1/2015  usd 100.00
1/2/2015  usd 125.00
1/3/2015  usd 125.00
1/4/2015  usd 125.00
1/5/2015  usd 110.00
1/6/2015  usd 115.00
1/1/2015  aud 100.00
1/2/2015  aud 125.00
1/3/2015  aud 125.00
1/4/2015  aud 125.00
1/5/2015  aud 110.00
1/6/2015  aud 115.00

The source data only records changes in amounts, and I want to insert the missing dates, carrying forward the amount from before the gap.

In my example, the data skips from 1/2 to 1/5. I want to create new rows for the missing dates (1/3 and 1/4) and fill in the Amount column using the 1/2 amount.

Thanks

Alexander · Accepted Answer · 2015-04-17 23:10:42Z

A compact version, broken up step by step below. The Date column must hold datetimes rather than strings, hence the pd.to_datetime call:

df['Date'] = pd.to_datetime(df['Date'])
idx = pd.date_range(start=df.Date.min(), end=df.Date.max(), freq='D')
df2 = df.set_index(['Date', 'Curr']).unstack('Curr').reindex(idx).ffill().stack()
>>> df2
                 Amount
           Curr        
2015-01-01 AUD      100
           USD      100
2015-01-02 AUD      125
           USD      125
2015-01-03 AUD      125
           USD      125
2015-01-04 AUD      125
           USD      125
2015-01-05 AUD      110
           USD      110
2015-01-06 AUD      115
           USD      115

Looking in detail, I first create a DatetimeIndex with pd.date_range, using the min and max dates from the original DataFrame. I set the frequency to daily ('D'), but you may want to use another offset frequency such as business days ('B'):

idx = pd.date_range(start=df.Date.min(), end=df.Date.max(), freq='D')
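To make the choice of frequency concrete, here is a small sketch comparing 'D' and 'B' over the question's date span. The variable names are illustrative, and note that 'B' only skips weekends; it knows nothing about public holidays.

```python
import pandas as pd

# Same start and end dates as the question's data, two different frequencies.
daily = pd.date_range(start="2015-01-01", end="2015-01-06", freq="D")
business = pd.date_range(start="2015-01-01", end="2015-01-06", freq="B")

print(len(daily))           # 6 calendar days
print(list(business.day))   # [1, 2, 5, 6]: Sat 3 Jan and Sun 4 Jan are skipped
```

With 'B' the index happens to match the dates already present in the sample data, so no rows would be inserted; 'D' is what produces the 1/3 and 1/4 rows the question asks for.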

I then unstack the DataFrame so that I just have the dates in the index.

df_temp = df.set_index(['Date', 'Curr']).unstack('Curr')

>>> df_temp
          Amount     
Curr         AUD  USD
Date                 
1/1/2015     100  100
1/2/2015     125  125
1/5/2015     110  110
1/6/2015     115  115

I reindex df_temp against the expanded list of dates (this assumes the Date index holds datetimes, not strings). Dates that are absent from the source data appear as rows of NaN:

df_temp2 = df_temp.reindex(idx)

>>> df_temp2
            Amount     
Curr           AUD  USD
2015-01-01     100  100
2015-01-02     125  125
2015-01-03     NaN  NaN
2015-01-04     NaN  NaN
2015-01-05     110  110
2015-01-06     115  115

Finally, I fill forward the values to remove the NaNs, and stack the currencies:

>>> df_temp2.ffill().stack() 
                 Amount
           Curr        
2015-01-01 AUD      100
           USD      100
2015-01-02 AUD      125
           USD      125
2015-01-03 AUD      125
           USD      125
2015-01-04 AUD      125
           USD      125
2015-01-05 AUD      110
           USD      110
2015-01-06 AUD      115
           USD      115
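
For reference, the steps above can be put together as one self-contained script. This is a sketch rather than the answer's exact code: it assumes the question's sample data typed in by hand, and it swaps in pivot and melt for the unstack and stack calls so the result comes back as a flat three-column frame.

```python
import pandas as pd

# Sample data from the question, entered by hand.
df = pd.DataFrame({
    "Date": ["1/1/2015", "1/2/2015", "1/5/2015", "1/6/2015"] * 2,
    "Curr": ["USD"] * 4 + ["AUD"] * 4,
    "Amount": [100.0, 125.0, 110.0, 115.0] * 2,
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Every calendar day between the first and last recorded date.
idx = pd.date_range(df["Date"].min(), df["Date"].max(), freq="D")

# One column per currency, one row per recorded date.
wide = df.pivot(index="Date", columns="Curr", values="Amount")

# Insert the missing dates as NaN rows, then carry the last value forward.
filled = wide.reindex(idx).ffill()

# Back to long format: one row per (date, currency) pair.
result = (
    filled.rename_axis(index="Date")
    .reset_index()
    .melt(id_vars="Date", var_name="Curr", value_name="Amount")
    .sort_values(["Curr", "Date"], ignore_index=True)
)
print(result)
```

The reindex step is what creates the gap rows; ffill only fills them. Running ffill on the wide frame keeps each currency's values in its own column, so one currency's amounts cannot leak into another's.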

Community · Accepted Answer · 2017-05-23 10:29:51Z

You pretty much want to do the same thing as here: How to fill the missing record of Pandas dataframe in pythonic way?

You need to construct a full index, reindex against it, and then forward-fill the resulting NaNs with ffill.

import pandas
from io import StringIO
data = StringIO("""\
Date      Curr    Amount
1/1/2015  USD 100.00
1/2/2015  USD 125.00
1/5/2015  USD 110.00
1/6/2015  USD 115.00
1/1/2015  AUD 100.00
1/2/2015  AUD 125.00
1/5/2015  AUD 110.00
1/6/2015  AUD 115.00
""")

df = pandas.read_table(data, sep=r'\s+', parse_dates=[0])

full_index = pandas.MultiIndex.from_product([
        pandas.date_range(start='2015-01-01', end='2015-01-08'),
        ['USD', 'AUD']        
], names=['Date', 'Curr'])
df2 = (
    df.set_index(['Date', 'Curr'])
      .reindex(full_index)
      .unstack(level='Curr') # pivot Curr into columns
      .ffill()  # drag the last valid value into the NaNs
      .stack(level='Curr')  # put Curr back into rows
      .reset_index()  # remove the index
      .sort_values(['Curr', 'Date']) # sort the rows
      .reset_index(drop=True) # set the index back to 0, 1, ... N
)
print(df2)

Which gives us:

         Date Curr  Amount
0  2015-01-01  AUD     100
1  2015-01-02  AUD     125
2  2015-01-03  AUD     125
3  2015-01-04  AUD     125
4  2015-01-05  AUD     110
5  2015-01-06  AUD     115
6  2015-01-07  AUD     115
7  2015-01-08  AUD     115
8  2015-01-01  USD     100
9  2015-01-02  USD     125
10 2015-01-03  USD     125
11 2015-01-04  USD     125
12 2015-01-05  USD     110
13 2015-01-06  USD     115
14 2015-01-07  USD     115
15 2015-01-08  USD     115
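
If you would rather not hard-code the date range or the currency list, one alternative is to resample each currency separately. This is a sketch of a different technique from the reindex approach above, assuming the same sample data; unlike the output above, each currency is filled only up to its own last recorded date (2015-01-06 here), not through 2015-01-08.

```python
import pandas as pd

# Sample data from the question, entered by hand.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["1/1/2015", "1/2/2015", "1/5/2015", "1/6/2015"] * 2,
        format="%m/%d/%Y",
    ),
    "Curr": ["USD"] * 4 + ["AUD"] * 4,
    "Amount": [100.0, 125.0, 110.0, 115.0] * 2,
})

out = (
    df.set_index("Date")      # resample needs a DatetimeIndex
      .groupby("Curr")["Amount"]
      .resample("D")          # daily grid per currency, first to last date
      .ffill()                # carry the last known amount into the gaps
      .reset_index()          # back to flat Curr / Date / Amount columns
)
print(out)
```

Because the grid is built per group, currencies whose records start or end on different dates each get their own range.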
