I have a pandas DataFrame like this:

Date      Curr    Amount
1/1/2015  USD 100.00
1/2/2015  USD 125.00
1/5/2015  USD 110.00
1/6/2015  USD 115.00

1/1/2015  AUD 100.00
1/2/2015  AUD 125.00
1/5/2015  AUD  110.00
1/6/2015  AUD 115.00

The desired output

    Date  curr  Amount
1/1/2015  usd 100.00
1/2/2015  usd 125.00
1/3/2015  usd 125.00
1/4/2015  usd 125.00
1/5/2015  usd 110.00
1/6/2015  usd 115.00
1/1/2015  aud 100.00
1/2/2015  aud 125.00
1/3/2015  aud 125.00
1/4/2015  aud 125.00
1/5/2015  aud 110.00
1/6/2015  aud 115.00

The source data only records changes in amounts, so I want to insert the missing dates and carry forward the last amount recorded before each gap.

In my example, the data skips from 1/2 to 1/5. I want to create new rows for the missing dates (1/3 and 1/4) for each currency and fill their Amount column with the 1/2 amount.

Thanks

2 Answers

A very long two liner that should be broken up:

# Assumes df.Date already holds datetimes, e.g. df['Date'] = pd.to_datetime(df['Date'])
idx = pd.date_range(start=min(df.Date), end=max(df.Date), freq='D')
df2 = (df.set_index(['Date', 'Curr']).unstack('Curr')
       .reindex(idx).ffill().stack())
>>> df2
                 Amount
           Curr        
2015-01-01 AUD      100
           USD      100
2015-01-02 AUD      125
           USD      125
2015-01-03 AUD      125
           USD      125
2015-01-04 AUD      125
           USD      125
2015-01-05 AUD      110
           USD      110
2015-01-06 AUD      115
           USD      115

Looking in detail, I first create a DatetimeIndex using the min and max dates from the original DataFrame. I set the frequency to Daily ('D'), but you may want to use another offset frequency such as Business Days ('B'):

idx = pd.date_range(start=min(df.Date), end=max(df.Date), freq='D')
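
To illustrate the choice of frequency, here is a small sketch comparing daily and business-day ranges over the question's date span. The literal start and end dates are copied from the sample data; note that 'B' only skips weekends and knows nothing about holidays.

```python
import pandas as pd

# Date bounds taken from the question's sample data.
start, end = "2015-01-01", "2015-01-06"

daily = pd.date_range(start=start, end=end, freq="D")     # every calendar day
business = pd.date_range(start=start, end=end, freq="B")  # Monday to Friday only

print(len(daily), len(business))
print(business.strftime("%Y-%m-%d").tolist())
```

With this sample, the two dates missing from the source (1/3 and 1/4) fall on a weekend, so a business-day index would add no new rows at all. Daily frequency is the one that produces the output the question asks for.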

I then unstack the DataFrame so that I just have the dates in the index.

df_temp = df.set_index(['Date', 'Curr']).unstack('Curr')

>>> df_temp
          Amount     
Curr         AUD  USD
Date                 
1/1/2015     100  100
1/2/2015     125  125
1/5/2015     110  110
1/6/2015     115  115

I then reindex this DataFrame on the expanded list of dates. The dates that were absent from the source data become rows of NaNs:

df_temp2 = df_temp.reindex(idx)

>>> df_temp2
            Amount     
Curr           AUD  USD
2015-01-01     100  100
2015-01-02     125  125
2015-01-03     NaN  NaN
2015-01-04     NaN  NaN
2015-01-05     110  110
2015-01-06     115  115

Finally, I fill forward the values to remove the NaNs, and stack the currencies:

>>> df_temp2.ffill().stack() 
                 Amount
           Curr        
2015-01-01 AUD      100
           USD      100
2015-01-02 AUD      125
           USD      125
2015-01-03 AUD      125
           USD      125
2015-01-04 AUD      125
           USD      125
2015-01-05 AUD      110
           USD      110
2015-01-06 AUD      115
           USD      115
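
For reference, the steps above can be put together as one self-contained sketch. It assumes a recent pandas version, builds the question's sample data inline, and parses the Date column to datetimes first so that it lines up with the daily index. The names wide, filled and result are illustrative only.

```python
import pandas as pd

# Sample data from the question, built inline so the sketch runs on its own.
df = pd.DataFrame({
    "Date": ["1/1/2015", "1/2/2015", "1/5/2015", "1/6/2015"] * 2,
    "Curr": ["USD"] * 4 + ["AUD"] * 4,
    "Amount": [100.0, 125.0, 110.0, 115.0] * 2,
})
# Parse the month/day/year strings so they can be matched against a DatetimeIndex.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Step 1: every calendar day between the first and last observed date.
idx = pd.date_range(df["Date"].min(), df["Date"].max(), freq="D")

# Step 2: one column per currency, dates in the index.
wide = df.set_index(["Date", "Curr"])["Amount"].unstack("Curr")

# Step 3: add the missing dates as NaN rows, then carry the last value forward.
filled = wide.reindex(idx).ffill()

# Step 4: back to long format with Date, Curr and Amount columns.
result = (
    filled.stack()
          .rename("Amount")
          .rename_axis(["Date", "Curr"])
          .reset_index()
)
print(result)
```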

You pretty much want to do the same thing as here: How to fill the missing record of Pandas dataframe in pythonic way?

You need to construct a full index and then forward-fill the missing values (the ffill method).

import pandas
from io import StringIO
data = StringIO("""\
Date      Curr    Amount
1/1/2015  USD 100.00
1/2/2015  USD 125.00
1/5/2015  USD 110.00
1/6/2015  USD 115.00
1/1/2015  AUD 100.00
1/2/2015  AUD 125.00
1/5/2015  AUD 110.00
1/6/2015  AUD 115.00
""")

df = pandas.read_table(data, sep=r'\s+', parse_dates=[0])

full_index = pandas.MultiIndex.from_product([
        pandas.date_range(start='2015-01-01', end='2015-01-08'),
        ['USD', 'AUD']        
], names=['Date', 'Curr'])
df2 = (
    df.set_index(['Date', 'Curr'])
      .reindex(full_index)
      .unstack(level='Curr') # pivot Curr into columns
      .ffill()  # drag the last valid value into the NaNs
      .stack(level='Curr')  # put Curr back into rows
      .reset_index()  # remove the index
      .sort_values(['Curr', 'Date']) # sort the rows
      .reset_index(drop=True) # set the index back to 0, 1, ... N
)
print(df2)

Which gives us:

         Date Curr  Amount
0  2015-01-01  AUD     100
1  2015-01-02  AUD     125
2  2015-01-03  AUD     125
3  2015-01-04  AUD     125
4  2015-01-05  AUD     110
5  2015-01-06  AUD     115
6  2015-01-07  AUD     115
7  2015-01-08  AUD     115
8  2015-01-01  USD     100
9  2015-01-02  USD     125
10 2015-01-03  USD     125
11 2015-01-04  USD     125
12 2015-01-05  USD     110
13 2015-01-06  USD     115
14 2015-01-07  USD     115
15 2015-01-08  USD     115
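
The example above hardcodes the date range and the currency list, which is why the output runs through 2015-01-08. As a variation, here is a minimal sketch that derives both from the data and forward-fills within each currency using groupby, so a value is never carried over from one currency to another. It assumes a recent pandas version; the name out is illustrative.

```python
import pandas as pd

# Sample data from the question, with dates already parsed.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["1/1/2015", "1/2/2015", "1/5/2015", "1/6/2015"] * 2, format="%m/%d/%Y"
    ),
    "Curr": ["USD"] * 4 + ["AUD"] * 4,
    "Amount": [100.0, 125.0, 110.0, 115.0] * 2,
})

# Full (Date, Curr) grid derived from the data instead of hardcoded values.
full_index = pd.MultiIndex.from_product(
    [
        pd.date_range(df["Date"].min(), df["Date"].max(), freq="D"),
        df["Curr"].unique(),
    ],
    names=["Date", "Curr"],
)

out = (
    df.set_index(["Date", "Curr"])
      .reindex(full_index)          # missing (Date, Curr) pairs become NaN rows
      .groupby(level="Curr")
      .ffill()                      # forward-fill separately for each currency
      .reset_index()
      .sort_values(["Curr", "Date"])
      .reset_index(drop=True)
)
print(out)
```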
