Add Missing Dates for Groups in Pandas, using min/max Dates of the Group

Question

I have a data set that has has the sales amount and profit of a given company in a given month. In some months, a company won't conduct any business, and thus it will be missing from the data for that given month. I would like to create rows for these missing months and fill the amount/profit variables with 0.

Code to Reproduce the Data:

import pandas as pd
import numpy as np
import itertools
from datetime import datetime

np.random.seed(2021)

company_list = ['Company_A', 'Company_B', 'Company_C']
datelist = pd.date_range(start="2020-01-01", end='2020-08-01', freq='MS').to_list()

df = pd.DataFrame(list(itertools.product(company_list, datelist)))
df.columns = ['Company', 'Date']

df['Amount'] = np.random.choice(range(0,10000), df.shape[0])
df['Profit'] = np.random.choice(range(0,10000), df.shape[0])

df['Date'] = df['Date'] + pd.tseries.offsets.MonthEnd(1)

df = df.drop([0,1,5,10,11,15,16,17,22,23]).reset_index(drop=True) #Dropping some rows just to illustrate what my data looks like

df

I've figure out how to do this using the help I found in this post, however it uses the min/max ranges of all dates in the 'Date' column. Because the observations for my companies begin and end at different times, this adds dates from before I first observe them and after they disappear from the data, which is not what I want.

df.set_index(
    ['Date', 'Company']
).unstack(
    fill_value = 0
).asfreq(
    'M'
).stack().sort_index(level=1).reset_index()

What I want instead is to only add the missing dates that would result from the min/max dates of the specific group, i.e. just the dates missing in between, not on the ends.

How the Data should look:

How about you just delete the row with 0 value or the date not in your original dataframe at end after you fill it uses the min/max ranges like you currently using? — Mr. For Example
– Mr. For Example, Commented Feb 3, 2021 at 2:20

Chelsea Boy · Accepted Answer · 2021-02-03 13:26:59Z

1

This will do the trick:

df.drop_duplicates(
  ['Date', 'Company'], 'last'
).groupby('Company').apply(
  lambda x: x.set_index('Date').asfreq('M', fill_value = 0)
).drop('Company', axis = 1).reset_index()

answered Feb 3, 2021 at 13:26

Chelsea Boy

213 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Add Missing Dates for Groups in Pandas, using min/max Dates of the Group

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related