I have a DataFrame which looks like:

 month  desc value
201911   a  1164
201912   a   971
202001   a  1125
202005   b  1549
202005   a  1038
202006   b  1244
202006   a  1475
202007   a   960

My month column ranges from 201911 to 202007, and the desc column holds categorical values. For each value of desc, I want a row for every month in that range; where a month is not present, the value column should be filled with None.

Expected Output:

month   desc value
201911   a  1164
201912   a   971
202002   a   None
202003   a   None
202004   a   None
202001   a  1125
202005   b  1549
202005   a  1038
202006   b  1244
202006   a  1475
202007   a   960
201911   b  None
201912   b  None
202001   b  None
202002   b  None
202003   b  None
202004   b  None
202007   b  None

Now every desc value has a row for every month.

Is there any smarter way to do this?

  • Is the month column filled with strings? Are all values unique? Commented Oct 28, 2020 at 12:07
  • Set month as the index, reindex, and fillna with None. Commented Oct 28, 2020 at 12:08
  • At the desc level it is unique. Commented Oct 28, 2020 at 12:08
  • I did not get your last point. Can you please show it? Commented Oct 28, 2020 at 12:10
  • Is the order important? Commented Oct 28, 2020 at 12:12
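
The "set month as index and reindex" suggestion in the comments was never spelled out, so here is one way it might look. This is only a sketch: the sample frame is retyped from the question with month assumed to be stored as integers, and the names all_months, parts and result are made up for illustration. Missing values come out as NaN here, not None.

```python
import pandas as pd

# Sample data retyped from the question (month assumed to be an integer column).
df = pd.DataFrame({
    "month": [201911, 201912, 202001, 202005, 202005, 202006, 202006, 202007],
    "desc": ["a", "a", "a", "b", "a", "b", "a", "a"],
    "value": [1164, 971, 1125, 1549, 1038, 1244, 1475, 960],
})

# Build every YYYYMM value between the smallest and largest month.
first = pd.to_datetime(str(df["month"].min()), format="%Y%m")
last = pd.to_datetime(str(df["month"].max()), format="%Y%m")
all_months = [int(m) for m in pd.date_range(first, last, freq="MS").strftime("%Y%m")]

# Reindex each desc group separately on the full month list.
parts = []
for name, group in df.groupby("desc"):
    part = group.set_index("month")[["value"]].reindex(all_months)
    part.insert(0, "desc", name)  # restore the group label on every row
    parts.append(part.rename_axis("month").reset_index())

result = pd.concat(parts, ignore_index=True)
print(result)
```

Looping per group keeps each step easy to inspect, at the cost of being more verbose than a single reindex over a MultiIndex.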

1 Answer

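You can use DataFrame.reindex with a new MultiIndex that holds every combination of month and desc. The months come from date_range, running from the minimal to the maximal month after the strings are parsed as dates: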

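import pandas as pd

# Parse the YYYYMM values into month-start timestamps
df['month'] = pd.to_datetime(df['month'], format='%Y%m')
desc = df['desc'].unique()
# One timestamp per month; '2020-07-01' is the last month in the sample,
# so df['month'].max() would give the same range
d = pd.date_range(df['month'].min(), '2020-07-01', freq='MS')

df = df.set_index(['month', 'desc'])

# Every (month, desc) combination; reindex adds the missing ones as NaN
mux = pd.MultiIndex.from_product([d, desc], names=['month', 'desc'])
df = df.reindex(mux).reset_index()
df['month'] = df['month'].dt.strftime('%Y%m')
print(df)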
     month desc   value
0   201911    a  1164.0
1   201911    b     NaN
2   201912    a   971.0
3   201912    b     NaN
4   202001    a  1125.0
5   202001    b     NaN
6   202002    a     NaN
7   202002    b     NaN
8   202003    a     NaN
9   202003    b     NaN
10  202004    a     NaN
11  202004    b     NaN
12  202005    a  1038.0
13  202005    b  1549.0
14  202006    a  1475.0
15  202006    b  1244.0
16  202007    a   960.0
17  202007    b     NaN
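
The output above holds NaN in a float column, while the question asked for None. A rough sketch of how the same reindex idea could be wrapped up and finished with a None conversion follows. The function name fill_missing_months and the sample frame are hypothetical, and the int() cast assumes the values are whole numbers as in the question.

```python
import pandas as pd

def fill_missing_months(df):
    """Return one row per (month, desc) pair; absent pairs get value None."""
    out = df.copy()
    # astype(str) lets the parse work whether month is stored as int or str.
    out["month"] = pd.to_datetime(out["month"].astype(str), format="%Y%m")
    months = pd.date_range(out["month"].min(), out["month"].max(), freq="MS")
    full = pd.MultiIndex.from_product(
        [months, out["desc"].unique()], names=["month", "desc"]
    )
    out = out.set_index(["month", "desc"]).reindex(full).reset_index()
    out["month"] = out["month"].dt.strftime("%Y%m")
    # Rebuild value as an object column so missing entries are None, not NaN.
    # Assumption: values are whole numbers, so int() is a safe cast.
    values = [None if pd.isna(v) else int(v) for v in out["value"]]
    out["value"] = pd.Series(values, index=out.index, dtype=object)
    return out

# Sample data retyped from the question.
sample = pd.DataFrame({
    "month": [201911, 201912, 202001, 202005, 202005, 202006, 202006, 202007],
    "desc": ["a", "a", "a", "b", "a", "b", "a", "a"],
    "value": [1164, 971, 1125, 1549, 1038, 1244, 1475, 960],
})
result = fill_missing_months(sample)
print(result)
```

Storing None forces the column to object dtype, which gives up fast numeric operations; keeping NaN is usually the better choice unless None is strictly required downstream.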

1 Comment

Also, how did you get the month values only up to 2020-07-01?
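
This comment was not answered in the thread. As far as the code shows, the end date is simply the second argument passed to date_range, and '2020-07-01' happens to be the last month in the sample data. A small sketch of that behaviour, with made-up variable names and only three of the sample months:

```python
import pandas as pd

# Three months from the sample, parsed to the first day of each month.
months = pd.to_datetime(pd.Series(["201911", "202001", "202007"]), format="%Y%m")

# freq="MS" steps by month start and includes both endpoints,
# so the range stops at whatever end date is passed in.
full = pd.date_range(months.min(), months.max(), freq="MS")
labels = full.strftime("%Y%m").tolist()
print(labels)
```

Passing months.max() instead of a literal date keeps the range tied to the data, so the upper bound moves when newer months are added.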
