
I have a df with Costs listed per Month (1-12). Months without any costs are missing from the df, and I would like to complete the series of Months for each group, filling the missing ones with a Cost of 0. What would be the best way to do this? Input:

  Section | Maintenance | Month | Group | Costs 
 ---------|-------------|-------|-------|------- 
  A2      | Painting    |     3 |     0 |  2000 
  A2      | Painting    |     4 |     0 |  3500 
  A2      | Painting    |     5 |     0 |  1000 
  A2      | Painting    |     7 |     0 |  2500 
  A2      | Painting    |     8 |     0 |  1500 
  A2      | Painting    |     9 |     0 |  3000 
  A2      | Painting    |    10 |     0 |  2000 
  A2      | Painting    |    11 |     0 |  2000 
  A2      | Painting    |    12 |     0 |  1000 
  A2      | Painting    |     3 |     1 |  4000 
  A2      | Painting    |     4 |     1 |  5000 
  A2      | Painting    |     6 |     1 |  2000 
  A2      | Painting    |     7 |     1 |  1500 
  A2      | Painting    |     8 |     1 |  4000 
  A2      | Painting    |    10 |     1 |  3500 
  A2      | Painting    |    12 |     1 |  6000
  A3      | Painting    |     2 |     0 |  3000
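
To reproduce the example, the input table above can be built as a DataFrame (values copied from the table; nothing else is assumed):

```python
import pandas as pd

# Input from the question: 16 rows for Section A2, one row for Section A3
df = pd.DataFrame({
    'Section': ['A2'] * 16 + ['A3'],
    'Maintenance': ['Painting'] * 17,
    'Month': [3, 4, 5, 7, 8, 9, 10, 11, 12, 3, 4, 6, 7, 8, 10, 12, 2],
    'Group': [0] * 9 + [1] * 7 + [0],
    'Costs': [2000, 3500, 1000, 2500, 1500, 3000, 2000, 2000, 1000,
              4000, 5000, 2000, 1500, 4000, 3500, 6000, 3000],
})
```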

Desired output:

  Section | Maintenance | Month | Group | Costs 
 ---------|-------------|-------|-------|------- 
  A2      | Painting    |     1 |     0 |     0 
  A2      | Painting    |     2 |     0 |     0 
  A2      | Painting    |     3 |     0 |  2000 
  A2      | Painting    |     4 |     0 |  3500 
  A2      | Painting    |     5 |     0 |  1000 
  A2      | Painting    |     6 |     0 |     0 
  A2      | Painting    |     7 |     0 |  2500 
  A2      | Painting    |     8 |     0 |  1500 
  A2      | Painting    |     9 |     0 |  3000 
  A2      | Painting    |    10 |     0 |  2000 
  A2      | Painting    |    11 |     0 |  2000 
  A2      | Painting    |    12 |     0 |  1000 
  A2      | Painting    |     1 |     1 |     0 
  A2      | Painting    |     2 |     1 |     0 
  A2      | Painting    |     3 |     1 |  4000 
  A2      | Painting    |     4 |     1 |  5000
  A2      | Painting    |     5 |     1 |     0 
  A2      | Painting    |     6 |     1 |  2000 
  A2      | Painting    |     7 |     1 |  1500 
  A2      | Painting    |     8 |     1 |  4000 
  A2      | Painting    |     9 |     1 |     0 
  A2      | Painting    |    10 |     1 |  3500 
  A2      | Painting    |    11 |     1 |     0 
  A2      | Painting    |    12 |     1 |  6000 
  A3      | Painting    |     1 |     0 |     0
  A3      | Painting    |     2 |     0 |  3000
  A3      | Painting    |     3 |     0 |     0
  A3      | Painting    |     4 |     0 |     0
  A3      | Painting    |     5 |     0 |     0
  A3      | Painting    |     6 |     0 |     0
  A3      | Painting    |     7 |     0 |     0
  A3      | Painting    |     8 |     0 |     0
  A3      | Painting    |     9 |     0 |     0
  A3      | Painting    |    10 |     0 |     0
  A3      | Painting    |    11 |     0 |     0
  A3      | Painting    |    12 |     0 |     0

Edit: a wrong maintenance type had sneaked in; I also expanded the input/output example.


1 Answer


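Use DataFrame.reindex with a MultiIndex built from the unique values of each column and range(1, 13) for the months, but apply it per Section/Maintenance/Group group so that only combinations present in the input are completed: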

```python
import pandas as pd

def f(x):
    # x is one Section/Maintenance/Group group, so each .unique() holds a single
    # value and the product has exactly 12 rows (one per Month)
    mux = pd.MultiIndex.from_product([x['Section'].unique(),
                                      x['Maintenance'].unique(),
                                      range(1, 13),
                                      x['Group'].unique()],
                                     names=['Section', 'Maintenance', 'Month', 'Group'])
    # Months missing from this group are added with Costs = 0
    return (x.set_index(['Section', 'Maintenance', 'Month', 'Group'])
             .reindex(mux, fill_value=0))

# Grouping by the existing combinations avoids creating rows for pairs that never occur
df3 = df.groupby(['Section', 'Maintenance', 'Group'], group_keys=False).apply(f).reset_index()

print(df3)
```

```
   Section Maintenance  Month  Group  Costs
0       A2    Painting      1      0      0
1       A2    Painting      2      0      0
2       A2    Painting      3      0   2000
3       A2    Painting      4      0   3500
4       A2    Painting      5      0   1000
5       A2    Painting      6      0      0
6       A2    Painting      7      0   2500
7       A2    Painting      8      0   1500
8       A2    Painting      9      0   3000
9       A2    Painting     10      0   2000
10      A2    Painting     11      0   2000
11      A2    Painting     12      0   1000
12      A2    Painting      1      1      0
13      A2    Painting      2      1      0
14      A2    Painting      3      1   4000
15      A2    Painting      4      1   5000
16      A2    Painting      5      1      0
17      A2    Painting      6      1   2000
18      A2    Painting      7      1   1500
19      A2    Painting      8      1   4000
20      A2    Painting      9      1      0
21      A2    Painting     10      1   3500
22      A2    Painting     11      1      0
23      A2    Painting     12      1   6000
24      A3    Painting      1      0      0
25      A3    Painting      2      0   3000
26      A3    Painting      3      0      0
27      A3    Painting      4      0      0
28      A3    Painting      5      0      0
29      A3    Painting      6      0      0
30      A3    Painting      7      0      0
31      A3    Painting      8      0      0
32      A3    Painting      9      0      0
33      A3    Painting     10      0      0
34      A3    Painting     11      0      0
35      A3    Painting     12      0      0
```
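
An alternative sketch (not the approach above) that skips `groupby.apply`: take the Section/Maintenance/Group combinations that actually occur, pair each with months 1-12 via a cross merge, then left-join the real costs. It assumes pandas >= 1.2 for `how='cross'` and at most one row per Section/Maintenance/Group/Month in the input.

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    'Section': ['A2'] * 16 + ['A3'],
    'Maintenance': ['Painting'] * 17,
    'Month': [3, 4, 5, 7, 8, 9, 10, 11, 12, 3, 4, 6, 7, 8, 10, 12, 2],
    'Group': [0] * 9 + [1] * 7 + [0],
    'Costs': [2000, 3500, 1000, 2500, 1500, 3000, 2000, 2000, 1000,
              4000, 5000, 2000, 1500, 4000, 3500, 6000, 3000],
})

keys = ['Section', 'Maintenance', 'Group']

# Only the combinations present in the input
combos = df[keys].drop_duplicates()

# Every existing combination paired with all 12 months
grid = combos.merge(pd.DataFrame({'Month': range(1, 13)}), how='cross')

# Left-join the real costs; months without a row get NaN, replaced by 0
df3 = (grid.merge(df, on=keys + ['Month'], how='left')
           .fillna({'Costs': 0})
           .astype({'Costs': 'int64'})
           .sort_values(keys + ['Month'], ignore_index=True)
           [['Section', 'Maintenance', 'Month', 'Group', 'Costs']])

print(df3)
```

If the real data can contain several rows for the same Section/Maintenance/Group/Month, the left merge keeps all of them, so aggregating first (for example with `df.groupby(keys + ['Month'], as_index=False)['Costs'].sum()`) would be needed. Avoiding `groupby.apply` also sidesteps the deprecation warning that recent pandas versions may emit when the applied function uses the grouping columns.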

5 Comments

Your result contains a lot of duplicate rows.
Doesn't this approach combine every value of all 4 columns with each other, rather than just the combinations that appear in the df? My df of 20k rows resulted in a df with 1.5 million rows. Is there a way to fill in the months only for the combinations in the original df?
Group numbers can repeat across different Sections or Maintenance types. I need 12 rows (one per Month) for every occurring Section/Maintenance/Group combination. I edited my question; maybe it is clearer that way :)
I think your solution is almost right; I just need it without the redundant rows 36-47, because the input has no combination {Section: A3, Maintenance: Painting, Group: 1}, only the one where Group is 0. I guess deleting them afterwards is simple but probably very inefficient.
YESS! You did it! Thank you so much. I only took a quick look, but it looks exactly like what I need!
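
The row explosion discussed in the comments is the gap between the cartesian product of all unique values and the combinations that actually occur. A quick count, on a three-row subset of the question's data chosen for illustration (one row per occurring combination):

```python
import pandas as pd

# Hypothetical minimal subset: one row for each combination in the question
df = pd.DataFrame({
    'Section': ['A2', 'A2', 'A3'],
    'Maintenance': ['Painting'] * 3,
    'Month': [3, 3, 2],
    'Group': [0, 1, 0],
    'Costs': [2000, 4000, 3000],
})

keys = ['Section', 'Maintenance', 'Group']

# Rows from a global product of every unique value (2 Sections x 1 Maintenance x 2 Groups x 12)
full_product = int(df[keys].nunique().prod()) * 12

# Rows actually wanted: 12 months for each combination that occurs (3 x 12)
needed = len(df[keys].drop_duplicates()) * 12

print(full_product, needed)  # 48 36
```

The 12 extra rows are exactly the non-existent A3/Painting/Group 1 combination; on 20k rows with many distinct Sections and Groups, the same product grows multiplicatively.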
