Adding rows to a df to fill in missing values in a series

Question

I have a df with Costs listed per Month (1-12). For Months without any costs, I would like to complete the series of Months with a Cost of 0, so that every Section/Maintenance/Group combination that occurs has all 12 Months. What would be the best way to do this?

Input:

  Section | Maintenance | Month | Group | Costs 
 ---------|-------------|-------|-------|------- 
  A2      | Painting    |     3 |     0 |  2000 
  A2      | Painting    |     4 |     0 |  3500 
  A2      | Painting    |     5 |     0 |  1000 
  A2      | Painting    |     7 |     0 |  2500 
  A2      | Painting    |     8 |     0 |  1500 
  A2      | Painting    |     9 |     0 |  3000 
  A2      | Painting    |    10 |     0 |  2000 
  A2      | Painting    |    11 |     0 |  2000 
  A2      | Painting    |    12 |     0 |  1000 
  A2      | Painting    |     3 |     1 |  4000 
  A2      | Painting    |     4 |     1 |  5000 
  A2      | Painting    |     6 |     1 |  2000 
  A2      | Painting    |     7 |     1 |  1500 
  A2      | Painting    |     8 |     1 |  4000 
  A2      | Painting    |    10 |     1 |  3500 
  A2      | Painting    |    12 |     1 |  6000
  A3      | Painting    |     2 |     0 |  3000
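
For anyone who wants to reproduce this, the input table above could be rebuilt roughly as follows. This is only a sketch: the column names and values are copied from the table, while the names `observed` and `rows` are made up for illustration.

```python
import pandas as pd

# (Month, Costs) pairs copied from the input table,
# keyed by (Section, Maintenance, Group).
observed = {
    ("A2", "Painting", 0): [(3, 2000), (4, 3500), (5, 1000), (7, 2500), (8, 1500),
                            (9, 3000), (10, 2000), (11, 2000), (12, 1000)],
    ("A2", "Painting", 1): [(3, 4000), (4, 5000), (6, 2000), (7, 1500), (8, 4000),
                            (10, 3500), (12, 6000)],
    ("A3", "Painting", 0): [(2, 3000)],
}

# Flatten the mapping into one record per table row.
rows = [
    {"Section": s, "Maintenance": m, "Month": month, "Group": g, "Costs": cost}
    for (s, m, g), pairs in observed.items()
    for month, cost in pairs
]

df = pd.DataFrame(rows, columns=["Section", "Maintenance", "Month", "Group", "Costs"])
print(df.shape)
```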

Desired output:

  Section | Maintenance | Month | Group | Costs 
 ---------|-------------|-------|-------|------- 
  A2      | Painting    |     1 |     0 |     0 
  A2      | Painting    |     2 |     0 |     0 
  A2      | Painting    |     3 |     0 |  2000 
  A2      | Painting    |     4 |     0 |  3500 
  A2      | Painting    |     5 |     0 |  1000 
  A2      | Painting    |     6 |     0 |     0 
  A2      | Painting    |     7 |     0 |  2500 
  A2      | Painting    |     8 |     0 |  1500 
  A2      | Painting    |     9 |     0 |  3000 
  A2      | Painting    |    10 |     0 |  2000 
  A2      | Painting    |    11 |     0 |  2000 
  A2      | Painting    |    12 |     0 |  1000 
  A2      | Painting    |     1 |     1 |     0 
  A2      | Painting    |     2 |     1 |     0 
  A2      | Painting    |     3 |     1 |  4000 
  A2      | Painting    |     4 |     1 |  5000
  A2      | Painting    |     5 |     1 |     0 
  A2      | Painting    |     6 |     1 |  2000
  A2      | Painting    |     7 |     1 |  1500
  A2      | Painting    |     8 |     1 |  4000
  A2      | Painting    |     9 |     1 |     0
  A2      | Painting    |    10 |     1 |  3500
  A2      | Painting    |    11 |     1 |     0
  A2      | Painting    |    12 |     1 |  6000
  A3      | Painting    |     1 |     0 |     0
  A3      | Painting    |     2 |     0 |  3000
  A3      | Painting    |     3 |     0 |     0
  A3      | Painting    |     4 |     0 |     0
  A3      | Painting    |     5 |     0 |     0
  A3      | Painting    |     6 |     0 |     0
  A3      | Painting    |     7 |     0 |     0
  A3      | Painting    |     8 |     0 |     0
  A3      | Painting    |     9 |     0 |     0
  A3      | Painting    |    10 |     0 |     0
  A3      | Painting    |    11 |     0 |     0
  A3      | Painting    |    12 |     0 |     0

edit: wrong maintenance type sneaked in, expanded input/output example

jezrael · Accepted Answer · 2020-09-02 13:53:02Z

Use DataFrame.reindex with a MultiIndex built from the unique values of the key columns and range(1, 13) for the months, applied separately per Section/Maintenance/Group group:

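import pandas as pd

def f(x):
    # x holds the rows of one Section/Maintenance/Group combination, so each
    # unique() call yields a single value and mux has exactly 12 entries.
    mux = pd.MultiIndex.from_product([x['Section'].unique(),
                                      x['Maintenance'].unique(),
                                      range(1, 13),
                                      x['Group'].unique()],
                                     names=['Section', 'Maintenance', 'Month', 'Group'])
    # Reindex to all 12 months; months missing from the input get Costs = 0.
    return x.set_index(['Section', 'Maintenance', 'Month', 'Group']).reindex(mux, fill_value=0)

# f reads the grouping columns from x; recent pandas versions deprecate passing
# them to apply, so this line may warn or need adjusting there.
df3 = df.groupby(['Section', 'Maintenance', 'Group'], group_keys=False).apply(f).reset_index()

print(df3)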
   Section Maintenance  Month  Group  Costs
0       A2    Painting      1      0      0
1       A2    Painting      2      0      0
2       A2    Painting      3      0   2000
3       A2    Painting      4      0   3500
4       A2    Painting      5      0   1000
5       A2    Painting      6      0      0
6       A2    Painting      7      0   2500
7       A2    Painting      8      0   1500
8       A2    Painting      9      0   3000
9       A2    Painting     10      0   2000
10      A2    Painting     11      0   2000
11      A2    Painting     12      0   1000
12      A2    Painting      1      1      0
13      A2    Painting      2      1      0
14      A2    Painting      3      1   4000
15      A2    Painting      4      1   5000
16      A2    Painting      5      1      0
17      A2    Painting      6      1   2000
18      A2    Painting      7      1   1500
19      A2    Painting      8      1   4000
20      A2    Painting      9      1      0
21      A2    Painting     10      1   3500
22      A2    Painting     11      1      0
23      A2    Painting     12      1   6000
24      A3    Painting      1      0      0
25      A3    Painting      2      0   3000
26      A3    Painting      3      0      0
27      A3    Painting      4      0      0
28      A3    Painting      5      0      0
29      A3    Painting      6      0      0
30      A3    Painting      7      0      0
31      A3    Painting      8      0      0
32      A3    Painting      9      0      0
33      A3    Painting     10      0      0
34      A3    Painting     11      0      0
35      A3    Painting     12      0      0
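
As a possible alternative that avoids groupby/apply, the same completion can be sketched with two merges: take the combinations that actually occur, cross-join them with months 1 to 12, then left-join the original costs. This is not the method used above, just an illustration. It assumes pandas 1.2 or later (for `how="cross"`) and uses a shortened, made-up subset of the question's data; the names `keys`, `combos`, `months`, `full` and `out` are hypothetical.

```python
import pandas as pd

# Shortened sample in the same shape as the question's df (assumption).
df = pd.DataFrame({
    "Section": ["A2", "A2", "A2", "A3"],
    "Maintenance": ["Painting"] * 4,
    "Month": [3, 4, 6, 2],
    "Group": [0, 0, 1, 0],
    "Costs": [2000, 3500, 2000, 3000],
})

keys = ["Section", "Maintenance", "Group"]

# Only the Section/Maintenance/Group combinations present in the data.
combos = df[keys].drop_duplicates()
months = pd.DataFrame({"Month": list(range(1, 13))})

# Every occurring combination paired with every month.
full = combos.merge(months, how="cross")

# Bring in the known costs; months without a match become NaN, then 0.
out = (
    full.merge(df, on=keys + ["Month"], how="left")
        .fillna({"Costs": 0})
        .astype({"Costs": "int64"})
        [["Section", "Maintenance", "Month", "Group", "Costs"]]
)

print(out)
```

Because the month grid is built only from `combos`, the result has 12 rows per combination that occurs in the input and nothing else.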

edited Sep 2, 2020 at 13:53

answered Sep 2, 2020 at 11:15


5 Comments

deadshot Over a year ago

Your result contains a lot of duplicate rows.

canoe Over a year ago

Doesn't this approach combine the values of all 4 columns with each other, covering every possible combination and not just the ones that appear in the df? My df of 20k rows resulted in a df with 1.5 million rows. Is there a way to just fill in the months for the combinations in the original df?

canoe Over a year ago

Group numbers can repeat across different Sections or Maintenance types. I need 12 rows for the Months for every occurring Section/Maintenance/Group combination. I edited my question, maybe it is clearer that way :)

canoe Over a year ago

I think your solution is almost right, I just need it without the redundancy of the last rows 36-47, because in the input there is no combination {Section: A3, Maintenance: Painting, Group: 1}, only one where the Group is 0. I guess deleting them later on is quite simple but probably very inefficient.

canoe Over a year ago

YESS! You did it! Thank you so much, I just took a quick look but it looks exactly how I need it!
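
The row counts discussed in the comments above can be checked with a little arithmetic. The sketch below compares the size of a full product over the unique values of each key column with the size needed for only the occurring combinations. It uses a shortened, made-up subset of the question's data, and the variable names are hypothetical.

```python
import pandas as pd

# Shortened sample in the same shape as the question's df (assumption).
df = pd.DataFrame({
    "Section": ["A2", "A2", "A2", "A3"],
    "Maintenance": ["Painting"] * 4,
    "Month": [3, 4, 6, 2],
    "Group": [0, 0, 1, 0],
    "Costs": [2000, 3500, 2000, 3000],
})

keys = ["Section", "Maintenance", "Group"]
n_months = 12

# Full product: every unique Section x Maintenance x Group x 12 months,
# including combinations such as (A3, Painting, 1) that never occur.
full_product_rows = n_months
for col in keys:
    full_product_rows *= df[col].nunique()

# Only the combinations that appear in the data, 12 months each.
occurring_rows = n_months * len(df[keys].drop_duplicates())

print(full_product_rows)  # 2 * 1 * 2 * 12 = 48
print(occurring_rows)     # 3 * 12 = 36
```

The difference of 12 rows corresponds to the one combination that does not occur in the input, which is why grouping by Section/Maintenance/Group before reindexing keeps the result at 36 rows.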
