Add "missing" rows to multi-index groupby pandas dataframe

Question

I have a DataFrame that looks like this:

                         numberSold        
date | location | time
3/10      FL      12:00        4  
                  1:00         1
                  4:00         5  
3/11      FL      1:00         2
                  2:00         3
                  3:00         0
3/12      FL      2:00         6
                  5:00         6

It's multi-index (date, location, time). I want the output to look as follows:

                         numberSold        
date | location | time
3/10      FL      12:00        4
                  1:00         1
                  4:00         5    
3/11      FL      12:00        4
                  1:00         2
                  2:00         3
                  3:00         0
                  4:00         5  
3/12      FL      12:00        4
                  1:00         2
                  2:00         6
                  3:00         0
                  5:00         6

Here is the first DataFrame in dictionary format:

{'numberSold': {('3/10', 'FL', '12:00'): 4,
  ('3/10', 'FL', '1:00'): 1,
  ('3/10', 'FL', '4:00'): 5,
  ('3/11', 'FL', '1:00'): 2,
  ('3/11', 'FL', '2:00'): 3,
  ('3/11', 'FL', '3:00'): 0,
  ('3/12', 'FL', '2:00'): 6,
  ('3/12', 'FL', '5:00'): 6}}
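
For anyone who wants to reproduce this, the dictionary above can be turned back into the multi-index DataFrame roughly as follows. This is a minimal sketch: the variable names `data`, `index`, and `df` are only illustrative, and the level names are taken from the table header.

```python
import pandas as pd

data = {'numberSold': {('3/10', 'FL', '12:00'): 4,
                       ('3/10', 'FL', '1:00'): 1,
                       ('3/10', 'FL', '4:00'): 5,
                       ('3/11', 'FL', '1:00'): 2,
                       ('3/11', 'FL', '2:00'): 3,
                       ('3/11', 'FL', '3:00'): 0,
                       ('3/12', 'FL', '2:00'): 6,
                       ('3/12', 'FL', '5:00'): 6}}

# Build the three-level index explicitly so the level names are set up front.
index = pd.MultiIndex.from_tuples(list(data['numberSold']),
                                  names=['date', 'location', 'time'])

# One column of values, aligned with the tuples in insertion order.
df = pd.DataFrame({'numberSold': list(data['numberSold'].values())}, index=index)
```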

Basically, I want each date to build on the entries from the previous dates. If a time exists for the current date, use the current value (like how 3/11 1:00 uses "2" and not "1"), but if it doesn't exist, carry over the value the previous date had (like how 3/11 has the 4:00 value from 3/10).

I'm not sure how to do this with pandas. I feel like it's pretty simple, but my attempts have all failed.

Here is the dictionary: {'numberSold': {('3/10', 'FL', '12:00'): 4, ('3/10', 'FL', '1:00'): 1, ('3/10', 'FL', '4:00'): 5, ('3/11', 'FL', '1:00'): 2, ('3/11', 'FL', '2:00'): 3, ('3/11', 'FL', '3:00'): 0, ('3/12', 'FL', '2:00'): 6, ('3/12', 'FL', '5:00'): 6}}
– Eric Aldrin, Commented Mar 18, 2022 at 19:54
I think your desired output is incorrect. For the 3rd day, it should either include an entry for 4:00 or not include the entry for 12:00. Please check whether it's correct.
– user7864386, Commented Mar 18, 2022 at 20:15

user7864386 · Accepted Answer · 2022-03-18 20:19:19Z

You could use pivot + ffill to fill in the missing data, then stack to get the DataFrame back into its previous shape:

df.index.names = ['date', 'location', 'time']
# Keyword arguments: pivot no longer accepts positional ones in pandas 2.0+
out = df.reset_index().pivot(index=['date', 'location'], columns='time', values='numberSold').ffill().stack().to_frame(name='numberSold')

Output:

                     numberSold
date location time             
3/10 FL       12:00         4.0
              1:00          1.0
              4:00          5.0
3/11 FL       12:00         4.0
              1:00          2.0
              2:00          3.0
              3:00          0.0
              4:00          5.0
3/12 FL       12:00         4.0
              1:00          2.0
              2:00          6.0
              3:00          0.0
              4:00          5.0
              5:00          6.0
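
The pivot, ffill, and stack steps can also be run one at a time to see what each one does. The following is a minimal end-to-end sketch, not a definitive implementation: the names `records`, `wide`, and `filled` are illustrative, and the explicit `dropna()` and `astype(int)` are additions that assume the values are whole numbers and that times not yet seen should stay out of the result.

```python
import pandas as pd

records = {('3/10', 'FL', '12:00'): 4, ('3/10', 'FL', '1:00'): 1,
           ('3/10', 'FL', '4:00'): 5, ('3/11', 'FL', '1:00'): 2,
           ('3/11', 'FL', '2:00'): 3, ('3/11', 'FL', '3:00'): 0,
           ('3/12', 'FL', '2:00'): 6, ('3/12', 'FL', '5:00'): 6}
index = pd.MultiIndex.from_tuples(list(records), names=['date', 'location', 'time'])
df = pd.DataFrame({'numberSold': list(records.values())}, index=index)

# Wide table: one row per (date, location), one column per time.
# Times missing on a given date show up as NaN.
wide = df.reset_index().pivot(index=['date', 'location'],
                              columns='time', values='numberSold')

# Carry each time's last known value down to the later dates.
filled = wide.ffill()

# Back to long form. dropna() removes times that have not appeared yet,
# regardless of whether stack() itself drops NaN in the installed pandas.
out = filled.stack().dropna().astype(int).to_frame(name='numberSold')
print(out)
```

Two caveats about this sketch. The sample data has a single location; with several, a plain `ffill` would also carry values from one location's row into the next, so a per-location fill such as `wide.groupby(level='location').ffill()` would probably be needed. The time columns are also ordered as strings, which happens to put '12:00' before '1:00' here but is not a true chronological sort.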
