I have a DataFrame that looks like this:

                         numberSold        
date | location | time
3/10      FL      12:00        4  
                  1:00         1
                  4:00         5  
3/11      FL      1:00         2
                  2:00         3
                  3:00         0
3/12      FL      2:00         6
                  5:00         6

It has a MultiIndex (date, location, time). I want the output to look as follows:

                         numberSold        
date | location | time
3/10      FL      12:00        4
                  1:00         1
                  4:00         5    
3/11      FL      12:00        4
                  1:00         2
                  2:00         3
                  3:00         0
                  4:00         5  
3/12      FL      12:00        4
                  1:00         2
                  2:00         6
                  3:00         0
                  5:00         6

Here is the first DataFrame in dictionary format:

{'numberSold': {('3/10', 'FL', '12:00'): 4,
  ('3/10', 'FL', '1:00'): 1,
  ('3/10', 'FL', '4:00'): 5,
  ('3/11', 'FL', '1:00'): 2,
  ('3/11', 'FL', '2:00'): 3,
  ('3/11', 'FL', '3:00'): 0,
  ('3/12', 'FL', '2:00'): 6,
  ('3/12', 'FL', '5:00'): 6}}

Basically, I want each date to build on the entries from the previous dates. If a time exists for the current date, use the current value (like how 3/11 1:00 uses "2" and not "1"); if it doesn't exist, carry over the value from the previous date (like how 3/11 has the 4:00 value from 3/10).

I'm not sure how to do this with pandas. I feel like it's pretty simple, but my attempts have all failed.

  • here is the dictionary: {'numberSold': {('3/10', 'FL', '12:00'): 4, ('3/10', 'FL', '1:00'): 1, ('3/10', 'FL', '4:00'): 5, ('3/11', 'FL', '1:00'): 2, ('3/11', 'FL', '2:00'): 3, ('3/11', 'FL', '3:00'): 0, ('3/12', 'FL', '2:00'): 6, ('3/12', 'FL', '5:00'): 6}} Commented Mar 18, 2022 at 19:54
  • I think your desired output is incorrect. For the 3rd day, it should either include entry for 4:00 or it shouldn't include entry for 12:00. Please check if it's correct. Commented Mar 18, 2022 at 20:15

1 Answer

You could pivot + ffill to fill in the missing data, then stack to get the DataFrame back into its previous shape:

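df.index.names = ['date', 'location', 'time']
# pivot arguments are passed by keyword; pandas 2.0+ no longer accepts them positionally
out = df.reset_index().pivot(index=['date', 'location'], columns='time', values='numberSold').ffill().stack().to_frame(name='numberSold')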

Output:

                     numberSold
date location time             
3/10 FL       12:00         4.0
              1:00          1.0
              4:00          5.0
3/11 FL       12:00         4.0
              1:00          2.0
              2:00          3.0
              3:00          0.0
              4:00          5.0
3/12 FL       12:00         4.0
              1:00          2.0
              2:00          6.0
              3:00          0.0
              4:00          5.0
              5:00          6.0
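
For a self-contained version that also runs on recent pandas releases, here is a minimal sketch. It rebuilds the question's data, uses unstack in place of reset_index + pivot (a swapped-in but equivalent reshaping step), and adds two things that are not in the snippet above: a groupby on location, on the assumption that values should not carry over between locations once there is more than one, and an explicit dropna, since whether stack drops missing values by default depends on the pandas version. The names records, wide and filled are only illustrative.

```python
import pandas as pd

# Data from the question, keyed by (date, location, time).
records = {
    ('3/10', 'FL', '12:00'): 4,
    ('3/10', 'FL', '1:00'): 1,
    ('3/10', 'FL', '4:00'): 5,
    ('3/11', 'FL', '1:00'): 2,
    ('3/11', 'FL', '2:00'): 3,
    ('3/11', 'FL', '3:00'): 0,
    ('3/12', 'FL', '2:00'): 6,
    ('3/12', 'FL', '5:00'): 6,
}
index = pd.MultiIndex.from_tuples(list(records),
                                  names=['date', 'location', 'time'])
df = pd.DataFrame({'numberSold': list(records.values())}, index=index)

# One row per (date, location), one column per time; missing slots become NaN.
wide = df['numberSold'].unstack('time')

# Carry each time slot's last known value forward through the dates.
# Assumption: fills should stay within a location, hence the groupby.
filled = wide.groupby(level='location').ffill()

# Back to long form; drop slots that have never had a value, restore integers.
out = (filled.stack()
             .dropna()
             .astype(int)
             .to_frame(name='numberSold'))
print(out)
```

This should give the same 14 rows as the output above, with integer values. Dropping the NaNs before astype(int) is what lets the column go back to integers; the output above shows floats because the pivoted table holds NaN in the missing slots.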