2

I have a pandas dataframe that incorporates dates, customers, items, and then dollar value for purchases.

   date     customer   product   amt  
 1/1/2017   tim        apple       3  
 1/1/2017   jim        melon       2  
 1/1/2017   tom        apple       5  
 1/1/2017   tom        melon       4  
 1/4/2017   tim        melon       3  
 1/4/2017   jim        apple       2  
 1/4/2017   tom        melon       1  
 1/4/2017   tom        orange      4  

I'm trying to just look at performance, but I want to forward fill all dates from my min and max date range, and also fill for each customer for each product

Something like:

   date     customer   product   amt  
 1/1/2017   tim        apple       3  
 1/1/2017   tim        melon       0  
 1/1/2017   tim        orange      0  
 1/1/2017   jim        melon       2  
 1/1/2017   jim        apple       0  
 1/1/2017   jim        orange      0  
 1/1/2017   tom        apple       5  
 1/1/2017   tom        melon       4  
 1/1/2017   tom        orange      0  
 1/2/2017   tim        apple       0  
 1/2/2017   tim        melon       0  
 1/2/2017   tim        orange      0  
 1/2/2017   jim        melon       0  
 1/2/2017   jim        apple       0  
 1/2/2017   jim        orange      0  
 1/2/2017   tom        apple       0  
 1/2/2017   tom        melon       0  
 1/2/2017   tom        orange      0  
 1/3/2017   tim        apple       0  
 1/3/2017   tim        melon       0  
 1/3/2017   tim        orange      0  
 1/3/2017   jim        melon       0  
 1/3/2017   jim        apple       0  
 1/3/2017   jim        orange      0  
 1/3/2017   tom        apple       0  
 1/3/2017   tom        melon       0  
 1/3/2017   tom        orange      0  
 1/4/2017   tim        melon       3  
 1/4/2017   tim        apple       0  
 1/4/2017   tim        orange      0  
 1/4/2017   jim        apple       2  
 1/4/2017   jim        melon       0  
 1/4/2017   jim        orange      0  
 1/4/2017   tom        melon       1  
 1/4/2017   tom        orange      4  
 1/4/2017   tom        apple       0  

I know that I can create a reindex based off of the max and min dates, but this also makes my customer and product values 0. Is there any other way to go about this? Am I missing a step or something? Appreciate the help

2
  • You want to use the ffill method as in this answer Commented Dec 18, 2017 at 21:50
  • however this doesn't fill with dates? and doesn't keep the columns customer and product Commented Dec 18, 2017 at 21:58

4 Answers 4

2

Notice ,this using the stack and unstack couple of times

df.set_index(['date','customer','product']).amt.unstack(-3).\
  reindex(columns=pd.date_range(df['date'].min(), 
    df['date'].max()),fill_value=0).\
      stack(dropna=False).unstack().stack(dropna=False).\
        unstack('customer').stack(dropna=False).reset_index().\
          fillna(0).sort_values(['level_1','customer','product'])
Out[314]: 
   product    level_1 customer    0
0    apple 2017-01-01      jim  0.0
12   melon 2017-01-01      jim  2.0
24  orange 2017-01-01      jim  0.0
1    apple 2017-01-01      tim  3.0
13   melon 2017-01-01      tim  0.0
25  orange 2017-01-01      tim  0.0
2    apple 2017-01-01      tom  5.0
14   melon 2017-01-01      tom  4.0
26  orange 2017-01-01      tom  0.0
3    apple 2017-01-02      jim  0.0
15   melon 2017-01-02      jim  0.0
27  orange 2017-01-02      jim  0.0
4    apple 2017-01-02      tim  0.0
16   melon 2017-01-02      tim  0.0
28  orange 2017-01-02      tim  0.0
5    apple 2017-01-02      tom  0.0
17   melon 2017-01-02      tom  0.0
29  orange 2017-01-02      tom  0.0
6    apple 2017-01-03      jim  0.0
18   melon 2017-01-03      jim  0.0
30  orange 2017-01-03      jim  0.0
7    apple 2017-01-03      tim  0.0
19   melon 2017-01-03      tim  0.0
31  orange 2017-01-03      tim  0.0
8    apple 2017-01-03      tom  0.0
20   melon 2017-01-03      tom  0.0
32  orange 2017-01-03      tom  0.0
9    apple 2017-01-04      jim  2.0
21   melon 2017-01-04      jim  0.0
33  orange 2017-01-04      jim  0.0
10   apple 2017-01-04      tim  0.0
22   melon 2017-01-04      tim  3.0
34  orange 2017-01-04      tim  0.0
11   apple 2017-01-04      tom  0.0
23   melon 2017-01-04      tom  1.0
35  orange 2017-01-04      tom  4.0
Sign up to request clarification or add additional context in comments.

11 Comments

@DickThompson ran it what kind of error code you received ?
@DickThompson sorry typo, change | to \ , you will get teh result
ah ok- and what would I do if I had another column classifier? like payment type? Just add to the indexes?
@DickThompson yes and you may need another round of stack and unstack
this is what I did: df.set_index(['date','customer','product','type']).amt.unstack(-4).\ reindex(columns=pd.date_range(df['date'].min(), df['date'].max()),fill_value=0).\ stack(dropna=False).unstack().stack(dropna=False).\ unstack('customer').stack(dropna=False).reset_index().\ unstack('product').stack(dropna=False).reset_index().\ fillna(0).sort_values(['level_1','customer','product','type']) but I got a stacking error. Is there anything else I need to add?
|
1

IIUC you can do it this way:

In [63]: dates = pd.date_range(df['date'].min(), df['date'].max())

In [64]: idx = pd.MultiIndex.from_product((dates,
                                           df['customer'].unique(), 
                                           df['product'].unique()))

In [72]: (df.set_index(['date','customer','product'])
            .reindex(idx, fill_value=0)
            .reset_index()
            .set_axis(df.columns, axis=1, inplace=False))
Out[72]:
         date customer product  amt
0  2017-01-01      tim   apple    3
1  2017-01-01      tim   melon    0
2  2017-01-01      tim  orange    0
3  2017-01-01      jim   apple    0
4  2017-01-01      jim   melon    2
5  2017-01-01      jim  orange    0
6  2017-01-01      tom   apple    5
7  2017-01-01      tom   melon    4
8  2017-01-01      tom  orange    0
9  2017-01-02      tim   apple    0
..        ...      ...     ...  ...
26 2017-01-03      tom  orange    0
27 2017-01-04      tim   apple    0
28 2017-01-04      tim   melon    3
29 2017-01-04      tim  orange    0
30 2017-01-04      jim   apple    2
31 2017-01-04      jim   melon    0
32 2017-01-04      jim  orange    0
33 2017-01-04      tom   apple    0
34 2017-01-04      tom   melon    1
35 2017-01-04      tom  orange    4

[36 rows x 4 columns]

Comments

1

Maybe because of my SQL mindset, consider a left join merge on an expanded helper dataframe:

helper_df_list = [pd.DataFrame({'date': pd.date_range(df['date'].min(), df['date'].max()), 
                                'customer': c, 'product': p }) 
                    for c in df['customer'].unique() 
                            for p in df['product'].unique()]

helper_df = pd.concat(helper_df_list, ignore_index=True)

final_df = pd.merge(helper_df, df, on=['date', 'customer', 'product'], how='left')\
                    .fillna(0).sort_values(['date', 'customer']).reset_index(drop=True)

Output

print(final_df)
#    customer       date product  amt
# 0       jim 2017-01-01   apple  0.0
# 1       jim 2017-01-01   melon  2.0
# 2       jim 2017-01-01  orange  0.0
# 3       tim 2017-01-01   apple  3.0
# 4       tim 2017-01-01   melon  0.0
# 5       tim 2017-01-01  orange  0.0
# 6       tom 2017-01-01   apple  5.0
# 7       tom 2017-01-01   melon  4.0
# 8       tom 2017-01-01  orange  0.0
# 9       jim 2017-01-02   apple  0.0
# 10      jim 2017-01-02   melon  0.0
# 11      jim 2017-01-02  orange  0.0
# 12      tim 2017-01-02   apple  0.0
# 13      tim 2017-01-02   melon  0.0
# 14      tim 2017-01-02  orange  0.0
# 15      tom 2017-01-02   apple  0.0
# 16      tom 2017-01-02   melon  0.0
# 17      tom 2017-01-02  orange  0.0
# 18      jim 2017-01-03   apple  0.0
# 19      jim 2017-01-03   melon  0.0
# 20      jim 2017-01-03  orange  0.0
# 21      tim 2017-01-03   apple  0.0
# 22      tim 2017-01-03   melon  0.0
# 23      tim 2017-01-03  orange  0.0
# 24      tom 2017-01-03   apple  0.0
# 25      tom 2017-01-03   melon  0.0
# 26      tom 2017-01-03  orange  0.0
# 27      jim 2017-01-04   apple  2.0
# 28      jim 2017-01-04   melon  0.0
# 29      jim 2017-01-04  orange  0.0
# 30      tim 2017-01-04   apple  0.0
# 31      tim 2017-01-04   melon  3.0
# 32      tim 2017-01-04  orange  0.0
# 33      tom 2017-01-04   apple  0.0
# 34      tom 2017-01-04   melon  1.0
# 35      tom 2017-01-04  orange  4.0

Comments

1

Let's use product from itertools, pd.date_range, and merge:

from itertools import product

daterange = pd.date_range(df['date'].min(), df['date'].max(), freq='D')
d1 = pd.DataFrame(list(product(daterange, 
                               df['customer'].unique(),
                               df['product'].unique())), 
                  columns=['date', 'customer', 'product'])
d1.merge(df, on=['date', 'customer', 'product'], how='left').fillna(0)

Output:

         date customer product  amt
0  2017-01-01      tim   apple  3.0
1  2017-01-01      tim   melon  0.0
2  2017-01-01      tim  orange  0.0
3  2017-01-01      jim   apple  0.0
4  2017-01-01      jim   melon  2.0
5  2017-01-01      jim  orange  0.0
6  2017-01-01      tom   apple  5.0
7  2017-01-01      tom   melon  4.0
8  2017-01-01      tom  orange  0.0
9  2017-01-02      tim   apple  0.0
10 2017-01-02      tim   melon  0.0
11 2017-01-02      tim  orange  0.0
12 2017-01-02      jim   apple  0.0
13 2017-01-02      jim   melon  0.0
14 2017-01-02      jim  orange  0.0
15 2017-01-02      tom   apple  0.0
16 2017-01-02      tom   melon  0.0
17 2017-01-02      tom  orange  0.0
18 2017-01-03      tim   apple  0.0
19 2017-01-03      tim   melon  0.0
20 2017-01-03      tim  orange  0.0
21 2017-01-03      jim   apple  0.0
22 2017-01-03      jim   melon  0.0
23 2017-01-03      jim  orange  0.0
24 2017-01-03      tom   apple  0.0
25 2017-01-03      tom   melon  0.0
26 2017-01-03      tom  orange  0.0
27 2017-01-04      tim   apple  0.0
28 2017-01-04      tim   melon  3.0
29 2017-01-04      tim  orange  0.0
30 2017-01-04      jim   apple  2.0
31 2017-01-04      jim   melon  0.0
32 2017-01-04      jim  orange  0.0
33 2017-01-04      tom   apple  0.0
34 2017-01-04      tom   melon  1.0
35 2017-01-04      tom  orange  4.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.