I have a shipping records table with approximately 100K rows. For each row, I want to calculate the quantity of that row's material that was shipped in the preceding 30 days. As the example below shows, the calculated quantity depends on the material and the shipping date. I wrote the basic filter below, but couldn't find a way to apply it to all rows.

df[(df['malzeme']==material) & (df['cikistarihi'] < shippingDate) & (df['cikistarihi'] >= (shippingDate-30))]['qty'].sum()
material  shippingDate  qty  shipped qtys in last 30 days
A         23.01.2019      8  0
A         28.01.2019     41  8
A         31.01.2019     66  49 (8+41)
A         20.03.2019     67  0
B         17.02.2019     53  0
B         26.02.2019     35  53
B         11.03.2019      4  88 (53+35)
B         20.03.2019     67  39 (35+4)

1 Answer

You can use .groupby with .rolling:

import pandas as pd

# convert shippingDate to datetime:
df["shippingDate"] = pd.to_datetime(df["shippingDate"], dayfirst=True)

# sort the values (if they aren't already)
df = df.sort_values(["material", "shippingDate"])

df["shipped qtys in last 30 days"] = (
    df.groupby("material")
    .rolling("30D", on="shippingDate", closed="left")["qty"]
    .sum()
    .fillna(0)
    .values
)
print(df)

Prints:

  material shippingDate  qty  shipped qtys in last 30 days
0        A   2019-01-23    8                           0.0
1        A   2019-01-28   41                           8.0
2        A   2019-01-31   66                          49.0
3        A   2019-03-20   67                           0.0
4        B   2019-02-17   53                           0.0
5        B   2019-02-26   35                          53.0
6        B   2019-03-11    4                          88.0
7        B   2019-03-20   67                          39.0

EDIT: Add .sort_values() before groupby
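
As a sanity check, the rolling result can be compared against the row-by-row filter from the question. The sketch below rebuilds the sample data from the question and assumes the dates are real datetimes, so the 30-day offset is written as pd.Timedelta(days=30). The helper name shipped_last_30_days is made up for illustration, and the row-wise version is only meant for small samples since it rescans the whole frame for every row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "material": list("AAAABBBB"),
    "shippingDate": ["23.01.2019", "28.01.2019", "31.01.2019", "20.03.2019",
                     "17.02.2019", "26.02.2019", "11.03.2019", "20.03.2019"],
    "qty": [8, 41, 66, 67, 53, 35, 4, 67],
})
df["shippingDate"] = pd.to_datetime(df["shippingDate"], dayfirst=True)
df = df.sort_values(["material", "shippingDate"])


def shipped_last_30_days(row):
    """Hypothetical helper: the question's filter applied to a single row."""
    window_start = row["shippingDate"] - pd.Timedelta(days=30)
    mask = (
        (df["material"] == row["material"])
        & (df["shippingDate"] < row["shippingDate"])   # exclude the row itself
        & (df["shippingDate"] >= window_start)
    )
    return df.loc[mask, "qty"].sum()


# Slow reference version: one full scan of the frame per row.
brute_force = df.apply(shipped_last_30_days, axis=1)

# Fast version from the answer: window [date - 30 days, date) per material.
rolled = (
    df.groupby("material")
    .rolling("30D", on="shippingDate", closed="left")["qty"]
    .sum()
    .fillna(0)
    .values
)

print(list(brute_force) == list(rolled))
```

Both versions use a window that includes the date 30 days back and excludes the current row, which is what closed="left" expresses in the rolling call.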

2 Comments

Thank you for your quick response, but I think I am missing something. When I changed the dates and their order, the results are wrong.

data = {'material': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'shippingDate': ['2019-01-23', '2019-01-28', '2019-01-31', '2019-01-20',
                         '2019-01-23', '2019-03-28', '2019-03-11', '2019-03-20'],
        'qty': [8, 41, 66, 67, 53, 35, 4, 67]}

This prints:

  material shippingdate  qty  shipped30d
1        A   2019-01-28   41       182.0
3        A   2019-01-20   67       115.0
5        B   2019-03-28   35         0.0
7        B   2019-03-20   67        39.0
@CanerU I think the logic is perfectly fine. You just need to make sure the dates are in chronological order by running df = df.sort_values(['material', 'shippingDate']) before the main computation.
