
I have a DataFrame that looks like this:

customerId Date         Amount_Spent
123        01/01/2018   500
456        01/01/2018   250
123        02/01/2018   300
456        02/01/2018   100

I want to count the customers (distinct or non-distinct) who spent more than 200 on each of two consecutive days.

So I expect to get

customerId Date1        Date2         Total_Amount_Spent
123        01/01/2018   02/01/2018    800

Can someone help me with this?

1 Answer

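There are two checks: one verifies that the customer's first and last dates are exactly one day apart, and the other uses all() to verify that every amount in the group is greater than 100 (use 200 to match the threshold in the question). When both conditions are satisfied, we select the ID.

# Assumes df['Date'] is already datetime64, e.g. via pd.to_datetime(df['Date'], dayfirst=True)
s=df.groupby('customerId').agg({'Date':lambda x : (x.iloc[0]-x.iloc[-1]).days==-1,'Amount_Spent':lambda x : (x>100).all()}).all(axis=1)
# s is a boolean Series per customer; s[s].index keeps only the IDs where both checks are True
newdf=df.loc[df.customerId.isin(s[s].index),]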
newdf
Out[1242]:
   customerId       Date  Amount_Spent
0         123 2018-01-01           500
2         123 2018-01-02           300
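For readers who want to run the filter end to end, here is a minimal self-contained sketch of the same two checks. It is not the answerer's exact code: it swaps the lambdas for built-in min/max reductions, the name THRESHOLD is made up for illustration, and it uses the question's limit of 200. Like the code above, it only compares each customer's first and last date, so it assumes each customer has rows for just the days of interest.

```python
import pandas as pd

# Sample data copied from the question; dates are read as day/month/year.
df = pd.DataFrame({
    "customerId": [123, 456, 123, 456],
    "Date": ["01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"],
    "Amount_Spent": [500, 250, 300, 100],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

THRESHOLD = 200  # hypothetical name; the limit stated in the question

grouped = df.groupby("customerId")

# Check 1: the last date is exactly one day after the first date.
consecutive = (grouped["Date"].max() - grouped["Date"].min()) == pd.Timedelta(days=1)

# Check 2: every amount exceeds the threshold (true when the minimum does).
all_above = grouped["Amount_Spent"].min() > THRESHOLD

# Keep only the customers for which both checks hold.
mask = consecutive & all_above
qualifying_ids = mask[mask].index
newdf = df[df["customerId"].isin(qualifying_ids)]
print(newdf)
```

Here len(qualifying_ids) gives the number of distinct qualifying customers.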

Use groupby + agg again to get the format you need:

newdf.groupby('customerId').agg({'Date':['first','last'],'Amount_Spent':'sum'})
Out[1244]: 
                 Date            Amount_Spent
                first       last          sum
customerId                                   
123        2018-01-01 2018-01-02          800
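If flat column names matching the expected output in the question (Date1, Date2, Total_Amount_Spent) are preferred over the two-level header, named aggregation is one option. The sketch below assumes a pandas version that supports named aggregation (0.25 or later) and rebuilds a small stand-in for newdf so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the filtered frame (newdf) shown above.
newdf = pd.DataFrame({
    "customerId": [123, 123],
    "Date": pd.to_datetime(["2018-01-01", "2018-01-02"]),
    "Amount_Spent": [500, 300],
})

# Each keyword becomes an output column: name=(source column, aggregation).
summary = (
    newdf.groupby("customerId")
    .agg(
        Date1=("Date", "first"),
        Date2=("Date", "last"),
        Total_Amount_Spent=("Amount_Spent", "sum"),
    )
    .reset_index()
)
print(summary)
```

The number of rows in summary is the count of distinct customers that passed the filter.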

3 Comments

@w-b Can you please provide some explanation of what your first code block is doing?
It checks whether the two dates are consecutive, and whether all the values in the group are greater than 100.
@w-b Can you please check your code? It's not working. I tried to fix it but had no luck. I'm trying to understand how to use a lambda in agg().
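On the lambda question: as far as I understand it, a function given for a column in a grouped aggregation is called once per group, receives that group's values for the column as a Series, and should return a single value. The sketch below makes that visible by replacing the two lambdas with named functions (the names dates_one_day_apart and all_above_100 are made up here) and applying each one to its column with apply, rather than through the dict form of agg used in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "customerId": [123, 456, 123, 456],
    "Date": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"]
    ),
    "Amount_Spent": [500, 250, 300, 100],
})

def dates_one_day_apart(dates):
    # `dates` is the Date column of one customer, as a Series.
    return (dates.iloc[-1] - dates.iloc[0]).days == 1

def all_above_100(amounts):
    # `amounts` is the Amount_Spent column of one customer, as a Series.
    return bool((amounts > 100).all())

grouped = df.groupby("customerId")

# One boolean per customer for each check.
checks = pd.DataFrame({
    "dates_ok": grouped["Date"].apply(dates_one_day_apart),
    "amounts_ok": grouped["Amount_Spent"].apply(all_above_100),
})
print(checks)
```

A customer qualifies only when both columns of checks are True, which is what the trailing all over axis 1 in the answer expresses.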
