
I created a function that assigns customers to a "bucket" based on their annual purchase history. The function works as intended when I pass in individual values (curryear, lastyear). How do I pass all the values from two separate columns as curryear and lastyear?

When I try the following, I receive:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My code:

#FUNCTION FOR CATEGORIZING ANNUAL CUSTOMER PURCHASE BEHAVIOR
def bucket(curryear, lastyear):
    if ((lastyear > 0) & (curryear <= 0)):
        return 'Attrition'
    elif ((lastyear > curryear) & (curryear > 0)):
        return 'Organic Attrition'
    elif ((lastyear <= 0) & (curryear > 0)):
        return 'New Sales'
    elif ((curryear > lastyear) & (lastyear > 0)):
        return 'Organic Growth'
    elif ((lastyear == 0) & (curryear == 0)):
        return 'None'
    else:
        return 'Flat'

bucket(df['2019'],df['2018'])  
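A minimal reproduction of where the error comes from, assuming a small hypothetical DataFrame with string column names '2018' and '2019' (the real sample data is not shown here). A comparison on a Series returns a boolean Series, and Python's `if` has to collapse that into a single True/False, which pandas refuses to guess.

```python
import pandas as pd

# Hypothetical sample data; the question's real data is not available.
df = pd.DataFrame({"2018": [100, 0, 50], "2019": [0, 75, 50]})

mask = df["2018"] > 0          # element-wise comparison -> boolean Series
print(mask.tolist())           # [True, False, True]

try:
    if mask:                   # `if` calls bool(mask), which is ambiguous
        pass
except ValueError as err:
    message = str(err)
    print(message)             # "The truth value of a Series is ambiguous..."

# Reducing the Series explicitly yields a single boolean:
print(mask.any(), mask.all())  # True False
```

This is why the same function works for single numbers but fails when handed whole columns.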

Here is a sample of the data I am using: Sample Data

4 Answers


The error tells you exactly what is wrong: a test like "> 0" on a whole column is ambiguous, because you could mean "every value in the column is above 0" or "at least one value is above 0". Instead, you can apply the function you wrote to the individual values, row by row, like this:

def bucket(curryear, lastyear):
    if ((lastyear > 0) & (curryear <= 0)):
        return 'Attrition'
    elif ((lastyear > curryear) & (curryear > 0)):
        return 'Organic Attrition'
    elif ((lastyear <= 0) & (curryear > 0)):
        return 'New Sales'
    elif ((curryear > lastyear) & (lastyear > 0)):
        return 'Organic Growth'
    elif ((lastyear == 0) & (curryear == 0)):
        return 'None'
    else:
        return 'Flat'

df["bucket"] = df.apply(lambda x: bucket(x["2019"], x["2018"]), axis=1)
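A self-contained run of the row-wise apply approach, assuming hypothetical sample numbers chosen so that each bucket appears once. With axis=1, each row is passed to the lambda as a Series, so x["2019"] and x["2018"] are scalars and the plain if/elif chain works. Since the inputs are scalars, this sketch uses Python's `and` instead of `&`.

```python
import pandas as pd

def bucket(curryear, lastyear):
    # Scalar version: each argument is a single number.
    if lastyear > 0 and curryear <= 0:
        return 'Attrition'
    elif lastyear > curryear and curryear > 0:
        return 'Organic Attrition'
    elif lastyear <= 0 and curryear > 0:
        return 'New Sales'
    elif curryear > lastyear and lastyear > 0:
        return 'Organic Growth'
    elif lastyear == 0 and curryear == 0:
        return 'None'
    else:
        return 'Flat'

# Hypothetical sample data, one row per customer.
df = pd.DataFrame({"2018": [100, 80, 0, 40, 0, 30],
                   "2019": [0, 50, 75, 90, 0, 30]})

# axis=1 calls the lambda once per row.
df["bucket"] = df.apply(lambda x: bucket(x["2019"], x["2018"]), axis=1)
print(df["bucket"].tolist())
# ['Attrition', 'Organic Attrition', 'New Sales', 'Organic Growth', 'None', 'Flat']
```

This is easy to read, but apply with axis=1 runs a Python function per row, so it can be slow on large frames.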

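pandas has a method called "apply" that accepts a lambda function; with axis=1 it calls the lambda once per row. Because the column names start with a digit, they can't be used as attributes (x.2019 is a syntax error), so use bracket indexing:

df['bucket'] = df.apply(lambda x: bucket(x['2019'], x['2018']), axis=1)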


Rewrite your function so that it is vectorized, operating on whole columns at once instead of on single values:

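import numpy as np

def bucket(curryear, lastyear):
    # curryear/lastyear are whole columns; every comparison is element-wise
    ly_pos, cy_pos = lastyear > 0, curryear > 0
    # np.select uses the first condition that is true for each row
    out = np.select([ly_pos & ~cy_pos, (lastyear > curryear) & cy_pos,
                     ~ly_pos & cy_pos, (curryear > lastyear) & ly_pos,
                     (lastyear == 0) & (curryear == 0)],
                    ['Attrition', 'Organic Attrition',
                     'New Sales', 'Organic Growth', 'None'],
                    default='Flat')
    return out

df['bucket'] = bucket(df['2019'], df['2018'])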
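A quick check of the np.select version, assuming the same kind of hypothetical sample numbers as above. np.select picks, for each row, the choice paired with the first condition that is true, which mirrors the order of the original if/elif chain; rows that match nothing get the default 'Flat'. The function is renamed bucket_vec here only to keep it distinct from the scalar version.

```python
import numpy as np
import pandas as pd

def bucket_vec(curryear, lastyear):
    # Vectorized: curryear and lastyear are whole columns (Series).
    ly_pos, cy_pos = lastyear > 0, curryear > 0
    conditions = [
        ly_pos & ~cy_pos,                    # Attrition
        (lastyear > curryear) & cy_pos,      # Organic Attrition
        ~ly_pos & cy_pos,                    # New Sales
        (curryear > lastyear) & ly_pos,      # Organic Growth
        (lastyear == 0) & (curryear == 0),   # None
    ]
    choices = ['Attrition', 'Organic Attrition', 'New Sales',
               'Organic Growth', 'None']
    return np.select(conditions, choices, default='Flat')

# Hypothetical sample data, one row per customer.
df = pd.DataFrame({"2018": [100, 80, 0, 40, 0, 30],
                   "2019": [0, 50, 75, 90, 0, 30]})

df["bucket"] = bucket_vec(df["2019"], df["2018"])
print(df["bucket"].tolist())
# ['Attrition', 'Organic Attrition', 'New Sales', 'Organic Growth', 'None', 'Flat']
```

Because every condition is evaluated on whole arrays, there is no per-row Python call, which is what makes this faster than apply on large frames.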


@mabergerx This worked, big thanks. This is how I executed it, which I realize is probably inefficient:

df['2013 Bucket'] = df.apply(lambda x: bucket(x["2013"], x["2012"]), axis=1)
df['2014 Bucket'] = df.apply(lambda x: bucket(x["2014"], x["2013"]), axis=1)
df['2015 Bucket'] = df.apply(lambda x: bucket(x["2015"], x["2014"]), axis=1)
df['2016 Bucket'] = df.apply(lambda x: bucket(x["2016"], x["2015"]), axis=1)
df['2017 Bucket'] = df.apply(lambda x: bucket(x["2017"], x["2016"]), axis=1)
df['2018 Bucket'] = df.apply(lambda x: bucket(x["2018"], x["2017"]), axis=1)
df['2019 Bucket'] = df.apply(lambda x: bucket(x["2019"], x["2018"]), axis=1)
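The seven near-identical lines can be generated by looping over consecutive year pairs. This is a sketch, assuming the year columns are named '2012' through '2019' as strings (as in the lines above) and reusing the vectorized np.select rules from the earlier answer (renamed bucket_vec here), so each bucket column is computed on whole columns instead of row by row. The data is hypothetical.

```python
import numpy as np
import pandas as pd

def bucket_vec(curryear, lastyear):
    # Same rules as the np.select answer, applied to whole columns.
    ly_pos, cy_pos = lastyear > 0, curryear > 0
    conditions = [ly_pos & ~cy_pos, (lastyear > curryear) & cy_pos,
                  ~ly_pos & cy_pos, (curryear > lastyear) & ly_pos,
                  (lastyear == 0) & (curryear == 0)]
    choices = ['Attrition', 'Organic Attrition', 'New Sales',
               'Organic Growth', 'None']
    return np.select(conditions, choices, default='Flat')

years = [str(y) for y in range(2012, 2020)]

# Hypothetical data: one row per customer, one column per year.
df = pd.DataFrame([[0, 100, 100, 50, 0, 0, 80, 120],
                   [200, 150, 0, 0, 60, 60, 90, 30]], columns=years)

# Pair each year with the one before it: ('2012', '2013'), ('2013', '2014'), ...
for prev, curr in zip(years[:-1], years[1:]):
    df[f"{curr} Bucket"] = bucket_vec(df[curr], df[prev])

print(df.filter(like="Bucket").iloc[0].tolist())
# ['New Sales', 'Flat', 'Organic Attrition', 'Attrition', 'None', 'New Sales', 'Organic Growth']
```

If you prefer to keep the scalar function, the same loop works with df.apply(lambda x: bucket(x[curr], x[prev]), axis=1) as the right-hand side, since apply runs immediately inside each iteration.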
