
I have been trying for hours to apply a simple multi-condition rule in pandas, but I keep getting errors and cannot achieve what I want, although I feel it should be pretty simple!

I have this df:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                   'country': ['US', 'UK', 'CA'],
                   'Age': [10, 20, 40]})

df1

which looks like:

    name        country     Age
0   Sara             US     10
1   John             UK     20
2   Christine        CA     40

I want to apply multiple conditions: for example, if Age is >= 10 and country is in a list of allowed countries, a label should appear in a new column.

What I did:

I tried to use np.where, but I got the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I also tried np.logical_and and got the same error, and I tried apply with a lambda, also without success.

allowed_countries = ['UK','India','Germany']


conditions  = [np.logical_and(df1['Age'] >= 10 , df1['country'] in allowed_countries), np.logical_and((df1['Age'] >= 10), (df1['country'] == 'CA'))]
choices     = [ "Allowed", 'Partially allowed']


df1['admission'] = np.select(conditions, choices, default=np.nan)
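The error in the code above comes from df1['country'] in allowed_countries. Python's in operator on a list compares list elements with the whole Series, which produces a boolean Series, and in then needs a single True/False from it, so bool(Series) raises the ValueError. A minimal sketch reproducing this on the example data, next to Series.isin, which performs the membership test element by element:

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                    'country': ['US', 'UK', 'CA'],
                    'Age': [10, 20, 40]})
allowed_countries = ['UK', 'India', 'Germany']

# `in` needs one bool: comparing a list element with the whole Series
# gives a boolean Series, and bool(Series) raises ValueError.
try:
    df1['country'] in allowed_countries
    in_raised = False
except ValueError as exc:
    in_raised = True
    print(exc)

# Series.isin checks membership row by row and returns a boolean Series.
mask = df1['country'].isin(allowed_countries)
print(mask.tolist())  # [False, True, False]
```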

My original df has 50K rows, so I am looking for the most efficient way. Thanks!

  • I also suggested using an array to see if this helps - see the answer update. Over a larger dataset it may help. Commented Aug 26, 2021 at 23:44
  • @MDR Thank you, it works perfectly with np.array Commented Aug 26, 2021 at 23:51

1 Answer

Maybe use Series.isin instead of the Python in operator:

df1['country'].isin(allowed_countries)
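Because isin returns a boolean Series, it can also be combined with pandas' element-wise & operator instead of np.logical_and. The parentheses around each comparison are required, since & binds more tightly than >= and ==. A minimal sketch on the example data (the mask names are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                    'country': ['US', 'UK', 'CA'],
                    'Age': [10, 20, 40]})
allowed_countries = ['UK', 'India', 'Germany']

# Each comparison is parenthesised because `&` has higher precedence than `>=`/`==`.
allowed_mask = (df1['Age'] >= 10) & df1['country'].isin(allowed_countries)
partial_mask = (df1['Age'] >= 10) & (df1['country'] == 'CA')

print(allowed_mask.tolist())  # [False, True, False]
print(partial_mask.tolist())  # [False, False, True]
```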

So...

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                   'country': ['US', 'UK', 'CA'],
                   'Age': [10, 20, 40]})

allowed_countries = ['UK', 'India', 'Germany']

conditions = [
    # Age >= 10 and country in the allowed list -> "Allowed"
    np.logical_and(df1['Age'] >= 10, df1['country'].isin(allowed_countries)),
    # Age >= 10 and country is CA -> "Partially allowed"
    np.logical_and(df1['Age'] >= 10, df1['country'] == 'CA'),
]
choices = ['Allowed', 'Partially allowed']

df1['admission'] = np.select(conditions, choices, default=np.nan)

Output of df1:

    name       country  Age     admission
0   Sara       US       10      nan
1   John       UK       20      Allowed
2   Christine  CA       40      Partially allowed
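Depending on the NumPy version, default=np.nan combined with string choices may not behave as shown: older versions silently turn NaN into the string 'nan' (which is what appears in the output above), while newer versions can raise a TypeError because str and float have no common dtype. A sketch that sidesteps the mixed dtypes, assuming a missing value (None) is acceptable for rows that match no condition, and using & in place of np.logical_and:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                    'country': ['US', 'UK', 'CA'],
                    'Age': [10, 20, 40]})
allowed_countries = ['UK', 'India', 'Germany']

conditions = [
    (df1['Age'] >= 10) & df1['country'].isin(allowed_countries),
    (df1['Age'] >= 10) & (df1['country'] == 'CA'),
]
choices = ['Allowed', 'Partially allowed']

# default=None gives an object result, so no str/float promotion is needed;
# rows matching no condition get None, which pandas treats as missing.
df1['admission'] = np.select(conditions, choices, default=None)
print(df1)
```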

Additional:

For a small speed tweak, maybe try a NumPy array instead of a list:

allowed_countries = np.array(['UK','India','Germany'])

Compared to the list for the example data:

list:       2.12 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
np.array:   2.08 ms ± 85.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
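The gap between the two timings above is smaller than their standard deviations, so on three rows it is effectively noise. Since the question mentions 50K rows, one rough way to check on a larger frame is a timeit comparison like the sketch below. The frame (random countries and ages from a fixed seed) is synthetic, made up only for illustration, and the printed numbers will vary by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 50K-row frame, generated only to have something larger to time.
rng = np.random.default_rng(0)
n = 50_000
df_big = pd.DataFrame({
    'country': rng.choice(['US', 'UK', 'CA', 'India', 'Germany'], size=n),
    'Age': rng.integers(0, 80, size=n),
})

allowed_list = ['UK', 'India', 'Germany']
allowed_array = np.array(allowed_list)


def mask_with(allowed):
    """Boolean mask for Age >= 10 and country in `allowed`."""
    return (df_big['Age'] >= 10) & df_big['country'].isin(allowed)


# Both variants must select exactly the same rows before comparing speed.
same = mask_with(allowed_list).equals(mask_with(allowed_array))

for label, allowed in [('list', allowed_list), ('np.array', allowed_array)]:
    runs = 20
    seconds = timeit.timeit(lambda: mask_with(allowed), number=runs)
    print(f'{label:9s} {seconds / runs * 1e3:.3f} ms per call')
```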