Constructing a new column with multi-condition in pandas & numpy

Question

I have been trying for hours to apply a simple multi-condition rule in pandas, but I keep hitting errors and cannot get the result I want, even though it feels like it should be simple.

I have this df:

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                   'country': ['US', 'UK', 'CA'],
                   'Age': [10, 20, 40]})

df1

looks like:

    name        country     Age
0   Sara             US     10
1   John             UK     20
2   Christine        CA     40

I want to apply several conditions at once: for example, if the Age is > 10 and the country is in a list of allowed countries, a result should appear in a new column.

What I did:

I tried np.where, but I got this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I also tried np.logical_and, which threw the same error, and I tried apply with a lambda, also without success.

allowed_countries = ['UK','India','Germany']


conditions  = [np.logical_and(df1['Age'] >= 10 , df1['country'] in allowed_countries), np.logical_and((df1['Age'] >= 10), (df1['country'] == 'CA'))]
choices     = [ "Allowed", 'Partially allowed']


df1['admission'] = np.select(conditions, choices, default=np.nan)

My original df has 50K rows, so I am looking for the most efficient way. Thanks.

I also suggested using an array to see if this helps; see the answer update. It may help on a larger dataset.
– MDR, Commented Aug 26, 2021 at 23:44

MDR · Accepted Answer · 2021-08-26 23:41:00Z

Maybe use isin for the membership test instead of in:

df1['country'].isin(allowed_countries)
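To illustrate why this swap matters: Python's `in` operator asks the list whether it contains the whole Series as one object, which compares the Series to each list item and then needs a single True/False answer that pandas refuses to guess. `Series.isin` tests each row instead. Below is a minimal sketch using the question's data; the names `mask` and `error_message` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                    'country': ['US', 'UK', 'CA'],
                    'Age': [10, 20, 40]})
allowed_countries = ['UK', 'India', 'Germany']

# `in` compares the whole Series with each list item, then tries to turn
# the element-wise result into one bool, which raises ValueError.
try:
    df1['country'] in allowed_countries
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# isin() checks membership row by row and returns a boolean Series.
mask = df1['country'].isin(allowed_countries)

print(error_message)
print(mask.tolist())  # [False, True, False]
```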

So the full example becomes:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                   'country': ['US', 'UK', 'CA'],
                   'Age': [10, 20, 40]})

allowed_countries = ['UK', 'India', 'Germany']

# isin() tests each row's country against the list element-wise
conditions  = [np.logical_and(df1['Age'] >= 10, df1['country'].isin(allowed_countries)),
               np.logical_and(df1['Age'] >= 10, df1['country'] == 'CA')]
choices     = ['Allowed', 'Partially allowed']

# The first matching condition wins; rows matching none get the default
df1['admission'] = np.select(conditions, choices, default=np.nan)

Output of df1:

    name       country  Age     admission
0   Sara       US       10      nan
1   John       UK       20      Allowed
2   Christine  CA       40      Partially allowed
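The same conditions can also be written with the `&` operator on boolean Series, which some find easier to read than nested `np.logical_and` calls. The sketch below is a variation, not the code from the answer: the `old_enough` name and the `'Not allowed'` default label are assumptions added for illustration. A string default is used because mixing string choices with a float NaN default may not be accepted by every NumPy version.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['Sara', 'John', 'Christine'],
                    'country': ['US', 'UK', 'CA'],
                    'Age': [10, 20, 40]})
allowed_countries = ['UK', 'India', 'Germany']

# Compute the shared age test once and reuse it in both conditions.
old_enough = df1['Age'] >= 10

# Parentheses around the == comparison are required, because & binds
# more tightly than comparison operators.
conditions = [
    old_enough & df1['country'].isin(allowed_countries),
    old_enough & (df1['country'] == 'CA'),
]
choices = ['Allowed', 'Partially allowed']

# 'Not allowed' is a hypothetical placeholder label for unmatched rows.
df1['admission'] = np.select(conditions, choices, default='Not allowed')

print(df1['admission'].tolist())
# ['Not allowed', 'Allowed', 'Partially allowed']
```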

Additional:

For a small tweak in speed, maybe try passing a NumPy array instead of a list:

allowed_countries = np.array(['UK','India','Germany'])

Timings compared to the list version, using the example data:

list:       2.12 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
np.array:   2.08 ms ± 85.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
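The timings above come from the three-row example, so they say little about a frame of the size mentioned in the question. One way to check is to repeat the comparison on synthetic data of about 50K rows. This is only a sketch: the random data, the seed, and the names `big`, `allowed_list` and `allowed_array` are assumptions, and the numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the data is repeatable
n_rows = 50_000                 # row count mentioned in the question

# Synthetic stand-in for the real data (hypothetical values).
big = pd.DataFrame({
    'country': rng.choice(['US', 'UK', 'CA', 'India', 'Germany'], size=n_rows),
    'Age': rng.integers(1, 80, size=n_rows),
})

allowed_list = ['UK', 'India', 'Germany']
allowed_array = np.array(allowed_list)

# Time the membership test with each container type.
repeats = 20
t_list = timeit.timeit(lambda: big['country'].isin(allowed_list), number=repeats)
t_array = timeit.timeit(lambda: big['country'].isin(allowed_array), number=repeats)

print(f"list:     {t_list / repeats * 1e3:.3f} ms per call")
print(f"np.array: {t_array / repeats * 1e3:.3f} ms per call")

# Both containers should select exactly the same rows.
same_result = big['country'].isin(allowed_list).equals(
    big['country'].isin(allowed_array))
print(same_result)
```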
