Combine if statement with apply in python

Question

I'm new to Python. I am trying to figure out the best way to create a new column whose value depends on other columns. Ideally, the code would look like this:

df['new'] = np.where(df['Country'] == 'CA', df['x'], df['y'])

I do not think this works because it seems to compare the entire column at once instead of one row at a time. I tried to do the same thing with apply but had trouble with the syntax:

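df['my_col'] = df.apply(
    # A lambda body must be a single expression, so the if/else
    # has to be written as a conditional expression.
    lambda row: row.x if row.country == 'CA' else row.y,
    axis=1)  # axis=1 passes one row at a time to the lambda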

I feel like there must be an easier way.
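np.where does operate on whole columns, but element-wise: the comparison produces one True/False per row, and np.where picks from df['x'] where it is True and from df['y'] where it is False. A minimal sketch, assuming a small made-up frame whose column names mirror the question; it also shows that column labels are case-sensitive, so mixing 'Country' and 'country' raises a KeyError, which is one plausible cause of the failure described here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names mirror the question.
df = pd.DataFrame({'Country': ['CA', 'US', 'CA'],
                   'x': [1, 2, 3],
                   'y': [10, 20, 30]})

# The comparison yields a boolean Series with one value per row...
cond = df['Country'] == 'CA'
print(cond.tolist())  # [True, False, True]

# ...and np.where picks x or y row by row based on it.
df['new'] = np.where(cond, df['x'], df['y'])
print(df['new'].tolist())  # [1, 20, 3]

# Column labels are case-sensitive: 'country' does not exist in this frame.
try:
    df['country']
except KeyError as err:
    print('KeyError:', err)
```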

There is nothing wrong with your np.where code. Check that the syntax of your actual code is the same as what you posted here. And don't use your second block of code. If you are new to Python and pandas, familiarize yourself with vectorized methods. – David Erickson, commented May 27, 2022 at 0:35
Your lambda should be lambda row: row.x if row.country == 'CA' else row.y, but the np.where version should work. Remember that a lambda should have no side effects: it is just an expression that returns a value. – Tim Roberts, commented May 27, 2022 at 0:38
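To illustrate the point about lambdas being expressions: the conditional expression evaluates to either row.x or row.y, and apply with axis=1 stores whatever the function returns. The named function pick_value below is a hypothetical equivalent of that lambda, included only to show that both versions return a value rather than assigning one. The sample data is made up and uses the lowercase 'country' label from the lambda attempt.

```python
import pandas as pd

# Hypothetical sample data using the lowercase 'country' label.
df = pd.DataFrame({'country': ['CA', 'US', 'CA'],
                   'x': [1, 2, 3],
                   'y': [10, 20, 30]})

# Conditional expression: evaluates to row.x or row.y, with no side effects.
df['via_lambda'] = df.apply(lambda row: row.x if row.country == 'CA' else row.y, axis=1)

# Hypothetical named-function equivalent of the lambda above.
def pick_value(row):
    if row.country == 'CA':
        return row.x
    return row.y

df['via_def'] = df.apply(pick_value, axis=1)
print(df)
```

A def is handy once the logic grows beyond one condition, but for a simple two-way choice the vectorized np.where is usually preferable to a row-wise apply.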
That error could not have been raised unless df isn't a data frame or those columns have nested objects. Please provide a reproducible example. – Parfait, commented May 27, 2022 at 0:39

constantstranger · Accepted Answer · 2022-05-27 00:47:44Z

Any of these three approaches (np.where, apply, or a boolean mask with .loc) works:

df['where'] = np.where(df.country=='CA', df.x, df.y)
df['apply'] = df.apply(lambda row: row.x if row.country == 'CA' else row.y, axis=1)
mask = df.country=='CA'
df.loc[mask, 'mask'] = df.loc[mask, 'x']
df.loc[~mask, 'mask'] = df.loc[~mask, 'y']

Full test code:

import pandas as pd
import numpy as np
df = pd.DataFrame({'country':['CA','US','CA','UK','CA'], 'x':[1,2,3,4,5], 'y':[6,7,8,9,10]})
print(df)

df['where'] = np.where(df.country=='CA', df.x, df.y)
df['apply'] = df.apply(lambda row: row.x if row.country == 'CA' else row.y, axis=1)
mask = df.country=='CA'
df.loc[mask, 'mask'] = df.loc[mask, 'x']
df.loc[~mask, 'mask'] = df.loc[~mask, 'y']
print(df)

Input:

  country  x   y
0      CA  1   6
1      US  2   7
2      CA  3   8
3      UK  4   9
4      CA  5  10

Output:

  country  x   y  where  apply  mask
0      CA  1   6      1      1   1.0
1      US  2   7      7      7   7.0
2      CA  3   8      3      3   3.0
3      UK  4   9      9      9   9.0
4      CA  5  10      5      5   5.0
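The mask column prints as float (1.0, 7.0, ...) because the first .loc assignment creates the column with NaN in the rows it does not cover, and NaN forces a float dtype. A pandas-only alternative that keeps the integer dtype is Series.where, which keeps the original value where the condition is True and takes the replacement elsewhere. A minimal sketch using the same test data; the names is_ca and series_where are illustrative labels, not part of the original answer.

```python
import pandas as pd

# Same test data as in the answer above.
df = pd.DataFrame({'country': ['CA', 'US', 'CA', 'UK', 'CA'],
                   'x': [1, 2, 3, 4, 5],
                   'y': [6, 7, 8, 9, 10]})
is_ca = df['country'] == 'CA'

# Partial .loc assignment to a new column: the uncovered rows start as NaN,
# so the column is created as float64.
df.loc[is_ca, 'mask'] = df.loc[is_ca, 'x']
print(df['mask'].dtype)  # float64

# Series.where keeps x where the condition is True and takes y elsewhere;
# no NaN is ever introduced, so the integer dtype is preserved.
df['series_where'] = df['x'].where(is_ca, df['y'])
print(df['series_where'].tolist())  # [1, 7, 3, 9, 5]
print(df['series_where'].dtype)     # int64
```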

ArchAngelPwn · Accepted Answer · 2022-05-27 00:44:51Z


This might also work for you, using np.select:

import numpy as np
import pandas as pd

data = {
    'Country' : ['CA', 'NY', 'NC', 'CA'], 
    'x' : ['x_column', 'x_column', 'x_column', 'x_column'],
    'y' : ['y_column', 'y_column', 'y_column', 'y_column']
}
df = pd.DataFrame(data)
# Rows matching the condition take df['x']; all other rows fall back to df['y']
condition_list = [df['Country'] == 'CA']
choice_list = [df['x']]
df['new'] = np.select(condition_list, choice_list, df['y'])
print(df)

Your np.where() call looked fine, though, so I would double-check that your columns are labeled correctly (your two snippets use both Country and country).
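Where np.select pays off over np.where is with more than two cases: it takes a list of conditions and a matching list of choices, uses the choice for the first condition that is True in each row, and falls back to the default everywhere else. A sketch with a hypothetical third rule (rows with 'NY' take a made-up column z), which is not part of the original question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['CA', 'NY', 'NC', 'CA'],
                   'x': [1, 2, 3, 4],
                   'y': [10, 20, 30, 40],
                   'z': [100, 200, 300, 400]})  # hypothetical extra column

# Conditions are checked in order; the first match picks the choice.
conditions = [df['Country'] == 'CA', df['Country'] == 'NY']
choices = [df['x'], df['z']]

# Rows matching no condition fall back to the default, here df['y'].
df['new'] = np.select(conditions, choices, default=df['y'])
print(df['new'].tolist())  # [1, 200, 30, 4]
```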

