0

I'm new to Python and am trying to figure out the best way to create a column based on the values in other columns. Ideally, the code would look like this:

df['new'] = np.where(df['Country'] == 'CA', df['x'], df['y'])

I don't think this works, because it seems to treat each operand as the entire column rather than a single row's value. I tried to do the same thing with apply but had trouble with the syntax:

df['my_col'] = df.apply(
    lambda row: 
    if row.country == 'CA':
        row.my_col == row.x
        else:
            row.my_col == row.y

I feel like there must be an easier way.

8
  • 2
    Are you sure the np.where() version doesn't work? Commented May 27, 2022 at 0:31
  • 2
    There is nothing wrong with your np.where code. Check that the syntax of your actual code matches what you posted here, and don't use your second block of code. If you are new to Python and pandas, familiarize yourself with vectorized methods. Commented May 27, 2022 at 0:35
  • 1
    You can combine multiple conditions with & and |. Commented May 27, 2022 at 0:37
  • 1
    Your lambda should be lambda row: row.x if row.country == 'CA' else row.y, but the np.where version should work. Remember that a lambda should have no side effects -- it is just an expression that returns a value. Commented May 27, 2022 at 0:38
  • 3
    That error could not have been raised unless df isn't a data frame or those columns have nested objects. Please provide a reproducible example. Commented May 27, 2022 at 0:39
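
The comment above about combining conditions with & and | can be sketched roughly as follows. This is only an illustration: the sample data and the column names 'and' and 'or' are assumptions made up for the example, and each comparison needs its own parentheses because & and | bind more tightly than ==.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({
    'country': ['CA', 'US', 'CA', 'UK', 'CA'],
    'x': [1, 2, 3, 4, 5],
    'y': [6, 7, 8, 9, 10],
})

# AND: take x only where the country is CA and x is greater than 2.
both = (df['country'] == 'CA') & (df['x'] > 2)
df['and'] = np.where(both, df['x'], df['y'])

# OR: take x where the country is either CA or UK.
either = (df['country'] == 'CA') | (df['country'] == 'UK')
df['or'] = np.where(either, df['x'], df['y'])

print(df)
```

Each condition is a boolean Series, so np.where picks from x or y element by element without an explicit loop.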

2 Answers

2

All three of these approaches (np.where, apply, and a boolean mask with .loc) work on the test data below:

df['where'] = np.where(df.country=='CA', df.x, df.y)
df['apply'] = df.apply(lambda row: row.x if row.country == 'CA' else row.y, axis=1)
mask = df.country=='CA'
df.loc[mask, 'mask'] = df.loc[mask, 'x']
df.loc[~mask, 'mask'] = df.loc[~mask, 'y']

Full test code:

import pandas as pd
import numpy as np
df = pd.DataFrame({'country':['CA','US','CA','UK','CA'], 'x':[1,2,3,4,5], 'y':[6,7,8,9,10]})
print(df)

df['where'] = np.where(df.country=='CA', df.x, df.y)
df['apply'] = df.apply(lambda row: row.x if row.country == 'CA' else row.y, axis=1)
mask = df.country=='CA'
df.loc[mask, 'mask'] = df.loc[mask, 'x']
df.loc[~mask, 'mask'] = df.loc[~mask, 'y']
print(df)

Input:

  country  x   y
0      CA  1   6
1      US  2   7
2      CA  3   8
3      UK  4   9
4      CA  5  10

Output:

  country  x   y  where  apply  mask
0      CA  1   6      1      1   1.0
1      US  2   7      7      7   7.0
2      CA  3   8      3      3   3.0
3      UK  4   9      9      9   9.0
4      CA  5  10      5      5   5.0
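
Note that the mask column in the output above holds floats (1.0, 7.0, ...), presumably because the first .loc assignment creates the column with missing values in the rows it does not touch. If an integer result matters, one possible alternative is Series.where, sketched below on the same test data. The column name 'series_where' is an assumption introduced for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['CA', 'US', 'CA', 'UK', 'CA'],
    'x': [1, 2, 3, 4, 5],
    'y': [6, 7, 8, 9, 10],
})

mask = df['country'] == 'CA'

# Keep x where the mask is True and fall back to y elsewhere.
# Every row receives a value in one step, so no NaN placeholder is needed.
df['series_where'] = df['x'].where(mask, df['y'])

print(df)
print(df.dtypes)
```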

1

This might also work for you:

import numpy as np
import pandas as pd

data = {
    'Country' : ['CA', 'NY', 'NC', 'CA'], 
    'x' : ['x_column', 'x_column', 'x_column', 'x_column'],
    'y' : ['y_column', 'y_column', 'y_column', 'y_column']
}
df = pd.DataFrame(data)
# One condition paired with one choice; rows matching no condition get df['y'].
condition_list = [df['Country'] == 'CA']
choice_list = [df['x']]
df['new'] = np.select(condition_list, choice_list, df['y'])
print(df)

Your np.where() looked fine, though, so I would double-check that your columns are labeled correctly.
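
Where np.select tends to pay off over np.where is with more than one condition. Here is a minimal sketch, assuming numeric x and y columns and a made-up fallback value of -1. None of these values come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    'Country': ['CA', 'NY', 'NC', 'CA'],
    'x': [1, 2, 3, 4],
    'y': [10, 20, 30, 40],
})

# Conditions are checked in order; the first one that is True picks the value.
condition_list = [df['Country'] == 'CA', df['Country'] == 'NY']
choice_list = [df['x'], df['y']]

# Rows that match no condition get the default (-1 is an arbitrary placeholder).
df['new'] = np.select(condition_list, choice_list, default=-1)

print(df)
```

With a single condition this reduces to the same result as np.where, so the extra lists are mainly worth it once a second or third branch appears.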
