Using Two Variables In Lambda Python

Question

I want to make a new column based on two variables. I want my new column to have the value "Good" if (column 1 >= .5 or column 2 < 0.5) and (column 1 < .5 or column 2 >= 0.5) otherwise "Bad".

I tried using lambda and if.

df["new column"] = df[["column 1", "column 2"]].apply(
    lambda x, y: "Good" if (x >= 0.5 or y < 0.5) and (x < 0.5 or y >= 0.5) else "Bad"
)

Got

TypeError: ("() missing 1 required positional argument: 'y'", 'occurred at index column 1')
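
The error comes from how apply calls the function: it passes one argument per call (a whole column by default, or a whole row with axis=1), so a two-parameter lambda never receives a value for y. A minimal sketch of that behaviour, assuming a small made-up frame and a hypothetical helper named record:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"column 1": [1.0, 0.1], "column 2": [1.0, 0.7]})

seen = []

def record(arg):
    # apply hands over exactly one argument per call: a pandas Series.
    seen.append((type(arg).__name__, arg.name))
    return 0

# Default axis=0: the function is called once per column.
df[["column 1", "column 2"]].apply(record)
by_column = list(seen)

seen.clear()

# axis=1: the function is called once per row.
df[["column 1", "column 2"]].apply(record, axis=1)
by_row = list(seen)

print(by_column)
print(by_row)
```

In both cases the function sees a single Series, which is why the answers below either index into the row or avoid apply altogether.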

Scott Boston · Accepted Answer · 2020-02-11 23:05:01Z

Use np.where. pandas aligns data on the index automatically, so there is no need to use apply or to iterate row by row:

df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df

Using @YunaA.'s setup:

import pandas as pd
import numpy as np

df = pd.DataFrame({'x': [1, 2, 0.1, 0.1],
                   'y': [1, 2, 0.7, 0.2],
                   'column 3': [1, 2, 3, 4]})

df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df

Output:

     x    y  column 3 new column
0  1.0  1.0         1       Good
1  2.0  2.0         2       Good
2  0.1  0.7         3        Bad
3  0.1  0.2         4       Good
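
As a side note, the condition in the question reads as "both values are on the same side of 0.5". A rough sketch of that simplification, assuming the data contains no NaN values (with NaN the two forms can disagree, because both x >= .5 and x < .5 are then False); the names x_high, y_high, original and simplified are made up for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 0.1, 0.1],
                   'y': [1, 2, 0.7, 0.2]})

# One boolean flag per column: is the value at or above the threshold?
x_high = df['x'] >= .5
y_high = df['y'] >= .5

# The condition as written in the question.
original = ((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5))

# Equivalent form for NaN-free data: the two flags agree.
simplified = x_high == y_high

labels = np.where(simplified, 'Good', 'Bad')
print(labels)
```

Whether the shorter form is clearer is a matter of taste; the longer form mirrors the wording of the question more directly.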

Timings:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'x':np.random.random(100)*2, 
                   'y': np.random.random(100)*1})
def update_column(row):
    if (row['x'] >= .5 or row['y'] < .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"

Results

%timeit df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')

1.45 ms ± 72.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['new_column'] = df.apply(update_column, axis=1)

5.83 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
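
The %timeit magic only works in IPython or Jupyter. Outside of those, a comparable measurement can be sketched with the standard timeit module. The function names vectorised and row_wise and the repeat count of 20 are arbitrary choices for this sketch, and the numbers it prints will differ from machine to machine:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'x': np.random.random(100) * 2,
                   'y': np.random.random(100) * 1})

def vectorised(frame):
    # Whole-column comparison, evaluated in one pass by NumPy.
    cond = ((frame['x'] >= .5) | (frame['y'] < .5)) & ((frame['x'] < .5) | (frame['y'] >= .5))
    return np.where(cond, 'Good', 'Bad')

def row_wise(frame):
    # Python-level function call for every row.
    return frame.apply(
        lambda row: 'Good' if (row['x'] >= .5 or row['y'] < .5)
        and (row['x'] < .5 or row['y'] >= .5) else 'Bad',
        axis=1,
    )

# Check that both approaches agree before comparing their speed.
same_labels = list(vectorised(df)) == list(row_wise(df))

# number=20 is an arbitrary, small repeat count chosen for this sketch.
t_vec = timeit.timeit(lambda: vectorised(df), number=20)
t_row = timeit.timeit(lambda: row_wise(df), number=20)
print(f"np.where: {t_vec:.4f}s  apply: {t_row:.4f}s  same labels: {same_labels}")
```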

Robert Smith · Accepted Answer · 2020-02-11 22:28:29Z

Try this:

import pandas as pd 

def update_column(row):
    if (row['x'] >= .5 or row['y'] < .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"

df['new_column'] = df.apply(update_column, axis=1)

3 Comments

Maarten Fabré Over a year ago

Why loop if there is a vectorised option?

Robert Smith Over a year ago

Sure, there are a few different ways to solve this problem.

Maarten Fabré Over a year ago

But looping is generally a lot slower, and apply is hardly faster than a Python loop. Here the np.where approach is faster and just as expressive. In the long run it also pays off to get to know the tools.

Yuna A. · Accepted Answer · 2020-02-11 22:28:22Z

Pass the row into the lambda instead.

df['new column'] = df[['column 1', 'column 2']].apply(lambda row: "Good" if (row['column 1'] >= .5 or row['column 2'] < .5) and (row['column 1'] < .5 or row['column 2'] >= .5) else "Bad", axis=1)

Example:

import pandas as pd

df = pd.DataFrame({'column 1': [1, 2, 0.1, 0.1], 
                   'column 2': [1, 2, 0.7, 0.2], 
                   'column 3': [1, 2, 3, 4]})
df['new column'] = df[['column 1', 'column 2']].apply(lambda row: "Good" if (row['column 1'] >= .5 or row['column 2'] < .5) and (row['column 1'] < .5 or row['column 2'] >= .5) else "Bad", axis=1)

print(df)

Output:

   column 1  column 2  column 3 new column
0       1.0       1.0         1       Good
1       2.0       2.0         2       Good
2       0.1       0.7         3        Bad
3       0.1       0.2         4       Good
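
If a lambda that really takes two arguments is preferred, as in the question's first attempt, one option is to pair the columns with zip so that each call receives both values. This is only a sketch (the name label is made up here), and for large frames the vectorised np.where approach is likely to be faster:

```python
import pandas as pd

df = pd.DataFrame({'column 1': [1, 2, 0.1, 0.1],
                   'column 2': [1, 2, 0.7, 0.2],
                   'column 3': [1, 2, 3, 4]})

# A two-argument lambda, matching the shape of the question's first attempt.
label = lambda x, y: "Good" if (x >= .5 or y < .5) and (x < .5 or y >= .5) else "Bad"

# zip pairs the two columns element by element, so each call gets both values.
df['new column'] = [label(x, y) for x, y in zip(df['column 1'], df['column 2'])]

print(df)
```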

cottontail · Accepted Answer · 2022-07-31 01:22:05Z

You just need to reference the columns by their position in the row that is passed to the lambda expression, like this:

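# .iloc selects by position; plain x[0] on a label-indexed row is deprecated in recent pandas
df["new column"] = df[["column 1", "column 2"]].apply(
    lambda x: "Good" if (x.iloc[0] >= 0.5 or x.iloc[1] < 0.5) and (x.iloc[0] < 0.5 or x.iloc[1] >= 0.5) else "Bad", axis=1
)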

NOTE: don't forget to include axis=1

answered Jul 29, 2022 at 9:20 by JasonF · edited Jul 31, 2022 at 1:22 by cottontail
