Implementing if-else in python dataframe using lambda when there are multiple variables

Question

I am trying to implement if-elif or if-else logic in python while working on a dataframe. I am struggling when working with more than one column.

sample data frame

import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4, 5], "two": [6, 7, 8, 9, 10], "name": ['a', 'b', 'a', 'b', 'c']})

If my if-else logic is based on only one column - I know how to do it.

df['one'] = df["one"].apply(lambda x: x*10 if x<2 else (x**2 if x<4 else x+10))

But I want to modify column 'one' based on the values of column 'two', and I feel it's going to be something like this:

lambda x, y: x*100 if y>8 else (x*1 if y<8 else x**2)

But I am not sure how to specify the second column. I tried this way but obviously that's incorrect

df['one'] = df["one"]["two"].apply(lambda x, y: x*100 if y>8 else (x*1 if y<8 else x**2))

Question 1 - What would be the correct syntax for the above code?

Question 2 - How can I implement the logic below using a lambda?

if df['name'].isin(['a','b'])  df['one'] = 100 else df['one'] = df['two']

If I write something like x.isin(['a','b']) it won't work.

jpp · Accepted Answer · 2018-06-06 23:41:53Z

6

Apply across columns

Use pd.DataFrame.apply instead of pd.Series.apply and specify axis=1:

df['one'] = df.apply(lambda row: row['one']*100 if row['two']>8 else \
                     (row['one']*1 if row['two']<8 else row['one']**2), axis=1)

Unreadable? Yes, I agree. Let's try again but this time rewrite as a named function.

Using a function

Note lambda is just an anonymous function. We can define a function explicitly and use it with pd.DataFrame.apply:

def calc(row):
    if row['two'] > 8:
        return row['one'] * 100
    elif row['two'] < 8:
        return row['one']
    else:
        return row['one']**2

df['one'] = df.apply(calc, axis=1)

Readable? Yes. But this isn't vectorised. We're looping through each row one at a time. We might as well have used a list. Pandas isn't just for clever table formatting; you can use it for vectorised calculations using arrays in contiguous memory blocks. So let's try one more time.

Vectorised calculations

Using numpy.where:

import numpy as np

# Operate on whole columns of df; there is no per-row variable here.
df['one'] = np.where(df['two'] > 8, df['one'] * 100,
                     np.where(df['two'] < 8, df['one'],
                              df['one']**2))

There we go. Readable and efficient. We have effectively vectorised our if / else statements. Does this mean that we are doing more calculations than necessary? Yes! But this is more than offset by the way in which we are performing the calculations, i.e. with well-defined blocks of memory rather than pointers. You will find an order of magnitude performance improvement.
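To check the performance claim on your own machine, a rough timing sketch might look like the following. The frame `big`, its size `n`, and the helper names `row_wise` and `vectorised` are assumptions made up for illustration; actual timings depend on your hardware and library versions, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame so that any timing difference is visible.
n = 10_000
big = pd.DataFrame({"one": np.arange(1, n + 1),
                    "two": np.arange(n) % 5 + 6})  # values cycle through 6..10


def calc(row):
    if row["two"] > 8:
        return row["one"] * 100
    elif row["two"] < 8:
        return row["one"]
    else:
        return row["one"] ** 2


def row_wise():
    # One Python-level function call per row.
    return big.apply(calc, axis=1)


def vectorised():
    # Whole-column operations; every branch is computed for every row.
    return np.where(big["two"] > 8, big["one"] * 100,
                    np.where(big["two"] < 8, big["one"], big["one"] ** 2))


# Both approaches should produce identical values.
assert (row_wise().to_numpy() == vectorised()).all()

print("apply   :", timeit.timeit(row_wise, number=3))
print("np.where:", timeit.timeit(vectorised, number=3))
```

The equality check matters as much as the timing: it confirms that the vectorised rewrite keeps the same branch logic as the row-wise function.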

Another example

Well, we can just use numpy.where again.

df['one'] = np.where(df['name'].isin(['a', 'b']), 100, df['two'])
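When there are more than two branches, nested numpy.where calls get harder to read. One alternative, not used in the answer above, is numpy.select, which takes a list of conditions and a matching list of choices. The sketch below applies it to the three-branch logic from the question and compares it against the nested form; the names `conditions`, `choices`, `selected` and `nested` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4, 5],
                   "two": [6, 7, 8, 9, 10],
                   "name": ["a", "b", "a", "b", "c"]})

# Conditions are checked in order; the first one that is True wins.
conditions = [df["two"] > 8, df["two"] < 8]
choices = [df["one"] * 100, df["one"]]

# `default` covers rows where no condition holds (here: two == 8).
selected = np.select(conditions, choices, default=df["one"] ** 2)

# Same logic written as nested np.where, for comparison.
nested = np.where(df["two"] > 8, df["one"] * 100,
                  np.where(df["two"] < 8, df["one"], df["one"] ** 2))

assert (selected == nested).all()
print(selected.tolist())  # [1, 2, 9, 400, 500]
```

Assigning `selected` back with `df['one'] = selected` would then have the same effect as the nested numpy.where version.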

edited Jun 6, 2018 at 23:41

answered Jun 6, 2018 at 23:22

jpp


1 Comment

singularity2047 Over a year ago

Thanks for your very elaborate response. Even if the lambda method is unreadable, I just wanted to learn how it works. I tried it myself and figured out the second method, but I couldn't make df.isin() work inside the function. I tried this: if df['name'].isin(['a','b']): df['one'] = 10 return df['one']. But I am getting the error 'str' object has no attribute 'isin'.
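A likely explanation for that error: inside a function passed to df.apply(..., axis=1), row['name'] is a single Python string rather than a Series, so it has no .isin method. A minimal sketch of a row-wise version, assuming the sample frame from the question and a hypothetical function name `pick`, uses the plain `in` operator instead:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4, 5],
                   "two": [6, 7, 8, 9, 10],
                   "name": ["a", "b", "a", "b", "c"]})


def pick(row):
    # row["name"] is a scalar str here, so test membership with `in`.
    if row["name"] in ("a", "b"):
        return 100
    return row["two"]


result = df.apply(pick, axis=1)
print(result.tolist())  # [100, 100, 100, 100, 10]
```

The function returns a value for each row instead of assigning to df['one'] inside the function; the assignment happens once, outside, with `df['one'] = df.apply(pick, axis=1)`. The numpy.where version in the answer remains the vectorised way to do the same thing.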

bobrobbob · Accepted Answer · 2018-06-06 22:51:52Z

1

You can do:

df.apply(lambda x: x["one"] + x["two"], axis=1)

But I don't think that a lambda as long as lambda x: x["one"]*100 if x["two"]>8 else (x["one"]*1 if x["two"]<8 else x["one"]**2) is very Pythonic. apply takes any callback:

def my_callback(x):
    if x["two"] > 8:
        return x["one"]*100
    elif x["two"] < 8:
        return x["one"]
    else:
        return x["one"]**2

df.apply(my_callback, axis=1)

answered Jun 6, 2018 at 22:51

bobrobbob
