
I have a pandas DataFrame, and I would prefer to solve my problem with a lambda function rather than a loop.

The problem is as follows:

import pandas as pd

df = pd.DataFrame({'my_fruits': ['fruit', 'fruit', 'fruit', 'fruit', 'fruit'],
                   'fruit_a': ['apple', 'banana', 'vegetable', 'vegetable', 'cherry'],
                   'fruit_b': ['vegetable', 'apple', 'vegeatble', 'pineapple', 'pear']})

If I apply the following loop:

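for i in range(len(df)):
    if df['fruit_a'][i] == 'vegetable' or df['fruit_b'][i] == 'vegetable':
        # Assign through .loc; chained indexing such as df['my_fruits'][i] = ...
        # may write to a temporary copy and leave df unchanged.
        df.loc[i, 'my_fruits'] = 'not_fruit'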

I get the result that I want: if either the fruit_a or the fruit_b column contains the value vegetable, the my_fruits column should be set to not_fruit.

How can I set this up with a lambda function? I was not able to understand how two input columns can be used to change the values of a different column. Thanks!

    I don't get the question. A lambda expression is simply an alternative syntax for defining a function in the special case of when the function body consists of only return <expression>. A function is not an alternative for a for loop. The alternative to certain special cases of for loop is a comprehension, but your loop is not such a special case. Commented Jan 19, 2017 at 21:07
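To illustrate the comment's point, here is a minimal sketch showing that a lambda and an ordinary def function are interchangeable when passed to DataFrame.apply. The names label_lambda and label_def are hypothetical; in both cases it is apply with axis=1, not the lambda itself, that walks over the rows.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'my_fruits': ['fruit'] * 5,
                   'fruit_a': ['apple', 'banana', 'vegetable', 'vegetable', 'cherry'],
                   'fruit_b': ['vegetable', 'apple', 'vegeatble', 'pineapple', 'pear']})

# Hypothetical name: a lambda is an anonymous function whose body is one expression.
label_lambda = lambda row: ('not_fruit'
                            if 'vegetable' in (row['fruit_a'], row['fruit_b'])
                            else row['my_fruits'])

# Hypothetical name: the equivalent function written with def.
def label_def(row):
    if 'vegetable' in (row['fruit_a'], row['fruit_b']):
        return 'not_fruit'
    return row['my_fruits']

# apply(..., axis=1) calls the function once per row and collects the results.
result_lambda = df.apply(label_lambda, axis=1)
result_def = df.apply(label_def, axis=1)
print(result_lambda.equals(result_def))  # True
```

Binding a lambda to a name is only done here for comparison; in practice the lambda is usually written inline inside the apply call.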

3 Answers


You can use Series.mask with a boolean mask:

mask = (df['fruit_a'] == 'vegetable') | (df['fruit_b'] == 'vegetable')
print (mask)
0     True
1    False
2     True
3     True
4    False
dtype: bool


df['my_fruits'] = df['my_fruits'].mask(mask, 'not_fruit')
print (df)
     fruit_a    fruit_b  my_fruits
0      apple  vegetable  not_fruit
1     banana      apple      fruit
2  vegetable  vegetable  not_fruit
3  vegetable  pineapple  not_fruit
4     cherry       pear      fruit
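A self-contained sketch of the same Series.mask approach, run against the question's own sample data. That data still has the 'vegeatble' typo in row 2; it does not change the result here because fruit_a already equals 'vegetable' in that row, but note that the comparison is exact, so a misspelled value would not be caught on its own.

```python
import pandas as pd

# Sample data from the question, typo included.
df = pd.DataFrame({'my_fruits': ['fruit'] * 5,
                   'fruit_a': ['apple', 'banana', 'vegetable', 'vegetable', 'cherry'],
                   'fruit_b': ['vegetable', 'apple', 'vegeatble', 'pineapple', 'pear']})

# True for rows where either column is exactly 'vegetable'.
mask = (df['fruit_a'] == 'vegetable') | (df['fruit_b'] == 'vegetable')

# Series.mask replaces values where the condition is True and keeps the others.
df['my_fruits'] = df['my_fruits'].mask(mask, 'not_fruit')
print(df['my_fruits'].tolist())
# ['not_fruit', 'fruit', 'not_fruit', 'not_fruit', 'fruit']
```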

Another way to build the mask is to compare all selected columns with 'vegetable' and then use any to check whether at least one column in each row is True:

print ((df[['fruit_a', 'fruit_b']] == 'vegetable'))
  fruit_a fruit_b
0   False    True
1   False   False
2    True    True
3    True   False
4   False   False

mask = (df[['fruit_a', 'fruit_b']] == 'vegetable').any(axis=1) 
print (mask)
0     True
1    False
2     True
3     True
4    False
dtype: bool
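When there are many columns to check, they do not have to be listed by hand. A minimal sketch, assuming (hypothetically) that every column to check is named with a fruit_ prefix, selects them with DataFrame.filter(regex=...) and then applies the same eq/any idea:

```python
import pandas as pd

df = pd.DataFrame({'my_fruits': ['fruit'] * 5,
                   'fruit_a': ['apple', 'banana', 'vegetable', 'vegetable', 'cherry'],
                   'fruit_b': ['vegetable', 'apple', 'vegeatble', 'pineapple', 'pear']})

# Assumption: all columns to check start with 'fruit_' (my_fruits does not match).
fruit_cols = df.filter(regex='^fruit_')

# eq compares every cell with 'vegetable'; any(axis=1) is True if any column matched.
mask = fruit_cols.eq('vegetable').any(axis=1)

# Write through .loc so the change lands in df itself.
df.loc[mask, 'my_fruits'] = 'not_fruit'
print(df['my_fruits'].tolist())
```

Adding a fruit_c column later would then be picked up without touching the code.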

2 Comments

Much appreciated, thanks for the alternative method.
Thank you for accepting. Yes, the other method is better if there are many columns.

You can do this with the apply method:

>>> df['my_fruits'] = df.apply(lambda x: 'not_fruit' if x['fruit_a'] == 'vegetable' or x['fruit_b'] == 'vegetable' else x['my_fruits'], axis=1)
>>> df['my_fruits']
0    not_fruit
1        fruit
2    not_fruit
3    not_fruit
4        fruit
Name: my_fruits, dtype: object

Or you can do it with boolean indexing through .loc:

>>> df.loc[(df['fruit_a'] == 'vegetable') | (df['fruit_b'] == 'vegetable'), 'my_fruits'] = 'not_fruit'
>>> df
     fruit_a    fruit_b  my_fruits
0      apple  vegetable  not_fruit
1     banana      apple      fruit
2  vegetable  vegeatble  not_fruit
3  vegetable  pineapple  not_fruit
4     cherry       pear      fruit
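A further vectorized option, not taken from the answers here, is numpy.where, which builds the whole new column in one step from the same boolean condition. A minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'my_fruits': ['fruit'] * 5,
                   'fruit_a': ['apple', 'banana', 'vegetable', 'vegetable', 'cherry'],
                   'fruit_b': ['vegetable', 'apple', 'vegeatble', 'pineapple', 'pear']})

mask = (df['fruit_a'] == 'vegetable') | (df['fruit_b'] == 'vegetable')

# np.where takes 'not_fruit' where mask is True and the existing label elsewhere,
# returning a new array that is assigned back as a whole column (no chained indexing).
df['my_fruits'] = np.where(mask, 'not_fruit', df['my_fruits'])
print(df['my_fruits'].tolist())
```

Unlike the apply version, no Python function is called per row, which usually matters only for large frames.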

3 Comments

Agreed, I just wanted to show how it could be done with a lambda function.
Sure, the alternative solution is better.
Thanks, this at least shows me how it could be done with apply.

This uses pd.Series.where and checks for 'vegetable' in one step, combined with any.
where is the opposite of mask, which is why I use the negation of cond.
Otherwise, this is very similar in spirit to jezrael's answer.

cond = df[['fruit_a', 'fruit_b']].eq('vegetable').any(axis=1)
df['my_fruits'] = df['my_fruits'].where(~cond, 'not_fruit')

Answered from my phone. Please forgive typos.
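A quick sketch to check the claim that where is the opposite of mask: where keeps values where the condition is True, mask replaces them, so where(~cond, x) and mask(cond, x) should produce the same Series.

```python
import pandas as pd

df = pd.DataFrame({'my_fruits': ['fruit'] * 5,
                   'fruit_a': ['apple', 'banana', 'vegetable', 'vegetable', 'cherry'],
                   'fruit_b': ['vegetable', 'apple', 'vegeatble', 'pineapple', 'pear']})

cond = df[['fruit_a', 'fruit_b']].eq('vegetable').any(axis=1)

# where: keep where True, replace where False -> so negate cond.
via_where = df['my_fruits'].where(~cond, 'not_fruit')
# mask: replace where True, keep where False -> use cond directly.
via_mask = df['my_fruits'].mask(cond, 'not_fruit')
print(via_where.equals(via_mask))  # True
```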
