Is there an elegant way to assign values based on multiple columns in a dataframe in pandas? Let's say I have a dataframe with 2 columns: FruitType and Color.

import pandas as pd
df = pd.DataFrame({'FruitType':['apple', 'banana','kiwi','orange','loquat'],
'Color':['red_black','yellow','greenish_yellow', 'orangered','orangeyellow']})

I would like to assign the value of a third column, 'isYellowSeedless', based on both 'FruitType' and 'Color' columns.

I have a list of fruits that I consider seedless, and would like to check whether the Color column contains the string "yellow".

seedless = ['banana', 'loquat']

How do I string this all together elegantly?

This is my attempt that didn't work:

df[(df['FruitType'].isin(seedless)) & (df['Color'].str.contains("yellow"))]['isYellowSeedless'] = True

2 Answers

Use loc with a boolean mask:

m = (df['FruitType'].isin(seedless)) & (df['Color'].str.contains("yellow"))

df.loc[m, 'isYellowSeedless'] = True
print (df)
             Color FruitType isYellowSeedless
0        red_black     apple              NaN
1           yellow    banana             True
2  greenish_yellow      kiwi              NaN
3        orangered    orange              NaN
4     orangeyellow    loquat             True
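For context on why the attempt in the question has no effect: as far as I understand it, df[mask] with a boolean mask returns a copy, so the chained ['isYellowSeedless'] = True writes to that temporary copy and df itself stays untouched. A minimal, self-contained sketch contrasting the two forms (the warnings filter and the two flag variables are my additions, only there to keep the output quiet and make the result visible):

```python
import warnings
import pandas as pd

df = pd.DataFrame({'FruitType': ['apple', 'banana', 'kiwi', 'orange', 'loquat'],
                   'Color': ['red_black', 'yellow', 'greenish_yellow',
                             'orangered', 'orangeyellow']})
seedless = ['banana', 'loquat']

# Boolean mask: seedless fruit whose color contains "yellow".
m = df['FruitType'].isin(seedless) & df['Color'].str.contains("yellow")

# Chained indexing: df[m] is a copy, so the assignment never reaches df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about chained assignment
    df[m]['isYellowSeedless'] = True
chained_created_column = 'isYellowSeedless' in df.columns

# Single .loc call: rows and column are selected in one step on df itself.
df.loc[m, 'isYellowSeedless'] = True
loc_created_column = 'isYellowSeedless' in df.columns

print(chained_created_column, loc_created_column)
```

The first flag stays False because the column was only ever created on the temporary copy; the second is True because .loc writes to df directly.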

If you need True and False output:

df['isYellowSeedless'] = m
print (df)
             Color FruitType  isYellowSeedless
0        red_black     apple             False
1           yellow    banana              True
2  greenish_yellow      kiwi             False
3        orangered    orange             False
4     orangeyellow    loquat              True

For an if-else choice between two scalars, use numpy.where:

import numpy as np

df['isYellowSeedless'] = np.where(m, 'a', 'b')
print (df)
             Color FruitType isYellowSeedless
0        red_black     apple                b
1           yellow    banana                a
2  greenish_yellow      kiwi                b
3        orangered    orange                b
4     orangeyellow    loquat                a
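The numpy.where variant can be run on its own as in the sketch below. It rebuilds the sample data from the question; the labels 'a' and 'b' are just placeholders for whatever two values you need:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'FruitType': ['apple', 'banana', 'kiwi', 'orange', 'loquat'],
                   'Color': ['red_black', 'yellow', 'greenish_yellow',
                             'orangered', 'orangeyellow']})
seedless = ['banana', 'loquat']

# Same mask as above: seedless fruit whose color contains "yellow".
m = df['FruitType'].isin(seedless) & df['Color'].str.contains("yellow")

# np.where(condition, value_if_true, value_if_false) works element-wise,
# so every row gets one of the two placeholder labels.
df['isYellowSeedless'] = np.where(m, 'a', 'b')
print(df)
```

Because np.where always fills both branches, the resulting column has no NaN values, unlike the loc assignment.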

And to convert to 0 and 1:

df['isYellowSeedless'] = m.astype(int)
print (df)
             Color FruitType  isYellowSeedless
0        red_black     apple                 0
1           yellow    banana                 1
2  greenish_yellow      kiwi                 0
3        orangered    orange                 0
4     orangeyellow    loquat                 1

1 Comment

Really nice solution(s). Thank you!

Or you can try:

df['isYellowSeedless']=df.loc[df.FruitType.isin(seedless),'Color'].str.contains('yellow')
df
Out[546]: 
             Color FruitType isYellowSeedless
0        red_black     apple              NaN
1           yellow    banana             True
2  greenish_yellow      kiwi              NaN
3        orangered    orange              NaN
4     orangeyellow    loquat             True
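To illustrate the index alignment this relies on: the right-hand side is a Series holding only the seedless rows (index 1 and 4 here), and pandas aligns it to df by index on assignment, leaving the other rows as NaN. A small sketch follows. The reindex step and the column name isYellowSeedlessBool are my own additions, assuming you want False instead of NaN for the rows that were filtered out:

```python
import pandas as pd

df = pd.DataFrame({'FruitType': ['apple', 'banana', 'kiwi', 'orange', 'loquat'],
                   'Color': ['red_black', 'yellow', 'greenish_yellow',
                             'orangered', 'orangeyellow']})
seedless = ['banana', 'loquat']

# Only the seedless rows survive the .loc filter, so this Series is shorter
# than df but keeps the original index labels.
partial = df.loc[df['FruitType'].isin(seedless), 'Color'].str.contains('yellow')
print(partial.index.tolist())

# Assignment aligns on the index; rows missing from `partial` become NaN.
df['isYellowSeedless'] = partial

# Hypothetical extra column: reindex to the full index and fill gaps with False.
df['isYellowSeedlessBool'] = partial.reindex(df.index, fill_value=False)
print(df)
```

One difference from the mask approach: a seedless fruit whose color does not contain "yellow" would get False here rather than NaN, since it is still part of the filtered Series.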

2 Comments

I really like this one too. Didn't know they could be chained together like that! Thanks!
@J.W. they are connected by the index. Yw :-)
