I have a DataFrame and a list of keywords:

import pandas as pd

df = pd.DataFrame({'IDs':[1234,5346,1234,8793,8793],
                    'Names':['APPLE ABCD ONE','APPLE ABCD','NO STRAWBERRY YES','ORANGE AVAILABLE','TEA AVAILABLE']})

kw = ['APPLE ABCD', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

I want to create a new column, Flag, that is 1 if the value in the Names column contains any keyword from kw as a substring, and 0 otherwise.

Expected Output:

    IDs     Names               Flag
0   1234    APPLE ABCD ONE      1
1   5346    APPLE ABCD          1
2   1234    NO STRAWBERRY YES   1
3   8793    ORANGE AVAILABLE    1
4   8793    TEA AVAILABLE       0

I can get this output with the code below:

ind=[]
for idx, value in df.iterrows():
    x = 0
    for u in kw:
        if u in value['Names']:
            ind.append(True)
            x = 1
            break
    if x == 0:
        ind.append(False)

df['flag'] = ind

Is there an alternative that avoids the for loop and is more efficient?

2 Answers

Use apply with a lambda that checks each name against every keyword:

df['Names'].apply(lambda x: any(k in x for k in kw))

0     True
1     True
2     True
3     True
4    False
Name: Names, dtype: bool
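
As a possible alternative to apply, the same substring check can be sketched with pandas' vectorized str.contains and a single regular expression built from the keywords. This is only a sketch under a few assumptions: the keywords should be matched literally (hence re.escape), missing names should count as no match (na=False), and the column name Flag and the 1/0 encoding are taken from the expected output in the question rather than from this answer.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'IDs': [1234, 5346, 1234, 8793, 8793],
    'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
              'ORANGE AVAILABLE', 'TEA AVAILABLE'],
})
kw = ['APPLE ABCD', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

# Escape each keyword so it is matched literally, then join with "|" (regex OR).
pattern = '|'.join(re.escape(k) for k in kw)

# str.contains returns a boolean Series; na=False treats missing names as
# "no match", and astype(int) turns True/False into 1/0.
df['Flag'] = df['Names'].str.contains(pattern, regex=True, na=False).astype(int)
print(df)
```

The last row stays 0 because 'TEA COFFEE' as a whole does not occur in 'TEA AVAILABLE'. Whether this is faster than apply depends on the data, so it is worth timing both on a realistic sample.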

1 Comment

It works perfectly, thanks Franco. To count the True values in the result, you can assign it and call value_counts: names = df['Names'].apply(lambda x: any([k in x for k in kw])); names.value_counts()

You can use the pandas isin method:

df['Names'].isin(kw)

2 Comments

Does this check substrings, or only exact matches?
It only checks exact matches.
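
To make the point in the comments concrete, here is a small sketch that runs isin on the data from the question. The variable name exact is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'IDs': [1234, 5346, 1234, 8793, 8793],
    'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
              'ORANGE AVAILABLE', 'TEA AVAILABLE'],
})
kw = ['APPLE ABCD', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

# isin tests whether each whole value equals one of the keywords;
# it does not look for keywords inside longer strings.
exact = df['Names'].isin(kw)
print(exact.tolist())  # [False, True, False, False, False]
```

Only the second row is True, because 'APPLE ABCD' is the only name that equals a keyword exactly. For the substring flag asked for in the question, isin alone would therefore not reproduce the expected output.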
