If I have a frame like this

frame = pd.DataFrame({
    "a": ["the cat is blue", "the sky is green", "the dog is black"]
})

and I want to check whether each row contains a certain word, I just have to do this:

frame["b"] = (
   frame.a.str.contains("dog") |
   frame.a.str.contains("cat") |
   frame.a.str.contains("fish")
)

frame["b"] outputs:

0     True
1    False
2     True
Name: b, dtype: bool

If I decide to make a list:

mylist = ["dog", "cat", "fish"]

How would I check whether each row contains any of the words in the list?

4 Answers

140
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

The str.contains method accepts a regular expression pattern:

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

pattern
'dog|cat|fish'

frame.a.str.contains(pattern)
0     True
1    False
2     True
Name: a, dtype: bool
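
If the words in the list might contain regex metacharacters (a period, a plus sign and so on), joining them directly can change what the pattern means. A minimal sketch, assuming a hypothetical list `words` and sample rows that are not in the original question, escapes each entry with `re.escape` before joining:

```python
import re
import pandas as pd

# Hypothetical rows for illustration only.
frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'costs 3.50', 'costs 3x50']})

# Hypothetical list; '3.50' contains the regex metacharacter '.'.
words = ['dog', 'cat', '3.50']

# Escape each word so it is matched literally, then join with the alternation operator.
pattern = '|'.join(re.escape(w) for w in words)

mask = frame['a'].str.contains(pattern)
print(mask)
```

Without the escaping step, the `.` in `3.50` would match any character, so the last row would be flagged as well.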

Because regex patterns are supported, you can also embed flags:

frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})

frame
                     a
0  Cat Mr. Nibbles is blue
1         the sky is green
2         the dog is black

pattern = '(?i)' + '|'.join(mylist)  # an inline global flag must come at the start of the pattern

pattern
'(?i)dog|cat|fish'
 
frame.a.str.contains(pattern)
0     True  # Because of the (?i) flag, 'Cat' is also matched to 'cat'
1    False
2     True
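
As an alternative to an inline flag, `str.contains` also takes a `case` parameter. A minimal sketch, reusing the frame and list from this answer:

```python
import pandas as pd

frame = pd.DataFrame({'a': ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']

# case=False asks str.contains to ignore case, so no inline flag is needed.
mask = frame['a'].str.contains('|'.join(mylist), case=False)
print(mask)
```

This keeps the pattern itself free of flags, which may be easier to read when the list is built elsewhere.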

9 Comments

This significantly speeds up what I was doing. Is there any way to return the matched sub-pattern (say, dog) instead of True/False?
Figured it out: to return the matched pattern use frame.a.str.extract(pattern)
@Andy Hayden How can I print the matched pattern values in the cases where the output is True?
@Andy Hayden No, it doesn't work. I tried it and it gives a ValueError. Can you suggest something else?
I mean the expected output should be True, Cat, etc., in place of True alone.
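
On the ValueError reported in the comments above: `str.extract` requires at least one capture group in the pattern, so a bare `dog|cat|fish` is rejected. A minimal sketch, assuming it is acceptable to wrap the alternation in parentheses:

```python
import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']

# Wrap the alternation in parentheses so the pattern has one capture group.
pattern = '(' + '|'.join(mylist) + ')'

# expand=False returns a Series; rows without a match hold NaN.
matched = frame['a'].str.extract(pattern, expand=False)
print(matched)
```

The boolean result and the matched word can then sit side by side, for example as two columns built from `matched.notna()` and `matched`.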
61

For a list of whole values, this should work:

print(frame[frame["a"].isin(mylist)])

See pandas.Series.isin(), since frame["a"] is a Series.

2 Comments

Will this work even if you're looking for potentially a substring from a list? That is, if you want to match any substring of column 'a' to any element in mylist, will this catch it?
No, it does not work for substrings. It only matches the entire string and is case-sensitive.
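
The difference described in these comments can be seen in a small sketch. The frame below is a hypothetical variant of the question's data, with one row that is exactly equal to a list entry:

```python
import pandas as pd

# Hypothetical rows: only the second one equals a list entry in full.
frame = pd.DataFrame({'a': ['the cat is blue', 'cat', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']

# isin compares each whole cell value against the list entries.
exact = frame['a'].isin(mylist)

# str.contains looks for any of the words anywhere inside the cell.
partial = frame['a'].str.contains('|'.join(mylist))

print(exact.tolist(), partial.tolist())
```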
9

Building on the comments under the accepted answer about extracting the matched string, this approach can also be tried.

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

Let us create the list of strings that need to be matched and extracted.

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

Now let us create a function that finds and extracts the substring.

import re

def pattern_searcher(search_str: str, search_list: str) -> str:
    # search_list is a regex alternation such as 'dog|cat|fish'
    search_obj = re.search(search_list, search_str)
    if search_obj:
        # group() is the matched substring, same as slicing with start()/end()
        return_str = search_obj.group()
    else:
        return_str = 'NA'
    return return_str

We will use this function with pandas.Series.apply:

frame['matched_str'] = frame['a'].apply(lambda x: pattern_searcher(search_str=x, search_list=pattern))

Result:

                  a matched_str
0   the cat is blue         cat
1  the sky is green          NA
2  the dog is black         dog

2 Comments

pattern_searcher() will also match those characters inside longer words. For example, catastrophe will return cat, and hotdog will return dog.
Moreover, if mylist = ['dog', 'cat', 'fish', 'dogs'], the function tries the alternatives in list order. For example, dogs are cool will return dog instead of dogs.
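
Both points raised in these comments can be addressed in the pattern itself. A minimal sketch, assuming whole-word matching is what is wanted and using hypothetical sample rows:

```python
import re
import pandas as pd

# Hypothetical rows chosen to show the two cases from the comments.
frame = pd.DataFrame({'a': ['dogs are cool', 'a catastrophe', 'the cat is blue']})
mylist = ['dog', 'cat', 'fish', 'dogs']

# Put longer words first so 'dogs' is tried before 'dog' in the alternation.
ordered = sorted(mylist, key=len, reverse=True)

# \b anchors the group to word boundaries, so 'catastrophe' does not match 'cat'.
pattern = r'\b(' + '|'.join(re.escape(w) for w in ordered) + r')\b'

# expand=False returns a Series; rows without a match hold NaN.
matched = frame['a'].str.extract(pattern, expand=False)
print(matched)
```

With the `\b` anchors in place the ordering is not strictly required, but it matters if the anchors are dropped.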
0

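We can check for three patterns at once by joining them with a pipe, for example:

import re

# Assumes df already exists, with the text in column 1 and a result column 2.
for i in range(len(df)):
    if re.search(r'car|oxide|gen', df.iat[i, 1]):
        df.iat[i, 2] = 'Yes'
    else:
        df.iat[i, 2] = 'No'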
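
The same check can probably be written without an explicit loop. A minimal sketch, where the frame and the column names `text` and `flag` are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names 'text' and 'flag' are assumptions.
df = pd.DataFrame({'text': ['a red car', 'plain water', 'nitrogen oxide']})

# str.contains evaluates the alternation for every row at once,
# and np.where maps the boolean result to 'Yes' / 'No'.
df['flag'] = np.where(df['text'].str.contains(r'car|oxide|gen'), 'Yes', 'No')
print(df)
```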
