
I have a pandas DataFrame and want to check whether the strings in a certain column contain any of several substrings. At the moment I have 30 lines of code like this:

(df['NAME'].str.upper().str.contains('LIMITED')) |
(df['NAME'].str.upper().str.contains('INC')) |
(df['NAME'].str.upper().str.contains('CORP'))

They are all linked with an "or" condition, and if any of them is true, the name belongs to a company rather than a person.

This doesn't seem very elegant to me. Is there a way to ask of a pandas string column, "does the string in this column contain any of the substrings in the following list?", for example ['LIMITED', 'INC', 'CORP']?

I found the pandas.DataFrame.isin function, but it only matches entire strings, not substrings.

  • Note: There is a solution described by @unutbu which is more efficient than using pd.Series.str.contains. If performance is an issue, then this may be worth investigating. Commented May 6, 2018 at 22:12

1 Answer


You can use a regular expression, in which '|' means "or":

l = ['LIMITED','INC','CORP']  
regstr = '|'.join(l)
df['NAME'].str.upper().str.contains(regstr)

MVCE:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'NAME':['Baby CORP.','Baby','Baby INC.','Baby LIMITED']})

In [3]: df
Out[3]: 
           NAME
0    Baby CORP.
1          Baby
2     Baby INC.
3  Baby LIMITED

In [4]: l = ['LIMITED','INC','CORP']  
   ...: regstr = '|'.join(l)
   ...: df['NAME'].str.upper().str.contains(regstr)
   ...: 
Out[4]: 
0     True
1    False
2     True
3     True
Name: NAME, dtype: bool

In [5]: regstr
Out[5]: 'LIMITED|INC|CORP'
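
A possible variation on the approach above, offered as a sketch rather than a definitive version: if a substring could ever contain a regex metacharacter (such as '.' or '+'), escaping each item with re.escape keeps it literal, and the case and na parameters of str.contains can stand in for the .str.upper() call and handle missing names. The sample data (including the lowercase and missing entries) and the names substrings, pattern, and is_company are illustrative assumptions, not part of the original answer.

```python
import re

import pandas as pd

# Illustrative data: one lowercase suffix and one missing name are added
# on purpose to show what case=False and na=False do.
df = pd.DataFrame({'NAME': ['Baby CORP.', 'Baby', 'Baby inc.', 'Baby LIMITED', None]})
substrings = ['LIMITED', 'INC', 'CORP']

# re.escape keeps each substring literal even if it contains regex metacharacters.
pattern = '|'.join(re.escape(s) for s in substrings)

# case=False makes the match case-insensitive (no .str.upper() needed);
# na=False reports missing names as non-matches instead of NaN.
is_company = df['NAME'].str.contains(pattern, case=False, na=False)

print(is_company.tolist())
```

The resulting boolean Series can be used directly as a mask, for example df[is_company] to keep only the company rows.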

3 Comments

Can you please suggest something for 'and' condition. I want to check if all the words in my list exist in each row of dataframe.
@SyedMdIsmail '&'.join(l)
I had tried it before asking. It's able to identify the 'or' condition but not the 'and'. Thanks for the reply.
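
On the "and" condition raised in these comments: '&' has no special meaning in a regular expression, so '&'.join(l) builds a pattern that only matches the literal text 'LIMITED&INC&CORP'. One way to require every substring, sketched here with made-up sample data and the hypothetical names words, upper, and has_all, is to test each substring separately and combine the boolean results with '&':

```python
import pandas as pd

# Illustrative data, not from the original question.
df = pd.DataFrame({'NAME': ['Baby CORP. LIMITED', 'Baby CORP.', 'Baby']})
words = ['LIMITED', 'CORP']

upper = df['NAME'].str.upper()

# Start with all True, then AND in one plain-substring test per word.
has_all = pd.Series(True, index=df.index)
for word in words:
    # regex=False treats the word as a literal substring.
    has_all &= upper.str.contains(word, regex=False)

print(has_all.tolist())
```

Only rows whose name contains every word in the list end up True; the order in which the words appear in the name does not matter.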
