136

I would like to see if a particular string exists in a particular column within my dataframe.

I'm getting the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

BabyDataSet = [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]

a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])

if a['Names'].str.contains('Mel'):
    print ("Mel is there")
11 Answers

188

a['Names'].str.contains('Mel') returns a boolean Series with one True/False value per row (its length is len(BabyDataSet)), so it cannot be used directly as an if condition.

Therefore, you can use

mel_count=a['Names'].str.contains('Mel').sum()
if mel_count>0:
    print ("There are {m} Mels".format(m=mel_count))

Or any(), if you don't care how many records match your query

if a['Names'].str.contains('Mel').any():
    print ("Mel is there")

3 Comments

If there are NaN values in a['Names'], use the na parameter of the contains() function. pandas.pydata.org/pandas-docs/stable/reference/api/…
Gotcha number 2: str.contains('Mel') matches on every substring of every row in dataframe column. So ABCMelABC == Mel.
This answer is incorrect & misleading since you are checking if 'Mel' is contained in any of the string in the column e.g. 'hi Mel' in the column will also evaluate to true whereas an exact match of the string is required
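The points raised in these comments can be illustrated with a small sketch. The sample Series below is made up for illustration. It shows `na=False` handling a missing value, and the difference between a substring match and an exact match:

```python
import pandas as pd

# Hypothetical sample column with a missing value and a substring-only match.
names = pd.Series(["Bob", None, "ABCMelABC", "Mel"])

# na=False treats missing entries as non-matches, so the mask is purely boolean.
substring_mask = names.str.contains("Mel", na=False)
print(substring_mask.tolist())  # [False, False, True, True]

# Exact equality does not match "ABCMelABC".
exact_mask = names == "Mel"
print(exact_mask.tolist())      # [False, False, False, True]
```

Which of the two is appropriate depends on whether "contains Mel" or "is exactly Mel" is what you need.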
45

You should use any()

In [98]: a['Names'].str.contains('Mel').any()
Out[98]: True

In [99]: if a['Names'].str.contains('Mel').any():
   ....:     print("Mel is there")
   ....:
Mel is there

a['Names'].str.contains('Mel') gives you a series of bool values

In [100]: a['Names'].str.contains('Mel')
Out[100]:
0    False
1    False
2    False
3    False
4     True
Name: Names, dtype: bool

1 Comment

If I want to check whether any one of the words exists, a['Names'].str.contains("Mel|word_1|word_2") works. Can you please suggest something for an 'and' condition? I want to check whether all the words in my list exist in each row of the dataframe.
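One possible way to handle the 'and' condition asked about above is to build one mask per word and combine them with `&`. This is only a sketch: the data and the word list are hypothetical, and `regex=False` is used on the assumption that the words should be matched literally.

```python
import pandas as pd

# Hypothetical data and word list, for illustration only.
text = pd.Series(["Mel and Bob", "Bob only", "Bob met Mel"])
words = ["Mel", "Bob"]

# Start with all True, then AND in one literal-substring mask per word.
all_present = pd.Series(True, index=text.index)
for word in words:
    all_present &= text.str.contains(word, regex=False, na=False)

print(all_present.tolist())  # [True, False, True]
```

`all_present` is True only for rows that contain every word in the list.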
25

The OP wants to know whether the exact string 'Mel' exists in a particular column, not whether 'Mel' is contained in any string in that column. Therefore contains is not needed, and it is less efficient.

A simple equals-to is enough:

df = pd.DataFrame({"names": ["Melvin", "Mel", "Me", "Mel", "A.Mel"]})

mel_count = (df['names'] == 'Mel').sum()
print("There are {num} instances of 'Mel'. ".format(num=mel_count))

mel_exists = (df['names'] == 'Mel').any()
if mel_exists:
    print("'Mel' exists in the dataframe.")

mel_exists2 = 'Mel' in df['names'].values
print("'Mel' is in the dataframe: " + str(mel_exists2))

Prints:

There are 2 instances of 'Mel'. 
'Mel' exists in the dataframe.
'Mel' is in the dataframe: True

3 Comments

a similar solution: (a['Names'].eq('Mel')).any()
This is the most accurate answer
Why does one have to go down to numpy simply to check if a string is contained in a Series of strings (like 'Mel' in df['names'].values)? That seems counterproductive. I would expect 'Mel' in df['names'] to work.
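On the question in the last comment: `in` applied directly to a Series tests the index labels, not the values, which is why `.values` (or a value-based method such as `isin`) is needed. A minimal sketch with made-up data:

```python
import pandas as pd

names = pd.Series(["Melvin", "Mel", "Me"])  # default index labels are 0, 1, 2

label_hit = "Mel" in names                  # checks the index labels
index_hit = 1 in names                      # 1 is an index label
value_hit = "Mel" in names.values           # checks the values
isin_hit = names.isin(["Mel"]).any()        # value check without leaving pandas

print(label_hit, index_hit, value_hit, isin_hit)  # False True True True
```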
13

I ran into the same problem and used:

if "Mel" in a["Names"].values:
    print("Yep")

But this solution may be slower, since .values first converts the Series to a NumPy array before the membership test.

1 Comment

It works for multiple strings in that column, thanks.
4

If there is any chance that you will need to search for empty strings,

    a['Names'].str.contains('') 

will NOT work, as it will always return True.

Instead, use

    if '' in a["Names"].values:

to accurately reflect whether or not a string is in a Series, including the edge case of searching for an empty string.
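A short sketch of this edge case, using a made-up Series that contains no empty string:

```python
import pandas as pd

# Hypothetical column with no empty strings in it.
names = pd.Series(["Bob", "Mel"])

# Every string contains the empty string, so this mask is all True.
contains_empty = names.str.contains("").all()

# A value-based membership test gives the expected answer.
has_empty = "" in names.values

print(contains_empty, has_empty)  # True False
```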

4

For a case-insensitive search:

a['Names'].str.lower().str.contains('mel').any()
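As an alternative sketch, `str.contains` also accepts a `case` parameter, which avoids the separate `str.lower()` step. The sample data here is made up, and `na=False` is added on the assumption that missing names should count as non-matches:

```python
import pandas as pd

# Hypothetical column with mixed case and a missing value.
names = pd.Series(["Bob", "MEL", None])

# case=False makes the substring match case-insensitive.
found = names.str.contains("mel", case=False, na=False).any()

# For an exact, case-insensitive match, compare the lowercased values instead.
found_exact = (names.str.lower() == "mel").any()

print(found, found_exact)  # True True
```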

2

pandas seems to recommend df.to_numpy, since the other methods still raise a FutureWarning: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

So, an alternative that would work in this case is:

b = a['Names']
c = b.to_numpy().tolist()
if 'Mel' in c:
    print("Mel is in the dataframe column Names")

2
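import re
import pandas as pd

# df is assumed to be an existing DataFrame with a string column 'Name'
s = 'string'

# List of case-insensitive matches of s per row (an empty list means no match)
matches = df['Name'].str.findall(s, flags=re.IGNORECASE)

# or: keep only the rows whose 'Name' exactly equals one of the given strings
exact_rows = df[df['Name'].isin(['string1', 'string2'])]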

1
import pandas as pd

(data_frame.col_name=='str_name_to_check').sum()

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
0

If you want to save the results then you can use this:

a['result'] = a['Names'].apply(lambda x : ','.join([item for item in str(x).split() if item.lower() in ['mel', 'etc']]))

-1

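You should reduce the result of your line of code to a single value before testing it. Note that len() of the result is always the number of rows, so count the True values with sum() instead:

if a['Names'].str.contains('Mel').sum() > 0:
    print("Name Present")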
