Check if string is in a pandas dataframe

Question

I would like to see if a particular string exists in a particular column within my dataframe.

I'm getting the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

BabyDataSet = [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]

a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])

if a['Names'].str.contains('Mel'):
    print ("Mel is there")

Uri Goren · Accepted Answer · 2015-06-19 20:30:08Z

188

a['Names'].str.contains('Mel') returns a boolean Series with one entry per row (here, len(BabyDataSet) entries), not a single True or False.

Therefore, you can use

mel_count=a['Names'].str.contains('Mel').sum()
if mel_count>0:
    print ("There are {m} Mels".format(m=mel_count))

Or any(), if you don't care how many records match your query

if a['Names'].str.contains('Mel').any():
    print ("Mel is there")
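To make the point above concrete, here is a minimal sketch using the question's data. It shows the per-row mask, why a bare `if` on it fails, and the two reductions suggested in this answer. The names `mask`, `raised`, and `mel_found` are only illustrative.

```python
import pandas as pd

BabyDataSet = [('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]
a = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])

# One boolean per row, not a single True/False
mask = a['Names'].str.contains('Mel')

# Asking for the truth value of the whole Series is what raises the ValueError
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# Reduce the mask to a single value before using it in an `if`
mel_count = int(mask.sum())   # how many rows matched
mel_found = bool(mask.any())  # whether at least one row matched

print(mask.tolist(), raised, mel_count, mel_found)
```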

answered Jun 19, 2015 at 20:30

Uri Goren



3 Comments

Sander Vanden Hautte Over a year ago

If there are NaN values in a['Names'], use the na parameter of the contains() function. pandas.pydata.org/pandas-docs/stable/reference/api/…
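A small sketch of the na parameter mentioned here, assuming a made-up column with one missing entry. Passing na=False treats missing values as "no match", so the result is a plain boolean mask whatever the pandas version does by default.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value in the middle
names = pd.Series(['Bob', np.nan, 'Mel'])

# na=False fills the result for missing rows with False
mask = names.str.contains('Mel', na=False)

print(mask.tolist())
print(mask.any())
```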

Eric Leschinski Over a year ago

Gotcha number 2: str.contains('Mel') matches any substring in every row of the dataframe column, so 'ABCMelABC' counts as a match for 'Mel'.

umar Over a year ago

This answer is incorrect and misleading, since it checks whether 'Mel' is contained in any of the strings in the column. For example, 'hi Mel' in the column will also evaluate to True, whereas an exact match of the string is required.

Oren · Accepted Answer · 2022-12-01 14:36:47Z

45

You should use any()

In [98]: a['Names'].str.contains('Mel').any()
Out[98]: True

In [99]: if a['Names'].str.contains('Mel').any():
   ....:     print("Mel is there")
   ....:
Mel is there

a['Names'].str.contains('Mel') gives you a series of bool values

In [100]: a['Names'].str.contains('Mel')
Out[100]:
0    False
1    False
2    False
3    False
4     True
Name: Names, dtype: bool

edited Dec 1, 2022 at 14:36

Oren


answered Jun 19, 2015 at 18:06

Zero


1 Comment

Syed Md Ismail Over a year ago

If I want to check whether any one of several words exists, a['Names'].str.contains("Mel|word_1|word_2") works. Can you please suggest something for an 'and' condition? I want to check whether all the words in my list exist in each row of the dataframe.
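One possible way to get the 'and' behaviour asked about here, sketched under the assumption that plain substring matching is wanted: build one mask per word and combine them with &. The sample strings and the names `texts`, `words`, and `all_present` are made up for illustration.

```python
import pandas as pd

texts = pd.Series(['Mel and Bob', 'Mel only', 'Bob, then Mel'])
words = ['Mel', 'Bob']

# Start with every row marked True, then require each word in turn
all_present = pd.Series(True, index=texts.index)
for word in words:
    # regex=False treats the word as literal text rather than a pattern
    all_present = all_present & texts.str.contains(word, regex=False)

print(all_present.tolist())
```

The resulting mask has one entry per row; call .all() or .any() on it if a single True/False is needed.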

Eric Leschinski · Accepted Answer · 2021-05-31 16:47:42Z

25

The OP wants to know whether the string 'Mel' exists as a value in a particular column, not whether it is contained in any string in that column. Using contains is therefore unnecessary and less efficient.

A simple equals-to is enough:

df = pd.DataFrame({"names": ["Melvin", "Mel", "Me", "Mel", "A.Mel"]})

# Count the rows whose value is exactly 'Mel'
mel_count = (df['names'] == 'Mel').sum()
print("There are {num} instances of 'Mel'. ".format(num=mel_count))

# True if at least one row is exactly 'Mel'
mel_exists = (df['names'] == 'Mel').any()
if mel_exists:
    print("'Mel' exists in the dataframe.")

mel_exists2 = 'Mel' in df['names'].values 
print("'Mel' is in the dataframe: " + str(mel_exists2))

Prints:

There are 2 instances of 'Mel'. 
'Mel' exists in the dataframe.
'Mel' is in the dataframe: True

edited May 31, 2021 at 16:47

Eric Leschinski


answered Nov 8, 2019 at 17:35

meizy


3 Comments

ivegotaquestion Over a year ago

a similar solution: (a['Names'].eq('Mel')).any()

thentangler Over a year ago

This is the most accurate answer

K.-Michael Aye Over a year ago

Why does one have to go down to numpy simply to check if a string is contained in a Series of strings (like 'Mel' in df['names'].values)? That seems counterproductive. I would expect 'Mel' in df['names'] to work.
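A short sketch of what is going on here, using a made-up Series: the `in` operator on a Series tests the index labels rather than the stored values, which is why the answers reach for .values. The variable names are illustrative only.

```python
import pandas as pd

names = pd.Series(['Melvin', 'Mel', 'Me'])

in_labels = 'Mel' in names          # checks the index labels (0, 1, 2), not the values
label_hit = 1 in names              # True, because 1 is an index label
in_values = 'Mel' in names.values   # checks the stored strings

print(in_labels, label_hit, in_values)
```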

Christian Pao. · Accepted Answer · 2020-02-05 13:15:30Z

13

I bumped into the same problem, I used:

if "Mel" in a["Names"].values:
    print("Yep")

But this solution may be slower on large columns, since .values hands the column to NumPy as an array and the in test then compares against every element.

answered Feb 5, 2020 at 13:15

Christian Pao.


1 Comment

PyBoss Over a year ago

It works for multiple strings in that column, thanks.

baileyw · Accepted Answer · 2020-06-04 21:10:02Z

4

If there is any chance that you will need to search for empty strings,

    a['Names'].str.contains('')

will NOT work: the empty pattern matches every string, so every row comes back True.

Instead, use

    if '' in a["Names"].values:

to accurately reflect whether or not a string is in a Series, including the edge case of searching for an empty string.
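A minimal sketch of that edge case, assuming a made-up column that holds no empty strings. The names `contains_empty` and `has_empty` are illustrative.

```python
import pandas as pd

names = pd.Series(['Bob', 'Mel'])

# The empty pattern matches inside every string, so every row comes back True
contains_empty = names.str.contains('').all()

# A membership test on the values only succeeds if '' is really stored
has_empty = '' in names.values

print(contains_empty, has_empty)
```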

answered Jun 4, 2020 at 21:10

baileyw


Comments

Hayat · Accepted Answer · 2021-06-10 12:17:18Z

4

For a case-insensitive search:

a['Names'].str.lower().str.contains('mel').any()
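As an alternative sketch, contains() also accepts a case parameter, so the lower-casing step can be skipped. The sample data below is made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({'Names': ['Bob', 'Jessica', 'MEL']})

# case=False asks contains() itself to ignore letter case
found = a['Names'].str.contains('mel', case=False).any()
print(found)
```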

answered Jun 10, 2021 at 12:17

Hayat


Comments

β.εηοιτ.βε · Accepted Answer · 2020-06-28 19:35:11Z

2

pandas seems to recommend df.to_numpy, since the other methods still raise a FutureWarning: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

So, an alternative that would work in this case is:

b = a['Names']
c = b.to_numpy().tolist()
if 'Mel' in c:
    print("Mel is in the dataframe column Names")
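As a side note, a sketch suggesting the tolist() step is optional: the array returned by to_numpy() supports the in test directly. The sample data and the name 'Zed' are made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({'Names': ['Bob', 'Jessica', 'Mel']})

# to_numpy() returns a NumPy array, which supports `in` directly
found = 'Mel' in a['Names'].to_numpy()
missing = 'Zed' in a['Names'].to_numpy()

print(found, missing)
```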

edited Jun 28, 2020 at 19:35

β.εηοιτ.βε


answered Jun 28, 2020 at 17:44

RusRus


Comments

janhavi kulkarni · Accepted Answer · 2022-02-03 16:28:27Z

2

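import re

s = 'string'

# Case-insensitive: list of every match of s found in each row
matches = df['Name'].str.findall(s, flags=re.IGNORECASE)

# or: keep only the rows whose value is exactly one of these strings
subset = df[df['Name'].isin(['string1', 'string2'])]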

answered Feb 3, 2022 at 16:28

janhavi kulkarni


Comments

camille · Accepted Answer · 2022-01-20 15:22:40Z

1

import pandas as pd

# Count the rows where col_name equals the string exactly; a result above 0 means it is present
(data_frame.col_name == 'str_name_to_check').sum()

edited Jan 20, 2022 at 15:22

camille


answered Jan 16, 2022 at 20:33

Harshit Sharma


1 Comment

Keegan Murphy Over a year ago

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.

SaNa · Accepted Answer · 2021-07-26 04:12:04Z

0

If you want to save the results then you can use this:

a['result'] = a['Names'].apply(lambda x : ','.join([item for item in str(x).split() if item.lower() in ['mel', 'etc']]))
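A small sketch of what that line produces, on made-up data: each row keeps only the whitespace-separated words that appear in the target list (compared in lower case), joined by commas. The `targets` name and the sample rows are hypothetical.

```python
import pandas as pd

a = pd.DataFrame({'Names': ['Mel Gibson', 'Bob', 'mel etc']})
targets = ['mel', 'etc']  # hypothetical words to keep

# Keep only the words from each row that are in targets, joined by commas
a['result'] = a['Names'].apply(
    lambda x: ','.join(item for item in str(x).split() if item.lower() in targets)
)

print(a['result'].tolist())
```

Rows with no matching word end up with an empty string, so a['result'].ne('').any() would answer the original yes/no question.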

edited Jul 26, 2021 at 4:12

answered Jul 26, 2021 at 4:02

SaNa


Comments

Shahir Ansari · Accepted Answer · 2019-07-01 08:08:51Z

-1

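You should check the result of that expression rather than the Series itself, for example by counting the matching rows. (The length of the Series is always the number of rows, so it cannot be used for this.)

if a['Names'].str.contains('Mel').sum() > 0:
    print("Name Present")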

edited Jul 1, 2019 at 8:08

answered Jul 1, 2019 at 7:12

Shahir Ansari

