Pandas dataframe reports no matching string when the string is present

Question

Fairly new to Python. This seems like a really simple question, but I can't find any information about it. I have a list of strings, and for each string I want to check whether it is present in a dataframe (actually in a particular column of the dataframe). Not whether a substring is present, but the whole exact string.

So my dataframe is something like the following:

import pandas as pd
A = pd.DataFrame(["ancestry", "time", "history"])

I should simply be able to use the "string in dataframe" method, as in

"time" in A

This returns False however. If I run

"time" == A.iloc[1]

it returns True, but annoyingly as part of a Series, and it depends on knowing where in the dataframe the string is. Is there some way I can just use the "string in df" syntax to easily find out whether the strings in my list are in the dataframe?

ASGM · Accepted Answer · 2022-05-10 13:49:37Z

1

Add .to_numpy() to the end:

'time' in A.to_numpy()

As you've noticed, the x in pandas.DataFrame syntax doesn't produce the result you want, because it checks the column labels rather than the values. But .to_numpy() converts the dataframe into a NumPy array, and x in numpy.array checks the values, as you expect.
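
As a rough sketch, the same check could be applied to every string in a list. The names wanted, values and found are made up for illustration, and "geography" is a hypothetical entry added to show a miss:

```python
import pandas as pd

A = pd.DataFrame(["ancestry", "time", "history"])
wanted = ["time", "geography"]  # hypothetical list of strings to look up

# Convert once; the result is a 2-D NumPy array holding the cell values
values = A.to_numpy()

# `in` on a NumPy array tests the values, not the column labels
found = {s: s in values for s in wanted}
print(found)  # {'time': True, 'geography': False}
```

Converting once outside the loop avoids rebuilding the array for every string.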

edited May 10, 2022 at 13:49

user17242583

answered May 9, 2022 at 21:21

ASGM

user17242583 · Accepted Answer · 2022-05-09 21:20:07Z

The way to deal with this is to compare the whole dataframe with "time". That returns a boolean mask in which each value is True if the corresponding cell equals "time" and False otherwise. Then you can use .any() to check whether there are any True values:

>>> A = pd.DataFrame(["ancestry","time","history"])
>>> A
          0
0  ancestry
1      time
2   history

>>> A == "time"  # or A.eq("time")
       0
0  False
1   True
2  False

>>> (A == "time").any()
0    True
dtype: bool

Notice in the above output that (A == "time").any() returns a Series with one entry per column, indicating whether that column contains "time". If you want to check the entire dataframe (across all columns), call .any() twice:

>>> (A == "time").any().any()
True
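
To cover the case in the question (a list of strings checked against one particular column), a minimal sketch might look like the following. The names wanted, column, present and matched are illustrative, and Series.isin is an alternative that does not appear in the answer above:

```python
import pandas as pd

A = pd.DataFrame(["ancestry", "time", "history"])
wanted = ["time", "geography"]  # hypothetical list of strings to look up

column = A[0]  # the single column to search

# One exact-match comparison per string, reduced to a plain bool
present = {s: bool((column == s).any()) for s in wanted}
print(present)  # {'time': True, 'geography': False}

# Alternative: ask which column values appear anywhere in the list
matched = set(column[column.isin(wanted)])
print(matched)  # {'time'}
```

Both variants compare whole strings, so a value such as "timeline" would not count as a match for "time".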

Mose Wintner · Accepted Answer · 2022-05-09 21:20:11Z

0

I believe (myseries == mystr).any() will do what you ask. The special __contains__ method of DataFrames (which determines the behavior of in) checks whether your string is a column label of the DataFrame, e.g.

>>> A = pd.DataFrame({"c": [0,1,2], "d": [3,4,5]})
>>> 'c' in A
True
>>> 0 in A
False
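
Applied to the dataframe from the question, this behavior can be sketched as follows. The variable names are illustrative only; the Series lines rely on the fact that in on a Series tests the index labels rather than the values:

```python
import pandas as pd

A = pd.DataFrame(["ancestry", "time", "history"])

# DataFrame: `in` looks at the column labels (here just the default label 0)
label_hit = "time" in A   # False
column_hit = 0 in A       # True

# Series: `in` looks at the index labels (0, 1, 2), not the values
series_hit = "time" in A[0]  # False
index_hit = 1 in A[0]        # True

# Comparing the values themselves gives the answer the question wants
value_hit = bool((A[0] == "time").any())  # True
```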

answered May 9, 2022 at 21:20

Mose Wintner

Daniel Weigel · Accepted Answer · 2022-05-09 21:23:26Z

0

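I would give the column a name and use .str.contains to check where the string is present in your series. Note that .str.contains matches substrings, so use == instead if you need the whole exact string.

import pandas as pd

df = pd.DataFrame()
df['A'] = pd.Series(["ancestry", "time", "history"])

# Boolean Series: True for each row whose value contains "time"
df['A'].str.contains("time")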
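
Since the question asks for the whole exact string, it may help to see how this differs from a plain equality comparison. In this sketch the extra row "timeline" is a hypothetical value added only to make the difference visible:

```python
import pandas as pd

# "timeline" is not in the original data; it is added for illustration
df = pd.DataFrame({"A": ["ancestry", "time", "history", "timeline"]})

# Substring match: also flags "timeline" (regex=False treats "time" literally)
substring_mask = df["A"].str.contains("time", regex=False)
print(substring_mask.tolist())  # [False, True, False, True]

# Exact match: only the row equal to "time"
exact_mask = df["A"] == "time"
print(exact_mask.tolist())  # [False, True, False, False]
```

Either mask can be reduced to a single True/False with .any().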

answered May 9, 2022 at 21:23

Daniel Weigel
