Fairly new to Python. This seems like a really simple question, but I can't find any information about it. I have a list of strings, and for each string I want to check whether it is present in a dataframe (actually in a particular column of the dataframe). Not whether a substring is present, but the whole exact string.

So my dataframe is something like the following:

import pandas as pd
A = pd.DataFrame(["ancestry", "time", "history"])

I expected to be able to use the "string in dataframe" syntax, as in

"time" in A

However, this returns False. If I run

"time" == A.iloc[1]

it returns True, but annoyingly as part of a Series, and it depends on knowing where in the dataframe the string is. Is there some way I can just use the "string in df" syntax to easily find out whether the strings in my list are in the dataframe?

4 Answers

1

Add .to_numpy() to the end:

'time' in A.to_numpy() 

As you've noticed, the x in pandas.DataFrame syntax doesn't produce the result you want. But .to_numpy() converts the dataframe into a NumPy array, and x in on a NumPy array checks whether any element equals x, which is what you expect.
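Since the question mentions a list of strings, here is a minimal sketch of applying the same check to each one. The list words and the dictionary found are hypothetical names introduced for illustration only.

```python
import pandas as pd

A = pd.DataFrame(["ancestry", "time", "history"])
words = ["time", "space", "history"]  # hypothetical list of strings to look up

# Convert once; the result is a 2-D array holding every cell of the dataframe.
values = A.to_numpy()

# `word in values` is True when at least one cell equals `word` exactly.
found = {word: bool(word in values) for word in words}
print(found)
```

Converting once outside the loop avoids rebuilding the array for every string.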


0

One way to deal with this is to compare the whole dataframe with "time". That returns a boolean mask in which each cell is True if it equals "time" and False otherwise. You can then use .any() to check whether there are any True values:

>>> A = pd.DataFrame(["ancestry","time","history"])
>>> A
          0
0  ancestry
1      time
2   history

>>> A == "time"  # or A.eq("time")
       0
0  False
1   True
2  False

>>> (A == "time").any()
0    True
dtype: bool

Notice in the output above that (A == "time").any() returns a Series with one entry per column, indicating whether that column contains "time". To check the entire dataframe (across all columns), call .any() twice:

>>> (A == "time").any().any()
True
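The question is about one particular column and a whole list of strings, so a possible variation is sketched below. It assumes the column of interest is the one labelled 0 and uses a hypothetical list called words; Series.isin is swapped in here for the many-strings case and is not part of the approach above.

```python
import pandas as pd

A = pd.DataFrame(["ancestry", "time", "history"])
words = ["time", "space"]  # hypothetical list of strings to look up

# Restrict the comparison to a single column: one .any() is enough.
in_column = bool((A[0] == "time").any())

# For many strings, build a Series of the candidates (indexed by the
# strings themselves) and ask which of them occur in the column.
present = pd.Series(words, index=words).isin(A[0])

print(in_column)
print(present)
```

The result present is a boolean Series indexed by the candidate strings, so present["time"] answers the question for a single word.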

0

I believe (myseries==mystr).any() will do what you ask. The special __contains__ method of DataFrames (which determines the behavior of in) checks whether your string is a column label of the DataFrame, e.g.

>>> A = pd.DataFrame({"c": [0,1,2], "d": [3,4,5]})
>>> 'c' in A
True
>>> 0 in A
False
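The same distinction applies to the dataframe from the question, and a Series behaves similarly in that in looks at its index labels rather than its values. A short sketch, assuming the default integer labels that pandas assigns when none are given:

```python
import pandas as pd

A = pd.DataFrame(["ancestry", "time", "history"])
col = A[0]  # the only column, as a Series

checks = {
    "'time' in A": "time" in A,                 # looks at column labels
    "0 in A": 0 in A,                           # 0 is the default column label
    "'time' in col": "time" in col,             # a Series looks at index labels
    "1 in col": 1 in col,                       # 1 is an index label
    "'time' in values": "time" in col.to_numpy(),  # looks at the values
}
for expression, result in checks.items():
    print(expression, "->", bool(result))
```

Only the last check inspects the stored strings; the others test labels.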

0

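I would slightly modify your dataframe and use .str.contains to check where the string is present in your Series. This returns a boolean Series with one entry per row. Note that .str.contains matches substrings, so it is also True for longer strings that contain "time".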

df = pd.DataFrame({"A": ["ancestry", "time", "history"]})

df['A'].str.contains("time")
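To compare this with the whole-string match the question asks for, here is a small sketch. The extra row "timeline" is a hypothetical value added only to make the difference visible, and Series.eq is swapped in for the exact comparison.

```python
import pandas as pd

# "timeline" is a hypothetical extra value, not part of the original data.
df = pd.DataFrame({"A": ["ancestry", "time", "history", "timeline"]})

# Substring (regex) match: also flags "timeline".
substring_hits = df["A"].str.contains("time")

# Exact match: flags only the row that equals "time".
exact_hits = df["A"].eq("time")

print(substring_hits.tolist())
print(exact_hits.tolist())
print(bool(exact_hits.any()))
```

Calling .any() on either result collapses it to a single True or False.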
