0

I have a csv that I'm loading into a dataframe. I only need the rows for which the Organization column contains a target string affiliation.

When I try to use str.contains() I get ValueError: cannot index with vector containing NA / NaN values.

I've looked at Value Error when Slicing in Pandas and pandas + dataframe - select by partial string and the following solution that both have works for me:

df = df[df['Organization'].str.contains(affiliation)==True]

or

df = df[df['Organization'].str.contains(affiliation).fillna(False)]

But, as a test, I did this:

print(len(df)) #99228
df = df[pd.notnull(df['Organization'])] #or df = df.dropna(subset=['Organization'])
print(len(df)) #99228
df = df[df['Organization'].str.contains(affiliation).fillna(False)]
print(len(df)) #1605

My question is: the ValueError I was getting without ==True or fillna(False) attached to str.contains() seems to imply that the Organization column has NaNs. But then why do I get the same sized df after keeping only the non-null Organization rows? What am I missing here?

Thanks!

2 Answers 2

1

Check the content of your column Organization. It very likely contains strings and other data types. Therefore, df['Organization'].str.contains(affiliation) results in NaNfor these values with other data types. You cannot use NaNfor indexing but need to convert it into False.

Sign up to request clarification or add additional context in comments.

Comments

1

You need to specify str.contains('affiliation', na=False). [docs]

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: s = pd.Series(['foo','bar',np.nan,'this contains affiliation','baz',np.nan])

In [4]: s.str.contains('affiliation')
Out[4]:
0    False
1    False
2      NaN
3     True
4    False
5      NaN
dtype: object

In [5]: s.str.contains('affiliation', na=False)
Out[5]:
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool

Then you can index your DataFrame with that Boolean array.

1 Comment

Thanks but I knew I needed to do that. I didn't understand why it was happening if it wasn't because of NaNs

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.