I have a pandas script that reads an Excel sheet into a DataFrame, searches the DataFrame for a specific word, and builds a mask of 1s and 0s marking the cells where the word is found.

The Excel sheet has no fixed format, so I load all the data as is, then look for the word and build the mask with this line:

mask = np.column_stack([df[col].str.find(word) for col in df.columns.tolist()]).astype(int)

This line sometimes produces this error:

pandas can only use .str accessor with string values, which use np.object_ dtype in pandas

Any idea why this happens and how to make it work?

Thank you

  • Try df.select_dtypes(include='object').columns.tolist() instead of df.columns.tolist() in your list comprehension (the np.object alias was removed in NumPy 1.24, so pass 'object' or the builtin object instead). Right now you are iterating over every column regardless of dtype, and the dtypes may well be mixed. You need to restrict this to the string columns for str.find() to work properly. Commented Jan 17, 2017 at 8:43
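
The comment's suggestion could be sketched roughly as follows. The sample data, the column names, and the all-zero starting mask are assumptions made for illustration and are not from the question. The sketch also swaps str.find for str.contains, because str.find returns the match position (or -1 when there is no match) rather than a 1/0 flag.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data; dtype=object mimics text
# columns that also hold missing or non-string cells.
df = pd.DataFrame({
    "name": pd.Series(["apple pie", "banana", None], dtype=object),
    "qty": [1, 2, 3],
    "note": pd.Series(["green apple", 5, "pear"], dtype=object),
})
word = "apple"

# Start from an all-zero mask with the same shape as df.
mask = pd.DataFrame(0, index=df.index, columns=df.columns)

# Only object-dtype columns support the .str accessor.
for col in df.select_dtypes(include="object").columns:
    # regex=False does a plain substring test; na=False turns missing or
    # non-string cells into False instead of NaN.
    mask[col] = df[col].str.contains(word, regex=False, na=False).astype(int)

print(mask)
```

Numeric columns such as qty are skipped entirely here, so they stay 0 in the mask.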

1 Answer

You can use applymap with a lambda function to convert the DataFrame to a mask. If df is your input DataFrame, the following sets every cell to 1 if the string word occurs in it and to 0 otherwise. Each value is passed through str() first, so non-string columns no longer trigger the .str accessor error.

mask = df.applymap(lambda x: 1 if word in str(x) else 0)
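
For completeness, here is a small self-contained sketch of the same idea. The sample DataFrame and the helper name contains_word are made up for illustration. Since pandas 2.1, applymap is deprecated in favour of DataFrame.map, so the sketch picks whichever one is available.

```python
import pandas as pd

# Hypothetical mixed-type data, similar to what an arbitrary sheet may hold.
df = pd.DataFrame({
    "a": ["foo bar", 12, None],
    "b": [3.5, "bar", "baz"],
})
word = "bar"

def contains_word(value):
    # str() makes the test safe for numbers, None and NaN as well.
    return 1 if word in str(value) else 0

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
mask = elementwise(contains_word)

print(mask)
```

Because every cell is converted with str(), numbers are matched against their text form, so searching for "12" would also flag the integer 12.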