I am using the Auto MPG dataset which contains missing values in the column/attribute horsepower in the form of ? characters.

When I run either

data.isnull().values.any()

or

data["horsepower"].isnull().values.any()

both return False, because isnull() only detects real NaN values (such as blank fields read from a CSV), not placeholder strings like ?.

How can I locate missing values that are marked with a special character, in my case ?, rather than the usual NaN?

Thanks!

  • Replace ? with NaN using df['horsepower'] = df['horsepower'].replace('?', np.nan), then proceed as usual. Commented Jan 2, 2019 at 11:07
  • If you are reading your data from a CSV file, pass ? as na_values in read_csv. For more details, visit pandas.pydata.org/pandas-docs/stable/generated/… Commented Jan 2, 2019 at 11:08
  • @MohamedThasinah I've verified my answer before posting. It's working. Commented Jan 2, 2019 at 11:15

3 Answers

Use replace before checking NaNs:

data["horsepower"].replace('?',np.nan).isnull().values.any()

If the DataFrame is created with read_csv, add the na_values parameter so that ? is converted to NaN while the file is read:

data = pd.read_csv(path, na_values=["?"])
data["horsepower"].isnull().values.any()
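
For anyone who wants to try this without the real file, here is a minimal sketch of the na_values approach. The three-row csv_text sample is made up for illustration and only stands in for the Auto MPG data; the names has_missing and missing_rows are likewise just illustrative.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the Auto MPG file.
csv_text = "mpg,horsepower\n18.0,130\n25.0,?\n21.0,95\n"

# na_values tells read_csv to treat "?" as missing while parsing,
# so horsepower is read as a numeric column containing NaN.
data = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

# True if at least one horsepower value is missing.
has_missing = data["horsepower"].isnull().values.any()

# The rows where horsepower is missing.
missing_rows = data[data["horsepower"].isnull()]

print(has_missing)
print(missing_rows)
```

Because the conversion happens at read time, the column should already be numeric and no later cleanup step is needed.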

You need to convert ? to NaN first; after that you can look for null values as usual.

1) Convert ? to NaN (replace returns a new DataFrame, so assign the result back):

data = data.replace('?', np.nan)

2) Find the null values:

pd.isna(data['horsepower'])

This returns a boolean Series, with True where horsepower is missing.
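
The two steps can be sketched end to end as follows. The small DataFrame is invented sample data, and the final pd.to_numeric call is an extra step, not part of the answer above, that is often useful because the column still holds strings after the replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: horsepower holds strings because of the "?" marker.
data = pd.DataFrame(
    {"mpg": [18.0, 25.0, 21.0], "horsepower": ["130", "?", "95"]}
)

# Step 1: replace returns a new object, so assign the result back.
data["horsepower"] = data["horsepower"].replace("?", np.nan)

# Step 2: boolean Series, True where horsepower is missing.
mask = pd.isna(data["horsepower"])

# Index labels of the rows that had "?".
missing_idx = data.index[mask].tolist()

# Optional extra step: turn the remaining strings into numbers.
data["horsepower"] = pd.to_numeric(data["horsepower"])

print(missing_idx)
print(data)
```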

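You can set na_values to ? when reading the file, or use the following:

df.replace(r'[\W]',np.nan,regex=True)

\W matches any character that is not a letter, a digit, or the underscore. Note that with regex=True this replaces every string cell that contains such a character anywhere, so values with spaces or punctuation also become NaN.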
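
A small sketch of how this pattern behaves on mixed columns. The sample data is made up, and the anchored pattern ^\W+$ is a suggested variation rather than part of the answer above. It assumes the values are stored as plain Python strings (object dtype).

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a text column that legitimately contains spaces.
df = pd.DataFrame(
    {
        "horsepower": ["130", "?", "95"],
        "name": ["ford torino", "amc rebel sst", "buick"],
    },
    dtype=object,
)

# Unanchored pattern: any string cell containing a non-word character
# is replaced, so the names with spaces are lost as well.
broad = df.replace(r"[\W]", np.nan, regex=True)

# Anchored pattern: only cells made up entirely of non-word characters
# (such as a lone "?") are replaced.
narrow = df.replace(r"^\W+$", np.nan, regex=True)

print(broad)
print(narrow)
```

If only one column uses the ? marker, restricting the replacement to that column is another way to avoid touching unrelated text.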
