23

I have a pandas dataframe. One of my columns should only be floats. When I try to convert that column to floats, I'm alerted that there are strings in there. I'd like to delete all rows where values in this column are strings...

4 Answers

28

Use convert_objects with the parameter convert_numeric=True; this coerces any non-numeric values to NaN:

In [24]:

df = pd.DataFrame({'a': [0.1,0.5,'jasdh', 9.0]})
df
Out[24]:
       a
0    0.1
1    0.5
2  jasdh
3      9
In [27]:

df.convert_objects(convert_numeric=True)
Out[27]:
     a
0  0.1
1  0.5
2  NaN
3  9.0
You can then drop them:

In [29]:

df.convert_objects(convert_numeric=True).dropna()
Out[29]:
     a
0  0.1
1  0.5
3  9.0

UPDATE

Since version 0.17.0 this method is deprecated and you need to use to_numeric instead. Unfortunately, to_numeric operates on a Series rather than a whole DataFrame, so the equivalent code is now:

df.apply(lambda x: pd.to_numeric(x, errors='coerce')).dropna()
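When only one column should be numeric and the others must keep their strings (the 'name' and 'age' case raised in the comments), applying to_numeric to the whole frame would wipe out the string columns. A minimal sketch of coercing a single column instead is below; the frame contents and the variable names `people`, `age_numeric`, and `cleaned` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: 'name' must stay as strings, 'age' should be numeric.
people = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cy'],
    'age': [31, 'unknown', 45.5],
})

# Coerce only the 'age' column; entries that cannot be parsed become NaN.
age_numeric = pd.to_numeric(people['age'], errors='coerce')

# Keep the rows where coercion succeeded, then store the numeric values.
cleaned = people[age_numeric.notna()].copy()
cleaned['age'] = age_numeric[age_numeric.notna()]

print(cleaned)
print(cleaned.dtypes)
```

Note that rows where 'age' was already missing are dropped as well, since they are indistinguishable from failed conversions after coercion.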

6 Comments

Thanks for this! My dataframe has multiple columns. Some columns need to have strings. For instance, I have a column 'name' and a column 'age'. The column 'age' needs to be numeric. I tried: df.age.convert_objects(convert_numeric=True) and got 'Series' object has no attribute 'convert_objects'.
You need to do df[['age']].convert_objects(convert_numeric=True) in that case.
Oh I see, so [['age']] picks out the column in df. Very helpful. However, I'm getting a TypeError: convert_objects() got an unexpected keyword argument 'convert_numeric'. I just checked the documentation and convert_numeric=True is the correct argument. Thoughts?
Okay, I think that my pandas is out of date. Updating now.
Hi. I get a 'convert_objects deprecated' FutureWarning when trying to use this. Any suggestions?
6

One of my columns should only be floats. I'd like to delete all rows where values in this column are strings

You can convert your series to numeric via pd.to_numeric and then use pd.Series.notnull. Conversion to float is required as a separate step to avoid your series reverting to object dtype.

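# Data from @EdChum

import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0]})

# .copy() makes res an independent frame; assigning to a filtered
# selection may otherwise trigger a SettingWithCopyWarning
res = df[pd.to_numeric(df['a'], errors='coerce').notnull()].copy()
res['a'] = res['a'].astype(float)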

print(res)

     a
0  0.1
1  0.5
3  9.0


1

Assume your DataFrame is df and you want to ensure that all data in one of its columns (the column at position n) is numeric with a specific pandas dtype, e.g. float. Note that this replaces non-numeric values with 0 rather than deleting their rows:

df[df.columns[n]] = pd.to_numeric(df[df.columns[n]], errors='coerce').fillna(0).astype(float)
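Since the one-liner above calls fillna(0), the string rows are kept with a value of 0 instead of being removed. A small sketch of the difference between filling and dropping, using the sample data from the first answer (the names `coerced`, `filled`, and `dropped` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0]})

# Strings that cannot be parsed become NaN.
coerced = pd.to_numeric(df['a'], errors='coerce')

filled = coerced.fillna(0)   # keeps every row; the 'jasdh' entry becomes 0.0
dropped = coerced.dropna()   # removes the 'jasdh' row entirely

print(len(filled), len(dropped))
```

If the goal is to delete the offending rows, as in the question, dropping is the variant that applies.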


0

You can find the data type of a column from its dtype.kind attribute, for example df[col].dtype.kind. See the NumPy docs for what each kind code means. Transpose the DataFrame to go from indices to columns.
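As a rough illustration of the dtype.kind check, the sketch below lists the kind code of each column and picks out the object-typed ones. The second column 'b' and the names `kinds` and `needs_cleaning` are assumptions added for the example. This only flags whole columns; it does not locate the individual rows that hold strings.

```python
import pandas as pd

# 'a' mixes floats and a string, so pandas stores it as object dtype;
# 'b' is a hypothetical all-float column for comparison.
df = pd.DataFrame({
    'a': [0.1, 0.5, 'jasdh', 9.0],
    'b': [1.0, 2.0, 3.0, 4.0],
})

# NumPy kind codes: 'f' is floating point, 'O' is Python object.
kinds = {col: df[col].dtype.kind for col in df.columns}

# Columns stored as object are candidates for pd.to_numeric cleanup.
needs_cleaning = [col for col, kind in kinds.items() if kind == 'O']

print(kinds)
print(needs_cleaning)
```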

