
I have a DataFrame in which some columns have dtype object because of a few malformed entries (a stray . and the like).

I have been able to correct this by identifying the object columns and then doing this:

obj_cols = df.loc[:, df.dtypes == object]
conv_cols = obj_cols.convert_objects(convert_numeric='force')

This works and lets me run the regression I need, but it raises this warning:

FutureWarning: convert_objects is deprecated.

Is there a better way to do this that avoids the warning? I also tried writing a lambda function, but that didn't work.

1
  • You can use astype(int) or pd.to_numeric. (Commented Apr 16, 2017 at 21:29)

2 Answers

14

convert_objects is deprecated. Use pd.to_numeric with apply instead. You can add the parameter errors='coerce' to convert non-numeric values to NaN.

conv_cols = obj_cols.apply(pd.to_numeric, errors='coerce')

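apply calls pd.to_numeric on every column of the DataFrame. Columns that can be converted to a numeric type are converted. Note that with errors='coerce', any value that cannot be parsed as a number becomes NaN, so a column made up entirely of non-numeric strings or dates ends up all NaN rather than being left alone.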
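As a minimal sketch of the approach above, the example below coerces each object column and writes it back only when at least one value survived the conversion, so purely textual columns stay as they are. The sample data and the column names a, b, name and c are hypothetical, and the "keep only if something survived" rule is an assumption for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: "a" and "b" are numeric columns polluted by stray
# strings, "name" is genuinely textual, "c" is already numeric.
df = pd.DataFrame({
    "a": [1, ".", 3],
    "b": [4.5, "x", 6],
    "name": pd.Series(["foo", "bar", "baz"], dtype=object),
    "c": [7, 8, 9],
})

# Names of the columns whose dtype is object.
obj_cols = df.columns[df.dtypes == object]

# Coerce every object column; unparseable values become NaN.
converted = df[obj_cols].apply(pd.to_numeric, errors="coerce")

# Assumption for this sketch: write a column back only if at least one value
# was successfully converted, so all-text columns are not wiped out.
for col in obj_cols:
    if converted[col].notna().any():
        df[col] = converted[col]

print(df.dtypes)
```

Here a and b become float columns with NaN where the stray strings were, while name keeps its original strings.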


9 Comments

This gave me: ValueError: ('Unable to parse string "." at position...)
I tried that earlier. I get TypeError: arg must be a list, tuple, 1-d array, or Series
Because obj_cols is a dataframe
The answer using apply should work when combined with the argument errors='coerce'.
I believe errors='coerce' converts all non-numeric strings to NaN, so it should rather be errors='ignore'.
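To make the difference discussed in these comments concrete, here is a hedged sketch that compares coercing with an "ignore"-style conversion. The helper to_numeric_or_keep is hypothetical and written for this illustration; it mimics the "leave the column unchanged on failure" behaviour with a try/except instead of relying on the errors='ignore' option. The sample data is also made up.

```python
import pandas as pd

# Hypothetical data: "a" has one stray "." among numbers, "b" is clean
# numeric text stored as object.
df = pd.DataFrame({
    "a": [1, ".", 3],
    "b": pd.Series(["1", "2", "3"], dtype=object),
})


def to_numeric_or_keep(series):
    """Hypothetical helper: convert if every value parses, else return as is."""
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series


# "Ignore"-style: a column with any bad value is left completely unchanged.
kept = df.apply(to_numeric_or_keep)

# Coerce: bad values become NaN and the column becomes numeric.
coerced = df.apply(pd.to_numeric, errors="coerce")

print(kept.dtypes)
print(coerced.dtypes)
```

With the "ignore"-style conversion, column a stays object because of the single ".", which is exactly the situation in the question. Only coercing turns it into a numeric column.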
3

If you have a sample data frame:

import pandas as pd

sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 'f', 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 'e', 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 'g'}]
df = pd.DataFrame(sales)

and you want to get rid of the strings in the columns that should be numeric, you can do this with pd.to_numeric:

cols = ['Jan', 'Feb', 'Mar']
# apply works column by column by default, so axis=1 (row-wise) is not needed
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

Your new DataFrame will have NaN in place of the 'wacky' data.
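As a quick check of that claim, the sketch below rebuilds the sample data from this answer, applies pd.to_numeric column by column (the default axis), and counts how many values were coerced to NaN in each month. The variable name nan_counts is introduced here for illustration only.

```python
import pandas as pd

sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 'f', 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 'e', 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 'g'}]
df = pd.DataFrame(sales)

cols = ['Jan', 'Feb', 'Mar']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# Number of values per column that could not be parsed and became NaN.
nan_counts = df[cols].isna().sum()

print(df)
print(nan_counts)
```

Counting the NaN values after coercing is a cheap way to see how much data was discarded before running a regression on the result.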
