How to replace non-integer values in a pandas DataFrame?

Question

I have a dataframe consisting of two columns, Age and Salary:

Age   Salary
21    25000
22    30000
22    Fresher
23    2,50,000
24    25 LPA
35    400000
45    10,00,000

How can I handle the outliers (non-integer entries) in the Salary column and replace them with an integer?

df['col'] = df['col'].str.replace("[^0-9]", '', regex=True)

– Aseem, Commented Jun 10, 2019 at 19:01
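
The regex approach in the comment above can be sketched as follows. This is a minimal illustration on made-up sample values, not the commenter's full code; note that stripping non-digits turns "25 LPA" into 25 and leaves "Fresher" as an empty string, which still has to be converted. The `regex=True` argument is needed on pandas 2.0 and later, where `str.replace` matches literally by default.

```python
import pandas as pd

# Sample values taken from the question's Salary column
s = pd.Series(["25000", "Fresher", "2,50,000", "25 LPA"])

# Drop every character that is not a digit
digits = s.str.replace(r"[^0-9]", "", regex=True)

# "Fresher" is now an empty string; coerce it to NaN
cleaned = pd.to_numeric(digits, errors="coerce")
print(cleaned.tolist())
```

Unlike `errors='coerce'` on the raw strings, this keeps the leading number of "25 LPA" instead of discarding the whole entry, which may or may not be what is wanted.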

jezrael · Accepted Answer · 2017-03-21 14:44:04Z

16

If you need to replace non-numeric values, use to_numeric with the parameter errors='coerce':

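# Cast to str (the column mixes int and str), strip thousands separators,
# coerce anything non-numeric to NaN, then fill with 0 and cast to int.
# The outer parentheses allow the method chain to span several lines.
df['new'] = (pd.to_numeric(df.Salary.astype(str).str.replace(',', ''), errors='coerce')
               .fillna(0)
               .astype(int))
print(df)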
   Age     Salary      new
0   21      25000    25000
1   22      30000    30000
2   22    Fresher        0
3   23   2,50,000   250000
4   24     25 LPA        0
5   35     400000   400000
6   45  10,00,000  1000000

edited Mar 21, 2017 at 14:44

answered Mar 21, 2017 at 14:36

jezrael


5 Comments

jezrael Over a year ago

It replaces non-numeric values with NaN.

jezrael Over a year ago

and then fillna replaces NaN with some int, e.g. 0

3novak Over a year ago

There are three options for errors: raise is the default and throws an error when it encounters nonnumeric characters. coerce returns NaN when it encounters nonnumeric characters. ignore returns the original value when it can't convert to numeric.
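
A short sketch of the difference between `raise` and `coerce`, using made-up sample values. It leaves out `errors='ignore'` on purpose, since recent pandas releases deprecate that option (as far as I know, starting with 2.2).

```python
import pandas as pd

s = pd.Series(["25000", "Fresher"])

# errors="coerce": entries that cannot be parsed become NaN
coerced = pd.to_numeric(s, errors="coerce")

# errors="raise" (the default): an unparseable entry raises ValueError
try:
    pd.to_numeric(s, errors="raise")
    raised = False
except ValueError:
    raised = True

print(coerced.tolist(), raised)
```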

jezrael Over a year ago

@latish - please check the last edit; the values need to be cast to str first because the column has mixed content (int and str values)

Joe Rivera Over a year ago

@jezrael what would this look like for multiple columns?
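
One possible way to extend this to several columns is to wrap the conversion in a function and apply it column by column. This is only a sketch: the `Bonus` column, the `cols` list and the `to_int` helper are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Salary": [25000, "Fresher", "2,50,000"],
    "Bonus": ["1,000", "n/a", 500],  # hypothetical second column
})

cols = ["Salary", "Bonus"]  # columns to clean (assumed names)

def to_int(col):
    # Cast to str first because each column mixes int and str values
    cleaned = col.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce").fillna(0).astype(int)

# DataFrame.apply passes each selected column to to_int as a Series
df[cols] = df[cols].apply(to_int)
print(df)
```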

Shenglin Chen · Accepted Answer · 2017-03-21 14:53:35Z

3

Use numpy where to find non-digit values and replace them with '0'.

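# Requires: import numpy as np
# Cast to str so .str.isdigit() also works when the column mixes int and str
df['New'] = np.where(df.Salary.astype(str).str.isdigit(), df.Salary, '0')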

answered Mar 21, 2017 at 14:53

Shenglin Chen


1 Comment

Liz Over a year ago

What if you want to use this to replace part of a value, say it was 25x and you want it to be 250?
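
If the question is read as a literal character substitution (every "x" becomes "0"), one way to sketch it is with `str.replace` before the numeric conversion. The sample values are made up, and this is an assumption about what is being asked rather than part of the answer above.

```python
import pandas as pd

s = pd.Series(["25x", "300", "Fresher"])

# Literal substitution: replace each "x" with "0" before converting
replaced = s.str.replace("x", "0", regex=False)

# Anything still non-numeric is coerced to NaN and then filled with 0
result = pd.to_numeric(replaced, errors="coerce").fillna(0).astype(int)
print(result.tolist())
```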

Arash · Accepted Answer · 2019-03-05 21:58:24Z

0

The check below uses isinstance, so it does not depend on how a particular Python version prints type(x). That said, I would not replace missing or inconsistent values with 0; it is better to replace them with None. But let's say you want to replace string values (outliers or inconsistent values) with 0:

df['Salary'] = df['Salary'].apply(lambda x: 0 if isinstance(x, str) else x)
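
For the alternative mentioned above, keeping inconsistent values as missing rather than 0, a possible sketch is to coerce with `to_numeric` and cast to the nullable `Int64` dtype. This swaps in a different technique from the `apply` call above and assumes pandas 1.0 or later, where `Int64` is available.

```python
import pandas as pd

s = pd.Series([25000, "Fresher", 400000])

# Non-numeric entries become NaN, then <NA> in the nullable integer dtype
nullable = pd.to_numeric(s, errors="coerce").astype("Int64")
print(nullable)

# Aggregations skip the missing entry by default
total = nullable.sum()
```

Keeping the entry missing means later statistics such as the mean are not pulled toward 0 by rows that never had a salary.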

answered Mar 5, 2019 at 21:58

Arash

