How to delete junk strings appearing in an integer column

Question

I have a column of integers (sample value: 123456789), but some of the values contain stray letters, for example 1234y5678. I want to remove the letters from those cells and keep the digits. How do I go about it using pandas?

Assume my dataframe is df and the column name is mobile.

Should I use np.where with conditions such as df[df['mobile'].str.contains('a-z')] and use string replace?

Alain T. · Accepted Answer · 2019-06-03 12:48:30Z

4

If your junk characters are not limited to letters, you should use this:

yourSeries.str.replace('[^0-9]', '', regex=True)
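
As a rough illustration of why the negated class helps, here is a minimal sketch on made-up sample values (the Series contents are hypothetical, not from the question). It passes regex=True explicitly, since pandas 2.0 changed the default of str.replace to a literal match:

```python
import pandas as pd

# Hypothetical sample data: the junk includes letters, spaces, dashes and a plus sign.
s = pd.Series(['1234y5678', '98-765 4321', '+15550100'])

# Keep only the characters 0-9; everything else is replaced with an empty string.
# regex=True is explicit because the default became False in pandas 2.0.
cleaned = s.str.replace(r'[^0-9]', '', regex=True)

print(cleaned.tolist())
```

The result is still a Series of strings; append .astype('int64') if integers are needed and every row contains at least one digit.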


Chris · Accepted Answer · 2019-06-03 12:44:57Z

2

Use pd.Series.str.replace:

import pandas as pd

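s = pd.Series(['125109a181', '1361q1j1', '85198m4'])
# Remove ASCII letters, then convert the remaining digits to integers.
# regex=True is needed in pandas 2.0+, where the default is a literal match.
s.str.replace('[a-zA-Z]', '', regex=True).astype(int)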

Output:

0    125109181
1       136111
2       851984


Chris Adams · Accepted Answer · 2019-06-04 08:31:07Z

1

Use the regex character class \D (not a digit):

df['mobile'] = df['mobile'].str.replace(r'\D', '', regex=True).astype('int64')
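
If the column holds a mix of real integers, strings and missing values, the .str accessor returns NaN for the non-string cells and the final astype('int64') raises. The following is a minimal sketch of one way around that, assuming the non-string values are plain integers; the sample frame is hypothetical, and pd.to_numeric with the nullable Int64 dtype is swapped in for the plain astype call:

```python
import pandas as pd

# Hypothetical frame: a clean integer, a string with junk, a missing value, an all-junk string.
df = pd.DataFrame({'mobile': [123456789, '1234y5678', None, 'abc']})

# Convert every cell to text first so the .str accessor sees strings only,
# then drop every non-digit character.
digits = df['mobile'].astype(str).str.replace(r'\D', '', regex=True)

# Cells with no digits left (missing values, all-junk strings) cannot be parsed;
# errors='coerce' turns them into NaN, and Int64 stores them as <NA>.
df['mobile'] = pd.to_numeric(digits, errors='coerce').astype('Int64')

print(df)
```

The nullable Int64 dtype keeps the valid rows as integers while still allowing missing entries, which a plain int64 column cannot hold.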

answered Jun 3, 2019 at 13:13

