1

I have a column of integers (sample value: 123456789), but some of the values have stray letters mixed in, e.g. 1234y5678. I want to delete the letters from such cells and keep the digits. How do I do this in pandas?

Assume my dataframe is df and the column name is mobile.

Should I use np.where with a condition such as df['mobile'].str.contains('[a-z]') and then a string replace?

3 Answers

4

If your junk characters are not limited to letters, you should use this:

yourSeries.str.replace('[^0-9]', '', regex=True)
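As a rough illustration of why the negated class helps, here is a minimal sketch on made-up sample values (the series contents are hypothetical, not from the question). It assumes a reasonably recent pandas, where `regex=True` has to be passed explicitly because the pattern is otherwise matched literally.

```python
import pandas as pd

# Hypothetical sample data: the junk includes letters, spaces,
# dashes, parentheses, and a plus sign.
s = pd.Series(['1234y5678', '98-76 54321', '+1(555)0100'])

# [^0-9] matches every character that is not a digit.
# regex=True is needed on pandas 2.0+, where the default is a literal match.
cleaned = s.str.replace('[^0-9]', '', regex=True)

print(cleaned.tolist())
# ['12345678', '987654321', '15550100']
```

The result is still a column of strings, so leading zeros survive; convert with `.astype('int64')` only if you really want numbers.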

2

Use pd.Series.str.replace:

import pandas as pd

s = pd.Series(['125109a181', '1361q1j1', '85198m4'])
s.str.replace('[a-zA-Z]', '', regex=True).astype(int)

Output:

0    125109181
1       136111
2       851984

1

Use the regex character class \D (not a digit):

df['mobile'] = df['mobile'].str.replace(r'\D', '', regex=True).astype('int64')
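One caveat: `.astype('int64')` raises if a cell has no digits left or was missing to begin with. A possible workaround, sketched here on hypothetical data, is to go through `pd.to_numeric` with `errors='coerce'` and pandas' nullable `Int64` dtype. The `astype(str)` step is an assumption that the column may mix real integers with strings.

```python
import pandas as pd

# Hypothetical data: a clean integer, a value with a stray letter,
# a value with no digits at all, and a missing value.
df = pd.DataFrame({'mobile': [123456789, '1234y5678', 'n/a', None]})

# Cast everything to str first so the .str accessor handles the real
# integers too, then strip every non-digit character.
digits = df['mobile'].astype(str).str.replace(r'\D', '', regex=True)

# Cells left empty become missing values instead of raising;
# the nullable Int64 dtype keeps the rest as integers.
df['mobile'] = pd.to_numeric(digits, errors='coerce').astype('Int64')

print(df)
```

The trade-off is that rows with no usable digits end up as missing values rather than failing loudly, so check how many such rows you have before relying on the result.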
