
I have a CSV file whose data was originally extracted from a PDF. While analysing the data, I found rows where the extracted values contain letters in place of digits.

I need the digits instead of the letters, so I am trying to replace each letter with the digit it stands for.

Such as in the following example:

col_state

2567i
28981
2534s
0123o

In the table above, I want to replace the letters with digits (i=1, s=5, o=0).

Expected Output:

col_state

25671
28981
25345
01230

What I have tried so far:

import re
chars_to_remove = ['i', '1', 's', '5', '']
regular_expression = '[' + re.escape(''.join(chars_to_remove)) + ']'

df['col_state'].str.replace(regular_expression, '', regex=True)

print(df['HSN_Code'])

So I have no clue how to handle this problem :(

2 Answers

You might use the translate method as follows:

import pandas as pd
data = pd.Series(["2567i","28981","2534s","0123o"])
t = str.maketrans("iso","150")
data = data.str.translate(t)
print(list(data))

Output:

['25671', '28981', '25345', '01230']

Explanation: translate is useful when you need to replace single characters with other single characters. In its 2-argument form, str.maketrans creates a translation table in which the i-th character of the 1st argument is replaced by the i-th character of the 2nd argument (both arguments must have equal length); that table can then be passed to translate. translate is a method of str and can be used without pandas.
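To illustrate the last point of the explanation above, here is a minimal sketch using plain Python strings only. It uses the 1-argument dict form of str.maketrans instead of the 2-argument form; the variable names (table, values, cleaned) are chosen for illustration and are not from the question.

```python
# Dict form of str.maketrans: each key character maps to its replacement.
table = str.maketrans({"i": "1", "s": "5", "o": "0"})

# Sample values taken from the question.
values = ["2567i", "28981", "2534s", "0123o"]

# str.translate applies the table to every character of each string.
cleaned = [v.translate(table) for v in values]
print(cleaned)  # ['25671', '28981', '25345', '01230']
```

The dict form can be easier to read when the mapping grows, since each letter sits next to its replacement.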


Try with:

repl = {'i':'1', 's':'5', 'o':'0'}
df['col_state'] = df['col_state'].replace(repl, regex=True)

Output:

  col_state
0     25671
1     28981
2     25345
3     01230
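Since the data comes from a CSV file, the following sketch shows one way the replacement above might fit into loading the file. It assumes the column is read as text (dtype=str), which keeps values such as 01230 from losing their leading zero. The inline CSV text is a hypothetical stand-in for the real file, and the leftover check is an optional addition, not part of the original answer.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "col_state\n2567i\n28981\n2534s\n0123o\n"

# dtype=str reads every value as text, preserving leading zeros.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Same mapping as above: each key is treated as a regex pattern.
repl = {"i": "1", "s": "5", "o": "0"}
df["col_state"] = df["col_state"].replace(repl, regex=True)

# Optional check: rows that still contain a non-digit character.
leftover = df[~df["col_state"].str.fullmatch(r"\d+")]
print(df["col_state"].tolist())
print(len(leftover))
```

If leftover is not empty, the mapping probably needs more entries for other misread characters.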
