[Image not shown: the output of print(df['Title']).]

I am using a regex to replace unwanted punctuation characters with a space:

def remove_punctuations(text):
    return re.sub(r']!@-#$%^&*(){};:,./<>?\|`~=_+',' ',text)

df1 = pd.read_csv(file2)
print(df1["Title"])
df1['Title'] = df1['Title'].apply(remove_punctuations)
print(df1["Title"])

What am I doing wrong? Could someone please point it out? Regards,

2 Answers

You should be enclosing the special characters inside a character class, which is denoted by [...] square brackets:

import re

def remove_punctuations(text):
    return re.sub(r'\s*[\[\]!@#$%^&*(){};:,./<>?\|`~=_+-]\s*', ' ', text).strip()

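Note that the pattern also consumes any whitespace around a special character, so each special character together with its surrounding whitespace is replaced by a single space. For the edge cases where a special character starts or ends the input, strip() removes the leftover leading or trailing space.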
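
A quick check of the character-class version on a few made-up titles (the sample strings are illustrative assumptions, not taken from the question's CSV). It also shows one thing to be aware of: two adjacent special characters are matched separately, so they can leave two spaces behind.

```python
import re

def remove_punctuations(text):
    # Each listed special character, plus any whitespace around it,
    # is replaced by one space; strip() trims the ends.
    return re.sub(r'\s*[\[\]!@#$%^&*(){};:,./<>?\|`~=_+-]\s*', ' ', text).strip()

# Hypothetical sample titles, for illustration only
samples = ["Hello, World!", "[Python] regex: tips & tricks", "a -- b"]
cleaned = [remove_punctuations(s) for s in samples]
print(cleaned)
# → ['Hello World', 'Python regex tips tricks', 'a  b']
```

The last result contains two spaces because each "-" is its own match. If that matters, putting a + after the character class so that a run of special characters is treated as one match is one possible adjustment.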

2 Comments

Can you tell me how to replace square brackets too?
My answer already replaces square brackets (note the \[ and \] inside the character class). Look closely.
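Without a character class, your pattern is read as one long sequence: the characters ]!@-#$%^&*(){};:,./<>?\| and so on must appear in that exact order, and symbols such as $, ^, *, ? and + act as regex metacharacters rather than literal punctuation. As a result it does not match the individual punctuation marks in your titles, so nothing is replaced with the blank " ".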
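
To see this concretely, here is a small sketch that runs the question's original pattern against a made-up string (the sample text is an assumption for illustration). Because the pattern requires a % immediately after the end-of-string anchor $, it cannot match anything, so the text comes back unchanged.

```python
import re

# The pattern from the question, with no [...] character class around it
ORIGINAL_PATTERN = r']!@-#$%^&*(){};:,./<>?\|`~=_+'

sample = "Hello, World! (draft)"  # hypothetical title
result = re.sub(ORIGINAL_PATTERN, ' ', sample)
n_matches = len(re.findall(ORIGINAL_PATTERN, sample))

print(result == sample)  # → True: nothing was replaced
print(n_matches)         # → 0
```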

Replace your function with:

import re

def remove_punctuations(text):
    return re.sub(r'[^\w\s]', ' ', text)

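Here [^\w\s] matches any single character that is neither a word character (letter, digit or underscore) nor whitespace, which covers punctuation. Note that underscores are therefore left in place.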
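
A short sketch of how [^\w\s] behaves, using a made-up input. Two details are worth noting: the underscore counts as a word character, so it is kept, and each punctuation mark becomes its own space, so runs of spaces can appear. The collapse_spaces helper below is a hypothetical addition, not part of the answer, for cases where single spaces are wanted.

```python
import re

def remove_punctuations(text):
    # Replace every character that is neither a word character nor whitespace
    return re.sub(r'[^\w\s]', ' ', text)

def collapse_spaces(text):
    # Hypothetical helper: squeeze whitespace runs and trim both ends
    return ' '.join(text.split())

raw = remove_punctuations("snake_case, (v2)!")
tidy = collapse_spaces(raw)
print(repr(raw))   # → 'snake_case   v2  '
print(repr(tidy))  # → 'snake_case v2'
```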
