I just can't seem to get this right.

I have a pandas Series called text.

It consists of 105 rows of article text.

I want to loop through each of these rows and remove certain characters, such as the curly quotes “ and ” and the dash –. Here's my code:

cleaned = []
for i in text:
    i.replace('“', '')
    i.replace('”', '')
    i.replace('–', '')
    cleaned.append(i)

However, when I print the text in this cleaned list, the characters above are still there. Where am I going wrong? Thanks.

for i in cleaned:
    print(i)

2 Answers


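str.replace() returns a new string with the replacements applied. Strings are immutable, so it doesn't modify the original; assign the result back to the variable, like this:

cleaned = []
for i in text:
    # Reassign i each time, because replace() returns a new string
    i = i.replace('“', '')
    i = i.replace('”', '')
    i = i.replace('–', '')
    cleaned.append(i)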
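
If the list of characters grows, chaining replace() calls gets tedious. As a rough sketch of one alternative (not necessarily the best fit for your data), str.translate() can delete several characters in one pass. The sample rows below are made up and stand in for your Series; iterating over a Series yields its values, so the same loop should work on it.

```python
# Made-up sample rows standing in for the Series from the question.
text = ['“Hello” – world', 'No special characters here']

# With two empty strings as the first arguments, every character in the
# third argument is mapped to None, so translate() deletes it.
remove_table = str.maketrans('', '', '“”–')

# translate() returns a new string, so collect the results in a new list.
cleaned = [row.translate(remove_table) for row in text]

for row in cleaned:
    print(row)
```

Note that only the listed characters are removed; the spaces that surrounded the dash stay in place.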

Use regular expressions to clean your text. The syntax can be a little confusing when you start, but it's much more powerful once your text cleaning needs grow.

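import re

cleaned = []
for i in text:
    # Quotes and dashes are not regex metacharacters, so no backslash is needed
    i = re.sub('“', '', i)
    i = re.sub('”', '', i)
    i = re.sub('–', '', i)  # the dash from the question
    cleaned.append(i)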
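
The separate substitutions can probably be folded into a single pattern. Here is a minimal sketch using a character class, again with made-up sample rows in place of the real Series:

```python
import re

# Made-up sample rows standing in for the Series from the question.
text = ['“Quoted” text – with a dash', 'plain text']

# A character class matches any one of the listed characters.
# None of these three is special inside [...], so no escaping is needed.
pattern = re.compile('[“”–]')

# sub() returns a new string with every match replaced by ''.
cleaned = [pattern.sub('', row) for row in text]

for row in cleaned:
    print(row)
```

Compiling the pattern once is optional here; it just avoids repeating the pattern string for every row.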

You can also remove every character that isn't a letter, digit, or underscore (be aware that this strips spaces and punctuation as well) using

i = re.sub(r'\W', '', i)

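Remember that \ introduces an escape sequence in a regex, such as \W. The quote and dash characters have no special meaning in a pattern, so they don't need a backslash.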
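
To see what \W actually removes, here is a small sketch on a made-up sample string. It also shows one possible variant, [^\w\s], for the case where you want to keep the whitespace between words:

```python
import re

# Made-up sample string for illustration.
sample = 'Hello, “world” – it_is 2024!'

# \W matches anything that is not a letter, digit, or underscore,
# so spaces and punctuation are removed along with the quotes and dash.
stripped = re.sub(r'\W', '', sample)
print(stripped)      # Helloworldit_is2024

# [^\w\s] matches anything that is neither a word character nor whitespace,
# so the spaces survive while punctuation is removed.
kept_spaces = re.sub(r'[^\w\s]', '', sample)
print(kept_spaces)   # Hello world  it_is 2024
```

The underscore survives in both cases because it counts as a word character.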
