I have a text file with lines like:

str = '0|Crazy Taxi\xe2\x84\xa2 City Rush^Truck Racing Super Gear^Candy Crush Soda Saga^Car Parking^BMX Kid^Hill Climb Racing^UNLimited Kareena Kapoor^3D Car Parking^Find My Android Phone!^Christmas Trains^Top Free Games^Telegram^Door Screen Lock^Adventure of Ted 2 - Free^Sonic Jump^'

I want to remove "\xe2\x84\xa2", which I am able to do with the following line of code:

print unicode(str,errors="ignore")

output = '0|Crazy Taxi City Rush^Truck Racing Super Gear^Candy Crush Soda Saga^Car Parking^BMX Kid^Hill Climb Racing^UNLimited Kareena Kapoor^3D Car Parking^Find My Android Phone!^Christmas Trains^Top Free Games^Telegram^Door Screen Lock^Adventure of Ted 2 - Free^Sonic Jump^'

But when I run the same logic on the complete file using the code below:

with open('train_data_dump.txt', mode='r') as document:
    for line in document:
        print unicode(line,errors='ignore')

It prints each line unchanged, with "\xe2\x84\xa2" still in it.

Feel free to ask if my question is not clear enough. Please help.

  • Check the indentation. Commented Sep 3, 2015 at 10:39
  • That was a posting mistake; I will edit it. Thanks. Commented Sep 3, 2015 at 10:51
  • So the text file contains Python source code? Commented Sep 3, 2015 at 11:09
  • No Sir, it contains text written after crawling web pages, separated by "^". Commented Sep 3, 2015 at 11:58

1 Answer

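When you read a line from a file, it behaves as if you had assigned a raw string: each backslash is an ordinary character, so "\xe2" in the file is four ASCII characters rather than one byte. Because every character is already ASCII, unicode(..., errors="ignore") finds nothing to drop and the line comes back unchanged. You need to decode the escape sequences first (here i is the line read from the file):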

unicode(i.decode("string_escape"), errors="ignore")
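To make the difference visible, here is a minimal sketch that compares the literal from the question with what a line read from the file actually holds. The helper name strip_escaped_non_ascii is hypothetical. Instead of "string_escape", which exists only in Python 2, it uses a "unicode_escape" plus "latin-1" round trip so that it runs on both Python 2 and Python 3. It assumes the file contains only \xNN style escapes.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# What the interpreter builds from the literal in the question:
# the escape \xe2\x84\xa2 becomes 3 real bytes.
literal = b'Crazy Taxi\xe2\x84\xa2 City Rush'

# What a line read from the file holds: backslash, 'x', 'e', '2', ...
# that is 12 plain ASCII characters, none of them "invalid".
from_file = b'Crazy Taxi\\xe2\\x84\\xa2 City Rush'


def strip_escaped_non_ascii(raw_line):
    """Hypothetical helper: resolve literal \\xNN escapes, then drop non-ASCII bytes.

    Assumes raw_line is a byte string holding only \\xNN style escapes.
    """
    # unicode_escape turns the text "\xe2" into the character U+00E2;
    # encoding as latin-1 maps each such character back to a single real byte.
    real_bytes = raw_line.decode('unicode_escape').encode('latin-1')
    # Now there are genuine non-ASCII bytes for errors='ignore' to discard.
    return real_bytes.decode('ascii', 'ignore')


print(len(literal), len(from_file))         # the two strings differ in length
print(literal.decode('ascii', 'ignore'))    # trademark bytes are dropped
print(from_file.decode('ascii', 'ignore'))  # nothing to drop, text unchanged
print(strip_escaped_non_ascii(from_file))   # escapes resolved, then dropped
```

When reading the real file, opening it in binary mode ('rb') gives byte strings on both Python versions, so each line can be passed to the helper directly.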

Reference: "Python Specific Encodings" in the documentation of the codecs module.

2 Comments

Will you please explain how it works internally and why it differs? I am also trying to find a solution to this.
Thank you Sir, it worked for me. I had been thinking along the same lines and was using unicode(i.strip(), errors="ignore") at my end, but that did not work, most probably because strip removes whitespace, not backslashes.
