
I have trouble understanding how encodings work:

Why can a string defined inside Python code be encoded like this:

s = 'Au\xc3\x9fenformat\n'
print s.encode('utf-8')
>>>Außenformat

But if I read the same string from a text file, I get:

f = open('out.txt', 'r')
data = f.read()
print data.encode('utf-8')
>>>Au\xc3\x9fenformat\n

Any suggestions?

  • Did you mean decode? And you are reading from a text file, not from a sqlite database here. Commented Apr 22, 2013 at 14:52
  • Yes, decode (although in the first example it gives me the same result!?). I have the same problem with both text files and an sqlite database; the database example is more complex and I thought it was due to the same problem. I can post it if it is not... Commented Apr 22, 2013 at 15:09
  • Before you do, make sure you have read the Python Unicode HOWTO, then read this article and this one too. The sqlite3 module handles Unicode fine, but check the module documentation to be sure you didn't accidentally misconfigure things. Commented Apr 22, 2013 at 15:11
  • The HOWTO did not solve my problem. Thanks for the other two articles; they are more conclusive. Commented Apr 22, 2013 at 15:31
  • Does your file actually contain backslashes and x's? Could you post the output of cat out.txt? Commented Apr 22, 2013 at 15:35

2 Answers

3

Try this and you should see the file contents printed correctly:

f = open('out.txt', 'r')
data = f.read()
print data.decode('string_escape')

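This is because the file contains the escape sequences as literal text (a backslash, an x and two hex digits), not the bytes they stand for, so the repr of what you read shows every backslash doubled: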

>>> open('out.txt').read()
'Au\\xc3\\x9fenformat\\n\n'
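The answers use Python 2's string_escape codec, which does not exist in Python 3. The following is a minimal Python 3 sketch of the same idea, assuming out.txt holds exactly the escaped text shown in the repr above and that the \xNN escapes encode UTF-8 bytes. The helper name decode_escaped_utf8 is hypothetical; the temporary file only stands in for out.txt.

```python
import os
import tempfile

# The question's source literal: the parser turns each \xNN into one byte,
# so this is 13 bytes, and b'\xc3\x9f' is the UTF-8 encoding of "ß".
in_source = b'Au\xc3\x9fenformat\n'

# What out.txt appears to contain (assumption, copied from the repr above):
# the escapes are plain text (backslash, 'x', two hex digits) plus a newline.
in_file = b'Au\\xc3\\x9fenformat\\n\n'

print(len(in_source), len(in_file))  # 13 21


def decode_escaped_utf8(raw: bytes) -> str:
    """Hypothetical helper: turn text holding \\xNN escapes of UTF-8 bytes
    into a real str.

    'unicode_escape' maps each \\xNN to code point U+00NN; 'latin-1' maps
    those code points back to the bytes 0xNN one-to-one; the resulting
    bytes are then decoded as UTF-8.
    """
    return raw.decode('unicode_escape').encode('latin-1').decode('utf-8')


# Round-trip through a temporary file standing in for out.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')
    with open(path, 'wb') as f:
        f.write(in_file)
    with open(path, 'rb') as f:  # binary mode: no decoding on read
        data = f.read()

text = decode_escaped_utf8(data)
print(ascii(text))  # 'Au\xdfenformat\n\n'  (\xdf is "ß")
```

If you control whatever writes out.txt, writing real UTF-8 text in the first place (for example with open(path, 'w', encoding='utf-8')) avoids the escape-decoding step entirely.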

0
>>> f = open('out.txt', 'r')
>>> data = f.read()
>>> print data.decode("string_escape")
Außenformat
