0

I need to read a file that contains some strange lines such as: \x72\xFE\x20TEST_STRING\0\0\0

But when I print this string with repr(), I get: r\xfe TEST_STRING\x00\x00\x00

Example:

>>> test = '\x72\xFE\x20TEST_STRING\0\0\0'
>>> print test
r? TEST_STRING
>>> print repr(test)
'r\xfe TEST_STRING\x00\x00\x00'

How can I get the same line from a file in Python and in my editor? Is Python changing the encoding during string manipulation?

2
  • Are you asking why the output of print s differs from the output of print repr(s)? Commented Aug 26, 2011 at 17:43
  • What is actually in your file? Are you sure? How did you verify it? '\x72' in a string literal does not mean "a backslash, the letter x, the digit 7 and the digit 2"; it means "the byte whose value is written as '72' in hexadecimal, i.e. 114, which happens to be the letter r". Commented Aug 26, 2011 at 18:47

4 Answers

1

You should use Python's raw strings, like this (note the 'r' in front of the string):

test = r'\x72\xFE\x20TEST_STRING\0\0\0'

Then it won't try to interpret the escapes as special characters.

When reading from a text file, Python should not try to interpret the string as containing multi-byte Unicode characters. You should get exactly what's in the file:

In [22]: fp = open("test.txt", "r")

In [23]: s = fp.read()

In [24]: s
Out[24]: '\\x72\\xFE\\x20TEST_STRING\\0\\0\\0\n\n'

In [25]: print s
\x72\xFE\x20TEST_STRING\0\0\0
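
The session above can be reproduced with a small sketch. It assumes Python 3 (the answers here use Python 2 syntax) and that the file really contains the backslash sequences as plain characters. The temporary file is a hypothetical stand-in for test.txt, and the final conversion with the unicode_escape codec is one possible way to interpret the escapes, not something the answer itself prescribes.

```python
import os
import tempfile

# Literal text as it might appear in the file: a backslash, 'x', '7', '2', ...
literal_text = r'\x72\xFE\x20TEST_STRING\0\0\0'

# Write it to a temporary file (hypothetical stand-in for test.txt).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as fp:
    fp.write(literal_text)
    path = fp.name

try:
    # Binary mode returns the bytes exactly as stored, with no decoding.
    with open(path, 'rb') as fp:
        raw = fp.read()
finally:
    os.remove(path)

# The file holds the escape sequences as ordinary ASCII characters.
print(raw == literal_text.encode('ascii'))

# Interpret the escapes the way a (non-raw) string literal would,
# then map each code point 0-255 back to a single byte.
decoded = raw.decode('unicode_escape').encode('latin-1')
print(repr(decoded))
```

If the file instead contains the actual bytes 0x72, 0xFE, 0x20 and so on, no conversion is needed and reading in binary mode already gives the interpreted form.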

1

\x20 is a space. When you put that into a Python string it is stored exactly the same way as a space.

If you have printable characters in a string, it does not matter whether they were typed as the actual character or as an escape sequence; they are represented the same way because they are in fact the same value.

Consider the following examples:

>>> ' ' == '\x20'
True

>>> hex(ord('a'))
'0x61'
>>> '\x61'
'a'

1

Python did not change the encoding:

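When printing, Python simply wrote the characters of your string to the terminal: chr(0x72) is "r"; chr(0xfe) is not printable in your terminal, so you get the "?"; chr(0x20) is chr(32), which is a space " "; and the zero bytes have no visible representation, so nothing is displayed for them.

repr() shows printable ASCII characters such as "r" and the space as themselves and writes every other byte as an escape sequence, so chr(0xfe) stays as \xfe and each chr(0) appears in full hexadecimal notation as \x00.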

So if you want the same line in your editor and in the output of repr(), type the string in your editor in the notation repr() uses, that is:

test='r\xfe TEST_STRING\x00\x00\x00'

and print repr(test) should then show the same text, wrapped in quotes.
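
The difference between the stored bytes and their two displays can be checked directly. This is a minimal sketch assuming Python 3, where a byte string is written with a b prefix; the variable names are illustrative only, and the replacement-character step merely imitates what a terminal might do with a byte it cannot show.

```python
# The same value as in the question, written as a Python 3 bytes literal.
data = b'\x72\xFE\x20TEST_STRING\0\0\0'

# Seventeen bytes are stored, regardless of how they were typed.
print(len(data))

# The first three byte values: 0x72 ('r'), 0xFE and 0x20 (space).
print(list(data[:3]))

# repr() keeps printable ASCII and escapes everything else.
print(repr(data))

# Imitate a display that cannot show 0xFE: it is replaced by a
# placeholder character, much like the "?" in the question.
shown = data.decode('ascii', errors='replace')
print(shown)
```

No encoding is changed at any point: the bytes stay the same, and only the way they are displayed differs.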

0

To avoid having Python interpret the backslashes as escape sequences, prefix the string literal with an "r" character:

    >>> test = r'\x72\xFE\x20TEST_STRING\0\0\0'
    >>> print test
    \x72\xFE\x20TEST_STRING\0\0\0
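
A short comparison may make the effect of the prefix concrete. This sketch assumes Python 3, and the variable names are illustrative only.

```python
# Without the prefix, each escape sequence becomes a single character.
interpreted = '\x72\xFE\x20TEST_STRING\0\0\0'

# With the r prefix, every backslash, letter and digit is kept as typed.
raw_literal = r'\x72\xFE\x20TEST_STRING\0\0\0'

print(len(interpreted))    # number of characters after interpretation
print(len(raw_literal))    # number of characters as typed
print(interpreted[0])      # the letter r, from \x72
print(raw_literal[0])      # a single backslash
print(interpreted == raw_literal)
```

The two literals look identical in the source file but produce different strings, which is why the output of print differs between them.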
