
I have a byte string x defined as below:

x = b'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In IPython with Python 2:

In [10]: x
Out[10]: 'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In [11]: print(x)
LF                                                           � 2020 by S&P Global Inc.,200523

In [12]: x.decode('ISO-8859-1')
Out[12]: u'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In [13]: print(x.decode('ISO-8859-1'))
LF                                                           © 2020 by S&P Global Inc.,200523

Question 1: Why do the outputs of x and print(x) differ? The same happens with x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')).

In IPython with Python 3:

In [3]: x
Out[3]: b'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In [4]: print(x)
b'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'

In [5]: x.decode('ISO-8859-1')
Out[5]: 'LF                                                           © 2020 by S&P Global Inc.,200523\n'

In [7]: print(x.decode('ISO-8859-1'))
LF                                                           © 2020 by S&P Global Inc.,200523

Question 2: As you can see, in Python 3 the outputs of x and print(x) are the same, and so are those of x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')). In Python 2, this is not the case. Why do Python 2 and Python 3 differ here?

Question 3: Why does the output of print(x) differ between Python 2 and 3, while the output of x is the same?

Question 4: Why does the output of x.decode('ISO-8859-1') differ between Python 2 and 3, while the printed output is the same?

1 Answer

Question 1: Why do the outputs of x and print(x) differ?

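Typing x into the REPL displays repr(x), the escaped representation of the object, while print x writes the object itself. For a Python 2 str, that means the raw bytes go straight to the terminal, which decodes them on its own: on a UTF-8 terminal the lone byte \xa9 is not valid UTF-8, so it shows up as �. A unicode object such as x.decode('ISO-8859-1') is instead encoded by print using the terminal's encoding, so © comes out correctly, while its repr still shows the \xa9 escape. Typing x can be thought of as: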

>>> print repr(x)
'LF                                                           \xa9 2020 by S&P Global Inc.,200523\n'
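To see the two views side by side, here is a Python 3 sketch that rebuilds them from the same bytes. It uses a shortened copy of x (the run of spaces is trimmed) and assumes a UTF-8 terminal; decoding with errors='replace' only imitates what such a terminal does with the raw bytes that Python 2's print sends it.

```python
# Shortened copy of the question's byte string (the run of spaces is trimmed).
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'

# What the REPL echoes: the escaped representation of the object.
echoed = repr(x)

# What Python 2's `print x` hands to the terminal: the raw bytes. Assuming a
# UTF-8 terminal, 0xa9 on its own is not valid UTF-8, so the terminal shows
# U+FFFD REPLACEMENT CHARACTER instead. errors='replace' imitates that.
seen_on_terminal = x.decode('utf-8', errors='replace')

print(echoed)                   # b'LF \xa9 2020 by S&P Global Inc.,200523\n'
# ascii() keeps this printed line ASCII-only, whatever the console encoding.
print(ascii(seen_on_terminal))  # 'LF \ufffd 2020 by S&P Global Inc.,200523\n'
```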

Question 2: As you can see, in Python 3 the outputs of x and print(x) are the same, and so are those of x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')). In Python 2, this is not the case. Why do Python 2 and Python 3 differ here?

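Because x is a bytes object in Python 3, and print() does not attempt to decode a bytes object: it prints str(x), which for bytes is the same as repr(x). The representation of a bytes object displays any byte outside the printable ASCII range, such as 0xa9, as the corresponding \xNN escape sequence (newline, carriage return and tab get their short forms \n, \r and \t).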
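A short Python 3 check of both points, again with a shortened x; the second byte string is a made-up example chosen only to show which bytes get escaped.

```python
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'

# print() converts its argument with str(); for bytes, str() returns the
# same text as repr(), so print(x) and the REPL echo match.
print(str(x) == repr(x))        # True

# The bytes repr keeps printable ASCII (0x20 to 0x7e) as characters and
# escapes everything else: newline, DEL (0x7f) and non-ASCII bytes like 0xa9.
print(repr(b'A~\n\x7f\xa9'))    # b'A~\n\x7f\xa9'
```

Running Python with the -b flag makes str() on a bytes object emit a BytesWarning, which is a useful hint that the bytes were probably meant to be decoded first.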

Question 3: Why does the output of print(x) differ between Python 2 and 3, while the output of x is the same?

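Because the REPL shows repr(x) in both versions, and repr(x) is the same apart from the b prefix that Python 3 adds. print(x), on the other hand, writes the raw bytes to the terminal in Python 2, but in Python 3 it prints str(x), which for a bytes object is just repr(x).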
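The difference lies in what actually reaches the terminal. The Python 3 sketch below captures the bytes that print(x) writes by pointing print at an in-memory UTF-8 stream (an assumption standing in for a real terminal) and sets them next to the raw bytes Python 2's print x would have written.

```python
import io

x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'

# In-memory byte stream standing in for the terminal; newline='\n' turns
# off newline translation so the result is the same on every platform.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding='utf-8', newline='\n')

# Python 3: print(x) writes str(x), i.e. the ASCII-only repr, plus a newline.
print(x, file=stream)
stream.flush()
py3_written = raw.getvalue()

# Python 2: `print x` wrote the bytes themselves, plus a newline.
py2_written = x + b'\n'

print(py3_written)  # b"b'LF \\xa9 2020 by S&P Global Inc.,200523\\n'\n"
print(py2_written)  # b'LF \xa9 2020 by S&P Global Inc.,200523\n\n'
```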

Question 4: Why does the output of x.decode('ISO-8859-1') differ between Python 2 and 3, while the printed output is the same?

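Because x.decode('ISO-8859-1') produces a unicode object in Python 2 and a str object in Python 3, and their __repr__() methods differ in how they display non-ASCII characters: Python 2's unicode repr escapes © as \xa9, while Python 3's str repr shows printable characters such as © as they are. print() skips the repr and encodes the text itself for the terminal in both versions, so the printed output is the same.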
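A Python 3 sketch of the difference, using a shortened x. ascii() is used here only as a stand-in for Python 2's unicode repr, which it closely resembles apart from the missing u prefix; the final encode step assumes a UTF-8 terminal.

```python
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'
s = x.decode('ISO-8859-1')   # byte 0xa9 maps to U+00A9 COPYRIGHT SIGN

# Python 3's repr leaves printable non-ASCII characters alone, so U+00A9
# is shown as the copyright sign itself; only the newline is escaped.
print(repr(s) == "'LF \u00a9 2020 by S&P Global Inc.,200523\\n'")  # True

# ascii() escapes every non-ASCII character, much like Python 2's unicode repr.
print(ascii(s))              # 'LF \xa9 2020 by S&P Global Inc.,200523\n'

# print(s) in either version encodes the text for the terminal; on a UTF-8
# terminal U+00A9 becomes the two bytes C2 A9, displayed as the copyright sign.
print(s.encode('utf-8'))     # b'LF \xc2\xa9 2020 by S&P Global Inc.,200523\n'
```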


If you want a more thorough read on all of this, check out Unicode & Character Encodings in Python: A Painless Guide. (Disclosure: I wrote it.)
