Unfamiliar encoding in Python

Question

I am trying to write a binary converter in Python, but I ran into some strange codes:

>>> print '\x97'
—
>>> print '\x96'
–
>>> print '\x94'
”
>>> print '\x95'
•

What is that encoding called?

John Machin · Accepted Answer · 2012-02-12 06:06:05Z

That encoding could be ANY of the nine Windows single-byte "ANSI" encodings, cp1250 to cp1258 inclusive:

>>> guff = "\x97\x96\x94\x95"
>>> uguff0 = guff.decode('1250')
>>> all(guff.decode(str(e)) == uguff0 for e in xrange(1251, 1259))
True
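
The session above is Python 2. As a rough Python 3 sketch of the same check (the names decoded and all_same are illustrative, and the standard codec names cp1250 through cp1258 are assumed), start from a bytes literal instead of a str:

```python
# Python 3 sketch: decode the same four bytes with each Windows code page.
guff = b"\x97\x96\x94\x95"

# Map each code page number to the text it produces for these bytes.
decoded = {n: guff.decode("cp%d" % n) for n in range(1250, 1259)}

# True when every code page maps these four bytes to the same characters.
all_same = len(set(decoded.values())) == 1
print(all_same)
```

If all_same is true, these four bytes alone cannot tell you which of the nine code pages produced the data.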

Usage:

1250: Central/Eastern Europe languages with Latin-based alphabets e.g. Polish, Czech, Slovak, Hungarian
1251: Cyrillic alphabet e.g. Russian
1252: Western European languages with Latin-based alphabets
The others cover Greek (1253), Turkish (1254), Hebrew (1255), Arabic (1256), the Baltic languages (1257), and Vietnamese (1258).

To find out what is in use on your computer:

>>> import locale
>>> locale.getpreferredencoding()
'cp1252'

Here's what the codes mean:

>>> from unicodedata import name
>>> for c in uguff0:
...     print repr(c), name(c)
...
u'\u2014' EM DASH
u'\u2013' EN DASH
u'\u201d' RIGHT DOUBLE QUOTATION MARK
u'\u2022' BULLET
>>>
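
For readers on Python 3, where print is a function and byte strings are a separate type, the lookup might be sketched as follows. The choice of cp1252 is an assumption here; any of the nine code pages gives the same result for these bytes.

```python
from unicodedata import name

# Decode the raw bytes to text first; cp1252 is assumed for illustration.
text = b"\x97\x96\x94\x95".decode("cp1252")

# Collect the official Unicode name of each decoded character.
names = [name(c) for c in text]

for c, n in zip(text, names):
    # Show the code point in the usual U+XXXX notation next to its name.
    print("U+%04X %s" % (ord(c), n))
```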

edited Feb 12, 2012 at 6:06

answered Feb 12, 2012 at 5:52

paxdiablo · Accepted Answer · 2012-02-12 05:24:59Z

That would be a hex escape. '\x97' means: take the hex value 97, which is 151 in decimal, and put the byte with that value inside the string.

Character 151 is the em dash, 150 is the en dash, 148 is the closing double quote, and 149 is the bullet point. Keep in mind that these values are positions in a Windows code page, not Unicode code points.
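
To make the difference between a byte value and a Unicode code point concrete, here is a small Python 3 sketch. The variable names are illustrative and cp1252 is assumed as the code page.

```python
import unicodedata

byte_value = 0x97  # the value written as \x97 in the string literal

# Treated as a Unicode code point, 151 is U+0097, a C1 control character.
as_code_point = chr(byte_value)

# Treated as a cp1252 byte, 151 decodes to U+2014.
as_cp1252 = bytes([byte_value]).decode("cp1252")

print(byte_value)                           # 151
print(unicodedata.category(as_code_point))  # Cc (control character)
print(unicodedata.name(as_cp1252))          # EM DASH
```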

edited Feb 12, 2012 at 5:24

answered Feb 12, 2012 at 5:10
