
I've created a dictionary in Python, but I'm having problems with extended ASCII codes.

The loop that creates the dictionary is below (ASCII numbers 128 to 164: é, à, etc.):

# extended ASCII codes
i = 128
while i <= 165 :
    dictionnary[chr(i)] = 'extended ascii'
    i = i + 1

But when I try to use the dictionary:

>>> dictionnary['è']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '\xc3\xa8'

I have # -*- coding: utf-8 -*- in the header of the Python script. I've tried encode, decode, etc., but the result is always wrong.

To understand what happens, I tried:

>>> ord('é')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found

and

>>> ord(u'é')
233

I'm confused by ord(u'é'), because 'é' is number 130 in the extended ASCII table, not 233.

I understand that extended ASCII characters contain "two characters", but I don't understand how to solve the problem with the dictionary.

Thanks in advance ! :-)

  • There is no such thing as "extended ASCII". There are many encodings (cpXXXX on Windows, latinXX, iso-8859-XX and others in the real world) in which 247 can mean different things. Commented Jan 21, 2014 at 9:37
  • Extended ASCII is the characters in the range 128 and above. ASCII = 0-127, extended ASCII = 128-255. This dates back to the 1960s and 1970s, the era of dumb terminals. It no longer matters except for residual effects, such as being unable to print characters above 128 while characters below 128 print fine. Commented Aug 1, 2017 at 0:24

1 Answer

Use unichr instead of chr. In Python 2, chr produces a str containing a single byte, whereas unichr produces a unicode string containing a single character. Do the lookups with unicode characters too: d[u'é'], because d['é'] looks up the UTF-8 encoding of é.
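As a rough sketch of a dictionary keyed by Unicode characters: the example below assumes the "extended ASCII" table in the question is code page 437, where byte 130 is é. That is a guess on my part, and the helper name build_table is made up for illustration. It is written in Python 3 syntax, where every str is already Unicode; in Python 2 the same idea would use chr(value).decode(codec) and u'...' literals.

```python
# Hypothetical helper: map the characters that `codec` assigns to the
# byte values first..last onto a label. cp437 is an assumed code page.
def build_table(codec="cp437", first=128, last=165):
    table = {}
    for value in range(first, last + 1):
        # Decode the single byte into the Unicode character it stands for.
        char = bytes([value]).decode(codec)
        table[char] = "extended ascii"
    return table


table = build_table()

e_acute = "\u00e9"  # é
e_grave = "\u00e8"  # è

print(table[e_acute])                     # é is byte 130 in cp437
print(e_grave in table)                   # è is byte 138 in cp437
print(e_acute in build_table("latin-1"))  # in Latin-1, é is byte 233
```

Decoding each byte with an explicit codec keeps the choice of code page visible, which matters because the same byte value stands for different characters in different encodings.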

You have 3 things in your code: a latin-1 encoded str, a utf-8 encoded str, and a unicode string. Getting it clear in your head which you've got at any point in time requires a lot of knowledge about how Python works and a decent understanding of Unicode and encodings.
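To make the three representations concrete, here is a small sketch in Python 3 syntax (in Python 2 the first value would be written u'é' and the encoded values would be plain str objects). The variable names are only illustrative.

```python
text = "\u00e9"  # the Unicode character é

latin1_bytes = text.encode("latin-1")  # one byte
utf8_bytes = text.encode("utf-8")      # two bytes

print(ord(text))        # 233, the Unicode code point
print(latin1_bytes)     # b'\xe9'
print(utf8_bytes)       # b'\xc3\xa9'
print(len(utf8_bytes))  # 2

# Decoding with the matching codec recovers the same Unicode character.
print(latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8"))  # True
```

The two-byte UTF-8 form is why ord('é') in the question complains about a string of length 2, and why the KeyError shows two escaped bytes.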

No answer about encodings and Unicode is complete without a link to Joel Spolsky's article on the matter: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)


2 Comments

Did you mean to say, "No answer about encodings and Unicode is complete without a link..."?
Thanks for your reply. I've installed Python 3 and it works perfectly :-)
