
I need to insert a series of names (like 'Alam\xc3\xa9') into a list, and then save them into a SQLite database.

I know that I can render these names correctly by typing:

print eval(repr(NAME)).decode("utf-8")

But I need to put them into a list, so I can't use print.

Is there another way to do this without print?

  • Are you trying to store bytes or characters in the database? Commented Oct 14, 2011 at 15:31

2 Answers


Lots and lots of misconceptions here.

The string you quote is not Unicode. It is a byte string, encoded in UTF-8.

You can convert it to Unicode by decoding it:

unicode_name = name.decode('utf-8')
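The same distinction is easier to see in Python 3, where byte strings and text have separate types. A minimal sketch, assuming Python 3 rather than the Python 2 used in this thread (there, a plain '...' literal is already a byte string):

```python
# Python 3 sketch: b'...' is a byte string, decoding it gives text (str).
name = b'Alam\xc3\xa9'                # UTF-8 encoded bytes, as in the question
unicode_name = name.decode('utf-8')   # decoded text

print(type(name).__name__, len(name))                  # bytes 6  (é takes two bytes)
print(type(unicode_name).__name__, len(unicode_name))  # str 5    (é is one character)
```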

When you display the value of unicode_name in the interactive console, you will see one of two things:

>>> unicode_name
u'Alam\xe9'
>>> print unicode_name
Alamé

Here, you can see that just typing the name and pressing Enter shows a representation of the Unicode code points. This is the same as typing print repr(unicode_name). However, print unicode_name prints the actual characters: behind the scenes, it encodes the string into the correct encoding for your terminal and prints the result.
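In Python 3, repr() no longer escapes printable non-ASCII characters, so the built-in ascii() is the closest equivalent of the Python 2 repr shown above. A small sketch, assuming Python 3 and a terminal that can display é for the last line:

```python
unicode_name = b'Alam\xc3\xa9'.decode('utf-8')

print(ascii(unicode_name))          # 'Alam\xe9'  (escaped code point, like the Python 2 repr)
print(hex(ord(unicode_name[-1])))   # 0xe9, i.e. U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(unicode_name)                 # Alamé  (print encodes it for the terminal)
```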

But this is all beside the point, because a Unicode string exists only as an in-memory representation. As soon as you want to store it in a database, a file, or anywhere else, you need to encode it into bytes. The most likely encoding to choose is UTF-8, which is what the name was in originally.
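Encoding is the reverse step, and the bytes you get depend on the encoding you choose. A short round-trip sketch, assuming Python 3 (Latin-1 appears only to show that a different encoding yields different bytes):

```python
unicode_name = 'Alam\u00e9'                  # text in memory
encoded = unicode_name.encode('utf-8')       # bytes you could write to a file or database

print(encoded)                               # b'Alam\xc3\xa9'
print(encoded == b'Alam\xc3\xa9')            # True: the original UTF-8 bytes come back
print(encoded.decode('utf-8') == unicode_name)  # True: decoding reverses the encoding
print(unicode_name.encode('latin-1'))        # b'Alam\xe9': another encoding, other bytes
```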

>>> name
'Alam\xc3\xa9'
>>> print name
Alamé

As you can see, with the original, undecoded name, repr and print once again show the escape codes and the characters respectively (print shows é here because the terminal itself uses UTF-8). So converting it to Unicode does not make it any more "really" the correct character.

So, what do you need to do to store it in a database? Nothing. Nothing at all. SQLite accepts UTF-8 input and, by default, stores its text data in UTF-8 on disk. So no conversion is needed to store the original value of name in the database.
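In Python 3 the details differ slightly: the standard sqlite3 module expects text as str, which it stores as TEXT (UTF-8 by default), while bytes are stored as a BLOB. A minimal sketch under those assumptions; the table name people and the second sample name are hypothetical:

```python
import sqlite3

# Hypothetical input: UTF-8 bytes as received (the first one is from the question).
raw_names = [b'Alam\xc3\xa9', b'Citt\xc3\xa0']
names = [n.decode('utf-8') for n in raw_names]     # list of str, no print involved

conn = sqlite3.connect(':memory:')                 # throwaway in-memory database
conn.execute('CREATE TABLE people (name TEXT)')    # hypothetical table
conn.executemany('INSERT INTO people (name) VALUES (?)', [(n,) for n in names])
conn.commit()

stored = [row[0] for row in conn.execute('SELECT name FROM people ORDER BY rowid')]
kinds = [row[0] for row in conn.execute('SELECT typeof(name) FROM people')]
db_encoding = conn.execute('PRAGMA encoding').fetchone()[0]
blob_kind = conn.execute('SELECT typeof(?)', (raw_names[0],)).fetchone()[0]
conn.close()

print(stored == names)   # True: str in, str out
print(kinds)             # ['text', 'text']
print(db_encoding)       # UTF-8
print(blob_kind)         # blob: raw bytes are not treated as text
```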


1 Comment

Thank you very much, now I understand a little more. One last thing: everything works now, with only one exception: \u00f2 is printed literally instead of as ò. Do you know why?
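One common cause, offered here as an assumption since the thread does not say where the data comes from: the text contains the six literal characters \u00f2 (for example, copied out of JSON), and decoding UTF-8 leaves such escapes untouched. A Python 3 sketch with a hypothetical sample word:

```python
import json

# Hypothetical input: backslash, u, 0, 0, f, 2 appear literally, not the character ò.
literal = 'perch\\u00f2'
print(literal)        # perch\u00f2
print(len(literal))   # 11

# Option 1: if the text is a JSON string, let the JSON parser resolve the escape.
from_json = json.loads('"' + literal + '"')

# Option 2: the 'unicode_escape' codec. It treats input bytes as Latin-1,
# so only use it on ASCII-only text like this.
from_codec = literal.encode('ascii').decode('unicode_escape')

print(from_json == from_codec == 'perch\u00f2')  # True
```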

Are you looking for something like this?

[n.decode("utf-8") for n in ['Alam\xc3\xa9', 'Alam\xc3\xa9', 'Alam\xc3\xa9']]

3 Comments

Which is the same thing eval(repr('Alam\xc3\xa9')).decode("utf-8") will produce. What are you trying to do?
Exactly; in fact, eval(repr('Alam\xc3\xa9')).decode("utf-8") on its own is also incorrect. The trick is done by the print in front of it.
The print statement just displays the Unicode characters, whereas repr() doesn't (in Python 2). u'\xe9' is just how the character é appears in a repr, so that is what you want to save.
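To illustrate the last comment: the escaped form in a repr is only a display of the data, not a different value, and eval(repr(x)) simply gives back x. A sketch assuming Python 3, where ascii() plays the role of the Python 2 repr:

```python
raw = [b'Alam\xc3\xa9', b'Alam\xc3\xa9', b'Alam\xc3\xa9']
names = [n.decode('utf-8') for n in raw]   # the list to save; no print needed

print(eval(repr(raw[0])) == raw[0])        # True: eval(repr(x)) is just x
print(ascii(names))                        # ['Alam\xe9', 'Alam\xe9', 'Alam\xe9']
print(names[0] == 'Alam\u00e9')            # True: \xe9 in the repr is é
print(all(n.endswith('\N{LATIN SMALL LETTER E WITH ACUTE}') for n in names))  # True
```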
