I need to create and connect to a PostgreSQL 9.2 database using SQLAlchemy. So far I can create the whole database in UTF-8, but I have trouble inserting non-ASCII characters into it. This is how I connect to the database:

from sqlalchemy import create_engine
from sqlalchemy.engine.url import URL

url = URL(drivername='postgresql', username='uname', password='pwd', host='localhost', port='5432', database='postgres')
self.engine = create_engine(url)

Then I create the new database, switch to it, and start to populate it. Everything works until a row contains a non-ASCII character; then I get this:

cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (DataError) invalid byte sequence for encoding "UTF8": 0xec2d43
'INSERT INTO province (codice_regione, codice, tc_provincia_id, nome, sigla) VALUES (%(codice_regione)s, %(codice)s, %(tc_provincia_id)s, %(nome)s, %(sigla)s) RETURNING province.id' {'nome': 'Forl\xec-Cesena', 'codice': 40, 'codice_regione': 8, 'tc_provincia_id': 34, 'sigla': 'FC'}

I have the same code for the same database on MySQL 5, and there it works perfectly. I don't know what is wrong. I registered the psycopg2 Unicode extension for PostgreSQL, but that did not help. I am puzzled and would appreciate help from somebody more experienced.

2 Answers

In ISO-8859-1, the byte sequence 0xec2d43 corresponds to the three characters ì-C, which are part of the name 'Forlì-Cesena' shown in the error log.

So the program is sending valid ISO-8859-1, not UTF-8, while the server expects UTF-8.
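
The diagnosis can be checked without a database. The following is a minimal sketch, assuming Python 3 (the question itself appears to use Python 2); the variable names are only illustrative.

```python
# Bytes reported in the PostgreSQL error message: 0xec 0x2d 0x43
raw = bytes.fromhex("ec2d43")

# Interpreted as ISO-8859-1 (Latin-1), every byte maps to exactly one character.
as_latin1 = raw.decode("iso-8859-1")

# Interpreted as UTF-8, 0xec announces a three-byte sequence, but 0x2d is not
# a valid continuation byte, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

print(repr(as_latin1), "| UTF-8 decoding error:", utf8_error)
```

The Latin-1 decoding yields the three characters from the province name, while the UTF-8 decoding raises an error, which is the same mismatch the server complains about.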

The simplest way to fix the problem is to tell the server which encoding the client actually uses, by issuing this SQL statement from the client:

SET client_encoding=latin1;

Alternatively, convert the data to UTF-8 before passing it to the database, as @Tometzky's answer suggests.
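
As a rough sketch of the first option, the statement can be sent through any DB-API 2.0 cursor opened on the PostgreSQL connection. The helper name `use_latin1_client_encoding` is hypothetical and not part of SQLAlchemy or psycopg2.

```python
def use_latin1_client_encoding(cursor):
    """Tell PostgreSQL that this session sends and expects Latin-1 text.

    `cursor` is assumed to be a DB-API 2.0 cursor on an open PostgreSQL
    connection, for example one obtained from a psycopg2 connection.
    """
    # SET only affects the current session, so it has to be issued on
    # every connection that will carry Latin-1 data.
    cursor.execute("SET client_encoding TO 'LATIN1'")
```

Because the setting is per session, a connection pool would need it applied to each new connection rather than once per program.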

2 Comments

Thanks for your answers. I added encoding='latin1' to create_engine, and now, when I pass the data to the constructor of my mapper, I get this error: return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0xec in position 4: invalid continuation byte
@arpho client_encoding affects the data returned from the DB, as well as how the DB interprets data you send to it. So you must decode the data from the DB as latin1 or iso-8859-1 not as utf-8.

Make sure that any data that may contain international characters is passed as Unicode strings. The string 'Forl\xec-Cesena' that you are trying to insert is a byte string in Latin-1 (ISO-8859-1) encoding. So use

unicode('Forl\xec-Cesena','Latin1')

to convert it to a Unicode string.
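
In Python 3 the `unicode()` built-in no longer exists, and the equivalent is `bytes.decode`. Below is a minimal sketch, assuming Python 3 and assuming the values arrive as Latin-1 byte strings; `decode_params` is a hypothetical helper, and the sample row is taken from the error message in the question.

```python
def decode_params(params, encoding="latin-1"):
    """Return a copy of params with every bytes value decoded to str."""
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in params.items()
    }


# Parameter dictionary from the failing INSERT, with text fields as raw bytes.
row = {
    "nome": b"Forl\xec-Cesena",
    "codice": 40,
    "codice_regione": 8,
    "tc_provincia_id": 34,
    "sigla": b"FC",
}

clean = decode_params(row)
print(clean["nome"])
```

Once the values are text strings, the database driver can encode them in whatever encoding the connection uses, so the server no longer receives stray Latin-1 bytes.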
