UnicodeDecodeError on sqlalchemy connection.execute() for select queries

Question

I'm using sqlalchemy core to execute string based queries. I have set charset to utf8mb4 on the connection string like this:

"mysql+mysqldb://{user}:{password}@{host}:{port}/{db}?charset=utf8mb4"

For some simple select queries (e.g., select name from users where id=XXX limit 1), when the result set contains some unicode characters (e.g., ', ì, etc.), the query fails with the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9a in position 11: invalid start byte

However, the error is not reliably reproducible. When I run the same query from a Python shell, it works without errors, but it fails in a web request or background job.

I'm using Python 3.8 and sqlalchemy 1.3.24.

I have also tried explicitly specifying charset: utf8mb4 as a connect_args property with create_engine().

The underlying database is MySQL 5.7, and all the unicode columns have utf8mb4 explicitly set as the character set in the schema. Update: The database is actually AWS RDS Aurora MySQL.

Appreciate any insights on the error or how to reproduce it reliably.

yonran · Accepted Answer · 2021-10-09 17:19:28Z

2

The MySQL documentation on Connect-Time Error Handling describes a bug that occurs when the MySQL 8.0 client library connects to a MySQL 5.7 server with the utf8mb4 charset. The 8.0 client asks for the utf8mb4_0900_ai_ci collation, which the 5.7 server does not recognize, so the server silently falls back to the latin1 charset with the latin1_swedish_ci collation. From then on the server sends latin1 result sets while the client thinks that it is receiving utf8mb4, which eventually results in a UnicodeDecodeError. As a workaround, you have to explicitly run SET NAMES utf8mb4. I created issue mysqlclient#504 to ask that the Python client do that on every connection.
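To illustrate why the mismatch only fails on some rows, here is a minimal sketch that needs no database. It assumes MySQL's latin1 behaves like Python's cp1252 codec for the characters involved (the two are close but not identical), and decode_as_client is a hypothetical helper name, not part of any library. Pure ASCII text survives the mismatch, while a character such as š becomes the single byte 0x9a, which is not a valid UTF-8 start byte.

```python
def decode_as_client(text, server_charset="cp1252", client_charset="utf-8"):
    """Encode text the way the server would send it, then decode it the way
    the client expects. Hypothetical helper, for illustration only."""
    wire_bytes = text.encode(server_charset)   # what the server puts on the wire
    return wire_bytes.decode(client_charset)   # what the client tries to read


if __name__ == "__main__":
    # ASCII is identical in both encodings, so this row decodes fine.
    print(decode_as_client("plain ascii name"))

    # A non-ASCII character exposes the mismatch.
    try:
        decode_as_client("š")
    except UnicodeDecodeError as exc:
        print(exc)
```

This would also be consistent with the error looking intermittent: only rows containing non-ASCII characters fail, and only on connections where the fallback to latin1 happened.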

To confirm that the charset is incorrect after connecting, double check the server’s value of character_set_client (the charset that statements are interpreted in), character_set_connection (the charset that statements are converted to), and character_set_results (the charset that result sets are sent as). If they are latin1 despite you trying to connect using utf8mb4, then this bug may have been triggered.

# con is an open MySQLdb connection
with con.cursor() as c:
    c.execute("SHOW VARIABLES LIKE 'character_set_%'")
    for row in c:
        print(row)

Output:

(b'character_set_client', b'latin1')
(b'character_set_connection', b'latin1')
(b'character_set_database', b'latin1')
(b'character_set_filesystem', b'binary')
(b'character_set_results', b'latin1')
(b'character_set_server', b'latin1')
(b'character_set_system', b'utf8')
(b'character_sets_dir', b'/usr/share/mysql/charsets/')
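If you want to run this check automatically (for example in a health check or at worker startup), a small helper along these lines could flag the problem. This is only a sketch: find_charset_mismatches and CONNECTION_VARS are hypothetical names, and the function just inspects the rows returned by the SHOW VARIABLES query, in the same (name, value) shape as the rows printed by the query.

```python
# The three variables that describe the connection itself, as opposed to
# server-wide defaults such as character_set_server.
CONNECTION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def find_charset_mismatches(rows, expected="utf8mb4"):
    """Return {variable: actual_value} for connection charsets != expected."""
    mismatches = {}
    for name, value in rows:
        # Rows may arrive as bytes or str depending on driver settings.
        if isinstance(name, bytes):
            name = name.decode("ascii")
        if isinstance(value, bytes):
            value = value.decode("ascii")
        if name in CONNECTION_VARS and value != expected:
            mismatches[name] = value
    return mismatches
```

An empty dict means the three connection-level variables match the expected charset; character_set_database and character_set_server are deliberately ignored because they do not control how result sets are encoded for the client.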

I believe that a workaround for the issue is to run the following right after connecting:

# explicitly set connection charset to the same as MySQLdb.connect()
con.query("SET NAMES utf8mb4")
con.store_result()
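Since the question goes through SQLAlchemy rather than calling MySQLdb directly, the same workaround could be applied to every pooled connection with a "connect" event listener. This is a sketch under assumptions, not something I have tested against Aurora: set_names_utf8mb4 and create_utf8mb4_engine are hypothetical names, and it uses the standard DB-API cursor interface instead of con.query().

```python
def set_names_utf8mb4(dbapi_connection, connection_record=None):
    """Force the session charset on a raw DB-API connection."""
    cursor = dbapi_connection.cursor()
    try:
        cursor.execute("SET NAMES utf8mb4")
    finally:
        cursor.close()


def create_utf8mb4_engine(url):
    """Create an engine that runs SET NAMES on each new connection."""
    # Imported lazily so set_names_utf8mb4 can be used without SQLAlchemy.
    from sqlalchemy import create_engine, event

    engine = create_engine(url)
    # The "connect" event fires once for every new DB-API connection that
    # the pool opens, before the connection is handed out.
    event.listen(engine, "connect", set_names_utf8mb4)
    return engine
```

Usage would be create_utf8mb4_engine("mysql+mysqldb://{user}:{password}@{host}:{port}/{db}?charset=utf8mb4") in place of the plain create_engine() call. The listener runs once per physical connection, not once per query, so the overhead should be small.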

edited Oct 9, 2021 at 17:19

answered Oct 7, 2021 at 18:30

yonran


2 Comments

Modasser Billah Over a year ago

For my case, only the character_set_database and character_set_server are set to latin1, the rest are correctly set as utf8mb4. Would you still say that this is the root cause?

yonran Over a year ago

@ModasserBillah, character_set_server configures the default character set used for CREATE TABLE among other things, so it’s highly recommended to set it to utf8mb4 to prevent other errors. But I don’t think that it would cause a client-side UnicodeDecodeError.

safwan · Accepted Answer · 2021-05-25 13:39:24Z

1

Can you try adding the use_unicode=true parameter to the URL?

answered May 25, 2021 at 13:39

safwan

