3

Assuming this MySQL table schema:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `uuid` binary(16) NOT NULL,
  `email` varchar(255) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `photo` binary(16) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uuid` (`uuid`),
  UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4;

When I use the execute() API from SQLAlchemy connection class as such:

with self.engine.begin() as connection:
  user_uuid = uuid.UUID("...")
  result = connection.execute("SELECT email, name, photo FROM user WHERE uuid=%s", user_uuid.bytes)

If the UUID is F393A167-A919-4B50-BBB7-4AD356E89E6B, then SQLAlchemy prints this warning:

/site-packages/sqlalchemy/engine/default.py:450: Warning: Invalid utf8mb4 character string: 'F393A1'

The uuid column is a BINARY column, so why is SQLAlchemy considering this parameter a text one instead of a binary one and how to prevent this?

2 Answers 2

6

The explanation and solution is actually in this bug report in MySQL:

replace:

cursor.execute(""" INSERT INTO user (uuid) VALUES (%s) """, my_uuid)

with

cursor.execute(""" INSERT INTO user (uuid) VALUES (_binary %s) """, my_uuid)

Mind the underscore. It's "_binary", not "binary". This "_binary" tells MySQL that the following string is to be interpreted as binary, not to be interpreted/validated as utf8.

Sign up to request clarification or add additional context in comments.

Comments

1

The problem doesn't happen on Python 3, so I think that the problem is that the database driver is unable to distinguish what should be bytes given the Python 2 str type.

Regardless, it seems using SQLAlchemy core directly works correctly, presumably because it knows the column type directly.

from sqlalchemy import MetaData, Table, select

meta = MetaData()
user = Table('user', meta, autoload_with=engine)
result = select([user]).where(user.c.uuid == user_uuid.bytes)

If you wish to continue executing a string, you can cast to bytesarray like SQLAlchemy appears to be doing:

with self.engine.begin() as connection:
    user_uuid = uuid.UUID("...")
    result = connection.execute(
        "SELECT email, name, photo FROM user WHERE uuid=%s",
        bytearray(user_uuid.bytes))

Or to tell SQLAlchemy what type the bound parameter is to get this automatically:

from sqlalchemy import text, bindparam, BINARY

with self.engine.begin() as connection:
    user_uuid = uuid.UUID("...")
    stmt = text("SELECT email, name, photo FROM user WHERE uuid = :uuid")
    stmt = stmt.bindparams(bindparam('uuid', user_uuid.bytes, type_=BINARY))
    result = connection.execute(stmt)

6 Comments

You're most likely right about the Python 2 str vs unicode problem. If you log the SQL commands issued by SQLAlchemy Core, what does it use?
It uses bytearray on Python 2, and bytes on Python 3.
I mean what's the actual SQL generated by SQLAlchemy Core and passed to the MySQL driver (it's visible by setting echo=True on the engine)?
@Pol I know, that's what I used. It's SELECT tbl.uuid FROM tbl WHERE tbl.uuid = %(uuid_1)s and the only difference is the next log line with the passed value (bytes vs bytearray)
So it's not putting _binary in the SQL. I'm not sure why it works then. Maybe the MySQL driver sees the bytearray type and adds it automatically...
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.