
I'm using SQLAlchemy and MySQL, with a files table to store files. That table is defined as follows:

mysql> show full columns in files;
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field   | Type         | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| id      | varchar(32)  | utf8_general_ci | NO   | PRI | NULL    |       | select,insert,update,references |         |
| created | datetime     | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| updated | datetime     | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| content | mediumblob   | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| name    | varchar(500) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

The content column of type MEDIUMBLOB is where the files are stored. In SQLAlchemy that column is declared as:

__maxsize__ = 12582912 # 12MiB                                                                                                                              
content = Column(LargeBinary(length=__maxsize__))                           

I am not quite sure about the difference between SQLAlchemy's BINARY type and LargeBinary type. Or the difference between MySQL's VARBINARY type and BLOB type. And I am not quite sure if that matters here.

Question: Whenever I store an actual binary file in that table, i.e. a Python bytes or b'' object, I get the following warning:

.../python3.4/site-packages/sqlalchemy/engine/default.py:451: Warning: Invalid utf8 character string: 'BCB121'
  cursor.execute(statement, parameters)

I don't want to simply ignore the warning, even though the files themselves seem to be intact. How do I handle this warning gracefully, and how can I fix its cause?
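The hex string quoted in the warning is indeed not decodable as UTF-8, which can be checked in plain Python, independent of MySQL:

```python
# The bytes from the warning message. 0xBC is a UTF-8 continuation byte
# and can never start a character, so the sequence is not valid utf8.
payload = bytes.fromhex("bcb121")

try:
    payload.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(is_valid_utf8)  # False
```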

Side note: This question seems to be related, and it appears to be a MySQL bug that the server tries to interpret all incoming data as UTF-8 (this answer).

  • You seem to be running Python. The two "Side notes" refer to PHP and Perl issues. Something in Python is failing to honor your request to use "blob" data. Commented Jan 6, 2016 at 23:34
  • @RickJames: Yes, all Python. The "this answer" link in the Side note, however, seems to indicate this to be a MySQL issue. If this is a Python issue though, I'd still like to understand what I'm missing here... Commented Jan 7, 2016 at 7:55
  • If the client (PHP, Python, etc) treats the string as "characters", then this problem can arise. If it treats it as arbitrary "bytes", then the problem does not occur. Storing into a MySQL BLOB does no utf8 checking; storing into a TEXT does. Commented Jan 7, 2016 at 16:59
  • So, I claim, some detail in the client is missing. (Or Alchemy fails to handle BLOBs.) Commented Jan 7, 2016 at 17:00
  • @RickJames: it's a b'' object, i.e. binary bytes... Commented Jan 7, 2016 at 18:13

2 Answers


It turns out this was a driver issue: apparently the default MySQL driver stumbles over UTF-8 handling under Python 3. Installing cymysql into the virtual Python environment resolved the problem, and the warnings disappeared.

The fix: find out whether MySQL connects through a socket or a port (see here), then modify the connection string accordingly. In my case, using a socket connection:

mysql+cymysql://user:pwd@localhost/database?unix_socket=/var/run/mysqld/mysqld.sock

Use the port argument otherwise.
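Both variants can be sanity-checked before handing them to create_engine; a quick sketch using SQLAlchemy's URL parser (credentials, database name, and socket path are placeholders for your own setup):

```python
from sqlalchemy.engine.url import make_url

# Socket connection (the socket path is whatever your my.cnf points at):
sock_url = make_url(
    "mysql+cymysql://user:pwd@localhost/database"
    "?unix_socket=/var/run/mysqld/mysqld.sock"
)

# TCP connection on the standard MySQL port instead:
tcp_url = make_url("mysql+cymysql://user:pwd@localhost:3306/database")

print(sock_url.drivername)            # mysql+cymysql
print(sock_url.query["unix_socket"])  # /var/run/mysqld/mysqld.sock
print(tcp_url.port)                   # 3306
```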

Edit: While the above fixed the encoding issue, it gave rise to another one: blob size. Due to a bug in CyMySQL, blobs larger than 8 MB fail to commit. Switching to PyMySQL fixed that problem, although it seems to have a similar issue with large blobs.


2 Comments

Actually, it is apparently the MySQL server that is generating this warning, and the way around it is to use _binary to say that the thing you're inserting should not be interpreted. See bugs.mysql.com/bug.php?id=79317
@Wodin: which means to add binary_prefix=true to the connect string, see also this discussion.
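Putting the two comments above together, the amended connection URL might look like this (a sketch; driver, credentials, and database name are placeholders). With binary_prefix=true, PyMySQL emits byte strings as _binary'...' literals, so the server skips its utf8 validity check on them:

```python
from sqlalchemy.engine.url import make_url

# Parse the URL to confirm the binary_prefix flag is carried through:
url = make_url(
    "mysql+pymysql://user:pwd@localhost/database"
    "?charset=utf8&binary_prefix=true"
)

print(url.query["binary_prefix"])  # true
```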

Not sure, but your problem might have the same roots as one I had several years ago with Python 2.7: https://stackoverflow.com/a/9535736/68998. In short, MySQL's interface does not let you be certain whether you are working with a true binary string or with text in a binary collation (used because of the lack of a case-sensitive utf8 collation). Therefore, a MySQL binding has the following options:

  • return all string fields as binary strings, and leave the decoding to you
  • decode only the fields that do not have a binary flag (so much fun when some of the fields are unicode and others are str)
  • have an option to force decoding to unicode for all string fields, even true binary

My guess is that in your case the third option is enabled somewhere in the underlying MySQL binding, and the first suspect is your connection string (connection params).

1 Comment

My current connection string is: mysql://user@localhost/database?charset=utf8. According to the documentation, the charset option “Specifies the character set that should be used to encode all queries sent to the server.” Hmm, encode all queries?
