Check if String is Valid MySQL UTF8?

Question

I have a MySQL column encoded as utf8. MySQL's utf8 is not the full UTF-8 set: it covers only BMP characters, which take at most 3 bytes each. I don't want to attempt an insert only to find that the string does not meet MySQL's definition of utf8. Is there a way to test in Python whether a string meets MySQL's requirements before attempting the insert? For obvious reasons, catching exceptions on some_string.encode('utf-8') is not strict enough.

Ignacio Vazquez-Abrams · Accepted Answer · 2015-09-09 00:56:00Z

4

>>> len(u'\uffff'.encode('utf8')) < 4 # Good; fits in utf8
True
>>> len(u'\U00010000'.encode('utf8')) < 4 # Bad; utf8mb4 only
False
>>> ord(u'\uffff') < 65536 # Good; fits in utf8
True
>>> ord(u'\U00010000') < 65536 # Bad; utf8mb4 only
False
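
The same per-character test can be applied to a whole string. The following is a minimal sketch, assuming Python 3 (where a str is a sequence of code points); the helper name fits_mysql_utf8 is hypothetical and not part of any library. It checks only the code-point range, so a lone surrogate would pass here and still fail later at .encode('utf-8').

```python
def fits_mysql_utf8(s):
    """Return True if every code point in s is at or below U+FFFF.

    Hypothetical helper: code points above U+FFFF need 4 bytes in
    UTF-8, which MySQL's 3-byte "utf8" charset cannot store.
    """
    return all(ord(ch) <= 0xFFFF for ch in s)


print(fits_mysql_utf8(u'caf\u00e9 \uffff'))   # True
print(fits_mysql_utf8(u'maple \U0001F341'))   # False
```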

user149341 · Accepted Answer · 2015-09-09 01:20:16Z

2

To check whether a string contains a Unicode character above U+FFFF (and which thus can't be stored in a MySQL table using the "utf8" encoding), you can use the following regular expression:

re.search(u"[^\u0000-\uffff]", s)

Alternatively, if you can upgrade to MySQL 5.5 or later, you may want to consider converting your table to the utf8mb4 character set, which can store all Unicode characters.
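
For the regular-expression approach, here is a minimal sketch, assuming Python 3. It scans the whole string with finditer, so a character above U+FFFF is found wherever it appears, not only at the start. The names NON_BMP and find_non_bmp are hypothetical, chosen for illustration.

```python
import re

# Matches any single code point outside U+0000..U+FFFF.
NON_BMP = re.compile(u"[^\u0000-\uffff]")


def find_non_bmp(s):
    """Return (index, character) pairs for every code point above U+FFFF."""
    return [(m.start(), m.group()) for m in NON_BMP.finditer(s)]


offenders = find_non_bmp(u"yura\U0001F341")
print(len(offenders))                # 1 (the maple leaf at index 4)
print(find_non_bmp(u"plain text"))   # []
```

An empty result means the string contains nothing that MySQL's utf8 would reject for being outside the BMP.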

answered Sep 9, 2015 at 1:20

user149341

1 Comment

Yura Bysaha Over a year ago

Use re.search rather than re.match here. re.match only matches at the beginning of the string, so re.match("[^\u0000-\uffff]", "yura🍁") returns None, whereas re.search("[^\u0000-\uffff]", "yura🍁") finds the 🍁. To locate a match anywhere in a string, use search().
