MySQL/MariaDB unable to handle unique keys with when using utf8mb4

Question

We have MySQL table with utf8mb4 strings:

CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `test_code_unique` (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

When inserting special characters there appears to be wrong conversion:

mysql> insert into `test` (`code`, `name`) values ('munster', 'Munster');


mysql> insert into `test` (`code`, `name`) values ('münster', 'Münster');
ERROR 1062 (23000): Duplicate entry 'münster' for key 'test_code_unique'


mysql> SELECT * FROM test WHERE code='münster';
+----+---------+---------+
| id | name    | code    |
+----+---------+---------+
|  1 | Munster | munster |
+----+---------+---------+
1 row in set (0.00 sec)



mysql> SELECT * FROM test WHERE code='munster';
+----+---------+---------+
| id | name    | code    |
+----+---------+---------+
|  1 | Munster | munster |
+----+---------+---------+
1 row in set (0.00 sec)

If unique key is removed second insert works but a search returns 2 rows even if query is different:

mysql> drop table test;


CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


mysql> insert into `test` (`code`, `name`) values ('munster', 'Munster');


mysql> insert into `test` (`code`, `name`) values ('münster', 'Münster');


mysql> SELECT * FROM test WHERE code='münster';
+----+----------+----------+
| id | name     | code     |
+----+----------+----------+
|  1 | Munster  | munster  |
|  2 | Münster  | münster  |
+----+----------+----------+
2 rows in set (0.00 sec)


mysql> SELECT * FROM test WHERE code='munster';
+----+----------+----------+
| id | name     | code     |
+----+----------+----------+
|  1 | Munster  | munster  |
|  2 | Münster  | münster  |
+----+----------+----------+
2 rows in set (0.00 sec)

This has been tested on both MySQL 5.7 and MariaDB 10.2 and they both give same results.

What could be going wrong?

jaswrks · Accepted Answer · 2017-11-05 11:10:13Z

5

The reason for this seemingly mysterious problem is that you're using utf8mb4_unicode_ci collation, and that collation intentionally ignores differences in accented characters vs non-accented characters. See: https://dev.mysql.com/doc/refman/5.7/en/charset-general.html

To resolve this, change collation on code column to utf8mb4_bin, which will distinguish between accented characters and non-accented characters, and also between caSe.

answered Nov 5, 2017 at 11:10

jaswrks

1,28510 silver badges13 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

MySQL/MariaDB unable to handle unique keys with when using utf8mb4

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related