5

We have MySQL table with utf8mb4 strings:

CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `test_code_unique` (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

When inserting special characters there appears to be wrong conversion:

mysql> insert into `test` (`code`, `name`) values ('munster', 'Munster');


mysql> insert into `test` (`code`, `name`) values ('münster', 'Münster');
ERROR 1062 (23000): Duplicate entry 'münster' for key 'test_code_unique'


mysql> SELECT * FROM test WHERE code='münster';
+----+---------+---------+
| id | name    | code    |
+----+---------+---------+
|  1 | Munster | munster |
+----+---------+---------+
1 row in set (0.00 sec)



mysql> SELECT * FROM test WHERE code='munster';
+----+---------+---------+
| id | name    | code    |
+----+---------+---------+
|  1 | Munster | munster |
+----+---------+---------+
1 row in set (0.00 sec)

If unique key is removed second insert works but a search returns 2 rows even if query is different:

mysql> drop table test;


CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


mysql> insert into `test` (`code`, `name`) values ('munster', 'Munster');


mysql> insert into `test` (`code`, `name`) values ('münster', 'Münster');


mysql> SELECT * FROM test WHERE code='münster';
+----+----------+----------+
| id | name     | code     |
+----+----------+----------+
|  1 | Munster  | munster  |
|  2 | Münster  | münster  |
+----+----------+----------+
2 rows in set (0.00 sec)


mysql> SELECT * FROM test WHERE code='munster';
+----+----------+----------+
| id | name     | code     |
+----+----------+----------+
|  1 | Munster  | munster  |
|  2 | Münster  | münster  |
+----+----------+----------+
2 rows in set (0.00 sec)

This has been tested on both MySQL 5.7 and MariaDB 10.2 and they both give same results.

What could be going wrong?

1 Answer 1

5

The reason for this seemingly mysterious problem is that you're using utf8mb4_unicode_ci collation, and that collation intentionally ignores differences in accented characters vs non-accented characters. See: https://dev.mysql.com/doc/refman/5.7/en/charset-general.html

To resolve this, change collation on code column to utf8mb4_bin, which will distinguish between accented characters and non-accented characters, and also between caSe.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.