I am trying to use a regular expression to do an accent insensitive search on an accent sensitive column.
I cannot change the column to an accent insensitive collation because the column is unique and we need to allow both 'Jose' and 'José' to be entered.
I cannot coerce the collation within my query because the table holds a ton of data and coercing it will cause it to do an index scan instead of a seek and the query times out
I cannot add a new non-unique column that uses an accent insensitive collation. This would avoid the 2 issues above because the original column could be accent sensitive and unique and I would be able to do the search on the new column and use the index on it, however I'm being told I can't do this because it would cause data duplication.
So, I'm trying to do this using a regular expression search on the accent sensitive column. In my application I take in whatever string the user entered and I modify the string in such a way that for this input
"Jose"
I get the output
"J[òóôõöø][sš][eèéêë]%"
Using this string to search works. It finds "Jose" and "José" and it searches with the correct index and doesn't time out.
The only issue I have now is with the darn æ character.
If the database contains the values "aéro" & "æro" and the user enters "aero", currently my application will generate the search strings "[aàáâãäå][eèéêë]r[oòóôõöø]%" But this will only be a match for "aéro". It will not match "æro". By default in sql server it already treats "ae" the same as "æ" whether the collation is accent sensitive or insensitive, so not modifying the user's input would return "æro" as a match, but would not return "aéro" as a match.
Does anyone have any ideas how I can do a regular expression search in SQL that will match both "aé" and "æ"?
SELECT *
FROM mytable
WHERE name LIKE '[aàáâãäå][eèéêë]%'