3

I have a really weird issue with SQL queries on Unicode data. Here's what I've got:

  • SQL Server Express 2008 R2 AS
  • Table containing Chinese characters/words/phrases (100,000 rows)

When I run the following, I get the correct row plus 36 other rows, when only the one row should be returned:

SELECT TOP 1000 [ID]
      ,[MyChineseColumn]
      ,UNICODE([MyChineseColumn])
  FROM [dbo].[MyTableName]
  WHERE [MyChineseColumn]= N'㐅'

As you'd expect, the row with 㐅 is returned, but so are 〇, 宁, 㮸 and a bunch of others...

Anyone have any ideas what is going on here? This has really got me confused and I am not sure how to solve this one (tried "Googling" already)...

Thanks

6
  • I should also mention that most of the other rows query perfectly fine... it's only a handful of "dodgy" ones like the above that I'd really like to figure out the reason for. Maybe a certain range of Unicode characters is doing this? I haven't got a clue... Commented Feb 3, 2011 at 14:40
  • Since I don't have a font that can display 㐅 or 㮸, they look identical to me. Just as info: the first (and second) 㐅 is U+3405 CJK UNIFIED IDEOGRAPH-3405, while the other one (the last character in the list of wrong results) is U+3BB8 CJK UNIFIED IDEOGRAPH-3BB8. Commented Feb 3, 2011 at 14:44
  • What is your column collation? Commented Feb 3, 2011 at 14:46
  • Thank you Martin... I did some research and found the collation I needed to use and it's working now. I'd like to mark this as the answer, but you've only commented. If you add it as an answer, I will mark it "ANSWERED" for you. :-) Thanks! Commented Feb 3, 2011 at 15:02
  • @Matt - Done! Just out of curiosity, what collation were you using that treated those 4 characters all the same? Even under SQL_Latin1_General_CP1_CI_AI I got 2 rows back for: DECLARE @t TABLE (c nchar(1) COLLATE SQL_Latin1_General_CP1_CI_AI); INSERT INTO @t VALUES (N'㐅'),(N'〇'),(N'宁'),(N'㮸'); SELECT DISTINCT c FROM @t; Commented Feb 3, 2011 at 15:22

2 Answers

1

Check that the column is using an appropriate Chinese collation, since the collation determines the comparison semantics used for this kind of equality test.
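As a rough sketch of how that check might look, the queries below read the column's current collation from the catalog and list the Chinese collations the server offers. The table and column names are the ones from the question; which collation is actually the right choice depends on the data and is not stated in this thread.

```sql
-- Show the collation currently assigned to the column from the question.
SELECT c.name AS column_name,
       c.collation_name
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.MyTableName')
  AND c.name = N'MyChineseColumn';

-- List the Chinese collations available on this instance,
-- with a short description of each one's sensitivity options.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE N'Chinese%'
ORDER BY name;
```

A candidate collation can be tried out on a single query with a COLLATE clause in the WHERE predicate before the column definition is changed.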


0

You may want to try a binary collation. These characters seem to be treated as identical by the current collation (possibly because it ignores case and/or accents).
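A minimal sketch of that idea, reusing the query from the question: the COLLATE clause forces this one comparison to use a binary collation, so strings are compared by their code point values rather than by linguistic sort weights. Latin1_General_BIN2 is just one binary collation picked for illustration; nothing in the thread says which one was used.

```sql
-- Same query as in the question, but the comparison is forced to a
-- binary collation so only an exact code point match should qualify.
SELECT TOP 1000 [ID]
      ,[MyChineseColumn]
      ,UNICODE([MyChineseColumn]) AS FirstCodePoint
  FROM [dbo].[MyTableName]
  WHERE [MyChineseColumn] = N'㐅' COLLATE Latin1_General_BIN2;
```

Note that overriding the collation inside the predicate may prevent SQL Server from seeking on an index built with the column's own collation, so for a 100,000-row table it may be worth changing the column's collation instead if this comparison is common.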
