I have the following code:

CREATE TABLE IF NOT EXISTS Person
(
   name varchar(24) ...
)
CHARACTER SET utf8 COLLATE utf8_polish_ci;

This works fine in my application, but I read that if someone enters a string in the name field containing a character whose code is greater than 127, the database will use 2 bytes (or more) to store that character. So I thought I would change the character set to utf16:

CHARACTER SET utf16 COLLATE utf16_polish_ci;

But now, when I run my application, an exception appears: KeyNotFoundException. It is thrown exactly at these statements:

MySqlCommand komenda = baza.Połączenie.CreateCommand ();
komenda.CommandText  = zapytanie;

MySqlDataReader dr = komenda.ExecuteReader (); // HERE, at execute reader method

if (dr.Read ()) ...

1) Has anyone had a similar problem? 2) Any idea how to always use 2 bytes per character in a database field?

2 Answers

I'm not sure I understand why you're converting from UTF-8 to UTF-16. I assume you're worried that characters requiring two or more bytes won't fit in a UTF-8 column. That is not the case. In MySQL, utf8 values are stored using one, two, or three bytes per character. Unicode code points U+0000 to U+007F take 1 byte, and code points U+0080 to U+07FF take 2 bytes; the latter range covers the accented letters of the Polish alphabet. Since the majority of characters in Polish text take 1 byte to store, you should probably stick with UTF-8 and save some storage. However, if you want to always use 2 bytes for these characters, at the cost of wasted space, you could stick with UTF-16 (which needs 4 bytes only for supplementary characters outside the Basic Multilingual Plane).
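
To make the byte counts above concrete, here is a minimal sketch in Python (not MySQL itself) that encodes a few characters and compares their sizes in UTF-8 and UTF-16. The sample characters, the sample string, and the helper name `encoded_sizes` are chosen purely for illustration; MySQL's on-disk storage also adds a length prefix that is not modelled here.

```python
# Compare how many bytes each character needs in UTF-8 and UTF-16.
# The sample characters below are illustrative only.
samples = ["a", "z", "ą", "ł", "ż", "€"]


def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    # "utf-16-le" is used so that no 2-byte byte-order mark is counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X} {ch!r}: UTF-8 = {utf8_len} byte(s), "
          f"UTF-16 = {utf16_len} byte(s)")

# A whole Polish sample string: mostly ASCII letters plus some accented ones.
name = "Zażółć gęślą jaźń"
print("characters:", len(name))
print("UTF-8 bytes:", len(name.encode("utf-8")))
print("UTF-16 bytes:", len(name.encode("utf-16-le")))
```

ASCII letters take 1 byte in UTF-8 and 2 bytes in UTF-16, while the accented Polish letters take 2 bytes in both, so mixed Polish text should come out smaller in UTF-8.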

Here are some helpful links:

Unicode support in MySQL: http://dev.mysql.com/doc/refman/5.6/en/charset-unicode.html

Basic Unicode overview: http://www.joelonsoftware.com/articles/Unicode.html

As for the exception, and this is purely a guess, it may have something to do with trying to read data that is UTF-8 encoded as if it were UTF-16 encoded. Did you change the character set after you already had UTF-8 encoded data in your table?

3 Comments

The problem is that if I declare varchar(24), it gives me storage for 24 bytes, not 24 characters. So if I need to store a 24-character string, I need to reserve extra storage for 2- or 3-byte character codes. Simply put, if I need to store 24 characters, I need to declare varchar(48) or even varchar(96). But it doesn't matter now; I just reserve the extra bytes and validate at the ASP level.
When you declare varchar(M), M indicates the maximum column length in characters, not bytes. MySQL manages the storage allocation for you: based on the encoding you're using, it knows how many bytes to allocate for your strings. So, if you want a maximum of 24 characters in your column, just declare varchar(24) and MySQL will take care of allocating the appropriate space. Check out the link @Mosty Mostacho provided; it explains this in more detail.
Damn, I was wrong :O I read something about it on Stack Overflow; maybe it was about an older version of MySQL, I don't know. Anyway, now I know I can simply use varchar(16). Thanks.
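
The point made in these comments, that M in varchar(M) counts characters and the byte budget follows from the character set, can be sketched as simple arithmetic. The Python below is only an illustration: `varchar_worst_case_bytes` and `fits_in_varchar` are hypothetical helpers, not MySQL functions, and the per-character maxima are the values MySQL reports in the Maxlen column of `SHOW CHARACTER SET`. The length prefix MySQL stores alongside each value is ignored.

```python
# Maximum bytes per character for a few MySQL character sets
# (as reported in the Maxlen column of SHOW CHARACTER SET).
MAX_BYTES_PER_CHAR = {
    "latin1": 1,
    "ucs2": 2,
    "utf8": 3,     # MySQL's 3-byte UTF-8
    "utf8mb4": 4,
    "utf16": 4,    # supplementary characters use surrogate pairs
}


def varchar_worst_case_bytes(max_chars, charset):
    """Hypothetical helper: worst-case data bytes for VARCHAR(max_chars)."""
    return max_chars * MAX_BYTES_PER_CHAR[charset]


def fits_in_varchar(value, max_chars):
    """VARCHAR(M) limits characters, so compare len(value), not byte length."""
    return len(value) <= max_chars


sample = "\u017c" * 24  # 24 copies of the Polish letter "ż"
print(fits_in_varchar(sample, 24))           # character count is what matters
print(len(sample.encode("utf-8")))           # bytes actually needed in UTF-8
print(varchar_worst_case_bytes(24, "utf8"))  # upper bound MySQL plans for
```

Under these assumptions, the application only needs to validate the character count against M; the worst-case byte figure is something the server accounts for on its own.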

The documentation says:

[...] utf8 characters can require up to three bytes per character [...]

Read this link for more information.

My advice would be not to focus on how many bytes the DBMS is using, as one of its purposes is to abstract that away from you. Just focus on coding according to the selected data types.

1 Comment

Yes, if I need to store a 24-character string, I need at least varchar(24), or in the worst case varchar(96).
