Byte array to UTF8 CString

Question

I'm using Visual Studio 2008 (C++). How do I create a CString (in a non-Unicode app) from a byte array that has a string encoded in UTF8 in it?

Thanks,

kreb

EDIT: Clarification: I guess what I'm really asking is this: CStringA doesn't seem to interpret a UTF8 string as UTF8, but rather as ASCII or the current code page (I think). How do I convert this UTF8 string to a CStringW (UTF-16)? Thanks

MSN · Accepted Answer · 2010-02-19 06:04:41Z

3

This should do the trick:

CStringW filename = CA2W(null_terminated_byte_buffer, CP_UTF8);
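To show what CA2W(..., CP_UTF8) is supposed to produce, here is a minimal portable sketch of the same UTF8 to UTF-16 decoding in standard C++. Utf8ToUtf16 is a hypothetical helper, not an ATL or MFC function. On Windows the actual conversion is done by MultiByteToWideChar; this sketch only illustrates the byte-level rules, and it stores UTF-16 code units in a std::wstring the way CStringW does.

```cpp
#include <string>

// Hypothetical helper (not part of ATL/MFC): decodes a NUL-terminated UTF8
// buffer into UTF-16 code units stored in a std::wstring, as on Windows.
// For valid input this should match what CA2W(buf, CP_UTF8) produces.
// Malformed or truncated sequences become U+FFFD; overlong forms are not
// rejected (kept short for this sketch).
std::wstring Utf8ToUtf16(const char *buf)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(buf);
    std::wstring out;
    while (*p) {
        unsigned char b = *p++;
        unsigned long cp;
        int extra;                                   // continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out += L'\xFFFD'; continue; }         // stray or invalid lead byte
        int i = 0;
        for (; i < extra && (*p & 0xC0) == 0x80; ++i, ++p)
            cp = (cp << 6) | (*p & 0x3F);            // append 6 payload bits
        if (i < extra || cp > 0x10FFFF) { out += L'\xFFFD'; continue; }
        if (cp >= 0x10000) {                         // outside the BMP: surrogate pair
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

For example, the UTF8 bytes E4 B8 AD E6 96 87 decode to the two UTF-16 code units 4E2D and 6587 (中文), and a character outside the Basic Multilingual Plane becomes a surrogate pair.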



8 Comments

krebstar Over a year ago

Does this work in non-Unicode apps? It doesn't seem to work. =/ I think I'd need to use a Unicode version of CFile as well. How do I get one from a non-Unicode app?

krebstar Over a year ago

Sorry, I did this, and the CString in the debugger still shows the text as if it were interpreted with the local code page; that is, no change. I also tried to open a file (CFile) with this CStringW as the filename, but it's still that string interpreted in the local code page. =/

krebstar Over a year ago

I think it's failing because I am opening the file with CW2A(filename), which converts the name back into a narrow string in the local code page rather than keeping the Unicode characters. Is there a way to just use the Unicode versions of these functions without having to port the whole app?

krebstar Over a year ago

Quick question: if I have a statement like "CStringW filename = L"中文";" and hover over the filename variable, the debugger displays the text correctly. However, if I write "CStringW filename = CA2W((LPCTSTR)buffer, CP_UTF8);" and hover over the filename and buffer variables, they show the incorrectly interpreted text. What is going on? It's as if CA2W didn't do anything at all. Could this mean my buffer isn't in UTF8?
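One way to answer "could my buffer not be UTF8?" is to check the bytes directly. The sketch below is a hypothetical helper, IsValidUtf8, written for illustration; it applies the well-formedness rules of RFC 3629. Text in a legacy double-byte code page usually fails it quickly: for example, 中文 in GBK (code page 936) is D6 D0 CE C4, and D0 is not a UTF8 continuation byte.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if buf[0..len) is well-formed UTF8 per
// RFC 3629 (no stray continuation bytes, no truncated sequences, no overlong
// forms, no UTF-16 surrogate values, nothing above U+10FFFF).
bool IsValidUtf8(const unsigned char *buf, std::size_t len)
{
    std::size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        if (b < 0x80) { ++i; continue; }             // plain ASCII byte

        std::size_t extra;                           // continuation bytes expected
        unsigned long cp, minCp;                     // decoded value, smallest legal value
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; minCp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; minCp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; minCp = 0x10000; }
        else return false;                           // continuation byte or invalid lead byte

        if (len - i <= extra) return false;          // sequence runs past the end
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((buf[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                            // overlong, out of range, or surrogate
        i += extra + 1;
    }
    return true;
}
```

For a NUL-terminated buffer it could be called as IsValidUtf8(reinterpret_cast<const unsigned char *>(buffer), strlen(buffer)).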

MSN Over a year ago

It's certainly possible. What's the byte array (in hex preferably)? You should also probably be casting to an LPCSTR since CA2W stands for ANSI to Unicode.
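A small helper for producing the hex dump asked for here; ToHex is a hypothetical name used only for illustration. For reference, 中文 encoded as UTF8 is E4 B8 AD E6 96 87, so a buffer that is supposed to hold that text but dumps differently is not UTF8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: formats len bytes as space-separated uppercase hex pairs.
std::string ToHex(const unsigned char *buf, std::size_t len)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        if (i != 0)
            out += ' ';
        out += digits[buf[i] >> 4];    // high nibble
        out += digits[buf[i] & 0x0F];  // low nibble
    }
    return out;
}
```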


Jeremy Friesner · Accepted Answer · 2010-02-19 05:55:04Z

The nice thing about UTF8 is that it can always be stored as a NUL-terminated C string: a zero byte only ever encodes U+0000 itself, and every byte of a multi-byte character is 0x80 or above. That means, provided the byte array ends with a NUL byte, you should be able to simply cast a pointer to its first byte to (const char *) and pass it to CString like you would any NUL-terminated C string.
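Because no byte inside a multi-byte UTF8 sequence is zero, strlen() and CStringA::GetLength() still find the end of the string correctly, but they count bytes, not characters. A minimal sketch of counting code points instead; Utf8CodePointCount is a hypothetical helper and assumes the input is already valid UTF8.

```cpp
#include <cstddef>
#include <cstring>   // std::strlen, for comparison in the usage example

// Hypothetical helper: counts the code points in a NUL-terminated UTF8 string
// by skipping continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF8.
std::size_t Utf8CodePointCount(const char *s)
{
    std::size_t count = 0;
    for (; *s != '\0'; ++s)
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For 中文 (E4 B8 AD E6 96 87), std::strlen reports 6 while Utf8CodePointCount reports 2.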

Note that unless CString is aware of UTF8 semantics (I'm not familiar enough with CString to know exactly how it works, but I suspect it isn't), certain operations that make sense on an ASCII C string may give strange results for a UTF8 C string. For example, a Reverse() method that reversed the order of the bytes in the string would not do the right thing for a UTF8 string: it would not know to keep the bytes of each multi-byte character together, so it would reverse them too and produce invalid UTF8.
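The Reverse() problem is easy to see in a few lines. ReverseUtf8 below is a hypothetical helper, sketched on a std::string rather than a CString: it reverses by code point, whereas reversing the raw bytes turns E4 B8 AD (中) into AD B8 E4, which is not valid UTF8.

```cpp
#include <string>

// Hypothetical helper: reverses a UTF8 string by code point rather than by
// byte, so each multi-byte sequence keeps its internal byte order.
// Assumes valid UTF8; combining characters are still separated from their
// base character (out of scope for this sketch).
std::string ReverseUtf8(const std::string &s)
{
    std::string out;
    out.reserve(s.size());
    std::string::size_type end = s.size();
    while (end > 0) {
        std::string::size_type start = end - 1;
        // step back over continuation bytes (10xxxxxx) to the lead byte
        while (start > 0 && (static_cast<unsigned char>(s[start]) & 0xC0) == 0x80)
            --start;
        out.append(s, start, end - start);   // copy one whole code point
        end = start;
    }
    return out;
}
```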

Michael J · Accepted Answer · 2010-02-19 05:57:55Z

0

For most byte-level things (copying, concatenating, writing to a file), you can treat UTF8 the same as ASCII.

unsigned char szUtf8String[] = "whatever";                // NUL-terminated UTF8 bytes
CString s = reinterpret_cast<const char *>(szUtf8String);  // static_cast cannot convert unsigned char* to char*

That works for manipulating the string and writing it to a file. However, you cannot easily display the string: it will be treated as text in the local (ANSI) code page, so any non-English characters will be misinterpreted.
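To see the misinterpretation concretely: a Western European ANSI code page such as Windows-1252 maps each byte to one character, so the two UTF8 bytes of é (C3 A9) display as "Ã©". The sketch below uses ISO-8859-1 (Latin-1), which agrees with Windows-1252 for bytes A0 to FF, as a portable stand-in for the local code page; Latin1ToWide is a hypothetical helper.

```cpp
#include <string>

// Hypothetical helper: decodes bytes as ISO-8859-1 (Latin-1), where every byte
// maps to the code point with the same value. Used as a stand-in for a
// Western "ANSI" code page to show what UTF8 bytes look like when read as ANSI.
std::wstring Latin1ToWide(const std::string &bytes)
{
    std::wstring out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
        out += static_cast<wchar_t>(static_cast<unsigned char>(bytes[i]));
    return out;
}
```

Feeding it the UTF8 encoding of "café" yields the five characters c, a, f, Ã, ©: one extra character, and the wrong ones.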

To display it, you will need to convert to UTF16 and possibly then back to ANSI (in the local code page).


1 Comment

Michael J Over a year ago

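On Windows, you can use MultiByteToWideChar() and WideCharToMultiByte(). On any platform, you can use mbstowcs(), wcstombs() and related functions. The former give more control, but the latter are standard C/C++ and available on any platform. Note that mbstowcs() and wcstombs() convert according to the current C locale (set with setlocale()), so they only handle UTF8 when a UTF-8 locale is active.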
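The reverse direction, which WideCharToMultiByte() performs when given CP_UTF8, can also be sketched portably. Utf16ToUtf8 is a hypothetical helper that reads UTF-16 code units from a std::wstring (as CStringW stores them on Windows) and writes UTF8 bytes; replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```cpp
#include <string>

// Hypothetical helper: encodes UTF-16 code units (one per wchar_t, as on
// Windows) as UTF8. Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::wstring &in)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]) & 0xFFFF;  // one code unit
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]) & 0xFFFF;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {       // high + low surrogate pair
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;                              // unpaired surrogate
        if (cp < 0x80) {                              // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                      // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                    // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                      // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```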

kb1ooo · Accepted Answer · 2011-03-22 15:14:56Z

0

Following the "MSN" answer above, I think you will ultimately want a CString, not a CStringW, out of it. So add a conversion back to CString:

CStringW filenameW = CA2W(null_terminated_byte_buffer, CP_UTF8);
CString filename = CW2T(filenameW);
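One caveat: in a non-Unicode build CString is CStringA, so CW2T behaves like CW2A and converts to the ANSI code page. Characters that code page cannot represent are replaced, either by a "best fit" lookalike or by a default character (typically '?'), so a filename such as 中文 survives only if the system code page can encode it. The sketch below models that loss with ISO-8859-1 as a stand-in code page; WideToLatin1 is a hypothetical helper, and the real result depends on the code page in effect.

```cpp
#include <string>

// Hypothetical helper: encodes UTF-16 code units as ISO-8859-1 (Latin-1),
// replacing every code unit above U+00FF with '?'. Latin-1 stands in for the
// ANSI code page here; this sketch does no "best fit" mapping, and a
// surrogate pair turns into two '?' characters.
std::string WideToLatin1(const std::wstring &in)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cu = static_cast<unsigned long>(in[i]) & 0xFFFF;  // one UTF-16 code unit
        out += (cu <= 0xFF) ? static_cast<char>(cu) : '?';
    }
    return out;
}
```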

