
I'm using Visual Studio 2008 (C++). How do I create a CString (in a non-Unicode app) from a byte array that has a string encoded in UTF8 in it?

Thanks,

kreb

EDIT: Clarification: I guess what I'm asking is.. CStringA doesn't seem to be able to interpret a UTF8 string as UTF8, but rather as ASCII or the current codepage (I think).. How do I convert this UTF8 string to a CStringW? (UTF-16..?) Thanks

4 Answers

3

CStringW filename = CA2W(null_terminated_byte_buffer, CP_UTF8); should do the trick.


8 Comments

Does this work in non-unicode apps? Doesn't seem to work.. =/ I think I'd need to use a unicode version of CFile as well.. How do I get one from a non-Unicode app?
Sorry, I did this and the CString in the debugger still shows it as if it was interpreted with the local code page, that is, no change. Anyway, I tried to open a file (CFile) with this CStringW as filename but it's still that string interpreted in the local code page.. =/
I think it's failing like this because I am opening the file with CW2A(filename).. and thus converting it back into UTF8.. Is there a way to just use the unicode versions of these functions without having to port the whole app?
Quick question.. If I have a statement like "CStringW filename = L"中文";" I can hover over the filename variable and it displays the text correctly... However if I do "CStringW filename = CA2W((LPCTSTR)buffer, CP_UTF8);" and I hover over the filename and buffer variables, they show the incorrectly interpreted text.. What is going on? It's like CA2W didn't do anything at all.. Could this mean my buffer isn't in UTF8?
It's certainly possible. What's the byte array (in hex preferably)? You should also probably be casting to an LPCSTR since CA2W stands for ANSI to Unicode.
0

The nice thing about UTF8 is that it never produces a zero byte except for the NUL character itself, so a NUL-terminated UTF8 buffer is also a valid NUL-terminated C string. That means that you should be able to simply cast a pointer to the first byte of the array to (const char *) and pass it to CString like you would any NUL-terminated C string.

Note that unless CString is aware of UTF8 semantics (I'm not familiar enough with CString to know exactly how it works, but I suspect it isn't), certain operations that make sense on an ASCII C string may give strange results for a UTF8 C string. For example, a Reverse() method that reversed the order of the bytes in the string would not do the right thing for a UTF8 string: it would not know to keep the bytes of each multi-byte character together in their original order, so it would reverse them as well.
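
To illustrate the Reverse() point, here is a minimal sketch in portable C++ (no CString involved). The function names reverse_bytes and reverse_utf8 are hypothetical, made up for this example, and reverse_utf8 assumes its input is already valid UTF8; it is not meant as a description of how any CString method works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive reversal: treats the string as a plain sequence of bytes.
// Multi-byte UTF-8 sequences end up with their bytes in the wrong order.
std::string reverse_bytes(const std::string& s) {
    return std::string(s.rbegin(), s.rend());
}

// UTF-8-aware reversal (hypothetical helper, assumes valid UTF-8 input):
// walks backwards, and for each character copies its lead byte and
// continuation bytes (10xxxxxx) as one unit, in their original order.
std::string reverse_utf8(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    std::size_t end = s.size();
    while (end > 0) {
        std::size_t start = end - 1;
        // Back up over continuation bytes until the lead byte is reached.
        while (start > 0 &&
               (static_cast<unsigned char>(s[start]) & 0xC0) == 0x80) {
            --start;
        }
        out.append(s, start, end - start);
        end = start;
    }
    assert(out.size() == s.size());  // no bytes are added or dropped
    return out;
}
```

With the input bytes 61 C3 A9 7A ("a", "é", "z"), reverse_bytes yields 7A A9 C3 61, which is no longer valid UTF8, while reverse_utf8 yields 7A C3 A9 61.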

0

For most things, you can treat UTF8 the same as ASCII.

unsigned char szUtf8String[nSize] = "whatever";
// unsigned char* to char* needs reinterpret_cast; static_cast does not compile here
CString s = reinterpret_cast<const char *>(szUtf8String);

That works for manipulating the string and writing it to a file. However, you cannot easily display it: in a non-Unicode build the bytes are interpreted in the local code page, so any non-ASCII characters come out wrong.

To display it, you will need to convert to UTF16 and possibly then back to ANSI (in the local code page).
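
For readers curious what the UTF8 to UTF16 step involves, here is a minimal sketch in portable C++. On Windows you would normally let MultiByteToWideChar() with CP_UTF8 (or the CA2W wrapper) do this for you; the function name utf8_to_utf16 and the Utf16Units typedef are hypothetical, introduced only for this illustration. The sketch is deliberately simplified: it rejects truncated or malformed sequences, but it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for UTF-16 code units (16 bits each).
typedef std::vector<unsigned short> Utf16Units;

// Decodes a NUL-terminated UTF-8 byte buffer into UTF-16 code units.
// Returns false if a lead byte or continuation byte is malformed.
bool utf8_to_utf16(const unsigned char* bytes, Utf16Units& out) {
    assert(bytes != NULL);
    out.clear();
    for (std::size_t i = 0; bytes[i] != 0; ) {
        unsigned long cp;   // code point being assembled
        int extra;          // number of continuation bytes expected
        unsigned char b = bytes[i++];
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return false;  // stray continuation byte or invalid lead byte
        for (; extra > 0; --extra) {
            unsigned char c = bytes[i];
            // A terminating NUL here also fails this check (truncated input).
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
            ++i;
        }
        if (cp >= 0x10000) {
            if (cp > 0x10FFFF) return false;
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<unsigned short>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<unsigned short>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<unsigned short>(cp));
        }
    }
    return true;
}
```

For example, the UTF8 bytes E4 B8 AD E6 96 87 (the text "中文") decode to the two UTF16 code units 0x4E2D and 0x6587. If a buffer fails to decode this way, that is a hint it may not actually contain UTF8.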

1 Comment

On Windows, you can use MultiByteToWideChar() and WideCharToMultiByte(). On any platform you can use mbstowcs() and wcstombs() and other related functions. The former give more control (including an explicit CP_UTF8 code page); the latter are standard C and C++, but they convert using the current C locale, so they only handle UTF8 when that locale's encoding is UTF8.
0

Following the "MSN" answer above, I think that you will ultimately want a CString, not a CStringW out of it. So add a conversion back to CString:

CStringW filenameW = CA2W(null_terminated_byte_buffer, CP_UTF8);
CString filename = CW2T(filenameW);
