
I'm using Visual Studio and C++ on Windows to work with small caps text like ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ, generated using e.g. this website. Whenever I read this text from a file or put it directly into my source code as a std::string, the text visualizer in Visual Studio shows it in the wrong encoding; presumably the visualizer uses the Windows (ANSI) code page. How can I force Visual Studio to let me work with UTF-8 strings properly?

std::string message_or_file_path = "...";
auto message = message_or_file_path;

// If the file path is valid, read from that file
if (GetFileAttributes(message_or_file_path.c_str()) != INVALID_FILE_ATTRIBUTES
    && GetLastError() != ERROR_FILE_NOT_FOUND)
{
    std::ifstream file_stream(message_or_file_path);
    std::string text_file_contents((std::istreambuf_iterator<char>(file_stream)),
        std::istreambuf_iterator<char>());
    message = text_file_contents; // Displayed in wrong encoding
    message = "ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ"; // Displayed in wrong encoding
    std::wstring wide_message = L"ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ"; // Displayed in correct encoding
}

I tried compiling with the additional command-line option /utf-8 and setting the locale:

std::locale::global(std::locale(""));
std::cout.imbue(std::locale());

Neither of those fixed the encoding issue.
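One way to separate a display problem from a data problem is to check the bytes the compiler actually stored. The following is a minimal sketch, assuming the source file is saved as UTF-8 and compiled with /utf-8 (which sets both the source and execution character sets to UTF-8 in MSVC); the function name is hypothetical.

```cpp
#include <cstring>

// Hypothetical check: true when narrow string literals are stored as UTF-8.
// U+029C (ʜ) is encoded as the two bytes 0xCA 0x9C in UTF-8.
bool narrow_literals_are_utf8()
{
    return std::strcmp("ʜ", "\xca\x9c") == 0;
}
```

If this returns true, the std::string already holds correct UTF-8 and only the way the debugger renders it is off.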

  • What is the encoding of the .cpp file? Commented Jan 31, 2020 at 20:12
  • Possible duplicate of How to set standard encoding in Visual Studio Commented Jan 31, 2020 at 20:16
  • You should open the std::ifstream in binary mode to avoid any data conversions while reading the chars. That will at least ensure the std::string has the correct bytes. That doesn't mean the IDE will display it correctly, though. Otherwise, use std::wstring instead, as you already discovered. You can read it with a std::wifstream that has a UTF-8 locale imbue()'ed into it. Or read the raw bytes first and then use MultiByteToWideChar() or std::wstring_convert to convert the bytes to std::wstring. Commented Jan 31, 2020 at 20:16
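The last suggestion (read raw bytes, then convert) could look roughly like the sketch below. The helper names read_file_bytes and utf8_to_wide are hypothetical, the file is assumed to contain UTF-8 without a byte order mark, and the conversion part is only compiled on Windows because it relies on MultiByteToWideChar().

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: read a whole file as raw bytes. Binary mode keeps
// the stream from translating line endings while reading.
std::string read_file_bytes(const std::string& path)
{
    std::ifstream file_stream(path, std::ios::binary);
    if (!file_stream)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(file_stream),
                       std::istreambuf_iterator<char>());
}

#ifdef _WIN32
// Hypothetical helper: convert UTF-8 bytes to UTF-16 for wide Windows APIs.
std::wstring utf8_to_wide(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();
    const int length = static_cast<int>(utf8.size());
    // The first call only asks how many wchar_t units are required.
    const int wide_length = MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), length, nullptr, 0);
    if (wide_length <= 0)
        throw std::runtime_error("input is not valid UTF-8");
    std::wstring wide(static_cast<std::size_t>(wide_length), L'\0');
    // The second call writes the converted text into the buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), length,
                        &wide[0], wide_length);
    return wide;
}
#endif
```

With these, something like utf8_to_wide(read_file_bytes(path)) would yield a std::wstring while the file contents stay untouched on disk.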

2 Answers

5

From What’s Wrong with My UTF-8 Strings in Visual Studio?, there are a couple of ways to see the contents of a std::string with UTF-8 encoding.

Let's say you have a variable with the following initialization:

std::string s2 = "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9f\x8d\x8c";
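To make it clearer what this string contains: the ten bytes are the UTF-8 encoding of four code points (z, ß, 水, 🍌). A small sketch that counts code points by skipping UTF-8 continuation bytes; count_code_points is a hypothetical helper, not part of the linked article, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count UTF-8 code points by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For s2, size() reports 10 bytes while count_code_points(s2) reports 4, which is why a byte-per-character ANSI view of the same data looks garbled.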

Use a Watch window.

  • Add the variable to Watch.
  • In the Watch window, append the ,s8 format specifier to the variable name (for example, s2,s8) to display its contents as UTF-8.

Here's what I see in Visual Studio 2015.

[image: Watch window screenshot]

Use the Command Window.

  • In the Command Window, use ? &s2[0],s8 to display the text as UTF-8.

Here's what I see in Visual Studio 2015.

[image: Command Window screenshot]


3 Comments

This may work for the text visualizer but it will not correct the code's encoding so it's only a semi solution. Still, you deserve your upvote.
@BullyWiiPlaza, what do you mean by "the code's encoding"?
@R Sahu: I mean that the string will still not be handled correctly when the code processes it. For example, when I copy the Unicode text from a std::string object to the clipboard and paste it, it's garbled. With a std::wstring version it works fine.
0

A working solution was simply rewriting all std::strings as std::wstrings and adjusting the code logic properly to work with std::wstrings, as indicated in the question as well. Now everything works as expected.
