
In my current project I've been using wide chars (UTF-16). But since my only input from the user is going to be a URL, which has to end up as ASCII anyway, and one other string, I'm thinking about just switching the whole program to ASCII.

My question is: is there any benefit to converting the strings to UTF-16 before I pass them to a Windows API function?

After doing some research online, it seems like a lot of people recommend this if you're not working with UTF-16 on Windows.

2 Answers


In the Windows API, if you call a function like

int SomeFunctionA(const char*);

then it will automatically convert the string to UTF-16 and call the real, Unicode version of the function:

int SomeFunctionW(const wchar_t*);
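To make the delegation concrete, here is a minimal sketch of what an A wrapper does. `SomeFunctionA` and `SomeFunctionW` are the hypothetical names from above, and the real conversion uses the system's current ANSI code page (roughly what `MultiByteToWideChar` with `CP_ACP` does). Because that code page varies between machines, this portable sketch simply assumes Latin-1, where each byte maps to the code point of the same value.

```cpp
#include <cassert>
#include <string>

// Hypothetical "real" Unicode entry point: returns the length of the
// wide string it received, standing in for whatever work the API does.
int SomeFunctionW(const wchar_t* s) {
    return static_cast<int>(std::wstring(s).size());
}

// Hypothetical ANSI wrapper. Real Windows A functions convert from the
// current ANSI code page; this sketch ASSUMES Latin-1 (ISO-8859-1),
// where byte value N maps directly to code point U+00NN.
int SomeFunctionA(const char* s) {
    assert(s != nullptr);
    std::wstring wide;
    for (; *s != '\0'; ++s) {
        // Go through unsigned char so bytes >= 0x80 don't sign-extend.
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*s)));
    }
    // Delegate to the wide version, as the A functions do.
    return SomeFunctionW(wide.c_str());
}
```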

The catch is, it converts the string to UTF-16 from the ANSI code page. That works OK if you actually have strings encoded in the ANSI code page. It doesn't work if you have strings encoded in UTF-8, which is increasingly common these days (e.g., nearly 70% of Web pages), and isn't supported as an ANSI code page.
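A small illustration of why UTF-8 input breaks through the A functions: the UTF-8 encoding of "é" is the two bytes 0xC3 0xA9, and a single-byte code page decodes them as two separate characters. The sketch below assumes Latin-1 as the code page; for these two bytes it agrees with Windows-1252, giving "Ã©" instead of "é".

```cpp
#include <cassert>
#include <string>

// Decode bytes as if they were in a single-byte code page.
// ASSUMPTION: Latin-1 (ISO-8859-1), which matches Windows-1252 for
// 0xC3 -> U+00C3 ('Ã') and 0xA9 -> U+00A9 ('©').
std::u16string decode_single_byte(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        // One byte always becomes one UTF-16 code unit here.
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}
```

Correct UTF-8 decoding of the same two bytes would instead yield the single code unit U+00E9.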

Also, if you use the A API, you'll run into limitations like not (easily) being able to open files whose names contain non-ANSI characters (file names can be arbitrary UTF-16 strings), and you won't have access to some of Windows' newer features.

That is why I always call the W functions, even though this means annoying explicit conversions (from the UTF-8 strings used in the non-Windows-specific parts of our software).
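On Windows, the explicit UTF-8 to UTF-16 conversion is usually done with `MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...)`. As a portable sketch of what that step involves (not the Windows implementation), a hand-written decoder could look like this. It rejects malformed input instead of substituting U+FFFD, and encodes characters outside the Basic Multilingual Plane as surrogate pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert UTF-8 to UTF-16, throwing std::runtime_error on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80) {
            cp = lead; len = 1;
        } else if (lead >= 0xC2 && lead <= 0xDF) {  // 0xC0/0xC1 are always overlong
            cp = lead & 0x1F; len = 2;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            cp = lead & 0x0F; len = 3;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            cp = lead & 0x07; len = 4;
        } else {
            throw std::runtime_error("invalid UTF-8 lead byte");
        }
        if (i + len > in.size()) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        // Each continuation byte must look like 10xxxxxx.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::runtime_error("invalid continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            throw std::runtime_error("invalid code point");
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane: split into a high and a low surrogate.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

On Windows, where `wchar_t` is 16 bits, the result can be handed to a W function with `reinterpret_cast<const wchar_t*>(s.c_str())`, or you can build a `std::wstring` directly instead.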



The main point is that on Windows UTF-16 is the native encoding, and all API functions that end in A are just wrappers around the W ones. The A functions are only carried along for compatibility with programs that were written for Windows 9x/ME, and indeed no new program should ever use them (in my opinion).

Unless you're doing heavy processing of billions of large strings, I doubt there is any benefit to storing them in another (possibly more space-saving) encoding at all. Besides, even a URI can contain Unicode if you think about IDNs (internationalized domain names). So don't be too sure up front about what data your users will pass to the program.
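To illustrate the IDN point: a URL typed by a user, such as the hypothetical `http://bücher.example/`, contains non-ASCII characters until its host name is converted to Punycode (on Windows, for example, with `IdnToAscii`). A minimal check like the one below, assuming the input arrives as UTF-8 bytes, shows that "URLs are ASCII" does not hold for raw user input.

```cpp
#include <cassert>
#include <string>

// True if every byte is 7-bit ASCII. A URL whose host is an
// internationalized domain name fails this check before Punycode conversion.
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}
```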

6 Comments

"A are just wrappers around the W ones": As far as I know, the API functions are just macros which expand to either the A or the W version (so they aren't wrappers).
@Jesse: But the A functions themselves are mostly just wrappers around the W functions.
@dan04: I see now: what the answer meant is that internally the A functions just call the W functions. Thanks.
@Jesse: The API functions without a suffix are just macros that expand into either ApiCallA or ApiCallW, depending on whether you're in ANSI or Unicode mode (defined symbols UNICODE and _UNICODE, IIRC). But indeed, ApiCallA just delegates to ApiCallW after converting its arguments to Unicode.
@Remy: Well, my point was that I think no program nowadays should ever explicitly shut itself off from Unicode, almost regardless of its purpose.
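A simplified sketch of the macro mechanism described in the comments: the unsuffixed name is a preprocessor macro, and whether `UNICODE` is defined decides which variant it names. The real `windows.h` definitions are more involved; the functions and macros here are hypothetical stand-ins, including a cut-down `TEXT` that only mimics the real one for string literals.

```cpp
#include <cassert>

// Hypothetical A/W pair; each reports which variant was actually called.
int SomeFunctionA(const char*)    { return 'A'; }
int SomeFunctionW(const wchar_t*) { return 'W'; }

// The unsuffixed name is not a function at all: UNICODE picks the target,
// and TEXT() makes the literal narrow or wide to match.
#ifdef UNICODE
#define SomeFunction SomeFunctionW
#define TEXT(s) L##s
#else
#define SomeFunction SomeFunctionA
#define TEXT(s) s
#endif
```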