Converting ASCII strings to UTF-16 before passing them to Windows API functions

Question

In my current project I've been using wide chars (UTF-16). But since my only input from the user is going to be a URL, which has to end up as ASCII anyway, and one other string, I'm thinking about just switching the whole program to ASCII.

My question is: is there any benefit to converting the strings to UTF-16 before I pass them to a Windows API function?

After doing some research online, it seems like a lot of people recommend this if you're not working with UTF-16 on Windows.

dan04 · Accepted Answer · 2012-04-18 05:25:06Z

In the Windows API, if you call a function like

int SomeFunctionA(const char*);

then it will automatically convert the string to UTF-16 and call the real, Unicode version of the function:

int SomeFunctionW(const wchar_t*);

The catch is, it converts the string to UTF-16 from the ANSI code page. That works OK if you actually have strings encoded in the ANSI code page. It doesn't work if you have strings encoded in UTF-8, which is increasingly common these days (e.g., nearly 70% of Web pages), and isn't supported as an ANSI code page.

Also, if you use the A API, you'll run into limitations like not (easily) being able to open files that have non-ANSI characters in their names (which can be arbitrary UTF-16 strings). You also won't have access to some of Windows' newer features.

That is why I always call the W functions, even though this means annoying explicit conversions (from the UTF-8 strings used in the non-Windows-specific parts of our software).
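
For input that really is pure ASCII, such as the URL mentioned in the question, the explicit conversion amounts to widening each byte to a 16-bit code unit. Below is a minimal, portable sketch of that idea. `ascii_to_utf16` is a hypothetical helper name, not a Windows API function, and it deliberately rejects any byte outside 7-bit ASCII instead of guessing an encoding. For UTF-8 input on Windows, code would more typically call `MultiByteToWideChar` with `CP_UTF8`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the Windows API): widens a
 * NUL-terminated 7-bit ASCII string into UTF-16 code units.
 *
 * Returns the number of code units written, excluding the terminating 0,
 * or (size_t)-1 if the input contains a non-ASCII byte or dst is too small.
 */
size_t ascii_to_utf16(const char *src, uint16_t *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);

    size_t i = 0;
    for (; src[i] != '\0'; ++i) {
        unsigned char byte = (unsigned char)src[i];
        if (byte > 0x7F)
            return (size_t)-1;   /* not ASCII: refuse rather than guess */
        if (i + 1 >= dst_cap)
            return (size_t)-1;   /* no room for this unit plus terminator */
        dst[i] = (uint16_t)byte; /* ASCII code points map 1:1 to UTF-16 */
    }
    if (i >= dst_cap)
        return (size_t)-1;       /* no room for the terminator */
    dst[i] = 0;
    return i;
}
```

On Windows, `wchar_t` is 16 bits wide, so the same loop could fill a `wchar_t` buffer for direct use with a W function.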

Joey · Accepted Answer · 2012-04-18 05:06:56Z

The main point is that on Windows, UTF-16 is the native encoding and all API functions that end in A are just wrappers around the W ones. The A functions are kept around only for compatibility with programs that were written for Windows 9x/ME, and indeed no new program should ever use them (in my opinion).

Unless you're doing heavy processing of billions of large strings, I doubt there is any benefit to storing them in another (possibly more space-saving) encoding at all. Besides, even a URI can contain Unicode, if you think about IDN. So don't be too sure up front about what data your users will pass to the program.

6 Comments

Jesse Good Over a year ago

"A are just wrappers around the W ones": As far as I know, the API functions are just macros which expand to either the A or W version (so they aren't wrappers).

dan04 Over a year ago

@Jesse: But the A functions themselves are mostly just wrappers around the W functions.

Jesse Good Over a year ago

@dan04: I see now, so internally the A functions just call the W functions is what the answer meant. Thanks.

Joey Over a year ago

Jesse, the API functions without a suffix are just macros that expand into either ApiCallA or ApiCallW, depending on whether you're in ANSI or Unicode mode (defined symbols UNICODE and _UNICODE, iirc). But indeed, ApiCallA just delegates to ApiCallW after converting its arguments to Unicode.
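
The pattern described in these comments can be sketched in portable C. Everything here is illustrative: `ApiCall`, `ApiCallA`, and `ApiCallW` are the placeholder names used in the comment, not real functions, and the byte-by-byte widening only stands in for the ANSI-code-page conversion that Windows performs.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Placeholder "real" implementation: accepts wide strings only.
 * For illustration it simply returns the string length. */
int ApiCallW(const wchar_t *text)
{
    assert(text != NULL);
    return (int)wcslen(text);
}

/* Placeholder narrow wrapper: converts its argument, then delegates. */
int ApiCallA(const char *text)
{
    assert(text != NULL);

    wchar_t wide[256];
    size_t i = 0;
    /* Stand-in conversion (an assumption of this sketch): one byte becomes
     * one wide character, truncating at 255 characters. Windows would
     * convert from the active ANSI code page instead. */
    for (; text[i] != '\0' && i < 255; ++i)
        wide[i] = (wchar_t)(unsigned char)text[i];
    wide[i] = L'\0';

    return ApiCallW(wide);
}

/* The unsuffixed name is only a macro, resolved at compile time. */
#ifdef UNICODE
#define ApiCall ApiCallW
#else
#define ApiCall ApiCallA
#endif
```

Compiled without `UNICODE` defined, `ApiCall("hello")` expands to `ApiCallA("hello")`. With `-DUNICODE`, the same name expands to `ApiCallW`, which then expects a wide literal such as `L"hello"`.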

Joey Over a year ago

Remy, well, I made a point that I think no program nowadays should ever explicitly shut itself off from Unicode, almost regardless of its purpose.
