I have a C# COM server which is consumed by a C++ client.

One of the C# methods returns a string.

On the C++ side, the returned string is represented in Unicode (UTF-16), at least according to the memory view.

  1. Is this always the case with COM strings?
  2. Is there a way to use UTF-8 instead?
  3. I saw some code where strings were passed between C++ and C# as byte arrays. Is there any benefit to this?
  • This thread has turned into an unattractive downvoting fest with conflicting answers. I recommend you look up the definitions for BSTR and SysAllocString in the MSDN Library and draw your own conclusions. Commented Apr 26, 2010 at 16:04

2 Answers

1
  1. Yes. The standard COM string type is BSTR. It is a Unicode string encoded in UTF-16, just like Windows' native string type.
  2. No. A COM method that expects a BSTR isn't going to understand a UTF-8 string; it will read the bytes as UTF-16 code units, which typically shows up as Chinese characters. UTF-8 is a good encoding for a text file, not for programs manipulating strings in memory. UTF-8 requires anywhere between 1 and 4 bytes to encode a Unicode code point, which makes it awkward for basic string manipulations like getting the length or indexing a character.
  3. C and C++ programs tend to use 8-bit encodings, compatible with the "char" type. That's an old practice, dating back to an era before Unicode was around. There's nothing attractive about it: there are many 8-bit encodings. The typical problem is that data entered as text can only be interpreted correctly if it is read by a program that uses the same 8-bit encoding. In other words, when the computers are less than 1000 miles apart. Less in Europe.
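To make the "1 to 4 bytes" point concrete, here is a small portable sketch that counts how much storage one code point needs in each encoding. The function names are made up for illustration and are not part of any COM or Windows API. Note that UTF-16 is not strictly fixed-width either: code points above U+FFFF take two 16-bit units (a surrogate pair).

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in UTF-8 (1 to 4).
// Assumes cp is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
constexpr std::size_t utf8_bytes(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// 16-bit code units needed to encode one code point in UTF-16 (1 or 2).
constexpr std::size_t utf16_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// Total UTF-8 size in bytes of a string given as code points.
std::size_t utf8_size(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += utf8_bytes(cp);
    return total;
}

// Total UTF-16 size in 16-bit units of a string given as code points.
std::size_t utf16_size(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += utf16_units(cp);
    return total;
}
```

For a four-character string made of "a", "é", "€" and an emoji, `utf8_size` gives 10 bytes and `utf16_size` gives 5 code units, so neither count matches the number of characters once you leave the Basic Multilingual Plane.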
2 Comments

Sounds to me like you've got it backward. He's calling into a C# COM component from C++.
@sblom: yes, your answer mystified me. COM looks the same way on both ends. Automation has always been Unicode enabled.
0
  1. No.
  2. Yes. Put the attribute [return: MarshalAs(UnmanagedType.LPStr)] before the method definition in C# if you'd like to return the string as an ANSI string instead of Unicode. Keep in mind that LPStr uses the system ANSI code page, which is generally not UTF-8.
  3. Yes. The author may have done that to keep fine-grained control over the encoding of the string's contents by sidestepping the default marshalling behavior.
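If the C++ client ultimately needs UTF-8, another option is to leave the COM interface alone and convert the UTF-16 you receive on the client side. On Windows that is normally done with WideCharToMultiByte and CP_UTF8; the sketch below does the same conversion by hand so it compiles anywhere. `to_utf8` is a hypothetical helper name, and it assumes you have already copied the BSTR contents into a `std::u16string`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if any.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // Low surrogate without a preceding high surrogate.
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result is an ordinary `std::string` of UTF-8 bytes, which is roughly what the byte-array approach from the question gives you, except that here the encoding step is explicit on the C++ side.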
