Python C-API: How to pass a Unicode UTF-16 null-terminated C string to my Python app without converting to UTF-8?

Question

Pythonistas,

I'm trying to write a Python extension in C that passes a large number of null-terminated, UTF-16 encoded Unicode C strings to my Python application. The Unicode strings from my C library are guaranteed to always use 16-bit code units. I'm not using wchar_t in my C library on Linux, because the size of wchar_t may vary.
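A minimal sketch of the representation described above, assuming each library string is an array of uint16_t code units in native byte order, terminated by a single 0 unit. The names utf16_char and utf16_len are hypothetical; C11's char16_t from <uchar.h> would be an alternative, though it is only guaranteed to be uint_least16_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one UTF-16 code unit, always exactly 16 bits,
   unlike wchar_t (typically 16 bits on Windows, 32 bits with glibc on Linux). */
typedef uint16_t utf16_char;

/* Hypothetical helper: number of 16-bit code units before the terminating 0.
   This counts code units, not characters: a surrogate pair counts as 2. */
static size_t utf16_len(const utf16_char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Knowing the length in code units matters because the Python C-API functions that accept UTF-16 data take an explicit size rather than scanning for a terminator.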

I found a lot of functions (PyUnicode_AsUTF8String, PyString_FromStringAndSize, PyString_FromString, etc.) that do what I want, except that all of them are designed for 8-bit character/string representations.

The Python documentation (http://docs.python.org/howto/unicode.html) says:

"Under the hood, Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled."

I'm really keen to avoid the performance penalty of converting all my UTF-16 C strings to UTF-8 C strings just for the Python interface, especially on Windows, where the Python interpreter may use 16-bit units "under the hood" as well.

Any ideas on how to tackle this challenge are highly appreciated.

Thanks, Thomas

Thomas Wouters · Accepted Answer · 2012-04-06 08:11:09Z


You can't avoid copying the data (unless you break through the Python C API), but you can create Python Unicode objects directly from UTF-16 data using PyUnicode_DecodeUTF16; see http://docs.python.org/c-api/unicode.html#utf-16-codecs.
