
Pythonistas,

I'm trying to write a Python extension in C that passes a large number of null-terminated, UTF-16 encoded C strings to my Python application. The Unicode strings from my C library are guaranteed to always be 16-bit. I'm NOT using wchar_t in my C library on Linux because the size of wchar_t may vary.
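Without wchar_t, wcslen() is not an option, so the length of such a string has to be counted by hand. Here is a minimal sketch, assuming the strings are arrays of uint16_t in host byte order. `u16_strlen` is a hypothetical helper name, not part of any library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the 16-bit code units in a null-terminated
 * UTF-16 string, excluding the terminating 0x0000 unit. This is the
 * 16-bit analogue of strlen() and does not depend on sizeof(wchar_t).
 * It counts code units, not characters: a surrogate pair counts as 2. */
static size_t u16_strlen(const uint16_t *s)
{
    const uint16_t *p = s;

    assert(s != NULL);
    while (*p != 0)
        ++p;
    return (size_t)(p - s);
}
```

Any API that wants a byte count needs `u16_strlen(s) * sizeof(uint16_t)`.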

I found a lot of functions (PyUnicode_AsUTF8String, PyString_FromStringAndSize, PyString_FromString, etc.) that do exactly what I want, but all these functions are designed for 8-bit character/string representations.

The Python documentation (http://docs.python.org/howto/unicode.html) says:

"Under the hood, Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled."

I'm really keen to avoid the performance penalty of converting all my UTF-16 C strings to UTF-8 just for the Python interface, especially on Windows, if the Python interpreter uses 16 bits "under the hood" there as well.

Any ideas on how to tackle this challenge are highly appreciated.

Thanks, Thomas

1 Answer

You can't avoid copying the data (unless you bypass the Python C API), but you can create Python unicode objects directly from UTF-16 data using PyUnicode_DecodeUTF16; see http://docs.python.org/c-api/unicode.html#utf-16-codecs.
