I have a simple Python script:

```python
import _tph
str = u'Привет, <b>мир!</b>'  # Some Unicode string with Russian characters
_tph.strip_tags(str)
```

and a C library, compiled into _tph.so. This is the strip_tags function from it:

```c
PyObject *strip_tags(PyObject *self, PyObject *args) {
    PyUnicodeObject *string;
    Py_ssize_t length;

    PyArg_ParseTuple(args, "u#", &string, &length);
    printf("%d, %d\n", string->length, length);

    // ...
}
```

The printf call prints `1080, 19`. So the string really is 19 characters long, but from what depths of hell am I getting those 1080 characters?

When I print `string`, I get my str, a null character, and then a lot of junk bytes.

Junk memory looks like this:

```
u'\u041f\u0440\u0438\u0432\u0435\u0442, <b>\u043c\u0438\u0440!</b>\x00\x00\u0299\Ub7024000\U08c55800\Ub7025904\x00\Ub777351c\U08c79e58\x00\U08c7a0b4\x00\Ub7025904\Ub7025954\Ub702594c\Ub702591c\Ub702592c\Ub7025934\x00\x00\x00
```

How can I get a normal string here?

Answer

The `string` variable here isn't well named. It is declared as a pointer to a Python Unicode object, so your printf is seeing a lot of binary data (the object type, GC headers, the ref count, and the encoded Unicode code points) until it happens to find a zero byte, which printf interprets as the end of the string.

The simplest way to view the string is with `PyObject_Print((PyObject *)string, stdout, 0)`. You can find the C functions for manipulating Python Unicode objects at: http://docs.python.org/c-api/unicode.html#unicode-objects

Comments

In fact, I get a segmentation fault with code like this: `PyObject_Print((PyObject *)string, stdout, 0);` And yes, I had tried saving the thread state for the GIL.
`string` is declared as a pointer to a PyUnicodeObject, but "u#" does not store an object there. To get the object, change the format string to "O" and call PyObject_Print() on the result. Alternatively, change the declaration to a Unicode buffer pointer (`Py_UNICODE *string;`) and keep using "u#". The latter gives you a pointer to a counted array, which is not null-terminated and so cannot be handed to printf as a string.
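The mismatch described in the last comment can also account for the exact number 1080. The following is a hypothetical Python 3 reconstruction, not code from the question. It assumes a 32-bit CPython 2 build with 4-byte (UCS4) Py_UNICODE, where a PyUnicodeObject starts with ob_refcnt and ob_type (4 bytes each), so its length field sits at byte offset 8. Under those assumptions, `string->length` on the raw buffer that "u#" returns reads the third code unit, 'и' (U+0438), which is 1080 in decimal.

```python
# Hypothetical reconstruction, not code from the question. Assumes a 32-bit,
# UCS4 ("wide") CPython 2 build, where PyUnicodeObject begins with ob_refcnt
# (4 bytes) and ob_type (4 bytes), so its `length` field sits at byte offset 8.
LENGTH_OFFSET = 8
CODE_UNIT_SIZE = 4  # assumed sizeof(Py_UNICODE) on a UCS4 build

text = u'Привет, <b>мир!</b>'

# "u#" stores a pointer to the raw Py_UNICODE buffer, modelled here as
# little-endian UCS4 code units with no object header in front of them.
buf = text.encode('utf-32-le')

# `string->length` reinterprets 4 bytes of that buffer as a signed length.
bogus_length = int.from_bytes(buf[LENGTH_OFFSET:LENGTH_OFFSET + 4], 'little', signed=True)
print(bogus_length)  # 1080
print('U+%04X' % ord(text[LENGTH_OFFSET // CODE_UNIT_SIZE]))  # U+0438, the 'и' in "Привет"

# The real length is the second value "u#" fills in. The buffer is counted,
# so take exactly that many code units instead of scanning for a terminator.
length = len(text)  # 19, matching the second number printf showed
decoded = buf[:length * CODE_UNIT_SIZE].decode('utf-32-le')
print(decoded == text)  # True
```

If this reading is right, the C-side fix is the one the comment describes: either parse with "O" into a `PyObject *` and work through the Unicode API, or declare `Py_UNICODE *string;` for "u#" and rely on `length` rather than a null terminator. Printing a Py_ssize_t with printf also calls for `%zd` rather than `%d`.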
