Cython: Convert Python string list to 2D character array

Question

I am trying to convert a list of Python strings to a 2D character array and then pass it to a C function.

Python version: 3.6.4, Cython version: 0.28.3, OS Ubuntu 16.04

My first try looks like this:

def my_function(name_list):
    cdef char name_array[50][30]

    for i in range(len(name_list)):
        name_array[i] = name_list[i]

The code builds, but during runtime I receive the following response:

Traceback (most recent call last):
  File "test.py", line 532, in test_my_function
    my_function(name_list)
  File "my_module.pyx", line 817, in my_module.my_function
  File "stringsource", line 93, in carray.from_py.__Pyx_carray_from_py_char
IndexError: not enough values found during array assignment, expected 25, got 2

I then tried to make sure that the string on the right-hand side of the assignment is exactly 30 characters by doing the following:

def my_function(name_list):
    cdef char name_array[50][30]

    for i in range(len(name_list)):
        name_array[i] = (name_list[i] + ' '*30)[:30]

This caused another error, as follows:

Traceback (most recent call last):
  File "test.py", line 532, in test_my_function
    my_function(name_list)
  File "my_module.pyx", line 818, in my_module.my_function
  File "stringsource", line 87, in carray.from_py.__Pyx_carray_from_py_char
TypeError: an integer is required

I will appreciate any help. Thanks.

It works if you do name_array[i] = bytearray((name_list[i]+'a'*30)[:30]). Somehow, with str, Cython decides that it needs an integer; not sure why, though...
– ead, Commented Jun 21, 2018 at 13:29
Not sure why you need char name_array[50][30], but I would not do it if I didn't absolutely have to.
– ead, Commented Jun 21, 2018 at 13:32
@ead: This is enforced by the third-party C library I am calling. Unfortunately I don't have a say in the matter.
– johzi, Commented Jun 21, 2018 at 13:35
I would write the copying routine myself: it would be more efficient and clearer than this strange +30 business. You might also want to take \0-termination into consideration; Cython will not do it for you.
– ead, Commented Jun 21, 2018 at 13:42

ead · Accepted Answer · 2019-01-06 06:30:41Z

I don't like this functionality of Cython; it does not seem to be very well thought through:

It is convenient to use a char array and thus avoid the hassle of allocating and freeing dynamically allocated memory. However, it is only natural for the allocated buffer to be larger than the strings stored in it. Enforcing equal lengths doesn't make sense.
C strings are null-terminated. A trailing \0 is not always needed, but often it is, so additional steps are required to ensure it is there.

Thus, I would roll my own solution:

%%cython
from libc.string cimport memcpy

cdef int from_str_to_chararray(source, char *dest, size_t N, bint ensure_nullterm) except -1:
    # Hold a reference to the bytes object so that as_ptr stays valid.
    cdef bytes as_bytes = source.encode('ascii')
    cdef const char *as_ptr = <const char *>(as_bytes)
    # Count the encoded bytes, not the characters of the str.
    cdef size_t source_len = len(as_bytes)
    if ensure_nullterm:
        source_len += 1    # also copy the trailing \0 of the bytes object
    if source_len > N:
        raise IndexError("destination array too small")
    memcpy(dest, as_ptr, source_len)
    return 0

and then use it as follows:

%%cython
def test(name):
    cdef char name_array[30]
    from_str_to_chararray(name, name_array, 30, 1)
    print("In array: ", name_array)

A quick test yields:

>>> test("A")
In array: A
>>> test("A"*29)
In array: AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>> test("A"*30)
IndexError: destination array too small

Some additional remarks to the implementation:

It is necessary to hold a reference to the underlying bytes object to keep it alive; otherwise as_ptr would become dangling as soon as it is created.
The internal representation of bytes objects has a trailing \0, so memcpy(dest, as_ptr, source_len) is safe even if source_len=len(source)+1.
except -1 in the signature is needed so that the exception is actually propagated to, and checked by, the calling Python code.
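
The length check and the null-termination rule can also be tried out without compiling anything. The following is only a plain-Python sketch of the same logic, not the Cython routine itself: copy_str_to_buffer is a hypothetical helper, and a bytearray of length 30 stands in for char name_array[30].

```python
def copy_str_to_buffer(source, dest, ensure_nullterm=True):
    """Copy an ASCII string into a fixed-size bytearray (hypothetical helper)."""
    as_bytes = source.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    needed = len(as_bytes) + (1 if ensure_nullterm else 0)
    if needed > len(dest):
        raise IndexError("destination array too small")
    # Equal-length slice assignment, so len(dest) never changes.
    dest[:len(as_bytes)] = as_bytes
    if ensure_nullterm:
        dest[len(as_bytes)] = 0  # explicit terminator right after the text


buf = bytearray(30)                # stands in for char name_array[30]
copy_str_to_buffer("A" * 29, buf)  # 29 characters plus the terminator just fit
print(buf[29] == 0)                # True
```

As in the quick test above, a 30-character string is rejected when a terminator is requested, because it would need 31 bytes.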

Obviously, not everything is perfect: one has to pass the size of the array manually, and this will lead to errors in the long run, whereas Cython's version gets the size right automatically. But given the functionality that Cython's version currently lacks, the hand-rolled version is the better option in my opinion.

johzi · Accepted Answer · 2018-06-21 13:56:36Z

Thanks to @ead for responding. It got me to something that works. I am not convinced that it is the best way, but for now it is OK.

I addressed null termination, as @ead suggested, by appending null characters.

I received a TypeError: string argument without an encoding error, and had to encode the string before converting it to a bytearray. That is what the added .encode('ASCII') bit is for.

Here is the working code:

def my_function(name_list):
    cdef char name_array[50][30]

    for i in range(len(name_list)):
        name_array[i] = bytearray((name_list[i] + '\0'*30)[:30].encode('ASCII'))
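
One thing to keep in mind with the pad-and-truncate expression: a name of 30 or more characters fills the whole row and leaves no room for a terminator. The plain-Python sketch below only evaluates the expression outside Cython; pad_field and pad_field_terminated are hypothetical helper names, and the width of 30 is taken from char name_array[50][30].

```python
FIELD_WIDTH = 30  # row width of char name_array[50][30]


def pad_field(name, width=FIELD_WIDTH):
    """Same expression as above: pad with NULs, then cut to the row width."""
    return (name + "\0" * width)[:width].encode("ascii")


def pad_field_terminated(name, width=FIELD_WIDTH):
    """Variant (an assumption, not from the answer) that always keeps a final NUL."""
    return (name[:width - 1] + "\0" * width)[:width].encode("ascii")


short = pad_field("alice")
too_long = pad_field("x" * 40)
print(len(short), short.endswith(b"\0"))   # 30 True
print(len(too_long), b"\0" in too_long)    # 30 False: no terminator left
print(pad_field_terminated("x" * 40).endswith(b"\0"))  # True
```

Whether silently truncating a long name is acceptable depends on the C library; raising an error instead may be the safer choice.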

ead Over a year ago

Sorry, I forgot you use Python 3. There is probably no need for bytearray(...); it should also work with the bytes object directly after encoding.
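
A plausible explanation for both observations, shown here in plain Python: iterating over a str yields one-character strings, while iterating over bytes or a bytearray yields integers. Assuming that Cython's array assignment converts the right-hand side item by item into char values (an assumption about its internals, not something confirmed in this thread), a str would fail with "an integer is required", and bytes would behave the same as bytearray.

```python
text = "ab"
encoded = text.encode("ascii")

# Items of a str are themselves str objects, not integers.
print([type(item).__name__ for item in text])      # ['str', 'str']

# Items of bytes and bytearray are integers in range(256).
print(list(encoded))                               # [97, 98]
print(list(bytearray(encoded)) == list(encoded))   # True
```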
