2

I'm trying to write Cython code to dump a dense feature matrix, target vector pair to libsvm format faster than sklearn's built in code. I get a compilation error complaining about a type issue with passing the target vector (a numpy array of ints) to the relevant c function.

Here's the code:

import numpy as np
cimport numpy as np
cimport cython

cdef extern from "cdump.h":
    int filedump( double features[], int numexemplars, int numfeats, int target[], char* outfname)

@cython.boundscheck(False)
@cython.wraparound(False)
def fastdumpdense_libsvmformat(np.ndarray[np.double_t,ndim=2] X, y, outfname):
    if X.shape[0] != len(y):
        raise ValueError("X and y need to have the same number of points")

    cdef int numexemplars = X.shape[0]
    cdef int numfeats = X.shape[1]

    cdef bytes py_bytes = outfname.encode()
    cdef char* outfnamestr = py_bytes

    cdef np.ndarray[np.double_t, ndim=2, mode="c"] X_c
    cdef np.ndarray[np.int_t, ndim=1, mode="c"] y_c
    X_c = np.ascontiguousarray(X, dtype=np.double)
    y_c = np.ascontiguousarray(y, dtype=np.int)
    retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)

    return retval

When I attempt to compile this code using distutils, I get the error

cythoning fastdump_svm.pyx to fastdump_svm.cpp

Error compiling Cython file:
------------------------------------------------------------ ...

    cdef np.ndarray[np.double_t, ndim=2, mode="c"] X_c
    cdef np.ndarray[np.int_t, ndim=1, mode="c"] y_c
    X_c = np.ascontiguousarray(X, dtype=np.double)
    y_c = np.ascontiguousarray(y, dtype=np.int)
    retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
                                                         ^
------------------------------------------------------------

fastdump_svm.pyx:24:58: Cannot assign type 'int_t *' to 'int *'

Any idea how to fix this error? I originally was following the paradigm of passing y_c.data, which works, but this is apparently not the recommended way.

2 Answers 2

4

You can also use dtype=np.dtype("i") when initiating a numpy array to match the C int on your machine.

cdef int [:] y_c
c_array = np.ascontiguousarray(y, dtype=np.dtype("i"))
Sign up to request clarification or add additional context in comments.

Comments

3

The problem is that numpy.int_t is not the same as int, you can easily check this by having your program print sizeof(numpy.int_t) and sizeof(int).

int is a c int, defined by the c standard as being at least 16 bits, but it's 32 bits on my machine. numpy.int_t is usually 32 bits or 64 bits depending on whether you're using a 32 or 64 bit version of numpy, but of course there is some exception (probably for windows users). If you want to know which numpy dtype matches your c_int you can do np.dtype(cytpes.c_int).

So to pass your numpy array to c code you can do:

import ctypes
cdef np.ndarray[int, ndim=1, mode="c"] y_c
y_c = np.ascontiguousarray(y, dtype=ctypes.c_int)
retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)

2 Comments

The dtype in the cdef should be int_t, no? When I try it without int_t I get an error about not being able to cast pointers to Python objects. When I use int_t, I get the same error as before about casting from int_t * to int *.
Sorry about my last comment, I seem to have misunderstood your followup question. The cdef should have the same type as the function declaration of filedump so if the argument is defined as int target[], the cdef should use int. If you can change the signature of filedump you can set them both to np.int_t for example, but they should be the same. Make sure you're using int and not np.int, the first is a c basic type (at least when you use it in a cdef block) and the latter is a python type.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.