6

I'm using SWIG to glue some C++ code to Python (2.6), and part of that glue converts large fields of data (millions of values) from the C++ side to a NumPy array. The best method I've come up with implements an iterator for the class and then provides a Python method:

def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())

The problem is that each call to the iterator's next is very costly, since it has to go through three or four SWIG wrappers, so it takes far too long. I can guarantee that the C++ data are stored contiguously (they live in a std::vector), and it feels like NumPy should be able to take a pointer to the beginning of that data along with the number of values it contains and read it directly.

Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?

4 Answers

2

You will want to define __array_interface__ instead. This will let you pass back the pointer and the shape information directly.
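For a concrete picture, here is a minimal runnable toy (not the asker's class) in which a ctypes array stands in for the C++ vector; with SWIG, the address and length would instead come from small wrapped accessors on the C++ object:

import ctypes
import numpy as np

class VectorView(object):
    """Toy stand-in for the SWIG proxy: a contiguous block of C doubles."""
    def __init__(self, values):
        self._buf = (ctypes.c_double * len(values))(*values)

    @property
    def __array_interface__(self):
        return {
            'shape':   (len(self._buf),),                     # one dimension, N elements
            'typestr': '<f8',                                 # little-endian 64-bit float (C++ double)
            'data':    (ctypes.addressof(self._buf), False),  # (address as int, read-only flag)
            'version': 3,
        }

v = VectorView([1.0, 2.0, 3.0])
a = np.asarray(v)   # wraps the existing memory directly -- no per-element Python calls

Because NumPy only needs the address and the shape, np.asarray() makes no per-element Python calls here; np.array() would copy the data instead of wrapping it.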


3 Comments

Can you provide a little more detail for a practical implementation? Is there also a way to do it without having to compile my project against the NumPy header files? Thanks.
It also says that's a legacy interface.
__array_interface__ is just a plain dict with plain types inside of it. No need to compile against any NumPy headers. Ignore the note that calls it "legacy"; I thought I had deleted that already. If you like, you can implement the PEP 3118 buffer interface instead, but this is easier.
1

Maybe it would be possible to use f2py instead of SWIG. Despite its name, it is capable of interfacing Python with C as well as Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy

The advantage is that it handles the conversion to numpy arrays automatically.

Two caveats: if you don't already know Fortran, you may find f2py a bit strange; and I don't know how well it works with C++.

2 Comments

Thanks for the response. I do know some FORTRAN, but I'm using a lot of C++-y features in my code: templates, typedefs, etc. I'd also rather not introduce another dependency.
Fair enough re C++. You would probably have to write intermediate plain C wrappers, which could be a pain. On the other hand, it is not really another dependency, since f2py is part of NumPy, which you are already using; you would not need a Fortran compiler.
0

If you wrap your vector in an object that implements Python's buffer interface, you can pass that to the numpy array for initialization (see the np.ndarray docs; the buffer is the third argument). I would bet that this initialization is much faster, since it can just memcpy the data.
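As an illustration of the mechanism (a plain array.array standing in for the wrapped std::vector): any object that exports the buffer protocol can be handed to np.ndarray through that third (buffer) argument, or to np.frombuffer:

import array
import numpy as np

raw = array.array('d', [0.0, 1.5, 3.0])   # exports the buffer protocol, as the wrapper object would
view = np.ndarray(shape=(len(raw),), dtype=float, buffer=raw)

view[0] = 42.0    # writes through: the ndarray shares raw's memory rather than copying it
# np.frombuffer(raw, dtype=float) builds an equivalent shared view in one call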

3 Comments

Thanks for the tip. Do you have any examples of using the pybuffer_mutable_binary or other interface in SWIG to implement the __buffer__ interface for, e.g., floats?
@Seth: Sorry, I cannot help you there.
So it looks like I would have to implement the entire buffer interface for this class by hand from scratch. SWIG only provides the capability of reading other buffers, not exporting buffer functions.
0

So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:

%insert("python") %{
import numpy as np
%}

/*! Templated function to copy contents of a container to an allocated memory
 * buffer
 */
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>

template < typename Container_T >
void copy_to_buffer(
        const Container_T& field,
        typename Container_T::value_type* buffer,
        typename Container_T::size_type length
        )
{
//    ValidateUserInput( length == field.size(),
//            "Destination buffer is the wrong size" );
    // put your own assertion here or BAD THINGS CAN HAPPEN

    if (length == field.size()) {
        std::copy( field.begin(), field.end(), buffer );
    }
}
//====

%}

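/* Typemap: accept any writable Python buffer object (e.g. a freshly allocated
 * NumPy array) for copy_to_buffer's (pointer, length) pair, deriving the
 * element count from the buffer's size in bytes. */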
%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
(int res = 0, Py_ssize_t size_ = 0, void *buffer_ = 0) {

    res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
    if ( res < 0 ) {
        PyErr_Clear();
        %argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
                $symname, $argnum);
    }
    $1 = ($1_ltype) buffer_;
    $2 = ($2_ltype) (size_/sizeof($*1_type));
}
%enddef


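/* Instantiate copy_to_buffer for CLASS and add a Python-side __array__ method
 * to the wrapped class, so that np.asarray() works on its instances. */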
%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)

TYPEMAP_COPY_TO_BUFFER(CLASS)

%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;

%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}

%enddef

then you can make a container "Numpy"-able with

%template(DumbVectorFloat) DumbVector<double>;
ADD_NUMPY_ARRAY_INTERFACE(float, DumbVectorFloat, DumbVector<double>);

Then in Python, just do:

# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )

This has only the overhead of a single Python <--> C++ translation call, not the N calls you would pay iterating over a typical length-N array.

A slightly more complete version of this code is part of my PyTRT project on GitHub.
