6

I'm using SWIG to glue some C++ code to Python (2.6), and part of that glue converts large fields of data (millions of values) from the C++ side to a NumPy array. The best method I've come up with implements an iterator for the class and then provides a Python method:

def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())

The problem is that each call to the iterator's next is very costly, since it has to go through three or four SWIG wrappers, so it takes far too long. I can guarantee that the C++ data are stored contiguously (they live in a std::vector), and it feels like NumPy should be able to take a pointer to the beginning of that data along with the number of values it contains and read it directly.

Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?

4 Answers

2

You will want to define __array_interface__ instead. This will let you pass back the pointer and the shape information directly.
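For a concrete picture, here is a minimal runnable toy (not the asker's class) in which a ctypes array stands in for the C++ vector; with SWIG, the address and length would instead come from small wrapped accessors on the C++ object:

import ctypes
import numpy as np

class VectorView(object):
    """Toy stand-in for the SWIG proxy: a contiguous block of C doubles."""
    def __init__(self, values):
        self._buf = (ctypes.c_double * len(values))(*values)

    @property
    def __array_interface__(self):
        return {
            'shape':   (len(self._buf),),                     # one dimension, N elements
            'typestr': '<f8',                                 # little-endian 64-bit float (C++ double)
            'data':    (ctypes.addressof(self._buf), False),  # (address as int, read-only flag)
            'version': 3,
        }

v = VectorView([1.0, 2.0, 3.0])
a = np.asarray(v)   # wraps the existing memory directly -- no per-element Python calls

Because NumPy only needs the address and the shape, np.asarray() makes no per-element Python calls here; np.array() would copy the data instead of wrapping it.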


3 Comments

Can you provide a little more detail for a practical implementation? Is there also a way to do it without having to compile my project against the NumPy header files? Thanks.
It also says that's a legacy interface.
__array_interface__ is just a plain dict with plain types inside of it. No need to compile against any NumPy headers. Ignore the note that calls it "legacy"; I thought I had deleted that already. If you like, you can implement the PEP 3118 buffer interface instead, but this is easier.
1

Maybe it would be possible to use f2py instead of SWIG. Despite its name, it is capable of interfacing Python with C as well as Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy

The advantage is that it handles the conversion to numpy arrays automatically.

Two caveats: if you don't already know Fortran, you may find f2py a bit strange; and I don't know how well it works with C++.

2 Comments

Thanks for the response. I do know some FORTRAN, but I'm using a lot of C++-y features in my code: templates, typedefs, etc. I'd also rather not introduce another dependency.
Fair enough re C++. You would probably have to write intermediate plain C wrappers, which could be a pain. On the other hand, it is not really another dependency, since f2py is part of NumPy, which you are already using; you would not need a Fortran compiler.
0

If you wrap your vector in an object that implements Python's buffer interface, you can pass that to the numpy array for initialization (see the np.ndarray docs; the buffer is the third argument). I would bet that this initialization is much faster, since it can just memcpy the data.
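As an illustration of the mechanism (a plain array.array standing in for the wrapped std::vector): any object that exports the buffer protocol can be handed to np.ndarray through that third (buffer) argument, or to np.frombuffer:

import array
import numpy as np

raw = array.array('d', [0.0, 1.5, 3.0])   # exports the buffer protocol, as the wrapper object would
view = np.ndarray(shape=(len(raw),), dtype=float, buffer=raw)

view[0] = 42.0    # writes through: the ndarray shares raw's memory rather than copying it
# np.frombuffer(raw, dtype=float) builds an equivalent shared view in one call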

3 Comments

Thanks for the tip. Do you have any examples of using the pybuffer_mutable_binary or other interface in SWIG to implement the __buffer__ interface for, e.g., floats?
@Seth: Sorry, I cannot help you there.
So it looks like I would have to implement the entire buffer interface for this class by hand from scratch. SWIG only provides the capability of reading other buffers, not exporting buffer functions.
0

So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:

%insert("python") %{
import numpy as np
%}

/*! Templated function to copy contents of a container to an allocated memory
 * buffer
 */
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>

template < typename Container_T >
void copy_to_buffer(
        const Container_T& field,
        typename Container_T::value_type* buffer,
        typename Container_T::size_type length
        )
{
//    ValidateUserInput( length == field.size(),
//            "Destination buffer is the wrong size" );
    // put your own assertion here or BAD THINGS CAN HAPPEN

    if (length == field.size()) {
        std::copy( field.begin(), field.end(), buffer );
    }
}
//====

%}

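/* Typemap: accept any writable Python buffer object (e.g. a freshly allocated
 * NumPy array) for copy_to_buffer's (pointer, length) pair, deriving the
 * element count from the buffer's size in bytes. */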
%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
(int res = 0, Py_ssize_t size_ = 0, void *buffer_ = 0) {

    res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
    if ( res < 0 ) {
        PyErr_Clear();
        %argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
                $symname, $argnum);
    }
    $1 = ($1_ltype) buffer_;
    $2 = ($2_ltype) (size_/sizeof($*1_type));
}
%enddef


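/* Instantiate copy_to_buffer for CLASS and add a Python-side __array__ method
 * to the wrapped class, so that np.asarray() works on its instances. */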
%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)

TYPEMAP_COPY_TO_BUFFER(CLASS)

%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;

%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}

%enddef

then you can make a container "Numpy"-able with

%template(DumbVectorFloat) DumbVector<double>;
ADD_NUMPY_ARRAY_INTERFACE(float, DumbVectorFloat, DumbVector<double>);

Then in Python, just do:

# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )

This has only the overhead of a single Python <--> C++ translation call, not the N calls you would pay iterating over a typical length-N array.

A slightly more complete version of this code is part of my PyTRT project on GitHub.
