Passing a python function in a container to a C++ object and pybind wrappers

Question

I am writing a set of Python wrappers via pybind11 for an optimization library whose heavy-lifting code was written in C++.

The abstract class hierarchy of my C++ code to be wrapped currently looks something like this (multivariate.h):

typedef std::function<double(const double*)> multivariate;

struct multivariate_problem {

    // objective
    multivariate _f;
    int _n;

    // bound constraints
    double *_lower;
    double *_upper;

    multivariate_problem(const multivariate f, const int n, double *lower,
            double *upper):
            _f(f), _n(n), _lower(lower), _upper(upper)) {
    }
};

struct multivariate_solution {
  ... // some code here for return a solution (seems to work ok)
};

class MultivariateOptimizer {

public:
    virtual ~MultivariateOptimizer() {
    }

        // initialize the optimizer
    virtual void init(const multivariate_problem &f, const double *guess) = 0;

        // perform one step of iteration
    virtual void iterate() = 0;

        // retrieve current solution
    virtual multivariate_solution solution() = 0;

        // this essentially calls init(), performs a number of iterate() calls, returns solution()
    virtual multivariate_solution optimize(const multivariate_problem &f,
            const double *guess) = 0;
};

Now, my pybind code looks something like this currently (multivariate_py.cpp):

// wrap the function expressions
typedef std::function<double(const py::array_t<double>&)> py_multivariate;

// wrap the multivariable optimizer
void build_multivariate(py::module_ &m) {

// wrap the solution object
py::class_<multivariate_solution> solution(m, "MultivariateSolution");
...

// wrap the solver
py::class_<MultivariateOptimizer> solver(m, "MultivariateSearch");

// corresponds to MultivariateSearch::optimize()
    solver.def("optimize",
            [](MultivariateOptimizer &self, py_multivariate py_f,
                    py::array_t<double> &py_lower,
                    py::array_t<double> &py_upper,
                    py::array_t<double> &py_guess) {
                const int n = py_lower.size();
                double *lower = static_cast<double*>(py_lower.request().ptr);
                double *upper = static_cast<double*>(py_upper.request().ptr);

                const multivariate &f = [&py_f, &n](const double *x) -> double {
                    const auto &py_x = py::array_t<double>(n, x);
                    return py_f(py_x);
                };

                const multivariate_problem &prob { f, n, lower, upper };
                double *guess = static_cast<double*>(py_guess.request().ptr);
                return self.optimize(prob, guess);
            }, "f"_a, "lower"_a, "upper"_a, "guess"_a,
            py::call_guard<py::scoped_ostream_redirect,
                    py::scoped_estream_redirect>());

// corresponds to MultivariateSearch::init()
    solver.def("initialize",
            [](MultivariateOptimizer &self, py_multivariate py_f,
                    py::array_t<double> &py_lower,
                    py::array_t<double> &py_upper,
                    py::array_t<double> &py_guess) {
                const int n = py_lower.size();
                double *lower = static_cast<double*>(py_lower.request().ptr);
                double *upper = static_cast<double*>(py_upper.request().ptr);

                const multivariate &f = [&py_f, &n](const double *x) -> double {
                    const auto &py_x = py::array_t<double>(n, x);
                    return py_f(py_x);
                };

                const multivariate_problem &prob { f, n, lower, upper };
                double *guess = static_cast<double*>(py_guess.request().ptr);
                return self.init(prob, guess);
            }, "f"_a, "lower"_a, "upper"_a, "guess"_a,
            py::call_guard<py::scoped_ostream_redirect,
                    py::scoped_estream_redirect>());

// corresponds to MultivariateSearch::iterate()
    solver.def("iterate", &MultivariateOptimizer::iterate);

// corresponds to MultivariateSearch::solution()
    solver.def("solution", &MultivariateOptimizer::solution);

// put algorithm-specific bindings here
build_acd(m);
build_amalgam(m);
build_basin_hopping(m);
...

As you can see, I internally wrap the python function to a C++ function, then wrap the function inside a multivariate_problem object to pass to my C++ backend.

Which would then be called by pybind with the following entry point (mylibrary.cpp):


namespace py = pybind11;

void build_multivariate(py::module_&);

PYBIND11_MODULE(mypackagename, m) {
    build_multivariate(m);
    ...
}

pybind11 will compile this without errors through setup.py via typical pip install . commands. In fact, most of the code is working correctly for both initialize() and optimize() calls. For example, the following Python code will run fine (assuming mylibrary is replaced with the name of my package in setup.py, and MySolverName is a particular instance of MultivariateSearch:

import numpy as np
from mylibrary import MySolverName


# function to optimize
def fx(x):
    total = 0.0
    for x1, x2 in zip(x[:-1], x[1:]):
        total += 100 * (x2 - x1 ** 2) ** 2 + (1 - x1) ** 2
    return total


n = 10  # dimension of problem
alg = MySolverName(some_args_to_pass...)
sol = alg.optimize(fx,
                   lower=-10 * np.ones(n),
                   upper=10 * np.ones(n),
                   guess=np.random.uniform(low=-10., high=10., size=n))
print(sol)

However, here is where I am currently stuck. I would also like the user to have an option to run the solvers interactively, which is where the initialize and iterate functions come into play. However, the binding provided above, which works for optimize, does not work for iterate.

To illustrate a use case, the following code does not run:

import numpy as np
from mylibrary import MySolverName


# function to optimize
def fx(x):
    ... return result here for np.ndarray x


n = 10  # dimension of problem
alg = MySolverName(some_args_to_pass...)
alg.initialize(f=fx,
               lower=-10 * np.ones(n),
               upper=10 * np.ones(n),
               guess=np.random.uniform(low=-10., high=10., size=n))
alg.iterate()
print(alg.solution())

When the iterate command is executed, the script hangs and then terminates after a while, typically with no error, and it does not print the last line. In rare cases, it produces a crash on the Python side that the "dimension cannot be negative", but there is no guidance on which memory this pertains to. Without the iterate() line, the code will successfully print the last line as expected.

When I std::cout inside the C++ iterate() function, it always hangs at the point right before any calls to the function fx and does not print anything after the call, so I have narrowed down the problem to the Python function not persisting correctly with iterate() as the entry point. However, optimize() calls iterate() internally within the C++ code, and the previous example is working successfully, so the error is rather odd.

I've tried to go through the documentation for pybind, but I could not find an example that achieves precisely what I am trying to do above. I have tried other solutions like keep_alive<>, to no avail.

Can you help me modify the pybind wrappers, such that the internally stored Python function will persist inside the C++ object, and execute correctly even after the control is again restored to the Python interpreter between iterate() calls? Thanks for your help!

Judging by your solver.def("initialize" definition, you probably have dangling references and/or pointers that cause undefined behavior. It might not manifest in optimize, because it's executed as a single statement. To be sure you would need to include some simple example of concrete Solver class. But even better, change your interfaces to take std::vector<double> instead of double*, and then all your problems will magically disappear. If you don' want to change MultivariateOptimizer, then provide a wrapper class. — pptaszni
– pptaszni, Commented Oct 24, 2024 at 6:30
Could you please explain how to build such a wrapper? I've tried making a wrapper on the C++ side, but it would require a reference to a MultivariateOptimizer object from the python side (as there are hundreds of classes, making a wrapper for each class individually will be a tremendous effort to build and maintain). — mgim
– mgim, Commented Oct 28, 2024 at 0:04

pptaszni · Accepted Answer · 2024-10-29 11:51:18Z

If you would like to wrap your C-style interfaces in modern c++ compatible with pybind11, you can do like this:

Considering that we have some dummy impl of your optimizer:

typedef std::function<double(const double *)> multivariate;

struct multivariate_problem {

  // objective
  multivariate _f;
  int _n;

  // bound constraints
  double *_lower;
  double *_upper;
  multivariate_problem() : multivariate_problem(nullptr, 0, nullptr, nullptr) {}
  multivariate_problem(const multivariate f, const int n, double *lower,
                       double *upper)
      : _f(f), _n(n), _lower(lower), _upper(upper) {}
};

struct multivariate_solution {
  // some code here for return a solution (seems to work ok)
};

class MultivariateOptimizer {

public:
  virtual ~MultivariateOptimizer() {}

  // store the args
  virtual void init(const multivariate_problem &f, const double *guess) {
    _problem = f;
    _guess = guess;
  }

  // just dbg print to check if all is ok
  virtual void iterate() {
    double result = _problem._f(_guess);
    std::cout << "iterate, f(guess): " << result << '\n';
  }

  virtual multivariate_solution solution() { return multivariate_solution{}; }

  // this essentially calls init(), performs a number of iterate() calls,
  // returns solution()
  virtual multivariate_solution optimize(const multivariate_problem &f,
                                         const double *guess) {
    init(f, guess);  // WARN: after `optimize` returns, guess will dangle, so don't call `iterate` without calling `init` first
    iterate();
    iterate();
    return multivariate_solution{};
  }

private:
  multivariate_problem _problem;
  const double *_guess;
};

we can add the following 2 wrappers around the problem definition and the optimizer class:

typedef std::function<double(std::span<const double>)> Multivariate;

class MultiVarProblemWrapper {
public:
  friend void swap(MultiVarProblemWrapper &first,
                   MultiVarProblemWrapper &second) {
    using std::swap;
    swap(first._f, second._f);
    swap(first._n, second._n);
    swap(first._lower, second._lower);
    swap(first._upper, second._upper);
    swap(first._problem, second._problem);
  }
  MultiVarProblemWrapper() = default;
  MultiVarProblemWrapper(Multivariate f, size_t n, std::vector<double> lower,
                         std::vector<double> upper)
      : _f{f}, _n{n}, _lower{lower}, _upper{upper},
        _problem{[this, f, n](const double *arg) {
                   std::span<const double> sp{arg, n};
                   return f(sp);
                 },
                 static_cast<int>(n), _lower.data(), _upper.data()} {}
  // need to implement copy-CTor, because default version would be incorrect
  MultiVarProblemWrapper &operator=(MultiVarProblemWrapper rhs) {
    swap(*this, rhs);
    return *this;
  }
  MultiVarProblemWrapper(const MultiVarProblemWrapper &other)
      : MultiVarProblemWrapper(other._f, other._n, other._lower, other._upper) {
  }
  MultiVarProblemWrapper(MultiVarProblemWrapper &&) = default;
  MultiVarProblemWrapper &operator=(MultiVarProblemWrapper &&) = default;
  multivariate_problem getProblem() const { return _problem; }

private:
  Multivariate _f;
  size_t _n;
  std::vector<double> _lower;
  std::vector<double> _upper;
  multivariate_problem _problem;
};

class MultiVarOptWrapper {
public:
  MultiVarOptWrapper() : _impl{new MultivariateOptimizer} {}
  void init(const MultiVarProblemWrapper &f, std::vector<double> guess) {
    _problem = f;
    _guess = guess;
    _impl->init(_problem.getProblem(), _guess.data());
  }
  void iterate() { _impl->iterate(); }
  multivariate_solution solution() { return _impl->solution(); }
  multivariate_solution optimize(const MultiVarProblemWrapper &f,
                                 std::vector<double> guess) {
    return _impl->optimize(f.getProblem(), guess.data());
  }

private:
  std::unique_ptr<MultivariateOptimizer> _impl;
  MultiVarProblemWrapper _problem;
  std::vector<double> _guess;
};

and then the python bindings:

PYBIND11_MODULE(test_binding, m) {
  typedef std::function<double(const py::array_t<double> &)> py_multivariate;
  py::class_<multivariate_solution>(m, "multivariate_solution")
      .def(py::init<>());
  py::class_<MultiVarProblemWrapper>(m, "MultiVarProblemWrapper")
      .def(py::init<>())
      // explicit conversion from span to array_t; hopefully this will be supported out-of-the-box in the future pybind11 versions
      .def(py::init([](py_multivariate py_f, size_t n,
                       std::vector<double> lower, std::vector<double> upper) {
        auto f = [py_f](std::span<const double> sp) {
          py::array_t<double> arr(sp.size(), sp.data());
          return py_f(arr);
        };
        return MultiVarProblemWrapper{f, n, lower, upper};
      }));
  py::class_<MultiVarOptWrapper>(m, "MultiVarOptWrapper")
      .def(py::init<>())
      .def("init", &MultiVarOptWrapper::init)
      .def("iterate", &MultiVarOptWrapper::iterate)
      .def("solution", &MultiVarOptWrapper::solution)
      .def("optimize", &MultiVarOptWrapper::optimize);
}

And then you can test it in python:

def foo(data):
    print(data)
    return data[0] + data[1]

problem = MultiVarProblemWrapper(foo, 2, [1, 2], [3,4])
optimizer = MultiVarOptWrapper()
optimizer.init(problem, [6.9,9.6])
optimizer.iterate()

In MultiVarOptWrapper default constructor I hardcoded dummy algorithm implementation. Of course you can create a parametric MultiVarOptWrapper constructor were you pass arguments needed to create a real solver, like in your python example alg = MySolverName(some_args_to_pass...).

In MultiVarProblemWrapper I used copy and swap idiom.

mgim · Accepted Answer · 2024-10-28 01:32:15Z

Inspired by pptaszni's comment, I think I have managed to fix this by removing all dangling references within the pybind wrapped function itself:

solver.def("initialize",
        [](MultivariateOptimizer &self, py_multivariate &py_f,
                py::array_t<double> &py_lower,
                py::array_t<double> &py_upper,
                py::array_t<double> &py_guess) {
            const int n = py_lower.size();
            double *lower = static_cast<double*>(py_lower.request().ptr);
            double *upper = static_cast<double*>(py_upper.request().ptr);

            multivariate f = [py_f, n](const double *x) -> double {
                const auto py_x = py::array_t<double>(n, x);
                return py_f(py_x);
            };

            multivariate_problem prob { f, n, lower, upper };
            double *guess = static_cast<double*>(py_guess.request().ptr);
            return self.init(prob, guess);
        }, "f"_a, "lower"_a, "upper"_a, "guess"_a,
        py::call_guard<py::scoped_ostream_redirect,
                py::scoped_estream_redirect>());

I am not sure if this is the correct or the most efficient way to handle this, but it works and never crashes!

Collectives™ on Stack Overflow

Passing a python function in a container to a C++ object and pybind wrappers

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related