Wrapping Fortran for Python and Julia

A while ago, I set myself the challenge of creating Python bindings for a Fortran package. The main goal was to learn how to do it, but I ended up having to learn far more than I had initially planned.

Once that was done, I decided I should also learn how to make Julia bindings. Again, I was forced to dig deeper than expected — but in a good way. It made me revisit and improve the Python bindings.

Along the way, I realized there aren’t many projects out there that you can straightforwardly copy from. I hope these projects can be a small contribution toward filling that gap:

The order of steps for creating bindings is slightly different between Python and Julia.

Python bindings

  1. Create a C API for the functions you want to wrap, along with the corresponding header files.
  2. Implement the bindings using a suitable framework (ctypes, cffi, pybind11, nanobind, etc.).
  3. Build wheels for all (or most) OS and architecture combinations.
  4. Publish to PyPI.

Julia bindings

  1. Create a C API for the functions you want to wrap; header files are optional.
  2. Create an X_jll package (a Julia “binary wrapper”) for all (or most) OS and architecture combinations.
  3. Write the bindings in pure Julia, invoking the X_jll package.
  4. Publish to the Julia General Registry.

The first step (creating the C API) is essentially the same for both Python and Julia. In absolute terms it’s not complicated, but there are many small details (skill issues included…) that can eat up a lot of time. More than once I found myself wishing that the Fortran Best Practices guide had a section on this topic; in my opinion, it’s something best learned by studying well-crafted examples.

When it comes to writing the bindings themselves, the experience in Python and Julia differs greatly. In Julia, there is essentially one standard way to do it: no ambiguity. In contrast, the multitude of Python binding frameworks is dizzying — and to make matters worse, writing the bindings and building the wheels must be considered together. It’s easy to end up with something that works perfectly on a given OS/architecture in your local development setup but refuses to run inside the (cloud) wheel builders. This is the main reason I initially chose pybind11 (and later switched to nanobind): I was confident they could be made to work in wheel builders.

:warning: Callbacks and Closures

What I struggled with most were callbacks and closures. Wrapping a function that takes another function as an argument requires special care:

  • The Fortran code cannot contain closures involving the callback.
    With GCC, this results in a trampoline (executable stack), which causes all sorts of trouble. (No, special trampoline flags will not help.) Binaries (wheels or X_jll) cannot be generated for musllinux. Even worse, in Julia you get a stack overflow error when the code is executed from the REPL — but not when run from a “new process.” (It’s as crazy as it sounds.)

    • For example, this “clever” idea to convert a C callback with explicit-shape arrays into a modern Fortran callback with assumed-shape arrays runs into the abovementioned issues.
  • The Julia code cannot have closures on the C callback that will be passed into the Fortran library, because Julia’s cFunction does not support closures on macOS + ARM.

    • In practice, this means the API of the Fortran library must be (re)designed to allow passing arbitrary data/context into the callback via a void* (i.e., type(c_ptr)) argument. This isn’t strictly necessary for Python bindings, but having it does make the code cleaner.
7 Likes

Very interesting topic. Thanks for sharing your experience. I also find the options in Python overwhelming. I also am not satisfied with the auto-wrappers like f2py which can be confusing to use or are tailored to a particular Fortran processor.

When you say “callbacks and closures”, are you referring to the Python and Julia side? Fortran doesn’t have closures in the full sense of the word.

Concerning trampoline flags, are you are referring to -ftrampoline-impl, discussed previously here: Implementation of a parametrized objective function without using module variables or internal subroutines - #41 by ivanpribec ?

Closures (or nested function calls; I don’t know what else to call them) are problematic both on the Fortran side and on the Julia side, quite likely for the same reason – trampolines.

Specifically, what got me in trouble on the Fortran side was the pattern shown below. This is perfectly valid Fortran code, but leads to the issues I mentioned.

subroutine solver_c(callback_c, ...) bind(c)
  procedure(callback_c_t) :: callback_c

  call solver(callback_f, ...)

  contains

    subroutine callback_f(x, y)
      real, intent(in) :: x(:)
      real, intent(out):: y(:)
      call callback_c(n, x, y)
    end subroutine 

end subroutine

Yes, I tried -ftrampoline-impl=heap and the like. I don’t mean the flags do not do what they are supposed to. The difficulty is that one needs a solution that works across all OS+arch, and is compatible with the specific constraints of the cloud wheel builders.

We should 100% do it

2 Likes

From my experience, the best solution for Python wrappers is to write manually the C extensions. This way you know exactly what you are doing. However, you have to get familiar with the Python C API and it can be time consuming.

As already mentioned the automated such as f2py or Cython are overwhelming.