The multithreaded executor is often recommended, but it seems that
executor is subject to the global interpreter lock (GIL).
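Yes. Looking at the implementation of rclpy.executors.MultiThreadedExecutor in the latest ROS2 LTS release (Humble), it is built on concurrent.futures.ThreadPoolExecutor, which in turn runs on ordinary Python threads, so it is subject to the GIL: only one thread executes Python bytecode at a time, and CPU-bound callbacks do not actually run in parallel.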
One alternative that has recently become available is a free-threaded build of Python (3.13 or later), which can run with the GIL disabled. On Ubuntu you can set it up like this:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.13-nogil python3.13-venv
python3.13-nogil -m venv --system-site-packages ros2_nogil
source ros2_nogil/bin/activate
...but I haven't tried it myself yet.
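If you do try it, it is worth confirming that the interpreter inside the venv really is running without the GIL, because a free-threaded build can turn the GIL back on at runtime when it imports a C extension that does not declare free-threading support. A minimal check, assuming Python 3.13 or later for `sys._is_gil_enabled()` (the helper name `gil_status` is just for illustration):

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_currently_enabled) for this interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13 onwards; on older
    # interpreters the GIL is always on, so fall back to True.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return free_threaded_build, gil_enabled
```

Calling `gil_status()` after importing rclpy and your other dependencies is the more telling test, since that is when an extension module could have re-enabled the GIL.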
Another possibility is to use the multiprocessing module to run your computations in multiple processes while keeping the ROS2 callbacks in a single thread. For example, use a pair of multiprocessing.Queue objects: one sends incoming messages to a process pool for handling, and the other receives outgoing messages for publishing. That way the ROS message callbacks are freed from processing duties, which increases their potential throughput.
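Here is a rough sketch of that two-queue arrangement, written without rclpy so it runs on its own. Everything in it is illustrative: `heavy_computation`, `worker` and `run_pipeline` are made-up names, the "pool" is just a handful of multiprocessing.Process workers reading from a shared queue, and the payloads are assumed to be plain picklable Python values (for example `msg.data`) rather than ROS message objects.

```python
import multiprocessing as mp


def heavy_computation(value):
    """Hypothetical stand-in for the CPU-bound work done per message."""
    return value * value


def worker(inbox, outbox):
    """Runs in a separate process: handle payloads until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(heavy_computation(item))


def run_pipeline(payloads, num_workers=2, start_method=None):
    """Push payloads through the worker processes and return the sorted results."""
    # start_method=None selects the platform default ("fork", "spawn", ...).
    ctx = mp.get_context(start_method)
    inbox, outbox = ctx.Queue(), ctx.Queue()
    workers = [
        ctx.Process(target=worker, args=(inbox, outbox))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()

    # This loop plays the role of the subscription callback: it only enqueues.
    for payload in payloads:
        inbox.put(payload)
    # One sentinel per worker so that every worker exits its loop.
    for _ in workers:
        inbox.put(None)

    # This plays the role of the publishing side: one result per payload.
    # Results are collected before join() so no worker blocks on a full pipe.
    results = [outbox.get(timeout=10) for _ in payloads]
    for w in workers:
        w.join()
    # Workers finish in arbitrary order, so sort for a stable return value.
    return sorted(results)
```

With this, `run_pipeline([1, 2, 3, 4])` returns `[1, 4, 9, 16]`; on platforms that use the "spawn" start method the call needs to sit under an `if __name__ == "__main__":` guard. In a real node the subscription callback would just call `inbox.put(...)`, and a timer callback would poll `outbox.get_nowait()` (catching `queue.Empty`) and publish whatever has arrived, so neither callback blocks on the computation.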