
I have three scripts. scheduler.py is a parallel task runner built on multiprocessing.Process and multiprocessing.Pipe. simulation.pyx contains the classes and functions that I want to run in parallel via scheduler.py. Finally, a small main script creates an instance of the parallelization class from scheduler.py, passes it to the classes in simulation.pyx and runs the whole thing.

When the target function is at the top level of simulation.pyx everything works fine, but as soon as I try to use scheduler.py with a method of a class in simulation.pyx I get a pickling error.

Since the code is several thousand lines long, I'll only give some conceptual code:

small_main_script.py:

import simulation
import scheduler


if __name__ == '__main__':

    main = simulation.Main()
    scheduler = scheduler.parallel()
    main.simulate(scheduler)


simulation.pyx:

import scheduler

cdef do_something_with_job(job):
    ...

cdef class Main:
    cdef public ...
    ...

    def __init__(self):
        ...

    def some_function(self,job):
        ...
        do_something_with_job(job)
        ...

    def simulate(self, scheduler):

        for job in job_list:
            scheduler.add_jobs(job)

        scheduler.target_function = self.some_function

        scheduler.run_in_parallel()

The thing is that if I use a trivial dummy function like

def sleep(job):
    time.sleep(2)

and put it at the top level, i.e. outside the classes, the parallelization works fine, but as soon as I put it inside the class Main I get a pickling error. I get the same error with my real target function, which is also defined in the class Main and which I don't want to move to the top level. The following is what happens when I use the dummy function sleep(self, job) inside the class Main. When it's outside the class it works fine.

PicklingError: Can't pickle <built-in method sleep of simulation.Main object at 0x0D4A3C00>: it's not found as __main__.sleep

In [2]: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py", line 381, in main
    self = load(from_parent)
  File "C:\Python27\lib\pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "C:\Python27\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 886, in load_eof
    raise EOFError
EOFError

I'm using Python 2.7

Update

I have managed to isolate the problem further. Using the multiprocessing module of the third-party package pathos, I am able to pickle methods. The problem now seems to be that I get an error when the function arguments are class instances.

  • I'm the pathos author. If you give more detail (as you have above for your original pickling issue) maybe you'll be more likely to receive further help. What does the new error look like, and for what code? Commented Nov 17, 2016 at 10:28
  • "The problem now seems to be that I get an error when using function arguments that are class instances" => EVERYTHING in Python is a 'class instance' so the problem is with your specific class, not with "class instances". Commented Oct 16, 2018 at 13:35

1 Answer


From the Python multiprocessing programming guidelines:

Picklability: Ensure that the arguments to the methods of proxies are picklable.

With the standard pickle module in Python 2, only functions defined at the top level of a module are picklable.

The reason it is hard to pickle functions that are not at the top level (class or instance methods, nested functions, etc.) is that they are hard to look up in a portable manner in the child process. The process that is supposed to execute the instance method might not have any idea about the object that owns the method.
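The lookup problem can be seen with the pickle module alone. The following is a minimal sketch, not code from the question: top_level, make_nested and can_pickle are hypothetical names chosen for illustration. It relies on the fact that pickle stores a function as a reference to its module and name rather than as code.

```python
import pickle


def top_level(job):
    # Defined at module level, so pickle can store it as "module.name".
    return job * 2


def make_nested():
    # The inner function only exists inside this call; there is no
    # module-level name that another process could use to find it.
    def nested(job):
        return job * 2
    return nested


def can_pickle(obj):
    # True if the standard pickle module is able to serialise obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == '__main__':
    print(can_pickle(top_level))
    print(can_pickle(make_nested()))
    # The pickle holds the function's name, not its code.
    print(b'top_level' in pickle.dumps(top_level))
```

The receiving process has to import the same module and find the same name there, which is why a function that is only reachable through an object or an enclosing call cannot be resolved.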

As the programming guidelines suggest:

However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.

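In other words, create the process with the method passed as the target argument of multiprocessing.Process, so that the child inherits it rather than receiving it through a pipe or queue. This relies on the child being forked; on Windows, where a new interpreter is started instead, the target still has to be picklable.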
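A rough sketch of that arrangement, with one deliberate change: the target here is a top-level function that builds the object inside the child, so that only the job and one end of the pipe cross the process boundary. Worker, run_job and run_in_child are hypothetical names, and Worker is a plain Python stand-in for the Main class, not a cdef class.

```python
import multiprocessing


class Worker(object):
    # Hypothetical stand-in for the Main class from the question.
    def some_function(self, job):
        return job * 2


def run_job(job, conn):
    # Top-level target: the object is created inside the child process,
    # so the method itself never has to be pickled.
    worker = Worker()
    conn.send(worker.some_function(job))
    conn.close()


def run_in_child(job):
    # Start one child process for the job and wait for its result.
    parent_conn, child_conn = multiprocessing.Pipe()
    proc = multiprocessing.Process(target=run_job, args=(job, child_conn))
    proc.start()
    # Close the parent's copy so recv() fails instead of hanging
    # if the child dies before sending anything.
    child_conn.close()
    result = parent_conn.recv()
    proc.join()
    return result
```

run_in_child should be called from under an if __name__ == '__main__': guard, as in small_main_script.py, since Windows re-imports the main module in every child.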

The pathos library extends pickling so that more types can be serialised than the standard pickle module supports.

In general it is not recommended to mix OOP and multiprocessing as there are several corner cases which can be misleading. This is one of them.
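If the method has to stay inside the class, one possible way around this corner case is a small top-level helper that receives the instance and the method name and looks the method up again on the receiving side. This is only a sketch under assumptions: call_method, round_trip and Worker are hypothetical names, Worker is a plain Python class, and the instance itself must be picklable (a cdef class may need to define __reduce__ for that).

```python
import pickle


class Worker(object):
    # Hypothetical stand-in for the Main class; plain Python, not a cdef class.
    def __init__(self, factor):
        self.factor = factor

    def some_function(self, job):
        return job * self.factor


def call_method(instance, method_name, job):
    # Top-level function: look the method up again on the receiving side.
    return getattr(instance, method_name)(job)


def round_trip(obj):
    # Stand-in for what multiprocessing does when it sends obj to a child.
    return pickle.loads(pickle.dumps(obj))


if __name__ == '__main__':
    task = (call_method, (Worker(3), 'some_function', 5))
    func, args = round_trip(task)
    print(func(*args))
```

With this pattern the scheduler's target would be call_method, and each job would carry the instance and the method name as extra arguments.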
