8

I am trying to pass the keyword arguments to the map function in Python's multiprocessing.Pool instance.

Extrapolating from Using map() function with keyword arguments, I know I can use functools.partial() such as the following:

from multiprocessing import Pool
from functools import partial
import sys

# Function to multiprocess
def func(a, b, c, d):
    print(a * (b + 2 * c - d))
    sys.stdout.flush()

if __name__ == '__main__':
    p = Pool(2)
    # Now, I try to call func(a, b, c, d) for 10 different a values,
    # but the same b, c, d values passed in as keyword arguments
    a_iter = range(10)
    kwargs = {'b': 1, 'c': 2, 'd': 3}

    mapfunc = partial(func, **kwargs)
    p.map(mapfunc, a_iter)

The output is correct:

0
2
4
6
8
10
12
14
16
18

Is this the best practice (most "pythonic" way) to do so? I felt that:

1) Pool is commonly used;

2) Keyword arguments are commonly used;

3) But the combined usage like my example above is a little bit like a "hacky" way to achieve this.

2
  • 1
    Seems fine to me. map only takes positional parameters, so using partial to create a suitable function object is perfectly reasonable. Commented Mar 3, 2016 at 1:21
  • Does it works only if iterable argument is FIRST in function defenition? I mean if I want use map for argument b I need to swap them in function defenition? Because I have an error if just define a key in kwargs and drop b from it Commented Jun 19, 2019 at 11:45

1 Answer 1

2

Using partial may be suboptimal if the default arguments are large. The function passed to map is repeatedly pickle-ed when sent to the workers (once for every argument in the iterable); a global Python function is (essentially) pickle-ed by sending the qualified name (because the same function is defined on the other side without needing to transfer any data) while partial is pickle-ed as the pickle of the function and all the provided arguments.

If kwargs is all small primitives, as in your example, this doesn't really matter; the incremental cost of sending along the extra arguments is trivial. But if kwargs is big, say, kwargs = {'b': [1] * 10000, 'c': [2] * 20000, 'd': [3]*30000}, that's a nasty price to pay.

In that case, you have some options:

  1. Roll your own function at the global level that works like partial, but pickles differently:

    class func_a_only(a):
        return func(a, 1, 2, 3)
    
  2. Using the initializer argument to Pool so each worker process sets up state once, instead of once per task, allowing you to ensure data is available even if you're working in spawn based environment (e.g. Windows)

  3. Using Managers to share a single copy of data among all processes

And probably a handful of other approaches. Point is, partial is fine for arguments that don't produce huge pickles, but it can kill you if the bound arguments are huge.

Note: In this particular case, if you're in Python 3.3+, you don't actually need partial, and avoiding the dict in favor of tuples saves a trivial amount of overhead. Without adding any new functions, just some imports, you could replace:

kwargs = {'b': 1, 'c': 2, 'd': 3}
mapfunc = partial(func, **kwargs)
p.map(mapfunc, a_iter)

with:

from itertools import repeat

p.starmap(func, zip(a_iter, repeat(1), repeat(2), repeat(3)))

to achieve a similar effect. To be clear, there is nothing wrong with partial that this "fixes" (both approaches would have the same problem with pickling large objects), this is just an alternate approach that is occasionally useful.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you for your answer! When you mention that I don't really need partial in Python >= 3.3, where I do not need to avoid dict in favor of tuples, do you mean that I can use Pool.apply_async() instead? I assume that if I use Pool.map(), I can only use iterables like a tuple or list (pool.map(func, iterable)) and can't use a dict (pool.apply_async(func, args, kwargs)). Was I getting it right?
@ShawnWang: I was referring to the ability to use Pool.starmap instead of Pool.map, allowing you to use zip to construct a tuple of arguments that starmap unpacks for you. That alternative is what I'm demonstrating in the final code block, providing repetitive values for b, c, and d positionally by using repeat and zip to construct the complete set of positional arguments over and over instead of using keyword arguments at all.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.