
I wonder whether the built-in map function splits a list into chunks so that the given function can be applied in parallel (with threads).

The documentation doesn't say anything about it, but I wonder why it isn't implemented like this.

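    def map_func(x):
        '''
        Square a number.

        :param x: number to square
        :return: x * x
        >>> map_func(4)
        16
        '''
        return x * x


    # Apply map_func to every number in the range and collect the results.
    new_list = list(map(map_func, range(1, 2 ** 25)))
    print(new_list)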

From the task manager I cannot clearly see whether it's done by one thread or more.

Can someone please explain whether it's sequential and, if so, why?

  • No, it does not. Commented Feb 2, 2019 at 20:24
  • If your question is "why map doesn't process in parallel", well, that's either too broad or opinion based. If you want multiprocessing, import the module. Commented Feb 2, 2019 at 20:35

1 Answer

It's sequential because map, as a higher-order function, has to apply a function to the data and return the results in the same order as the original data:

map(f, [1,2,3,4]) => [f(1), f(2), f(3), f(4)]
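To check the sequential behaviour directly rather than through the task manager, here is a minimal sketch. The names `traced_square` and `calls` are made up for illustration. It records the argument and thread id of every call, which should show that the built-in map does nothing until it is consumed and then calls the function in input order on the calling thread.

```python
import threading

calls = []  # (argument, thread id) for every invocation


def traced_square(x):
    # Record which thread handled which argument, then do the work.
    calls.append((x, threading.get_ident()))
    return x * x


lazy = map(traced_square, [1, 2, 3, 4])
calls_before = list(calls)   # snapshot taken before the iterator is consumed

results = list(lazy)         # consuming the iterator triggers the calls
call_order = [x for x, _ in calls]
thread_ids = {tid for _, tid in calls}

print("calls before consuming:", calls_before)
print("results:", results)
print("call order:", call_order)
print("only the calling thread:", thread_ids == {threading.get_ident()})
```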

Making it parallel would introduce the need for synchronisation, which would defeat the purpose of parallelism.

multiprocessing.Pool.map is a parallel version of the built-in map that will split the workload into chunks and correctly organise the results.
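A minimal sketch of that, assuming a small workload and two worker processes. The helper names `square` and `parallel_squares` are made up for illustration; `chunksize` is the optional third argument of `Pool.map`.

```python
from multiprocessing import Pool


def square(x):
    # Must be defined at module level so worker processes can import it.
    return x * x


def parallel_squares(n, workers=2, chunksize=None):
    # Pool.map blocks until every chunk is done and returns the results
    # in the same order as the input, like the built-in map.
    with Pool(processes=workers) as pool:
        return pool.map(square, range(1, n), chunksize)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this part on import.
    print(parallel_squares(10))
```

For a function as cheap as squaring a number, the cost of starting processes and pickling arguments and results may well outweigh any gain, so it is worth timing both versions on the real workload.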


4 Comments

Your last sentence is slightly misleading. map can be run in parallel without making it useless. See this question and answers.
@ninesalt, well, yeah, but it still has to stitch the individual results together somehow, and that's additional work. It also can't operate on infinite iterables because it converts the iterable to a list, which is also additional work.
I'm pretty sure this is not the reason that map() does not run in parallel in CPython. A parallel version of map() could easily reserve space for the results up front and insert the results as they become ready. I think the reason is that CPython does not support running interpreted code in parallel. Multithreading in CPython still serializes code execution (with the GIL).
@RogerDahl, yeah, two years later, I'm not a fan of this answer either. That the results have to be in the same order doesn't mean that the mapping can't be parallel: for example, the main operations in Apache Spark are map and reduce, so parallel map is most definitely a thing - it's just not the built-in one
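The idea from the comments of reserving space up front and filling in results as they become ready can be sketched with threads as below. `threaded_map` and `slow_square` are hypothetical names, and the `sleep` only simulates work that finishes out of order. The input has to be turned into a list first, so this does not work on infinite iterables, and on a CPython build with the GIL the threads would not speed up CPU-bound Python code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    # Larger inputs sleep less, so completion order differs from input order.
    time.sleep((5 - x) * 0.01)
    return x * x


def threaded_map(func, items, workers=4):
    items = list(items)             # needs a finite input
    results = [None] * len(items)   # reserve space for the results up front

    def run(index):
        # Each task writes only to its own slot, so no lock is needed here.
        results[index] = func(items[index])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so exceptions from workers are raised.
        list(pool.map(run, range(len(items))))
    return results


print(threaded_map(slow_square, [1, 2, 3, 4]))
```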
