Python 3 does inbuilt map function using parallel [duplicate]

Question

I wonder if the built in map function splits a list in x-chunks to apply the give function in parallel (Threads)?

The docu doesnt say anything about it but I would wonder why it is not implemented like this.

    def map_func(x):
    '''

   :param x: 
   :return: 2x
   >>> map_func(4)
   4
   '''
    return x * x


new_list = list(map(map_func, range(1, 2 ** 25)))
print(new_list)

From the task manager i cannot clearly see if its done by one thread or more.

Can someone explain please if its sequential and if so, why?

If your question is "why map doesn't process in parallel", well, that's either too broad or opinion based. If you want multiprocessing, import the module. — cs95
– cs95, Commented Feb 2, 2019 at 20:35

ForceBru · Accepted Answer · 2020-12-30 08:03:05Z

1

It's sequential because map the higher-order function in general has to apply a function to data and return the results in the same order as the original data:

map(f, [1,2,3,4]) => [f(1), f(2), f(3), f(4)]

~~Making it parallel will introduce the need of synchronisation, which'll defeat the purpose of parallelism.~~

multiprocessing.Pool.map is a parallel version of the built-in map that will split the workload into chunks and correctly organise the results.

edited Dec 30, 2020 at 8:03

answered Feb 2, 2019 at 20:24

ForceBru

45k10 gold badges71 silver badges104 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

ninesalt Over a year ago

Your last sentence is slightly misleading. map can be run in parallel without making it useless. See this question and answers.

ForceBru Over a year ago

@ninesalt, well, yeah, but it still has to stitch the individual results together somehow, and that's additional work. It also can't operate on infinite iterables because it converts the iterable to list, which is also additional work

Roger Dahl Over a year ago

I'm pretty sure this is not the reason that map() does not run in parallel in cpython. A parallel version of map() could easily reserve space for the results up front and insert the results as they became ready. I think the reason is that cpython does not support running interpreted code in parallel. Multithreading in cpython still serializes code execution (with the GIL).

ForceBru Over a year ago

@RogerDahl, yeah, two years later, I'm not a fan of this answer either. That the results have to be in the same order doesn't mean that the mapping can't be parallel: for example, the main operations in Apache Spark are map and reduce, so parallel map is most definitely a thing - it's just not the built-in one

Collectives™ on Stack Overflow

Python 3 does inbuilt map function using parallel [duplicate]

1 Answer 1

4 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Linked

Related