What I am trying to do is find out which multiprocessing approach works best for my data. I tried to parallelize this loop:

import math
import numpy as np

def __pure_calc(args):
    # Unpack the single argument that the pool passes to each call
    j, point_array, empty, tree = args

    for i in j:
        # tree.query returns (distance, index) of the nearest neighbour
        p = tree.query(i)

        euc_dist = math.sqrt(np.sum((point_array[p[1]] - i) ** 2))

        # add one row at a time to the list
        empty.append([i[0], i[1], i[2], euc_dist, point_array[p[1]][0], point_array[p[1]][1], point_array[p[1]][2]])

    return empty

The plain function on its own takes 6.52 sec.

My first approach was Pool.map from multiprocessing:

from multiprocessing import Pool 

def __multiprocess(las_point_array, point_array, empty, tree):

    pool = Pool(os.cpu_count()) 

    for j in las_point_array:
        args=[j, point_array, empty, tree]
        results = pool.map(__pure_calc, args)

    #close the pool and wait for the work to finish 
    pool.close() 
    pool.join() 

    return results

From other answers on how to parallelize a function, it should be as easy as map(function, inputs) and done. But for some reason my multiprocessing code does not accept my inputs and raises an error saying that the scipy.spatial.ckdtree.cKDTree object is not subscriptable.

So I tried apply_async:

from multiprocessing.pool import ThreadPool

def __multiprocess(arSegment, wires_point_array, ptList, tree):

    pool = ThreadPool(os.cpu_count())

    args=[arSegment, wires_point_array, ptList, tree]

    result = pool.apply_async(__pure_calc, [args])

    results = result.get()

It runs without problems. For my test data I managed to calculate it in 6.42 sec.

Why does apply_async accept the cKDTree without any problem while pool.map does not? What do I need to change to make it run?

1 Answer

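pool.map(function, iterable) has essentially the same signature as Python's built-in map: each item of the iterable is passed as the single argument to one call of __pure_calc. In your code the iterable is args itself, so __pure_calc is called once with j, once with point_array, once with empty and once with tree. The call that receives tree then fails on args[0], because a cKDTree is not subscriptable. apply_async(__pure_calc, [args]) works because it passes the whole args list as one argument to a single call. That also means it runs as a single task, which would explain why the timing barely changes.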
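To make the difference concrete, here is a minimal sketch. It assumes a hypothetical describe function that only reports the type of what it received, and it uses placeholder values instead of the real arrays and tree. ThreadPool is used so the snippet runs without a __main__ guard; as far as map and apply_async are concerned, Pool behaves the same way.

```python
from multiprocessing.pool import ThreadPool

def describe(arg):
    # Hypothetical helper: report what a single call actually received.
    return type(arg).__name__

# Placeholder stand-ins for [j, point_array, empty, tree].
args = [[1.0, 2.0], "point_array", [], object()]

with ThreadPool(2) as pool:
    # map: one call per element of args, so four separate calls.
    per_element = pool.map(describe, args)
    # apply_async: [args] is the argument list, so a single call gets all of args.
    whole_list = pool.apply_async(describe, [args]).get()

print(per_element)
print(whole_list)
```

With map, the fourth call receives only the last element (the tree in the question), which is where indexing it with args[0] would fail.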

In this case, you could build one argument list per element of las_point_array and pass the list of those to pool.map:

def __multiprocess(las_point_array, point_array, empty, tree):

    pool = Pool(os.cpu_count()) 

    args_list = [
        [j, point_array, empty, tree]
        for j in las_point_array
    ]

    results = pool.map(__pure_calc, args_list)

    #close the pool and wait for the work to finish 
    pool.close() 
    pool.join() 

    return results
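Note that pool.map returns one result per input element, in order, so results here is a list of lists. With a process-based Pool the arguments are also pickled and sent to the workers, so appends to empty inside a worker do not change the list in the parent process. A minimal sketch of a variant where each task returns its own rows and the parent flattens them follows. The names nearest and calc_chunk and the sample points are hypothetical, nearest is a brute-force stand-in for tree.query, and ThreadPool is used only so the snippet runs without a __main__ guard.

```python
import math
from itertools import chain
from multiprocessing.pool import ThreadPool

def nearest(point, candidates):
    # Brute-force stand-in for tree.query: returns (distance, index).
    dists = [math.dist(point, c) for c in candidates]
    idx = min(range(len(candidates)), key=dists.__getitem__)
    return dists[idx], idx

def calc_chunk(args):
    # Each task receives one (chunk, candidates) tuple and returns its own rows.
    chunk, candidates = args
    rows = []
    for p in chunk:
        d, idx = nearest(p, candidates)
        rows.append([*p, d, *candidates[idx]])
    return rows

# Made-up sample data: two reference points and two chunks of query points.
candidates = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chunks = [[(1.0, 0.0, 0.0)], [(9.0, 0.0, 0.0), (0.0, 2.0, 0.0)]]

args_list = [(chunk, candidates) for chunk in chunks]
with ThreadPool(2) as pool:
    per_chunk = pool.map(calc_chunk, args_list)  # one list of rows per chunk

# Flatten the per-chunk lists into a single list of rows.
rows = list(chain.from_iterable(per_chunk))
```

Returning a fresh list from each task avoids relying on a shared list, at the cost of one flattening step in the parent.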