
I have to create and fill a huge array (e.g. 96 GB, 72000 rows * 72000 columns) with floats, each computed from a mathematical formula. The array will be processed afterwards.

import itertools
import operator
import time
from multiprocessing import Pool


def f2(x):
    # stand-in for more complex formulas that change according to values in *i* and *x*
    temp = []
    for i in combine:
        temp.append(0.2 * x[1] * i[1] / 64.23)
    return temp


def combinations_with_replacement_counts(n, r):
    # yield all ways of distributing r balls among n boxes
    size = n + r - 1
    for indices in itertools.combinations(range(size), n - 1):
        starts = [0] + [index + 1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))


combine = list(combinations_with_replacement_counts(3, 60))  # 60 here, but 350 is needed

if __name__ == '__main__':
    print(len(combine))
    t1 = time.time()
    pool = Pool()              # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)
  • What's the fastest way to create and fill such a huge numpy array? Fill lists, then aggregate them and convert the result into a numpy array?
  • Can the computation be parallelized, given that the cells/columns/rows of the 2D array are independent, to speed up the filling? Any clues/trails for optimizing such a computation with multiprocessing?
  • Does it need to be real-time or can you calculate it off-line and use e.g. pickle to read it? Commented Apr 22, 2013 at 16:25
  • I'd prefer real-time, but if pickling is faster I don't mind... I hope I understood your question correctly. Commented Apr 22, 2013 at 16:31
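As a side note on the off-line suggestion above: for numpy arrays, numpy's own save/load functions are generally a better fit than pickle, and loading with mmap_mode avoids pulling the whole file into RAM at once. A minimal sketch (the file name and the small stand-in array are illustrative, not from the question):

```python
import numpy as np

# Hypothetical small array standing in for the full 72000 x 72000 result
arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# Save once, off-line
np.save("big_array.npy", arr)

# Reload later; mmap_mode="r" maps the file instead of reading it all into memory
loaded = np.load("big_array.npy", mmap_mode="r")
print(loaded[0, :3])
```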

2 Answers


I know that you can create shared numpy arrays that can be modified from different processes (as long as the changed areas don't overlap). Here is a sketch of the code you can use to do that (I saw the original idea somewhere on Stack Overflow; edit: here it is https://stackoverflow.com/a/5550156/1269140 ):

import ctypes
import multiprocessing as mp
import numpy as np

def shared_zeros(n1, n2):
    # create a 2D numpy array that can be modified from different processes
    shared_array_base = mp.Array(ctypes.c_double, n1 * n2)
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    return shared_array.reshape(n1, n2)

class singleton:
    arr = None

def dosomething(i):
    # write to the shared array; each task touches a distinct row
    singleton.arr[i, :] = i
    return i

def main():
    singleton.arr = shared_zeros(1000, 1000)
    pool = mp.Pool(16)
    pool.map(dosomething, range(1000))

if __name__ == '__main__':
    main()

2 Comments

does it work? I don't understand the interest/trick of the singleton class. I get TypeError: 'NoneType' object does not support item assignment. I tried modifications with no results. Could you help me a bit further, please?
My code does work on Linux (verified). If you're on Windows, then I'm afraid you have to do things differently (because the singleton.arr value won't be inherited by the pool's worker processes).

You can create an empty numpy.memmap array with the desired shape, and then use multiprocessing.Pool to populate its values. Done correctly, this also keeps the memory footprint of each process in the pool relatively small.

1 Comment

See stackoverflow.com/questions/9964809/… , so I don't think this works
