
I have to create and fill a huge array (e.g. 96 GB, 72000 rows * 72000 columns) with floats, each computed from a mathematical formula. The array will be processed afterwards.

import itertools
import operator
import time
from multiprocessing import Pool


def f2(x):
    # stand-in for more complex formulas that change according to values in *i* and *x*
    temp = []
    for i in combine:
        temp.append(0.2 * x[1] * i[1] / 64.23)
    return temp


def combinations_with_replacement_counts(n, r):
    # yield all ways of distributing r balls among n boxes
    size = n + r - 1
    for indices in itertools.combinations(range(size), n - 1):
        starts = [0] + [index + 1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))


combine = list(combinations_with_replacement_counts(3, 60))  # 60 here, but 350 is needed

if __name__ == '__main__':
    print(len(combine))
    t1 = time.time()
    pool = Pool()              # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)
  • What's the fastest way to create and fill such a huge numpy array? Fill lists, then aggregate them and convert the result into a numpy array?
  • Can the computation be parallelized, given that the cells/columns/rows of the 2D array are independent, to speed up the filling? Any clues/trails for optimizing such a computation with multiprocessing?
  • Does it need to be real-time or can you calculate it off-line and use e.g. pickle to read it? Commented Apr 22, 2013 at 16:25
  • I'd prefer real-time, but if pickling is faster I don't mind... I hope I understood your question correctly. Commented Apr 22, 2013 at 16:31
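As a side note on the off-line suggestion above: for numpy arrays, numpy's own save/load functions are generally a better fit than pickle, and loading with mmap_mode avoids pulling the whole file into RAM at once. A minimal sketch (the file name and the small stand-in array are illustrative, not from the question):

```python
import numpy as np

# Hypothetical small array standing in for the full 72000 x 72000 result
arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# Save once, off-line
np.save("big_array.npy", arr)

# Reload later; mmap_mode="r" maps the file instead of reading it all into memory
loaded = np.load("big_array.npy", mmap_mode="r")
print(loaded[0, :3])
```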

2 Answers


I know that you can create shared numpy arrays that can be modified from different processes (as long as the changed areas don't overlap). Here is a sketch of the code you can use to do that (I saw the original idea somewhere on Stack Overflow; edit: here it is https://stackoverflow.com/a/5550156/1269140 ):

import ctypes
import multiprocessing as mp
import numpy as np

def shared_zeros(n1, n2):
    # create a 2D numpy array that can be modified from different processes
    shared_array_base = mp.Array(ctypes.c_double, n1 * n2)
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    return shared_array.reshape(n1, n2)

class singleton:
    arr = None

def dosomething(i):
    # write to the shared array; each task touches a distinct row
    singleton.arr[i, :] = i
    return i

def main():
    singleton.arr = shared_zeros(1000, 1000)
    pool = mp.Pool(16)
    pool.map(dosomething, range(1000))

if __name__ == '__main__':
    main()

2 Comments

does it work? I don't understand the interest/trick of the singleton class. I get TypeError: 'NoneType' object does not support item assignment. I tried modifications with no results. Could you help me a bit further, please?
My code does work on Linux (verified). If you're on Windows, then I'm afraid you have to do things differently (because the singleton.arr value won't be inherited by the pool's worker processes).

You can create an empty numpy.memmap array with the desired shape, and then use multiprocessing.Pool to populate its values. Done correctly, this also keeps the memory footprint of each process in the pool relatively small.

1 Comment

See stackoverflow.com/questions/9964809/… , so I don't think this works
