
I have a very large NumPy array of shape (256, 256, 256, 256) which takes up about 8 GB of RAM.

I want to process the data quickly using multiprocessing, similar to the method used by tile-based rendering software such as Blender.

I would like to split my data into chunks of smaller size, create a list of processes for them, and when one finishes, start working on the next. My current method uses hsplit, then loops over the resulting pieces to apply vsplit and get the chunks. Here is an example of the code for a much smaller data set (9 by 9 divided into 3 by 3 chunks):

import numpy as np

data = np.arange(81).reshape((9, 9))

print(data.shape)
print(data)
print("#"*30)
# avoid shadowing the built-in `set`
data = np.array([np.vsplit(col, 3) for col in np.hsplit(data, 3)])
print(data.shape)
print(data)
print("#"*30)

This doesn't quite do what I want, which is to create 9 chunks of 3 by 3, but that's a minor issue. The major issue is that I don't think looping through an array like that to apply vsplit is very efficient, but I'm not aware of a built-in function that does this automatically without looping.

I tried using map to see if it automatically applies vsplit in a more efficient way, but the timing results were very similar, so I don't think that's the case. Any suggestions on how to optimize this problem?
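For reference, the blocked layout described above can be built without any Python-level looping using a reshape + swapaxes view (a sketch on the 9×9 demo; `blocks` here is a view into the original array, not a copy):

```python
import numpy as np

data = np.arange(81).reshape(9, 9)

# reshape gives axes (row_block, row_in_block, col_block, col_in_block);
# swapping the middle axes makes blocks[i, j] the 3x3 chunk at
# block-row i, block-column j -- still a view, no data copied.
blocks = data.reshape(3, 3, 3, 3).swapaxes(1, 2)

print(blocks.shape)   # (3, 3, 3, 3): a 3x3 grid of 3x3 chunks
print(blocks[0, 0])   # top-left chunk: rows 0-2, cols 0-2
```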

  • You may want to have a look at numexpr which AFAIK implements something very much like your strategy. Commented Nov 15, 2019 at 3:30
  • @Paul Panzer I still can't see how it helps with dividing the array into smaller chunks, which is where most of my performance increase comes from. Being able to fully utilize all CPU cores is a lot more important than a small improvement to the processing algorithm itself. Commented Nov 15, 2019 at 4:05
  • "NumExpr parses expressions into its own op-codes that are then used by an integrated computing virtual machine. The array operands are split into small chunks that easily fit in the cache of the CPU and passed to the virtual machine. The virtual machine then applies the operations on each chunk. It's worth noting that all temporaries and constants in the expression are also chunked. Chunks are distributed among the available cores of the CPU, resulting in highly parallelized code execution." That not similar to what you are trying? Commented Nov 15, 2019 at 4:48
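The chunked, multi-core evaluation that comment describes looks like this in practice (a minimal sketch; requires the numexpr package):

```python
import numpy as np
import numexpr as ne

a = np.arange(1e6)

# NumExpr splits `a` into cache-sized chunks and evaluates the expression
# on each chunk across the available CPU cores.
result = ne.evaluate("2 * a + 1")
```

Note that NumExpr parallelizes elementwise expressions for you, but it does not expose the chunks themselves, so it does not replace explicit blocking if your per-chunk work is more than an arithmetic expression.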

2 Answers


You can use scikit-image's skimage.util.view_as_blocks(), which solves your blocking problem without moving or copying any of the data.

skimage.util.view_as_blocks(data, (3, 3))
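On the 9×9 example from the question, this gives a 3×3 grid of 3×3 blocks (a sketch; requires scikit-image installed):

```python
import numpy as np
import skimage.util

data = np.arange(81).reshape(9, 9)

# Each blocks[i, j] is a non-copying 3x3 view into data.
blocks = skimage.util.view_as_blocks(data, (3, 3))

print(blocks.shape)   # (3, 3, 3, 3)
print(blocks[0, 0])   # top-left block: rows 0-2, cols 0-2
```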



You want something as_strided-based. I'd recommend my recipe here. Using it, you can do:

patches = window_nd(data, window = 3, steps = 3)

To pass to multiprocessing, you'll need a generator. Something like:

def patch_gen(data, window, **kwargs):
    dims = len(data.shape)
    patches = window_nd(data, window, **kwargs)
    # Collapse the leading block-grid axes so the patches
    # can be yielded one at a time.
    patches = patches.reshape((-1,) + patches.shape[-dims:])
    for i in range(patches.shape[0]):
        yield patches[i]

You can also use view_as_blocks from @NilsWerner's answer:

import skimage.util

def patch_gen(data, window_shape):
    dims = len(data.shape)
    patches = skimage.util.view_as_blocks(data, window_shape)
    patches = patches.reshape((-1,) + patches.shape[-dims:])
    for i in range(patches.shape[0]):
        yield patches[i]

Then you can do something like:

from multiprocessing import Pool

with Pool(processors) as p:
    out = p.map(func, patch_gen(data, window = 3, steps = 3))

This method should only use the original memory of the array and not make a copy at any point (I think). That way you don't waste time and memory making copies of the original data; all the patches are just views into the original data.

Caution: don't write to the patches! At least, in general it's a bad idea to write to as_strided results. If you don't have overlapping windows you might be ok, but as soon as your windows overlap (i.e. steps < window) you'll have a mess on your hands.
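For the non-overlapping case, a quick check that writes through a block view do reach the original array (shown with the plain reshape/swapaxes view; view_as_blocks behaves the same way since its blocks never overlap):

```python
import numpy as np

data = np.zeros((9, 9), dtype=int)

# Non-overlapping 3x3 block view of data.
blocks = data.reshape(3, 3, 3, 3).swapaxes(1, 2)
blocks[1, 1] = 7            # write the centre block through the view

print(data[3:6, 3:6])       # all 7s: the write landed in the original
print(data.sum())           # 63 == 7 * 9
```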

8 Comments

You can also output a list of windows from window_nd using outlist = True, but I'm thinking of changing that to a generator anyway since it can end up throwing MemoryErrors for larger use cases.
Thanks, this is exactly what I was looking for. However, I'm not sure how to stack the data back to the original shape without copying it. The final output would be a list of all the processed chunks. Here is an example I wrote: repl.it/repls/PlaintiveJointDrawing
@OM222O are you looking to change the data in-place or create a new output?
If you're absolutely sure you're always going to have non-intersecting blocks, you should be able to write back to the resulting patches in-place. Problems will happen if your patches overlap (you'll get race conditions and some other weird things I'm not enough of a CS guy to untangle). Use with caution!
Or you could use @NilsWerner 's answer below and build the generator for your p.map() as I did. view_as_blocks has more built-in guard rails than my recipe and is less likely to trip you up later.
