
I have a very large NumPy array of shape (256, 256, 256, 256) which takes up about 8 GB of RAM.

I want to process the data quickly using multiprocessing, similar to the method used by tile-based rendering software such as Blender.

I would like to split my data into chunks of smaller size, create a list of processes for them, and when one finishes, start working on the next. My current method uses hsplit, then loops over the resulting pieces to apply vsplit and get the chunks. Here is an example of the code for a much smaller data set (9 by 9 divided into 3 by 3 chunks):

import numpy as np

data = np.arange(81).reshape((9, 9))

print(data.shape)
print(data)
print("#"*30)
# avoid shadowing the built-in `set`
data = np.array([np.vsplit(col, 3) for col in np.hsplit(data, 3)])
print(data.shape)
print(data)
print("#"*30)

This doesn't quite do what I want, which is to create 9 chunks of 3 by 3, but that's a minor issue. The major issue is that I don't think looping through an array like that to apply vsplit is very efficient, but I'm not aware of a built-in function that does this automatically without looping.

I tried using map to see if it automatically applies vsplit in a more efficient way, but the timing results were very similar, so I don't think that's the case. Any suggestions on how to optimize this problem?
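For reference, the blocked layout described above can be built without any Python-level looping using a reshape + swapaxes view (a sketch on the 9×9 demo; `blocks` here is a view into the original array, not a copy):

```python
import numpy as np

data = np.arange(81).reshape(9, 9)

# reshape gives axes (row_block, row_in_block, col_block, col_in_block);
# swapping the middle axes makes blocks[i, j] the 3x3 chunk at
# block-row i, block-column j -- still a view, no data copied.
blocks = data.reshape(3, 3, 3, 3).swapaxes(1, 2)

print(blocks.shape)   # (3, 3, 3, 3): a 3x3 grid of 3x3 chunks
print(blocks[0, 0])   # top-left chunk: rows 0-2, cols 0-2
```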

  • You may want to have a look at numexpr which AFAIK implements something very much like your strategy. Commented Nov 15, 2019 at 3:30
  • @Paul Panzer I still can't see how it helps with dividing the array into smaller chunks, which is where most of my performance increase comes from. Being able to fully utilize all CPU cores is a lot more important than a small improvement to the processing algorithm itself. Commented Nov 15, 2019 at 4:05
  • "NumExpr parses expressions into its own op-codes that are then used by an integrated computing virtual machine. The array operands are split into small chunks that easily fit in the cache of the CPU and passed to the virtual machine. The virtual machine then applies the operations on each chunk. It's worth noting that all temporaries and constants in the expression are also chunked. Chunks are distributed among the available cores of the CPU, resulting in highly parallelized code execution." That not similar to what you are trying? Commented Nov 15, 2019 at 4:48
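The chunked, multi-core evaluation that comment describes looks like this in practice (a minimal sketch; requires the numexpr package):

```python
import numpy as np
import numexpr as ne

a = np.arange(1e6)

# NumExpr splits `a` into cache-sized chunks and evaluates the expression
# on each chunk across the available CPU cores.
result = ne.evaluate("2 * a + 1")
```

Note that NumExpr parallelizes elementwise expressions for you, but it does not expose the chunks themselves, so it does not replace explicit blocking if your per-chunk work is more than an arithmetic expression.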

2 Answers


You can use scikit-image's skimage.util.view_as_blocks(), which solves your blocking problem without moving or copying any of the data.

skimage.util.view_as_blocks(data, (3, 3))
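On the 9×9 example from the question, this gives a 3×3 grid of 3×3 blocks (a sketch; requires scikit-image installed):

```python
import numpy as np
import skimage.util

data = np.arange(81).reshape(9, 9)

# Each blocks[i, j] is a non-copying 3x3 view into data.
blocks = skimage.util.view_as_blocks(data, (3, 3))

print(blocks.shape)   # (3, 3, 3, 3)
print(blocks[0, 0])   # top-left block: rows 0-2, cols 0-2
```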



You want something as_strided-based. I'd recommend my recipe here. Using it, you can do:

patches = window_nd(data, window = 3, steps = 3)

To pass to multiprocessing, you'll need a generator. Something like:

def patch_gen(data, window, **kwargs):
    dims = len(data.shape)
    patches = window_nd(data, window, **kwargs)
    # Collapse the leading block-grid axes so the patches
    # can be yielded one at a time.
    patches = patches.reshape((-1,) + patches.shape[-dims:])
    for i in range(patches.shape[0]):
        yield patches[i]

You can also use view_as_blocks from @NilsWerner's answer:

import skimage.util

def patch_gen(data, window_shape):
    dims = len(data.shape)
    patches = skimage.util.view_as_blocks(data, window_shape)
    patches = patches.reshape((-1,) + patches.shape[-dims:])
    for i in range(patches.shape[0]):
        yield patches[i]

Then you can do something like:

from multiprocessing import Pool

with Pool(processors) as p:
    out = p.map(func, patch_gen(data, window = 3, steps = 3))

This method should only use the original memory of the array and not make a copy at any point (I think). That way you don't waste time and memory making copies of the original data; all the patches are just views into the original data.

Caution: don't write to the patches! At least, in general it's a bad idea to write to as_strided results. If you don't have overlapping windows you might be ok, but as soon as your windows overlap (i.e. steps < window) you'll have a mess on your hands.
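For the non-overlapping case, a quick check that writes through a block view do reach the original array (shown with the plain reshape/swapaxes view; view_as_blocks behaves the same way since its blocks never overlap):

```python
import numpy as np

data = np.zeros((9, 9), dtype=int)

# Non-overlapping 3x3 block view of data.
blocks = data.reshape(3, 3, 3, 3).swapaxes(1, 2)
blocks[1, 1] = 7            # write the centre block through the view

print(data[3:6, 3:6])       # all 7s: the write landed in the original
print(data.sum())           # 63 == 7 * 9
```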

8 Comments

You can also output a list of windows from window_nd using outlist = True, but I'm thinking of changing that to a generator anyway since it can end up throwing MemoryErrors for larger use cases.
Thanks, this is exactly what I was looking for. However, I'm not sure how to stack the data back to the original shape without copying it. The final output would be a list of all the processed chunks. Here is an example I wrote: repl.it/repls/PlaintiveJointDrawing
@OM222O are you looking to change the data in-place or create a new output?
If you're absolutely sure you're always going to have non-intersecting blocks, you should be able to write back to the resulting patches in-place. Problems will happen if your patches overlap (you'll get race conditions and some other weird things I'm not enough of a CS guy to untangle). Use with caution!
Or you could use @NilsWerner 's answer below and build the generator for your p.map() as I did. view_as_blocks has more built-in guard rails than my recipe and is less likely to trip you up later.
