I have a 2D numpy array of 6 x 6 elements. I want to create another 2D array from it, where each element is the average of all elements within a blocksize x blocksize window. Currently, I have the following code:

import numpy

def avg_func(data, blocksize=2):
    # Splits data into blocksize x blocksize blocks and averages
    # the positive entries of each block
    dimensions = data.shape

    # Number of complete blocks along each axis
    height = dimensions[0] // blocksize
    width = dimensions[1] // blocksize
    averaged = numpy.zeros((height, width))

    for i in range(height):
        print(i * 1.0 / height)  # progress indicator
        for j in range(width):
            block = data[i*blocksize:(i+1)*blocksize, j*blocksize:(j+1)*blocksize]
            if block.any():
                averaged[i, j] = numpy.average(block[block > 0])

    return averaged

arr = numpy.random.random((6, 6))
avgd = avg_func(arr, 3)

Is there any way I can make it more pythonic? Perhaps numpy has something which does it already?

UPDATE

Based on M. Massias's solution below, here is an update with the fixed values replaced by variables. I am not sure it is coded correctly, but it does seem to work:

dimensions = data.shape 
height = int(numpy.floor(dimensions[0]/block_size)) 
width = int(numpy.floor(dimensions[1]/block_size)) 

t = data.reshape([height, block_size, width, block_size]) 
avrgd = numpy.mean(t, axis=(1, 3))
  • Could you supply an input with the expected output? Your code is not very self-explanatory ^^ Commented Aug 31, 2015 at 20:05
  • Modified the code to include sample input and expected output. Commented Aug 31, 2015 at 20:12

1 Answer

To compute some operation slice by slice in numpy, it is very often useful to reshape your array and use extra axes.

To explain the process we'll use here: you can reshape your array, take the mean over the new last axis, reshape it again and take the mean again. Here I assume blocksize is 2:

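import numpy as np

# 6 x 6 input: every row is [0, 1, 2, 3, 4, 5]
t = np.array([[0, 1, 2, 3, 4, 5]] * 6)
t = t.reshape([6, 3, 2])   # split each row into 3 pairs of adjacent columns
t = np.mean(t, axis=2)     # average each pair, giving shape (6, 3)
t = t.reshape([3, 2, 3])   # group the 6 rows into 3 pairs
np.mean(t, axis=1)         # average each pair of rows, giving shape (3, 3)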

outputs

array([[ 0.5,  2.5,  4.5],
       [ 0.5,  2.5,  4.5],
       [ 0.5,  2.5,  4.5]])

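Now that it's clear how this works, you can do it in a single pass, starting again from the original 6 x 6 array:

t = np.array([[0, 1, 2, 3, 4, 5]] * 6)  # the original array, before any reshaping
t = t.reshape([3, 2, 3, 2])             # (row blocks, rows per block, column blocks, columns per block)
np.mean(t, axis=(1, 3))                 # average over both within-block axes at once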

This works too, and should be quicker since the mean is computed only once (I guess). I'll let you replace the constants with height // blocksize, width // blocksize and blocksize accordingly, where height and width are the dimensions of the original array.
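For reference, here is a minimal sketch of that substitution. The function names block_mean and block_mean_positive are only illustrative. Two choices in it are assumptions rather than anything stated above: trailing rows and columns that do not fill a complete block are dropped, and in the positive-only variant (which mirrors the filter in the question's loop) a block without any positive entry is left at 0.

```python
import numpy as np

def block_mean(data, blocksize=2):
    """Average non-overlapping blocksize x blocksize blocks of a 2D array."""
    height = data.shape[0] // blocksize
    width = data.shape[1] // blocksize
    # Assumption: drop trailing rows/columns that do not fill a whole block
    trimmed = data[:height * blocksize, :width * blocksize]
    blocks = trimmed.reshape(height, blocksize, width, blocksize)
    return blocks.mean(axis=(1, 3))

def block_mean_positive(data, blocksize=2):
    """Same as block_mean, but average only the strictly positive entries."""
    height = data.shape[0] // blocksize
    width = data.shape[1] // blocksize
    trimmed = data[:height * blocksize, :width * blocksize]
    blocks = trimmed.reshape(height, blocksize, width, blocksize)
    positive = blocks > 0
    # Sum and count the positive entries of each block
    sums = np.where(positive, blocks, 0).sum(axis=(1, 3))
    counts = positive.sum(axis=(1, 3))
    # Assumption: blocks with no positive entry keep the value 0
    out = np.zeros((height, width))
    np.divide(sums, counts, out=out, where=counts > 0)
    return out
```

Calling block_mean(arr, 3) on a 6 x 6 array returns a 2 x 2 array of block averages.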

See @askewchan's nice remark in the comments on how to generalize this to any dimension.
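A possible way to write that generalization out is sketched below. It follows the same idea (interleave the block counts and block sizes in the new shape, then average over the odd axes), but the function name block_mean_nd and the divisibility check are my own assumptions, not part of the original remark.

```python
import numpy as np

def block_mean_nd(a, blocksize):
    """Average non-overlapping blocks of an N-dimensional array.

    blocksize is either one int (used for every axis) or one int per axis.
    """
    a = np.asarray(a)
    block = np.broadcast_to(blocksize, (a.ndim,))
    # Assumption: every axis length is an exact multiple of its block size
    if any(n % b for n, b in zip(a.shape, block)):
        raise ValueError("each axis length must be a multiple of its block size")
    # Interleave (number of blocks, block size) for each axis
    shape = []
    for n, b in zip(a.shape, block):
        shape.extend([n // int(b), int(b)])
    # The block-size axes sit at the odd positions 1, 3, 5, ...
    axes = tuple(range(1, 2 * a.ndim, 2))
    return a.reshape(shape).mean(axis=axes)
```

For a 2D array and a single integer block size, this reduces to the reshape([3, 2, 3, 2]) and mean(axis=(1, 3)) pattern used above.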

6 Comments

To generalize beyond 2D (and to let blocksize be different along each axis), use a.reshape(shape).mean(axes) where shape = itertools.chain(*np.broadcast(a.shape/blocksize, blocksize)) and axes = tuple(range(1, 2*a.ndim, 2))
Thanks, @askewchan! I am getting the following error: TypeError: unsupported operand type(s) for /: 'tuple' and 'int', for the line: shape = itertools.chain(*numpy.broadcast(data.shape/block_size, block_size)).
Oh, sorry, that assumes blocksize to be a numpy array.
Hi M. Massias! Thanks for the solution. I am having a tough time replacing the constants in your solution by the variables. Would this work: dimensions = data.shape height = int(numpy.floor(dimensions[0]/block_size)) width = int(numpy.floor(dimensions[1]/block_size)) t = data.reshape([height, block_size, width, block_size]) avrgd = numpy.mean(t, axis=(1, 3))
@user308827 This code works fine for me. What's going wrong, do you get unexpected values, error messages? You can use // for integer division instead of calling np.floor.