I have a 3D numpy array data where dimensions a and b represent the resolution of an image and c is the image/frame number. I want to call np.histogram on each pixel (a and b combination) across the c dimension, with an output array of dimension (a, b, BINS). I've accomplished this task with a nested loop, but how can I vectorize this operation?

hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

I am confident that the solution is trivial, nonetheless all help is appreciated :)

2 Answers

np.histogram computes over the flattened array. However, you could use np.apply_along_axis.

np.apply_along_axis(lambda a: np.histogram(a, bins=BINS)[0], 2, data)
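
As a quick sanity check, the sketch below compares this one-liner with the nested loop from the question on a small random array. The shape, the seed, and the BINS value are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sizes only; they are not taken from the question.
a, b, c, BINS = 4, 5, 50, 8
rng = np.random.default_rng(0)
data = rng.random((a, b, c))

# Reference result: the nested loop from the question.
loop_hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        loop_hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

# apply_along_axis calls the function once per 1-D slice along axis 2.
aaa_hists = np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)

print(aaa_hists.shape)                        # (4, 5, 8)
print(np.array_equal(loop_hists, aaa_hists))  # True
```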

1 Comment

Note that while np.apply_along_axis is simple to use, it is generally not much faster than explicit loops, because the pure-Python lambda is still called once per slice and cannot be truly vectorized internally. In fact, it is slightly slower on my machine.
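
A rough way to check this on your own machine is a timeit comparison like the one below. The array shape, BINS, and the helper names with_loops and with_apply_along_axis are assumptions made up for this sketch, and the timings will vary with hardware and NumPy version.

```python
import timeit
import numpy as np

BINS = 8
rng = np.random.default_rng(0)
data = rng.random((20, 20, 100))  # small, arbitrary test array

def with_loops(data):
    # Nested loop over pixels, as in the question.
    a, b, _ = data.shape
    hists = np.zeros((a, b, BINS))
    for row in range(a):
        for column in range(b):
            hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]
    return hists

def with_apply_along_axis(data):
    # Same work, but the per-pixel Python call is hidden inside NumPy.
    return np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)

t_loop = timeit.timeit(lambda: with_loops(data), number=3)
t_apply = timeit.timeit(lambda: with_apply_along_axis(data), number=3)
print(f"nested loops:     {t_loop:.3f} s")
print(f"apply_along_axis: {t_apply:.3f} s")
```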

This is an interesting problem.

Make a Minimal Working Example (MWE)

This should be a standard habit when asking questions on SO.

import numpy as np

a, b, c = 2, 3, 4
data = np.random.randint(10, size=(a, b, c))
# For brevity, this example uses as many bins as frames (bins=c).
hists = np.zeros((a, b, c), dtype=int)
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=c)[0]

data
>>> array([[[6, 4, 3, 3],
            [7, 3, 8, 0],
            [1, 5, 8, 0]],

           [[5, 5, 7, 8],
            [3, 2, 7, 8],
            [6, 8, 8, 0]]])
hists
>>> array([[[2, 1, 0, 1],
            [1, 1, 0, 2],
            [2, 0, 1, 1]],

           [[2, 0, 1, 1],
            [2, 0, 0, 2],
            [1, 0, 0, 3]]])

Make it as simple as possible (but still working)

You can eliminate one loop and simplify it:

new_data = data.reshape(a*b, c)
new_hists = np.zeros((a*b, c), dtype=int)

for row in range(a*b):
    new_hists[row, :] = np.histogram(new_data[row, :], bins=c)[0]

new_hists
>>> array([[2, 1, 0, 1],
           [1, 1, 0, 2],
           [2, 0, 1, 1],
           [2, 0, 1, 1],
           [2, 0, 0, 2],
           [1, 0, 0, 3]])

new_data
>>> array([[6, 4, 3, 3],
           [7, 3, 8, 0],
           [1, 5, 8, 0],
           [5, 5, 7, 8],
           [3, 2, 7, 8],
           [6, 8, 8, 0]])
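
The flattened result can be put back into the original layout with one more reshape, because reshape keeps row-major order: row i*b + j of new_data is pixel (i, j). A small check, reusing the sample values printed above:

```python
import numpy as np

a, b, c = 2, 3, 4
# The sample array from the MWE above, typed in by hand.
data = np.array([[[6, 4, 3, 3], [7, 3, 8, 0], [1, 5, 8, 0]],
                 [[5, 5, 7, 8], [3, 2, 7, 8], [6, 8, 8, 0]]])

new_data = data.reshape(a*b, c)
new_hists = np.zeros((a*b, c), dtype=int)
for row in range(a*b):
    new_hists[row, :] = np.histogram(new_data[row, :], bins=c)[0]

# Row i*b + j of the 2-D arrays corresponds to pixel (i, j).
hists = new_hists.reshape(a, b, c)
print(np.array_equal(new_data[1*b + 2], data[1, 2]))  # True
print(hists[1, 2])                                     # [1 0 0 3]
```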

Can you find similar problems and use the key points of their solutions?

In general, you can't vectorise something that is done row by row in a loop like this:

for row in array:
    some_operation(row)

The exception is when you can call another vectorised operation on the flattened array and then reshape the result back to the initial shape:

arr = array.ravel()
res = another_operation(arr)
out = res.reshape(array.shape)
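
A concrete instance of this pattern, given as an illustration rather than as part of the solution below: np.bincount only accepts 1-D input, but per-row counts can still be obtained by shifting each row into its own range of integers, counting once on the flattened array, and reshaping. The array values and the number of distinct values k are assumptions for the example.

```python
import numpy as np

# Hypothetical input: 3 rows of non-negative integers in [0, k).
k = 4
array = np.array([[0, 1, 1, 3],
                  [2, 2, 2, 0],
                  [3, 0, 3, 1]])
n_rows = array.shape[0]

# Shift row r by r*k so that the rows occupy disjoint integer ranges.
shifted = array + k * np.arange(n_rows)[:, None]

# One vectorised call on the flattened array, then back to (n_rows, k).
counts = np.bincount(shifted.ravel(), minlength=n_rows * k).reshape(n_rows, k)
print(counts)
# [[1 2 0 1]
#  [1 0 3 0]
#  [1 1 0 2]]
```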

It looks like you're in luck with np.histogram, because I'm pretty sure similar things have been done before.

Final solution

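new_data = data.reshape(a*b, c)
# Per-pixel minimum and maximum define each histogram's range.
m, M = new_data.min(axis=1), new_data.max(axis=1)
# Integer bin index of every value; each row's maximum gets index c.
bins = (c * (new_data - m[:, None]) // (M-m)[:, None])
out = np.zeros((a*b, c+1), dtype=int)
# Pair each value's row number with its bin index and count the pairs.
advanced_indexing = np.repeat(np.arange(a*b), c), bins.ravel()
np.add.at(out, advanced_indexing, 1)
out.reshape((a, b, -1))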
>>> array([[[2, 1, 0, 0, 1],
            [1, 1, 0, 1, 1],
            [2, 0, 1, 0, 1]],

           [[2, 0, 1, 0, 1],
            [2, 0, 0, 1, 1],
            [1, 0, 0, 1, 2]]])

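Note that this adds an extra bin to each histogram and puts the maximum values in it, but that should not be hard to fix if you need the exact np.histogram output. It also assumes M > m for every pixel, since a constant pixel would cause a division by zero.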
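
One possible way to remove the extra bin, sketched under the assumption that the data are integers as in the MWE: clip the bin index so that each row's maximum falls into the last regular bin, which is also where np.histogram counts it. The names n_bins, span, and rows are introduced here for illustration. The guard for constant pixels (M == m) is an addition not present in the code above; it puts such pixels in bin 0, whereas np.histogram widens a zero-width range, so the two can differ there.

```python
import numpy as np

a, b, c = 2, 3, 4
n_bins = c  # the MWE happens to use as many bins as frames
# The sample array from the MWE above, typed in by hand.
data = np.array([[[6, 4, 3, 3], [7, 3, 8, 0], [1, 5, 8, 0]],
                 [[5, 5, 7, 8], [3, 2, 7, 8], [6, 8, 8, 0]]])

new_data = data.reshape(a*b, c)
m, M = new_data.min(axis=1), new_data.max(axis=1)

# Avoid division by zero for constant pixels (assumption: bin 0 is acceptable).
span = np.where(M > m, M - m, 1)
bins = n_bins * (new_data - m[:, None]) // span[:, None]

# Fold each row's maximum (index n_bins) into the last regular bin.
bins = np.minimum(bins, n_bins - 1)

out = np.zeros((a*b, n_bins), dtype=int)
rows = np.repeat(np.arange(a*b), c)   # row number of every flattened value
np.add.at(out, (rows, bins.ravel()), 1)
out = out.reshape(a, b, n_bins)
print(out)
```

For this sample the result has shape (2, 3, 4) and matches the hists array produced by the loop in the MWE. For floating-point data the integer floor division may disagree with np.histogram for values very close to a bin edge, so treat it as a starting point only.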
