Count of values in numpy.ndarray

Question

Is there any way to do the following in purely numpy (or opencv)?

from collections import defaultdict
import cv2

img = cv2.imread("test.jpg")
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1

The problem is that tuple(val) can be any of 2^24 different values, so a dense array with one slot per possible value is impractical: it would be gigantic and mostly zeros. I need a more efficient data structure.

You could convert the h x w x 3 matrix into an h x w one by bit-shifting two of the colors up and summing, then use np.unique. You could probably also use tuples instead of bit-shifting and adding, but that would probably make np.unique run a lot slower...
– Hammer, Commented Nov 12, 2013 at 2:23
@Hammer Yes, that seems easy enough to convert into a 32-bit 2D matrix. But how does np.unique give me how often each color appears in the array?
– Voo, Commented Nov 12, 2013 at 2:29
See the question from earlier today here. He was looking for the maximum occurrence, but the principle is the same.
– Daniel, Commented Nov 12, 2013 at 2:32
You don't need to do anything to the data: view it as a 24-bit void dtype, then run the np.unique / np.bincount combo on it. You'll need to view the return of np.unique as 3 uint8s and reshape it to (-1, 3) to make sense of the data, but it will be much faster, as nothing is done aside from viewing the exact same memory in a different way.
– Jaime, Commented Nov 12, 2013 at 3:45
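
For reference, the bit-shifting idea from the first comment can be sketched as follows. This is only an illustration under stated assumptions: the random array stands in for the result of cv2.imread, the names px, packed, values and colors are made up for the example, and it relies on the return_counts argument of np.unique (available in NumPy 1.9 and later), which answers the "how often" part directly.

```python
import numpy as np

# Synthetic stand-in for cv2.imread("test.jpg"): an h x w x 3 uint8 array.
rng = np.random.default_rng(0)
img = rng.integers(0, 4, size=(6, 8, 3), dtype=np.uint8)

# Widen to 32 bits first so the shifts cannot overflow, then pack the
# three 8-bit channels of every pixel into a single integer.
px = img.astype(np.uint32)
packed = (px[..., 0] << 16) | (px[..., 1] << 8) | px[..., 2]

# np.unique returns the sorted distinct values; return_counts=True adds
# the number of occurrences of each one.
values, counts = np.unique(packed, return_counts=True)

# Unpack the distinct values back into one row of channels per color.
colors = np.stack(
    [(values >> 16) & 0xFF, (values >> 8) & 0xFF, values & 0xFF], axis=1
).astype(np.uint8)
```

Row i of colors then occurs counts[i] times in img. The widened copy costs four bytes per channel value, so this trades memory for simplicity.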

Jaime · Accepted Answer · 2013-11-12 14:15:31Z


The fastest way to do this, if the image is stored in "chunky" format (i.e. the color-plane dimension is the last one and that dimension is contiguous), is to take an np.void view of every 24-bit pixel, then run the result through np.unique and np.bincount:

>>> import numpy as np
>>> arr = np.random.randint(256, size=(10, 10, 3)).astype(np.uint8)
>>> dt = np.dtype((np.void, arr.shape[-1]*arr.dtype.itemsize))
>>> if arr.strides[-1] != arr.dtype.itemsize:
...     arr = np.ascontiguousarray(arr)
... 
>>> arr_view = arr.view(dt)

The contents of arr_view look like garbage:

>>> arr_view[0, 0]
array([Â], 
      dtype='|V3')

But we are not the ones who need to understand the contents:

>>> unq, inv = np.unique(arr_view, return_inverse=True)
>>> unq_cnts = np.bincount(inv.ravel())  # ravel: np.bincount needs a 1-D array
>>> unq = unq.view(arr.dtype).reshape(-1, arr.shape[-1])

And now you have the unique pixels and their counts in those two arrays:

>>> unq[:5]
array([[  0,  82,  78],
       [  6, 221, 188],
       [  9, 209,  85],
       [ 14, 210,  24],
       [ 14, 254,  88]], dtype=uint8)
>>> unq_cnts[:5]
array([1, 1, 1, 1, 1], dtype=int64)
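
A hedged, self-contained variant of the same idea is sketched below. It swaps the return_inverse / np.bincount pair for np.unique's return_counts argument (NumPy 1.9 and later), which is not what the session above uses; count_pixels and demo are hypothetical names introduced only for this example.

```python
import numpy as np

def count_pixels(arr):
    """Return (unique_pixels, counts) for an array whose last axis holds the channels."""
    # The void view needs each pixel's bytes to be adjacent in memory.
    arr = np.ascontiguousarray(arr)
    n_channels = arr.shape[-1]
    # One opaque void item per pixel, e.g. 3 bytes for a uint8 color image.
    dt = np.dtype((np.void, n_channels * arr.dtype.itemsize))
    flat = arr.reshape(-1, n_channels).view(dt).ravel()
    # Distinct pixels and their counts in a single call.
    unq, cnts = np.unique(flat, return_counts=True)
    # Reinterpret the void items as rows of the original dtype.
    return unq.view(arr.dtype).reshape(-1, n_channels), cnts

# Small synthetic image with few distinct colors, so repeats are likely.
rng = np.random.default_rng(0)
demo = rng.integers(0, 3, size=(8, 8, 3), dtype=np.uint8)
pixels, counts = count_pixels(demo)
```

Here pixels[i] is a distinct pixel and counts[i] is how many times it appears; no widened copy of the image is made, only a different view of the same bytes.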


1 Comment

Voo Over a year ago

Interestingly enough, when measuring the performance of the different approaches I get the following timings: naive ~15s, your solution ~4s, HYRY's approach ~0.4s. I'm quite surprised myself, but NumPy is apparently so much more efficient when working on naturally sized integers that the additional pass is easily amortized. The downside is that it takes more than double the memory, but that's acceptable in my case.

HYRY · Accepted Answer · 2013-11-12 03:40:20Z

Here is my solution:

1. Convert the image to a one-dimensional array with dtype=uint32.
2. sort() the array.
3. Use diff() to find all the positions where the color changes.
4. Use diff() again to find the count of every color.

The code:

In [50]:
from collections import defaultdict
import cv2
import numpy as np
img = cv2.imread("test.jpg")

In [51]:
%%time
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1
Wall time: 1.29 s

In [53]:
%%time
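# Pad each pixel with a zero byte so it can be viewed as a single uint32.
img2 = np.concatenate((img, np.zeros_like(img[:, :, :1])), axis=2).view(np.uint32).ravel()
img2.sort()
# Indices where a new color starts in the sorted array.
pos = np.r_[0, np.where(np.diff(img2) != 0)[0] + 1]
# The gaps between consecutive starts are the per-color counts.
count = np.r_[np.diff(pos), len(img2) - pos[-1]]
# Channels come back in img's own order (BGR for cv2.imread), despite the names.
r, g, b, _ = img2[pos].view(np.uint8).reshape(-1, 4).T
colors = zip(r, g, b)
result = dict(zip(colors, count))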
Wall time: 177 ms

In [49]:
counts == result
Out[49]:
True
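
The same sort-and-diff steps can also be tried without an image file. The following is a minimal sketch on synthetic data, not the timed code above: the random array replaces cv2.imread, and the names padded, keys, starts, run_lengths and channels are chosen only for this illustration.

```python
import numpy as np

# Synthetic stand-in for the image: few distinct colors, so runs repeat.
rng = np.random.default_rng(1)
img = rng.integers(0, 3, size=(5, 7, 3), dtype=np.uint8)

# Step 1: pad every pixel to 4 bytes and view it as one uint32 key.
padded = np.concatenate((img, np.zeros_like(img[:, :, :1])), axis=2)
keys = np.sort(padded.view(np.uint32).ravel())  # step 2: sort

# Step 3: a run starts at index 0 and wherever the sorted key changes.
starts = np.flatnonzero(np.r_[True, keys[1:] != keys[:-1]])

# Step 4: distances between consecutive starts are the counts.
run_lengths = np.diff(np.r_[starts, keys.size])

# Recover the channel bytes of each distinct key, dropping the pad byte.
channels = keys[starts].view(np.uint8).reshape(-1, 4)[:, :3]
result = {tuple(int(c) for c in px): int(n) for px, n in zip(channels, run_lengths)}
```

Comparing the keys directly instead of calling np.diff avoids relying on unsigned wrap-around, and the padded copy is where the extra memory mentioned in the comment on the other answer comes from.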

If you can use pandas, you can call pandas.value_counts(), which is implemented in Cython with a hash table.
