2

Is there any way to do the following in purely numpy (or opencv)?

import cv2
from collections import defaultdict

img = cv2.imread("test.jpg")
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1

The problem is that tuple(val) can be any of 2^24 different values, so a dense array with one slot per possible value is impractical: it would be gigantic and mostly zeros. I need a more efficient data structure.

11
  • Sparse matrix perhaps Commented Nov 12, 2013 at 1:51
  • You could convert the h×w×3 matrix into an h×w one by bit-shifting two of the color channels up and summing, then use np.unique. You could also probably use tuples instead of bit-shifting and adding, but that would probably make np.unique run a lot slower... Commented Nov 12, 2013 at 2:23
  • @Hammer Yes that seems easy enough to convert into a 32bit 2d matrix. But how does np.unique give me how often each color appears in the array? Commented Nov 12, 2013 at 2:29
  • See the question from earlier today here. He was looking for the maximum occurrence, but the principle is the same. Commented Nov 12, 2013 at 2:32
  • You don't need to do anything to the data: view it as a 24-bit void dtype, then run the np.unique / np.bincount combo on it. You'll need to view the return of np.unique as 3 uint8s and reshape it to (-1, 3) to make sense of the data, but it will be much faster, as nothing is done aside from viewing the exact same memory in a different way. Commented Nov 12, 2013 at 3:45
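
For reference, the bit-shifting idea from the comments can be sketched roughly as follows. This is a minimal sketch rather than anyone's posted code: the function name count_colors_packed and the demo image are made up for illustration, and it assumes a NumPy version where np.unique accepts return_counts (1.9 or later).

```python
import numpy as np

def count_colors_packed(img):
    """Count distinct colors of an (h, w, 3) uint8 image via one uint32 per pixel."""
    # Widen first so the shifts below cannot overflow uint8
    px = img.reshape(-1, 3).astype(np.uint32)
    # Pack the three channels into a single integer per pixel
    packed = (px[:, 0] << 16) | (px[:, 1] << 8) | px[:, 2]
    # Sorted unique packed values and how often each one occurs
    values, counts = np.unique(packed, return_counts=True)
    # Unpack the unique values back into three uint8 channels
    colors = np.stack(
        ((values >> 16) & 0xFF, (values >> 8) & 0xFF, values & 0xFF), axis=1
    ).astype(np.uint8)
    return colors, counts

# Hypothetical 2x2 image: three pixels of one color, one pixel of another
demo = np.array([[[255, 0, 0], [0, 0, 255]],
                 [[255, 0, 0], [255, 0, 0]]], dtype=np.uint8)
colors, counts = count_colors_packed(demo)
```

The packing costs an extra copy at four times the size of the original pixel data, which is the trade-off against the zero-copy void view suggested in the last comment.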

2 Answers

4

The fastest way around this, if the image is stored in "chunky" format (i.e. the color-plane dimension is the last one and that dimension is contiguous), is to take an np.void view of each 24-bit pixel, then run the result through np.unique and np.bincount:

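>>> import numpy as np
>>> arr = np.random.randint(256, size=(10, 10, 3)).astype(np.uint8)
>>> # One opaque 3-byte item per pixel
>>> dt = np.dtype((np.void, arr.shape[-1]*arr.dtype.itemsize))
>>> # The view below needs the last axis to be contiguous in memory
>>> if arr.strides[-1] != arr.dtype.itemsize:
...     arr = np.ascontiguousarray(arr)
... 
>>> arr_view = arr.view(dt)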

The contents of arr_view look like garbage:

>>> arr_view[0, 0]
array([Â], 
      dtype='|V3')

But we are not the ones who have to understand the contents; NumPy only needs to compare them:

>>> unq, inv = np.unique(arr_view, return_inverse=True)
>>> # ravel() keeps this working where the inverse has the input's shape
>>> unq_cnts = np.bincount(inv.ravel())
>>> unq = unq.view(arr.dtype).reshape(-1, arr.shape[-1])

And now you have the unique pixels and their counts in those two arrays:

>>> unq[:5]
array([[  0,  82,  78],
       [  6, 221, 188],
       [  9, 209,  85],
       [ 14, 210,  24],
       [ 14, 254,  88]], dtype=uint8)
>>> unq_cnts[:5]
array([1, 1, 1, 1, 1], dtype=int64)
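
On more recent NumPy versions the same unique-rows count can probably be obtained without building the void view by hand. The following is a minimal sketch under that assumption (np.unique with axis=0 needs NumPy 1.13 or later, and np.random.default_rng needs 1.17 or later); the Counter comparison is only there as a sanity check and the variable names expected and result are illustrative.

```python
from collections import Counter
import numpy as np

# Small random image with few distinct colors so that repeats actually occur
rng = np.random.default_rng(0)
arr = rng.integers(0, 4, size=(10, 10, 3), dtype=np.uint8)

# Flatten to one row per pixel, then count the unique rows
pixels = arr.reshape(-1, arr.shape[-1])
unq, unq_cnts = np.unique(pixels, axis=0, return_counts=True)

# Cross-check against a plain Python count of the same pixels
expected = Counter(map(tuple, pixels.tolist()))
result = {tuple(color): int(n) for color, n in zip(unq.tolist(), unq_cnts)}
```

Here unq holds one row per distinct color and unq_cnts the matching counts, in the same layout as the two arrays above.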

1 Comment

Interestingly, when I measured the different approaches I got the following timings: naive ~15s, your solution ~4s, HYRY's approach ~0.4s. I was quite surprised myself, but apparently numpy is so much more efficient when working on natural-sized integers that the additional pass over the data is easily amortized. The downside is that it takes more than double the memory, but that's acceptable in my case.
2

Here is my solution:

  • convert the image to a one-dimensional array with dtype=uint32
  • sort() the array
  • use diff() to find all the positions where the color changes
  • use diff() again to find the count of each color

The code:

In [50]:
from collections import defaultdict
import cv2
import numpy as np
img = cv2.imread("test.jpg")

In [51]:
%%time
counts = defaultdict(int)
for row in img:
    for val in row:
        counts[tuple(val)] += 1
Wall time: 1.29 s

In [53]:
%%time
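# Pad each pixel with a zero byte so it fills 4 bytes, then view it as one uint32
img2 = np.concatenate((img, np.zeros_like(img[:, :, :1])), axis=2).view(np.uint32).ravel()
img2.sort()
# Start index of each run of equal values in the sorted array
pos = np.r_[0, np.where(np.diff(img2) != 0)[0] + 1]
# Run lengths, i.e. how often each color occurs
count = np.r_[np.diff(pos), len(img2) - pos[-1]]
# Unpack the first value of each run into its bytes; the 4th byte is the padding.
# cv2.imread returns BGR, so r, g, b here follow the image's own channel order.
r, g, b, _ = img2[pos].view(np.uint8).reshape(-1, 4).T
colors = zip(r, g, b)
result = dict(zip(colors, count))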
Wall time: 177 ms

In [49]:
counts == result
Out[49]:
True

If you can use pandas, you can call pandas.value_counts(), which is implemented in Cython with a hash table.
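
A minimal sketch of the pandas route, assuming pandas is installed. It uses Series.value_counts() rather than the top-level function, since recent pandas releases appear to deprecate pandas.value_counts(); the helper name count_colors_pandas and the demo image are made up for illustration.

```python
import numpy as np
import pandas as pd

def count_colors_pandas(img):
    """Return {(c0, c1, c2): count} for an (h, w, 3) uint8 image."""
    # Pack each pixel into one uint32 so pandas hashes a single integer column
    px = img.reshape(-1, 3).astype(np.uint32)
    packed = (px[:, 0] << 16) | (px[:, 1] << 8) | px[:, 2]
    # Hash-based counting; no sort of the full pixel array is needed
    vc = pd.Series(packed).value_counts()
    # Unpack the packed keys back into channel tuples of plain Python ints
    return {
        ((v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF): int(n)
        for v, n in zip(vc.index.tolist(), vc.tolist())
    }

# Hypothetical 2x2 image: three pixels of one color, one pixel of another
demo = np.array([[[255, 0, 0], [0, 0, 255]],
                 [[255, 0, 0], [255, 0, 0]]], dtype=np.uint8)
color_counts = count_colors_pandas(demo)
```

The keys follow the image's own channel order, so for an image loaded with cv2.imread they are (b, g, r) tuples.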
