
I am looking to count the number of unique rows in a 3D NumPy array. Take the following array:

a = np.array([[[1, 2], [1, 2], [2, 3]], [[2, 3], [2, 3], [3, 4]], [[1, 2], [1, 2], [1, 2]]])

My desired output is a 1-D array with the same length as axis 0 of the 3-D array: array([2, 2, 1]).

In this example, the output would be [2, 2, 1] because in the first grouping [1, 2] and [2, 3] are the unique rows, in the second grouping [2, 3] and [3, 4] are the unique rows, and in the third grouping [1, 2] is the only unique row. Perhaps I'm using "unique" incorrectly in this context, but that is what I'm looking to calculate.

The difficulty I'm having is that the count of unique rows can differ from one 2D block to the next, so the result cannot be a rectangular array. If I use np.unique with axis=1, whole slices along axis 1 are compared across all blocks at once, which is not what I want:

>>> np.unique(a, axis=1)
array([[[1, 2],
        [2, 3]],

       [[2, 3],
        [3, 4]],

       [[1, 2],
        [1, 2]]])

I know I can loop over each 2D array and use np.apply_along_axis, as described in this answer.

However, I am dealing with arrays as large as (1 000 000, 256, 2), so I would prefer to avoid loops if this is possible.
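
For reference, that per-block baseline looks something like the sketch below (a minimal example using np.unique on each 2D block; this is the loop I would like to avoid):

import numpy as np

a = np.array([[[1, 2], [1, 2], [2, 3]],
              [[2, 3], [2, 3], [3, 4]],
              [[1, 2], [1, 2], [1, 2]]])

# Count the unique rows of each 2D block separately
counts = np.array([len(np.unique(block, axis=0)) for block in a])
print(counts)  # [2 2 1]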

  • Does this answer your question? Find unique rows in numpy.array (or also this answer) Commented Aug 28, 2021 at 22:36
  • I took a look at those and they work for a 2D array but not a 3D one. I could do an apply or a list comprehension and count the unique rows in the 2D arrays, but I'm trying to figure out if there is a faster way of doing so. Commented Aug 28, 2021 at 23:05
  • Indeed. Note the example is unclear to me: isn't the result the number of distinct rows? If not, why is the last value 1? Besides this, is the last dimension always 2, and are the values of a always integers bounded in a specific known range? Commented Aug 29, 2021 at 10:16
  • They are always integers and the range is between 0 and a variable that is known at the time the counting is done (it would typically be somewhere between 10 and 40). Commented Aug 29, 2021 at 13:44
  • I also edited the question to try to make it clearer. Commented Aug 29, 2021 at 13:52

1 Answer

Calling np.unique for each 2D plane appears to be extremely slow. Actually, it is np.unique itself that is slow, not the pure-Python loop.

A better approach is to do this manually with Numba (using a dict). While this strategy is faster, it is not a silver bullet, since dict accesses are not very fast. However, this implementation can easily be parallelized to run significantly faster. Here is the implementation:

import numpy as np
import numba as nb

@nb.njit('i4[::1](i4[:,:,::1])', parallel=True)
def compute_unique_count(data):
    n, m, o = data.shape
    assert o == 2
    res = np.empty(n, dtype=np.int32)
    # Each 2D block is processed independently, one per parallel iteration
    for i in nb.prange(n):
        # Use a dict as a set: each distinct (x, y) row becomes a key
        tmp = dict()
        for j in range(m):
            tmp[(data[i, j, 0], data[i, j, 1])] = True
        res[i] = len(tmp)
    return res
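
For example, on the array from the question (cast to int32 to match the signature above; this usage snippet is an illustration, not part of the original answer):

a = np.array([[[1, 2], [1, 2], [2, 3]],
              [[2, 3], [2, 3], [3, 4]],
              [[1, 2], [1, 2], [1, 2]]], dtype=np.int32)

print(compute_unique_count(a))  # [2 2 1]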

Since the numbers are small bounded integers, there is an efficient method to make the computation much faster. Indeed, an associative table can be used to flag the values already found: each (x, y) pair is mapped to a cell of a flat array, and the cell records whether that pair has been seen so far. The flat index is computed with the expression idx = point[0] * maxi + point[1], where maxi is the bound (all values are assumed to be strictly smaller than it). For the sake of performance, branches are avoided, and maxi should be a power of two, typically >= 64 (due to low-level cache effects like cache-line conflicts, cache thrashing and false sharing). The resulting implementation is very fast.

@nb.njit('int32[::1](int32[:,:,::1])', parallel=True)
def compute_unique_count_fastest(data):
    n, m, o = data.shape
    assert o == 2
    maxi = 64  # Power-of-two bound on the values
    threadCount = nb.get_num_threads()
    res = np.empty(n, dtype=np.int32)
    # One flat associative table per thread (avoids false sharing)
    globalUniqueVals = np.zeros((threadCount, maxi * maxi), dtype=np.uint8)
    for i in nb.prange(n):
        threadId = nb.np.ufunc.parallel._get_thread_id()
        uniqueVals = globalUniqueVals[threadId]
        uniqueVals.fill(0) # Reset the associative table
        uniqueCount = 0
        for j in range(m):
            # Encode the (x, y) pair as a flat index into the table
            idx = data[i, j, 0] * maxi + data[i, j, 1]
            # Branchless update: count the pair only the first time it appears
            uniqueCount += uniqueVals[idx] == 0
            uniqueVals[idx] = 1
        res[i] = uniqueCount
    return res
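
For what it's worth, a test setup consistent with the timings below might look like the following sketch (random values in [0, 40), matching the benchmark description; the exact benchmark code is an assumption):

data = np.random.randint(0, 40, size=(1_000_000, 256, 2), dtype=np.int32)

# All approaches should agree on the result
ref = np.array([len(np.unique(block, axis=0)) for block in data])
assert np.array_equal(compute_unique_count(data), ref)
assert np.array_equal(compute_unique_count_fastest(data), ref)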

Here are the timings on my machine (i5-9600KF with 6 cores) with an array of size (1_000_000, 256, 2) containing random 32-bit integers from 0 to 40:

np.unique in a comprehension list:  78 000 ms
compute_unique_count:                2 050 ms 
compute_unique_count_fastest:           57 ms

The last implementation is 1370 times faster than the naive implementation.


1 Comment

I guess the naive version would be to use a set instead of a dict. This is also two times faster than the dict approach. But anyway, your second version is still much faster...
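
For reference, that set-based variant might look like the sketch below. Numba's support for sets of tuples is limited, so this version encodes each (x, y) pair as a single integer key; the function name and the encoding are illustrative, not from the comment above.

@nb.njit('i4[::1](i4[:,:,::1])', parallel=True)
def compute_unique_count_set(data):
    n, m, o = data.shape
    assert o == 2
    res = np.empty(n, dtype=np.int32)
    for i in nb.prange(n):
        tmp = set()
        for j in range(m):
            # Encode the (x, y) pair as one integer (values assumed < 65536)
            tmp.add(data[i, j, 0] * 65536 + data[i, j, 1])
        res[i] = len(tmp)
    return res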
