Counting occurrences of columns in numpy array

Question

Given a 2 x d numpy array M, I want to count the number of occurrences of each column of M. That is, I'm looking for a generalization of bincount that works on columns.

What I have tried so far: (1) converted the columns to tuples, (2) hashed the tuples (via hash) to natural numbers, (3) used numpy.bincount.

This seems rather clumsy. Is anybody aware of a more elegant and efficient way?
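For reference, the three steps described above can be sketched roughly as follows. This is not the asker's actual code, which is not shown. One assumption is made loudly here: raw hash values can be large or negative and are therefore not valid input for numpy.bincount, so the sketch maps each column tuple to a small integer label through a dictionary instead. The name column_bincount is hypothetical.

```python
import numpy as np

def column_bincount(M):
    """Count how often each column of a 2-D array M occurs (illustrative helper)."""
    labels = {}  # column tuple -> small integer label, in order of first appearance
    ids = np.empty(M.shape[1], dtype=np.intp)
    for j, col in enumerate(map(tuple, M.T.tolist())):
        # Reuse the existing label, or assign the next free one to a new column.
        ids[j] = labels.setdefault(col, len(labels))
    # bincount needs small non-negative integers, which the labels provide.
    counts = np.bincount(ids, minlength=len(labels))
    return list(labels), counts

M = np.array([[0, 1, 2, 1],
              [4, 5, 6, 5]])
columns, counts = column_bincount(M)
```

Here columns lists the distinct columns in order of first appearance and counts[i] is the number of times columns[i] occurs in M.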

Interesting question. Looking forward to seeing any solutions because my first and only thought was exactly what you did.
– Reti43, Commented Dec 12, 2015 at 1:18
So you are expecting a list of unique columns and their counts? Does the order of the columns have to be preserved?
– ilyas patanam, Commented Dec 12, 2015 at 3:52

eph · Accepted Answer · 2015-12-12 05:34:08Z

5

You can use collections.Counter:

>>> import numpy as np
>>> a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
...               [ 4,  5,  6,  8,  9,  5,  6,  7],
...               [ 8,  9, 10, 12, 13,  9, 10, 11]])
>>> from collections import Counter
>>> Counter(map(tuple, a.T))
Counter({(2, 6, 10): 2, (1, 5, 9): 2, (4, 8, 12): 1, (5, 9, 13): 1, (3, 7, 11):
1, (0, 4, 8): 1})
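If arrays are needed rather than a Counter, the result can be converted back. The following is a minimal sketch under that assumption; the variable names unique_cols and cnt are illustrative and not part of the answer above.

```python
from collections import Counter

import numpy as np

a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
              [ 4,  5,  6,  8,  9,  5,  6,  7],
              [ 8,  9, 10, 12, 13,  9, 10, 11]])

# tolist() yields plain Python ints, so the keys are ordinary tuples.
column_counts = Counter(map(tuple, a.T.tolist()))

# Keys and values of a Counter iterate in the same (first-seen) order.
unique_cols = np.array(list(column_counts)).T   # one unique column per column
cnt = np.array(list(column_counts.values()))    # cnt[i] belongs to unique_cols[:, i]
```

The unique columns come out in order of first appearance in a, not sorted.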

edited Dec 12, 2015 at 5:34

answered Dec 12, 2015 at 5:09


1 Comment

keshav Over a year ago

How can I do the same for a 3D array? Basically I have a 3-channel image, so instead of each element in the above example I have 3 digits.
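One possible way to handle this follow-up, offered as a sketch rather than as part of either answer: flatten the image to a list of pixels and count the distinct rows with np.unique and its axis argument (available since NumPy 1.13). The tiny 2 x 2 image below is made-up example data.

```python
import numpy as np

# Made-up 2 x 2 image with 3 channels: three red pixels and one green pixel.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[255, 0, 0], [255, 0, 0]]], dtype=np.uint8)

# Collapse height and width so that every row is one pixel (its 3 channel values).
pixels = img.reshape(-1, img.shape[-1])

# Unique pixel values (sorted lexicographically) and how often each occurs.
colors, counts = np.unique(pixels, axis=0, return_counts=True)
```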

Community · Accepted Answer · 2017-05-23 10:27:20Z

Given:

a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
              [ 4,  5,  6,  8,  9,  5,  6,  7],
              [ 8,  9, 10, 12, 13,  9, 10, 11]])
b = np.transpose(a)

A more efficient solution than hashing (still requires manipulation):

I create a view of the array with the flexible data type np.void, such that each row of b (that is, each column of a) becomes a single element. Viewing the data this way allows np.unique to operate on whole rows.

%%timeit
# View each row of b as one opaque np.void element spanning the whole row.
c = np.ascontiguousarray(b).view(np.dtype((np.void, b.dtype.itemsize * b.shape[1])))
_, idx, cnt = np.unique(c, return_index=True, return_counts=True)
# Counts are in the last column; remember the original array is transposed.
>>> np.concatenate((b[idx], cnt[:, None]), axis=1)
array([[ 0,  4,  8,  1],
       [ 1,  5,  9,  2],
       [ 2,  6, 10,  2],
       [ 3,  7, 11,  1],
       [ 4,  8, 12,  1],
       [ 5,  9, 13,  1]])
10000 loops, best of 3: 65.4 µs per loop

The counts are appended to the unique columns of a, which appear here as rows because b is the transpose of a.
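On newer NumPy versions (1.13 and later) the np.void view is not strictly necessary, because np.unique accepts an axis argument. The following sketch is an alternative to the code above, not the method used in this answer, and it was not part of the timings shown here.

```python
import numpy as np

a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
              [ 4,  5,  6,  8,  9,  5,  6,  7],
              [ 8,  9, 10, 12, 13,  9, 10, 11]])

# axis=1 treats every column as one item; the unique columns come back sorted.
unique_cols, counts = np.unique(a, axis=1, return_counts=True)

# Same layout as above: one unique column per row, count in the last position.
table = np.concatenate((unique_cols.T, counts[:, None]), axis=1)
```

This works directly on a, so the transpose b is only needed to lay the result out as rows.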

Your hashing solution:

%%timeit
# Hash each row of b (a column of a) to a single integer.
array_hash = [hash(tuple(row)) for row in b]
uniq, idx, cnt = np.unique(array_hash, return_index=True, return_counts=True)
np.concatenate((b[idx], cnt[:, None]), axis=1)
10000 loops, best of 3: 89.5 µs per loop

Update: eph's solution is the most efficient and elegant of the three, at least for this small example.

%%timeit
Counter(map(tuple, a.T))
10000 loops, best of 3: 38.3 µs per loop
