How to count the frequency of two columns in numpy array?

Question

In [56]: df
Out[56]:
array([[3, 133, nan, ..., 202, 109, 1427],
       [3, 133, nan, ..., 183, 120, 1448],
       [3, 133, nan, ..., 205, 22, 417],
       ...,
       [8, 43, nan, ..., 88, 11, 11],
       [64, 173, nan, ..., 2774, 2029, 1210],
       [12, 85, nan, ..., 19, 10, 25]], dtype=object)
collections.Counter(df[:,[0,1]])

df is a numpy array, and I want to count how often each combination of values in the first and second columns occurs, like count(*) from df group by col-0, col-1 in SQL. But the call above returns the error TypeError: unhashable type: 'numpy.ndarray'. How can I achieve this?

Pandas is very slow, so I would rather not use it.

.. and expected output. A minimal reproducible sample case would be better.
– Divakar, Commented Apr 4, 2018 at 6:54

jpp · Accepted Answer · 2018-04-04 08:37:12Z

1

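Since you are using numpy, you can use numpy.unique with return_counts=True for this. Note that the call below counts each individual value across the selected columns:

import numpy as np

a = np.array([  [1, 2, 3],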
                [1, 4, 5],
                [5, 6, 7],
                [8, 9, 10]])

res = np.unique(a[:, :3], return_counts=True)
# (array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]), array([2, 1, 1, 1, 2, 1, 1, 1, 1, 1], dtype=int64))

res_dict = dict(zip(*res))
# {1: 2, 2: 1, 3: 1, 4: 1, 5: 2, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1}
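
To count (col-0, col-1) combinations rather than individual values, one option is to pass axis=0 so that numpy.unique treats each row as a unit. This is a minimal sketch, not part of the original answer. It assumes the two columns can be cast to integers, because as far as I know the axis argument is not supported for dtype=object arrays. The sample data and the name pair_counts are made up for illustration.

```python
import numpy as np

# Hypothetical sample data with dtype=object, as in the question.
a = np.array([[3, 133, 7],
              [3, 133, 9],
              [8, 43, 1],
              [3, 133, 2],
              [8, 43, 5]], dtype=object)

# Assumption: the first two columns hold integers, so the cast is safe.
pairs = a[:, :2].astype(np.int64)

# axis=0 makes np.unique compare whole rows instead of flattened values.
uniq, counts = np.unique(pairs, axis=0, return_counts=True)

# Map each (col-0, col-1) pair to its number of occurrences.
pair_counts = {tuple(int(v) for v in row): int(n)
               for row, n in zip(uniq, counts)}
print(pair_counts)
```

The keys are plain Python tuples, so the result can be looked up like pair_counts[(3, 133)].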

answered Apr 4, 2018 at 8:37

jpp


nghiep truong · Accepted Answer · 2018-04-04 08:09:09Z

0

collections.Counter counts hashable objects, and a numpy.ndarray is unhashable, so we need to convert the data to hashable objects first. For example, to count the values in the first column:

>>> import collections
>>> import numpy as np
>>> a = np.array([[1, 2, 3],
...               [1, 4, 5],
...               [5, 6, 7],
...               [8, 9, 10]])
>>> b = np.hsplit(a, 3)[0]  # first column, shape (4, 1)
>>> b
array([[1],
       [1],
       [5],
       [8]])
>>> c = b.flatten().tolist()  # plain Python ints are hashable
>>> c
[1, 1, 5, 8]
>>> collections.Counter(c)
Counter({1: 2, 5: 1, 8: 1})
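
The same idea extends to two columns if each row of the selected columns is turned into a tuple, since tuples of Python ints are hashable. This is a small sketch with made-up sample data, not code from the original answer. The name pair_counts is illustrative.

```python
import collections
import numpy as np

# Hypothetical sample data; the pair (1, 2) appears twice.
a = np.array([[1, 2, 3],
              [1, 4, 5],
              [1, 2, 7],
              [8, 9, 10]])

# tolist() yields nested lists of Python ints; tuple() makes each row hashable.
pair_counts = collections.Counter(map(tuple, a[:, :2].tolist()))
print(pair_counts)
```

This loops in Python, so on very large arrays it may be slower than a vectorized numpy approach.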

Hope this helps.

answered Apr 4, 2018 at 8:09

nghiep truong


filippo · Accepted Answer · 2018-04-04 08:34:40Z

a = np.array([[4, 3, 2],
              [1, 0, 3],
              [1, 2, 3],
              [0, 1, 4],
              [0, 3, 3],
              [0, 2, 0],
              [1, 4, 3],
              [4, 1, 2],
              [0, 1, 3],
              [2, 1, 0]])

Pure numpy way:

In [8]: np.apply_along_axis(np.bincount, 0, a)
Out[8]: 
array([[4, 1, 2],
       [3, 4, 0],
       [1, 2, 2],
       [0, 2, 5],
       [2, 1, 1]])
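
One caveat worth sketching: np.bincount returns an array whose length depends on the largest value it sees, so the call above only works when every column has the same maximum. Assuming the values are non-negative integers, passing a common minlength keeps the per-column results the same length. The sample array and the name n_values below are illustrative.

```python
import numpy as np

# Hypothetical data where the columns have different maxima (2 and 5).
a = np.array([[0, 1],
              [2, 1],
              [2, 5]])

# Use one shared length so every column yields a count vector of equal size.
n_values = int(a.max()) + 1

# Extra keyword arguments are forwarded to np.bincount for each column.
counts = np.apply_along_axis(np.bincount, 0, a, minlength=n_values)
print(counts)
```

Row i of counts then holds how often the value i occurs in each column.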

With Pandas

df = pd.DataFrame(a)

In [10]: df[0].value_counts()
Out[10]: 
0    4
1    3
4    2
2    1

And if you want multiple columns at the same time:

In [11]: df.apply(pd.Series.value_counts, axis=0)
Out[11]: 
     0  1    2
0  4.0  1  2.0
1  3.0  4  NaN
2  1.0  2  2.0
3  NaN  2  5.0
4  2.0  1  1.0

You can also get rid of NaNs

In [12]: df.apply(pd.Series.value_counts, axis=0).fillna(0)
Out[12]: 
     0  1    2
0  4.0  1  2.0
1  3.0  4  0.0
2  1.0  2  2.0
3  0.0  2  5.0
4  2.0  1  1.0
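
If pandas is acceptable, the SQL-style count(*) ... group by col-0, col-1 from the question corresponds fairly directly to groupby followed by size. The following is a minimal sketch with made-up data, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the pair (1, 2) appears twice.
a = np.array([[1, 2, 3],
              [1, 4, 5],
              [1, 2, 7],
              [8, 9, 10]])

df = pd.DataFrame(a)

# Group by the first two columns and count the rows in each group.
pair_counts = df.groupby([0, 1]).size()
print(pair_counts)
```

The result is a Series indexed by (col-0, col-1) pairs, and pair_counts.to_dict() converts it to a plain dictionary keyed by tuples.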
