Get the frequencies of values from one numpy array in another array

Question

I have two numpy arrays, for example:

import numpy as np
a1 = np.linspace(0,2*np.pi,101)
a2 = np.random.choice(a1, 60)

I need to count how many times each value from a1 appears in a2. I can do it with a loop but I was hoping for a better solution.

Solution with a loop:

a3 = np.zeros_like(a1)
for i in range(len(a1)):
    a3[i] = np.sum(a2==a1[i])

What should your output look like? An array the size of a1 with the count at each index?
– roganjosh, Commented Jun 2, 2018 at 10:26

Paul Panzer · Accepted Answer · 2018-06-02 11:11:19Z

Another np.unique approach:

>>> import numpy as np
>>> a1 = np.linspace(0,2*np.pi,101)
>>> a2 = np.random.choice(a1, 60)
>>> 
>>> unq, idx, cnts = np.unique(np.concatenate([a1, a2]), return_inverse=True, return_counts=True)
>>> assert np.all(unq[idx[:len(a1)]] == a1)
>>> result = cnts[idx[:len(a1)]] - 1
>>> result
array([0, 0, 2, 0, 2, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 2, 0, 0, 1, 2, 1, 0, 2, 0, 0, 0, 1, 0, 2,
       0, 1, 2, 1, 2, 0, 0, 1, 0, 0, 0, 0, 0, 4, 0, 0, 0, 1, 1, 1, 0, 0,
       2, 0, 0, 1, 0, 0, 2, 0, 3, 0, 0, 0, 1, 1, 2, 0, 0, 0, 1, 1, 1, 1,
       0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 2, 1])
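
A tiny deterministic run may make the `- 1` step easier to follow. Because every value of a1 is concatenated in exactly once, each count returned by np.unique is one higher than the number of occurrences in a2. The short arrays below are made up for illustration, and the sketch assumes that a1 contains no duplicates:

```python
import numpy as np

# Hypothetical small inputs (not from the original post); a1 must be duplicate-free.
a1 = np.array([0.0, 1.0, 2.0, 3.0])
a2 = np.array([1.0, 3.0, 3.0, 1.0, 1.0])

unq, idx, cnts = np.unique(np.concatenate([a1, a2]),
                           return_inverse=True, return_counts=True)

# idx[:len(a1)] maps each element of a1 to its slot in unq, so indexing cnts
# with it restores the order of a1; subtracting 1 removes the copy that a1
# itself contributed to the concatenated array.
result = cnts[idx[:len(a1)]] - 1
print(result)  # [0 3 0 2]
```

Values of a2 that do not occur in a1 get their own slot in unq but are never read back, so they are simply ignored.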

edited Jun 2, 2018 at 11:11

answered Jun 2, 2018 at 11:03

1 Comment

Andrey Chetverikov Over a year ago

I like your solution very much, it's simple and clever

Andrey Chetverikov · Accepted Answer · 2018-06-02 11:45:53Z

Here's a performance comparison for different solutions using perfplot (https://github.com/nschloe/perfplot):

import perfplot
import numpy as np
from collections import Counter

def count_a_in_b_loop(b, a = np.linspace(0,2*np.pi,101)):
    c = np.zeros_like(a)
    for i in range(len(a)):
        c[i] = np.sum(b==a[i])
    return c

def count_a_in_b_counter(b, a=np.linspace(0,2*np.pi,101)):
    c = Counter(b)
    c = np.array([(c[k] if k in c else 0)  for k in a])
    return c

def count_occ(a2,a1=np.linspace(0,2*np.pi,101),use_closeness=True):
    # Trace back indices for each elem of a2 in a1
    idx = np.searchsorted(a1,a2)
    # Set out of bounds indices to something within
    idx[idx==len(a1)] = 0
    # Check for the matches
    if use_closeness:
        mask = np.isclose(a1[idx],a2)
    else:
        mask = a1[idx] == a2
    # Get counts
    return np.bincount((idx+1)*mask,minlength=len(a1)+1)[1:]

def count_broadcasting(a2, a1=np.linspace(0,2*np.pi,101)):
    # For exact matches, use (a1[:,None]==a2).sum(1) instead
    return np.isclose(a1[:,None],a2).sum(1) # For close matches

def count_occ_rounding(a2, a1_lastnum=2*np.pi, a1_num_sample=101):
    s = a1_lastnum/(a1_num_sample-1)
    p = np.round(a2/s).astype(int)
    return np.bincount(p,minlength=a1_num_sample)

def count_add_to_unique(a2, a1=np.linspace(0,2*np.pi,101)):
    unq, idx, cnts = np.unique(np.concatenate([a1, a2]), return_inverse = True, return_counts = True)
    #assert np.all(unq[idx[:len(a1)]] == a1)
    return cnts[idx[:len(a1)]] - 1

perfplot.show(
        setup=lambda n: np.random.choice(np.linspace(0,2*np.pi,101), n),
        kernels=[
            count_a_in_b_loop, count_a_in_b_counter, count_occ, count_broadcasting, count_occ_rounding, count_add_to_unique
            ],
        labels=['loop', 'counter','searchsorted','broadcasting','occ_rounding','add_to_unique'],
        n_range=[2**k for k in range(15)],
        xlabel='len(a)'
        )
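
Before comparing timings, it can be worth confirming that the variants agree on one input. The following is a minimal sketch, not part of the original benchmark: it uses a seeded generator (the seed is an arbitrary choice) and checks only two of the vectorized variants, in their exact-match form, against the loop from the question:

```python
import numpy as np

# Seeded generator so the check is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
a1 = np.linspace(0, 2 * np.pi, 101)
a2 = rng.choice(a1, 60)

# Reference result from the explicit loop in the question.
expected = np.array([np.sum(a2 == v) for v in a1])

# Two of the vectorized variants, restricted to exact matching.
unq, idx, cnts = np.unique(np.concatenate([a1, a2]),
                           return_inverse=True, return_counts=True)
by_unique = cnts[idx[:len(a1)]] - 1
by_broadcasting = (a1[:, None] == a2).sum(1)

assert np.array_equal(expected, by_unique)
assert np.array_equal(expected, by_broadcasting)
assert expected.sum() == len(a2)  # every element of a2 was drawn from a1
```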

Divakar · Accepted Answer · 2018-06-02 11:40:02Z

Approach #1

One vectorized solution, based on np.searchsorted and allowing for closeness between floating-point numbers, would be -

def count_occ(a1,a2,use_closeness=True):
    # Trace back indices for each elem of a2 in a1
    idx = np.searchsorted(a1,a2)

    # Set out of bounds indices to something within
    idx[idx==len(a1)] = 0

    # Check for the matches
    if use_closeness:
        mask = np.isclose(a1[idx],a2)
    else:
        mask = a1[idx] == a2

    # Get counts
    return np.bincount((idx+1)*mask,minlength=len(a1)+1)[1:]

Sample run -

In [154]: a1 = np.array([1.0000001,4,5,6])

In [155]: a2 = np.array([2,5,8,5,8,5,0.999999999999])

In [156]: count_occ(a1,a2)
Out[156]: array([1, 0, 3, 0])

In [157]: count_occ(a1,a2,use_closeness=False)
Out[157]: array([0, 0, 3, 0])
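
To see how the pieces fit together, the intermediate arrays for the sample inputs above can be printed step by step. This is only a walkthrough of the same function body, with np.isclose at its default tolerances:

```python
import numpy as np

a1 = np.array([1.0000001, 4, 5, 6])
a2 = np.array([2, 5, 8, 5, 8, 5, 0.999999999999])

idx = np.searchsorted(a1, a2)    # insertion points: [1 2 4 2 4 2 0]
idx[idx == len(a1)] = 0          # 8 falls past the end, so point it at slot 0
mask = np.isclose(a1[idx], a2)   # True only where the candidate slot matches

# Non-matches are sent to bin 0 and matches to bin idx+1;
# dropping bin 0 afterwards leaves one count per element of a1.
shifted = (idx + 1) * mask       # [0 3 0 3 0 3 1]
counts = np.bincount(shifted, minlength=len(a1) + 1)[1:]
print(counts)  # [1 0 3 0]
```

One caveat: np.searchsorted returns the left insertion point, so an element of a2 that lies slightly above an entry of a1 is compared with the next entry and is not counted, even with use_closeness=True.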

Approach #2

Alternatively, we can also use broadcasting for a short but memory-intensive method, like so -

(a1[:,None]==a2).sum(1) # For exact matches
np.isclose(a1[:,None],a2).sum(1) # For close matches
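
A small sketch with made-up inputs shows where the memory cost comes from: the comparison builds a full len(a1) by len(a2) boolean matrix before it is summed along each row.

```python
import numpy as np

# Hypothetical small inputs for illustration.
a1 = np.array([1.0, 4.0, 5.0, 6.0])
a2 = np.array([2.0, 5.0, 8.0, 5.0])

# a1[:, None] has shape (4, 1); comparing it with a2 (shape (4,)) broadcasts
# to a (len(a1), len(a2)) boolean matrix with one row per value of a1.
matches = a1[:, None] == a2
print(matches.shape)   # (4, 4)
print(matches.sum(1))  # [0 0 2 0]
```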

Approach #3: Specific case with a1 as linearly spaced data

For the specific case where a1 is a linearly spaced array, and again allowing for closeness, we can optimize further by rounding the a2 data to the nearest grid index, like so -

def count_occ_rounding(a2, a1_startnum=0,a1_lastnum=2*np.pi, a1_num_sample=101):
    s = (a1_lastnum-a1_startnum)/(a1_num_sample-1)
    p = np.round((a2 - a1_startnum)/s).astype(int)    
    return np.bincount(p,minlength=a1_num_sample)

Sample run verifying the output for an a1 with a generic start and end -

In [284]: a1 = np.linspace(-2*np.pi,2*np.pi,201)
     ...: a2 = np.random.choice(a1, 60)
     ...: out1 = count_occ_rounding(a2, -2*np.pi, 2*np.pi, 201)
     ...: out2 = np.isclose(a1[:,None],a2).sum(1)
     ...: print(np.allclose(out1, out2))
True
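
The index arithmetic is easier to follow on a very small grid. The values below are hypothetical; the sketch assumes that every element of a2 lies within the range covered by a1:

```python
import numpy as np

start, stop, num = 0.0, 1.0, 5    # hypothetical grid: [0, 0.25, 0.5, 0.75, 1]
a2 = np.array([0.0, 0.25, 0.25, 1.0])

s = (stop - start) / (num - 1)               # grid spacing, 0.25 here
p = np.round((a2 - start) / s).astype(int)   # grid index per element: [0 1 1 4]
counts = np.bincount(p, minlength=num)
print(counts)  # [1 2 0 0 1]
```

Elements below the start of the grid would round to negative indices, which np.bincount rejects, and elements above the end would lengthen the output, so out-of-range data should be filtered first.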

jpp · Accepted Answer · 2018-06-02 10:39:29Z

If I understand your problem correctly, this is one way using np.unique and np.isin:

import numpy as np

a1 = np.linspace(0,2*np.pi,101)
a2 = np.random.choice(a1, 60)

vals_counts = np.unique(a2, return_counts=True)

arr = np.array(list(zip(*vals_counts)))

print(arr.shape)
# (46, 2)

res = arr[np.where(np.isin(arr[:, 0], a1))]

print(res.shape)
# (46, 2)

print(res)

[[ 0.06283185  1.        ]
 [ 0.12566371  1.        ]
 ...
 [ 5.65486678  3.        ]
 [ 5.96902604  2.        ]
 [ 6.09468975  1.        ]
 [ 6.1575216   1.        ]
 [ 6.28318531  1.        ]]

answered Jun 2, 2018 at 10:39

2 Comments

Andrey Chetverikov Over a year ago

Thanks! But it looks like it only provides frequencies for values with above zero counts, while I'm looking for the frequencies for all values

jpp Over a year ago

Ah sorry - I suggest you go for PaulPanzer's or Divakar's approach.
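
For completeness, the np.unique counts from this answer can be spread over every value of a1, zeros included. The following is a minimal sketch, not part of the original answer; it assumes that a1 is sorted and that every value in a2 occurs exactly in a1:

```python
import numpy as np

# Hypothetical small inputs; every value in a2 is taken from a1, and a1 is sorted.
a1 = np.array([0.0, 1.0, 2.0, 3.0])
a2 = np.array([1.0, 3.0, 3.0, 1.0, 1.0])

vals, counts = np.unique(a2, return_counts=True)

# Start from zeros so that values of a1 absent from a2 keep a count of 0,
# then write each nonzero count into the slot of its value in a1.
res = np.zeros(len(a1), dtype=int)
res[np.searchsorted(a1, vals)] = counts
print(res)  # [0 3 0 2]
```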
