
I have a 2-D array containing values and I would like to calculate the most frequent entry (i.e., the mode) from this data according to IDs in a second array.

data = np.array([[ 0, 10, 50, 80, 80],
                 [10, 10, 50, 80, 90],
                 [10, 10, 50, 80, 90],
                 [50, 50, 80, 80, 80]])


ID = np.array([[1, 1, 2, 3, 3],
               [1, 1, 2, 3, 3],
               [1, 1, 2, 3, 3],
               [1, 2, 2, 2, 3]])


#Expected Result is:

[10 50 80]

The most frequent value in data array for ID=1 is 10, ID=2 is 50 and ID=3 is 80. I've been playing around with np.unique and combinations of np.bincount and np.argmax but I can't figure out how to get the result. Any help?

4 Answers

4

This is one possible vectorized way to do it, if you have integer data and the number of different values and groups is not too huge.

import numpy as np

# Input data
data = np.array([[ 0, 10, 50, 80, 80],
                 [10, 10, 50, 80, 90],
                 [10, 10, 50, 80, 90],
                 [50, 50, 80, 80, 80]])
ID = np.array([[1, 1, 2, 3, 3],
               [1, 1, 2, 3, 3],
               [1, 1, 2, 3, 3],
               [1, 2, 2, 2, 3]])
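# Find unique data values and group ids with reverse indexing
# (ravel the inverse indices: newer NumPy versions return them with the
# input's shape, while np.bincount below needs a 1-D array)
data_uniq, data_idx = np.unique(data, return_inverse=True)
data_idx = data_idx.ravel()
id_uniq, id_idx = np.unique(ID, return_inverse=True)
id_idx = id_idx.ravel()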
# Number of unique data values
n = len(data_uniq)
# Number of ids
m = len(id_uniq)
# Change indices so values of each group are within separate intervals
grouped = data_idx + (n * np.arange(m))[id_idx]
# Count repetitions and reshape
# counts[i, j] is the number of occurrences of the j-th value in the i-th group
counts = np.bincount(grouped, minlength=n * m).reshape(m, n)
# Get the modes from the counts
modes = data_uniq[counts.argmax(1)]
# Print result
for group, mode in zip(id_uniq, modes):
    print(f'Mode of {group}: {mode}')

Output:

Mode of 1: 10
Mode of 2: 50
Mode of 3: 80
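
To make the index shift easier to follow, here is a minimal sketch that prints the intermediate counts table for the example arrays from the question. The name flat_bins is only illustrative, and the inputs are flattened first so the inverse indices are 1-D whatever NumPy version is in use.

```python
import numpy as np

data = np.array([[ 0, 10, 50, 80, 80],
                 [10, 10, 50, 80, 90],
                 [10, 10, 50, 80, 90],
                 [50, 50, 80, 80, 80]])
ID = np.array([[1, 1, 2, 3, 3],
               [1, 1, 2, 3, 3],
               [1, 1, 2, 3, 3],
               [1, 2, 2, 2, 3]])

# Flatten first so the inverse indices are 1-D
data_uniq, data_idx = np.unique(data.ravel(), return_inverse=True)
id_uniq, id_idx = np.unique(ID.ravel(), return_inverse=True)
n, m = len(data_uniq), len(id_uniq)

# Group i owns the bin range [i * n, (i + 1) * n), so a single bincount
# call counts every (group, value) pair at once.
flat_bins = id_idx * n + data_idx
counts = np.bincount(flat_bins, minlength=n * m).reshape(m, n)

print(data_uniq)  # column labels: [ 0 10 50 80 90]
print(counts)     # one row per group id
# [[1 5 1 0 0]
#  [0 0 4 2 0]
#  [0 0 0 5 2]]
```

The table has m * n entries, which is why this approach is only suitable when the number of distinct values and groups is not too large.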

A quick benchmark for a particular problem size:

import numpy as np
import scipy.stats

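def find_group_modes_loop(data, ID):
    # Assume ids are given sequentially starting from 1
    m = ID.max()
    modes = np.empty(m, dtype=data.dtype)
    for i in range(m):
        # atleast_1d copes with SciPy versions that return a scalar mode
        # as well as older ones that return a length-1 array
        modes[i] = np.atleast_1d(scipy.stats.mode(data[ID == i + 1]).mode)[0]
    return modes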

def find_group_modes_vec(data, ID):
    # Assume ids are given sequentially starting from 1
    # ravel() keeps the inverse indices 1-D regardless of the input's shape
    data_uniq, data_idx = np.unique(data.ravel(), return_inverse=True)
    id_uniq = np.arange(ID.max(), dtype=data.dtype)
    n = len(data_uniq)
    m = len(id_uniq)
    grouped = data_idx + (n * np.arange(m))[ID.ravel() - 1]
    counts = np.bincount(grouped, minlength=n * m).reshape(m, n)
    return data_uniq[counts.argmax(1)]

# Make data
np.random.seed(0)
data = np.random.randint(0, 1_000, size=10_000_000)
ID = np.random.randint(1, 100, size=10_000_000)
print(np.all(find_group_modes_loop(data, ID) == find_group_modes_vec(data, ID)))
# True
%timeit find_group_modes_loop(data, ID)
# 212 ms ± 647 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit find_group_modes_vec(data, ID)
# 122 ms ± 3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So, at least for this problem size, the vectorized solution is noticeably faster than looping (roughly 1.7x in the timings above).


1 Comment

I like this vectorized approach best as it runs quicker and logically fits with my code. Thanks so much!
2

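One approach is to use scipy.stats.mode:

import numpy as np
from scipy.stats import mode

uniq_ids = np.unique(ID)
modes = []

for id in uniq_ids:
    # atleast_1d handles both older SciPy (length-1 array) and newer SciPy (scalar)
    modes.append(np.atleast_1d(mode(data[ID == id]).mode)[0])

print(np.array(modes))

[10 50 80]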


2

Here is an approach that uses only NumPy; I hope it solves your issue.

n,f=np.unique(data[np.where(ID == 1)],return_counts=True)

Output: (array([ 0, 10, 50]), array([1, 5, 1]))

The output is a tuple of the unique values and their respective frequencies.

You can get the value with the highest frequency like this:

n[np.argmax(f)]

The complete solution is:

res = []
for id in np.unique(ID):
    n, f = np.unique(data[np.where(ID == id)], return_counts=True)
    res.append(n[np.argmax(f)])


1

If you want a pure numpy solution, you can reinvent the wheel in @Kenan's loop:

def mode(x):
    n, c = np.unique(x, return_counts=True)
    return n[np.argmax(c)]

modes = [mode(data[ID == id]) for id in np.unique(ID)]
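
One detail of the np.unique plus np.argmax pattern is how ties are handled. Here is a small sketch, wrapped in a hypothetical helper called group_modes (not part of NumPy), with made-up input chosen so that group 1 has a tie:

```python
import numpy as np

def group_modes(data, ids):
    """Return (unique ids, mode of data within each id)."""
    data = np.asarray(data).ravel()
    ids = np.asarray(ids).ravel()
    uniq_ids = np.unique(ids)
    modes = []
    for i in uniq_ids:
        values, counts = np.unique(data[ids == i], return_counts=True)
        # values is sorted and argmax returns the first maximum,
        # so ties resolve to the smallest value
        modes.append(values[np.argmax(counts)])
    return uniq_ids, np.array(modes)

# Group 1 contains 3 and 7 twice each; group 2 contains only 5
uniq_ids, modes = group_modes([7, 3, 3, 7, 5], [1, 1, 1, 1, 2])
print(uniq_ids, modes)  # [1 2] [3 5]
```

If a different tie-breaking rule is needed, the counts array is available inside the loop to implement it.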
