
I'm trying to work out a more efficient way of doing this. Here's my problem: I have an (m, n, 2) numpy array. To make things clear, I will refer to the first dimension as the population and the second as the samples; for each sample, the 0th column is frequency and the 1st column is amplitude. Within a sample, some of the frequencies are repeated, but the amplitudes are different. What I want is an efficient way of selecting one (and only one) random amplitude for each frequency and putting it in an output array. An example to make things clear: suppose the mth sample is

1, 2
2, 3
2, 4
3, 5

and the output should be

1, 2
2, 4 (random choice between 3 and 4)
3, 5

Furthermore, the frequencies in the output array must be the ones present in another list called freq_compare. I have working code, but it takes a while. If this helps, the frequencies are sorted, but I don't know beforehand how many duplicates there will be (if any), nor which frequencies will be duplicated.

Here's what I have so far:

def make_dict(sample):
    """Produce a dictionary with the frequencies as keys and amplitudes as values."""
    per_freq = dict()
    freqs = list(set(sample[:, 0]))  # get list of all unique frequencies
    for f in freqs:
        per_freq[f] = [line[1] for line in sample if line[0] == f]
    return per_freq

output_random = np.zeros((m, len(freq_compare), 2))
for i in range(m):
    d = make_dict(all_data[i]) #original array
    keys = list(d.keys())
    for j in range(len(freq_compare)):
        if freq_compare[j] in keys:
            amp = np.random.choice(d[freq_compare[j]])
            output_random[i,j,:] = (freq_compare[j], amp)
        else:
            output_random[i,j,:] = (freq_compare[j], 0.0)

Doing this 10 times took about 15 minutes, for an array of shape (3000, 400, 2). Is there a more efficient way? Maybe building the dictionary as I iterate the lines?
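Regarding building the dictionary as I iterate the lines: a single pass over the rows with dict.setdefault would avoid rescanning the whole sample once per unique frequency, which is what the list comprehension in make_dict does. A rough, unbenchmarked sketch (make_dict_single_pass is just an illustrative name):

```python
import numpy as np

def make_dict_single_pass(sample):
    """Build {frequency: [amplitudes]} in one pass over the rows."""
    per_freq = {}
    for freq, amp in sample:
        per_freq.setdefault(freq, []).append(amp)
    return per_freq

sample = np.array([[1, 2], [2, 3], [2, 4], [3, 5]])
d = make_dict_single_pass(sample)
# one random amplitude per frequency, e.g. np.random.choice(d[2])
```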

Thanks a lot

  • You might want to try the numpy.unique function; it can return unique indices, which you can then use to extract unique rows. Not sure if it is going to be faster. Commented Apr 30, 2020 at 16:32
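For context, the numpy.unique suggestion would look roughly like this; note that return_index always keeps the first duplicate row, so it does not by itself give a random choice among repeated frequencies:

```python
import numpy as np

sample = np.array([[1, 2], [2, 3], [2, 4], [3, 5]])
freqs, first_idx = np.unique(sample[:, 0], return_index=True)
unique_rows = sample[first_idx]  # first row per frequency: [[1, 2], [2, 3], [3, 5]]
```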

1 Answer


IIUC:

import numpy as np

m = 3000

freq_compare = np.random.choice(np.arange(0,20), 8, replace=False)
output_random = np.zeros((m, len(freq_compare), 2))
a = np.random.randint(0, 20, (m, 400, 2))

for i, r in enumerate(a):
    freqs = np.unique(r[:, 0])
    # for each comparison frequency, pick one random amplitude, or 0 if absent
    d = [[f, np.random.choice(r[r[:, 0] == f, 1])] if f in freqs else [f, 0]
         for f in freq_compare]
    output_random[i] = d

Which, using %timeit on my local machine, results in:

710 ms ± 97.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
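To make the per-row expression concrete, here is the same logic applied to the example sample from the question (the freq_compare values, including the absent frequency 7, are chosen just to exercise both branches):

```python
import numpy as np

r = np.array([[1, 2], [2, 3], [2, 4], [3, 5]])  # one sample
freq_compare = np.array([1, 2, 3, 7])           # 7 is absent from the sample
freqs = np.unique(r[:, 0])
d = [[f, np.random.choice(r[r[:, 0] == f, 1])] if f in freqs else [f, 0]
     for f in freq_compare]
# d[0] == [1, 2]; d[3] == [7, 0]; d[1][1] is randomly 3 or 4
```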

2 Comments

Thanks a lot, but this doesn't address the fact that some of the frequencies in freq_compare might not be in the sample. In that case, it should output (freq_compare[j], 0.0)
@bernie edited how d is computed to account for that as well
