
I'm trying to work out a more efficient way of doing this. Here's my problem: I have an (m, n, 2) numpy array. To make things clear, I will refer to the first dimension as the population and the second as the samples; for each sample, the 0th column is frequency and the 1st column is amplitude. Within a sample, some of the frequencies are repeated, but the amplitudes are different. What I want is an efficient way of selecting one (and only one) random amplitude for each frequency and putting it in an output array. An example to make things clear: suppose the mth sample is

1, 2
2, 3
2, 4
3, 5

and the output should be

1, 2
2, 4 (random choice between 3 and 4)
3, 5

Furthermore, the frequencies in the output array must be the ones present in another list called freq_compare. I have working code, but it takes a while. If this helps, the frequencies are sorted, but I don't know beforehand how many duplicates there will be (if any), nor which frequencies will be duplicated.

Here's what I have so far:

def make_dict(sample):
    """Produce a dictionary with the frequencies as keys and amplitudes as values."""
    per_freq = dict()
    freqs = list(set(sample[:, 0]))  # get list of all unique frequencies
    for f in freqs:
        per_freq[f] = [line[1] for line in sample if line[0] == f]
    return per_freq

output_random = np.zeros((m, len(freq_compare), 2))
for i in range(m):
    d = make_dict(all_data[i]) #original array
    keys = list(d.keys())
    for j in range(len(freq_compare)):
        if freq_compare[j] in keys:
            amp = np.random.choice(d[freq_compare[j]])
            output_random[i,j,:] = (freq_compare[j], amp)
        else:
            output_random[i,j,:] = (freq_compare[j], 0.0)

Doing this 10 times took about 15 minutes, for an array of shape (3000, 400, 2). Is there a more efficient way? Maybe building the dictionary as I iterate the lines?
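Regarding building the dictionary as I iterate the lines: a single pass over the rows with dict.setdefault would avoid rescanning the whole sample once per unique frequency, which is what the list comprehension in make_dict does. A rough, unbenchmarked sketch (make_dict_single_pass is just an illustrative name):

```python
import numpy as np

def make_dict_single_pass(sample):
    """Build {frequency: [amplitudes]} in one pass over the rows."""
    per_freq = {}
    for freq, amp in sample:
        per_freq.setdefault(freq, []).append(amp)
    return per_freq

sample = np.array([[1, 2], [2, 3], [2, 4], [3, 5]])
d = make_dict_single_pass(sample)
# one random amplitude per frequency, e.g. np.random.choice(d[2])
```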

Thanks a lot

  • You might want to try the numpy.unique function; it can return unique indices, which you can then use to extract unique rows. Not sure if it is going to be faster. Commented Apr 30, 2020 at 16:32
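For context, the numpy.unique suggestion would look roughly like this; note that return_index always keeps the first duplicate row, so it does not by itself give a random choice among repeated frequencies:

```python
import numpy as np

sample = np.array([[1, 2], [2, 3], [2, 4], [3, 5]])
freqs, first_idx = np.unique(sample[:, 0], return_index=True)
unique_rows = sample[first_idx]  # first row per frequency: [[1, 2], [2, 3], [3, 5]]
```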

1 Answer


IIUC:

import numpy as np

m = 3000

freq_compare = np.random.choice(np.arange(0,20), 8, replace=False)
output_random = np.zeros((m, len(freq_compare), 2))
a = np.random.randint(0, 20, (m, 400, 2))

for i, r in enumerate(a):
    freqs = np.unique(r[:, 0])
    # for each comparison frequency, pick one random amplitude, or 0 if absent
    d = [[f, np.random.choice(r[r[:, 0] == f, 1])] if f in freqs else [f, 0]
         for f in freq_compare]
    output_random[i] = d

Which, using %timeit on my local machine, results in:

710 ms ± 97.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
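To make the per-row expression concrete, here is the same logic applied to the example sample from the question (the freq_compare values, including the absent frequency 7, are chosen just to exercise both branches):

```python
import numpy as np

r = np.array([[1, 2], [2, 3], [2, 4], [3, 5]])  # one sample
freq_compare = np.array([1, 2, 3, 7])           # 7 is absent from the sample
freqs = np.unique(r[:, 0])
d = [[f, np.random.choice(r[r[:, 0] == f, 1])] if f in freqs else [f, 0]
     for f in freq_compare]
# d[0] == [1, 2]; d[3] == [7, 0]; d[1][1] is randomly 3 or 4
```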

2 Comments

Thanks a lot, but this doesn't address the fact that some of the frequencies in freq_compare might not be in the sample. In that case, it should output (freq_compare[j], 0.0)
@bernie edited how d is computed to account for that as well
