I'm trying to work out a more efficient way of doing this.
Here's my problem: I have an (m, n, 2) NumPy array. To make things clear, I'll refer to the first dimension as the population and the second as the samples; for each sample, column 0 is frequency and column 1 is amplitude. Within a sample, some frequencies are repeated with different amplitudes. What I want is an efficient way of selecting one (and only one) random amplitude for each frequency and putting it in an output array.
An example to illustrate. Suppose one sample is:
1, 2
2, 3
2, 4
3, 5
and the output should be
1, 2
2, 4 (random choice between 3 and 4)
3, 5
Furthermore, the frequencies in the output array must be the ones present in another list called freq_compare. I have working code, but it's slow. If it helps, the frequencies are sorted, but I don't know beforehand how many duplicates there will be (if any), nor which frequencies will be duplicated.
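Since the frequencies are sorted, the grouping step can be sketched with np.unique on the toy sample above (variable names are illustrative, not from my real code):

```python
import numpy as np

# Toy sample from the example above
sample = np.array([[1.0, 2.0],
                   [2.0, 3.0],
                   [2.0, 4.0],
                   [3.0, 5.0]])

# For a sorted frequency column, np.unique gives the index where each
# run of duplicates starts; np.split then groups the amplitudes per frequency.
freqs, starts = np.unique(sample[:, 0], return_index=True)
groups = np.split(sample[:, 1], starts[1:])
# freqs  -> [1., 2., 3.]
# groups -> [array([2.]), array([3., 4.]), array([5.])]
```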
Here's what I have so far:
def make_dict(sample):
    """Produce a dictionary with the frequencies as keys and amplitudes as values."""
    per_freq = dict()
    freqs = set(sample[:, 0])  # all distinct frequencies
    for f in freqs:
        per_freq[f] = [line[1] for line in sample if line[0] == f]
    return per_freq
output_random = np.zeros((m, len(freq_compare), 2))
for i in range(m):
    d = make_dict(all_data[i])  # all_data is the original (m, n, 2) array
    for j in range(len(freq_compare)):
        if freq_compare[j] in d:
            amp = np.random.choice(d[freq_compare[j]])
            output_random[i, j, :] = (freq_compare[j], amp)
        else:
            output_random[i, j, :] = (freq_compare[j], 0.0)
Running this 10 times took about 15 minutes for an array of shape (3000, 400, 2). Is there a more efficient way? Maybe building the dictionary as I iterate over the rows?
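To show what I mean by building the dictionary while iterating: here's a sketch of that idea (pick_random_amplitudes is a made-up name; it assumes all_data and freq_compare as above). It replaces the per-frequency scan in make_dict, which walks the whole sample once for every distinct frequency, with a single pass over the rows:

```python
import numpy as np
from collections import defaultdict

def pick_random_amplitudes(all_data, freq_compare):
    """One-pass variant: collect amplitudes per frequency while iterating
    the rows, instead of re-scanning the sample once per frequency."""
    m = all_data.shape[0]
    out = np.zeros((m, len(freq_compare), 2))
    out[:, :, 0] = freq_compare  # frequency column is the same for every sample
    for i in range(m):
        per_freq = defaultdict(list)
        for f, a in all_data[i]:
            per_freq[f].append(a)  # single pass over this sample's rows
        for j, f in enumerate(freq_compare):
            amps = per_freq.get(f)  # .get avoids creating empty entries
            if amps:
                out[i, j, 1] = np.random.choice(amps)
    return out
```

Amplitudes for frequencies missing from a sample stay at the 0.0 the array was initialized with, as in the original loop.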
Thanks a lot