I'm trying to efficiently sum the elements of separate data arrays grouped by their characteristics. Each array has three identifying characteristics (age, year, and cause), and for each age/year/cause combination I have 1000 values. When the characteristics are the same, I need to add those values to the values in another data array. Currently I'm doing something like this, where each dataset is ~(80000, 1000):
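```python
import numpy as np

datasets = np.vstack((dataset1, dataset2))  # vstack takes one sequence of arrays
output = {}
for a in ages:
    for y in years:
        for c in causes:
            # age, year and cause are per-row label arrays aligned with datasets
            mask = (age == a) & (year == y) & (cause == c)
            output[(a, y, c)] = np.sum(datasets[mask], axis=0)
```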
However, with 60,000 iterations, this is incredibly slow. The challenge is that the arrays don't necessarily all have the same shape. Any thoughts?
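A minimal sketch of a loop-free alternative, assuming `age`, `year` and `cause` are 1-D arrays with one entry per row of the stacked data (if the datasets have different numbers of rows, the label arrays would need to be concatenated in the same order as the data). It finds each distinct (age, year, cause) combination once with `np.unique(..., axis=0, return_inverse=True)` and then accumulates every row into its group with the unbuffered `np.add.at`, so the data is scanned once instead of once per combination. The helper name `group_sum` and the toy arrays are hypothetical.

```python
import numpy as np

def group_sum(values, *keys):
    """Sum the rows of `values` that share the same combination of keys.

    values: array of shape (n_rows, n_cols)
    keys:   1-D arrays of length n_rows (e.g. age, year, cause)
    Returns (unique_keys, sums): unique_keys has one row per distinct key
    combination, and sums[i] is the column-wise sum for unique_keys[i].
    """
    key_matrix = np.column_stack(keys)
    unique_keys, inverse = np.unique(key_matrix, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against versions returning a 2-D inverse
    sums = np.zeros((len(unique_keys), values.shape[1]), dtype=values.dtype)
    # Unbuffered scatter-add: rows with the same group index accumulate.
    np.add.at(sums, inverse, values)
    return unique_keys, sums

# Hypothetical toy data: 4 rows with 3 values each instead of 1000.
values = np.arange(12).reshape(4, 3)
age = np.array([1, 1, 2, 1])
year = np.array([2000, 2000, 2000, 2000])
cause = np.array([5, 5, 5, 6])
unique_keys, sums = group_sum(values, age, year, cause)
print(unique_keys)
print(sums)
```

`np.add.at` avoids making a sorted copy of an ~(160000, 1000) array; if it proves slow on an older NumPy, sorting the rows by `inverse` and calling `np.add.reduceat` on the group boundaries is a common alternative.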
I also tried `matplotlib.mlab.rec_groupby(datasets, groupby=('age', 'year', 'cause'), stats=(('values', np.sum, 'sums'),))` with a structured array that has age, year and cause as fields and a `values` field holding an array of length 1000. The problem is that I haven't figured out how to pass the `axis=0` argument to it, so instead of a per-position sum it collapses all 1000 values of the matching rows into a single total.
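A sketch of the same grouping done directly on a structured array with plain NumPy (`rec_groupby` has since been removed from newer matplotlib releases). It assumes a record layout like the one described, with key fields `age`, `year`, `cause` and a `values` subarray field; the field types and toy data are hypothetical, with a length-4 subarray standing in for 1000. `np.unique` on the packed key fields treats each record as one (age, year, cause) tuple, and because `rec["values"]` is a 2-D view of shape (n_records, 4), `np.add.at` adds matching records position by position, which is the per-column sum that `axis=0` was meant to give.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical structured array: three key fields plus a 'values' field
# holding a length-4 subarray per record (1000 in the question).
dtype = [("age", "i4"), ("year", "i4"), ("cause", "i4"), ("values", "f8", (4,))]
rec = np.zeros(5, dtype=dtype)
rec["age"] = [30, 30, 40, 30, 40]
rec["year"] = [2000, 2000, 2000, 2001, 2000]
rec["cause"] = [1, 1, 2, 1, 2]
rec["values"] = np.arange(20, dtype=float).reshape(5, 4)

# Copy the key fields into a packed structured array so that each record
# compares as a single (age, year, cause) tuple.
keys = rfn.repack_fields(rec[["age", "year", "cause"]])
groups, inverse = np.unique(keys, return_inverse=True)

# rec["values"] has shape (n_records, 4); accumulate it per group along axis 0.
sums = np.zeros((len(groups),) + rec["values"].shape[1:])
np.add.at(sums, inverse, rec["values"])
print(groups)
print(sums)
```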