Cycling through array elements efficiently in Python

Question

I'm trying to efficiently sum the rows of separate data arrays by their characteristics. Each row has three identifying characteristics (age, year, and cause), and for each age, year, cause combination I have 1000 values. I need to add those values to the rows of another data array when the characteristics are the same. For now, I'm doing something like this, where each dataset has a shape of roughly (80000, 1000):

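import numpy as np
datasets = np.vstack((dataset1, dataset2))  # vstack takes one sequence of arrays
output = {}  # keep every group's sum instead of overwriting a single variable
for a in ages:
    for y in years:
        for c in causes:
            # age, year, cause hold one label per row of datasets
            output[a, y, c] = np.sum(datasets[(age == a) & (year == y) & (cause == c)], axis=0)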

However, with 60,000 iterations, this is incredibly slow. The challenge is that the arrays don't necessarily all have the same shape. Any thoughts?

I thought of the groupby function of matplotlib.mlab. This would be something like: matplotlib.mlab.rec_groupby(datasets, groupby = ('age', 'year', 'cause', ), stats = (('values', np.sum, 'sums' ), )), using a structured array with age, year and cause as fields and values as a field holding an array of length 1000. The problem is that I have not figured out how to pass the axis = 0 argument, so it currently sums all 1000 values of the different rows into one total.
– joris, Commented Sep 12, 2011 at 21:00
I found a great result here: stackoverflow.com/questions/7416901/…
– mike, Commented Sep 17, 2011 at 4:47
Ten years after this question was asked, things have changed: there is now a package that does the job for you: github.com/ml31415/numpy-groupies
– Michael, Commented Jun 10, 2022 at 19:10

Carl F. · Accepted Answer · 2011-09-12 03:08:09Z

2

I'd recommend something like accumarray. Your output should be a 3-dimensional data cube where each dimension corresponds to a variable (age, year, cause). Each index in each dimension corresponds to a unique value in your input lists. You can then use something like this cookbook example to accumulate the datasets variable into the appropriate bins using age, year, and cause.
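NumPy itself has no function named accumarray, so here is a minimal sketch of the same binning idea using np.unique and np.add.at. The function name group_sum and the tiny sample arrays are made up for illustration; it assumes age, year, and cause are 1-D label arrays with one entry per row of values.

```python
import numpy as np

def group_sum(age, year, cause, values):
    """Sum the rows of `values` that share the same (age, year, cause)."""
    # return_inverse maps every label to an integer bin index along its axis.
    ages, ai = np.unique(age, return_inverse=True)
    years, yi = np.unique(year, return_inverse=True)
    causes, ci = np.unique(cause, return_inverse=True)

    # One slot per (age, year, cause) combination, each holding a full row.
    cube = np.zeros((ages.size, years.size, causes.size, values.shape[1]),
                    dtype=values.dtype)
    # Unbuffered in-place add, so rows landing in the same bin accumulate.
    np.add.at(cube, (ai, yi, ci), values)
    return cube, ages, years, causes

# Hypothetical sample data: 4 rows with 2 values each (not 1000).
age = np.array([30, 30, 40, 30])
year = np.array([2000, 2000, 2000, 2001])
cause = np.array([1, 1, 1, 1])
values = np.arange(8.0).reshape(4, 2)

cube, ages, years, causes = group_sum(age, year, cause, values)
print(cube[0, 0, 0])  # sum of the two rows with age 30, year 2000, cause 1
```

This replaces the triple loop with a single pass over the rows. The cube is dense, so it reserves a slot for every combination of labels, including combinations that never occur in the data.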

You might also consider using a proper relational database. They're quite fast at these sorts of things, and Python ships with sqlite3 as part of the standard library. Unfortunately, the learning curve is rather steep if you've never worked with a relational database before. You'll want to use GROUP BY together with an aggregate function such as SUM.
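As a rough sketch of the database route, the following uses an in-memory SQLite database. The table name obs, the column name draw (the position of a value within its row of 1000), and the sample rows are all assumptions for illustration; the data is stored in long form, one row per value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obs (age INTEGER, year INTEGER, cause INTEGER, "
    "draw INTEGER, value REAL)"
)

# Hypothetical sample: two source rows for (30, 2000, 1), one for (40, 2000, 1),
# each with only 2 values instead of 1000.
rows = [
    (30, 2000, 1, 0, 0.5),
    (30, 2000, 1, 1, 1.5),
    (30, 2000, 1, 0, 2.0),
    (30, 2000, 1, 1, 3.0),
    (40, 2000, 1, 0, 4.0),
]
conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?, ?)", rows)

# Group on the three characteristics plus the value position, so each of the
# positions is summed separately (the equivalent of axis=0 in NumPy).
result = conn.execute(
    "SELECT age, year, cause, draw, SUM(value) FROM obs "
    "GROUP BY age, year, cause, draw "
    "ORDER BY age, year, cause, draw"
).fetchall()
conn.close()

for row in result:
    print(row)
```

Including draw in the GROUP BY clause is what keeps the per-position sums separate; leaving it out would collapse each group to a single total.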

edited Sep 12, 2011 at 3:08

answered Sep 12, 2011 at 2:33

Carl F.


Community · Accepted Answer · 2017-05-23 10:34:29Z

0

SEE LINK BELOW

I'm not sure how to properly link another answer to this one. When I tried a single sentence followed by the link, it was converted from an answer to a comment. I'm now being long-winded so that Stack Overflow considers this text long enough to count as an answer. Here is the link to a great answer to this question.

Summing Arrays by Characteristics in Python

edited May 23, 2017 at 10:34

CommunityBot


answered Sep 18, 2011 at 20:35

mike
