I have an empty array:

empty = np.array([0, 0, 0, 0, 0])

an array of indices corresponding to positions in my array empty

ind = np.array([2, 3, 1, 2, 4, 2, 4, 2, 1, 1, 1, 2])

and an array of values

val = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

I want to add the values in 'val' into 'empty' according to position given by 'ind'.

The non-vectorized solution is:

for i, v in zip(ind, val):
    empty[i] += v
>>> empty
array([0, 4, 5, 1, 2])

My actual arrays are multidimensional and very long, so speed matters: I really want a vectorized solution, or at least one that is very fast.

Note this does not work:

empty[ind] += val
>>> empty
array([0, 1, 1, 1, 1])

I'd be extra grateful for a solution that works in Python 2.7, 3.5, and 3.6 with no hiccups.

It's true that this is a duplicate, but my question title is much clearer. Commented Feb 9, 2017 at 16:19

3 Answers

You can use np.add.at, which is equivalent to empty[ind] += val except that results are accumulated for elements that are indexed more than once, giving you the cumulative total for those indices. It modifies empty in place and returns None.

>>> np.add.at(empty, ind, val)
>>> empty
array([0, 4, 5, 1, 2])
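
For comparison, here is a self-contained sketch that runs the three approaches from the question side by side on the question's own data. The names looped, buffered and accumulated are illustrative, not from the original post.

```python
import numpy as np

ind = np.array([2, 3, 1, 2, 4, 2, 4, 2, 1, 1, 1, 2])
val = np.ones(ind.size, dtype=int)

# Reference result: explicit Python loop.
looped = np.zeros(5, dtype=int)
for i, v in zip(ind, val):
    looped[i] += v

# Fancy-indexed += is buffered: a repeated index is only written once.
buffered = np.zeros(5, dtype=int)
buffered[ind] += val

# np.add.at is unbuffered, so repeated indices accumulate.
accumulated = np.zeros(5, dtype=int)
np.add.at(accumulated, ind, val)

print(looped.tolist())       # [0, 4, 5, 1, 2]
print(buffered.tolist())     # [0, 1, 1, 1, 1]
print(accumulated.tolist())  # [0, 4, 5, 1, 2]
```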

What you are looking for is e = np.bincount(ind, weights=val, minlength=n), where n is the length of your empty array. That way you don't have to initialize empty at all. For later updates you can accumulate in place with e += np.bincount(ind, weights=val, minlength=n); keep minlength there too, since without it the result is only max(ind) + 1 long and may not match the length of e.

On this small example it is roughly twice as fast as np.add.at:

%timeit np.bincount(ind, val, minlength=empty.size)
The slowest run took 12.69 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.05 µs per loop

%timeit np.add.at(empty, ind, val)
The slowest run took 2822.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.32 µs per loop
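
A minimal sketch of the bincount approach on the question's data, in case it helps to see it end to end. One thing worth knowing (my observation, not part of the timings above): when weights are supplied, np.bincount returns float64, so adding the result in place to an integer array would need an explicit cast. The names n and e are illustrative.

```python
import numpy as np

ind = np.array([2, 3, 1, 2, 4, 2, 4, 2, 1, 1, 1, 2])
val = np.ones(ind.size, dtype=int)
n = 5  # length of the target array

# First accumulation: bincount builds the array, no zero-initialisation needed.
e = np.bincount(ind, weights=val, minlength=n)
print(e.dtype)     # float64, because weights were supplied
print(e.tolist())  # [0.0, 4.0, 5.0, 1.0, 2.0]

# Later accumulation: pass minlength again so the shapes always match.
e += np.bincount(ind, weights=val, minlength=n)
print(e.tolist())  # [0.0, 8.0, 10.0, 2.0, 4.0]
```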

As for multi-dimensional indices, you can do:

np.bincount(np.ravel_multi_index(ind, empty.shape), np.ravel(val), minlength=empty.size).reshape(empty.shape)

I'm not sure how to do this with np.add.at to compare speeds
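
For what it's worth, np.add.at does accept multi-dimensional indices when they are passed as a tuple of index arrays, one per axis, which is the same form np.ravel_multi_index takes. A small sketch with made-up 2-D data (the arrays rows, cols, val and the shape (3, 4) are assumptions for illustration) checks that both routes agree:

```python
import numpy as np

shape = (3, 4)
rows = np.array([0, 2, 2, 1, 0, 2])
cols = np.array([1, 3, 3, 0, 1, 3])
val = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Route 1: unbuffered in-place add with a tuple of index arrays.
via_add_at = np.zeros(shape)
np.add.at(via_add_at, (rows, cols), val)

# Route 2: flatten the indices, bincount, then restore the shape.
flat = np.ravel_multi_index((rows, cols), shape)
via_bincount = np.bincount(
    flat, weights=val, minlength=via_add_at.size
).reshape(shape)

print(np.array_equal(via_add_at, via_bincount))  # True
```

With both versions written this way, the two can be timed against each other on realistic array sizes.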

4 Comments

Should this work if empty and val are multidimensional? For example, empty.shape = (5, 2, 2) and val.shape = (10, 2, 2)?
Not as written. You'd need to ravel_multi_index your indices, ravel empty and val, and reshape the end result. At that point np.add.at is probably faster, or at least more Pythonic. But that's not what you asked :)
It is not what I asked, you are right. I didn't expect it would matter. But thanks!
Added an implementation with bincount. I couldn't get np.add.at to take multiple indices; do you have working code for it?
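
For the shapes mentioned in these comments, a sketch assuming ind is a 1-D array of length 10 that indexes the first axis of empty (the random data here is made up for illustration). In that case np.add.at can be called exactly as in the 1-D case, and the result matches the explicit loop:

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the run is repeatable
empty = np.zeros((5, 2, 2))
ind = rng.randint(0, 5, size=10)  # positions along axis 0
val = rng.rand(10, 2, 2)          # one (2, 2) block per index

# Each val[k] is added to empty[ind[k]]; repeats accumulate.
np.add.at(empty, ind, val)

# Cross-check against the non-vectorized loop.
expected = np.zeros((5, 2, 2))
for i, v in zip(ind, val):
    expected[i] += v

print(np.allclose(empty, expected))  # True
```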

This is basically a histogram, so in the one-dimensional case:

h, b = np.histogram(ind, bins=np.arange(empty.size+1), weights=val)
empty += h

Of course, you can skip the second statement if empty contains only zeros, and use h directly.
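
A quick runnable check of the histogram approach on the question's data. The bin edges 0, 1, ..., 5 put each integer index in its own bin; float arrays are used throughout here (my choice) so the in-place add needs no cast.

```python
import numpy as np

ind = np.array([2, 3, 1, 2, 4, 2, 4, 2, 1, 1, 1, 2])
val = np.ones(ind.size)
empty = np.zeros(5)

# One bin per index position: edges [0, 1, 2, 3, 4, 5].
h, edges = np.histogram(ind, bins=np.arange(empty.size + 1), weights=val)
empty += h

print(empty.tolist())  # [0.0, 4.0, 5.0, 1.0, 2.0]
```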

1 Comment

I removed the part about np.bincount, because @DanielForsman already gave that answer, and I only saw after editing.
