
Consider a 1-D NumPy input array and a sorted index array. The goal is to compute sums over the input array a, split at the indices defined in the index array.

Below are two approaches, but both require slow Python for loops. Is there a pure NumPy version that avoids Python for loops?

Example:

import numpy as np

a = np.arange(20) # Input array
idxs = np.array([7, 15, 16]) # Index array

# Goal: Split a at index 7, 15 and 16 and
# compute sum for each partition

# Solution 1:
idxs_ext = np.concatenate(([0], idxs, [a.size]))
results = np.empty(idxs.size + 1)
for i in range(results.size):
    results[i] = a[idxs_ext[i]:idxs_ext[i+1]].sum()

# Solution 2:
result = np.array(
    [a_.sum() for a_ in np.split(a, idxs)]
)

# Result: array([21., 84., 15., 70.])
  • check out add.reduceat (Commented Dec 12, 2021 at 12:33)
  • would it be ok to use numba? (Commented Dec 12, 2021 at 18:06)
  • Of course, Numba is totally fine (Commented Dec 13, 2021 at 12:05)

1 Answer


First, you can split the a array at the positions in your idxs array with np.split and then apply your function to each piece:

np.stack(np.vectorize(np.sum)(np.array(np.split(a, idxs), dtype=object)))
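For reference, here is the same one-liner run on the example data from the question (an illustrative check only; the object-array step is needed because the partitions have unequal lengths):

import numpy as np

a = np.arange(20)
idxs = np.array([7, 15, 16])

# Ragged partitions stored as a 1-D object array, then summed element-wise
parts = np.array(np.split(a, idxs), dtype=object)
result = np.stack(np.vectorize(np.sum)(parts))
print(result)  # [21 84 15 70]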

Another approach uses np.add.reduceat, as mentioned by @hpaulj in the comments, and is faster:

np.add.reduceat(a, np.insert(idxs, 0, 0), axis=0)

Update:
Using np.concatenate instead of np.insert reduced the runtime by about 5x for an input of 1000 elements with 7 slices; it was the fastest method I examined:

np.add.reduceat(a, np.concatenate(([0], idxs)), axis=0)
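As a quick sanity check (illustrative only, not reproducing the timing above), the reduceat variant can be compared against the loop-based Solution 1 from the question on the example data:

import numpy as np

a = np.arange(20)             # input array from the question
idxs = np.array([7, 15, 16])  # split points

# Loop-based reference (Solution 1 from the question)
bounds = np.concatenate(([0], idxs, [a.size]))
reference = np.array([a[bounds[i]:bounds[i + 1]].sum() for i in range(bounds.size - 1)])

# reduceat variant from this answer
fast = np.add.reduceat(a, np.concatenate(([0], idxs)))

print(reference)                        # [21 84 15 70]
print(fast)                             # [21 84 15 70]
print(np.array_equal(reference, fast))  # True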

5 Comments

You iterate over the split arrays in Python. The only difference from my example is that you use map instead of a list comprehension, which is, as far as I know, not faster for single-process applications.
@Yannic Now it is pure NumPy. For large data, Numba and … could make your Python loops very efficient. See this useful link on SO.
I have answered your question as you asked for pure NumPy. But in terms of speed, as mentioned in the link in my previous comment, loops may be faster in some code. I tested your code and mine on the Google Colab TPU runtime; your first solution was the fastest of them when decorated with just @nb.jit(), which made it about 5 times faster than without Numba on this small sample case: 100 loops, best of 5: 5.5 µs per loop.
Adding the reduceat saves your answer (in my eyes), though I'd use the OP's idxs_ext[:-1] instead of the insert. np.vectorize is nearly always slower than a list comprehension. It's worth noting that np.split is itself a Python-level iteration.
@hpaulj Yes, you are right. In terms of speed, np.add.reduceat is the correct answer. It did not seem that you wanted to write the answer yourself, just to hint at it in a comment; I apologize if I used it too quickly in my answer (although I credited you). I checked the speed with np.concatenate(([0], idxs)) instead of np.insert(idxs, 0, 0), for an input of 1000 elements and 7 slices, and it was about 5 times faster than insert (4.7 µs per loop). I did not know that about np.split until now; thank you for this valuable point. I would appreciate it if you could point me to more reading on it.
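Since Numba came up in the comments, here is a minimal sketch of what the @nb.jit variant of the OP's first solution might look like (assumes numba is installed; the function name partition_sums is illustrative, not from the original post):

import numba as nb
import numpy as np

@nb.njit
def partition_sums(a, idxs):
    # Boundaries: 0, the split indices, and the array length
    bounds = np.empty(idxs.size + 2, dtype=np.int64)
    bounds[0] = 0
    bounds[1:-1] = idxs
    bounds[-1] = a.size
    # Sum each slice between consecutive boundaries
    out = np.empty(idxs.size + 1, dtype=np.int64)
    for i in range(out.size):
        out[i] = a[bounds[i]:bounds[i + 1]].sum()
    return out

a = np.arange(20)
idxs = np.array([7, 15, 16])
print(partition_sums(a, idxs))  # [21 84 15 70]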
