
I have to do many loops of the following type:

for i in range(len(a)):
    for j in range(i+1):
        c[i] += a[j]*b[i-j]

where a and b are short arrays of the same size (between about 10 and 50). This can be done efficiently using a convolution:

import numpy as np
np.convolve(a, b) 

However, this gives me the full convolution (i.e. the vector is too long, compared to the for loop above). If I use the 'same' option in convolve, I get the central part, but what I want is the first part. Of course, I can chop off what I don't need from the full vector, but I would like to get rid of the unnecessary computation time if possible. Can someone suggest a better vectorization of the loops?
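To make the equivalence concrete, here is a small self-contained check (with made-up random inputs in the 10–50 size range from the question) that the first len(a) entries of the full convolution match the double loop:

```python
import numpy as np

# Hypothetical inputs, just for illustration.
rng = np.random.default_rng(0)
a = rng.standard_normal(20)
b = rng.standard_normal(20)

# The double loop from the question.
c = np.zeros_like(a)
for i in range(len(a)):
    for j in range(i + 1):
        c[i] += a[j] * b[i - j]

# Equivalent: the first len(a) entries of the full convolution.
assert np.allclose(c, np.convolve(a, b)[:len(a)])
```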

  • If the arrays are short, then why bother? Is this the bottleneck in your code? Commented Oct 10, 2012 at 9:51
  • @larsmans yes it is the bottleneck. I know it may not seem like a lot, but in principle I think the speedup could be a factor of 2, which would be great. Also, it could be interesting if someone wants to do it with a larger array. Commented Oct 10, 2012 at 10:23
  • How short is 'short', and is either input static? Commented Oct 10, 2012 at 13:36
  • @mtrw sorry should have specified this - between 10 and 50 (edited in question). One of the inputs changes for every call, the other stays the same, so I suppose it could be made static? Commented Oct 10, 2012 at 14:12
  • You may consider using convolution facility provided by scipy: docs.scipy.org/doc/scipy/reference/generated/… Commented Oct 30, 2020 at 13:13

2 Answers


You could write a small C extension in Cython:

# cython: boundscheck=False
cimport numpy as np
import numpy as np  # zeros_like

ctypedef np.float64_t np_t
def convolve_cy_np(np.ndarray[np_t] a not None,
                   np.ndarray[np_t] b not None,
                   np.ndarray[np_t] c=None):
    if c is None:
        c = np.zeros_like(a)
    cdef Py_ssize_t i, j, n = c.shape[0]
    with nogil:
        for i in range(n):
            for j in range(i + 1):
                c[i] += a[j] * b[i - j]
    return c

It performs well for n=10..50 compared to np.convolve(a,b)[:len(a)] on my machine.

It also looks like a job for numba.


3 Comments

Yeah, I've never played with numba but it seems like it could be a great way to go.
Nice! It gives me approximately a 2x speedup factor. If I look at the generated C code, there is still some overhead in the for loops (checking for something < 0). Is it possible to get rid of this, too?
OK, I can set wraparound=False, but that does not improve speed. I suppose this is really the best answer. Thanks!

There is no way to compute only the first part of a convolution with vectorized array manipulations in NumPy. Your best bet is to use np.convolve(a, b, mode='same') and trim off what you don't need. That's probably going to be 10x faster than the double loop in pure Python that you have above. You could also roll your own with Cython if you are really concerned about speed, but it likely won't be much, if any, faster than np.convolve().

1 Comment

You are probably right that trimming off is the best solution. For my case I would do np.convolve(a, b)[:len(a)] (which is the 'full' mode, not the 'same' mode).
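For reference, a small sketch (with made-up inputs) of how the two modes line up: 'same' is just a shifted window of the full result, while the first part the question wants is full[:len(a)]:

```python
import numpy as np

# Hypothetical short inputs, just to illustrate the mode difference.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0, 7.0, 8.0])

full = np.convolve(a, b)        # length 2*len(a) - 1 == 7
first = full[:len(a)]           # first part: [5, 16, 34, 60], matches the loop
same = np.convolve(a, b, mode='same')  # central part, offset by (len(b)-1)//2

start = (len(b) - 1) // 2
assert np.array_equal(same, full[start:start + len(a)])
```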
