5

I'm trying to implement convolutional neural network in Python.
However, when I use signal.convolve or np.convolve, it can not do convolution on X, Y(X is 3d, Y is 2d). X are training minibatches. Y are filters. I don't want to do for loop for every training vector like:

for i in xrange(X.shape[2]):
    result = signal.convolve(X[:,:,i], Y, 'valid')
    ....

So, is there any function I can use to do convolution efficiently?

0

1 Answer 1

5

Scipy implements standard N-dimensional convolutions, so that the matrix to be convolved and the kernel are both N-dimensional.

A quick fix would be to add an extra dimension to Y so that Y is 3-Dimensional:

result = signal.convolve(X, Y[..., None], 'valid')

I'm assuming here that the last axis corresponds to the image index as in your example [width, height, image_idx] (or [height, width, image_idx]). If it is the other way around and the images are indexed in the first axis (as it is more common in C-ordering arrays) you should replace Y[..., None] with Y[None, ...].

The line Y[..., None] will add an extra axis to Y, making it 3-dimensional [kernel_width, kernel_height, 1] and thus, converting it to a valid 3-Dimensional convolution kernel.

NOTE: This assumes that all your input mini-batches have the same width x height, which is standard in CNN's.


EDIT: Some timings as @Divakar suggested.

The testing framework is setup as follows:

def test(S, N, K):
    """ S: image size, N: num images, K: kernel size"""
    a = np.random.randn(S, S, N)
    b = np.random.randn(K, K)
    valid = [slice(K//2, -K//2+1), slice(K//2, -K//2+1)]

    %timeit signal.convolve(a, b[..., None], 'valid')
    %timeit signal.fftconvolve(a, b[..., None], 'valid')
    %timeit ndimage.convolve(a, b[..., None])[valid]

Find bellow tests for different configurations:

  • Varying image size S:

    >>> test(100, 50, 11) # 100x100 images
    1 loop, best of 3: 909 ms per loop
    10 loops, best of 3: 116 ms per loop
    10 loops, best of 3: 54.9 ms per loop
    
    >>> test(1000, 50, 11) # 1000x1000 images
    1 loop, best of 3: 1min 51s per loop
    1 loop, best of 3: 16.5 s per loop
    1 loop, best of 3: 5.66 s per loop
    
  • Varying number of images N:

    >>> test(100, 5, 11) # 5 images
    10 loops, best of 3: 90.7 ms per loop
    10 loops, best of 3: 26.7 ms per loop
    100 loops, best of 3: 5.7 ms per loop
    
    >>> test(100, 500, 11) # 500 images
    1 loop, best of 3: 9.75 s per loop
    1 loop, best of 3: 888 ms per loop
    1 loop, best of 3: 727 ms per loop
    
  • Varying kernel size K:

    >>> test(100, 50, 5) # 5x5 kernels
    1 loop, best of 3: 217 ms per loop
    10 loops, best of 3: 100 ms per loop
    100 loops, best of 3: 11.4 ms per loop
    
    >>> test(100, 50, 31) # 31x31 kernels
    1 loop, best of 3: 4.39 s per loop
    1 loop, best of 3: 220 ms per loop
    1 loop, best of 3: 560 ms per loop
    

So, in short, ndimage.convolve is always faster, except when the kernel size is very large (as K = 31 in the last test).

Sign up to request clarification or add additional context in comments.

6 Comments

Good find! For a faster version use ndimage.filters.convolve(X,Y[...,None]) maybe with some slicing?
@Divakar it would be nice to time it (I'll do it if I can get some time), but I think that ndimage,filters.convolve should be slower for large kernel sizes (as it does convolution in the spatial domain whereas signal.convolve does it in the fourier domain). The advantage of the ndimage filters are that support different boundary conditions (different from fourier which I think that by definition is equivalent to padding the image with 0s). I'll do the timings asap!
@Divakar tests done! I'm feeling like this is some kind of dejavu. As you suggested, ndimage.convolve tends to be faster in general. I remember having a similar conversation about this in a different thread haha
You know what that's good memory and not just dejavu! ;) Here's the post I think you were referring to : stackoverflow.com/a/38232755/3293881
@Divakar Haha exactly that one yes! That time I was proposing uniform_filter over signal.convolve lol. Although uniform_filter was faster due to the separability of the kernels, the above results seem to confirm the conclusion we arrived in that thread: for large kernel sizes fourier (fftconvolve) > spatial, if not spatial > fourier.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.