
I've got code from my supervisor to implement MDCT polyphase analysis and synthesis. Unfortunately, it includes one very slow function with two nested loops. If somebody can help me simplify this function and make it faster, I would appreciate it. This is the relevant part of the code:

import numpy as np

def polmatmult(A, B):
    """polmatmult(A, B)
    Multiplies two polynomial matrices (arrays) A and B, where each matrix entry is a polynomial.
    The polynomial entries are in the 3rd dimension.
    The third dimension can also be interpreted as containing the (2D) coefficient matrices for each exponent of z^-1.
    Result is C = A*B."""

    print("np.shape(A)", np.shape(A))
    print("np.shape(B)", np.shape(B))
    [NAx, NAy, NAz] = np.shape(A)
    [NBx, NBy, NBz] = np.shape(B)

    # Degree + 1 of the resulting polynomial, with NAz-1 and NBz-1 being
    # the degrees of the input polynomials:
    Deg = NAz + NBz - 1
    print("Deg", Deg)
    C = np.zeros((NAx, NBy, Deg))

    # Convolution of matrices:
    for n in range(Deg):
        for m in range(n + 1):
            if (n - m) < NAz and m < NBz:
                C[:, :, n] = C[:, :, n] + np.dot(A[:, :, n - m], B[:, :, m])

    return C
  • Can you give an example of your arrays A, B? Commented Dec 3, 2017 at 21:01
  • The biggest problem is getting rid of the loops. Maybe a list comprehension would speed up the process? If so, how can I rewrite these two loops? Commented Dec 4, 2017 at 10:31
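For readers wondering what such arrays look like (the question gives no concrete values, so the shapes and entries below are illustrative assumptions only): each polynomial matrix is stored as a 3-D array whose last axis indexes the coefficient matrices of z^0, z^-1, and so on.

```python
import numpy as np

# Illustrative only: a 2x2 polynomial matrix whose entries are
# degree-1 polynomials in z^-1 is stored with shape (2, 2, 2).
# A[:, :, 0] is the coefficient matrix of z^0, A[:, :, 1] of z^-1.
A = np.zeros((2, 2, 2))
A[:, :, 0] = np.eye(2)             # constant term: identity matrix
A[:, :, 1] = [[0, 1], [1, 0]]      # z^-1 term
print(A.shape)   # (2, 2, 2)
```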

3 Answers


First of all, I'm surprised that there is np.dot there and not np.multiply. The convolution already happens in the for-loops, and the per-coefficient operation should broadcast over the first two dimensions, right? Anyway, I will work with np.multiply below; change it back to np.dot accordingly if I'm wrong.
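For reference, the two operations really do differ on the 2-D coefficient slices, so the choice matters: np.dot is a matrix product, np.multiply is entry-by-entry. A tiny illustration:

```python
import numpy as np

# np.dot: rows of a times columns of b (matrix product).
# np.multiply: entry-by-entry product of same-shaped arrays.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

print(np.dot(a, b))       # [[19. 22.] [43. 50.]]
print(np.multiply(a, b))  # [[ 5. 12.] [21. 32.]]
```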

If this function is a real bottleneck I would use Cython to improve the speed. This is an example of the code:

myconvolve.pyx

import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def myconvolve(np.ndarray[np.float64_t, ndim=3] A,
               np.ndarray[np.float64_t, ndim=3] B):
    cdef:
        int n, m, i, j
        int NAx = A.shape[0], NAy = A.shape[1], NAz = A.shape[2]
        int NBx = B.shape[0], NBy = B.shape[1], NBz = B.shape[2]
        int Deg = NAz + NBz - 1
        np.ndarray[np.float64_t, ndim=3] C = np.zeros((NAx, NBy, Deg))
    assert((NAx == NBx) and (NAy == NBy))

    for n in range(0, (Deg)):
        for m in range(0, n + 1):
            if ((n - m) < NAz and m < NBz):
                for i in range(0, NAx):
                    for j in range(0, NAy):
                        C[i, j, n] = C[i, j, n] + A[i, j, (n - m)] * B[i, j, m]

    return C

This has to be compiled; I did it with

cython myconvolve.pyx -v -2
gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing       -I/usr/include/python2.7 -o myconvolve.so myconvolve.c
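The manual gcc invocation above is tied to Python 2.7's include paths; an equivalent build can be driven from a small setup.py instead (this is my assumption of the usual Cython build recipe, not part of the answer):

```python
# setup.py -- assumed setuptools-based alternative to the manual gcc call.
# Build with:  python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("myconvolve.pyx"),
    include_dirs=[np.get_include()],  # needed for "cimport numpy"
)
```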

Then with the following comparison script

import timeit
import numpy as np
from myconvolve import myconvolve

def original_convolution(A, B):
    [NAx, NAy, NAz] = np.shape(A);
    [NBx, NBy, NBz] = np.shape(B);

    Deg = NAz + NBz - 1;
    C = np.zeros((NAx, NBy, Deg));

    for n in range(0, (Deg)):
        for m in range(0, n + 1):
            if ((n - m) < NAz and m < NBz):
                C[:, :, n] = C[:, :, n] + np.multiply(A[:, :, (n - m)], B[:, :, m])

    return C

print "Checking that implementations produce identical results."
A = np.random.rand(20, 20, 20)
B = np.random.rand(20, 20, 20)
C1 = original_convolution(A, B)
C2 = myconvolve(A, B)
assert(np.abs((C1 - C2).sum()) < 1.e-6)

mysetup = '''
import numpy as np
np.random.seed(0)
from myconvolve import myconvolve
from __main__ import A, B
from __main__ import original_convolution
'''

print 'Numpy implementation time [s]: ', min(timeit.Timer('original_convolution(A, B)', setup=mysetup).repeat(7, 100))
print 'Cython implementation time [s]: ', min(timeit.Timer('myconvolve(A, B)', setup=mysetup).repeat(7, 100))

I get:

Numpy implementation time [s]:  0.494730949402
Cython implementation time [s]:  0.0905570983887



EDIT: I realize now that poly1d is far more inefficient than the original solution, mainly due to poly1d being implemented in Python instead of C. A comparison of their pruns is not pretty:

    %prun np.dot(mod_A, mod_B)
          888804 function calls (872804 primitive calls) in 0.436 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    47200    0.069    0.000    0.168    0.000 polynomial.py:1076(__init__)
78800/62800    0.053    0.000    0.085    0.000 {built-in method numpy.core.multiarray.array}
    46800    0.036    0.000    0.063    0.000 shape_base.py:11(atleast_1d)
    31600    0.030    0.000    0.036    0.000 function_base.py:2209(trim_zeros)
    47200    0.024    0.000    0.024    0.000 {method 'copy' of 'numpy.ndarray' objects}
     8000    0.023    0.000    0.023    0.000 {built-in method numpy.core.multiarray.correlate}
    47200    0.020    0.000    0.050    0.000 polynomial.py:1041(coeffs)
     8000    0.020    0.000    0.304    0.000 polynomial.py:1183(__mul__)
     7600    0.019    0.000    0.041    0.000 polynomial.py:683(polyadd)
     8000    0.016    0.000    0.123    0.000 numeric.py:978(convolve)
     8000    0.014    0.000    0.204    0.000 polynomial.py:790(polymul)
     7600    0.014    0.000    0.119    0.000 polynomial.py:1197(__add__)
        1    0.013    0.013    0.436    0.436 {built-in method numpy.core.multiarray.dot}
    94400    0.012    0.000    0.012    0.000 {built-in method builtins.isinstance}
   157200    0.011    0.000    0.011    0.000 {built-in method builtins.len}
    16000    0.011    0.000    0.034    0.000 polynomial.py:1103(__array__)
    46800    0.010    0.000    0.019    0.000 numeric.py:534(asanyarray)
    62800    0.009    0.000    0.009    0.000 polynomial.py:1064(_coeffs)
    47200    0.009    0.000    0.009    0.000 polynomial.py:1067(_coeffs)
     8000    0.008    0.000    0.009    0.000 numeric.py:2135(isscalar)
     8000    0.005    0.000    0.007    0.000 numeric.py:904(_mode_from_name)
    46800    0.004    0.000    0.004    0.000 {method 'append' of 'list' objects}
    16000    0.004    0.000    0.007    0.000 numeric.py:463(asarray)
    31600    0.003    0.000    0.003    0.000 {method 'upper' of 'str' objects}
     8000    0.001    0.000    0.001    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.436    0.436 <string>:1(<module>)
        1    0.000    0.000    0.436    0.436 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

%prun polymatmult(A, B)
          7 function calls in 0.004 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.004    0.004    0.004    0.004 <ipython-input-1053-a9f13893aa45>:1(original_convolution)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.zeros}
        1    0.000    0.000    0.004    0.004 {built-in method builtins.exec}
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 fromnumeric.py:1565(shape)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

However, I will stand by it at least being easier.


If you used the poly1d type, this would be so much easier:

random_poly = np.frompyfunc(lambda i, j: np.poly1d(np.random.randint(1, 4, 3)), 2, 1)
def random_poly_array(shape):    
    return np.fromfunction(random_poly, shape)

a1 = random_poly_array((3,3))
a2 = random_poly_array((3,3))
mult_a = np.dot(a1, a2)

2 Comments

One can use np.convolve on ordinary NumPy arrays instead of poly1d objects. The problem here is how to do it fast with the two additional dimensions, avoiding for-loops over them.
Converting a 3-D array into a 2-D poly1d array won't work either, for some reason. Logically, np.apply_along_axis(np.poly1d, 0, arr) would change it into a new array, but for some reason it keeps it as the old one.
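Following up on the np.convolve remark: one fully vectorized alternative (my own sketch, not from any of the answers) is to perform the convolution along the third axis with FFTs. Linear convolution with zero-padding becomes elementwise multiplication in the frequency domain, and np.einsum handles the matrix product over the first two axes per frequency bin. The helper name polmatmult_fft is hypothetical:

```python
import numpy as np

def polmatmult_fft(A, B):
    """Sketch of an FFT-based polynomial matrix product (assumed helper,
    not from the original code). Zero-padding both inputs to the output
    degree turns circular convolution into the desired linear one."""
    Deg = A.shape[2] + B.shape[2] - 1
    Af = np.fft.rfft(A, n=Deg, axis=2)   # zero-pad, transform along z
    Bf = np.fft.rfft(B, n=Deg, axis=2)
    # Matrix product over the first two axes, per frequency bin:
    Cf = np.einsum('ijz,jkz->ikz', Af, Bf)
    return np.fft.irfft(Cf, n=Deg, axis=2)
```

This trades the O(Deg^2) coefficient loop for O(Deg log Deg) transforms, at the cost of a little floating-point round-off relative to the direct sum.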

A little axis manipulation to allow easier dot products, plus ufunc.at, removes the for-loops:

def polmatmult_(A, B):
    print("np.shape(A)", np.shape(A))
    print("np.shape(B)", np.shape(B))
    [NAx, NAy, NAz] = np.shape(A)
    [NBx, NBy, NBz] = np.shape(B)

    Deg = NAz + NBz - 1
    print("Deg", Deg)
    C = np.zeros((Deg, NAx, NBy))

    m, n = np.triu_indices(NBz, 0, Deg)
    m, n = m[n - m < NAz], n[n - m < NAz]
    np.add.at(C, n,  np.moveaxis(A[:, :, (n - m)], -1, 0) @ np.moveaxis(B[:, :, m], -1, 0))

    return np.moveaxis(C, 0, -1)

In general, you want your indexing axis (z in this case) to be the first dimension and not the last. This lets you use ufunc tricks (like add.at), @ instead of np.dot, and broadcasting. Hence all the np.moveaxis calls.
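The axis-ordering point can be seen in isolation (a small sketch, not part of the answer): with z moved to the front, each A_z[n] is a contiguous 2-D matrix, and @ performs one matrix product per leading index in a single batched call.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4, 3))        # z last, as in the question's layout
B = rng.random((4, 4, 3))

A_z = np.moveaxis(A, -1, 0)      # shape (3, 4, 4): z is now first
B_z = np.moveaxis(B, -1, 0)
C_z = A_z @ B_z                  # batched matrix product per z index
print(C_z.shape)                 # (3, 4, 4)
```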

6 Comments

Truth be told, this is the first time I've seen such an implementation (np.add.at) :-) and I've got an error regarding this "@". Is it a typo, or should there be something else?
First of all, thanks for your help. I usually use Python 2.7, and apparently this "@" operator only works in Python 3.5+. Is that right? Nevertheless, I've got an error: np.shape(A) (1024, 1024, 1) np.shape(B) (1024, 1024, 2) np.add.at(C, n, np.moveaxis(A[:, :, (n - m)], -1, 0) @ np.moveaxis(B[:, :, m], -1, 0)) ValueError: array is not broadcastable to correct shape
Wait, did you notice that I changed the constructor for C?
I overlooked that. Unfortunately, now I get a MemoryError and the computer locks up completely :-(
Yeah, the speedup of this method comes at a pretty big memory cost. You're basically doing everything simultaneously.