
In the code below, I have a simple for-loop that I'd like to replace with, hopefully, a faster vectorized numpy operation.

import numpy as np

b = np.array([9,8100,-60,7], dtype=np.float64)
a = np.array([584,-11,23,79,1001,0,-19], dtype=np.float64)
m = 3
n = b.shape[0]
l = n-m+1
k = a.shape[0]-m+1

QT = np.array([-85224., 181461., 580047., 8108811., 10149.])
QT_first = QT.copy()

out = [None] * l
for i in range(1, l):
    QT[1:] = QT[:k-1] - b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):]
    QT[0] = QT_first[i]

    # Update: Use this QT to do something with the ith element of array x
    # As i updates in each iteration, QT changes
    # b_mean, b_stddev, and a_stddev are precomputed sliding-window
    # statistics that are provided elsewhere
    out[i] = np.argmin((QT + b_mean[i] * m) / (b_stddev[i] * m * a_stddev))

# out now holds the result for each i

I need a solution that is general enough to handle much longer arrays. Note that QT depends on m and the length of b, and will always be provided.

How can I replace the for-loop with a numpy vectorized operation so that it is faster?

Update

I modified the original code to more clearly demonstrate why a convolution does not solve my problem. Convolution only gives me the final QT, but I actually need to use the intermediate QT values for another calculation before updating QT for the next iteration of the for-loop.


2 Answers


I haven't studied this in enough detail to know exactly what this expression is doing: b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):]. But I suspect it can be written without the i loop, as some sort of -b[slice1]*a[slice2] + b[slice3]*a[slice4].
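The slice idea can be sketched with outer products: each iteration's correction vector depends on i only through the two scalars b[i-1] and b[i-1+m], so broadcasting builds all of them at once. A minimal sketch, reusing the small arrays from the question (the name V is my own):

```python
import numpy as np

b = np.array([9, 8100, -60, 7], dtype=np.float64)
a = np.array([584, -11, 23, 79, 1001, 0, -19], dtype=np.float64)
m = 3
l = b.size - m + 1
k = a.size - m + 1

# row i-1 of V is the loop's correction term at iteration i:
#   -b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):],  for i = 1 .. l-1
V = -np.outer(b[:l-1], a[:k-1]) + np.outer(b[m:m+l-1], a[-(k-1):])

for i in range(1, l):
    v = -b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):]
    assert np.allclose(V[i-1], v)
```

This only removes the loop from building the correction terms; the shift-and-refill update itself is a separate question, addressed below.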

That leaves the question of whether

for i in range(1, l):
    QT[1:] = QT[:k-1] + c[i]
    QT[0] = QT_first[i]

is inherently serial and iterative or not. You probably understand what is going on better than I do. I'd have to do a few manual loops to get a feel for what is happening.


2 Comments

The c loop could probably be done using cumsum.
I'd certainly keep an eye out for a cumsum like pattern.
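Following up on the cumsum suggestion: for the simplified recurrence above, every intermediate row has a closed form built from one cumulative sum, since element j at step i is just an earlier first element plus a run of c values. A sketch under the assumption that c.size <= QT0.size (so the QT_first[i] lookup stays in range); function names are my own:

```python
import numpy as np

def qt_rows_loop(QT0, c):
    # the simplified serial recurrence, keeping every intermediate row
    QT, QT_first = QT0.copy(), QT0.copy()
    rows = [QT.copy()]
    for i in range(1, c.size):
        QT[1:] = QT[:-1] + c[i]
        QT[0] = QT_first[i]
        rows.append(QT.copy())
    return np.array(rows)

def qt_rows_cumsum(QT0, c):
    # closed form via one cumulative sum C[i] = c[1] + ... + c[i]:
    #   QT_i[j] = QT0[i-j] + C[i] - C[i-j]   if j <= i
    #   QT_i[j] = QT0[j-i] + C[i]            if j >  i
    k, l = QT0.size, c.size
    C = np.concatenate(([0.0], np.cumsum(c[1:])))
    i = np.arange(l)[:, None]
    j = np.arange(k)[None, :]
    d = i - j
    return np.where(
        d >= 0,
        QT0[np.clip(d, 0, k - 1)] + C[i] - C[np.clip(d, 0, l - 1)],
        QT0[np.clip(-d, 0, k - 1)] + C[i],
    )

QT0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
c = np.array([0.0, 5.0, 7.0, -2.0])   # c[0] is unused, as in the loop
assert np.allclose(qt_rows_loop(QT0, c), qt_rows_cumsum(QT0, c))
```

Whether this extends to the full update (where c[i] is a vector rather than a scalar) would need checking against the real expression.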

Edit Please be advised that the solution below does not address what OP actually meant (only what they wrote, initially ;-)).

fftconvolve seems to help quite a bit (opt is the convolution-based solution):

k, n, m = 100, 100, 10
loopy                 0.47971359 ms
opt                   0.18304241 ms
k, n, m = 1000, 800, 20
loopy                 6.44813690 ms
opt                   0.33353859 ms
k, n, m = 10000, 10000, 1000
loopy               313.31501430 ms
opt                   2.61437489 ms

Code:

import numpy as np
from scipy import signal

import types
from timeit import timeit

b = np.array([9,8100,-60,7], dtype=np.float64)
a = np.array([584,-11,23,79,1001,0,-19], dtype=np.float64)
m = 3
n = b.shape[0]
l = n-m+1
k = a.shape[0]-m+1

QT = np.array([-85224., 181461., 580047., 8108811., 10149.])

def mock_data(k, n, m):
    QT = np.random.random((k,))
    b =  np.random.random((n,))
    a = np.random.random((k+m-1,))
    return QT, a, b, m

def f_loopy(QT, a, b, m):
    k, l = a.size - m + 1, b.size - m + 1
    QT, QT_first = QT.copy(), QT

    for i in range(1, l):
        QT[1:] = QT[:k-1] - b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):]
        QT[0] = QT_first[i]
    return QT

def f_opt(QT, a, b, m):
    k, l = a.size - m + 1, b.size - m + 1
    ab1 = signal.fftconvolve(b[l-2::-1], a[:k-1], mode='full')[:QT.size-1]
    ab2 = signal.fftconvolve(b[l-2+m:m-1:-1], a[1-k:], mode='full')[:QT.size-1]
    return np.r_[0, ab2 - ab1] + np.r_[QT[l-1::-1], QT[1:QT.size+1-l]]

for k, n, m in [(100, 100, 10),
                (1000, 800, 20),
                (10000, 10000, 1000)]:
    data = mock_data(k, n, m)
    ref = f_loopy(*data)
    print(f'k, n, m = {k}, {n}, {m}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.allclose(ref, func(*data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(*data)', globals={'f':func, 'data':data}, number=10)*100))
        except Exception:
            print("{:16s} apparently failed".format(name[2:]))

8 Comments

Thank you for your answer. I have updated the original post to more clearly articulate why convolution would not work. Your answer is technically correct as it relates to my un-updated question and I apologize for missing the most important component of my question which helps to explain why convolutions would not be helpful. That is, I'm not only interested in the final convolution but, instead, I am interested in all of the intermediate values of QT at all values of i. Hopefully, you'll be able to help me identify a better solution that replaces the slow for-loop.
You're correct that QT is too large to fit into memory, but `some_function` is indeed vectorizable.
I've replaced `some_function` to provide more information about what the dependency is. Sorry, I misspoke. The intermediates would fit into memory, but the 2d array of the rows (being snapshots of QT) would not fit into memory (I expect the 2d array to be around 100,000,000 x 1,000). I've thought about multithreading/multiprocessing but wanted to make sure I optimize the algorithm first before pursuing more compute power. Let me know if this helps.
In my haste, I omitted the argmin (added now). My original code uses different notation and so keeping track of the mapping is hindering my ability to communicate. out should be a 1d array with length 10^8 and should not be a problem in terms of memory.
The 10^8 x 10^3 arises from the size/shape of the input data itself, where l is around 10^8 and m is around 10^3. The number of snapshots would be l-1. The `b_mean[i]` and `b_stddev[i]` are both scalars (single values) but obviously change depending on i.
