
In the code below, I have a simple for-loop that I'd like to replace with, hopefully, a faster vectorized numpy operation.

import numpy as np

b = np.array([9,8100,-60,7], dtype=np.float64)
a = np.array([584,-11,23,79,1001,0,-19], dtype=np.float64)
m = 3
n = b.shape[0]
l = n-m+1
k = a.shape[0]-m+1

QT = np.array([-85224., 181461., 580047., 8108811., 10149.])
QT_first = QT.copy()

out = [None] * l
for i in range(1, l):
    QT[1:] = QT[:k-1] - b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):]
    QT[0] = QT_first[i]

    # Update: Use this QT to do something with the ith element of array x
    # As i updates in each iteration, QT changes
    # b_mean, b_stddev, and a_stddev are precomputed sliding-window
    # statistics that are provided elsewhere
    out[i] = np.argmin((QT + b_mean[i] * m) / (b_stddev[i] * m * a_stddev))

# out now holds the result for each i

I need a solution that is general enough to handle much longer arrays. Note that QT depends on m and the length of b, and will always be provided.

How can I replace the for-loop with a numpy vectorized operation so that it is faster?

Update

I modified the original code to more clearly demonstrate why a convolution does not solve my problem. Convolution only gives me the final QT, but I actually need to use the intermediate QT values for another calculation before updating QT for the next iteration of the for-loop.


2 Answers


I haven't studied this in enough detail to know exactly what this expression is doing: b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):]. But I suspect it can be written without the i loop, as some sort of -b[slice1]*a[slice2] + b[slice3]*a[slice4].
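The slice idea can be sketched with outer products: each iteration's correction vector depends on i only through the two scalars b[i-1] and b[i-1+m], so broadcasting builds all of them at once. A minimal sketch, reusing the small arrays from the question (the name V is my own):

```python
import numpy as np

b = np.array([9, 8100, -60, 7], dtype=np.float64)
a = np.array([584, -11, 23, 79, 1001, 0, -19], dtype=np.float64)
m = 3
l = b.size - m + 1
k = a.size - m + 1

# row i-1 of V is the loop's correction term at iteration i:
#   -b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):],  for i = 1 .. l-1
V = -np.outer(b[:l-1], a[:k-1]) + np.outer(b[m:m+l-1], a[-(k-1):])

for i in range(1, l):
    v = -b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):]
    assert np.allclose(V[i-1], v)
```

This only removes the loop from building the correction terms; the shift-and-refill update itself is a separate question, addressed below.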

That leaves the question of whether

for i in range(1, l):
    QT[1:] = QT[:k-1] + c[i]
    QT[0] = QT_first[i]

is inherently serial and iterative or not. You probably understand what is going on better than I do. I'd have to do a few manual loops to get a feel for what is happening.


2 Comments

The c loop could probably be done using cumsum.
I'd certainly keep an eye out for a cumsum like pattern.
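Following up on the cumsum suggestion: for the simplified recurrence above, every intermediate row has a closed form built from one cumulative sum, since element j at step i is just an earlier first element plus a run of c values. A sketch under the assumption that c.size <= QT0.size (so the QT_first[i] lookup stays in range); function names are my own:

```python
import numpy as np

def qt_rows_loop(QT0, c):
    # the simplified serial recurrence, keeping every intermediate row
    QT, QT_first = QT0.copy(), QT0.copy()
    rows = [QT.copy()]
    for i in range(1, c.size):
        QT[1:] = QT[:-1] + c[i]
        QT[0] = QT_first[i]
        rows.append(QT.copy())
    return np.array(rows)

def qt_rows_cumsum(QT0, c):
    # closed form via one cumulative sum C[i] = c[1] + ... + c[i]:
    #   QT_i[j] = QT0[i-j] + C[i] - C[i-j]   if j <= i
    #   QT_i[j] = QT0[j-i] + C[i]            if j >  i
    k, l = QT0.size, c.size
    C = np.concatenate(([0.0], np.cumsum(c[1:])))
    i = np.arange(l)[:, None]
    j = np.arange(k)[None, :]
    d = i - j
    return np.where(
        d >= 0,
        QT0[np.clip(d, 0, k - 1)] + C[i] - C[np.clip(d, 0, l - 1)],
        QT0[np.clip(-d, 0, k - 1)] + C[i],
    )

QT0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
c = np.array([0.0, 5.0, 7.0, -2.0])   # c[0] is unused, as in the loop
assert np.allclose(qt_rows_loop(QT0, c), qt_rows_cumsum(QT0, c))
```

Whether this extends to the full update (where c[i] is a vector rather than a scalar) would need checking against the real expression.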

Edit Please be advised that the solution below does not address what OP actually meant (only what they wrote, initially ;-)).

fftconvolve seems to help quite a bit (opt is the convolution-based solution):

k, n, m = 100, 100, 10
loopy                 0.47971359 ms
opt                   0.18304241 ms
k, n, m = 1000, 800, 20
loopy                 6.44813690 ms
opt                   0.33353859 ms
k, n, m = 10000, 10000, 1000
loopy               313.31501430 ms
opt                   2.61437489 ms

Code:

import numpy as np
from scipy import signal

import types
from timeit import timeit

b = np.array([9,8100,-60,7], dtype=np.float64)
a = np.array([584,-11,23,79,1001,0,-19], dtype=np.float64)
m = 3
n = b.shape[0]
l = n-m+1
k = a.shape[0]-m+1

QT = np.array([-85224., 181461., 580047., 8108811., 10149.])

def mock_data(k, n, m):
    QT = np.random.random((k,))
    b =  np.random.random((n,))
    a = np.random.random((k+m-1,))
    return QT, a, b, m

def f_loopy(QT, a, b, m):
    k, l = a.size - m + 1, b.size - m + 1
    QT, QT_first = QT.copy(), QT

    for i in range(1, l):
        QT[1:] = QT[:k-1] - b[i-1]*a[:k-1] + b[i-1+m]*a[-(k-1):]
        QT[0] = QT_first[i]
    return QT

def f_opt(QT, a, b, m):
    k, l = a.size - m + 1, b.size - m + 1
    ab1 = signal.fftconvolve(b[l-2::-1], a[:k-1], mode='full')[:QT.size-1]
    ab2 = signal.fftconvolve(b[l-2+m:m-1:-1], a[1-k:], mode='full')[:QT.size-1]
    return np.r_[0, ab2 - ab1] + np.r_[QT[l-1::-1], QT[1:QT.size+1-l]]

for k, n, m in [(100, 100, 10),
                (1000, 800, 20),
                (10000, 10000, 1000)]:
    data = mock_data(k, n, m)
    ref = f_loopy(*data)
    print(f'k, n, m = {k}, {n}, {m}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.allclose(ref, func(*data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(*data)', globals={'f':func, 'data':data}, number=10)*100))
        except Exception:
            print("{:16s} apparently failed".format(name[2:]))

8 Comments

Thank you for your answer. I have updated the original post to more clearly articulate why convolution would not work. Your answer is technically correct as it relates to my un-updated question and I apologize for missing the most important component of my question which helps to explain why convolutions would not be helpful. That is, I'm not only interested in the final convolution but, instead, I am interested in all of the intermediate values of QT at all values of i. Hopefully, you'll be able to help me identify a better solution that replaces the slow for-loop.
You're correct that QT is too large to fit into memory, but `some_function` is indeed vectorizable.
I've replaced `some_function` to provide more information about what the dependency is. Sorry, I misspoke. The intermediates would fit into memory, but the 2d array of the rows (being snapshots of QT) would not fit into memory (I expect the 2d array to be around 100,000,000 x 1,000). I've thought about multithreading/multiprocessing but wanted to make sure I optimize the algorithm first before pursuing more compute power. Let me know if this helps.
In my haste, I omitted the argmin (added now). My original code uses different notation and so keeping track of the mapping is hindering my ability to communicate. out should be a 1d array with length 10^8 and should not be a problem in terms of memory.
The 10^8 x 10^3 arises from the size/shape of the input data itself, where l is around 10^8 and m is around 10^3. The number of snapshots would be l-1. The `b_mean[i]` and `b_stddev[i]` are both scalars (single values) but obviously change depending on i.
