
I am trying to write some nested loops in my algorithm, and I have run into the problem that the whole algorithm takes far too long because of these nested loops. I am quite new to Python (as you may tell from my unprofessional code below :( ) and I hope someone can guide me to a way to speed up my code!

The whole algorithm is for fire detection in multiple 1500*6400 arrays. A small contextual analysis is applied while going through the whole array. The contextual analysis is performed with a dynamically assigned window size: the window can grow from 11*11 up to 31*31 until there are enough valid values inside the sampling window for the next round of calculation, for example like below:

import numpy as np

def ContextualWindows (arrb4,arrb5,pfire):
    #### arrb4, arrb5, pfire are 31*31 sampling windows from a large 1500*6400 numpy array
    i=5
    while i in range (5,16):
        arrb4back=arrb4[15-i:16+i,15-i:16+i]
        ## only output the array data when it is 'large' enough
        ## to have enough good quality data to do calculation
        if np.ma.count(arrb4back)>=min(10,0.25*i*i):
            arrb5back=arrb5[15-i:16+i,15-i:16+i]
            pfireback=pfire[15-i:16+i,15-i:16+i]
            canfire=0
            i=20
        else:
            i=i+1

    ### unknown pixel: background condition could not be characterized
    if i!=20:
        canfire=1
        arrb5back=arrb5
        pfireback=pfire
        arrb4back=arrb4
    return (arrb4back,arrb5back,pfireback,canfire)

This dynamic window is then fed into the next round of tests, for example:

b4backave=np.mean(arrb4Windows)
b4backdev=np.std(arrb4Windows)
if b4>b4backave+3.5*b4backdev:
    firetest=True
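
Roughly, the outer scan that calls ContextualWindows looks like the sketch below (simplified and illustrative only; the names b4, b5, pfire and the small dummy arrays just stand in for my real 1500*6400 masked arrays):

import numpy as np

## illustrative dummy data standing in for the real 1500*6400 masked arrays
b4 = np.ma.masked_less(np.random.rand(60, 80), 0.05)
b5 = np.ma.masked_less(np.random.rand(60, 80), 0.05)
pfire = np.ma.masked_less(np.random.rand(60, 80), 0.05)

fires = np.zeros(b4.shape, dtype=bool)
for r in range(15, b4.shape[0] - 15):
    for c in range(15, b4.shape[1] - 15):
        win = (slice(r - 15, r + 16), slice(c - 15, c + 16))
        arrb4back, arrb5back, pfireback, canfire = ContextualWindows(
            b4[win], b5[win], pfire[win])
        if canfire == 0 and not np.ma.is_masked(b4[r, c]):
            b4backave = np.mean(arrb4back)
            b4backdev = np.std(arrb4back)
            fires[r, c] = b4[r, c] > b4backave + 3.5 * b4backdev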

Running the whole code on my multiple 1500*6400 numpy arrays took over half an hour, or even longer. Just wondering if anyone has an idea how to deal with this? A general idea of which part I should put my effort into would be greatly helpful!

Many thanks!

  • Might be better for codereview? Commented Apr 5, 2015 at 16:24
  • Very hard to answer the question in its current state. Can you add sample input for the function and any imports to your code? Make it so we can copy, paste and run your code without needing to guess the format of your data. Try pasting it into a file yourself, and if it does not run then edit it until it does, then update the code in the question to match. See this question for a data-as-code example, these R guidelines and this SO help page. Commented Apr 5, 2015 at 16:46
  • codereview doesn't get enough traffic. numpy speedup questions like this are quite common here on SO. Commented Apr 5, 2015 at 17:28
  • @hpaulj that was true 3 years ago. 45K visitors/day is quite a bit of traffic in StackLand. Commented Apr 5, 2015 at 18:37
  • But numpy traffic is still weak, 134 questions tagged vs 16,000 here. Same for followers. Commented Apr 5, 2015 at 19:08

2 Answers


Avoid while loops if speed is a concern. The loop lends itself to a for loop as start and end are fixed. Additionally, your code does a lot of copying which isn't really necessary. The rewritten function:

def ContextualWindows (arrb4,arrb5,pfire):
    ''' arrb4,arrb5,pfire are 31*31 sampling windows from 
        large 1500*6400 numpy array '''

    for i in range (5, 16):
        lo = 15 - i  # 10..0
        hi = 16 + i  # 21..31
        #  only output the array data when it is 'large' enough
        #  to have enough good quality data to do calculation
        if np.ma.count(arrb4[lo:hi, lo:hi]) >= min(10, 0.25*i*i):
            return (arrb4[lo:hi, lo:hi], arrb5[lo:hi, lo:hi], pfire[lo:hi, lo:hi], 0)
    else:    #  unknown pixel: background condition could not be characterized
        return (arrb4, arrb5, pfire, 1)

For clarity I've used style guidelines from PEP 8 (like extended comments, number of comment chars, spaces around operators etc.). Copying of a windowed arrb4 occurs twice here, but only if the condition is fulfilled, and this will happen only once per function call. The else clause will be executed only if the for-loop has run to its end. We don't even need a break from the loop as we exit the function altogether.
Let us know if that speeds up the code a bit. I don't think it'll be much but then again there isn't much code anyway.


3 Comments

The end points aren't fixed. He starts at 5, but can quit before 16. A for-loop with a break should work, but I don't think it will make much difference in speed.
My suggestion contains a break as well, in the if...then statement, only that it returns from the function altogether. It wouldn't work without one, as the OP wants to quit prematurely when the condition is met. So the loop may run to its end or may quit before that, just as in the original code. In my opinion the flow lends itself to a for loop with a break more than to a while loop. Avoiding copying parts of the arrays will probably have more effect than this, granted.
In my tests, 1 iteration takes 50us, all 10 take 500us. Breaking early if possible is a good thing. A for with break is clearer syntax, but not much different in speed.

I've run some time tests with ContextualWindows and variants. One i step takes about 50us, all ten about 500.

This simple iteration takes about the same time:

[np.ma.count(arrb4[15-i:16+i,15-i:16+i]) for i in range(5,16)]

The iteration mechanism and the 'copying' of arrays are minor parts of the time. Where possible numpy makes views, not copies.
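
A quick way to check that basic slicing gives a view rather than a copy (a small illustrative snippet):

import numpy as np

a = np.arange(31 * 31).reshape(31, 31)
w = a[10:21, 10:21]               # the i = 5 window, via basic slicing
print(np.shares_memory(a, w))     # True: the window is a view, not a copy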

I'd focus on either minimizing the number of these count steps, or speeding up the count.
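
For the second option, one possibility (a sketch, assuming the windows are masked arrays with a full boolean mask) is to count valid pixels directly on the mask, which avoids building a masked view of the slice for every window:

import numpy as np

# hypothetical masked 31*31 window, as in the question
arrb4 = np.ma.masked_less(np.random.randint(0, 1000, (31, 31)), 200)

lo, hi = 10, 21                       # the i = 5 window

n_valid_ma = np.ma.count(arrb4[lo:hi, lo:hi])

# count on the boolean mask directly; getmaskarray guards against
# the mask being nomask instead of a full array
mask = np.ma.getmaskarray(arrb4)
n_valid_fast = (~mask[lo:hi, lo:hi]).sum()

assert n_valid_ma == n_valid_fast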


Comparing times for various operations on these windows:

First, the time for 1 step:

In [167]: timeit [np.ma.count(arrb4[15-i:16+i,15-i:16+i]) for i in range(5,6)]
10000 loops, best of 3: 43.9 us per loop

now for the 10 steps:

In [139]: timeit [arrb4[15-i:16+i,15-i:16+i].shape for i in range(5,16)]
10000 loops, best of 3: 33.7 us per loop

In [140]: timeit [np.sum(arrb4[15-i:16+i,15-i:16+i]>500) for i in range(5,16)]
1000 loops, best of 3: 390 us per loop

In [141]: timeit [np.ma.count(arrb4[15-i:16+i,15-i:16+i]) for i in range(5,16)]
1000 loops, best of 3: 464 us per loop

Simply indexing does not take much time, but testing for conditions takes substantially more.

cumsum is sometimes used to speed up sums over sliding windows. Instead of taking sum (or mean) over each window, you calculate the cumsum and then use the differences between the front and end of window.
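
In 1-D the idea looks like this (a small sketch for illustration):

import numpy as np

x = np.random.rand(100)
w = 11                                    # window length

# direct sliding-window sums, for reference
direct = np.array([x[j:j + w].sum() for j in range(len(x) - w + 1)])

# the same sums from one cumulative sum: prepend 0 so that
# sum(x[j:j+w]) == c[j+w] - c[j]
c = np.concatenate(([0.0], np.cumsum(x)))
via_cumsum = c[w:] - c[:-w]

print(np.allclose(direct, via_cumsum))    # True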

Trying something like that, but in 2d - cumsum in both dimensions, followed by differences between diagonally opposite corners:

In [164]: %%timeit
   .....: cA4=np.cumsum(np.cumsum(arrb4,0),1)
   .....: [cA4[15-i,15-i]-cA4[15+i,15+i] for i in range(5,16)]
   .....: 
10000 loops, best of 3: 43.1 us per loop

This is almost 10x faster than the (nearly) equivalent sum. The values don't quite match, but the timing suggests that this may be worth refining.
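
If it is worth refining, the usual way to make the window sums exact is a padded 2-D cumulative sum (a summed-area table) with a four-corner difference. A sketch (the helper name window_sums is just for illustration, and it isn't timed here):

import numpy as np

def window_sums(arr, center=(15, 15), radii=range(5, 16)):
    # exact sums of the (2*i+1) x (2*i+1) windows around `center`,
    # computed from one padded 2-D cumulative sum
    S = np.zeros((arr.shape[0] + 1, arr.shape[1] + 1))
    # pad with a leading row/column of zeros so the corner formula
    # also works for windows touching the array edge
    S[1:, 1:] = np.cumsum(np.cumsum(arr, 0), 1)
    r, c = center
    out = []
    for i in radii:
        r0, r1 = r - i, r + i + 1     # same bounds as arr[r0:r1, c0:c1]
        c0, c1 = c - i, c + i + 1
        out.append(S[r1, c1] - S[r0, c1] - S[r1, c0] + S[r0, c0])
    return np.array(out)

arrb4 = np.random.rand(31, 31)
direct = [arrb4[15 - i:16 + i, 15 - i:16 + i].sum() for i in range(5, 16)]
print(np.allclose(window_sums(arrb4), direct))    # True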
