Python: Improving Image-processing with numpy

Question

Let there be two big (2000x2000 or higher) .tiff images consisting only of numpy float32 values (no rgb). I call them Image A and B. I want to multiply them in a special way:

Find the max value in B and roll it (using numpy.roll) to the upper-left most corner.
Multiply A and B
Add the sum of B to the index of A where the max value of B was rolled
Roll the max of B one element further
Repeat for all elements of A
save the resulting image

Both images are always the same shape. I've come up with this idea:

#A,B are loaded with PIL as numpy images and are flattend
B = np.roll(B, len(mult_img)-distance_to_max) #roll max to the first element

sum_arr = np.sum(B) #sum of B

for i in range(len(A)):
    A = np.multiply(A,  np.roll(B, i)) #roll B with i-increment and multiply
    A[i] += sum_arr #add sum to A at index

This looks like it would do the job, after reshaping the array and save it. But it takes about 40s for a 2000x2000 image and there will be hundrets of them to process. Question is: how can this be improved? or are there better numpy solutions for this task to speed things up a bit?

Thanks in advance

Divakar · Accepted Answer · 2017-08-16 14:13:08Z

2

Prospective method

Consider this :

In [154]: B = np.arange(5)

In [155]: B
Out[155]: array([0, 1, 2, 3, 4])

Use the rolled version of B :

In [156]: for i in range(len(B)):
     ...:     print np.roll(B, i)
     ...:     
[0 1 2 3 4]
[4 0 1 2 3]
[3 4 0 1 2]
[2 3 4 0 1]
[1 2 3 4 0]

So, the trick that we need to employ is to create an extended array that could be sliced to get the rolled version. The idea being that slicing in NumPy is basically free. Thus, the extended array would be -

In [157]: B_ext = np.concatenate((B[1:], B))

In [158]: B_ext
Out[158]: array([1, 2, 3, 4, 0, 1, 2, 3, 4])

Thus, the slicing steps would be -

[1, 2, 3, 4, 0, 1, 2, 3, 4]
             [            ]
          [            ]
       [            ]
    [            ]
[            ]

Employ it

Then, the extended array could be used like so -

n = len(A)
for i in range(n-1,-1,-1):
    Ac *= B_ext[i:i+n] #roll B with i-increment and multiply
    Ac[n-1-i] += sum_arr #add sum to A at index

Finalizing

Finalizing, the approaches would be -

def org_app(A, B, sum_arr): # Original approach
    for i in range(len(A)):
        A = np.multiply(A,  np.roll(B, i)) #roll B with i-increment and multiply
        A[i] += sum_arr #add sum to A at index
    return A

def app1(A, B, sum_arr): # Proposed approach
    B_ext = np.concatenate((B[1:], B))
    n = len(A)
    for i in range(n-1,-1,-1):
        A *= B_ext[i:i+n] #roll B with i-increment and multiply
        A[n-1-i] += sum_arr #add sum to A at index
    return A

Benchmarking

1) Verification -

In [144]: # Setup inputs
     ...: np.random.seed(1234)
     ...: N = 10000
     ...: A = np.random.randint(0,255,(N))
     ...: B = np.random.randint(0,255,(N))
     ...: A_copy = A.copy()
     ...: sum_arr = np.sum(B) #sum of B
     ...: 

In [145]: out1 = org_app(A, B, sum_arr)
     ...: out2 = app1(A_copy, B, sum_arr)
     ...: print "Abs. Max. Error : " + str(np.abs(out1-out2).max())
     ...: 
Abs. Max. Error : 0

2) Runtime test -

In [146]: # Setup inputs
     ...: np.random.seed(1234)
     ...: N = 10000
     ...: A = np.random.randint(0,255,(N))
     ...: B = np.random.randint(0,255,(N))
     ...: A_copy = A.copy()
     ...: sum_arr = np.sum(B) #sum of B
     ...: 

In [147]: %timeit org_app(A, B, sum_arr)
1 loop, best of 3: 196 ms per loop

In [148]: %timeit app1(A_copy, B, sum_arr)
10 loops, best of 3: 51.9 ms per loop

edited Aug 16, 2017 at 14:13

answered Aug 16, 2017 at 14:05

Divakar

222k19 gold badges273 silver badges374 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

user3759978 Over a year ago

Wow! About 4 times faster! Pretty nifty idea, i didn't thought about slicing :)

user3759978 Over a year ago

one note: the proposed solution is really faster when the flattend images are only abaout 10000 to 100000 elements long(a few sec.). for a 2000x2000 (4 mill. elements) image it takes really much longer (after 10 min still wasnt done). strangely its still very fast when the image isnt flattend, but the results are not as expected.

Divakar Over a year ago

@user3759978 That's the thing with vectorized solutions. If you stretch it to the memory limits, you are better off resorting to a loopy one. Regarding the results not matching, I think its because the solution assumes flattened version as we started off in the question.

Collectives™ on Stack Overflow

Python: Improving Image-processing with numpy

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related