1

Let there be two big (2000x2000 or higher) .tiff images consisting only of numpy float32 values (no rgb). I call them Image A and B. I want to multiply them in a special way:

  • Find the max value in B and roll it (using numpy.roll) to the upper-left most corner.
  • Multiply A and B
  • Add the sum of B to the index of A where the max value of B was rolled
  • Roll the max of B one element further
  • Repeat for all elements of A
  • save the resulting image

Both images are always the same shape. I've come up with this idea:

#A,B are loaded with PIL as numpy images and are flattend
B = np.roll(B, len(mult_img)-distance_to_max) #roll max to the first element

sum_arr = np.sum(B) #sum of B

for i in range(len(A)):
    A = np.multiply(A,  np.roll(B, i)) #roll B with i-increment and multiply
    A[i] += sum_arr #add sum to A at index

This looks like it would do the job, after reshaping the array and save it. But it takes about 40s for a 2000x2000 image and there will be hundrets of them to process. Question is: how can this be improved? or are there better numpy solutions for this task to speed things up a bit?

Thanks in advance

1 Answer 1

2

Prospective method

Consider this :

In [154]: B = np.arange(5)

In [155]: B
Out[155]: array([0, 1, 2, 3, 4])

Use the rolled version of B :

In [156]: for i in range(len(B)):
     ...:     print np.roll(B, i)
     ...:     
[0 1 2 3 4]
[4 0 1 2 3]
[3 4 0 1 2]
[2 3 4 0 1]
[1 2 3 4 0]

So, the trick that we need to employ is to create an extended array that could be sliced to get the rolled version. The idea being that slicing in NumPy is basically free. Thus, the extended array would be -

In [157]: B_ext = np.concatenate((B[1:], B))

In [158]: B_ext
Out[158]: array([1, 2, 3, 4, 0, 1, 2, 3, 4])

Thus, the slicing steps would be -

[1, 2, 3, 4, 0, 1, 2, 3, 4]
             [            ]
          [            ]
       [            ]
    [            ]
[            ]

Employ it

Then, the extended array could be used like so -

n = len(A)
for i in range(n-1,-1,-1):
    Ac *= B_ext[i:i+n] #roll B with i-increment and multiply
    Ac[n-1-i] += sum_arr #add sum to A at index

Finalizing

Finalizing, the approaches would be -

def org_app(A, B, sum_arr): # Original approach
    for i in range(len(A)):
        A = np.multiply(A,  np.roll(B, i)) #roll B with i-increment and multiply
        A[i] += sum_arr #add sum to A at index
    return A

def app1(A, B, sum_arr): # Proposed approach
    B_ext = np.concatenate((B[1:], B))
    n = len(A)
    for i in range(n-1,-1,-1):
        A *= B_ext[i:i+n] #roll B with i-increment and multiply
        A[n-1-i] += sum_arr #add sum to A at index
    return A

Benchmarking

1) Verification -

In [144]: # Setup inputs
     ...: np.random.seed(1234)
     ...: N = 10000
     ...: A = np.random.randint(0,255,(N))
     ...: B = np.random.randint(0,255,(N))
     ...: A_copy = A.copy()
     ...: sum_arr = np.sum(B) #sum of B
     ...: 

In [145]: out1 = org_app(A, B, sum_arr)
     ...: out2 = app1(A_copy, B, sum_arr)
     ...: print "Abs. Max. Error : " + str(np.abs(out1-out2).max())
     ...: 
Abs. Max. Error : 0

2) Runtime test -

In [146]: # Setup inputs
     ...: np.random.seed(1234)
     ...: N = 10000
     ...: A = np.random.randint(0,255,(N))
     ...: B = np.random.randint(0,255,(N))
     ...: A_copy = A.copy()
     ...: sum_arr = np.sum(B) #sum of B
     ...: 

In [147]: %timeit org_app(A, B, sum_arr)
1 loop, best of 3: 196 ms per loop

In [148]: %timeit app1(A_copy, B, sum_arr)
10 loops, best of 3: 51.9 ms per loop
Sign up to request clarification or add additional context in comments.

3 Comments

Wow! About 4 times faster! Pretty nifty idea, i didn't thought about slicing :)
one note: the proposed solution is really faster when the flattend images are only abaout 10000 to 100000 elements long(a few sec.). for a 2000x2000 (4 mill. elements) image it takes really much longer (after 10 min still wasnt done). strangely its still very fast when the image isnt flattend, but the results are not as expected.
@user3759978 That's the thing with vectorized solutions. If you stretch it to the memory limits, you are better off resorting to a loopy one. Regarding the results not matching, I think its because the solution assumes flattened version as we started off in the question.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.