1

I am trying to create a heightmap by interpolating between a bunch of heights at certain points in an area. To process the whole image, I have the following code snippet:


map_ = np.zeros((img_width, img_height))

for x in range(img_width):
    for y in range(img_height):
        map_[x, y] = calculate_height(set(points.items()), x, y)

This is calculate_height:

def distance(x1, y1, x2, y2) -> float:
    return np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def calculate_height(points: set, x, y) -> float:
    total = 0
    dists = {}
    for pos, h in points:
        d = distance(pos[0], pos[1], x, y)
        if x == pos[0] and y == pos[1]:
            return h
        d = 1 / (d ** 2)
        dists[pos] = d
        total += d

    r = 0
    for pos, h in points:
        ratio = dists[pos] / total
        r += ratio * h

    r = r
    return r

This snippet works perfectly, but if the image is too big, it takes a long time to process, because this is O(n^2). The problem with this, is that a "too big" image is 800 600, and it takes almost a minute to process, which to me seems a bit excessive.

My goal is not to reduce the time complexity from O(n^2), but to reduce the time it takes to process images, so that I can get decently sized images in a reasonable amount of time.

I found this post, but I couldn't really try it out because it's all for a 1D array, and I have a 2D array, and I also need to pass the coordinates of each point and the set of existing points to the calculate_height function. What can I try to optimize this code snippet?

Edit: Moving set(points.items) out of the loop as @thethiny suggested was a HUGE improvement. I had no idea it was such a heavy thing to do. This makes it fast enough for me, but feel free to add more suggestions for the next people to come by!

Edit 2: I have further optimized this processing by including the following changes:

    # The first for loop inside the calculate_distance function
    for pos, h in points:
        d2 = distance2(pos[0], pos[1], x, y)
        if x == pos[0] and y == pos[1]:
            return h
        d2 = d2 ** -1 # 1 / (d ** 2) == d ** -2 == d2 ** -1
        dists[pos] = d2 # Having the square of d on these two lines makes no difference
        total += d2

This reduced execution time for a 200x200 image from 1.57 seconds to 0.76 seconds. The 800x600 image mentioned earlier now takes 6.13 seconds to process :D

This is what points looks like (as requested by @norok12):

# Hints added for easier understanding, I know this doesn't run
points: dict[tuple[int, int], float] = {
    (x: int, y: int): height: float,
    (x: int, y: int): height: float,
    (x: int, y: int): height: float
}
# The amount of points varies between datasets, so I can't provide a useful range other than [3, inf)
10
  • I'm having a hard time guessing what calculate_height(set(points.items()), x, y) does. Could you elaborate on that? You may be able to vectorize this whole thing and get your processin times down by a factor of 30 or more. Commented Sep 14, 2022 at 0:03
  • 1
    You can start by taking the set(points.items()) outside of the loop since that's always the same code. Second you should take this to the Code Review website where the people there are more prone to answering optimization help. Commented Sep 14, 2022 at 0:03
  • Finally, I recommend you use numba.gvectorize which deals with calculate_height on different points since they don't clash with each other. Learn about numba and it'll save you lots of time. Commented Sep 14, 2022 at 0:03
  • @thethiny I have included the implementation of calculate_height. It's just 2D interpolation Commented Sep 14, 2022 at 0:06
  • It's still the same concept, numba will do these calculations in parallel no matter what your code does. Commented Sep 14, 2022 at 0:07

2 Answers 2

1

There's a few problems with your implementation.

Essentially what you're implementing is approximation using radial basis functions.

The usual algorithm for that looks like:

sum_w = 0
sum_wv = 0
for p,v in points.items():
   d = distance(p,x)
   w = 1.0 / (d*d)
   sum_w += w
   sum_wv += w*v
return sum_wv / sum_w

Your code has some extra logic for bailing out if p==x - which is good. But it also allocates an array of distances - which this single loop form does not need.

This brings execution of an example in a workbook from 13s to 12s.

The next thing to note is that collapsing the points dict into an numpy array gives us the chance to use numpy functions.

points_array = np.array([(p[0][0],p[0][1],p[1]) for p in points.items()]).astype(np.float32)

Then we can write the function as

def calculate_height_fast(points, x, y) -> float:
    dx = points[:,0] - x
    dy = points[:,1] - y
    r = np.hypot(dx,dy)
    w = 1.0 / (r*r)
    sum_w = np.sum(w)
    return np.sum(points[:,2] * w) / np.sum(w)

This brings our time down to 658ms. But we can do better yet...

Since we're now using numpy functions we can apply numba.njit to JIT compile our function.

@numba.njit
def calculate_height_fast(points, x, y) -> float:
    dx = points[:,0] - x
    dy = points[:,1] - y
    r = np.hypot(dx,dy)
    w = 1.0 / (r*r)
    sum_w = np.sum(w)
    return np.sum(points[:,2] * w) / np.sum(w)

This was giving me 105ms (after the function had been run once to ensure it got compiled).

This is a speed up of 130x over the original implementation (for my data)

You can see the full implementations here

Sign up to request clarification or add additional context in comments.

1 Comment

You can probably get even faster by removing the unused sum_w = np.sum(w) line and moving back to spelled out looping.
0

This really is a small addition to @MichaelAnderson's detailed answer.

Probably calculate_height_fast() can get faster by optimizing a bit more with explicit looping:

@numba.njit
def calculate_height_faster(points, x, y) -> float:
    dx = points[:, 0] - x
    dy = points[:, 1] - y
    r = np.hypot(dx, dy)
    # compute weighted average
    n = r.size
    sum_w = sum_wp = 0
    for i in range(n):
        w = 1.0 / (r[i] * r[i])
        sum_w += w
        sum_wp += points[i, 2] * w
    return sum_wp / sum_w

1 Comment

Another advantage of using explicit looping is that you can reintroduce the exiting early if r==0.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.