I'm fairly new to Python and I need to implement the fastest possible version of this code.

s="<%dH" % (int(width*height),)
z=struct.unpack(s, contents)

heights = np.zeros((height,width))
for r in range(0,height):
    for c in range(0,width):
        elevation=z[((width)*r)+c]
        if (elevation==65535 or elevation<0 or elevation>20000):
            elevation=0.0

        heights[r][c]=float(elevation)

I've read some of the Python vectorization questions, but I don't think they apply to my case. Most of them are about things like using np.sum instead of for loops. I guess I have two questions:

  1. Is it possible to speed up this code? I think heights[r][c]=float(elevation) is the bottleneck, but I need to find some Python timing commands to confirm this.
  2. If it is possible to speed up this code, what are my options? I have seen people recommend Cython, PyPy, and weave. I could do this faster in C, but this code also needs to generate plots, so I'd like to stick with Python so I can use matplotlib.
    RunSnakeRun is an excellent Python profile viewer that shows time usage in a treemap format. Get the profile by turning dostuff() into profile.runctx('dostuff()', globals(), locals(), filename='out.profile'). Commented Feb 21, 2015 at 1:12
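For a quick look without a separate viewer, the standard library's cProfile and pstats modules can print a profile directly. The sketch below is only illustrative: dostuff here is a hypothetical stand-in for whatever function is being measured, and its body is not from the question. Note that cProfile reports time per function call, not per line, so it can show that a loop-heavy function dominates but will not single out one assignment inside it.

```python
import cProfile
import io
import pstats


def dostuff(n=200_000):
    """Hypothetical stand-in for the code being profiled."""
    total = 0.0
    for i in range(n):
        total += float(i)
    return total


# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = dostuff()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

For timing a single call rather than profiling it, timeit.timeit from the standard library (or %timeit in IPython, as used in the answer) is usually enough.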

1 Answer

As you mention, the key to writing fast code with numpy is vectorization: pushing the work off to fast C-level routines instead of Python loops. In the timings below, the usual approach improves things by a factor of about fifteen relative to your original code. Note that the separate check for 65535 is not needed, because it is already covered by the > 20000 test:

def faster(elevation, height, width):
    heights = np.array(elevation, dtype=float)
    heights = heights.reshape((height, width))
    heights[(heights < 0) | (heights > 20000)] = 0
    return heights

>>> h,w = 100, 101; z = list(range(h*w))
>>> %timeit orig(z,h,w)
100 loops, best of 3: 9.71 ms per loop
>>> %timeit faster(z,h,w)
1000 loops, best of 3: 641 µs per loop
>>> np.allclose(orig(z,h,w), faster(z,h,w))
True
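The timings above call orig, which is not shown. Presumably it is the loop from the question wrapped in a function that takes the same arguments as faster. A sketch under that assumption (the name and signature are inferred from the calls, not confirmed by the answer):

```python
import numpy as np


def orig(z, height, width):
    """Assumed baseline: the question's nested loop, wrapped in a function."""
    heights = np.zeros((height, width))
    for r in range(height):
        for c in range(width):
            # z is a flat, row-major sequence of elevations.
            elevation = z[width * r + c]
            if elevation == 65535 or elevation < 0 or elevation > 20000:
                elevation = 0.0
            heights[r][c] = float(elevation)
    return heights


# Small check: out-of-range values and the 65535 sentinel become 0.
sample = orig([65535, 20001, 5, 20000], 2, 2)
print(sample)
```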

That ratio seems to hold even for longer z:

>>> h,w = 1000, 10001; z = list(range(h*w))
>>> %timeit orig(z,h,w)
1 loops, best of 3: 9.44 s per loop
>>> %timeit faster(z,h,w)
1 loops, best of 3: 675 ms per loop

1 Comment

Beat me to it! You also might want to mention reading the data in using np.fromstring(contents, dtype=np.uint16) (or fromfile if it's originally in a file) instead of struct.unpack. It's usually significantly faster than unpacking into a tuple using struct and then converting to an array for large datasets.
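A rough sketch of that suggestion follows. It uses np.frombuffer rather than np.fromstring, since the binary mode of np.fromstring is deprecated in newer NumPy releases. The dimensions and sample values are made up for the demonstration, and contents is built here with struct.pack only to stand in for the real data. Because the values are unsigned 16-bit, the < 0 test from the question can never be true, so only the upper bound is checked.

```python
import struct

import numpy as np

height, width = 3, 4  # hypothetical dimensions for the demonstration

# Build a small little-endian uint16 buffer standing in for contents.
values = [0, 100, 20000, 20001, 65535, 7, 8, 9, 10, 11, 12, 13]
contents = struct.pack("<%dH" % (height * width), *values)

# "<u2" is little-endian unsigned 16-bit, matching the "<...H" struct format.
raw = np.frombuffer(contents, dtype="<u2")

# astype makes a writable float copy; the frombuffer view of bytes is read-only.
heights = raw.astype(float).reshape((height, width))

# Zero out everything above 20000, which also covers the 65535 sentinel.
heights[heights > 20000] = 0
print(heights)
```

If the data starts out in a file, np.fromfile with the same dtype reads it without going through an intermediate bytes object.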
