Can I vectorize this Python code?

Question

I'm kind of new to Python and I have to implement a "fast as possible" version of this code.

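import struct
import numpy as np

# contents holds width*height little-endian unsigned 16-bit values
s = "<%dH" % (int(width * height),)
z = struct.unpack(s, contents)

heights = np.zeros((height, width))
for r in range(0, height):
    for c in range(0, width):
        elevation = z[(width * r) + c]
        # zero out 65535 and anything outside 0..20000
        if elevation == 65535 or elevation < 0 or elevation > 20000:
            elevation = 0.0

        heights[r][c] = float(elevation)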

I've read some of the Python vectorization questions, but I don't think they apply to my case. Most of them are about things like using np.sum instead of for loops. I guess I have two questions:

Is it possible to speed up this code? I think heights[r][c]=float(elevation) is where the bottleneck is. I need to find some Python timing commands to confirm this.
If it is possible to speed up this code, what are my options? I have seen some people recommend Cython, PyPy, and weave. I could do this faster in C, but this code also needs to generate plots, so I'd like to stick with Python so I can use matplotlib.

RunSnakeRun is an excellent Python profile viewer that shows time usage in a treemap format. Get the profile by turning dostuff() into profile.runctx('dostuff()', globals(), locals(), filename='out.profile').
– Nick T, Commented Feb 21, 2015 at 1:12
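
For readers who want to try the profiling step from the comment above, here is a minimal sketch. It assumes the standard-library cProfile module (which offers the same runctx call as profile) and uses a hypothetical dostuff() as a stand-in for the real code; the temporary file path is likewise only an illustration.

```python
import cProfile
import os
import pstats
import tempfile


def dostuff():
    # Hypothetical stand-in for the code being profiled.
    total = 0
    for i in range(100000):
        total += i * i
    return total


# Write the profile to a temporary file; a fixed name such as
# 'out.profile' would work just as well.
profile_path = os.path.join(tempfile.mkdtemp(), "out.profile")
cProfile.runctx("dostuff()", globals(), locals(), filename=profile_path)

# Load the saved profile and print the five entries with the
# largest cumulative time.
stats = pstats.Stats(profile_path)
stats.sort_stats("cumulative").print_stats(5)
```

The saved file is the kind of input a profile viewer reads; the pstats output alone is often enough to see which line dominates.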

DSM · Accepted Answer · 2015-02-21 01:02:38Z

As you mention, the key to writing fast code with numpy is vectorization: pushing the work off to fast C-level routines instead of Python loops. The usual approach seems to improve things by a factor of roughly fifteen relative to your original code:

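import numpy as np

def faster(elevation, height, width):
    # Convert the whole sequence to a float array in one C-level call
    heights = np.array(elevation, dtype=float)
    heights = heights.reshape((height, width))
    # Boolean mask zeroes out-of-range values; 65535 is caught by > 20000
    heights[(heights < 0) | (heights > 20000)] = 0
    return heights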

>>> h,w = 100, 101; z = list(range(h*w))
>>> %timeit orig(z,h,w)
100 loops, best of 3: 9.71 ms per loop
>>> %timeit faster(z,h,w)
1000 loops, best of 3: 641 µs per loop
>>> np.allclose(orig(z,h,w), faster(z,h,w))
True
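
The session above calls orig, which is not shown in the answer. The sketch below assumes it is simply the question's nested loop wrapped in a function taking (z, height, width). It also uses a small hand-picked input (an assumption, not the answer's test data) that includes 20001 and 65535, so the zeroing branch is actually exercised; list(range(h*w)) with h,w = 100, 101 never exceeds 20000.

```python
import numpy as np


def orig(z, height, width):
    # Assumed wrapper around the question's nested loop.
    heights = np.zeros((height, width))
    for r in range(height):
        for c in range(width):
            elevation = z[width * r + c]
            if elevation == 65535 or elevation < 0 or elevation > 20000:
                elevation = 0.0
            heights[r][c] = float(elevation)
    return heights


def faster(elevation, height, width):
    # Vectorized version from the answer.
    heights = np.array(elevation, dtype=float).reshape((height, width))
    heights[(heights < 0) | (heights > 20000)] = 0
    return heights


h, w = 3, 4
# Illustrative values on both sides of the 20000 cutoff, plus 65535.
z = [0, 1, 20000, 20001, 65535, 500, 19999, 65534, 7, 8, 9, 10]

slow_result = orig(z, h, w)
fast_result = faster(z, h, w)
print(np.array_equal(slow_result, fast_result))
```

Because 65535 is greater than 20000, the vectorized mask does not need a separate equality check for it.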

That ratio seems to hold even for longer z:

>>> h,w = 1000, 10001; z = list(range(h*w))
>>> %timeit orig(z,h,w)
1 loops, best of 3: 9.44 s per loop
>>> %timeit faster(z,h,w)
1 loops, best of 3: 675 ms per loop
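
%timeit is an IPython magic. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module; the loop and repeat counts here are arbitrary choices, and the numbers will differ by machine.

```python
import timeit

import numpy as np


def faster(elevation, height, width):
    # Vectorized version from the answer.
    heights = np.array(elevation, dtype=float).reshape((height, width))
    heights[(heights < 0) | (heights > 20000)] = 0
    return heights


h, w = 100, 101
z = list(range(h * w))

# Three trials of 100 calls each; keep the fastest trial, as %timeit does.
calls_per_trial = 100
trials = timeit.repeat(lambda: faster(z, h, w), number=calls_per_trial, repeat=3)
per_call = min(trials) / calls_per_trial
print("faster: %.1f microseconds per call" % (per_call * 1e6))
```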

1 Comment

Joe Kington Over a year ago

Beat me to it! You also might want to mention reading the data in using np.fromstring(contents, dtype=np.uint16) (or fromfile if it's originally in a file) instead of struct.unpack. It's usually significantly faster than unpacking into a tuple using struct and then converting to an array for large datasets.
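
A minimal sketch of the approach in the comment above, with one substitution stated plainly: it uses np.frombuffer rather than np.fromstring, because newer NumPy releases deprecate the binary mode of fromstring in favor of frombuffer. The dtype "<u2" is chosen to match the question's "<%dH" format (little-endian unsigned 16-bit); the sample values and sizes are made up for illustration.

```python
import struct

import numpy as np

height, width = 2, 3
# Made-up sample data packed the same way the question unpacks it.
values = [0, 100, 20000, 20001, 65535, 1234]
contents = struct.pack("<%dH" % (height * width), *values)

# Interpret the raw bytes directly as little-endian uint16, no tuple needed.
raw = np.frombuffer(contents, dtype="<u2")

# frombuffer returns a read-only view, so convert to a writable float copy
# before masking.
heights = raw.astype(float).reshape((height, width))
heights[heights > 20000] = 0
print(heights)
```

Since the data is unsigned, the heights < 0 test can never be true here, so only the upper bound is checked.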
