
I'm trying to implement the Minimum Distance Algorithm for image classification using GDAL and Python. After calculating the mean pixel value of each sample area and storing the means in a list of arrays ("sample_array"), I read the image into an array called "values". With the following code I loop through this array:

import math

# 'valBD', 'classBD', 'sample_array', 'sample_count', 'XSize' and 'YSize'
# are set up earlier in the script
values = valBD.ReadAsArray()

# loop through pixel columns
for X in range(0, XSize):

    # loop through pixel lines
    for Y in range(0, YSize):

        # initialize variables
        minDist = 9999
        # read the pixel value once, before it is overwritten with a class index below
        iPixelVal = values[Y, X]

        # get minimum distance
        for iSample in range(0, sample_count):
            # dist = calc_distance(iPixelVal, sample_array[iSample])

            # computing minimum distance
            mean = sample_array[iSample]
            dist = math.sqrt((iPixelVal - mean) * (iPixelVal - mean))  # only for testing

            if dist < minDist:
                minDist = dist
                values[Y, X] = iSample

classBD.WriteArray(values, xoff=0, yoff=0)

This procedure takes a very long time for big images, so I want to ask if somebody knows a faster method. I don't know much about the access speed of different variables in Python. Or maybe someone knows a library I could use. Thanks in advance, Mario

    Doing anything pixel-by-pixel in pure Python is likely to be slow. You could look at PIL, the Python Imaging Library, to see if its functions are useful for this. Commented May 5, 2011 at 21:49

2 Answers


You should definitely be using NumPy. I work with some pretty large raster datasets and NumPy burns through them. On my machine, with the code below there's no noticeable delay for a 1000 x 1000 array. An explanation of how this works follows the code.

import numpy as np
from scipy.spatial.distance import cdist

# some starter data
dim = (1000,1000)
values = np.random.randint(0, 10, dim)

# cdist will want 'samples' as a 2-d array
samples = np.array([1, 2, 3]).reshape(-1, 1)

# this could be a one-liner
# 'values' must have the same number of columns as 'samples'
mins = cdist(values.reshape(-1, 1), samples)
outvalues = mins.argmin(axis=1).reshape(dim)

cdist() calculates the "distance" from each element in values to each of the elements in samples. This generates a 1,000,000 x 3 array, where each row n holds the distance from pixel n in the original array to each of the sample values [1, 2, 3]. argmin(axis=1) gives you the index of the minimum value along each row, which is what you want. A quick reshape gives you the rectangular format you'd expect for an image.
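If it helps, here is a rough sketch of how the same approach might be plugged back into your GDAL workflow. It assumes one mean value per sample (a single band, as in your test code), and that the band objects valBD and classBD and the list of sample means sample_array exist exactly as in your question:

import numpy as np
from scipy.spatial.distance import cdist

# read the whole band into a 2-D array, as in the question
values = valBD.ReadAsArray()
dim = values.shape

# stack the per-sample means into an (n_samples, 1) array for cdist
samples = np.array(sample_array, dtype=float).reshape(-1, 1)

# distance of every pixel to every sample mean (one row per pixel),
# then the column index of the nearest mean for each pixel
dists = cdist(values.reshape(-1, 1).astype(float), samples)
classes = dists.argmin(axis=1).reshape(dim)

# write the class indices to the output band
classBD.WriteArray(classes.astype(np.int32), xoff=0, yoff=0)

Note that cdist builds an n_pixels x n_samples array in memory, so for very large rasters you may want to process the image block by block (ReadAsArray accepts offsets and window sizes).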




Agree with Thomas K: use PIL, or else write a C function and wrap it using e.g. ctypes, or at the very least use some NumPy matrix operations (a sketch follows below). Or else run your existing code under PyPy (JIT-compiled code can be 100x faster on image code). Try PyPy and tell us what speedup you got.

Bottom line: never do stuff pixel-wise like this natively in CPython; the interpreting and memory-management overhead will kill you.
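In case it is useful, here is a minimal sketch of the "NumPy matrix operations" option: the same nearest-mean classification done with broadcasting alone, no SciPy required. The array names and the toy data are illustrative; in your script the image would come from valBD.ReadAsArray() and the means from sample_array:

import numpy as np

# toy data: a 1000 x 1000 single-band "image" and three sample means
values = np.random.randint(0, 10, (1000, 1000)).astype(float)
sample_means = np.array([1.0, 2.0, 3.0])

# broadcast to shape (n_samples, rows, cols):
# absolute distance of every pixel to every sample mean
dists = np.abs(values[np.newaxis, :, :] - sample_means[:, np.newaxis, np.newaxis])

# index of the nearest sample mean for each pixel, shape (rows, cols)
classes = dists.argmin(axis=0)

This keeps all the per-pixel work inside NumPy's compiled loops, which is exactly what avoids the CPython overhead described above.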

