3

I'm going beyond my previous question because of speed problems. I have an array of Lat/Lon coordinates of points, and I would like to assign them to an index code derived from a 2D square grid of equal size cells. This is an example of how it would be. Let's called points my first array containing coordinates (called them [x y] pairs) of six points:

points = [[ 1.5  1.5]
 [ 1.1  1.1]
 [ 2.2  2.2]
 [ 1.3  1.3]
 [ 3.4  1.4]
 [ 2.   1.5]]

Then I have another array containing the coordinates of the vertices of a grid of two cells in the form [minx,miny,maxx,maxy]; let's call it bounds:

bounds = [[ 0.  0.  2.  2.]
 [ 2.  2.  3.  3.]]

I would like to find which points are in which boundary, and then assign a code derived from the bounds array index (in this case the first cell has code 0, the second 1 and so on...). Since the cells are squares, the easiest way to compute if each point is in each cell is to evaluate:

x > minx & x < maxx & y > miny & y < maxy

So that the resulting array would appear as:

results = [0 0 1 0 NaN NaN]

where NaN means that the point is outside cells. The number of elements in my real case is of the order of finding 10^6 points into 10^4 cells. Is there a way to do this kind of things in a fast way using numpy arrays?

EDIT: to clarify, the results array expected means that the first points is inside the first cell (0 index of the bounds array) so the second, and the first is inside the second cell of the bounds array and so on...

5
  • Is [0 0 1 0 NaN NaN] the result of preceding bounds and points? and can you explain how you used bounds? Commented May 27, 2015 at 11:53
  • Yes, intersecting each point with the two cells and getting the corresponding cell code. Commented May 27, 2015 at 11:54
  • If bounds is an array of [minx miny maxx maxy] values, the problem is to implement the function x > minx & x < maxx & y > miny & y < maxy in order to determined that, for instance, the first point is in the first cell of the bounds array. Hope it helps. Commented May 27, 2015 at 11:57
  • I get it but bounds has 2 item!!! how you used that 2 item? Commented May 27, 2015 at 12:00
  • Just iterating over them. I mean, the problem here is implementing the search function over the bounds array, which has two items because it contains two cells. The first item has index 0 and the second 1, so that I would like to assign them to each point. Commented May 27, 2015 at 12:02

3 Answers 3

3

Here is a vectorized approach to your problem. It should speed things up significantly.

import numpy as np
def findCells(points, bounds):
    # make sure points is n by 2 (pool.map might send us 1D arrays)
    points = points.reshape((-1,2))

    # check for each point if all coordinates are in bounds
    # dimension 0 is bound
    # dimension 1 is is point
    allInBounds = (points[:,0] > bounds[:,None,0])
    allInBounds &= (points[:,1] > bounds[:,None,1])
    allInBounds &= (points[:,0] < bounds[:,None,2])
    allInBounds &= (points[:,1] < bounds[:,None,3])


    # now find out the positions of all nonzero (i.e. true) values
    # nz[0] contains the indices along dim 0 (bound)
    # nz[1] contains the indices along dim 1 (point)
    nz = np.nonzero(allInBounds)

    # initialize the result with all nan
    r = np.full(points.shape[0], np.nan)
    # now use nz[1] to index point position and nz[0] to tell which cell the
    # point belongs to
    r[nz[1]] = nz[0]
    return r

def findCellsParallel(points, bounds, chunksize=100):
    import multiprocessing as mp
    from functools import partial

    func = partial(findCells, bounds=bounds)

    # using python3 you could also do 'with mp.Pool() as p:'  
    p = mp.Pool()
    try:
        return np.hstack(p.map(func, points, chunksize))
    finally:
        p.close()

def main():
    nPoints = 1e6
    nBounds = 1e4

    # points = np.array([[ 1.5, 1.5],
    #                    [ 1.1, 1.1],
    #                    [ 2.2, 2.2],
    #                    [ 1.3, 1.3],
    #                    [ 3.4, 1.4],
    #                    [ 2. , 1.5]])

    points = np.random.random([nPoints, 2])

    # bounds = np.array([[0,0,2,2],
    #                    [2,2,3,3]])

    # bounds = np.array([[0,0,1.4,1.4],
    #                    [1.4,1.4,2,2],
    #                    [2,2,3,3]])

    bounds = np.sort(np.random.random([nBounds, 2, 2]), 1).reshape(nBounds, 4)

    r = findCellsParallel(points, bounds)
    print(points[:10])
    for bIdx in np.unique(r[:10]):
        if np.isnan(bIdx):
            continue
        print("{}: {}".format(bIdx, bounds[bIdx]))
    print(r[:10])

if __name__ == "__main__":
    main()

Edit:
Trying it with your amount of data gave me a MemoryError. You can avoid that and even speed things up a little more if you use multiprocessing.Pool with its map function, see updated code.

Result:

>time python test.py
[[ 0.69083585  0.19840985]
 [ 0.31732711  0.80462512]
 [ 0.30542996  0.08569184]
 [ 0.72582609  0.46687164]
 [ 0.50534322  0.35530554]
 [ 0.93581095  0.36375539]
 [ 0.66226118  0.62573407]
 [ 0.08941219  0.05944215]
 [ 0.43015872  0.95306899]
 [ 0.43171644  0.74393729]]
9935.0: [ 0.31584562  0.18404152  0.98215445  0.83625487]
9963.0: [ 0.00526106  0.017255    0.33177741  0.9894455 ]
9989.0: [ 0.17328876  0.08181912  0.33170444  0.23493507]
9992.0: [ 0.34548987  0.15906761  0.92277442  0.9972481 ]
9993.0: [ 0.12448765  0.5404578   0.33981119  0.906822  ]
9996.0: [ 0.41198261  0.50958195  0.62843379  0.82677092]
9999.0: [ 0.437169    0.17833114  0.91096133  0.70713434]
[ 9999.  9993.  9989.  9999.  9999.  9935.  9999.  9963.  9992.  9996.]

real 0m 24.352s
user 3m  4.919s
sys  0m  1.464s
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! Actually one point can be only in one and one single bound...so it should work fine!
@Fiabetto you might want to have a look at my answer again. I'm now down to 24s on my machine with 1e6 points and 1e4 bounds.
2

You can use a nested loop with to check the condition and yield the result as a generator :

points = [[ 1.5  1.5]
 [ 1.1  1.1]
 [ 2.2  2.2]
 [ 1.3  1.3]
 [ 3.4  1.4]
 [ 2.   1.5]]

bounds = [[ 0.  ,0. , 2.,  2.],
 [ 2.  ,2.  ,3.,  3.]]

import numpy as np

def pos(p,b):
  for x,y in p:
    flag=False
    for index,dis in enumerate(b):
      minx,miny,maxx,maxy=dis
      if x > minx and x < maxx and y > miny and y < maxy :
        flag=True
        yield index
    if not flag:
        yield 'NaN'


print list(pos(points,bounds))

result :

[0, 0, 1, 0, 'NaN', 'NaN']

3 Comments

Thanks Kasra, but maybe is there a typo? The result list length should be equal to the length of the points input array.
@Fiabetto Welcome. yeah :) it was just a typo!
Thanks for your help I'm testing your method on speed time compare to my old one! Thanks again!
1

I would do it like this:

import numpy as np

points = np.random.rand(10,2)

xmin = [0.25,0.5]
ymin = [0.25,0.5]

results = np.zeros(len(points))

for i in range(len(xmin)):
     bool_index_array = np.greater(points, [xmin[i],ymin[i]])
     print "boolean index of (x,y) greater (xmin, ymin): ", bool_index_array
     indicies_of_true_true = np.where(bool_index_array[:,0]*bool_index_array[:,1]==1)[0]
     print "indices of [True,True]: ", indicies_of_true_true
     results[indicies_of_true_true] += 1

print "results: ", results

[out]: [ 1.  1.  1.  2.  0.  0.  1.  1.  1.  1.]

This uses the lower boundaries to catagorize your points into the groups:

  • 1 (if xmin[0] < x <= xmin[1] & ymin[0] < y <= ymin[1])
  • 2 (if x > xmin[1] & y > ymin[1])
  • 0 if none of the conditions above are fullfilled

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.