Assign numpy array of points to a 2D square grid

Question

I'm going beyond my previous question because of speed problems. I have an array of Lat/Lon coordinates of points, and I would like to assign them to an index code derived from a 2D square grid of equal size cells. This is an example of how it would be. Let's called points my first array containing coordinates (called them [x y] pairs) of six points:

points = [[ 1.5  1.5]
 [ 1.1  1.1]
 [ 2.2  2.2]
 [ 1.3  1.3]
 [ 3.4  1.4]
 [ 2.   1.5]]

Then I have another array containing the coordinates of the vertices of a grid of two cells in the form [minx,miny,maxx,maxy]; let's call it bounds:

bounds = [[ 0.  0.  2.  2.]
 [ 2.  2.  3.  3.]]

I would like to find which points are in which boundary, and then assign a code derived from the bounds array index (in this case the first cell has code 0, the second 1 and so on...). Since the cells are squares, the easiest way to compute if each point is in each cell is to evaluate:

x > minx & x < maxx & y > miny & y < maxy

So that the resulting array would appear as:

results = [0 0 1 0 NaN NaN]

where NaN means that the point is outside cells. The number of elements in my real case is of the order of finding 10^6 points into 10^4 cells. Is there a way to do this kind of things in a fast way using numpy arrays?

EDIT: to clarify, the results array expected means that the first points is inside the first cell (0 index of the bounds array) so the second, and the first is inside the second cell of the bounds array and so on...

Is [0 0 1 0 NaN NaN] the result of preceding bounds and points? and can you explain how you used bounds? — Kasravnd
– Kasravnd, Commented May 27, 2015 at 11:53
Yes, intersecting each point with the two cells and getting the corresponding cell code. — Fabio Lamanna
– Fabio Lamanna, Commented May 27, 2015 at 11:54
If bounds is an array of [minx miny maxx maxy] values, the problem is to implement the function x > minx & x < maxx & y > miny & y < maxy in order to determined that, for instance, the first point is in the first cell of the bounds array. Hope it helps. — Fabio Lamanna
– Fabio Lamanna, Commented May 27, 2015 at 11:57
I get it but bounds has 2 item!!! how you used that 2 item? — Kasravnd
– Kasravnd, Commented May 27, 2015 at 12:00
Just iterating over them. I mean, the problem here is implementing the search function over the bounds array, which has two items because it contains two cells. The first item has index 0 and the second 1, so that I would like to assign them to each point. — Fabio Lamanna
– Fabio Lamanna, Commented May 27, 2015 at 12:02

swenzel · Accepted Answer · 2015-05-27 20:59:39Z

Here is a vectorized approach to your problem. It should speed things up significantly.

import numpy as np
def findCells(points, bounds):
    # make sure points is n by 2 (pool.map might send us 1D arrays)
    points = points.reshape((-1,2))

    # check for each point if all coordinates are in bounds
    # dimension 0 is bound
    # dimension 1 is is point
    allInBounds = (points[:,0] > bounds[:,None,0])
    allInBounds &= (points[:,1] > bounds[:,None,1])
    allInBounds &= (points[:,0] < bounds[:,None,2])
    allInBounds &= (points[:,1] < bounds[:,None,3])


    # now find out the positions of all nonzero (i.e. true) values
    # nz[0] contains the indices along dim 0 (bound)
    # nz[1] contains the indices along dim 1 (point)
    nz = np.nonzero(allInBounds)

    # initialize the result with all nan
    r = np.full(points.shape[0], np.nan)
    # now use nz[1] to index point position and nz[0] to tell which cell the
    # point belongs to
    r[nz[1]] = nz[0]
    return r

def findCellsParallel(points, bounds, chunksize=100):
    import multiprocessing as mp
    from functools import partial

    func = partial(findCells, bounds=bounds)

    # using python3 you could also do 'with mp.Pool() as p:'  
    p = mp.Pool()
    try:
        return np.hstack(p.map(func, points, chunksize))
    finally:
        p.close()

def main():
    nPoints = 1e6
    nBounds = 1e4

    # points = np.array([[ 1.5, 1.5],
    #                    [ 1.1, 1.1],
    #                    [ 2.2, 2.2],
    #                    [ 1.3, 1.3],
    #                    [ 3.4, 1.4],
    #                    [ 2. , 1.5]])

    points = np.random.random([nPoints, 2])

    # bounds = np.array([[0,0,2,2],
    #                    [2,2,3,3]])

    # bounds = np.array([[0,0,1.4,1.4],
    #                    [1.4,1.4,2,2],
    #                    [2,2,3,3]])

    bounds = np.sort(np.random.random([nBounds, 2, 2]), 1).reshape(nBounds, 4)

    r = findCellsParallel(points, bounds)
    print(points[:10])
    for bIdx in np.unique(r[:10]):
        if np.isnan(bIdx):
            continue
        print("{}: {}".format(bIdx, bounds[bIdx]))
    print(r[:10])

if __name__ == "__main__":
    main()

Edit:
Trying it with your amount of data gave me a MemoryError. You can avoid that and even speed things up a little more if you use multiprocessing.Pool with its map function, see updated code.

Result:

>time python test.py
[[ 0.69083585  0.19840985]
 [ 0.31732711  0.80462512]
 [ 0.30542996  0.08569184]
 [ 0.72582609  0.46687164]
 [ 0.50534322  0.35530554]
 [ 0.93581095  0.36375539]
 [ 0.66226118  0.62573407]
 [ 0.08941219  0.05944215]
 [ 0.43015872  0.95306899]
 [ 0.43171644  0.74393729]]
9935.0: [ 0.31584562  0.18404152  0.98215445  0.83625487]
9963.0: [ 0.00526106  0.017255    0.33177741  0.9894455 ]
9989.0: [ 0.17328876  0.08181912  0.33170444  0.23493507]
9992.0: [ 0.34548987  0.15906761  0.92277442  0.9972481 ]
9993.0: [ 0.12448765  0.5404578   0.33981119  0.906822  ]
9996.0: [ 0.41198261  0.50958195  0.62843379  0.82677092]
9999.0: [ 0.437169    0.17833114  0.91096133  0.70713434]
[ 9999.  9993.  9989.  9999.  9999.  9935.  9999.  9963.  9992.  9996.]

real 0m 24.352s
user 3m  4.919s
sys  0m  1.464s

Thanks! Actually one point can be only in one and one single bound...so it should work fine!
@Fiabetto you might want to have a look at my answer again. I'm now down to 24s on my machine with 1e6 points and 1e4 bounds.

Kasravnd · Accepted Answer · 2015-05-27 14:50:24Z

2

You can use a nested loop with to check the condition and yield the result as a generator :

points = [[ 1.5  1.5]
 [ 1.1  1.1]
 [ 2.2  2.2]
 [ 1.3  1.3]
 [ 3.4  1.4]
 [ 2.   1.5]]

bounds = [[ 0.  ,0. , 2.,  2.],
 [ 2.  ,2.  ,3.,  3.]]

import numpy as np

def pos(p,b):
  for x,y in p:
    flag=False
    for index,dis in enumerate(b):
      minx,miny,maxx,maxy=dis
      if x > minx and x < maxx and y > miny and y < maxy :
        flag=True
        yield index
    if not flag:
        yield 'NaN'


print list(pos(points,bounds))

result :

[0, 0, 1, 0, 'NaN', 'NaN']

edited May 27, 2015 at 14:50

answered May 27, 2015 at 12:20

Kasravnd

108k19 gold badges167 silver badges195 bronze badges

3 Comments

Fabio Lamanna Over a year ago

Thanks Kasra, but maybe is there a typo? The result list length should be equal to the length of the points input array.

Kasravnd Over a year ago

@Fiabetto Welcome. yeah :) it was just a typo!

Fabio Lamanna Over a year ago

Thanks for your help I'm testing your method on speed time compare to my old one! Thanks again!

Asking Questions · Accepted Answer · 2015-05-27 13:44:58Z

I would do it like this:

import numpy as np

points = np.random.rand(10,2)

xmin = [0.25,0.5]
ymin = [0.25,0.5]

results = np.zeros(len(points))

for i in range(len(xmin)):
     bool_index_array = np.greater(points, [xmin[i],ymin[i]])
     print "boolean index of (x,y) greater (xmin, ymin): ", bool_index_array
     indicies_of_true_true = np.where(bool_index_array[:,0]*bool_index_array[:,1]==1)[0]
     print "indices of [True,True]: ", indicies_of_true_true
     results[indicies_of_true_true] += 1

print "results: ", results

[out]: [ 1.  1.  1.  2.  0.  0.  1.  1.  1.  1.]

This uses the lower boundaries to catagorize your points into the groups:

1 (if xmin[0] < x <= xmin[1] & ymin[0] < y <= ymin[1])
2 (if x > xmin[1] & y > ymin[1])
0 if none of the conditions above are fullfilled

Collectives™ on Stack Overflow

Assign numpy array of points to a 2D square grid

3 Answers 3

2 Comments

3 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related