I have two arrays of points with xy coordinates:

import numpy as np

basic_pts = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]])
new_pts = np.array([[2, 2], [2, 1], [0.5, 0.5], [1.5, 0.5]])

As a result, I want from the array new_pts only those points that fulfil the condition that there is no point in basic_pts with a bigger x AND y value. So the result would be

res_pts = np.array([[2, 2], [2, 1], [1.5, 0.5]])

I have a solution that works, but because it uses list comprehensions it is not suitable for larger amounts of data.

x_cond = [basic_pts[:, 0] > x for x in new_pts[:, 0]]
y_cond = [basic_pts[:, 1] > y for y in new_pts[:, 1]]
xy_cond_ = np.logical_and(x_cond, y_cond)
xy_cond = np.swapaxes(xy_cond_, 0, 1)
mask = np.invert(np.logical_or.reduce(xy_cond))
res_pts = new_pts[mask]

Is there a better way to solve this only with numpy and without list comprehension?

2 Answers

You could use NumPy broadcasting -

# Get the xy_cond equivalent by extending basic_pts to 2D comparisons:
# add a singleton dimension at axis=1 to col-0 and col-1 of basic_pts
# so they broadcast against col-0 and col-1 of new_pts.
# xyc[i, j] is True when basic_pts[i] is strictly bigger than new_pts[j]
# in both coordinates.
xyc = (basic_pts[:,0,None] > new_pts[:,0]) & (basic_pts[:,1,None] > new_pts[:,1])

# Create mask equivalent and index into new_pts to get selective rows from it
mask = ~(xyc).any(0)
res_pts_out = new_pts[mask]
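For reference, here is a self-contained, runnable version of this approach on the question's sample data (note that the x comparison must use column 0 of new_pts and the y comparison column 1):

```python
import numpy as np

basic_pts = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]])
new_pts = np.array([[2, 2], [2, 1], [0.5, 0.5], [1.5, 0.5]])

# xyc[i, j] is True when basic_pts[i] is strictly bigger than new_pts[j]
# in both the x and the y coordinate.
xyc = (basic_pts[:, 0, None] > new_pts[:, 0]) & (basic_pts[:, 1, None] > new_pts[:, 1])

# Keep only the new points that no basic point dominates.
res_pts_out = new_pts[~xyc.any(0)]
```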

2 Comments

This was my thought as well; note however that it ends up creating an intermediate (len(basic_pts), len(new_pts)) array, which can be pretty memory intensive (OP mentioned 'big amounts of data')
@val Yeah that could be an issue with really huge datasizes. Thanks for pointing that out!
As val points out, a solution that creates an intermediate len(basic_pts) × len(new_pts) array can be too memory intensive. On the other hand, a solution that tests each point in new_pts in a loop can be too time-consuming. We can bridge the gap by picking a batch size k and testing new_pts in batches of size k using Divakar's solution:

import numpy as np

basic_pts = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]])
new_pts = np.array([[2, 2], [2, 1], [0.5, 0.5], [1.5, 0.5]])
k = 2
subresults = []
for i in range(0, len(new_pts), k):
    j = min(i + k, len(new_pts))
    # Process new_pts[i:j] using Divakar's solution
    xyc = np.logical_and(
        basic_pts[:, np.newaxis, 0] > new_pts[np.newaxis, i:j, 0],
        basic_pts[:, np.newaxis, 1] > new_pts[np.newaxis, i:j, 1])
    mask = ~(xyc).any(axis=0)
    # mask indicates which points among new_pts[i:j] to use
    subresults.append(new_pts[i:j][mask])
# Concatenate subresult lists
res = np.concatenate(subresults)
print(res)
# Prints:
# [[2.  2. ]
#  [2.  1. ]
#  [1.5 0.5]]
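If this pattern comes up more than once, the batching loop can be wrapped in a small helper; the sketch below uses np.array_split for the chunking (the function name `filter_dominated` and the default batch size are just illustrative choices, not from the answers above):

```python
import numpy as np

basic_pts = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]])
new_pts = np.array([[2, 2], [2, 1], [0.5, 0.5], [1.5, 0.5]])

def filter_dominated(basic_pts, new_pts, k=1024):
    """Keep the rows of new_pts for which no row of basic_pts is strictly
    bigger in both coordinates, processing new_pts in batches of size k so
    the intermediate mask never exceeds len(basic_pts) * k booleans."""
    n_batches = max(1, -(-len(new_pts) // k))  # ceiling division
    subresults = []
    for chunk in np.array_split(new_pts, n_batches):
        # Same broadcasting comparison as above, restricted to one batch.
        xyc = (basic_pts[:, 0, None] > chunk[:, 0]) & \
              (basic_pts[:, 1, None] > chunk[:, 1])
        subresults.append(chunk[~xyc.any(axis=0)])
    return np.concatenate(subresults)

res = filter_dominated(basic_pts, new_pts, k=2)  # same result as the loop above
```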
