Numpy: proper way of getting maximum from a list of points

Question

I have a list of points in 3d coordinates system (X, Y, Z). Moreover, each of them have assigned a float value v, so a single point might be described as (x, y, z, v). This list is represented as a numpy array of shape=(N,4). For each 2d position x, y I need to get the maximum value of v. A straightforward but computationally expensive way would be:

for index in range(points.shape[0]):
    x = points[index, 0]
    y = points[index, 1]
    v = points[index, 3]

    maxes[x, y] = np.max(maxes[x, y], v)

Is there a more "numpy" approach, which would be able to bring some gain in terms of performance?

@CedricH. They want to know the max for each x, y pair. Basically groupby x, y and find the max for each group — Giacomo Alzetta
– Giacomo Alzetta, Commented Sep 19, 2018 at 15:01
@SVengat yeap, however I wonder if numpy would be able to vectorize this in someway. — thatsme
– thatsme, Commented Sep 19, 2018 at 15:02
Can you post a sample of your list? Are they actually tuples? Or just arrays — user3483203
– user3483203, Commented Sep 19, 2018 at 15:11

user3483203 · Accepted Answer · 2018-09-19 16:00:15Z

4

Setup

points = np.array([[ 0,  0,  1,  1],
                   [ 0,  0,  2,  2],
                   [ 1,  0,  3,  0],
                   [ 1,  0,  4,  1],
                   [ 0,  1,  5, 10]])

The general idea here is sorting using the first, second, and fourth columns, and reversing that result, so that when we find our unique values, the value with the maximum value in the fourth column will be above other values with similar x and y coordinates. Then we use np.unique to find the unique values in the first and second columns, and return those results, which will have the maximum v:

Using `lexsort` and `numpy.unique`

def max_xy(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    vals, idx = np.unique(res[:, :2], 1, axis=0)
    maximums = res[idx]
    return maximums[:, [0,1,3]]

array([[ 0,  0,  2],
       [ 0,  1, 10],
       [ 1,  0,  1]])

Avoiding `unique` for better performance

def max_xy_v2(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    res = res[np.append([True], np.any(np.diff(res[:, :2],axis=0),1))]
    return res[:, [0,1,3]]

max_xy_v2(points)

array([[ 1,  0,  1],
       [ 0,  1, 10],
       [ 0,  0,  2]])

Notice that while both will return correct results, they will not be sorted as the original lists were, you can simply add another lexsort at the end to fix this if you like.

edited Sep 19, 2018 at 16:00

answered Sep 19, 2018 at 15:24

user3483203

51.3k10 gold badges72 silver badges104 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

thatsme Over a year ago

Thhe second solution is great!! Thank you a lot! :)

Greg Kramida · Accepted Answer · 2018-09-19 15:59:52Z

2

Sorry, not a purely "numpy" solution either, but numpy_indexed package provides a very convenient (and fast) way to do this.

import numpy_indexed as npi
npi.group_by(points[:, 0:2]).max(points[:,3])

Comparison to Other Methods

%timeit npi.group_by(points[:, 0:2]).max(points[:,3])
58 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


%timeit pd.DataFrame(points, columns=['X', 'Y', 'Z', 'V']).groupby(by=['X', 'Y']).apply(lambda r: r['V'].max()).reset_index().values
3.15 ms ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

def max_xy_ver1(a):
    res = a[np.lexsort([a[:, 0], a[:, 1], a[:, 3]])[::-1]]
    vals, idx = np.unique(res[:, :2], 1, axis=0)
    maximums = res[idx]
    return maximums[:, [0,1,3]]

%timeit max_xy_ver1(points)
63.5 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

def max_xy_ver2(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    res = res[np.append([True], np.any(np.diff(res[:, :2],axis=0),1))]
    return res[:, [0,1,3]]

%timeit_max_xy_ver2(points) # current winner
31.7 µs ± 524 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

def findmaxes_simple(points):
    maxes = {}
    for index in range(points.shape[0]):
        x = points[index, 0]
        y = points[index, 1]
        v = points[index, 3]
        maxes[(x, y)] = v if (x,y) not in maxes else max(maxes[(x, y)],v)
    return maxes

%timeit findmaxes_simple(points)
82.6 µs ± 632 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Installing numpy_indexed via Pip

pip install --user numpy_indexed

(If you are on Ubuntu and some other Linux distros, you may have to use pip3 to install the package for python 3)

Data Used for Tests

Pastebin here.

edited Sep 19, 2018 at 15:59

answered Sep 19, 2018 at 15:38

Greg Kramida

4,2845 gold badges32 silver badges49 bronze badges

2 Comments

user3483203 Over a year ago

Mind adding the timings for my updated solution? Should be considerably faster

Greg Kramida Over a year ago

@user3483203 sure thing, 1 sec :)

Cedric H. · Accepted Answer · 2018-09-19 15:16:51Z

1

This is not pure numpy and I take advantage of pandas which I believe will do its best to vectorize:

a = [
    [0, 0, 1, 1],
    [0, 0, 2, 2],
    [1, 0, 3, 0],
    [1, 0, 4, 1],
    [0, 1, 5, 10],
]
pd.DataFrame(a, columns=['X', 'Y', 'Z', 'V']).groupby(by=['X', 'Y']).apply(lambda r: r['V'].max()).reset_index().values

Returning this:

array([[ 0,  0,  2],
       [ 0,  1, 10],
       [ 1,  0,  1]])

answered Sep 19, 2018 at 15:16

Cedric H.

8,31612 gold badges61 silver badges88 bronze badges

1 Comment

user3483203 Over a year ago

Here is a vectorized version of this answer: pd.DataFrame(a, columns=['X', 'Y', 'Z', 'V']).groupby(by=['X', 'Y']).V.max().reset_index().values

Ub2r · Accepted Answer · 2018-09-20 08:11:14Z

0

In pure numpy :

import numpy as np

points = np.array([(1,2,3,4),
                   (2,3,5,6),
                   (1,2,9,8)])  #an example,

def find_vmax(x, y) :
    xpoints = points[np.where( points[:,0] == x)[0]]
    xypoints = xpoints[np.where( xpoints[:,1] == y)[0]]
    return np.max(xypoints[:, 3])

print(find_vmax(1, 2))

edited Sep 20, 2018 at 8:11

answered Sep 19, 2018 at 15:36

Ub2r

314 bronze badges

1 Comment

Greg Kramida Over a year ago

This provides only the result for a single (x,y) combination. You would have to use np.unique to get all of them and iterate through all of them in a standard for loop to get the complete solution.

Collectives™ on Stack Overflow

Numpy: proper way of getting maximum from a list of points

4 Answers 4

Using `lexsort` and `numpy.unique`

Avoiding `unique` for better performance

1 Comment

Comparison to Other Methods

Installing numpy_indexed via Pip

Data Used for Tests

2 Comments

1 Comment

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Using lexsort and numpy.unique

Avoiding unique for better performance

1 Comment

Comparison to Other Methods

Installing numpy_indexed via Pip

Data Used for Tests

2 Comments

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related

Using `lexsort` and `numpy.unique`

Avoiding `unique` for better performance