6

I have a list of points in 3d coordinates system (X, Y, Z). Moreover, each of them have assigned a float value v, so a single point might be described as (x, y, z, v). This list is represented as a numpy array of shape=(N,4). For each 2d position x, y I need to get the maximum value of v. A straightforward but computationally expensive way would be:

for index in range(points.shape[0]):
    x = points[index, 0]
    y = points[index, 1]
    v = points[index, 3]

    maxes[x, y] = np.max(maxes[x, y], v)

Is there a more "numpy" approach, which would be able to bring some gain in terms of performance?

7
  • Why is np.max(points[:, 3]) not sufficient? Commented Sep 19, 2018 at 15:00
  • 1
    @CedricH. They want to know the max for each x, y pair. Basically groupby x, y and find the max for each group Commented Sep 19, 2018 at 15:01
  • @SVengat yeap, however I wonder if numpy would be able to vectorize this in someway. Commented Sep 19, 2018 at 15:02
  • @GiacomoAlzetta Thanks, I missed that. Commented Sep 19, 2018 at 15:11
  • Can you post a sample of your list? Are they actually tuples? Or just arrays Commented Sep 19, 2018 at 15:11

4 Answers 4

4

Setup

points = np.array([[ 0,  0,  1,  1],
                   [ 0,  0,  2,  2],
                   [ 1,  0,  3,  0],
                   [ 1,  0,  4,  1],
                   [ 0,  1,  5, 10]])

The general idea here is sorting using the first, second, and fourth columns, and reversing that result, so that when we find our unique values, the value with the maximum value in the fourth column will be above other values with similar x and y coordinates. Then we use np.unique to find the unique values in the first and second columns, and return those results, which will have the maximum v:

Using lexsort and numpy.unique

def max_xy(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    vals, idx = np.unique(res[:, :2], 1, axis=0)
    maximums = res[idx]
    return maximums[:, [0,1,3]]

array([[ 0,  0,  2],
       [ 0,  1, 10],
       [ 1,  0,  1]])

Avoiding unique for better performance

def max_xy_v2(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    res = res[np.append([True], np.any(np.diff(res[:, :2],axis=0),1))]
    return res[:, [0,1,3]]

max_xy_v2(points)

array([[ 1,  0,  1],
       [ 0,  1, 10],
       [ 0,  0,  2]])

Notice that while both will return correct results, they will not be sorted as the original lists were, you can simply add another lexsort at the end to fix this if you like.

Sign up to request clarification or add additional context in comments.

1 Comment

Thhe second solution is great!! Thank you a lot! :)
2

Sorry, not a purely "numpy" solution either, but numpy_indexed package provides a very convenient (and fast) way to do this.

import numpy_indexed as npi
npi.group_by(points[:, 0:2]).max(points[:,3])

Comparison to Other Methods

%timeit npi.group_by(points[:, 0:2]).max(points[:,3])
58 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


%timeit pd.DataFrame(points, columns=['X', 'Y', 'Z', 'V']).groupby(by=['X', 'Y']).apply(lambda r: r['V'].max()).reset_index().values
3.15 ms ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

def max_xy_ver1(a):
    res = a[np.lexsort([a[:, 0], a[:, 1], a[:, 3]])[::-1]]
    vals, idx = np.unique(res[:, :2], 1, axis=0)
    maximums = res[idx]
    return maximums[:, [0,1,3]]

%timeit max_xy_ver1(points)
63.5 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

def max_xy_ver2(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    res = res[np.append([True], np.any(np.diff(res[:, :2],axis=0),1))]
    return res[:, [0,1,3]]

%timeit_max_xy_ver2(points) # current winner
31.7 µs ± 524 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

def findmaxes_simple(points):
    maxes = {}
    for index in range(points.shape[0]):
        x = points[index, 0]
        y = points[index, 1]
        v = points[index, 3]
        maxes[(x, y)] = v if (x,y) not in maxes else max(maxes[(x, y)],v)
    return maxes

%timeit findmaxes_simple(points)
82.6 µs ± 632 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Installing numpy_indexed via Pip

pip install --user numpy_indexed

(If you are on Ubuntu and some other Linux distros, you may have to use pip3 to install the package for python 3)

Data Used for Tests

Pastebin here.

2 Comments

Mind adding the timings for my updated solution? Should be considerably faster
@user3483203 sure thing, 1 sec :)
1

This is not pure numpy and I take advantage of pandas which I believe will do its best to vectorize:

a = [
    [0, 0, 1, 1],
    [0, 0, 2, 2],
    [1, 0, 3, 0],
    [1, 0, 4, 1],
    [0, 1, 5, 10],
]
pd.DataFrame(a, columns=['X', 'Y', 'Z', 'V']).groupby(by=['X', 'Y']).apply(lambda r: r['V'].max()).reset_index().values

Returning this:

array([[ 0,  0,  2],
       [ 0,  1, 10],
       [ 1,  0,  1]])

1 Comment

Here is a vectorized version of this answer: pd.DataFrame(a, columns=['X', 'Y', 'Z', 'V']).groupby(by=['X', 'Y']).V.max().reset_index().values
0

In pure numpy :

import numpy as np

points = np.array([(1,2,3,4),
                   (2,3,5,6),
                   (1,2,9,8)])  #an example,

def find_vmax(x, y) :
    xpoints = points[np.where( points[:,0] == x)[0]]
    xypoints = xpoints[np.where( xpoints[:,1] == y)[0]]
    return np.max(xypoints[:, 3])

print(find_vmax(1, 2))

1 Comment

This provides only the result for a single (x,y) combination. You would have to use np.unique to get all of them and iterate through all of them in a standard for loop to get the complete solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.