numpy group rows from large array

Question

I want to group the rows that share the same first element and, within each group, find the row with the maximum value in the last column.

For example, rows 0 and 1 have the same first element, so look at their last column, find the maximum, and return that row, without looping.

utl = np.array ([[ 21. ,            0.01      ],
                 [ 21. ,            0.02      ],
                 [ 26. ,            0.04      ],
                 [ 26. ,            0.03      ],
                 [ 26. ,            0.03      ],
                 [ 34. ,            0.03      ],
                 [ 34. ,            0.09      ],
                 [ 26. ,            0.03      ]])

The output must be:

[ 21. ,            0.02      ]
[ 34. ,            0.09      ]
[ 26. ,            0.04      ]

Is the final order important?

– mozway, Commented Jan 20, 2022 at 14:02
No, the final order is not important.

– user17984589, Commented Jan 20, 2022 at 14:04

mozway · Accepted Answer · 2022-01-20 14:17:52Z

2

If the final order of the rows is not important:

# sort by first and second column
a = utl[np.lexsort((utl[:,1], utl[:,0]))]

# get positions of group change
# as we want the max, we take the last row per group
_, idx = np.unique(a[:,0], return_index=True)
idx2 = np.r_[idx[1:]-1, [a.shape[0]-1]]  # (idx-1)%a.shape[0] picks the same rows, but starting with the last group

# select the last row of each group
a[idx2]

output:

array([[21, 0.02],
       [26, 0.04],
       [34, 0.09]])

Solution for the min (take the first row per group instead):

a = utl[np.lexsort((utl[:,1], utl[:,0]))]
_, idx = np.unique(a[:,0], return_index=True)
a[idx]
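
For reference, here is a self-contained sketch of the steps above, run on the array from the question. The names first_idx, last_idx, max_rows and min_rows are illustrative and not part of the original answer; the sketch assumes a non-empty 2-D array whose first column is the group key.

```python
import numpy as np

utl = np.array([[21., 0.01],
                [21., 0.02],
                [26., 0.04],
                [26., 0.03],
                [26., 0.03],
                [34., 0.03],
                [34., 0.09],
                [26., 0.03]])

# Sort rows by the first column, breaking ties with the last column.
# np.lexsort treats the LAST key in the tuple as the primary key.
a = utl[np.lexsort((utl[:, -1], utl[:, 0]))]

# first_idx[i] is the position where group i starts in the sorted array.
_, first_idx = np.unique(a[:, 0], return_index=True)

# The last row of each group sits just before the start of the next group;
# the final group ends at the last row of the array.
last_idx = np.r_[first_idx[1:] - 1, len(a) - 1]

max_rows = a[last_idx]   # per key, the row with the largest last-column value
min_rows = a[first_idx]  # per key, the row with the smallest last-column value
```

Because the array is sorted first, the resulting rows come out ordered by key, which is acceptable here since the final order was stated not to matter.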

edited Jan 20, 2022 at 14:17

answered Jan 20, 2022 at 14:04

mozway


5 Comments

Mad Physicist Over a year ago

That's not what sort(axis=0) does...

Mad Physicist Over a year ago

Which is why your answer doesn't match. You likely want lexsort. No need for unique at all.

mozway Over a year ago

@MadPhysicist thanks, oversight from my side, I fixed it. If you have a better solution, you can provide it ;)

Mad Physicist Over a year ago

unique performs a second sort. I posted a lexsort solution that only does one sort followed by diff and indexing

mozway Over a year ago

@MadPhysicist thanks for the details and the solution. From a practical perspective, however, the timing doesn't seem very different. Maybe once the array is sorted, an additional "sort" doesn't cost much.

Mad Physicist · Accepted Answer · 2022-01-20 14:20:23Z

1

If you use np.lexsort, you can get the maximum by index:

idx = np.lexsort(utl.T[::-1])

The differences in the sorted first column are going to tell you which index to grab from the second:

max_idx = np.r_[np.flatnonzero(np.diff(utl[idx, 0])), len(idx) - 1]
min_idx = np.r_[0, np.flatnonzero(np.diff(utl[idx, 0])) + 1]

The extreme values can be extracted immediately (index with utl[idx[max_idx]] or utl[idx[min_idx]] instead to get the full rows):

minima = utl[idx[min_idx], 1]
maxima = utl[idx[max_idx], 1]
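
As a rough, self-contained sketch of the same single-sort idea, the snippet below wraps the steps in a helper that returns whole rows, since the question asks for rows rather than values. The function name group_extreme_rows and its variable names are hypothetical, not from the answer, and the sketch assumes a non-empty 2-D array with the group key in the first column.

```python
import numpy as np

def group_extreme_rows(arr):
    """Return (min_rows, max_rows): for each distinct first-column value,
    the rows with the smallest and largest last-column value."""
    # Single sort: primary key is the first column, tie-breaker the last column.
    order = np.lexsort((arr[:, -1], arr[:, 0]))
    keys = arr[order, 0]
    # Positions where the sorted key changes mark the end of a group.
    change = np.flatnonzero(np.diff(keys))
    max_pos = np.r_[change, len(order) - 1]  # last row of each group
    min_pos = np.r_[0, change + 1]           # first row of each group
    return arr[order[min_pos]], arr[order[max_pos]]

utl = np.array([[21., 0.01],
                [21., 0.02],
                [26., 0.04],
                [26., 0.03],
                [26., 0.03],
                [34., 0.03],
                [34., 0.09],
                [26., 0.03]])

min_rows, max_rows = group_extreme_rows(utl)
```

Unlike the np.unique variant, this locates the group boundaries with np.diff on the already sorted keys, so no second sort is needed.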

answered Jan 20, 2022 at 14:20

Mad Physicist
