Order of elements in a numpy array

Question

I have a 2-d array of shape(nx3), say arr1. Now consider a second array, arr2, of same shape as arr1 and has the same rows. However, the rows are not in the same order. I want to get the indices of each row in arr2 as they are in arr1. I am looking for fastest Pythonic way to do this as n is of the order of 10,000.

For example:

arr1 = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = numpy.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
ind = [1, 2, 0]

Note that the row elements need not be integers. In fact they are floats. I have found related answers that use numpy.searchsorted but they work for 1-D arrays only.

Can you share your code ?

Kshitij Saraogi
– Kshitij Saraogi

2015-07-24 03:47:54 +00:00
Commented Jul 24, 2015 at 3:47 — Kshitij Saraogi
– Kshitij Saraogi, Commented Jul 24, 2015 at 3:47

HYRY · Accepted Answer · 2015-07-24 03:54:59Z

3

If you are ensure that arr2 is a permutation of arr1, you can use sort to get the index:

import numpy as np

n = 100000
a1 = np.random.randint(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))]
idx1 = np.lexsort(a1.T)
idx2 = np.lexsort(a2.T)
idx = idx2[np.argsort(idx1)]
np.all(a1 == a2[idx])

if they don't have exact the same values, you can use kdTree in scipy:

n = 100000

a1 = np.random.uniform(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))] + np.random.normal(0, 1e-8, size=(n, 3))
from scipy import spatial
tree = spatial.cKDTree(a2)
dist, idx = tree.query(a1)
np.allclose(a1, a2[idx])

answered Jul 24, 2015 at 3:54

HYRY

97.8k28 gold badges197 silver badges192 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Abhinav Over a year ago

The second array is not a permutation of the first. The first array is fed through a function which performs some mathematical operations on the row elements to get arr2, and as the elements are floats the values may not be exactly the same. Thanks for pointing me in the direction of kdTree. This works!

dmargol1 · Accepted Answer · 2015-07-24 04:12:50Z

0

Before we begin, you should mention whether duplicates can exist in your list.

That said, the method I would use is numpy's where function within a list comprehension like so:

[numpy.where(arr1 == x)[0][0] for x in arr2]

Though this might not be the fastest way. Another method might include building a dictionary from the rows in arr1 somehow and then looking them up with arr2.

answered Jul 24, 2015 at 4:12

dmargol1

4152 silver badges6 bronze badges

1 Comment

Abhinav Over a year ago

No the rows are unique, there are no repetitions. However, I wanted to avoid using the for loop. I know this is a pythonic and very readable piece of code but the size of my arrays makes it very slow.

Community · Accepted Answer · 2017-05-23 11:44:48Z

While this is very similar to: Find indexes of matching rows in two 2-D arrays I don't have the reputation to leave a comment.

However, based on that comment there appear to be two clear possibilities for a large matrix like yours:

def find_rows_searchsorted(a, b):
    dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))

    a_view = np.ascontiguousarray(a).view(dt).ravel()
    b_view = np.ascontiguousarray(b).view(dt).ravel()

    sort_b = np.argsort(b_view)
    where_in_b = np.searchsorted(b_view, a_view, sorter=sort_b)
    return np.take(sort_b, where_in_b)

def find_rows_iterative(a, b):
    answer = np.empty(a.shape[0], dtype=int)
    for idx, row in enumerate(a):
        answer[idx] = np.where(np.equal(b, row).all(1))[0]

    return answer

def find_rows_list_comprehension(a, b):
    return [np.where(b == x)[0][0] for x in a]

However, a little timing with a matrix of 10000 elements shows that the searchsorted based method is significantly faster than the brute force iterative method:

arr1 = np.random.randn(10000, 3)
shuffled_inds = np.arange(arr1.shape[0])
np.random.shuffle(shuffled_inds)
arr2 = arr1[new_inds, :]

np.array_equal(find_rows_searchsorted(arr2, arr1), new_inds)
>> True

np.array_equal(find_rows_iterative(arr2, arr1), new_inds)
>> True

np.array_equal(find_rows_list_comprehension(arr2, arr1), new_inds)
>> True

%timeit find_rows_iterative(arr2, arr1)
>> 1 loops, best of 3: 2.62 s per loop

%timeit find_rows_list_comprehension(arr2, arr1)
>> 1 loops, best of 3: 1.61 s per loop

%timeit find_rows_searchsorted(arr2, arr1)
>> 100 loops, best of 3: 6.53 ms per loop

Based off of HYRY's great responses I also added lexsort and kdball tests as well as a test of argsort for structured arrays.

def find_rows_lexsort(a, b):
    idx1 = np.lexsort(a.T)
    idx2 = np.lexsort(b.T)
    return idx2[np.argsort(idx1)]

def find_rows_argsort(a, b):
    a_rec  = np.core.records.fromarrays(a.transpose())
    b_rec  = np.core.records.fromarrays(b.transpose())
    idx1 = a_rec.argsort(order=a_rec.dtype.names).argsort()
    return b_rec.argsort(order=b_rec.dtype.names)[idx1]

def find_rows_kdball(a, b):
    from scipy import spatial
    tree = spatial.cKDTree(b)
    _, idx = tree.query(a)
    return idx

%timeit find_rows_lexsort(arr2, arr1)
>> 100 loops, best of 3: 4.63 ms per loop

%timeit find_rows_argsort(arr2, arr1)
>> 100 loops, best of 3: 7.37 ms per loop

%timeit find_rows_kdball(arr2, arr1)
>> 100 loops, best of 3: 18.5 ms per loop

Collectives™ on Stack Overflow

Order of elements in a numpy array

3 Answers 3

1 Comment

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related