Array comparison of different length arrays in numpy

Question

Hi everyone and please excuse my limited programming knoweledge. I have two arrays like:

A =([[ 0.10111977,  0.5511177 ,  0.49532397,  0.42136468, 0.43345532],
     [ 0.3812068 ,  0.97679566,  0.20473656,  0.40256096, 0.32423426],
     [ 0.2387294 ,  0.88714084,  0.01064819,  0.48275173, 0.78234234]])

B = ([[ 0.10111977,  0.5511177 ,  0.49532397],
      [ 0.2387294 ,  0.88714084,  0.01064819]])

(they actually have many thousands of lines but just to demonstrate the problem). I'd like to compare the two in order to find which of the lines in B are also present in A in order to copy the relevant row into a new array that would look like:

C =([[ 0.10111977,  0.5511177 ,  0.49532397,  0.42136468, 0.43345532],
     [ 0.2387294 ,  0.88714084,  0.01064819,  0.48275173, 0.78234234]])

The easy (brute force) solution I tried is to do something like:

for rowB in B:
    for rowA in A:
        if A[rowA,0]==B[rowB,0] and A[rowA,1]==B[rowB,1] and A[rowA,2]==B[rowB,2]:
            C.extend(row)
            continue

now this will work but as I said my datasets are huge and it takes for ever. Is there an easier\faster way to do this? I have thought of interpolation but I don't see how it can be done with those data.

I want to say something like change your if condition to: cmp(rowB, rowA[:3]) == 0 -- that'll make it easier to read but don't know if any faster. Your problem is that you go through the entirety of A for each row of B and I don't think there's a good shortcut out of that. — TZHX
– TZHX, Commented Feb 5, 2015 at 13:26
Is your B array constructed from A -- say, by selecting from it, or by both A and B being selected from a parent object -- or is it constructed independently? If it's constructed in a different way, we might have to be tolerant of some floating point error, which rules out some otherwise convenient approaches. — DSM
– DSM, Commented Feb 5, 2015 at 13:33
@DSM they are constructed from different objects unfortunately....but thanks for your comment! — wormhole spacetime
– wormhole spacetime, Commented Feb 5, 2015 at 13:43

adds68 · Accepted Answer · 2015-02-05 13:41:54Z

1

You can use set logic:

SetA & setB will return all of the items in A that are in B only:

a = set(list1)
b = set(list2)
c = a & b

c will now contain matches!

Edit, as i did not see the numpy reference, if you search the docs you can find the method that you are looking for:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.intersect1d.html#numpy.intersect1d

edited Feb 5, 2015 at 13:41

answered Feb 5, 2015 at 13:36

adds68

3012 silver badges11 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

wormhole spacetime Over a year ago

Thanks adds68!! I will try that as well

wormhole spacetime Over a year ago

hmmm...I think this doesn't really help as I cannot find where the rows are located in order to select the elements I want in the new array

adds68 Over a year ago

You can just store the return values of this function into another variable? c = np.insersect1d(a,b)

pv. · Accepted Answer · 2015-02-05 13:35:19Z

0

This is a version with better time complexity [O(n) on average according to https://wiki.python.org/moin/TimeComplexity ]:

import numpy as np

def common_rows(A, B):
    items = set(tuple(row) for row in B)
    return np.array([row for row in A if tuple(row[:3]) in items])

n = 10000
A = np.random.rand(n, 5)
B = np.random.rand(n, 3)

# Make some common rows
B[123,:] = A[5775,:3]
B[1443,:] = A[85,:3]

print("-- Expected:")
print(B[123])
print(B[1443])
print("-- Got:")
print(common_rows(A, B))

Numpy doesn't have a set data structure, so we convert here each row to Python object. This is somewhat inefficient, but should be faster for large n.

edited Feb 5, 2015 at 13:35

answered Feb 5, 2015 at 13:29

pv.

35.4k9 gold badges62 silver badges51 bronze badges

2 Comments

wormhole spacetime Over a year ago

thanks pv. but I think this will only work for arrays of the same shape. For my case I get an empty array back.

pv. Over a year ago

@wormholespacetime: works also for different shapes with minor modification: see update

Collectives™ on Stack Overflow

Array comparison of different length arrays in numpy

2 Answers 2

3 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related