I'm writing some Python code and I've run into a problem. I have two arrays, A and B, both containing IDs. A holds all the IDs, and B holds the IDs belonging to one group. I'm trying to get the positions of the elements of B in A using this code:

>>> print B
[11600813 11600877 11600941 ..., 13432165 13432229 13434277]
>>> mask=np.nonzero(np.in1d(A, B))
>>> print A[mask]
[12966245 12993389 12665837 ..., 13091877 12965029 13091813]

But this is clearly wrong, since I'm not recovering the values of B. To check whether I was using numpy.in1d() correctly, I tried:

>>> mask=np.nonzero(np.in1d(A, B[0]))
>>> print A[mask]
[11600813]

which is right, so I'm guessing the problem is with how 'B' is passed to numpy.in1d(). I tried using the boolean array np.in1d(A, B) directly instead of converting it to indices, but that didn't work. I also tried B = numpy.array(B) and B = list(B), and neither of them worked.

But if I use B = numpy.array(B)[0] or B = list(B)[0], it still works for that single element. Unfortunately, I can't run a 'for' loop over each element, because len(A) is 16777216 and len(B) is 9166, so it takes far too long.

I also made sure that all elements of B are in A:

>>> np.intersect1d(A, B)
[11600813 11600877 11600941 ..., 13432165 13432229 13434277]
  • The output you show is only wrong if A is sorted in the same way as B. Is it? If not, you'll get the values from B, but in the order given by A. Since the output you show is truncated, it's quite possible that all the values in B appear in A[mask], but in a different order. Commented Mar 20, 2013 at 3:34
  • Are the IDs in A unique? If so, A[mask] has the same IDs as B, but in a different order. Commented Mar 20, 2013 at 3:35
  • A[mask] and B have no elements in common, I checked it using np.intersect1d(), which means that they are sorted in the same way. Commented Mar 20, 2013 at 4:32
  • Although you have a working solution, the fact that A[mask] and B have no elements in common suggests that there's something more to this problem than you've stated in your question. I can't reproduce that behavior at all; for all data I try, len(np.intersect1d(A[np.in1d(A, B)], B)) == len(B). Commented Mar 20, 2013 at 12:03

1 Answer

You can use numpy.argsort and numpy.searchsorted to get the positions:

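import numpy as np
A = np.unique(np.random.randint(0, 100, 100))
B = np.random.choice(A, 10)

idxA = np.argsort(A)                # indices that would sort A
sortedA = A[idxA]
idxB = np.searchsorted(sortedA, B)  # where each value of B sits in sorted A
pos = idxA[idxB]                    # map back to positions in the original A
print(A[pos])
print(B)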
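
One caveat with the searchsorted approach: it returns insertion points, so a value of B that is not in A still gets an index and silently maps to the wrong element. The following is a minimal sketch that adds a membership check. The helper name positions_in is hypothetical, and it assumes the values in A are unique and A is non-empty.

```python
import numpy as np

def positions_in(A, B):
    """Return pos such that A[pos] == B (hypothetical helper; assumes unique A)."""
    A = np.asarray(A)
    B = np.asarray(B)
    order = np.argsort(A)                # indices that would sort A
    sorted_A = A[order]
    idx = np.searchsorted(sorted_A, B)   # insertion points of B in sorted A
    # Values larger than max(A) get idx == len(A); clip to stay in bounds.
    idx = np.clip(idx, 0, len(A) - 1)
    found = sorted_A[idx] == B
    if not found.all():
        raise ValueError("some values of B are missing from A")
    return order[idx]

A = np.array([40, 10, 30, 20])
B = np.array([20, 40, 10])
pos = positions_in(A, B)
print(pos)      # [3 0 1]
print(A[pos])   # [20 40 10]
```

With the check in place, a missing ID raises an error instead of returning a position that points at a different ID.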

If you want a faster method, consider using pandas.

import pandas as pd
s = pd.Index(A)
pos = s.get_indexer(B)  # position in A of each value in B
print(A[pos])
print(B)
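
As a small illustration of how get_indexer treats values that are absent from the index, here is a sketch on toy data (the arrays are made up for the example). Absent values come back as -1, so it is worth filtering them before indexing into A, since A[-1] would otherwise return the last element.

```python
import numpy as np
import pandas as pd

A = np.array([40, 10, 30, 20])
B = np.array([20, 99, 10])    # 99 is deliberately absent from A

pos = pd.Index(A).get_indexer(B)
missing = pos == -1           # get_indexer marks absent values with -1
print(pos)                    # [ 3 -1  1]
print(B[missing])             # [99]
print(A[pos[~missing]])       # [20 10]
```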

2 Comments

Thank you! That worked in a decent amount of time. Though I still don't know what the error was when using numpy.in1d().
numpy.in1d() doesn't work here because it loses the order of the values in B: the mask it returns follows the order of A.
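
The ordering point can be seen on a toy example. This sketch uses np.isin, which newer NumPy versions recommend in place of np.in1d; for 1-D inputs the two return the same mask. The arrays are made up for illustration.

```python
import numpy as np

A = np.array([40, 10, 30, 20])
B = np.array([20, 40])

mask = np.isin(A, B)     # boolean mask with the same length as A
print(mask)              # [ True False False  True]
print(A[mask])           # [40 20]: the values of B, but in A's order
same_values = np.array_equal(np.sort(A[mask]), np.sort(B))
print(same_values)       # True
```

So A[mask] contains the same IDs as B, just not in the order B lists them, which is why the searchsorted or get_indexer approach is needed when the positions must line up with B.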
