
I have a 2D NumPy array with duplicate values in its first column.

I am searching the array like this:

In [104]: import numpy as np

In [105]: array = np.array

In [106]: a = array([[1, 2, 3],
     ...:            [1, 2, 3],
     ...:            [2, 5, 6],
     ...:            [3, 8, 9],
     ...:            [4, 8, 9],
     ...:            [4, 2, 3],
     ...:            [5, 2, 3]])

In [107]: num_list = [1, 4, 5]

In [108]: for i in num_list :
     ...:     print(a[np.where(a[:,0] == i)])
     ...:
[[1 2 3]
 [1 2 3]]
[[4 8 9]
 [4 2 3]]
[[5 2 3]]

The input is a list of numbers that match values in column 0. The end result I want is the matching rows in any format (array, list or tuple), for example:

array([[1, 2, 3],
       [1, 2, 3],
       [4, 8, 9],
       [4, 2, 3],
       [5, 2, 3]])

My code works fine but doesn't seem Pythonic. Is there a better strategy for searching with multiple values?

Something like a[np.where(a[:,0] == l)], where a single lookup returns all the matching rows.

My real array is large.

  • Sorry for the long explanation. I posted a similar question on Code Review but failed to explain it correctly. Commented Jul 5, 2017 at 8:50

2 Answers

Approach #1 : Using np.in1d -

a[np.in1d(a[:,0], num_list)]
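
For reference, here is a minimal, self-contained sketch of the same idea on the sample data from the question. It uses np.isin, which the NumPy documentation recommends over np.in1d for new code; the variable names mask and result are only illustrative and do not come from the original answer.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [2, 5, 6],
              [3, 8, 9],
              [4, 8, 9],
              [4, 2, 3],
              [5, 2, 3]])
num_list = [1, 4, 5]

# Boolean mask: True where the first-column value appears in num_list
mask = np.isin(a[:, 0], num_list)

# Boolean indexing keeps the matching rows in their original order
result = a[mask]
print(result)
```

The membership test runs once over the whole column, so there is no Python-level loop over num_list.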

Approach #2 : Using np.searchsorted -

num_arr = np.sort(num_list) # Sort num_list and get as array

# Get indices of occurrences of first column in num_list
idx = np.searchsorted(num_arr, a[:,0])

# Take care of out of bounds cases
idx[idx==len(num_arr)] = 0 

out = a[a[:,0] == num_arr[idx]]
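
As a rough sketch, the steps above can be wrapped in a function and checked against a small input. The helper name rows_matching_first_column and the test array are hypothetical, and the sketch assumes the list of values is non-empty (an empty list would make num_arr[idx] fail).

```python
import numpy as np

def rows_matching_first_column(a, values):
    """Hypothetical helper: rows of `a` whose first column is in `values`."""
    # Sort the lookup values so searchsorted can binary-search them
    num_arr = np.sort(np.asarray(values))
    # Position where each first-column value would be inserted in num_arr
    idx = np.searchsorted(num_arr, a[:, 0])
    # Values larger than every entry land at len(num_arr); point them at
    # index 0 so the lookup stays in bounds (the equality test rejects them)
    idx[idx == len(num_arr)] = 0
    # Keep only rows whose first-column value equals the entry found
    return a[a[:, 0] == num_arr[idx]]

a = np.array([[1, 2, 3],
              [9, 9, 9],   # 9 is larger than every lookup value
              [4, 8, 9],
              [2, 5, 6],
              [5, 2, 3]])

out = rows_matching_first_column(a, [5, 1, 4])   # unsorted input is fine
print(out)
```

The final equality check is what makes this a membership test: searchsorted only reports where a value would be inserted, not whether it is actually present.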

8 Comments

Sorry for changing the variable l to num_list.
Which one is better in your opinion? I am evaluating both for my case, though.
@Scripting.FileSystemObject If you have a large number of elements in num_list, I would say searchsorted is worth a try.
0.60 s vs. 0.32 s for in1d vs. searchsorted.
Approach #2 is useless for my case. I don't know why, but it doesn't return all the matches.

You can do

a[np.in1d(a[:, 0], num_list), :]

3 Comments

I am new to NumPy syntax. What is the last : for?
It means "take everything". So a[0, :] is the first row, a[:, 0] is the first column, etc.
OK. I got confused because Diwakar omitted the : in his answer.
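
To illustrate the point about the trailing `:`, here is a small sketch showing that it is optional when selecting whole rows. The array and mask are made up for the example.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 8, 9],
              [5, 2, 3]])

first_row = a[0, :]   # every column of row 0
first_col = a[:, 0]   # every row of column 0

# A boolean mask on the rows; trailing axes are taken in full by default
mask = np.array([True, False, True])
same = np.array_equal(a[mask], a[mask, :])

print(first_row, first_col, same)
```

So a[mask] and a[mask, :] select the same rows; the explicit `:` only spells out that all columns are kept.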
