I have written this piece of code:

import numpy as np

data = np.array([[3, 6], [5, 9], [4, 8]])

orig_x, orig_y = np.split(data, 2, axis=1)

x = np.array([3, 4])
y = np.zeros((len(x)))

for i in range(len(x)):
    y[i] = orig_y[np.where(orig_x == x[i])[0]]

So basically, I have a 2D NumPy array. I split it into two column arrays, orig_x and orig_y (each of shape (3, 1) here), one storing the x-axis values and the other the y-axis values.

I also have another 1D NumPy array, which has some of the values that exist in the orig_x array. I want to find the y-axis values for each value in the x array. I created this method, using a simple loop, but it is extremely slow since I'm using it with thousands of values.

Do you have a better idea? Maybe by using a NumPy function?

Note: Also a better title for this question can be made. Sorry :(

3 Answers

You could create a boolean mask marking which values of the x column you want, and then use this mask to select the corresponding values from the y column.

data = np.array([[3,6], [5,9], [4, 8]])

# the values you want to lookup on the x-axis
x = np.array([3, 4])

mask = np.isin(data[:,0], x)
data[mask,1]

Output:

array([6, 8])

The key function here is np.isin. Conceptually, it is equivalent to broadcasting x against the x column of data and doing an element-wise comparison, then checking whether each row matched any value of x:

mask = data[:,0,None] == x
y_mask = np.logical_or.reduce(mask, axis=1)
data[y_mask, 1]

Output:

array([6, 8])
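
If the order of x matters, note that a boolean mask selects rows in the order they appear in data, not in the order of x. Below is a minimal sketch of an order-preserving alternative based on np.argsort and np.searchsorted. It is not part of the approach above, and it assumes that the x column has no duplicates and that every queried value actually occurs in it. The query is deliberately reversed to make the difference visible.

```python
import numpy as np

data = np.array([[3, 6], [5, 9], [4, 8]])
x = np.array([4, 3])  # queried in a different order than the rows of data

# The mask approach follows the row order of data.
masked = data[np.isin(data[:, 0], x), 1]

# Sort the x column once so it can be binary-searched.
order = np.argsort(data[:, 0])
sorted_x = data[order, 0]
sorted_y = data[order, 1]

# Position of each queried value in the sorted x column.
# Assumption: every value in x is present, otherwise pos points at a wrong row.
pos = np.searchsorted(sorted_x, x)
y = sorted_y[pos]

print(masked.tolist())  # [6, 8]
print(y.tolist())       # [8, 6]
```

The sort costs O(n log n) once, and each lookup is then a binary search, so no len(data) by len(x) intermediate array is built.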

2 Comments

This is a very fast solution, but I think it returns the values out of order (not following the order of the elements of x). It's not clear from the OP if the order matters.
@Seb the mask is ordered on the elements of x which is the same order as elements in y.

I'm not 100% sure I understood the problem correctly, but I think the following should work:

>>> rows, cols = np.where(orig_x == x)
>>> y = orig_y[rows[np.argsort(cols)]].ravel()
>>> y
array([6, 8])

It assumes that all the values in orig_x are unique, but since your code example has the same restriction, I considered it a given.
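
To make the two lines above easier to follow, here is a small sketch of the intermediate values, using a reversed x so the reordering step is visible. The variable name matches is only for illustration. Because orig_x has shape (3, 1), comparing it with x broadcasts to a full boolean matrix with len(orig_x) * len(x) entries, which may be why this approach does not get faster for large inputs.

```python
import numpy as np

data = np.array([[3, 6], [5, 9], [4, 8]])
orig_x, orig_y = np.split(data, 2, axis=1)  # each has shape (3, 1)
x = np.array([4, 3])

# (3, 1) == (2,) broadcasts to a (3, 2) boolean matrix:
# entry [r, c] is True when orig_x[r] equals x[c].
matches = orig_x == x
print(matches.shape)  # (3, 2)

# Row and column indices of the True entries, listed row by row.
rows, cols = np.where(matches)
print(rows.tolist(), cols.tolist())  # [0, 2] [1, 0]

# Sorting by column index puts the matching rows in the order of x.
y = orig_y[rows[np.argsort(cols)]].ravel()
print(y.tolist())  # [8, 6]
```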

3 Comments

Yes, I wanted something like this exactly, and the values in orig_x are unique, but it didn't seem to make it any faster. Maybe there is no solution to optimize it.
I'd like to do some timing tests. How long are your data and x arrays?
The data I'm testing right now has about 500K values and the x array has about 10K. However, the data might vary from 100K to 10M. Also, I have two X arrays. X1 and X2 for each data array, so this makes it slower too.

What about a lookup table?

import numpy as np
data = np.array([[3,6], [5,9], [4, 8]])

orig_x, orig_y = np.split(data, 2, axis=1)

x = np.array([3, 4])
y = np.zeros((len(x)))

You can pack a dict for lookup:

lookup = {i: j for i, j in zip(orig_x.ravel(), orig_y.ravel())}

And just map this into a new array:

np.fromiter(map(lambda i: lookup.get(i, np.nan), x), dtype=int, count=len(x))
array([6, 8])

If orig_x and orig_y are the smaller of your arrays, this will probably be the most efficient approach.

EDIT: It occurred to me that if your values are integers, the np.nan default won't work, because NaN cannot be converted to dtype=int. If you may look up a value that isn't in your orig_x array, pick a fallback value that makes sense for your application.
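
As a rough sketch of that fallback idea: the example below uses -1 as a sentinel for values that are missing from the x column. The name MISSING and the choice of -1 are assumptions for illustration only; any integer that cannot occur as a real y value would do.

```python
import numpy as np

data = np.array([[3, 6], [5, 9], [4, 8]])
x = np.array([3, 7, 4])  # 7 does not occur in the x column

MISSING = -1  # hypothetical sentinel; must not collide with a real y value

# Build the lookup table from plain Python ints so the keys hash predictably.
lookup = dict(zip(data[:, 0].tolist(), data[:, 1].tolist()))

# Map each queried value through the table, falling back to the sentinel.
y = np.fromiter(
    (lookup.get(v, MISSING) for v in x.tolist()),
    dtype=np.int64,
    count=len(x),
)
print(y.tolist())  # [6, -1, 8]
```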
