
I have a 2D numpy array of values, a list of x-coordinates, and a list of y-coordinates. The x-coordinates increase left to right and the y-coordinates increase top to bottom.

For example:

import numpy as np

a = np.random.random((3, 3))
a[0][1] = 9.0
a[0][2] = 9.0
a[1][1] = 9.0
a[1][2] = 9.0
xs = list(range(1112, 1115))
ys = list(range(1109, 1112))

Output:

[[0.48148651 9.         9.        ]
 [0.09030393 9.         9.        ]
 [0.79271224 0.83413552 0.29724989]]

[1112, 1113, 1114]

[1109, 1110, 1111]

I want to remove the values from the 2D array that are greater than 1. I also want to combine the lists xs and ys to get a list of all the coordinate pairs for points that are kept.

In this example I want to remove a[0][1], a[0][2], a[1][1], and a[1][2], and I want the list of coordinate pairs to be:

[[1112, 1109], [1112, 1110], [1112, 1111], [1113, 1111], [1114, 1111]]

I have been able to accomplish this using a double for loop and if statements:

a_values = []
point_pairs = []
for i in range(0, a.shape[0]):
    for j in range(0, a.shape[1]):
        if (a[i][j] < 1):
            a_values.append(a[i][j])
            point_pairs.append([xs[j], ys[i]])
print(a_values)
print(point_pairs)

Output:

[0.48148650831317796, 0.09030392566133771, 0.7927122386213029, 0.8341355206494774, 0.2972498933037804]
[[1112, 1109], [1112, 1110], [1112, 1111], [1113, 1111], [1114, 1111]]

What is a more efficient way of doing this?

1 Answer


You can use np.nonzero to get the row and column indices of the elements you want to keep:

mask = a < 1
i, j = np.nonzero(mask)

The fancy indices i and j can be used to get the elements of xs and ys directly if they are numpy arrays:

xs = np.array(xs)
ys = np.array(ys)
point_pairs = np.stack((xs[j], ys[i]), axis=-1)
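
To illustrate why the conversion matters, here is a minimal sketch. The index values in j are made up for the example, and the names xs_list and xs_arr are only illustrative. Indexing a plain Python list with an index array raises a TypeError, while the same index works on a numpy array:

```python
import numpy as np

xs_list = [1112, 1113, 1114]
j = np.array([0, 0, 0, 1, 2])  # illustrative column indices, not computed from a

# A plain list cannot be indexed with an array of indices.
try:
    xs_list[j]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# After conversion, fancy indexing picks one element per index.
xs_arr = np.array(xs_list)
picked = xs_arr[j]

print(list_indexing_failed)
print(picked)
```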

You can also use np.take to make the conversion happen under the hood:

point_pairs = np.stack((np.take(xs, j), np.take(ys, i)), axis=-1)

The remaining elements of a are those selected by the mask, i.e., where it is True:

a_points = a[mask]

Alternatively:

i, j = np.nonzero(a < 1)
point_pairs = np.stack((np.take(xs, j), np.take(ys, i)), axis=-1)
a_points = a[i, j]

In this context, np.where called with a single argument is equivalent to np.nonzero, so you can use either one.
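
A quick way to convince yourself of that equivalence is the sketch below. It uses a small hand-written array (the values are assumptions chosen to resemble the example, not the asker's actual data) and compares the index arrays returned by both functions:

```python
import numpy as np

# Hand-written stand-in for the example array.
a = np.array([[0.5, 9.0, 9.0],
              [0.1, 9.0, 9.0],
              [0.8, 0.8, 0.3]])
mask = a < 1

i_nz, j_nz = np.nonzero(mask)
i_wh, j_wh = np.where(mask)  # single-argument form

same = np.array_equal(i_nz, i_wh) and np.array_equal(j_nz, j_wh)
print(same)
print(i_nz, j_nz)
```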

Notes

  • If you are using numpy, there is rarely a need for lists. Converting with xs = np.array(xs), or better, creating the array directly with xs = np.arange(1112, 1115), is faster and easier.

  • Numpy arrays should generally be indexed through a single index: a[0, 1], not a[0][1]. For your simple case, the behavior happens to be the same, but it will not be in the general case. a[0, 1] is an index into the original array. a[0] is a view of the first row of the array, i.e., a separate array object, and a[0][1] is an index into that new object. The assignment is visible in a only because basic indexing returns a view that shares memory with a. With a mask or fancy index as the first step, you would get a copy, and the assignment would be lost.

  • On a related note, setting a rectangular swath in an array only requires one line. For the four elements in your example: a[:2, 1:] = 9.
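
The difference between chained and single indexing can be sketched as follows. The array of zeros and the rows list are assumptions for illustration. The chained assignment goes through a fancy index, so it writes into a temporary copy and leaves a untouched, while the single-index assignment modifies a directly:

```python
import numpy as np

a = np.zeros((3, 3))
rows = [0, 1]  # illustrative fancy index

# Chained indexing: a[rows] is a copy, so this assignment is lost.
a[rows][:, 1:] = 9.0
unchanged_after_chained = bool((a == 0).all())

# Single index: the assignment writes into a itself.
a[:2, 1:] = 9.0
total_after_single = a.sum()

print(unchanged_after_chained)
print(a)
```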

I would write your example something like this:

a = np.random.random((3, 3))
a[:2, 1:] = 9.0  # same four elements as a[0][1], a[0][2], a[1][1], a[1][2]
xs = np.arange(1112, 1115)
ys = np.arange(1109, 1112)

i, j = np.nonzero(a < 1)
point_pairs = np.stack((xs[j], ys[i]), axis=-1)
a_points = a[i, j]
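
As a sanity check, the sketch below compares the vectorized version against the double loop from the question. The seeded generator and the slice a[:2, 1:] (which covers the four elements the question sets to 9.0) are assumptions made so the example is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
a = rng.random((3, 3))          # values in [0, 1)
a[:2, 1:] = 9.0
xs = np.arange(1112, 1115)
ys = np.arange(1109, 1112)

# Vectorized version.
i, j = np.nonzero(a < 1)
point_pairs = np.stack((xs[j], ys[i]), axis=-1)
a_points = a[i, j]

# Double loop from the question, for comparison.
loop_values = []
loop_pairs = []
for r in range(a.shape[0]):
    for c in range(a.shape[1]):
        if a[r, c] < 1:
            loop_values.append(a[r, c])
            loop_pairs.append([int(xs[c]), int(ys[r])])

print(point_pairs.tolist() == loop_pairs)
print(np.array_equal(a_points, np.array(loop_values)))
```

Both approaches visit the kept elements in row-major order, which is why the results line up element for element.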

3 Comments

Thank you for this answer! The mask part is working and is much faster on larger inputs. However, the np.stack line is giving me TypeError: only integer scalar arrays can be converted to a scalar index
@notmenotme. Right. That's because xs and ys are not numpy arrays. My fault. Fixed
@notmenotme. I've added some notes at the end to help improve your code.
