NumPy array shape mismatch on masking/assignment command

Question

I'm trying to run a loop that builds a mask and then uses that mask to assign values from one array into selected rows and columns of another array. The following script works, but only when there are no duplicate values in column 0 of array y. If there are duplicates, the mask selects multiple rows of y and the assignment throws an error. Thanks for any help.

import numpy as np

x = np.zeros(shape=(100,10))
x[:,0] = np.arange(100)

# seed = 9 produces duplicate values in column 0 of y, which seems to cause the problem
# (no issues when there are no duplicate values in column 0 of y)
y = (np.random.default_rng(9).random((10,7))*100).astype(int)

for i in range(x.shape[0]):
    mask = y[:,0] == x[i,0]
    y[mask,[1,3,4,6]] = x[i,[1,2,3,4]]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [219], in <cell line: 2>()
      2 for i in range(x.shape[0]):
      3     mask = y[:,0] == x[i,0]
----> 4     y[mask,[1,3,4,6]] = x[i,[1,2,3,4]]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (4,)

Ali_Sh · Accepted Answer · 2022-06-17 21:45:38Z

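The indexing in your example only works as intended when mask has exactly one True in each iteration. The boolean mask is converted to the integer positions of its True entries, and that array must broadcast against the four column indices. With no matching row the shapes are (0,) and (4,), which is the error you got; with two matching rows they are (2,) and (4,), which fails the same way. You can use an if condition to make sure mask contains at least one True, and broadcast the indices explicitly so that any number of matching rows works: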

1. First solution: fixing the original loop

range_ = np.arange(y.shape[0], dtype=np.int64)
for i in range(x.shape[0]):
    mask = y[:, 0] == x[i, 0]
    true_counts = np.count_nonzero(mask)   # number of rows of y matching x[i, 0]
    if true_counts != 0:
        broadcast_x = np.broadcast_to(x[i, [1, 2, 3, 4]], shape=(true_counts, 4))  # 4 is length of [1, 2, 3, 4]
        broadcast_y = np.broadcast_to([1, 3, 4, 6], shape=(true_counts, 4))
        y[range_[mask][:, None], broadcast_y] = broadcast_x
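
As an alternative to the explicit guard and np.broadcast_to calls, the same loop can be sketched with np.ix_, which accepts a boolean mask and builds index arrays of shapes (k, 1) and (1, 4), so the selection is (k, 4) for any number of matches, including none. This is not the code from the answer above; the function name assign_matching_rows and the non-zero values written into x are made up for illustration.

```python
import numpy as np


def assign_matching_rows(x, y, x_cols=(1, 2, 3, 4), y_cols=(1, 3, 4, 6)):
    """Copy x[i, x_cols] into every row of y whose column 0 equals x[i, 0].

    Hypothetical helper for illustration; returns a modified copy of y.
    """
    y = y.copy()
    x_cols = list(x_cols)
    y_cols = list(y_cols)
    for i in range(x.shape[0]):
        mask = y[:, 0] == x[i, 0]
        # np.ix_ turns the mask into row positions of shape (k, 1) and the
        # column list into shape (1, 4); the selection is (k, 4) even if k == 0.
        y[np.ix_(mask, y_cols)] = x[i, x_cols]
    return y


x = np.zeros((100, 10))
x[:, 0] = np.arange(100)
x[:, 1:5] = np.arange(400).reshape(100, 4)  # assumed payload so the copy is visible
y = (np.random.default_rng(9).random((10, 7)) * 100).astype(int)

result = assign_matching_rows(x, y)
```

Because the right-hand side x[i, x_cols] has shape (4,), NumPy broadcasts it across all k selected rows, so duplicate keys in y[:, 0] need no special handling.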

2. Second solution: vectorized way (the best)
Instead of using loops, we can first find which rows of y have a value in column 0 that also occurs in x[:, 0], and then use advanced indexing:

mask = np.isin(y[:, 0], x[:, 0])        # np.in1d is deprecated since NumPy 2.0 in favor of np.isin
y[mask, np.array([1, 3, 4, 6])[:, None]] = 0
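
The shapes involved in this kind of indexing can be checked on a small array. The sketch below uses made-up data (the array small, the mask, and the column list are illustrative and not taken from the question): a boolean row mask with k True entries combined with a column index of shape (n, 1) selects a block of shape (n, k).

```python
import numpy as np

small = np.arange(12).reshape(4, 3)          # illustrative data, not from the question
mask = np.array([True, False, True, False])  # selects rows 0 and 2
cols = np.array([0, 2])

# The mask behaves like np.nonzero(mask)[0], here the row positions [0, 2].
rows = np.nonzero(mask)[0]

# (2,) row positions broadcast against (2, 1) column indices -> shape (2, 2),
# laid out as [column, row], i.e. the transpose of the selected block.
selected = small[mask, cols[:, None]]

# Assigning a scalar fills every selected element.
zeroed = small.copy()
zeroed[mask, cols[:, None]] = 0
```

The transposed layout does not matter as long as the values being assigned are produced with the same index pattern, which is the case when both sides use the mask and a column vector of indices.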

Now, to assign an array instead of zero when x[:, 0] is filled by np.arange, we need to take the corresponding values from x. First we select the matching rows of x with x[y[:, 0] - x[0, 0]] (in your case this can be just x[y[:, 0]], because np.arange starts from 0, so x[0, 0] = 0), and then apply the masks to pick the needed values from the specified rows and columns:

mask = np.isin(y[:, 0], x[:, 0])        # rows mask for y
rows = (y[:, 0] - x[0, 0]).astype(np.int64)   # x is float, so cast the row indices to integers
new_arr = x[rows][mask, np.array([1, 2, 3, 4])[:, None]]
y[mask, np.array([1, 3, 4, 6])[:, None]] = new_arr

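Note that y[:, 0] - x[0, 0] is a float array here, because x is a float array, and indexing with a float array raises IndexError: arrays used as indices must be of integer (or boolean) type. The indices must therefore be integers, e.g. (y[:, 0] - x[0, 0]).astype(np.int64); the column indices can likewise be given an explicit integer type with np.array([1, 2, 3, 4], dtype=np.int64).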
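
As a sanity check, the arange-based lookup can be compared against a plain loop over the rows of y. This is a sketch under assumptions: the non-zero values written into x[:, 1:5] and the names vectorized and expected are made up so that the copied values are visible, and np.isin is used for the membership test.

```python
import numpy as np

x = np.zeros((100, 10))
x[:, 0] = np.arange(100)
x[:, 1:5] = np.arange(400).reshape(100, 4)   # assumed payload, not in the question
y = (np.random.default_rng(9).random((10, 7)) * 100).astype(int)

x_cols = np.array([1, 2, 3, 4])
y_cols = np.array([1, 3, 4, 6])

# Vectorized version: the key in y[:, 0] doubles as the row number in x.
mask = np.isin(y[:, 0], x[:, 0])
rows = (y[:, 0] - x[0, 0]).astype(np.int64)  # integer cast avoids the IndexError
vectorized = y.copy()
vectorized[mask, y_cols[:, None]] = x[rows][mask, x_cols[:, None]]

# Reference version: one row of y at a time.
expected = y.copy()
for r in range(y.shape[0]):
    expected[r, y_cols] = x[y[r, 0], x_cols]
```

The shortcut relies on x[:, 0] being consecutive integers; it also assumes every key in y[:, 0] is a valid row number of x, since x[rows] is evaluated before the mask is applied.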

A more general approach, for when x[:, 0] is not filled by np.arange, is to find for each row of y the index of the matching row in x. The code then becomes:

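mask = np.isin(y[:, 0], x[:, 0])        # rows of y whose key occurs in x[:, 0]

# finding common indices
unique_values, index = np.unique(x[:, 0], return_index=True)
pos = np.searchsorted(unique_values, y[:, 0])
pos = np.clip(pos, 0, len(unique_values) - 1)   # a key larger than every x key would otherwise index past the end
idx = index[pos]                        # row of x for each row of y (only meaningful where mask is True)

new_arr = x[idx][mask, np.array([1, 2, 3, 4])[:, None]]
y[mask, np.array([1, 3, 4, 6])[:, None]] = new_arr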
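
A small worked example may make the lookup easier to follow. The data below is made up for illustration: the keys in x[:, 0] are unordered, y[:, 0] contains a repeated key (10) and two keys that do not occur in x (99 and 5), and the np.clip step is an added safeguard for keys larger than every key in x.

```python
import numpy as np

x = np.zeros((6, 10))
x[:, 0] = [40, 10, 30, 20, 60, 50]              # keys in arbitrary order
x[:, 1:5] = np.arange(24).reshape(6, 4) + 100   # illustrative payload

y = np.full((5, 7), -1)
y[:, 0] = [10, 99, 60, 10, 5]                   # 99 and 5 are absent from x; 10 repeats

x_cols = np.array([1, 2, 3, 4])
y_cols = np.array([1, 3, 4, 6])

mask = np.isin(y[:, 0], x[:, 0])                # [True, False, True, True, False]

# Sorted unique keys of x and, for each, the row of x where it first occurs.
unique_values, index = np.unique(x[:, 0], return_index=True)
pos = np.searchsorted(unique_values, y[:, 0])   # position of each y key among the sorted x keys
pos = np.clip(pos, 0, len(unique_values) - 1)   # 99 would otherwise map past the end
idx = index[pos]                                # candidate row of x for every row of y

# Rows where mask is False carry a meaningless idx, but the mask drops them.
y[mask, y_cols[:, None]] = x[idx][mask, x_cols[:, None]]
```

Rows 0 and 3 of y both receive the values from the x row with key 10, row 2 receives those from the row with key 60, and the two unmatched rows keep their -1 placeholders.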

3. Third solution: indexing (just for the prepared toy example)
For the prepared example in the question, you can do this easily with advanced indexing instead of the loop:

y[:, [1, 3, 4, 6]] = 0

This works on your example data because every value in y[:, 0] is below 100 and therefore occurs in the first column of x (which runs from 0 to 99), so every row of y is matched, and the columns of x being copied are all zero.
To assign an array instead of 0:

new_arr = np.array([3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
y[:, [1, 3, 4, 6]] = new_arr[:, None]

I still get an error, this time on your line assigning the value of new_arr: IndexError: arrays used as indices must be of integer (or boolean) type
@EmilyBeth I updated the code to handle the case where x[:, 0] is not filled by np.arange. I hope it is now the correct vectorized version; I checked it on arbitrary values and got the same results.
