3

Suppose I have the following data:

mask = [[0, 1, 1, 0, 1]] # 2D mask
ip_array = [[4, 5, 2],
            [3, 2, 1],
            [1, 8, 6]] # 2D array

I want to insert columns of 0s into ip_array wherever there is a 0 in the mask. So the output should look like this:

[[0, 4, 5, 0, 2],
 [0, 3, 2, 0, 1],
 [0, 1, 8, 0, 6]]

I am new to numpy functions and I am looking for an efficient way to do this. Any help is appreciated!

4
  • Efficient in what sense? No Python for loop (NumPy only)? Commented Jan 12, 2022 at 9:29
  • I am trying to do this using NumPy. Commented Jan 12, 2022 at 9:30
  • What are your n-values (len(mask) and len(ip_array)) on average, and what are the minimum and maximum? NumPy has a setup overhead, so there is a threshold below which it may not be worth using. Commented Jan 12, 2022 at 9:31
  • The lengths of the mask and the ndarray can vary from time to time. Commented Jan 12, 2022 at 9:32

5 Answers 5

4

Here's one way to do it in two steps:

(i) Create an array of zeros of the correct shape (the first dimension of ip_array and the second dimension of mask)

(ii) Use the mask across the second dimension (as a boolean mask) and assign the values of ip_array to the array of zeros.

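import numpy as np

mask, ip_array = np.asarray(mask), np.asarray(ip_array)  # the question's inputs are nested lists
out = np.zeros((ip_array.shape[0], mask.shape[1]), dtype=int)
out[..., mask[0].astype(bool)] = ip_array  # fill only the columns where the mask is 1
print(out)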

Output:

[[0 4 5 0 2]
 [0 3 2 0 1]
 [0 1 8 0 6]]
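
The two steps above can be wrapped into a small reusable function. This is only a sketch: the name insert_zero_columns is hypothetical (it is not a NumPy function), and the shape check is an added assumption about what counts as valid input, namely that the number of 1s in the mask equals the number of columns in ip_array.

```python
import numpy as np


def insert_zero_columns(ip_array, mask):
    """Return a copy of ip_array with zero columns wherever mask is 0.

    insert_zero_columns is a hypothetical helper name, not a NumPy function.
    """
    ip_array = np.asarray(ip_array)
    keep = np.asarray(mask).astype(bool).ravel()  # flatten the 1 x n mask
    # Assumption: every 1 in the mask corresponds to exactly one input column.
    if keep.sum() != ip_array.shape[1]:
        raise ValueError("number of 1s in mask must equal the number of columns")
    out = np.zeros((ip_array.shape[0], keep.size), dtype=ip_array.dtype)
    out[:, keep] = ip_array  # the boolean mask selects the destination columns
    return out


mask = [[0, 1, 1, 0, 1]]
ip_array = [[4, 5, 2], [3, 2, 1], [1, 8, 6]]
result = insert_zero_columns(ip_array, mask)
print(result)
```

Because the function converts its inputs with np.asarray, it accepts the nested lists from the question as well as NumPy arrays.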

1

Here is another approach that indexes with a cumsum-based mask after prepending a column of zeros to the input. The cumsum mask holds the column index of ip_array plus 1 wherever the mask is 1, and 0 wherever a column of zeros should be added. Because the concatenated array has an extra leading column of zeros, indexing with 0 yields a column of zeros.

m = (mask.cumsum()*mask)[0]
# array([0, 1, 2, 0, 3])

np.c_[np.zeros(ip_array.shape[0]), ip_array][:,m].astype(int)

# array([[0, 4, 5, 0, 2],
#        [0, 3, 2, 0, 1],
#        [0, 1, 8, 0, 6]])
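
To make the intermediate values easier to follow, here is the same idea spelled out step by step. It is a sketch rather than the answer's exact code: np.concatenate with an explicit dtype is swapped in for np.c_ so the result stays an integer array without a final astype, and the variable names running and padded are introduced only for illustration.

```python
import numpy as np

mask = np.array([[0, 1, 1, 0, 1]])
ip_array = np.array([[4, 5, 2], [3, 2, 1], [1, 8, 6]])

running = mask.cumsum()   # flattened running count of 1s: [0 1 2 2 3]
m = (running * mask)[0]   # zero out positions where the mask is 0: [0 1 2 0 3]

# Prepend one column of zeros so that index 0 selects zeros
# and index k selects column k - 1 of ip_array.
padded = np.concatenate(
    [np.zeros((ip_array.shape[0], 1), dtype=ip_array.dtype), ip_array], axis=1
)
result = padded[:, m]
print(result)
```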

1 Comment

I think I found an even simpler method, but in 3 lines.
0

A parameterized solution that takes a different approach from the accepted (green-checked) answer, which may make it easier to follow. Only the last line matters for the actual operation.

import numpy
import random

n1 = 5
n2 = 5
r = 0.7
random.seed(1)
a = numpy.array([[0 if random.random() > r else 1 for _ in range(n1)]])
n3 = numpy.count_nonzero(a)
b = numpy.array([[random.randint(1,9) for _ in range(n3)] for _ in range(n2)])
c = numpy.zeros((n2, n1))
c[:, numpy.where(a)[1]] = b[:]

Result:

a = array([[1, 0, 0, 1, 1]])
b = array([[8, 8, 7],
       [4, 2, 8],
       [1, 7, 7],
       [1, 8, 5],
       [4, 2, 6]])
c = array([[8., 0., 0., 8., 7.],
       [4., 0., 0., 2., 8.],
       [1., 0., 0., 7., 7.],
       [1., 0., 0., 8., 5.],
       [4., 0., 0., 2., 6.]])
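
The result above is printed as floats because numpy.zeros creates a float64 array unless a dtype is given. A minimal sketch of keeping the input's integer type, reusing the first two rows of the values shown above (the names c_float, c_int and cols are chosen here for illustration):

```python
import numpy as np

a = np.array([[1, 0, 0, 1, 1]])
b = np.array([[8, 8, 7], [4, 2, 8]])

c_float = np.zeros((b.shape[0], a.shape[1]))               # float64 by default
c_int = np.zeros((b.shape[0], a.shape[1]), dtype=b.dtype)  # match the input dtype

cols = np.where(a)[1]  # column indices where the mask is nonzero
c_float[:, cols] = b
c_int[:, cols] = b
print(c_float.dtype, c_int.dtype)
print(c_int)
```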

Here is how the processing time depends on the n-values:

[Figure: processing time (s) against n-values (log), with one curve for n1 and one for n2]

The plot was produced with this code:

import numpy
import random
import time
import matplotlib.pyplot as plt

n1 = 5
n2 = 5
r = 0.7


def main(n1, n2):
    print()
    print(f"{n1 = }")
    print(f"{n2 = }")
    random.seed(1)
    a = numpy.array([[0 if random.random() > r else 1 for _ in range(n1)]])
    n3 = numpy.count_nonzero(a)
    b = numpy.array([[random.randint(1,9) for _ in range(n3)] for _ in range(n2)])
    t0 = time.perf_counter()  # monotonic, high-resolution timer
    c = numpy.zeros((n2, n1))
    c[:, numpy.where(a)[1]] = b[:]
    t = time.perf_counter() - t0
    print(f"{t = }")
    return t


t1 = [main(10**i, 10) for i in range(1, 8)]
t2 = [main(10, 10**i) for i in range(1, 8)]

exponents = range(1, 8)  # the n-values are 10**1 .. 10**7
plt.plot(exponents, t1, label="n1 time process evolution")
plt.plot(exponents, t2, label="n2 time process evolution")

plt.xlabel("n-values (log10)")
plt.ylabel("Time processing (s)")
plt.title("Insert columns into a numpy array based on mask")
plt.legend()
plt.show()
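
A single time measurement per size can be noisy. As a hedged alternative, the standard-library timeit module can repeat the measurement and compare two of the approaches from this page on the same data. The array sizes, the seed, and the function names fill_zeros and insert_columns are arbitrary choices for this sketch, and no particular timings are implied.

```python
import timeit

import numpy as np

rng = np.random.default_rng(1)
mask = (rng.random((1, 1000)) < 0.7).astype(int)             # roughly 70% ones
ip_array = rng.integers(1, 10, size=(100, int(mask.sum())))  # one column per 1


def fill_zeros():
    # Allocate the output and assign the input into the masked columns.
    out = np.zeros((ip_array.shape[0], mask.shape[1]), dtype=ip_array.dtype)
    out[:, mask[0].astype(bool)] = ip_array
    return out


def insert_columns():
    # Insert zero columns with np.insert, using indices relative to the input.
    idx = np.flatnonzero(mask[0] == 0)
    return np.insert(ip_array, idx - np.arange(len(idx)), 0, axis=1)


for fn in (fill_zeros, insert_columns):
    best = min(timeit.repeat(fn, number=100, repeat=5))
    print(f"{fn.__name__}: {best / 100:.2e} s per call")
```

Taking the minimum over several repeats is a common way to reduce the influence of other processes on the measurement.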

0

import numpy as np

mask = np.array([0, 1, 1, 0, 1])
# extract the indices of the zeros
mask_pos = list(np.where(mask == 0)[0])
ip_array = np.array([[4, 5, 2],
                     [3, 2, 1],
                     [1, 8, 6]])

# insert a column of 0s at each respective mask position, in ascending order
for i in mask_pos:
    ip_array = np.insert(ip_array, i, 0, axis=1)

print(ip_array)

0

Arguably the simplest solution is to use np.insert to create the new columns for you:

idx = np.flatnonzero(~np.array(mask[0], bool))
idx -= np.arange(len(idx))
np.insert(ip_array, idx, 0, axis=1)

Subtracting np.arange(len(idx)) from idx is necessary because the array you are inserting into does not have the new columns yet, so the indices in the old array are reduced by the number of preceding inserted columns.

In two lines:

idx = np.flatnonzero(~np.array(mask[0], bool))
np.insert(ip_array, idx - np.arange(len(idx)), 0, axis=1)

One-liner using the walrus operator (Python 3.8+):

np.insert(ip_array, (idx := np.flatnonzero(~np.array(mask[0], bool))) - np.arange(len(idx)), 0, axis=1)

And one without, but with more redundancy:

np.insert(ip_array, np.flatnonzero(~np.array(mask[0], bool)) - np.arange(len(mask[0]) - np.count_nonzero(mask[0])), 0, axis=1)
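
The index adjustment explained above can be checked on the data from the question. In this sketch the names final_pos and insert_at are introduced only for illustration: the first holds the positions of the zero columns in the output, the second the positions in the original array that np.insert expects.

```python
import numpy as np

mask = np.array([[0, 1, 1, 0, 1]])
ip_array = np.array([[4, 5, 2], [3, 2, 1], [1, 8, 6]])

final_pos = np.flatnonzero(mask[0] == 0)            # positions in the output
insert_at = final_pos - np.arange(len(final_pos))   # positions in the input
print(final_pos, insert_at)  # [0 3] [0 2]

result = np.insert(ip_array, insert_at, 0, axis=1)
print(result)
```

The second zero column ends up at output position 3, but one column has already been inserted before it, so it must be inserted before column 2 of the original array.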
