Python: Insert columns into a numpy array based on mask

Question

Suppose I have the following data:

mask = [[0, 1, 1, 0, 1]] # 2D mask
ip_array = [[4, 5, 2],
            [3, 2, 1],
            [1, 8, 6]] # 2D array

I want to insert columns of 0s into ip_array wherever there is a 0 in the mask. The output should look like this:

[[0, 4, 5, 0, 2]
 [0, 3, 2, 0, 1]
 [0, 1, 8, 0, 6]]

I am new to numpy functions and I am looking for an efficient way to do this. Any help is appreciated!

What are your typical n-values (len(mask) and len(ip_array)), and what are their minimum and maximum? NumPy has a setup overhead, so there is a size threshold below which it may not be worth using.
– Vincent Bénet, commented Jan 12, 2022 at 9:31

user7864386 · Accepted Answer · 2022-01-12 09:32:52Z

4

Here's one way to do it in two steps:

(i) Create an array of zeros of the correct shape (the first dimension of ip_array and the second dimension of mask)

(ii) Use the mask across the second dimension (as a boolean mask) and assign the values of ip_array to the array of zeros.

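import numpy as np
mask = np.array(mask)          # the lists from the question, as arrays
ip_array = np.array(ip_array)
out = np.zeros((ip_array.shape[0], mask.shape[1]), dtype=int)
out[..., mask[0].astype(bool)] = ip_array
print(out)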

Output:

[[0 4 5 0 2]
 [0 3 2 0 1]
 [0 1 8 0 6]]
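
The two steps above can be wrapped in a small function. This is only a sketch: the name insert_zero_columns is made up for illustration and is not part of NumPy, and the shape check is an added assumption about how mismatched inputs should be handled.

```python
import numpy as np

def insert_zero_columns(ip_array, mask):
    """Return a copy of ip_array with zero columns wherever mask is 0.

    insert_zero_columns is a hypothetical helper name, not a NumPy function.
    """
    ip_array = np.asarray(ip_array)
    # Accept either a 1D mask or a one-row 2D mask.
    keep = np.asarray(mask).ravel().astype(bool)
    if keep.sum() != ip_array.shape[1]:
        raise ValueError("number of 1s in mask must equal the number of columns")
    # Step (i): zeros of the final shape; step (ii): boolean column assignment.
    out = np.zeros((ip_array.shape[0], keep.size), dtype=ip_array.dtype)
    out[:, keep] = ip_array
    return out

result = insert_zero_columns([[4, 5, 2], [3, 2, 1], [1, 8, 6]], [[0, 1, 1, 0, 1]])
print(result)
```

Using dtype=ip_array.dtype instead of a fixed int keeps the input's dtype, which may or may not be what you want for float data.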


mozway · Accepted Answer · 2022-01-12 10:00:54Z

1

Here is another approach that uses indexing with a cumsum-based mask and an extra column of zeros prepended to the input. Wherever the mask is 1, the cumsum mask holds the corresponding column index of ip_array plus 1; wherever the mask is 0, it holds 0. Because the concatenated array has an extra initial column of zeros, indexing with 0 yields a column of zeros.

m = (mask.cumsum()*mask)[0]
# array([0, 1, 2, 0, 3])

np.c_[np.zeros(ip_array.shape[0]), ip_array][:,m].astype(int)

# array([[0, 4, 5, 0, 2],
#        [0, 3, 2, 0, 1],
#        [0, 1, 8, 0, 6]])
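
To make the intermediate values easier to follow, here is the same idea split into steps. It is a sketch rather than the exact code above: the names running, zero_col and padded are introduced for illustration, and np.hstack with an integer zero column is swapped in for np.c_ so that no final astype(int) is needed.

```python
import numpy as np

mask = np.array([[0, 1, 1, 0, 1]])
ip_array = np.array([[4, 5, 2], [3, 2, 1], [1, 8, 6]])

running = mask.cumsum()   # flattened running count of 1s: [0 1 2 2 3]
m = (running * mask)[0]   # zero where the mask is 0:       [0 1 2 0 3]

# Prepend one zero column, so index 0 selects zeros and
# index k selects column k - 1 of ip_array.
zero_col = np.zeros((ip_array.shape[0], 1), dtype=ip_array.dtype)
padded = np.hstack([zero_col, ip_array])

result = padded[:, m]
print(result)
```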


I think I found an even simpler method, but in 3 lines.
– Mad Physicist, commented over a year ago

Vincent Bénet · Accepted Answer · 2022-01-12 10:01:25Z

Here is a parameterized solution that takes a different route from the accepted answer, which may make it easier to follow. Only the last line performs the actual operation.

import numpy
import random

n1 = 5
n2 = 5
r = 0.7
random.seed(1)
a = numpy.array([[0 if random.random() > r else 1 for _ in range(n1)]])
n3 = numpy.count_nonzero(a)
b = numpy.array([[random.randint(1,9) for _ in range(n3)] for _ in range(n2)])
c = numpy.zeros((n2, n1))
c[:, numpy.where(a)[1]] = b[:]

Result:

a = array([[1, 0, 0, 1, 1]])
b = array([[8, 8, 7],
       [4, 2, 8],
       [1, 7, 7],
       [1, 8, 5],
       [4, 2, 6]])
c = array([[8., 0., 0., 8., 7.],
       [4., 0., 0., 2., 8.],
       [1., 0., 0., 7., 7.],
       [1., 0., 0., 8., 5.],
       [4., 0., 0., 2., 6.]])
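
As a rough illustration of what that last line does, the sketch below reuses the mask a from the result above with only the first two rows of b. The names rows and cols are introduced here for illustration, and dtype=b.dtype is an added choice: the original c is float because numpy.zeros defaults to float64.

```python
import numpy as np

a = np.array([[1, 0, 0, 1, 1]])   # mask from the result above
b = np.array([[8, 8, 7],
              [4, 2, 8]])         # first two rows of b, for brevity

# np.where on a 2D array returns (row indices, column indices)
# of the nonzero entries; only the column indices are needed.
rows, cols = np.where(a)          # cols is [0 3 4]

c = np.zeros((b.shape[0], a.shape[1]), dtype=b.dtype)
c[:, cols] = b                    # column j of b lands in column cols[j] of c
print(c)
```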

Here is the processing time as a function of the n-values, measured using this code:

import numpy
import random
import time
import matplotlib.pyplot as plt

n1 = 5
n2 = 5
r = 0.7


def main(n1, n2):
    print()
    print(f"{n1 = }")
    print(f"{n2 = }")
    random.seed(1)
    a = numpy.array([[0 if random.random() > r else 1 for _ in range(n1)]])
    n3 = numpy.count_nonzero(a)
    b = numpy.array([[random.randint(1,9) for _ in range(n3)] for _ in range(n2)])
    t0 = time.perf_counter()  # monotonic, high-resolution timer
    c = numpy.zeros((n2, n1))
    c[:, numpy.where(a)[1]] = b[:]
    t = time.perf_counter() - t0
    print(f"{t = }")
    return t


exponents = list(range(1, 8))  # n = 10**1 ... 10**7
t1 = [main(10**i, 10) for i in exponents]
t2 = [main(10, 10**i) for i in exponents]

plt.plot(exponents, t1, label="n1 time process evolution")
plt.plot(exponents, t2, label="n2 time process evolution")

plt.xlabel("log10 of n-value")
plt.ylabel("Processing time (s)")
plt.title("Insert columns into a numpy array based on mask")
plt.legend()
plt.show()

Mahantesh · Accepted Answer · 2022-01-12 14:57:04Z

0

import numpy as np

mask = np.array([0, 1, 1, 0, 1])
# Extract the indices of the zeros
mask_pos = np.where(mask == 0)[0]
ip_array = np.array([[4, 5, 2],
                     [3, 2, 1],
                     [1, 8, 6]])

# Insert a column of 0s at each respective mask position
for i in mask_pos:
    ip_array = np.insert(ip_array, i, 0, axis=1)

print(ip_array)
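
One thing worth noting about the loop: it appears to rely on the zero positions being visited in ascending order, so that each position is already the correct index in the partially built result. The sketch below compares both orders; insert_in_order is a hypothetical helper written for this illustration.

```python
import numpy as np

ip_array = np.array([[4, 5, 2], [3, 2, 1], [1, 8, 6]])
zero_positions = [0, 3]  # indices where the mask is 0

def insert_in_order(arr, positions):
    # Hypothetical helper: insert one zero column per position, in the given order.
    for i in positions:
        arr = np.insert(arr, i, 0, axis=1)  # np.insert returns a new array
    return arr

ascending = insert_in_order(ip_array, zero_positions)
descending = insert_in_order(ip_array, zero_positions[::-1])
print(ascending[0])   # [0 4 5 0 2]
print(descending[0])  # [0 4 5 2 0]
```

Since np.insert copies the array on every call, the loop also does more work than a single vectorized call when the mask contains many zeros.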


Mad Physicist · Accepted Answer · 2023-05-26 00:37:37Z

Arguably the simplest solution is to use np.insert to create the new columns for you:

idx = np.flatnonzero(~np.array(mask[0], bool))
idx -= np.arange(len(idx))
np.insert(ip_array, idx, 0, axis=1)

Subtracting np.arange(len(idx)) from idx is necessary because the array you are inserting into does not have the new columns yet, so the indices in the old array are reduced by the number of preceding inserted columns.
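
A small worked example of that index shift, using the data from the question. The name shifted is introduced here for illustration only.

```python
import numpy as np

mask = np.array([[0, 1, 1, 0, 1]])
ip_array = np.array([[4, 5, 2], [3, 2, 1], [1, 8, 6]])

# Positions of the zero columns in the *output* array.
idx = np.flatnonzero(~mask[0].astype(bool))   # [0 3]

# Positions in the *input* array: each index is reduced by the
# number of zero columns inserted before it.
shifted = idx - np.arange(len(idx))           # [0 2]

result = np.insert(ip_array, shifted, 0, axis=1)
print(result)
```

Here the second zero column belongs at output position 3, but one column has already been inserted before it, so it goes before input column 2.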

In two lines:

idx = np.flatnonzero(~np.array(mask[0], bool))
np.insert(ip_array, idx - np.arange(len(idx)), 0, axis=1)

One-liner using the walrus operator (Python 3.8+):

np.insert(ip_array, (idx := np.flatnonzero(~np.array(mask[0], bool))) - np.arange(len(idx)), 0, axis=1)

And one without, but with more redundancy:

np.insert(ip_array, np.flatnonzero(~np.array(mask[0], bool)) - np.arange(len(mask[0]) - np.count_nonzero(mask[0])), 0, axis=1)
