
I would like to initialize an array filled with zeros, with a 1 at a specific location. I know how to do it with two lines of code like this:

import numpy as np
shape = (2,3)
location = (0,1)
arr = np.zeros(shape)
arr[location] = 1

Is there a faster way to do it, maybe with a one-liner?

1 Answer

Introducing sparsity

Even though you probably won't find a faster way to perform such a simple initialization, you may want to use sparse matrices for larger matrices and/or matrices with more than one data point.

Several types of sparse matrices exist; each is designed to make some computations faster (and others slower, depending on the type) and to use memory more efficiently, as long as your matrices are made up mostly of zeros.

For your specific case, though, it will be slower:

import numpy as np
from scipy import sparse
shape = (2, 3)
location = (0, 1)

# With dense matrix
arr = np.zeros(shape)
arr[location] = 1
# timeit > 280 ns ± 4.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# With coordinate matrix
sparr = sparse.coo_matrix(([1], ([location[0]], [location[1]])), shape=shape)
# timeit > 20.5 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Sparsity: a typical case with matrices

But imagine that you have a very large matrix with ones scattered at some locations; then you could simply do this:

shape = (2000, 3000)
n_points = 500
loc_y = np.random.randint(shape[0], size=(n_points))
loc_x = np.random.randint(shape[1], size=(n_points))
data = np.ones(n_points)

sp_arr = sparse.coo_matrix((data, (loc_y, loc_x)), shape=shape)
# timeit > 17.8 µs ± 90.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Instead of this:

np_arr = np.zeros(shape)
for d, x, y in zip(data, loc_x, loc_y):
    np_arr[y, x] = d
# timeit > 613 µs ± 8.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

... or something else like this:

np_loc = [y * shape[1] + x for x, y in zip(loc_x, loc_y)]  # not in the timeit

np_arr2 = np.zeros(shape)
np.put(np_arr2, np_loc, data)
# timeit > 497 µs ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

((sp_arr.toarray() == np_arr) & (np_arr == np_arr2)).all()
# > True
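
As a side note, the Python-level list comprehension that builds np_loc can itself be avoided: NumPy's fancy indexing assigns all points at once, and np.ravel_multi_index computes the flat indices in a single vectorized call. A minimal, untimed sketch (reusing the same variable setup as above):

```python
import numpy as np

shape = (2000, 3000)
n_points = 500
loc_y = np.random.randint(shape[0], size=n_points)
loc_x = np.random.randint(shape[1], size=n_points)
data = np.ones(n_points)

# Fancy indexing: assign all (row, col) pairs at once, no Python loop
np_arr3 = np.zeros(shape)
np_arr3[loc_y, loc_x] = data

# Equivalent: compute flat indices with ravel_multi_index, then np.put
flat_loc = np.ravel_multi_index((loc_y, loc_x), shape)
np_arr4 = np.zeros(shape)
np.put(np_arr4, flat_loc, data)

assert (np_arr3 == np_arr4).all()
```

Both variants give the same result as the loop; which one is faster in practice is worth measuring on your own data.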

Limitations

Notice that you might have to cast your sparse matrix into a classic NumPy (dense) ndarray in many cases, and that takes computation time. The use of sparse matrices may be optimal in cases where you have to do several such manipulations without needing to cast them to dense arrays in between steps; for instance, handling the one-hot-encoded representation of a dataset with many features.
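
To illustrate the one-hot case, a COO matrix maps directly onto it: row indices come from the sample positions and column indices from the labels. A small sketch (the labels and class count below are made up for the example):

```python
import numpy as np
from scipy import sparse

labels = np.array([2, 0, 1, 2, 1])  # hypothetical category labels
n_classes = 3
rows = np.arange(len(labels))

# One 1 per row, placed in the column given by that row's label
onehot = sparse.coo_matrix(
    (np.ones(len(labels)), (rows, labels)),
    shape=(len(labels), n_classes),
)
```

Only the 5 nonzero entries are stored, regardless of how many classes there are.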

Having fun

As a final note, it is quite fun to check how each sparse matrix type stores its data and when a given type is more memory-efficient than another. You can even try to infer the critical number of data points beyond which they become detrimental compared to their dense counterparts.
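
As a starting point for such an exploration, the underlying storage buffers of each format can be inspected directly via their nbytes attributes. A sketch (exact sizes depend on platform and index dtype, so treat the numbers as indicative only):

```python
import numpy as np
from scipy import sparse

shape = (2000, 3000)
n_points = 500
rng = np.random.default_rng(42)
loc_y = rng.integers(shape[0], size=n_points)
loc_x = rng.integers(shape[1], size=n_points)
data = np.ones(n_points)

coo = sparse.coo_matrix((data, (loc_y, loc_x)), shape=shape)
csr = coo.tocsr()
dense = coo.toarray()

# COO stores three aligned arrays: values, row indices, column indices
coo_bytes = coo.data.nbytes + coo.row.nbytes + coo.col.nbytes
# CSR compresses the row indices into an (n_rows + 1)-long pointer array
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print(f"COO: {coo_bytes} B, CSR: {csr_bytes} B, dense: {dense.nbytes} B")
```

With 500 points in a 2000x3000 float64 matrix, both sparse formats need a few kilobytes versus 48 MB for the dense array; as the number of nonzeros grows, that advantage shrinks and eventually reverses.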
