
I would like to initialize an array filled with zeros, with a 1 at a specific location. I know how to do it with two lines of code like this:

import numpy as np
shape = (2,3)
location = (0,1)
arr = np.zeros(shape)
arr[location] = 1

Is there a faster way to do it, maybe with a one-liner?

1 Answer

Introducing sparsity

Even though you probably won't find a faster way to perform such a simple initialization, you may want to use sparse matrices for larger matrices and/or matrices with more than one data point.

Several types of sparse matrices exist; each is designed to make some computations faster (and others slower, depending on the type) and to use memory more efficiently, as long as your matrices are made up mostly of zeros.

For your specific case, though, it will be slower:

import numpy as np
from scipy import sparse
shape = (2, 3)
location = (0, 1)

# With dense matrix
arr = np.zeros(shape)
arr[location] = 1
# timeit > 280 ns ± 4.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# With coordinate matrix
sparr = sparse.coo_matrix(([1], ([location[0]], [location[1]])), shape=shape)
# timeit > 20.5 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Sparsity: a typical case with matrices

But imagine that you have a very large matrix with ones scattered at some locations; then you could simply do this:

shape = (2000, 3000)
n_points = 500
loc_y = np.random.randint(shape[0], size=(n_points))
loc_x = np.random.randint(shape[1], size=(n_points))
data = np.ones(n_points)

sp_arr = sparse.coo_matrix((data, (loc_y, loc_x)), shape=shape)
# timeit > 17.8 µs ± 90.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Instead of this:

np_arr = np.zeros(shape)
for d, x, y in zip(data, loc_x, loc_y):
    np_arr[y, x] = d
# timeit > 613 µs ± 8.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

... or something else like this:

np_loc = [y * shape[1] + x for x, y in zip(loc_x, loc_y)]  # not in the timeit

np_arr2 = np.zeros(shape)
np.put(np_arr2, np_loc, data)
# timeit > 497 µs ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

((sp_arr.toarray() == np_arr) & (np_arr == np_arr2)).all()
# > True
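
As a side note, the Python-level list comprehension that builds np_loc can itself be avoided: NumPy's fancy indexing assigns all points at once, and np.ravel_multi_index computes the flat indices in a single vectorized call. A minimal, untimed sketch (reusing the same variable setup as above):

```python
import numpy as np

shape = (2000, 3000)
n_points = 500
loc_y = np.random.randint(shape[0], size=n_points)
loc_x = np.random.randint(shape[1], size=n_points)
data = np.ones(n_points)

# Fancy indexing: assign all (row, col) pairs at once, no Python loop
np_arr3 = np.zeros(shape)
np_arr3[loc_y, loc_x] = data

# Equivalent: compute flat indices with ravel_multi_index, then np.put
flat_loc = np.ravel_multi_index((loc_y, loc_x), shape)
np_arr4 = np.zeros(shape)
np.put(np_arr4, flat_loc, data)

assert (np_arr3 == np_arr4).all()
```

Both variants give the same result as the loop; which one is faster in practice is worth measuring on your own data.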

Limitations

Notice that you might have to cast your sparse matrix into a classic NumPy (dense) ndarray in many cases, and that takes computation time. The use of sparse matrices may be optimal in cases where you have to do several such manipulations without needing to cast them to dense arrays in between steps; for instance, handling the one-hot-encoded representation of a dataset with many features.
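
To illustrate the one-hot case, a COO matrix maps directly onto it: row indices come from the sample positions and column indices from the labels. A small sketch (the labels and class count below are made up for the example):

```python
import numpy as np
from scipy import sparse

labels = np.array([2, 0, 1, 2, 1])  # hypothetical category labels
n_classes = 3
rows = np.arange(len(labels))

# One 1 per row, placed in the column given by that row's label
onehot = sparse.coo_matrix(
    (np.ones(len(labels)), (rows, labels)),
    shape=(len(labels), n_classes),
)
```

Only the 5 nonzero entries are stored, regardless of how many classes there are.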

Having fun

As a final note, it is quite fun to check how each sparse matrix type stores its data and when a given type is more memory-efficient than another. You can even try to infer the critical number of data points beyond which they become detrimental compared to their dense counterparts.
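
As a starting point for such an exploration, the underlying storage buffers of each format can be inspected directly via their nbytes attributes. A sketch (exact sizes depend on platform and index dtype, so treat the numbers as indicative only):

```python
import numpy as np
from scipy import sparse

shape = (2000, 3000)
n_points = 500
rng = np.random.default_rng(42)
loc_y = rng.integers(shape[0], size=n_points)
loc_x = rng.integers(shape[1], size=n_points)
data = np.ones(n_points)

coo = sparse.coo_matrix((data, (loc_y, loc_x)), shape=shape)
csr = coo.tocsr()
dense = coo.toarray()

# COO stores three aligned arrays: values, row indices, column indices
coo_bytes = coo.data.nbytes + coo.row.nbytes + coo.col.nbytes
# CSR compresses the row indices into an (n_rows + 1)-long pointer array
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print(f"COO: {coo_bytes} B, CSR: {csr_bytes} B, dense: {dense.nbytes} B")
```

With 500 points in a 2000x3000 float64 matrix, both sparse formats need a few kilobytes versus 48 MB for the dense array; as the number of nonzeros grows, that advantage shrinks and eventually reverses.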
