The steps before, during, and after this problem statement are a mix of under-specified and undesirable. Avoid a list representation entirely. Use a row-packed form, and if you are memory-constrained and can accept a padding value of 0, you can use CSR (which stores the second dimension correctly and leaves the first dimension as an implicit product).
import numpy as np
import scipy.sparse
n_rows = 3
n_cols = np.array((4, 9, 1))
n_max_cols = n_cols.max()
n_total_cols = n_cols.sum()
rand = np.random.default_rng(seed=0)
packed = rand.random(size=(n_rows, n_total_cols), dtype=np.float32)
print('If you can use this packed form directly (there are many operations that can), then STOP HERE.')
np.set_printoptions(precision=2)
print(packed)
print()
print(
"You could use this form if 0 is an acceptable padding value "
"(you haven't responded to specify)."
)
sparse = scipy.sparse.lil_array((n_rows*n_cols.size, n_max_cols))
x = 0
y = 0
# There are vectorised options to construct this as well; this form is easy to understand.
for width in n_cols:
    xnew = x + width
    ynew = y + n_rows
    sparse[y: ynew, 0: width] = packed[:, x: xnew]
    x = xnew
    y = ynew
csr = sparse.tocsr()
print(csr.toarray())
If you can use this packed form directly (there are many operations that can), then STOP HERE.
[[0.85 0.64 0.51 0.27 0.31 0.04 0.08 0.02 0.18 0.81 0.65 0.91 0.5 0.61]
[0.97 0.73 0.63 0.54 0.56 0.94 0.28 0.82 0.67 0. 0.39 0.86 0.55 0.03]
[0.76 0.73 0.85 0.18 0.09 0.86 0.02 0.54 0.08 0.3 0.48 0.42 0.4 0.03]]
You could use this form if 0 is an acceptable padding value (you haven't responded to specify).
[[0.85 0.64 0.51 0.27 0. 0. 0. 0. 0. ]
[0.97 0.73 0.63 0.54 0. 0. 0. 0. 0. ]
[0.76 0.73 0.85 0.18 0. 0. 0. 0. 0. ]
[0.31 0.04 0.08 0.02 0.18 0.81 0.65 0.91 0.5 ]
[0.56 0.94 0.28 0.82 0.67 0. 0.39 0.86 0.55]
[0.09 0.86 0.02 0.54 0.08 0.3 0.48 0.42 0.4 ]
[0.61 0. 0. 0. 0. 0. 0. 0. 0. ]
[0.03 0. 0. 0. 0. 0. 0. 0. 0. ]
[0.03 0. 0. 0. 0. 0. 0. 0. 0. ]]
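Many reductions can be applied to the packed form directly, without ever building a padded array. As an illustrative sketch (not part of the original answer), per-segment sums for every row can be computed with np.add.reduceat using the segment offsets derived from n_cols:

```python
import numpy as np

n_rows = 3
n_cols = np.array((4, 9, 1))
rng = np.random.default_rng(seed=0)
packed = rng.random(size=(n_rows, n_cols.sum()), dtype=np.float32)

# Segment start offsets along the packed axis: (0, 4, 13) for widths (4, 9, 1).
offsets = np.concatenate(([0], np.cumsum(n_cols)[:-1]))

# Sum each irregular segment of each row; result has shape (n_rows, n_cols.size).
segment_sums = np.add.reduceat(packed, offsets, axis=1)
print(segment_sums.shape)  # (3, 3)
```

This keeps the whole computation on one contiguous array, which is exactly the point of preferring the packed form.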
On performance: a large part of the time is lost in np.full because of page faults, so you should reuse the output array if possible. A smaller part (~15%) is lost in NumPy overheads, so native code (or Numba/Cython) can make this a bit faster. Finally, a non-negligible amount of time is taken by memory accesses that cannot be avoided here unless you skip creating this (likely unnecessary) expensive array entirely and instead merge this computation directly with the later computations that consume it. If you expect a speed-up of more than about 5x, the latter is certainly mandatory. Alternatively, could you use np.hstack(list_with_irregular_arrays) to perform your later computations?
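A minimal sketch of the reuse advice above (the function name and buffer handling are illustrative, not from the original answer): allocate the padded buffer once and refill it on each call instead of creating a fresh array with np.full every time.

```python
import numpy as np

def pad_into(packed, n_cols, out):
    """Scatter a row-packed array into a preallocated padded buffer.

    Reusing `out` across calls avoids repeated allocation (and the page
    faults that come with freshly mapped pages from np.full/np.zeros).
    """
    out.fill(0.0)  # overwrite in place; cheaper than allocating anew
    n_rows = packed.shape[0]
    x = y = 0
    for width in n_cols:
        out[y:y + n_rows, :width] = packed[:, x:x + width]
        x += width
        y += n_rows
    return out

n_rows = 3
n_cols = np.array((4, 9, 1))
rng = np.random.default_rng(seed=0)
packed = rng.random(size=(n_rows, n_cols.sum()), dtype=np.float32)

# Allocate once, reuse for every subsequent packed -> padded conversion.
buf = np.empty((n_rows * n_cols.size, n_cols.max()), dtype=np.float32)
padded = pad_into(packed, n_cols, buf)
```

Whether this helps depends on how often the conversion runs; for a one-off conversion, the allocation cost is irrelevant.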