
What is the best container object for a calculation in N dimensions, when the problem is symmetric so that only some numbers need to be calculated?

Concretely, for N=4 I have:

import numpy as np

M = 50
results = np.zeros((M, M, M, M))

for ii in range(M):
    for jj in range(ii, M):
        for kk in range(jj, M):
            for ll in range(kk, M):
                res = 1  # really some calculation
                results[ii, jj, kk, ll] = res

Many elements in this array are completely redundant and aren't even accessed. This is even more true for higher N (I'd like to go up to N=10 or ideally N=15).
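To quantify the redundancy: the loops above visit only the non-decreasing index tuples, i.e. multisets of size N drawn from M values, so the stored count is C(M+N-1, N) out of M**N. A quick stdlib check (the helper name `stored_vs_total` is just for illustration):

```python
from math import comb

def stored_vs_total(M, N):
    # Non-decreasing index tuples = multisets of size N from M values,
    # versus every cell of the dense N-dimensional array.
    stored = comb(M + N - 1, N)
    total = M ** N
    return stored, total

stored, total = stored_vs_total(50, 4)
print(stored, total, stored / total)  # 292825 of 6250000 cells (~4.7%) are used
```

For N=4 and M=50 less than 5% of the dense array is ever written, and the fraction shrinks rapidly as N grows.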

Is it better to use lists and append in each step for such a problem, or a dictionary, or sparse matrices? I tried a sparse matrix, but it keeps warning me that I shouldn't frequently change elements in a sparse matrix, so presumably this is not a good idea.

The only functionality that I'd need to retain is finding maxima (ideally along each dimension).
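One option raised in the comments is a dictionary keyed by index tuples, which stores only the visited entries and still supports maxima. A minimal sketch (the `sum(idx)` payload and `max_per_first` name are placeholders for illustration):

```python
import itertools

M = 5  # small example; the real calculation would use M = 50
results = {}
for idx in itertools.combinations_with_replacement(range(M), 4):
    results[idx] = sum(idx)  # placeholder for the real calculation

# Global maximum: the key whose value is largest.
best_idx = max(results, key=results.get)

# Maximum along one dimension, e.g. the best value for each
# setting of the first index.
max_per_first = {}
for (ii, *rest), val in results.items():
    if ii not in max_per_first or val > max_per_first[ii]:
        max_per_first[ii] = val
```

This stores exactly the C(M+N-1, N) non-redundant entries and nothing else, at the cost of per-element hashing overhead.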

Any insights would be appreciated!

5 Comments
  • What is your bottleneck? calculation or space? Commented Aug 17, 2020 at 21:56
  • You could use a dictionary with tuples of indices as keys. That's easy to implement. Doing calculations is a different matter. Commented Aug 17, 2020 at 22:03
  • Thanks for your comments. @Ehsan - my bottleneck is space, and time only in the sense that the calculation for each element is fast, so I don't want writing to the elements or extending the array to dominate the runtime. @hpaulj - thanks for suggesting this. Do you think it's better than tensorflow? Commented Aug 18, 2020 at 3:36
  • I'm most familiar with the scipy.sparse package. That's limited to 2d. Its roots are in linear algebra, and ideas mathematicians developed years ago to deal with finite difference and finite element problems. So it does things like matrix multiplication well (if sparsity is on the order of 5% or less). sklearn has added some of its own utilities. I haven't paid attention to what tensorflow does. Commented Aug 18, 2020 at 4:28
  • Ok, nice, thanks for the info @hpaulj! I didn't know that sklearn had done something to improve the matrix multiplication ideas from scipy.sparse, that's a great pointer for something to look into! Commented Aug 18, 2020 at 17:07

1 Answer


The "density" of the matrix will be roughly 1 / D!, where D is the number of dimensions, since only the non-decreasing index tuples are filled. So the payoff in space grows factorially with D, while the performance penalty compared to lists or dense matrices is constant.

So, when the number of dimensions is high, sparse matrices provide a HUGE advantage in space, and they're still faster than plain lists. If the number of dimensions is small, dense matrices will be somewhat bigger but also somewhat faster (a few times faster, but since the total execution time is small, the absolute difference is still small).

Overall, unless the number of dimensions is fixed, it makes more sense to stick with sparse matrices. However, if D is fixed and small, it's better to just benchmark for that specific case.
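As a sanity check on the density figure: the symmetric loops fill C(M+D-1, D) of the M**D cells, a fraction that approaches 1/D! for large M. A small stdlib sketch (`density` is an illustrative helper, not part of any library):

```python
from math import comb, factorial

def density(M, D):
    """Fraction of the dense M**D array that the symmetric loops fill."""
    return comb(M + D - 1, D) / M ** D

for D in (4, 10):
    print(D, density(50, D), 1 / factorial(D))
```

For D=4 and M=50 this gives roughly 4.7% versus 1/4! ≈ 4.2%; by D=10 the stored fraction is tiny, which is where sparse storage pays off.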


4 Comments

Just to clarify, so you'd suggest reformatting so that the array becomes a 2D thing? Because to my knowledge I can't get multidimensional sparse matrices, correct?
And is it best to use the lil_matrix object from scipy.sparse?
@mzzx sorry, I wasn't clear on this. I meant arbitrary-dimension sparse matrices, e.g. tensorflow's SparseTensor (preferred) or those provided by the sparse package. scipy's 2D sparse matrices won't be enough here.
Oh ok! Thanks so much!
