
What is the best container object for a calculation in N dimensions, when the problem is symmetric so that only some numbers need to be calculated?

Concretely, for N=4 I have:

import numpy as np

M = 50
results = np.zeros((M, M, M, M))

for ii in range(M):
    for jj in range(ii, M):
        for kk in range(jj, M):
            for ll in range(kk, M):
                res = 1  # really some calculation
                results[ii, jj, kk, ll] = res

Many elements in this array are completely redundant and aren't even accessed. This is even more true for higher N (I'd like to go up to N=10 or ideally N=15).
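To quantify the redundancy: the loops above visit only the non-decreasing index tuples, i.e. multisets of size N drawn from M values, so the stored count is C(M+N-1, N) out of M**N. A quick stdlib check (the helper name `stored_vs_total` is just for illustration):

```python
from math import comb

def stored_vs_total(M, N):
    # Non-decreasing index tuples = multisets of size N from M values,
    # versus every cell of the dense N-dimensional array.
    stored = comb(M + N - 1, N)
    total = M ** N
    return stored, total

stored, total = stored_vs_total(50, 4)
print(stored, total, stored / total)  # 292825 of 6250000 cells (~4.7%) are used
```

For N=4 and M=50 less than 5% of the dense array is ever written, and the fraction shrinks rapidly as N grows.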

Is it better to use lists and append in each step for such a problem, or a dictionary, or sparse matrices? I tried a sparse matrix, but it keeps warning me that I shouldn't frequently change elements in a sparse matrix, so presumably this is not a good idea.

The only functionality that I'd need to retain is finding maxima (ideally along each dimension).
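One option raised in the comments is a dictionary keyed by index tuples, which stores only the visited entries and still supports maxima. A minimal sketch (the `sum(idx)` payload and `max_per_first` name are placeholders for illustration):

```python
import itertools

M = 5  # small example; the real calculation would use M = 50
results = {}
for idx in itertools.combinations_with_replacement(range(M), 4):
    results[idx] = sum(idx)  # placeholder for the real calculation

# Global maximum: the key whose value is largest.
best_idx = max(results, key=results.get)

# Maximum along one dimension, e.g. the best value for each
# setting of the first index.
max_per_first = {}
for (ii, *rest), val in results.items():
    if ii not in max_per_first or val > max_per_first[ii]:
        max_per_first[ii] = val
```

This stores exactly the C(M+N-1, N) non-redundant entries and nothing else, at the cost of per-element hashing overhead.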

Any insights would be appreciated!

5 Comments
  • What is your bottleneck? calculation or space? Commented Aug 17, 2020 at 21:56
  • You could use a dictionary with tuples of indices as keys. That's easy to implement. Doing calculations is a different matter. Commented Aug 17, 2020 at 22:03
  • Thanks for your comments. @Ehsan - my bottleneck is space, and time only in the sense that the calculation for each element is fast, so I don't want writing to the elements or extending the array to dominate the runtime. @hpaulj - thanks for suggesting this. Do you think it's better than tensorflow? Commented Aug 18, 2020 at 3:36
  • I'm most familiar with the scipy.sparse package. That's limited to 2d. Its roots are in linear algebra, and ideas mathematicians developed years ago to deal with finite difference and finite element problems. So it does things like matrix multiplication well (if sparsity is on the order of 5% or less). sklearn has added some of its own utilities. I haven't paid attention to what tensorflow does. Commented Aug 18, 2020 at 4:28
  • Ok, nice, thanks for the info @hpaulj! I didn't know that sklearn had done something to improve the matrix multiplication ideas from scipy.sparse, that's a great pointer for something to look into! Commented Aug 18, 2020 at 17:07

1 Answer


The "density" of the matrix will be roughly 1 / D!, where D is the number of dimensions, since only the non-decreasing index tuples are filled. So the payoff in space grows factorially with D, while the performance penalty compared to lists or dense matrices is constant.

So, when the number of dimensions is high, sparse matrices provide a HUGE advantage in space, and they're still faster than plain lists. If the number of dimensions is small, dense matrices will be somewhat bigger but also somewhat faster (a few times faster, but since the total execution time is small, the absolute difference is still small).

Overall, unless the number of dimensions is fixed, it makes more sense to stick with sparse matrices. However, if D is fixed and small, it's better to just benchmark for that specific case.
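As a sanity check on the density figure: the symmetric loops fill C(M+D-1, D) of the M**D cells, a fraction that approaches 1/D! for large M. A small stdlib sketch (`density` is an illustrative helper, not part of any library):

```python
from math import comb, factorial

def density(M, D):
    """Fraction of the dense M**D array that the symmetric loops fill."""
    return comb(M + D - 1, D) / M ** D

for D in (4, 10):
    print(D, density(50, D), 1 / factorial(D))
```

For D=4 and M=50 this gives roughly 4.7% versus 1/4! ≈ 4.2%; by D=10 the stored fraction is tiny, which is where sparse storage pays off.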


4 Comments

Just to clarify, so you'd suggest reformatting so that the array becomes a 2D thing? Because to my knowledge I can't get multidimensional sparse matrices, correct?
And is it best to use the lil_matrix object from scipy.sparse?
@mzzx sorry, I wasn't clear on this. I meant arbitrary-dimension sparse matrices, e.g. tensorflow's SparseTensor (preferred) or those provided by the sparse package. scipy's 2D sparse matrices won't be enough here.
Oh ok! Thanks so much!
