
I'm dealing with a big array D with which I'm running into memory problems. However, the entries of that big array are in fact just copies of elements of a much smaller array B. Now my idea would be to use something like a "dynamic view" into B instead of constructing the full D. For example, is it possible to use a function D_fun like an array, which then reads the correct element of B? I.e. something like

def D_fun(B, I, J):
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)

    return B[i,j]

And then I could use D_fun to do some matrix and vector multiplications.

Of course, anything else that would keep me from copying the elements of B repeatedly into a huge matrix would be appreciated.

Edit: I realized that if I invest some time in my other code I can get the matrix D to be a block matrix with the Bs on the diagonal and zeros otherwise.

  • Oh and FYI, D is not quite dense, but it's also not sparse enough that using scipy.sparse solves the memory issue. Commented Feb 3, 2017 at 11:01
  • What exactly is the implementation of convert_i_idx? If it's particularly simple, you might find that as_strided can help Commented Feb 3, 2017 at 12:23
  • Thanks for the tip, but I'm afraid it's not simple enough for that. Commented Feb 3, 2017 at 12:32
  • Element multiplication or np.dot like? Commented Feb 3, 2017 at 12:47
  • Matrix and vector multiplications in the mathematical sense, so np.dot like Commented Feb 3, 2017 at 13:31

2 Answers


This is usually done by subclassing numpy.ndarray and overloading __getitem__, __setitem__, __delitem__ (array-like access via []) to remap the indices like D_fun(..) does. Still, I am not sure if this will work in combination with the numpy parts implemented in C.

Some concerns: When you're doing calculations on your big matrix D via the small matrix B, numpy might create a copy of D with its real dimensions, thus using more space than wanted.

If several (I1,J1), (I2,J2), … are mapped to the same (i,j), then D[I1,J1] = newValue will also set D[I2,J2] to newValue.
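A minimal sketch of that aliasing effect, using a plain wrapper class rather than an ndarray subclass, and with toy index maps standing in for the question's convert_i_idx/convert_j_idx (the real mapping is presumably more involved):

```python
import numpy as np

# Wrapper that remaps big indices (I, J) onto the small array B.
# rows/cols are illustrative stand-ins for the question's convert functions.
class RemappedView:
    def __init__(self, B, rows, cols):
        self.B, self.rows, self.cols = B, rows, cols

    def __getitem__(self, IJ):
        I, J = IJ
        return self.B[self.rows[I], self.cols[J]]

    def __setitem__(self, IJ, value):
        I, J = IJ
        self.B[self.rows[I], self.cols[J]] = value

B = np.zeros((2, 2))
D = RemappedView(B, rows=[0, 0, 1], cols=[1, 1, 0])

D[0, 0] = 7.0          # (I1, J1) = (0, 0) writes to B[0, 1] ...
assert D[1, 1] == 7.0  # ... and (I2, J2) = (1, 1) reads the same B[0, 1]
```

As noted above, this only intercepts Python-level indexing; numpy's compiled routines will not go through `__getitem__`.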


1 Comment

I'll give that a try! Your second concern is not an issue. I probably won't change the array once created, and even if I had to, that would be the desired behavior.

np.dot uses compiled libraries to perform fast matrix products. That constrains the data type (integers, floats), and requires that the data be contiguous. I'd suggest studying this recent question about large dot products: numpy: efficient, large dot products
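If the products you need are only matrix–vector (np.dot-like), scipy.sparse.linalg.LinearOperator can wrap a function like your D_fun so D is never materialized. A sketch, assuming for illustration a separable mapping D[I, J] = B[rows[I], cols[J]] (your actual convert_i_idx/convert_j_idx may well not factor this way):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

# Toy stand-ins: B is the small array; rows/cols play the role of the
# question's convert functions (an assumption for this sketch).
B = np.arange(6.0).reshape(2, 3)
rows = np.array([0, 1, 1, 0])      # big row I  -> small row i
cols = np.array([2, 0, 1, 2, 1])   # big col J  -> small col j

def matvec(x):
    # D @ x without forming D: contract over big columns first,
    # then replicate the rows according to the row map.
    return (B[:, cols] @ x)[rows]

D_op = LinearOperator(shape=(rows.size, cols.size),
                      matvec=matvec, dtype=B.dtype)

x = np.ones(cols.size)
y = D_op.matvec(x)

# Verify against the explicitly materialized D (small enough here)
D_full = B[np.ix_(rows, cols)]
assert np.allclose(y, D_full @ x)
```

Iterative solvers and eigensolvers in scipy.sparse.linalg accept such operators directly, so the big matrix never needs to exist in memory.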

Defining a class with a custom __getitem__ is a way of accessing an object with indexing syntax. Look in numpy/lib/index_tricks.py for some interesting examples of this: np.mgrid, np.r_, np.s_ etc. But this is largely a syntax enhancement. It doesn't avoid the issues of defining a robust and efficient mapping between your D and B.

And before trying to do much with subclassing ndarray take a look at the implementation for np.matrix or np.ma. scipy.sparse also creates classes that behave like ndarray in many ways, but does not subclass ndarray.

In your D_fun, are I and J scalars? If so, this conversion would be horribly inefficient. It would be better if they could be arrays, lists or slices (anything that B[atuple] implements), but that can be a lot of work.

def D_fun(B, I, J):
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)    
    return B[i,j]

def __getitem__(self, atuple):
    # sketch of a getitem version of your function
    I, J = atuple
    # select B based on I, J?
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B.__getitem__((i, j))

What is the mapping from D to B like? The simplest, and most efficient mapping would be that D is just a higher dimensional collection of B, i.e.

 D = np.array([B0,B1,B2,...,Bn])
 D[0,...] == B0  

Slightly more complicated is the case where D[n1:n2,....] == B0, a slice

But if the B0 values are scattered around D, your chances of an efficient, reliable mapping are very small.
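For the simplest case above, when all the stacked blocks are the same B (as in the question's edit), np.broadcast_to gives a read-only view with that higher-dimensional shape without copying anything:

```python
import numpy as np

B = np.arange(12.0).reshape(3, 4)
n = 1000

# D looks like n stacked copies of B, but no copy is made:
# broadcast_to returns a read-only view over B's memory.
D = np.broadcast_to(B, (n,) + B.shape)   # shape (1000, 3, 4)

assert np.shares_memory(D, B)            # still B's buffer
assert (D[500] == B).all()
assert not D.flags.writeable             # views from broadcast_to are read-only
```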

3 Comments

I'm not sure if I got the last part and how that could help me. I realized that D can be written in diagonal block form (see edit above). Does that work in that direction?
Take a look at scipy.sparse block_diag construction.
Mhh, but how could that exploit the fact that the matrices on the diagonal are identical, in order to save memory?
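One way to exploit identical diagonal blocks, sketched with a LinearOperator (names and sizes here are illustrative): a block-diagonal D with k copies of an m x m block B never needs to be formed, since D @ x just applies B to each length-m chunk of x, and B is stored only once.

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.sparse.linalg import LinearOperator

m, k = 3, 4                                   # block size, number of blocks
rng = np.random.default_rng(0)
B = rng.standard_normal((m, m))

def matvec(x):
    # Split x into k chunks and apply B to each: one batched matmul.
    # Row i of x.reshape(k, m) is chunk x_i, and x_i @ B.T == B @ x_i.
    return (x.reshape(k, m) @ B.T).reshape(-1)

D_op = LinearOperator((k * m, k * m), matvec=matvec, dtype=B.dtype)

# Verify against an explicitly built block-diagonal matrix
x = rng.standard_normal(k * m)
D_full = block_diag(*([B] * k))               # only for checking, not needed in practice
assert np.allclose(D_op.matvec(x), D_full @ x)
```

Memory cost is O(m^2) for B instead of O((k*m)^2) for D, and the matvec is a single dense matmul.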
