
I'm trying to compute an ordinary matrix product of two huge matrices (10*25,000,000), but I run out of memory when I do so. How could I use numpy's memmap to handle this? Is this even a good idea? I'm not too worried about the speed of the operation; I just want the result, even if it means waiting some time. Thank you in advance!

8 GB RAM, i7-2617M 1.5 GHz, Windows 7 64-bit. I'm using the 64-bit version of everything: Python (2.7), numpy, scipy.

Edit1:

Maybe h5py is a better option?

  • By "usual matrix multiplication" I suppose you mean it as opposed to element-wise multiplication. What is the type of an element: int8? float64? Is the resulting matrix supposed to be 25,000,000*25,000,000 or 10*10? If 10*10, you should be OK: 10*25,000,000*8 bytes = 2 GB. Commented May 30, 2012 at 14:40
  • @FélixCantournet The product is (10;25,000,000)*(25,000,000;10), with float64 elements. I could maybe work with float32, but it still won't work. Any ideas? Do these packages help at all to overcome this, or am I reasoning in the wrong direction? Commented Jun 6, 2012 at 2:20
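The size arithmetic in the comments above can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes float64 elements (8 bytes each) and decimal gigabytes, as in the first comment.

```python
import numpy as np

ROWS, COLS = 10, 25000000                    # shape of the first matrix
ITEMSIZE = np.dtype(np.float64).itemsize     # 8 bytes per float64 element

# One operand: 10 * 25,000,000 * 8 bytes
operand_bytes = ROWS * COLS * ITEMSIZE

# Result if the product is (10 x N) * (N x 10): a tiny 10 x 10 matrix
small_result_bytes = ROWS * ROWS * ITEMSIZE

# Result if the product were (N x 10) * (10 x N): an N x N matrix
huge_result_bytes = COLS * COLS * ITEMSIZE

print("one operand:    {:.1f} GB".format(operand_bytes / 1e9))
print("both operands:  {:.1f} GB".format(2 * operand_bytes / 1e9))
print("10 x 10 result: {} bytes".format(small_result_bytes))
print("N x N result:   {:.1f} PB".format(huge_result_bytes / 1e15))
```

So the two float64 operands together need about 4 GB, which is tight on an 8 GB machine once temporaries and the operating system are counted, while the 10*10 result itself is negligible.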

2 Answers


You might try np.memmap and compute the 10x10 output matrix one element at a time.

That is, load only the first row of the first matrix and the first column of the second, compute np.sum(row1 * col1), and then repeat for each of the remaining row/column pairs.
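A minimal sketch of this element-at-a-time approach might look like the following. The file names, the helper name `memmap_matmul_elementwise`, and the small demo size `n = 1000` are all assumptions made for illustration; the real matrices would use n = 25,000,000 and files that already exist on disk.

```python
import os
import tempfile

import numpy as np


def memmap_matmul_elementwise(a, b):
    """Compute the (m x n) * (n x p) product one output element at a time."""
    m, p = a.shape[0], b.shape[1]
    out = np.empty((m, p), dtype=np.float64)
    for i in range(m):
        row = np.asarray(a[i, :])          # one row of the first matrix
        for j in range(p):
            col = np.asarray(b[:, j])      # one column of the second matrix
            out[i, j] = np.sum(row * col)  # dot product of row i and column j
    return out


# Small demo data written to disk-backed arrays (hypothetical file names).
n = 1000
tmpdir = tempfile.mkdtemp()
a = np.memmap(os.path.join(tmpdir, "a.dat"), dtype=np.float64,
              mode="w+", shape=(10, n))
b = np.memmap(os.path.join(tmpdir, "b.dat"), dtype=np.float64,
              mode="w+", shape=(n, 10))
rng = np.random.RandomState(0)
a[:] = rng.rand(10, n)
b[:] = rng.rand(n, 10)
a.flush()
b.flush()

result = memmap_matmul_elementwise(a, b)
print(result.shape)
```

Note that `row * col` still allocates a temporary array as long as one row (about 200 MB for 25,000,000 float64 values); `np.dot(row, col)` would give the same number without that temporary.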



Try numpy.memmap and numexpr! This works from your disk and CPU cache without holding everything in memory, much like a Fortran loop. There is some code here: "python - way to do fast matrix multiplication and reduction while working in memmaps and CPU". But beware of the size of the files it will create: if they are only temporary files, remove them afterwards; if not, I suppose it's best to store them as pandas HDF5 files with compression (level 9). So: write the data with data.tofile, load it with memmap, calculate, save the memmap to a pandas HDF5 file, and delete the memmap. Storing the data as a single row is another option with HDF5 files that should take less space (I think I read about it somewhere). Also, when you memmap single-row data with numpy, just give it a shape with the proper order, and numpy.memmap will read that flat data in the chosen shape.
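The last point, reading flat data back in a chosen shape and order, can be illustrated with a small sketch. The file name `flat.dat` and the tiny 3x4 shape are assumptions for the demo; only `ndarray.tofile` and the `shape`/`order` parameters of `np.memmap` are used.

```python
import os
import tempfile

import numpy as np

# Write 12 float64 values to disk as one flat run of raw bytes.
path = os.path.join(tempfile.mkdtemp(), "flat.dat")  # hypothetical file name
data = np.arange(12, dtype=np.float64)
data.tofile(path)

# Map the same bytes as a 3 x 4 matrix, first row-major, then column-major.
c_view = np.memmap(path, dtype=np.float64, mode="r", shape=(3, 4), order="C")
f_view = np.memmap(path, dtype=np.float64, mode="r", shape=(3, 4), order="F")

print(c_view[0, :])   # first 4 values of the file form the first row
print(f_view[:, 0])   # first 3 values of the file form the first column
```

The file stores no shape or dtype information, so both have to match what was written; choosing `order` so that the slices you read most often are contiguous on disk keeps the access pattern sequential.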

1 Comment

numexpr is element-wise only
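Because numexpr only evaluates element-wise expressions, the summation over the long axis has to come from somewhere else. One possible alternative, which is not taken from either answer, is to let NumPy accumulate the 10x10 result over blocks of the long axis. In this sketch the helper name `blocked_matmul`, the block size, the file names, and the demo size are all assumptions.

```python
import os
import tempfile

import numpy as np


def blocked_matmul(a, b, block=1024):
    """Accumulate (m x n) * (n x p) over blocks of the shared axis n."""
    n = a.shape[1]
    out = np.zeros((a.shape[0], b.shape[1]), dtype=np.float64)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Only these two slices need to be resident in RAM at a time.
        a_blk = np.asarray(a[:, start:stop])
        b_blk = np.asarray(b[start:stop, :])
        out += np.dot(a_blk, b_blk)
    return out


# Small disk-backed demo matrices (hypothetical file names and size).
n = 10000
tmpdir = tempfile.mkdtemp()
a = np.memmap(os.path.join(tmpdir, "a.dat"), dtype=np.float64,
              mode="w+", shape=(10, n))
b = np.memmap(os.path.join(tmpdir, "b.dat"), dtype=np.float64,
              mode="w+", shape=(n, 10))
rng = np.random.RandomState(1)
a[:] = rng.rand(10, n)
b[:] = rng.rand(n, 10)

result = blocked_matmul(a, b)
print(result.shape)
```

The block size trades memory for the number of loop iterations; each iteration touches roughly `2 * 10 * block * 8` bytes of input.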
