7

How can I create a huge numpy array using pytables? I tried this, but it gives me the "ValueError: array is too big." error:

import numpy as np
import tables as tb
ndim = 60000
h5file = tb.openFile('test.h5', mode='w', title="Test Array")
root = h5file.root
h5file.createArray(root, "test", np.zeros((ndim,ndim), dtype=float))
h5file.close()

2 Answers

15

Piggybacking off of @b1r3k's response: to create an array that you are not going to access all at once (i.e. never bring the whole thing into memory), you want to use a CArray (chunked array). The idea is that you then fill and access it incrementally:

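import numpy as np
import tables as tb

ndim = 60000
# PyTables 3.x names; older releases spelled these openFile / createCArray
h5file = tb.open_file('test.h5', mode='w', title="Test Array")
root = h5file.root
# A CArray is stored on disk in chunks, so the full array is never held in RAM
x = h5file.create_carray(root, 'x', tb.Float64Atom(), shape=(ndim, ndim))
x[:100, :100] = np.random.random(size=(100, 100))  # Now put in some data
h5file.close()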
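
To make "fill it incrementally" concrete, here is a minimal sketch that writes a CArray one block of rows at a time. The helper names (fill_in_blocks, read_block), the block size, and the temporary file path are my own choices for illustration, and I shrank the array so it runs quickly; treat it as one possible way to do this, not the only one.

```python
import os
import tempfile

import numpy as np
import tables as tb


def fill_in_blocks(path, ndim, block_rows):
    """Create an (ndim, ndim) float64 CArray and write it one row block at a time."""
    with tb.open_file(path, mode="w", title="Test Array") as h5file:
        x = h5file.create_carray(h5file.root, "x", tb.Float64Atom(),
                                 shape=(ndim, ndim))
        for start in range(0, ndim, block_rows):
            stop = min(start + block_rows, ndim)
            # Only this block exists as a numpy array at any one time
            x[start:stop, :] = np.random.random(size=(stop - start, ndim))


def read_block(path, start, stop):
    """Read rows [start, stop) back into memory as a numpy array."""
    with tb.open_file(path, mode="r") as h5file:
        return h5file.root.x[start:stop, :]


# Small demo sizes (hypothetical); the same loop applies to ndim = 60000
demo_path = os.path.join(tempfile.mkdtemp(), "blocks.h5")
fill_in_blocks(demo_path, ndim=1000, block_rows=256)
block = read_block(demo_path, 256, 512)
print(block.shape)
```

Peak memory is set by block_rows, not by ndim, which is the whole point of going through a CArray.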

8

You could try the tables.CArray class, as it supports compression, but...

I think the question is more about numpy than pytables, because you are creating the array with numpy before storing it with pytables.

Done that way, you need a lot of RAM to execute np.zeros((ndim,ndim)), and this is probably where the exception "ValueError: array is too big." is raised.
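
For a sense of scale, a quick back-of-the-envelope calculation, assuming the default float dtype is 8-byte float64 (which is what dtype=float means in numpy):

```python
ndim = 60000
itemsize = 8  # bytes per float64 element

nbytes = ndim * ndim * itemsize
print(f"{nbytes / 1e9:.1f} GB")  # 28.8 GB
```

So the dense array alone needs roughly 28.8 GB before pytables is even involved.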

If the matrix/array is not dense, then you could use one of the sparse matrix representations available in scipy: http://docs.scipy.org/doc/scipy/reference/sparse.html
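
A minimal sketch of the sparse route, assuming only a handful of entries are nonzero. The choice of lil_matrix for filling and CSR for later use is my suggestion, and the two values are made up for the example:

```python
from scipy import sparse

ndim = 60000
# lil_matrix is convenient to fill element by element;
# only the entries you assign take up memory
m = sparse.lil_matrix((ndim, ndim), dtype=float)
m[0, 0] = 1.0
m[123, 45678] = 2.5

# Convert to CSR, which is better suited to arithmetic and row slicing
csr = m.tocsr()
print(csr.shape, csr.nnz)
```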

Another solution is to access your array in chunks, if you don't need the whole array at once. Check out this thread: Very large matrices using Python and NumPy
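
As a rough illustration of chunked access, the sketch below computes column sums over an on-disk array while reading only a block of rows at a time. It uses PyTables for the storage, which is just one option; the function name column_sums, the node name "x", and the demo sizes are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables as tb


def column_sums(path, node_name, block_rows):
    """Sum each column of a 2-D on-disk array, reading block_rows rows at a time."""
    with tb.open_file(path, mode="r") as h5file:
        arr = h5file.get_node("/", node_name)
        nrows, ncols = arr.shape
        total = np.zeros(ncols)
        for start in range(0, nrows, block_rows):
            stop = min(start + block_rows, nrows)
            # Slicing the node returns a plain numpy array for just these rows
            total += arr[start:stop, :].sum(axis=0)
        return total


# Build a small demo file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "chunks.h5")
with tb.open_file(path, mode="w") as h5file:
    x = h5file.create_carray(h5file.root, "x", tb.Float64Atom(), shape=(500, 40))
    x[:, :] = np.ones((500, 40))

sums = column_sums(path, "x", block_rows=128)
print(sums[:5])
```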
