How can I put a numpy multidimensional array in an HDF5 file using PyTables?

From what I can tell I can't put an array field in a pytables table.

I also need to store some info about this array and be able to do mathematical computations on it.

Any suggestions?

  • Honestly, if you're storing a lot of just straight up ND arrays, you're better off with h5py instead of pytables. It's as simple as f.create_dataset('name', data=x) where x is your numpy array and f is the open hdf file. Doing the same thing in pytables is possible, but considerably more difficult. Commented Jan 12, 2012 at 22:16
  • Joe, +1. I was about to post an almost identical comment. Commented Jan 12, 2012 at 22:21
  • I thought of that, but pytables has some features (tables.expr) to do calculations directly on the arrays. Can I have that with h5py? Commented Jan 12, 2012 at 22:22
  • @scripts - Not in the compressed, accelerated way that pytables does. (Or at least not that I know of, anyway.) pytables will also give you lots of nice querying abilities. h5py is better suited to straight-up storage and slicing of on-disk arrays (and is more pythonic, i.m.o., too). Not to plug my own answer too much, but my thoughts on the tradeoff between the two are here: stackoverflow.com/questions/7883646/… Commented Jan 12, 2012 at 22:34
  • Thanks for the info, Joe Kington. For my case pytables is better suited because of the powerful querying techniques. Commented Jan 12, 2012 at 22:43
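
The on-disk computation raised in the comments above (tables.expr) might look roughly like the sketch below. It assumes PyTables 3.x; the scratch file path, the node names 'a', 'b' and 'out', and the expression itself are made up for illustration.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical scratch location so the sketch does not touch a real file
path = os.path.join(tempfile.mkdtemp(), 'expr_demo.hdf')

x = np.random.random((20, 20, 20))
y = np.random.random((20, 20, 20))

with tables.open_file(path, 'w') as f:
    # Two on-disk input arrays and one on-disk output array (illustrative names)
    a = f.create_carray(f.root, 'a', obj=x)
    b = f.create_carray(f.root, 'b', obj=y)
    out = f.create_carray(f.root, 'out',
                          atom=tables.Float64Atom(), shape=x.shape)

    # Evaluate 2*a + b block by block and write the result into 'out'
    expr = tables.Expr('2 * a + b', uservars={'a': a, 'b': b})
    expr.set_output(out)
    expr.eval()

    # Pull the result back into memory only to inspect it
    result = out[:]
```

Passing uservars explicitly avoids relying on the variable names being picked up from the surrounding scope.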

1 Answer

There may be a simpler way, but this is how you'd go about doing it, as far as I know:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array...
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
ds = f.create_carray(f.root, 'somename', atom, x.shape)
ds[:] = x
f.close()
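
To check what was written, the array can be reopened and sliced. The following is a minimal sketch assuming PyTables 3.x (hence the create_carray spelling); the scratch path and the smaller array shape are placeholders chosen for the example.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical scratch path so the sketch does not overwrite a real file
path = os.path.join(tempfile.mkdtemp(), 'test.hdf')

x = np.random.random((20, 20, 20))

# Write the array the same way as in the example above
with tables.open_file(path, 'w') as f:
    atom = tables.Atom.from_dtype(x.dtype)
    ds = f.create_carray(f.root, 'somename', atom, x.shape)
    ds[:] = x

# Reopen read-only and access the node by name
with tables.open_file(path, 'r') as f:
    ds = f.root.somename
    stored_shape = ds.shape
    slab = ds[0, :, :]      # slicing returns a NumPy array for just this slab
    everything = ds.read()  # the whole array, loaded into memory
```

The with blocks close the file even if an error occurs, which replaces the explicit f.close() call.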

If you want to specify the compression to use, have a look at tables.Filters. E.g.

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array with level 5 BLOSC compression...
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
filters = tables.Filters(complib='blosc', complevel=5)
ds = f.create_carray(f.root, 'somename', atom, x.shape, filters=filters)
ds[:] = x
f.close()

There's probably a simpler way for a lot of this... I haven't used pytables for anything other than table-like data in a long while.

Note: as of PyTables 3.0, f.createCArray was renamed to f.create_carray. It can also accept the array directly, without specifying the atom:

f.create_carray('/', 'somename', obj=x, filters=filters)
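
The question also asks about storing some information alongside the array. One possible approach, sketched here under the assumption of PyTables 3.x, is to attach it as attributes on the array node. The attribute names (units, description), their values, the scratch path, and the choice of zlib are all illustrative.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical scratch path for the example
path = os.path.join(tempfile.mkdtemp(), 'test_attrs.hdf')

x = np.random.random((20, 20, 20))

with tables.open_file(path, 'w') as f:
    filters = tables.Filters(complib='zlib', complevel=5)
    # Pass the array directly; atom and shape are inferred from obj
    ds = f.create_carray('/', 'somename', obj=x, filters=filters)
    # Illustrative metadata stored as HDF5 attributes on the node
    ds.attrs.units = 'metres'
    ds.attrs.description = 'random test volume'

with tables.open_file(path, 'r') as f:
    ds = f.root.somename
    units = ds.attrs.units
    description = ds.attrs.description
    data = ds.read()
```

Attributes suit small pieces of metadata; larger auxiliary data would more naturally go in its own array or table.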
2 Comments

Note that this can now be done much more straightforwardly using the create_array method on file objects, as described in the section 'Creating new array objects' at pytables.github.io/usersguide/tutorials.html
AttributeError: 'File' object has no attribute 'createCArray'
