Python: how to store a numpy multidimensional array in PyTables?

Question

How can I put a numpy multidimensional array in an HDF5 file using PyTables?

From what I can tell I can't put an array field in a pytables table.

I also need to store some info about this array and be able to do mathematical computations on it.

Any suggestions?

Honestly, if you're storing a lot of just straight up ND arrays, you're better off with h5py instead of pytables. It's as simple as f.create_dataset('name', data=x) where x is your numpy array and f is the open hdf file. Doing the same thing in pytables is possible, but considerably more difficult.
– Joe Kington, Commented Jan 12, 2012 at 22:16
I thought of that, but pytables has some features (tables.Expr) to do calculations directly on the arrays. Can I have that with h5py?
– scripts, Commented Jan 12, 2012 at 22:22
@scripts - Not in the compressed, accelerated way that pytables does. (Or at least not that I know of, anyway.) pytables will also give you lots of nice querying abilities. h5py is better suited to straight-up storage and slicing of on-disk arrays (and is more pythonic, i.m.o., too). Not to plug my own answer too much, but my thoughts on the tradeoff between the two are here: stackoverflow.com/questions/7883646/…
– Joe Kington, Commented Jan 12, 2012 at 22:34
Thanks for the info, Joe Kington. For my case pytables is better suited because of its powerful querying techniques.
– scripts, Commented Jan 12, 2012 at 22:43

Suyog Jadhav · Accepted Answer · 2019-07-03 11:45:40Z

There may be a simpler way, but this is how you'd go about doing it, as far as I know:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array...
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
ds = f.create_carray(f.root, 'somename', atom, x.shape)
ds[:] = x
f.close()

If you want to specify the compression to use, have a look at tables.Filters. For example:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array with level 5 BLOSC compression...
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
filters = tables.Filters(complib='blosc', complevel=5)
ds = f.create_carray(f.root, 'somename', atom, x.shape, filters=filters)
ds[:] = x
f.close()
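To check that the data survived the round trip, something like the following sketch may help. It writes a smaller array with the PyTables 3 method name create_carray, then reopens the file and reads one slice back. The file name roundtrip.hdf, the temporary directory, and the array size are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np
import tables

# Smaller array than in the answer, just to keep the sketch quick
x = np.random.random((20, 20, 20))

# Hypothetical scratch location for the file
path = os.path.join(tempfile.mkdtemp(), "roundtrip.hdf")

# Write the array as a compressed, chunked CArray
with tables.open_file(path, "w") as f:
    atom = tables.Atom.from_dtype(x.dtype)
    filters = tables.Filters(complib="blosc", complevel=5)
    ds = f.create_carray(f.root, "somename", atom, x.shape, filters=filters)
    ds[:] = x

# Reopen read-only and pull back a slice instead of the whole array
with tables.open_file(path, "r") as f:
    ds = f.root.somename
    block = ds[0:5, :, 10]              # NumPy-style slicing on the on-disk node
    complevel = ds.filters.complevel    # filters are stored with the node
```

The with blocks close the file even if an exception is raised, which replaces the explicit f.close() calls.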

There's probably a simpler way for a lot of this... I haven't used pytables for anything other than table-like data in a long while.

Note: in PyTables 3.0, f.createCArray was renamed to f.create_carray; recent releases raise AttributeError for the old name. create_carray can also accept the array directly, without specifying the atom:

f.create_carray('/', 'somename', obj=x, filters=filters)
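The question also asks about storing some info alongside the array and doing computations on it. A minimal sketch of one way to do that, assuming PyTables 3.x: metadata goes on the node's attrs, and tables.Expr (mentioned in the comments) evaluates an expression over the on-disk array. The attribute names units and description, the file name, and the expression are hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np
import tables

x = np.random.random((20, 20, 20))

# Hypothetical scratch location for the file
path = os.path.join(tempfile.mkdtemp(), "with_attrs.hdf")

with tables.open_file(path, "w") as f:
    filters = tables.Filters(complib="blosc", complevel=5)
    # obj=x lets PyTables infer the atom and shape from the array
    ds = f.create_carray("/", "somename", obj=x, filters=filters)
    # Hypothetical metadata stored as HDF5 attributes on the node
    ds.attrs.units = "arbitrary"
    ds.attrs.description = "random test cube"

with tables.open_file(path, "r") as f:
    ds = f.root.somename
    units = ds.attrs.units
    restored = ds[:]
    # Evaluate an expression with the on-disk array as the operand "a"
    expr = tables.Expr("2 * a + 1", uservars={"a": ds})
    result = expr.eval()   # returns a NumPy array when no output is set
```

For results too large for memory, Expr also has a set_output method that directs the result into another on-disk array instead of returning it.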

edited Jul 3, 2019 at 11:45 by Suyog Jadhav

answered Jan 12, 2012 at 22:45 by Joe Kington

2 Comments

Ben Allison Over a year ago

Note that this can now be done much more straightforwardly using the create_array method on file objects, as described in the section 'Creating new array objects' at pytables.github.io/usersguide/tutorials.html
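A rough sketch of the create_array route mentioned in this comment, assuming PyTables 3.x. The file name, node name, and title are placeholders chosen for illustration.

```python
import os
import tempfile

import numpy as np
import tables

x = np.random.random((10, 10, 10))

# Hypothetical scratch location for the file
path = os.path.join(tempfile.mkdtemp(), "plain_array.hdf")

# create_array stores the NumPy array in a single call
with tables.open_file(path, "w") as f:
    f.create_array(f.root, "somename", obj=x, title="example array")

# Read it back to confirm the contents
with tables.open_file(path, "r") as f:
    node = f.root.somename
    restored = node.read()
    title = node.title
```

As far as I know, a plain Array node is not chunked and does not take a filters argument, so create_carray is still the one to use when compression is needed.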

Nico Schlömer Over a year ago

AttributeError: 'File' object has no attribute 'createCArray'
