Storing a list of strings to a HDF5 Dataset from Python using VL format

Question

I expected the following code to work, but it doesn't.

import h5py
import numpy as np

with h5py.File('file.hdf5','w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'])
    hf.create_dataset('feature names', data=feature_names, dtype=dt)

I get the error message TypeError: No conversion path for dtype: dtype('<U1'). The following code does work, but using a for loop to copy the data seems a bit clunky to me. Is there a more straightforward way to do this? I would prefer to be able to pass the sequence of strings directly into the create_dataset function.

import h5py
import numpy as np

with h5py.File('file.hdf5','w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'])
    ds = hf.create_dataset('feature names', (len(feature_names),), dtype=dt)

    for i in range(len(feature_names)):
        ds[i] = feature_names[i]

Note: My question follows from this answer to Storing a list of strings to a HDF5 Dataset from Python, but I don't consider it a duplicate of that question.

Define "straightforward." Your loop that works is about as "straightforward" as it gets.
– Robert Harvey, Commented Mar 21, 2019 at 14:22
@RobertHarvey I was hoping that there was a Python type that I could use for my sequence/list/vector of variable-length strings, that could directly be used by h5py.
– mhwombat, Commented Mar 21, 2019 at 14:41
Does ds[:] = feature_names work? Or data=feature_names.astype(object)?
– hpaulj, Commented Mar 21, 2019 at 16:07
@hpaulj ds[:] = feature_names works, but your second option doesn't. If you want to turn that into an answer, I'll vote it up. Also, I'll accept it unless someone comes up with a way to pass the list into the create_dataset function.
– mhwombat, Commented Mar 21, 2019 at 16:45
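
For reference, the slice assignment suggested in the comments can be sketched as follows. This is only an illustration of that suggestion: the temporary directory and the stored_shape / stored_vlen variables are additions for the sake of a self-contained example, not part of the original question.

```python
import os
import tempfile

import h5py
import numpy as np

feature_names = np.array(['a', 'b', 'c'])

# Write into a temporary directory so the sketch leaves no file behind
# (an assumption for illustration; the question writes 'file.hdf5' directly).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.hdf5')
    with h5py.File(path, 'w') as hf:
        dt = h5py.special_dtype(vlen=str)
        ds = hf.create_dataset('feature names', (len(feature_names),), dtype=dt)
        # A single slice assignment replaces the element-by-element loop.
        ds[:] = feature_names
        # Record what was created while the file is still open.
        stored_shape = ds.shape
        stored_vlen = h5py.check_dtype(vlen=ds.dtype)
```

The dataset is still created empty first, so this shortens the loop but does not pass the strings into create_dataset itself.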

teegaar · Accepted Answer · 2019-07-03 13:23:49Z

10

You almost had it. The missing detail was to pass dtype to np.array:

import h5py
import numpy as np

with h5py.File('file.hdf5','w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'], dtype=dt)
    hf.create_dataset('feature names', data=feature_names)

PS: This looks like a bug to me: create_dataset ignores the given dtype and doesn't apply it to the given data.
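
To check that the strings really end up in the file, one can write them with the approach above and read them back. The following is a minimal sketch, assuming a temporary file and hypothetical example strings of differing lengths. The decode step is there because, as far as I know, newer h5py releases return variable-length strings as UTF-8 bytes, while older ones return str.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example strings of differing lengths (not from the question).
names = ['alpha', 'b', 'gamma']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.hdf5')

    # Write: give np.array the variable-length string dtype up front.
    with h5py.File(path, 'w') as hf:
        dt = h5py.special_dtype(vlen=str)
        hf.create_dataset('feature names', data=np.array(names, dtype=dt))

    # Read the whole dataset back into memory before the file is closed.
    with h5py.File(path, 'r') as hf:
        raw = hf['feature names'][:]

# Normalise to str whether the elements came back as bytes or as str.
read_back = [v.decode('utf-8') if isinstance(v, bytes) else v for v in raw]
```

Here a plain Python list is converted with np.array(names, dtype=dt), so the sequence of strings goes into create_dataset in a single call.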

