
I expected the following code to work, but it doesn't.

import h5py
import numpy as np

with h5py.File('file.hdf5','w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'])
    hf.create_dataset('feature names', data=feature_names, dtype=dt)

I get the error message TypeError: No conversion path for dtype: dtype('<U1'). The following code does work, but using a for loop to copy the data seems a bit clunky to me. Is there a more straightforward way to do this? I would prefer to be able to pass the sequence of strings directly into the create_dataset function.

import h5py
import numpy as np

with h5py.File('file.hdf5','w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'])
    ds = hf.create_dataset('feature names', (len(feature_names),), dtype=dt)

    for i in range(len(feature_names)):
        ds[i] = feature_names[i]

Note: My question follows from this answer to Storing a list of strings to a HDF5 Dataset from Python, but I don't consider it a duplicate of that question.
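The TypeError comes from the dtype NumPy infers for the list. Without a dtype argument, np.array picks the fixed-width Unicode type '<U1', which h5py has no conversion path for. A quick check compares that array with one built using the vlen-str dtype, and no file is involved. It assumes h5py 2.10 or later, where h5py.check_vlen_dtype is available:

```python
import h5py
import numpy as np

# What NumPy infers when no dtype is given: fixed-width Unicode ('<U1'),
# which h5py does not convert to an HDF5 string type.
default_arr = np.array(['a', 'b', 'c'])
print(default_arr.dtype)  # <U1

# The variable-length string dtype from the question: an object dtype
# carrying h5py metadata that marks the elements as Python str.
dt = h5py.special_dtype(vlen=str)
tagged_arr = np.array(['a', 'b', 'c'], dtype=dt)
print(tagged_arr.dtype)  # object

# check_vlen_dtype reads that metadata back (None means "not a vlen type").
print(h5py.check_vlen_dtype(tagged_arr.dtype))   # <class 'str'>
print(h5py.check_vlen_dtype(default_arr.dtype))  # None
```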

  • Define "straightforward." Your loop that works is about as "straightforward" as it gets. Commented Mar 21, 2019 at 14:22
  • @RobertHarvey I was hoping there was a Python type I could use for my sequence/list/vector of variable-length strings that h5py could use directly. Commented Mar 21, 2019 at 14:41
  • Does ds[:] = feature_names work? Or data=feature_names.astype(object)? Commented Mar 21, 2019 at 16:07
  • @hpaulj ds[:] = feature_names works, but your second option doesn't. If you want to turn that into an answer, I'll vote it up. Also, I'll accept it unless someone comes up with a way to pass the list into the create_dataset function. Commented Mar 21, 2019 at 16:45
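For reference, the slice assignment confirmed in the comments replaces the per-element loop with a single write. This is a minimal sketch that assumes h5py 2.10+ and writes a throwaway file in a temporary directory. The path is only for illustration:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'file.hdf5')

feature_names = np.array(['a', 'b', 'c'])  # still the default '<U1' array

with h5py.File(path, 'w') as hf:
    dt = h5py.special_dtype(vlen=str)
    ds = hf.create_dataset('feature names', (len(feature_names),), dtype=dt)
    # One slice assignment instead of the per-element loop; h5py converts
    # the values to the dataset's vlen-str dtype on assignment.
    ds[:] = feature_names

with h5py.File(path, 'r') as hf:
    # h5py 3.x returns vlen strings as bytes, older versions as str.
    stored = [s.decode('utf-8') if isinstance(s, bytes) else s
              for s in hf['feature names'][:]]

print(stored)  # ['a', 'b', 'c']
```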

1 Answer


You almost had it; the missing detail was to pass the dtype to np.array:

import h5py
import numpy as np

with h5py.File('file.hdf5', 'w') as hf:
    dt = h5py.special_dtype(vlen=str)
    feature_names = np.array(['a', 'b', 'c'], dtype=dt)
    hf.create_dataset('feature names', data=feature_names)
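This sketch writes the file the answer's way and then inspects what was stored. It assumes h5py 2.10 or later for h5py.check_string_dtype, and the temporary path is illustrative only:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'file.hdf5')  # hypothetical location

with h5py.File(path, 'w') as hf:
    dt = h5py.special_dtype(vlen=str)
    # Building the array with the vlen-str dtype up front is the key step.
    feature_names = np.array(['a', 'b', 'c'], dtype=dt)
    hf.create_dataset('feature names', data=feature_names)

with h5py.File(path, 'r') as hf:
    ds = hf['feature names']
    # Describes the stored HDF5 string type: UTF-8, variable length (length=None).
    info = h5py.check_string_dtype(ds.dtype)
    shape = ds.shape

print(info.encoding, info.length, shape)  # utf-8 None (3,)
```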

PS: It looks like a bug to me: create_dataset ignores the given dtype and doesn't apply it to the given data.
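On the PS: as far as I can tell from h5py 3.x, create_dataset does use the given dtype when it has to build the array itself (for example from a plain list). It leaves an existing NumPy array's dtype alone, so a '<U1' array still fails. If that holds for your version, passing the list directly is what the question originally asked for. This sketch assumes h5py 3.x (for .asstr()) and uses h5py.string_dtype(), the newer spelling of special_dtype(vlen=str). The path is illustrative:

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), 'file.hdf5')  # hypothetical location

# A plain list, not a NumPy array: in h5py 3.x (as far as I know) the
# array is then built with the dtype passed to create_dataset.
feature_names = ['a', 'b', 'c']

with h5py.File(path, 'w') as hf:
    # Newer spelling of h5py.special_dtype(vlen=str): variable-length UTF-8.
    dt = h5py.string_dtype()
    hf.create_dataset('feature names', data=feature_names, dtype=dt)

with h5py.File(path, 'r') as hf:
    # .asstr() (h5py 3.x) decodes the stored bytes back to str.
    stored = list(hf['feature names'].asstr()[:])

print(stored)  # ['a', 'b', 'c']
```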
