23

I downloaded a dataset which is stored in .h5 files. I need to keep only certain columns and to be able to manipulate the data in it.

To do this, I tried to load it in a pandas dataframe. I've tried to use:

pd.read_hdf(path)

But I get: No dataset in HDF5 file.

I've found answers on SO (read HDF5 file to pandas DataFrame with conditions) but I don't need conditions, and the answer adds conditions about how the file was written but I'm not the creator of the file so I can't do anything about that.

I've also tried using h5py:

df = h5py.File(path)

But this is not easily manipulable and I can't seem to get the columns out of it (only the names of the columns using df.keys()) Any idea on how to do this ?

1

2 Answers 2

17

Easiest way to read them into Pandas is to convert into h5py, then np.array, and then into DataFrame. It would look something like:

df = pd.DataFrame(np.array(h5py.File(path)['variable_1']))
Sign up to request clarification or add additional context in comments.

2 Comments

What does 'variable_1' represent in this case? - I am having the same issue opening .h5 files
variable_1 would be a single dataset. A file in h5py is supposed to be a dictionary of labeled datasets. So "h5py.File(path)" returns a dictionary and then accessing ["variable_1"] on that returns the value for the key "variable_1" which is a single dataset.
6

Pandas HDF support needs the HDF file to be formated very specifically. You can see https://stackoverflow.com/a/33644128/4128030 for more info.

1 Comment

Yes. More about this here as well.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.