1

I am trying to read an HDF file, but no groups show up. I have tried a couple of different methods using tables and h5py, but neither displays the groups in the file. I checked, and the file is reported as 'Hierarchical Data Format (version 5) data' (see Update). The file information is here for reference.

Example data can be found here

import h5py
import tables as tb

hdffile = "TRMM_LIS_SC.04.1_2010.260.73132"

Using h5py:

f = h5py.File(hdffile,'w')
print(f)

Outputs:

< HDF5 file "TRMM_LIS_SC.04.1_2010.260.73132" (mode r+) >
[]

Using tables:

fi=tb.openFile(hdffile,'r')
print(fi)

Outputs:

TRMM_LIS_SC.04.1_2010.260.73132 (File) ''
Last modif.: 'Wed Aug 10 18:41:44 2016'
Object Tree:
/ (RootGroup) ''

Closing remaining open files:TRMM_LIS_SC.04.1_2010.260.73132...done

UPDATE

h5py.File(hdffile,'w') overwrote the file and emptied it.

Now my question is: how do I read an HDF version 4 file into Python, since neither h5py nor tables works?

1
  • What @MaxU says... This will also help you: docs.python.org/3/library/functions.html#open. See the table there: to read a file the mode is 'r', to write it is 'w', and to append it is 'a'. Good luck! Commented Aug 10, 2016 at 20:01
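To illustrate the mode table mentioned in the comment above, here is a minimal sketch using only the built-in open() on a throwaway temporary file. The helper name size_after_open is made up for this illustration. h5py.File documents 'w' as "create file, truncate if exists", so the same caution applies there, but this sketch only demonstrates the built-in behaviour.

```python
import os
import tempfile


def size_after_open(mode):
    """Create a 9-byte scratch file, reopen it with `mode`, return its size."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(b"some data")  # 9 bytes of dummy content
        # Merely opening the file is enough; nothing is read or written here.
        with open(path, mode):
            pass
        return os.path.getsize(path)
    finally:
        os.remove(path)


for mode in ("rb", "ab", "wb"):
    # "rb" and "ab" leave the 9 bytes alone, "wb" truncates the file to 0 bytes
    print(mode, size_after_open(mode))
```

The point is that the truncation happens at open time, before any write call, which matches what happened to the file in the question.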

3 Answers

4

How big is the file? I think that doing h5py.File(hdffile,'w') overwrites it, so it's empty. Use h5py.File(hdffile,'r') to read.

I don't have enough karma to reply to @Luke H's answer, but reading it into pandas might not be a good idea. Pandas' HDF5 support uses PyTables, which is an "opinionated" way of using HDF5: it stores extra metadata (e.g. the index). So I would only use PyTables to read the file if it was made with PyTables.
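Before opening the file with any library, it can help to check what it actually is. The sketch below opens the file read-only in binary mode and compares the first bytes against the HDF5 and HDF4 format signatures. The function name guess_hdf_version is made up for this illustration, and it only looks at offset 0; an HDF5 file with a user block keeps its signature at a later offset and would be reported as "unknown" here.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 superblock signature
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"   # 4-byte HDF4 magic number


def guess_hdf_version(path):
    """Return 'HDF5', 'HDF4', 'empty' or 'unknown' based on the leading bytes."""
    # "rb" never modifies the file, unlike "w".
    with open(path, "rb") as fh:
        head = fh.read(8)
    if not head:
        return "empty"
    if head.startswith(HDF5_SIGNATURE):
        return "HDF5"
    if head.startswith(HDF4_SIGNATURE):
        return "HDF4"
    return "unknown"
```

Calling guess_hdf_version(hdffile) on the file from the question should tell you whether it is empty (as after the accidental 'w'), HDF4, or HDF5, and therefore whether h5py can be expected to open it at all.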


1 Comment

Thanks! You are right that the 'w' emptied the file. That is also why it was reported as an HDF (version 5) file: I re-downloaded the file and it is actually in version 4. Unfortunately, h5py.File now fails because the file signature is not found.
1

UPDATE:

I would recommend first converting your HDF version 4 files to HDF5 (.h5) files, as most modern libraries and modules work with HDF version 5.

OLD answer:

Try it this way:

import pandas as pd
# mode="r" opens read-only, so an existing file is never modified
store = pd.HDFStore(filename, mode="r")
print(store)

This should print details about the HDF file, including stored keys, lengths of stored DFs, etc.

Demo:

In [18]: fn = r'C:\Temp\a.h5'

In [19]: store = pd.HDFStore(fn)

In [20]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: C:\Temp\a.h5
/df_dc               frame_table  (typ->appendable,nrows->10,ncols->3,indexers->[index],dc->[a,b,c])
/df_no_dc            frame_table  (typ->appendable,nrows->10,ncols->3,indexers->[index])

Now you can read DataFrames using the keys from the output above:

In [21]: df = store.select('df_dc')

In [22]: df
Out[22]:
    a   b   c
0  92  80  86
1  27  49  62
2  55  64  60
3  31  66   3
4  37  75  81
5  49  69  87
6  59   0  87
7  69  91  39
8  93  75  31
9  21  15   7

2 Comments

So from @user357269 I found out that the file was overwritten and that it is in HDF version 4, and the pandas HDF tools only work with version 5. Thanks though.
I don't have HDF4 or the converter installed, so I will try that. Thanks.
0

Try using pandas:

import pandas as pd
f = pd.read_hdf("C:/path/to/file")  # the path must be a quoted string

See Pandas HDF documentation here.

This should read an HDF5 file written by pandas (PyTables) as a DataFrame that you can then manipulate.

3 Comments

I tried using the pd.read_hdf from pandas but it requires a second argument for a group identifier that I haven't been able to find.
That is because there is more than one "pandas object" in the file. You'll need to specify which one (via the "key" argument). I'm sorry I can't help you much more than that.
So from @user357269 I found out that the file was overwritten and that it is in HDF version 4, and from what I gather pandas.read_hdf only works with version 5.
