Previously, I saved several columns of a dataset into one HDF file. The procedure can be outlined as follows:
import pandas as pd
from pandas import HDFStore

hdf = HDFStore("FILE.h5")
feature = ['var1', 'var2']
# Note: the original dataframes are huge, so a small fake dataframe is generated as an example.
for k in range(len(feature)):
    df = pd.DataFrame({'A': ['1', '2', '3', '4'], 'B': [4, 5, 6, 7]})
    hdf.put(feature[k], df, format='table', encoding='utf-8')
hdf.close()
Then, I could read the file 'FILE.h5' simply with
df = pd.read_hdf("./FILE.h5", 'var1', encoding='utf-8')
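For reference, the whole write-and-read round trip works when the file is both written and read under the same Python 3 environment. A minimal, self-contained sketch of the steps above (the temporary-file path is just for illustration):

```python
import os
import tempfile

import pandas as pd

features = ["var1", "var2"]
path = os.path.join(tempfile.mkdtemp(), "FILE.h5")

# Write one small example frame per feature key.
with pd.HDFStore(path) as hdf:
    for key in features:
        df = pd.DataFrame({"A": ["1", "2", "3", "4"], "B": [4, 5, 6, 7]})
        hdf.put(key, df, format="table", encoding="utf-8")

# Read a single key back by name.
out = pd.read_hdf(path, "var1")
# out.shape is (4, 2), with columns A and B
```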
It always worked well until I upgraded my Python environment from 2.7 to 3.7.
Now, with Python 3.7 and pandas 0.24.2, the HDF file cannot be read correctly. The error looks like this:
df = pd.read_hdf("./FILE.h5", 'var1', encoding='utf-8')
>>> ...
~/anaconda3/lib/python3.7/codecs.py in getdecoder(encoding)
961
962 """
--> 963 return lookup(encoding).decode
964
965 def getincrementalencoder(encoding):
TypeError: lookup() argument must be str, not numpy.bytes_
PS: I have read a GitHub issue that described a similar situation, but it did not fix my problem. I then turned to the h5py package to deal with HDF5-format files, but it was not as convenient as pandas.
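For what it is worth, the TypeError suggests that an `encoding` attribute stored in the file by the Python 2 writer is a byte string, which `codecs.lookup()` in Python 3 rejects. One possible workaround is to decode stale byte-valued attributes in place with PyTables before reading. This is only a sketch; the attribute layout of pandas' HDF files is an assumption on my part, so back up the file before modifying it:

```python
import tables


def fix_byte_encodings(path):
    """Decode byte-valued attributes (possibly left behind by a Python 2
    writer) into str so that Python 3 pandas can look up the codec.
    Assumption: the stale attributes are plain bytes holding ASCII text."""
    with tables.open_file(path, mode="r+") as f:
        for node in f.walk_nodes():
            attrs = node._v_attrs
            for name in list(attrs._v_attrnames):
                value = attrs[name]
                if isinstance(value, bytes):
                    # Rewrite the attribute as a proper Python 3 str.
                    attrs[name] = value.decode("utf-8")
```

After running `fix_byte_encodings("FILE.h5")`, `pd.read_hdf("./FILE.h5", 'var1')` may succeed; if it still fails, the remaining offending attribute would need to be located by inspecting the file by hand.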