
I have various pd.DataFrames that I'd like to write to an HDF store by passing them to a function. Is there a way to programmatically generate key names based on the variable name of any given DataFrame?

from sklearn import datasets
import pandas as pd
df1 = pd.DataFrame(datasets.load_iris().data)
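# load_boston was removed in scikit-learn 1.2; load_wine is used here instead.
df2 = pd.DataFrame(datasets.load_wine().data)

def save_to_hdf(df1):
    # The key 'df1' is hard-coded; the goal is to derive it from the variable name.
    with pd.HDFStore('test.h5') as store:
        store.put('df1', df1)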

save_to_hdf(df1)
  • Why not pass both the name ('df1') and the actual DataFrame? It's a tiny bit more work, but it makes things much clearer. Or use a dict, like {'df1': df1, 'df2': df2}, and iterate over its items; that is also more flexible. Commented Feb 17, 2018 at 1:54
  • If you really want to be that generic, you could probably use globals()['df1'] to get the relevant DataFrame, but I wouldn't recommend it. Commented Feb 17, 2018 at 1:55
  • @Evert I'd like to keep it more general, so I don't have to maintain a list of variables and names. Maybe I'm missing something... Commented Feb 17, 2018 at 2:09
  • The general case is given by the accepted answer, which echoes my first comment. Commented Feb 17, 2018 at 5:15
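Why the commenters discourage name-based keys: a Python object does not know which variable names refer to it. The closest you can get is scanning a namespace (for example the dict returned by globals()) for names bound to that exact object, and that scan can return several names or none at all. A minimal sketch, using a hypothetical helper `names_bound_to` and plain `object()` instances standing in for DataFrames:

```python
def names_bound_to(obj, namespace):
    """Return all names in `namespace` bound to exactly this object (identity, not equality)."""
    return sorted(name for name, value in namespace.items() if value is obj)


# Plain objects stand in for DataFrames; the lookup only compares identity.
df1 = object()
alias = df1      # a second name for the very same object
df2 = object()

# An explicit dict is used here; globals() would work the same way at module level.
namespace = {"df1": df1, "alias": alias, "df2": df2}
print(names_bound_to(df1, namespace))       # ['alias', 'df1']
print(names_bound_to(object(), namespace))  # []
```

So even in the simplest case there is no single "variable name" to use as a key, and inside a function the caller's local names are not in globals() at all.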

1 Answer


You should do it the way np.savez() does:

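import pandas as pd

def save_to_hdf(filename, **kwargs):
    # Open the file once and write each keyword argument under its keyword name.
    with pd.HDFStore(filename) as store:
        for name, df in kwargs.items():
            store.put(name, df)

# df1 and df2 are the DataFrames from the question.
save_to_hdf('test.h5', df1=df1, another_name=df2)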

This is more efficient: the file is opened only once, no matter how many DataFrames you write. You can also use key names that differ from the variable names.
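One caveat with a `**kwargs` signature: a keyword that matches the first parameter's name (`filename` in `save_to_hdf` above) raises "got multiple values for argument" instead of becoming a key. A minimal sketch, assuming Python 3.8+, marks that parameter positional-only with `/` so any keyword is accepted. To keep it runnable without PyTables, the hypothetical `save_frames` takes an already-open store and assigns with `store[name] = frame`, which works for both a pd.HDFStore (its `__setitem__` calls `put`) and a plain dict used here as a stand-in:

```python
def save_frames(store, /, **frames):
    """Write each keyword argument into `store`, using the keyword as the key.

    The `/` makes `store` positional-only, so a frame may itself be passed
    under the keyword `store` without a TypeError.
    """
    for name, frame in frames.items():
        store[name] = frame


def save_frames_no_slash(store, **frames):
    """Same loop without the positional-only marker, for comparison."""
    for name, frame in frames.items():
        store[name] = frame


target = {}
save_frames(target, store="frame A", iris="frame B")
print(sorted(target))  # ['iris', 'store']

try:
    save_frames_no_slash({}, store="frame A")
except TypeError as exc:
    print("TypeError:", exc)  # the keyword 'store' collides with the parameter
```

With a real file you would call it as `with pd.HDFStore('test.h5') as store: save_frames(store, df1=df1)`.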

You can avoid having to name the variables twice by using a dict:

# load_boston was removed in scikit-learn 1.2, so load_wine stands in for it here.
dfs = {
    'iris': pd.DataFrame(datasets.load_iris().data),
    'wine': pd.DataFrame(datasets.load_wine().data),
}
save_to_hdf('test.h5', **dfs)
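The dict approach also works in reverse when reading the file back. A minimal sketch with a hypothetical helper `load_frames`: pd.HDFStore.keys() reports keys as absolute paths such as '/df1', so the leading '/' is stripped to recover the names used when saving. A plain dict with HDF-style keys stands in for the store so the example runs without PyTables:

```python
def load_frames(store):
    """Read every object in `store` into a dict keyed by name, minus the leading '/'."""
    return {key.lstrip('/'): store[key] for key in store.keys()}


# Typical use with a real file (requires PyTables):
#   with pd.HDFStore('test.h5', mode='r') as store:
#       dfs = load_frames(store)

fake_store = {'/df1': 'first frame', '/another_name': 'second frame'}
print(load_frames(fake_store))  # {'df1': 'first frame', 'another_name': 'second frame'}
```

Keeping the data in one dict from the start means both saving and loading are a single call, with no list of variable names to maintain.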

3 Comments

@ConfusinglyCuriousTheThird: Programmatically creating names in files based on variables in Python is a very bad idea. Any Python programmer would be surprised to see such behavior, and you should drop the idea in favor of something more clear and explicit, like the above.
I agree. My question is, I suppose: is there a better alternative to maintaining a list of strings corresponding to variables of the same name?
@ConfusinglyCuriousTheThird: I've added to my answer to show how you can avoid duplicate names in your code. Just store the data all together from the beginning.
