
I have a large file I need to load into a dataframe. I will need to work on it for a while. Is there a way of keeping it loaded in memory, so that if my script fails, I will not need to load it again?

  • Maybe you can pickle it using to_pickle (see the sketch after these comments). Commented Jan 14, 2016 at 8:28
  • And maybe this helps. Commented Jan 14, 2016 at 9:16
  • Thanks! How about other data structures, like numpy matrices or objects? Commented Jan 20, 2016 at 7:37
  • numpy is easy: pd.DataFrame(numpyarray) Commented Jan 20, 2016 at 7:38
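
A minimal sketch of the caching approach suggested in the comments (the file names here are hypothetical): pickle the parsed data once, then reload the pickle on later runs, which is much faster than re-parsing the source file. numpy arrays can be cached the same way with save/load.

import os
import numpy as np
import pandas as pd

CACHE = 'big_frame.pkl'                  # hypothetical cache file
if os.path.exists(CACHE):
    df = pd.read_pickle(CACHE)           # fast reload after a crash
else:
    df = pd.read_csv('large_file.csv')   # hypothetical slow initial load
    df.to_pickle(CACHE)                  # cache it for the next run

arr = np.arange(10)
np.save('arr.npy', arr)                  # numpy equivalent: save ...
arr = np.load('arr.npy')                 # ... and load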

1 Answer


Here's a sketch of how one can keep variables in memory between runs.
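
One way to do this (assuming IPython is available, with my_analysis.py as a hypothetical script) is to load the data once in an interactive session and run the script with %run -i, which shares the session's namespace; if the script fails, the DataFrame is still loaded:

In [1]: import pandas as pd
In [2]: df = pd.read_csv('large_file.csv')  # expensive load happens once
In [3]: %run -i my_analysis.py              # -i lets the script see df
# if my_analysis.py raises, df survives; fix the script and re-run In [3]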

For persistent storage beyond RAM, I would recommend looking into HDF5. It's fast, simple, and allows queries when necessary (see the pandas HDF5 docs).

pandas supports read_hdf() and to_hdf(), analogous to the read_csv()/to_csv() methods, but significantly faster.

A simple illustration of storage and retrieval, including a query (adapted from the docs):

import pandas as pd

df = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df.to_hdf('store_tl.h5', 'table', append=True)           # append=True writes the 'table' format
pd.read_hdf('store_tl.h5', 'table', where=['index>2'])   # on-disk row query
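
Note that append=True implicitly writes the queryable 'table' format, and the HDF5 path requires the PyTables package (tables) to be installed. To query on regular columns rather than the index, a sketch would be to write with data_columns enabled (store_dc.h5 is a hypothetical file):

df.to_hdf('store_dc.h5', 'table', append=True, data_columns=True)
pd.read_hdf('store_dc.h5', 'table', where='A > 2')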
