
I have a large file I need to load into a dataframe. I will need to work on it for a while. Is there a way of keeping it loaded in memory, so that if my script fails, I will not need to load it again?

  • Maybe you can pickle it using to_pickle (see the sketch after these comments). Commented Jan 14, 2016 at 8:28
  • And maybe this helps. Commented Jan 14, 2016 at 9:16
  • Thanks! How about other data structures, like numpy matrices or objects? Commented Jan 20, 2016 at 7:37
  • numpy is easy: pd.DataFrame(numpyarray) Commented Jan 20, 2016 at 7:38
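
A minimal sketch of the caching approach suggested in the comments (the file names here are hypothetical): pickle the parsed data once, then reload the pickle on later runs, which is much faster than re-parsing the source file. numpy arrays can be cached the same way with save/load.

import os
import numpy as np
import pandas as pd

CACHE = 'big_frame.pkl'                  # hypothetical cache file
if os.path.exists(CACHE):
    df = pd.read_pickle(CACHE)           # fast reload after a crash
else:
    df = pd.read_csv('large_file.csv')   # hypothetical slow initial load
    df.to_pickle(CACHE)                  # cache it for the next run

arr = np.arange(10)
np.save('arr.npy', arr)                  # numpy equivalent: save ...
arr = np.load('arr.npy')                 # ... and load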

1 Answer


Here's a sketch of how one can keep variables in memory between runs.
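
One way to do this (assuming IPython is available, with my_analysis.py as a hypothetical script) is to load the data once in an interactive session and run the script with %run -i, which shares the session's namespace; if the script fails, the DataFrame is still loaded:

In [1]: import pandas as pd
In [2]: df = pd.read_csv('large_file.csv')  # expensive load happens once
In [3]: %run -i my_analysis.py              # -i lets the script see df
# if my_analysis.py raises, df survives; fix the script and re-run In [3]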

For persistent storage beyond RAM, I would recommend looking into HDF5. It's fast, simple, and allows queries when necessary (see the pandas HDF5 docs).

pandas supports read_hdf() and to_hdf(), analogous to the read_csv()/to_csv() methods, but significantly faster.

A simple illustration of storage and retrieval, including a query (adapted from the docs):

import pandas as pd

df = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df.to_hdf('store_tl.h5', 'table', append=True)           # append=True writes the 'table' format
pd.read_hdf('store_tl.h5', 'table', where=['index>2'])   # on-disk row query
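
Note that append=True implicitly writes the queryable 'table' format, and the HDF5 path requires the PyTables package (tables) to be installed. To query on regular columns rather than the index, a sketch would be to write with data_columns enabled (store_dc.h5 is a hypothetical file):

df.to_hdf('store_dc.h5', 'table', append=True, data_columns=True)
pd.read_hdf('store_dc.h5', 'table', where='A > 2')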
