I have a large dataset that I run experiments on. It takes 30 minutes to load the dataset from file into memory using a Python program. Then I run variations of an algorithm on the dataset. Each time I vary the algorithm, I have to load the dataset into memory again, which eats up another 30 minutes.

Is there any way to load the dataset into memory once and for all, and then, each time I run a variation of the algorithm, just use that pre-loaded dataset?

I know the question is a bit abstract; suggestions to improve the framing of the question are welcome. Thanks.

EDITS:

It's a text file containing graph data, around 6 GB. If I only load a portion of the dataset, it doesn't make for a very good graph. I do no computation while loading the dataset.

  • You could possibly try using a ram disk, or SSD. Not an answer to your question, sorry... Commented Dec 5, 2013 at 0:53
  • How are you loading your data set? Is it a .csv file, a database, or what? Do you perform computations during load or is it simply reading from disk for 30 minutes? Commented Dec 5, 2013 at 0:53
  • I would suggest that, until you have finalized your algorithm, you work with a much smaller sample of the full data (if viable). Commented Dec 5, 2013 at 0:58
  • What kind of file is it? How much data do you have? It seems incredible to me that if you're just reading from disk that it could take 30 minutes to load the data and not run you out of memory. Are you doing processing on the data as you read it? Commented Dec 5, 2013 at 0:59
  • @mgilson I have 32 GB of memory, and do a hash table lookup around 10-20 times for each line I pick up from the file before appending it to a python list. Commented Dec 5, 2013 at 1:43
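For context, the load loop described in that last comment might look roughly like this (the file path, field format, and `lookup` table are hypothetical; the real keys depend on the file format):

```python
def load_graph(path, lookup):
    """Sketch of the load described above: for each line, a handful of
    hash-table lookups, then append to a Python list. The real code does
    10-20 lookups per line; one per field is shown here."""
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            # resolve each field through the lookup table, falling back
            # to the raw token when it is not in the table
            resolved = [lookup.get(tok, tok) for tok in fields]
            rows.append(resolved)
    return rows
```

With 32 GB of RAM and a 6 GB file, the data fits in memory; the cost is purely the repeated loading, which is what the answers below try to avoid.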

3 Answers

You could write a very quick CLI which would load the data once, then repeatedly ask for a Python filename, which it would then eval() against the in-memory data...
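A minimal sketch of that idea, using `runpy.run_path` instead of `eval()` (since `eval()` only handles single expressions, not whole script files). `load_dataset` and the `result` variable convention are hypothetical placeholders:

```python
import runpy

def run_variant(script_path, data):
    """Execute a Python script with the pre-loaded dataset injected into
    its globals as `data`, and return the script's final globals."""
    return runpy.run_path(script_path, init_globals={"data": data})

# Sketch of the CLI loop (load_dataset is a placeholder for your loader):
#
#     data = load_dataset()          # the 30-minute load happens once, here
#     while True:
#         path = input("script to run (blank to quit): ").strip()
#         if not path:
#             break
#         print(run_variant(path, data).get("result"))
```

Each algorithm variant then lives in its own small script that reads `data` and leaves its output in `result`; only the script re-executes, never the load.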


You could use an environment such as Spyder, which is similar to Matlab. It even lets you see a list of all variables in the workspace at any time during algorithm execution.


One possible solution is to use Jupyter: load the dataset once and keep the Jupyter session running. Then modify your algorithm in a cell and rerun only that cell. You can operate on the loaded dataset in RAM as much as you want until you terminate the session.

