Large, sparse list of lists giving MemoryError when calling np.array(data)

Question

I have a large matrix of 0s and 1s that is mostly 0s. It is initially stored as a list of 25 thousand other lists, each of which is about 2000 ints long.

I am trying to put these into a numpy array, which is what another piece of my program takes. So I run training_data = np.array(data), but this raises a MemoryError.

Why is this happening? I'm assuming it is too much memory for the program to handle (which is surprising to me), but if so, is there a better way of doing this?

Katriel · Accepted Answer · 2012-03-18 16:37:39Z


A short (16-bit) integer takes two bytes to store. You want 25,000 lists, each with 2,000 integers; that gives

25000*2000*2/1000000 = 100 MB

This works fine on my computer (4GB RAM):

>>> import numpy as np
>>> x = np.zeros((25000,2000),dtype=int)

Are you able to instantiate the above matrix of zeros?
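As a rough check of the arithmetic above, here is a small sketch that prints the dense footprint for a few element types. One assumption to flag: on most 64-bit platforms dtype=int means a 64-bit integer, so the zeros matrix above is likely closer to 400 MB than 100 MB. Since the data is only 0s and 1s, np.uint8 (my choice, not something the question specifies) would need one byte per element.

```python
import numpy as np

rows, cols = 25000, 2000  # shape taken from the question

# Bytes needed for a dense array of this shape under a few dtypes.
for dtype in (np.int64, np.int16, np.uint8):
    size_mb = rows * cols * np.dtype(dtype).itemsize / 1e6
    print(f"{np.dtype(dtype).name:>6}: {size_mb:.0f} MB")

# A 0/1 matrix fits in one byte per element.
x = np.zeros((rows, cols), dtype=np.uint8)
footprint_mb = x.nbytes / 1e6  # total size of the array buffer in MB
```

Either way, the dense array itself is far smaller than 4 GB, so the array alone is unlikely to be what exhausts memory.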

Are you reading the file into a Python list of lists and then converting that to a numpy array? That's a bad idea; it will at least double the memory requirements. What is the file format of your data?
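One way to avoid holding the full list of lists and the array at the same time is to allocate the array first and copy rows in as they are produced. This is only a sketch: the Instance class and its row() method are hypothetical stand-ins for whatever produces your lists, and the row count is scaled down.

```python
import numpy as np


class Instance:
    """Hypothetical stand-in for the asker's class; returns a 0/1 list."""

    def __init__(self, seed, length):
        self.seed = seed
        self.length = length

    def row(self):
        # Mostly zeros, with a single 1 whose position depends on the seed.
        values = [0] * self.length
        values[self.seed % self.length] = 1
        return values


n_rows, n_cols = 1000, 2000  # scaled down; the question has 25000 rows
instances = (Instance(i, n_cols) for i in range(n_rows))  # generator, not a list

# Allocate the final array once, then copy each row in as it is produced,
# so only one Python list of length n_cols is alive at a time.
training_data = np.zeros((n_rows, n_cols), dtype=np.uint8)
for i, instance in enumerate(instances):
    training_data[i, :] = instance.row()
```

This only helps if the rows can be generated one at a time; if all the instances must stay in memory anyway, the saving is limited to the intermediate list of lists.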

For sparse matrices, scipy.sparse provides several alternative storage formats, which will be much more memory-efficient.
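A minimal sketch of the sparse route, assuming the code that consumes training_data can accept a scipy.sparse matrix (the question does not say whether it can). The toy data below is made up for illustration; the idea is to store only the coordinates of the 1s.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: three rows of 0/1 values, mostly zeros.
data = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
]

# Record only the coordinates of the nonzero entries, one row at a time.
row_idx, col_idx = [], []
for i, row in enumerate(data):
    for j, value in enumerate(row):
        if value:
            row_idx.append(i)
            col_idx.append(j)

ones = np.ones(len(row_idx), dtype=np.uint8)
shape = (len(data), len(data[0]))
matrix = sparse.csr_matrix((ones, (row_idx, col_idx)), shape=shape)

print(matrix.nnz)        # number of stored nonzero entries
print(matrix.toarray())  # dense view, only sensible for small matrices
```

Memory then scales with the number of 1s rather than with rows times columns, which is the point of using a sparse format for mostly-zero data.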

EDIT: responding to the OP's comment.

"I have 25000 instances of some other class, each of which returns a list of length about 2000. I want to put all of these lists returned into the np.array."

Well, you're somehow going over 8GB! To solve this, don't do all this manipulation in memory. Write the data to disk one instance at a time, then delete the instances and read the file back in with numpy.

First do

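import csv

# Python 3: text mode with newline="" (the Python 2 equivalent was "wb")
with open(..., "w", newline="") as f:
    writer = csv.writer(f)
    for instance in instances:
        writer.writerow(instance.data)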

This will write all your data into a large-ish CSV file. Then, you can just use np.loadtxt:

training_data = np.loadtxt(..., delimiter=",")  # same path as above
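For reference, here is a self-contained sketch of the same round trip using a temporary file. The rows are made-up placeholders for instance.data, and passing dtype=np.uint8 is my addition: np.loadtxt returns float64 by default, which takes eight bytes per element.

```python
import csv
import os
import tempfile

import numpy as np

# Hypothetical rows standing in for instance.data from the answer.
rows = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]

# Write rows one at a time to a temporary CSV file.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

# loadtxt returns float64 unless told otherwise; uint8 is 8x smaller.
training_data = np.loadtxt(path, delimiter=",", dtype=np.uint8)
os.remove(path)

print(training_data.dtype, training_data.shape)
```

In the real case you would delete the instances after writing the file and before calling np.loadtxt, so the two copies of the data never coexist in memory.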

edited Mar 18, 2012 at 16:37

answered Mar 18, 2012 at 16:20

Katriel


3 Comments

zebra Over a year ago

Way more than 100 MB :) I have 8 GB of RAM.

zebra Over a year ago

To your next edit, I have 25000 instances of some other class, each of which returns a list of length about 2000. I want to put all of these lists returned into the np.array

Katriel Over a year ago

@zebra: wow -- you're going over 8GB! That's impressive :o. I'll edit (again).
