
I have a large matrix of 0s and 1s that is mostly 0s. It is initially stored as a list of 25 thousand other lists, each of which is about 2000 ints long.

I am trying to put these into a numpy array, which is what another piece of my program takes. So I run training_data = np.array(data), but this raises a MemoryError.

Why is this happening? I assume the data needs more memory than the program can handle (which surprises me), but if so, is there a better way of doing this?

1 Answer

A short (16-bit) integer takes two bytes to store. You want 25,000 lists, each with about 2,000 integers; that gives

25000*2000*2/1000000 = 100 MB
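The same arithmetic can be repeated for other element types. The sketch below is only an illustration: the helper name dense_size_mb is made up here, and the row and column counts are the approximate figures from the question. Note that NumPy's default integer dtype is usually 8 bytes per element on 64-bit platforms, so a dense array created without an explicit dtype may need about four times the 100 MB estimated above.

```python
import numpy as np

ROWS, COLS = 25_000, 2_000  # approximate sizes taken from the question


def dense_size_mb(dtype, rows=ROWS, cols=COLS):
    """Return the size in MB (10**6 bytes) of a dense rows x cols array."""
    return rows * cols * np.dtype(dtype).itemsize / 1_000_000


# Compare a 1-byte, a 2-byte, and an 8-byte integer type.
for dtype in (np.uint8, np.int16, np.int64):
    print(np.dtype(dtype).name, dense_size_mb(dtype), "MB")
```

Since the matrix holds only 0s and 1s, a one-byte dtype such as np.uint8 would be enough to hold the values, assuming the downstream code accepts it.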

This works fine on my computer (4GB RAM):

>>> import numpy as np
>>> x = np.zeros((25000,2000),dtype=int)

Are you able to instantiate the above matrix of zeros?

Are you reading the file into a Python list of lists and then converting that to a numpy array? That's a bad idea; it will at least double the memory requirements. What is the file format of your data?
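One way to avoid holding both a list of lists and the array at the same time is to allocate the array first and copy rows into it one by one. This is a minimal sketch, not the asker's code: fill_array and the toy generator are hypothetical, and it assumes every row has exactly the same length (the question only says "about 2000").

```python
import numpy as np


def fill_array(rows, n_rows, n_cols, dtype=np.uint8):
    """Copy an iterable of equal-length 0/1 sequences into one preallocated array."""
    out = np.zeros((n_rows, n_cols), dtype=dtype)
    for i, row in enumerate(rows):
        out[i, :] = row  # each row must have exactly n_cols entries
    return out


# Tiny stand-in for the real data: a generator, so no list of lists is built.
rows = ([(i + j) % 2 for j in range(4)] for i in range(3))
training_data = fill_array(rows, n_rows=3, n_cols=4)
```

With this pattern only one row exists as a Python list at any moment, so peak memory stays close to the size of the final array.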

For sparse matrices, scipy.sparse provides several alternative data types that are much more memory-efficient.
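As a rough illustration of the scipy.sparse route, the sketch below builds a CSR matrix one row at a time and stores only the positions of the 1s. The helper rows_to_csr and the sample rows are invented for this example, and whether the downstream code accepts a sparse matrix instead of a dense array is an assumption you would need to check.

```python
import numpy as np
from scipy import sparse


def rows_to_csr(rows, n_cols):
    """Build a CSR matrix from an iterable of 0/1 sequences, one row at a time."""
    indices = []  # column index of every nonzero entry
    indptr = [0]  # indices[indptr[i]:indptr[i + 1]] belongs to row i
    for row in rows:
        indices.extend(j for j, value in enumerate(row) if value)
        indptr.append(len(indices))
    data = np.ones(len(indices), dtype=np.uint8)  # every stored entry is a 1
    return sparse.csr_matrix((data, indices, indptr),
                             shape=(len(indptr) - 1, n_cols))


sample_rows = [[0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 1]]
matrix = rows_to_csr(sample_rows, n_cols=4)
```

Here matrix.nnz gives the number of stored entries, and matrix.toarray() converts back to a dense array if a small slice is needed in that form.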


EDIT: responding to the OP's comment.

"I have 25000 instances of some other class, each of which returns a list of length about 2000. I want to put all of these lists returned into the np.array."

Well, you're somehow going over 8GB! To solve this, don't do all of this manipulation in memory. Write the data to disk one instance at a time, then delete the instances and read the file back in with numpy.

First do

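import csv

# Python 3: open in text mode with newline="" (on Python 2, use "wb" instead)
with open(..., "w", newline="") as f:
    writer = csv.writer(f)  # separate name so the file handle is not shadowed
    for instance in instances:
        writer.writerow(instance.data)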

This will write all your data into a large-ish CSV file. Then, you can just use np.loadtxt:

training_data = np.loadtxt(open(..., "rb"), delimiter=",")
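Put together, the write-then-load round trip might look like the sketch below. Everything specific in it is an assumption made for illustration: the Instance class is a stand-in for the asker's class, the file name and temporary directory are arbitrary, and dtype=np.uint8 is chosen only because the values are 0s and 1s.

```python
import csv
import os
import tempfile

import numpy as np


class Instance:
    """Hypothetical stand-in for the question's class; .data is a list of 0/1 ints."""

    def __init__(self, data):
        self.data = data


instances = [Instance([0, 1, 0]), Instance([1, 0, 0])]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "training_data.csv")  # arbitrary file name
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for instance in instances:
            writer.writerow(instance.data)  # one CSV line per instance
    del instances  # release the Python objects before loading the array
    # ndmin=2 keeps the result two-dimensional even for a single row.
    training_data = np.loadtxt(path, delimiter=",", dtype=np.uint8, ndmin=2)
```

Passing an explicit dtype matters here because np.loadtxt otherwise produces 8-byte floats, which would make the loaded array several times larger than necessary.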

3 Comments

Way more than 100 MB :) I have 8 GB of RAM.
To your next edit, I have 25000 instances of some other class, each of which returns a list of length about 2000. I want to put all of these lists returned into the np.array
@zebra: wow -- you're going over 8GB! That's impressive :o. I'll edit (again).
