I am preprocessing a large dataset for neural network training. The samples are accumulated in a Python list, features = list().

When I call features = np.array(features), I get:

numpy.core._exceptions.MemoryError: Unable to allocate 29.6 GiB for an array with shape (37990, 605, 173) and data type float64

I have seen several suggestions in other posts: saving and reloading (which did not work, because np.save converts the list to an array first), using uint8 for images, or using a lower-precision dtype where possible.
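For reference, the 29.6 GiB in the error message follows directly from the shape and dtype. The short sketch below reproduces that figure and shows what smaller float dtypes would need. The helper name array_gib is made up for illustration, and whether float32 or float16 keeps enough precision for this data is an open question that the sketch does not answer.

```python
import math
import numpy as np

shape = (37990, 605, 173)  # shape reported in the MemoryError

def array_gib(shape, dtype):
    """Size in GiB of a dense array with the given shape and dtype (hypothetical helper)."""
    n_items = math.prod(shape)                      # total number of elements
    return n_items * np.dtype(dtype).itemsize / 2**30

sizes = {name: array_gib(shape, name) for name in ("float64", "float32", "float16")}
for name, gib in sizes.items():
    print(f"{name}: {gib:.1f} GiB")
```

Halving the item size halves the footprint, so float32 would still need roughly 15 GiB for the array alone, on top of the list it is built from.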

The problem is that my input is a tensor, not an image. I am not sure what the maximum values are, and because of my classification task I don't know whether I can use another format. I would like to avoid a Keras generator because of the implementation overhead. So my question is: is there a way to handle this dataset without a generator?

  • NumPy arrays do not use more memory than a list. The problem is that during conversion you need to hold both the array and the list in memory, and you do not seem to have enough RAM for that. Commented May 28, 2021 at 9:59

1 Answer

You can use NumPy's memory-mapping support (numpy.memmap): this backs the data with a file on disk while still behaving like a normal NumPy array, so it doesn't have to fit in memory.

https://numpy.org/doc/stable/reference/generated/numpy.memmap.html

See https://pythonspeed.com/articles/mmap-vs-zarr-hdf5/ for explanation of how this works.
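A minimal sketch of what that looks like, assuming a raw binary file created with np.memmap. The file name features.dat and the tiny stand-in shape are placeholders chosen for illustration; the real array would use shape (37990, 605, 173).

```python
import os
import tempfile
import numpy as np

# Small stand-in shape so the sketch runs quickly (assumption, not the real data size).
shape = (8, 5, 3)
path = os.path.join(tempfile.mkdtemp(), "features.dat")  # hypothetical file name

# mode="w+" creates (or overwrites) the file and maps it for reading and writing.
arr = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
arr[0] = 1.0   # the write lands in the mapped file's pages
arr.flush()    # push pending changes to disk
del arr        # drop the mapping

# Reopen read-only later. dtype and shape must be given again,
# because the raw file stores neither.
reopened = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
first_mean = float(reopened[0].mean())
```

Slices such as reopened[i:j] are read from disk on demand, so only the parts that are actually accessed need to be resident in RAM.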

4 Comments

I don't think I understand your links. If I am not mistaken, this is for an ndarray that is already saved on disk. My data is a list, and I cannot convert it into an ndarray because of the memory error. Am I missing something here?
You should create an empty memory-mapped ndarray, then copy the values over into it (arr = np.memmap("myarray", dtype="float64", mode="w+", shape=(37990, 605, 173)); arr[:] = features[:], something like that).
Does not work. I am initializing the memmap array. On the copy command, it gives the same error.
Try arr[:] = features, without the extra list copy.
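If assigning the whole list in one statement still raises the same MemoryError, a likely cause is that NumPy first builds a full temporary array from the right-hand side. One way around that, sketched here under the assumption that every element of features has the same shape, is to copy in small chunks so only one chunk is converted at a time. The function name list_to_memmap, the chunk size, and the file name are made up for illustration.

```python
import os
import tempfile
import numpy as np

def list_to_memmap(features, path, dtype=np.float64, chunk=256):
    """Copy a list of equally shaped arrays into a disk-backed array, chunk by chunk."""
    sample_shape = np.shape(features[0])
    out = np.memmap(path, dtype=dtype, mode="w+",
                    shape=(len(features), *sample_shape))
    for start in range(0, len(features), chunk):
        stop = min(start + chunk, len(features))
        # Only this slice of the list is converted to a temporary ndarray.
        out[start:stop] = np.asarray(features[start:stop], dtype=dtype)
    out.flush()  # make sure everything is written to the file
    return out

# Stand-in data (assumption): 10 samples of shape (5, 3) instead of 37990 x (605, 173).
rng = np.random.default_rng(0)
features = [rng.random((5, 3)) for _ in range(10)]
path = os.path.join(tempfile.mkdtemp(), "features.dat")  # hypothetical file name
arr = list_to_memmap(features, path, chunk=4)
```

The list itself still has to fit in RAM with this approach. If it barely does, writing each sample into the memmap as it is produced, instead of accumulating a list first, avoids holding the data twice.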
