
I wrote a program that reads images using Python's OpenCV and tried to load about 3 GB of images, but the program aborted. My PC has 32 GB of memory, yet running this program exhausts all of it. What is the cause?

No error message is printed; the PC just becomes extremely sluggish. I checked with Ubuntu's System Monitor and confirmed that both memory and swap were exhausted.

I load the images into a single array to pass to a TensorFlow deep-learning program. The images are 200 x 200 color images.

I use the 64-bit version of Python.

import os
import numpy as np
import cv2

IMG_SIZE = 200


def read_images(path):
    dirnames = sorted(os.listdir(path))
    images = []
    for d in dirnames:
        tmp_images = []
        for f in sorted(os.listdir(os.path.join(path, d))):
            img = cv2.imread(os.path.join(path, d, f))
            if img is None:  # skip files OpenCV cannot decode
                continue
            img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
            # float32 in [0, 1] uses 4x the memory of the raw uint8 pixels
            img = img.flatten().astype(np.float32) / 255.0
            tmp_images.append(img)
        images.append(tmp_images)

    # copies everything into one big float32 array
    return np.asarray(images)
  • Please post the error message you get. Commented Nov 24, 2017 at 13:19
  • Are you using a 32-bit version of Python or 64-bit? Commented Nov 24, 2017 at 13:20
  • I use the 64-bit version. Commented Nov 24, 2017 at 14:13
  • An image being 2 MB on your system does not mean it will be 2 MB when loaded into OpenCV. There is overhead, since a np.ndarray holds more information, and you're also converting to float32 instead of the native uint8, which means you're using four times the memory for each image. You could convert each image only when you use it. You probably don't need to load 3 GB of images into a single array, so what exactly are you trying to do? It's also not clear how large the images are before resizing, so it's hard to give precise advice here. Commented Nov 25, 2017 at 0:39
  • In addition to the above, images stored on your disk as PNG or JPEG are compressed by the nature of those formats, and imread decompresses them to put the data into an array. Commented Nov 25, 2017 at 1:00

1 Answer


Reasons for running out of memory:

  • Image file size and the size of the corresponding array in memory are different. Image formats such as PNG and JPEG are compressed, so the size of the equivalent uncompressed BMP image is more relevant here. An ndarray also holds some metadata that makes it slightly larger.

  • Converting from uint8 to float32 multiplies the size by 4. Try to avoid this if possible (I recognize uint8 imposes some limitations, such as not being able to normalize and center the data). A rough back-of-the-envelope calculation follows this list.
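
To see why this adds up, here is a minimal sketch of the arithmetic, assuming a hypothetical 40,000 images (the question does not state the count; 3 GB of JPEGs at roughly 75 KB each would be in this ballpark):

n_images = 40_000                    # assumed count, not from the question
values_per_image = 200 * 200 * 3     # resized 200 x 200 color image

uint8_bytes = n_images * values_per_image   # 1 byte per pixel value
float32_bytes = uint8_bytes * 4             # 4 bytes per pixel value

print(uint8_bytes / 2**30)    # ~4.5 GiB as uint8
print(float32_bytes / 2**30)  # ~17.9 GiB as float32
# np.asarray(images) at the end copies the list into a new array,
# so peak usage is roughly double that again, enough to exhaust 32 GB.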

Possible remedies:

  • Use numpy.memmap to create an array stored on disk (a sketch follows this list).
  • Reduce the memory footprint of the images by converting them to grayscale and/or reducing the resolution.
  • Train the model on a smaller number of images.
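
For the memmap route, here is a minimal sketch, assuming a hypothetical helper build_memmap and an illustrative file name images.dat; it stores the pixels as uint8 on disk and converts only one batch at a time to float32:

import numpy as np
import cv2

IMG_SIZE = 200


def build_memmap(image_paths, out_file="images.dat"):
    # Disk-backed uint8 array: the pixel data lives in out_file, not in RAM.
    arr = np.memmap(out_file, dtype=np.uint8, mode="w+",
                    shape=(len(image_paths), IMG_SIZE * IMG_SIZE * 3))
    for i, fpath in enumerate(image_paths):
        img = cv2.imread(fpath)
        if img is None:      # skip files OpenCV cannot decode
            continue
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        arr[i] = img.flatten()
    arr.flush()              # make sure everything is written to disk
    return arr

# At training time, convert only the current batch, so the 4x float32
# blow-up applies to one batch rather than the entire dataset:
# batch = arr[0:32].astype(np.float32) / 255.0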

