I'm working on Ubuntu 14.04 with Python 3.4 (Numpy 1.9.2 and PIL.Image 1.1.7). Here's what I do:
>>> from PIL import Image
>>> import numpy as np
>>> img = Image.open("./tifs/18015.pdf_001.tif")
>>> arr = np.asarray(img)
>>> np.shape(arr)
(5847, 4133)
>>> arr.dtype
dtype('bool')
# all four of the following cases, where I incrementally increase
# the number of rows up to 700, finish instantly
>>> v = arr[1:100,1:100].sum(axis=0)
>>> v = arr[1:500,1:100].sum(axis=0)
>>> v = arr[1:600,1:100].sum(axis=0)
>>> v = arr[1:700,1:100].sum(axis=0)
# but suddenly this line makes Python crash
>>> v = arr[1:800,1:100].sum(axis=0)
fish: Job 1, “python3” terminated by signal SIGSEGV (Address boundary error)
It seems to me like Python suddenly runs out of memory. If that is the case - how can I allocate more memory to Python? As far as I can see from htop, my 32 GB of memory is not even remotely depleted.
You may download the TIFF image here.
If I create an empty boolean array, set the pixels explicitly and then apply the summation - then it works:
>>> w, h = img.size
>>> arr = np.empty((h, w), dtype=bool)
>>> for r in range(h):
...     for c in range(w):
...         arr.itemset((r, c), img.getpixel((c, r)))
...
>>> v = arr.sum(axis=0)
>>> v.mean()
5726.8618436970719
>>> arr.shape
(5847, 4133)
But this "workaround" is not very satisfactory as copying every pixel takes way too long - maybe there is a faster method?
numpy.asarray() generates an array backed by part or all of the Image object's internal buffer (as opposed to copying all the pixel values into a separate internal representation), and numpy and PIL disagree about some aspect of the expected behavior of that buffer (or perhaps PIL is simply buggy). You can probe and/or work around this by manually extracting a pixel raster from the Image object and building your numpy array from that copy.
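One way to force that copy without the slow per-pixel double loop is to go through Image.getdata(), which hands numpy a flat sequence of pixel values to own outright. A minimal sketch, using a small in-memory bilevel image as a stand-in for the TIFF from the question:

```python
import numpy as np
from PIL import Image

# Small mode "1" (bilevel) image standing in for the question's TIFF.
img = Image.new("1", (4, 3))
img.putpixel((1, 0), 1)
img.putpixel((2, 2), 1)

w, h = img.size
# getdata() copies the pixel values out of PIL, so the resulting numpy
# array owns its own buffer instead of wrapping PIL's internal one.
arr = np.asarray(img.getdata(), dtype=bool).reshape(h, w)

v = arr.sum(axis=0)  # safe: the data is a genuine copy
```

getdata() does the copying at the C level, so this should be far faster than calling getpixel() once per pixel from Python.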