
I'm working on Ubuntu 14.04 with Python 3.4 (Numpy 1.9.2 and PIL.Image 1.1.7). Here's what I do:

>>> from PIL import Image
>>> import numpy as np

>>> img = Image.open("./tifs/18015.pdf_001.tif")
>>> arr = np.asarray(img)
>>> np.shape(arr)
(5847, 4133)

>>> arr.dtype
dtype('bool')

# all of the following four cases where I incrementally increase
# the number of rows to 700 are done instantly
>>> v = arr[1:100,1:100].sum(axis=0)
>>> v = arr[1:500,1:100].sum(axis=0)
>>> v = arr[1:600,1:100].sum(axis=0)
>>> v = arr[1:700,1:100].sum(axis=0)

# but suddenly this line makes Python crash
>>> v = arr[1:800,1:100].sum(axis=0)

fish: Job 1, “python3” terminated by signal SIGSEGV (Address boundary error)

It seems to me that Python suddenly runs out of memory. If that is the case, how can I allocate more memory to Python? As I can see from htop, my 32 GB of memory is not even remotely depleted.

You may download the TIFF image here.


If I create an empty boolean array, set the pixels explicitly, and then apply the summation, it works:

>>> w, h = img.size  # PIL reports size as (width, height)
>>> arr = np.empty((h, w), dtype=bool)

>>> for r in range(h):
...     for c in range(w):
...         arr[r, c] = img.getpixel((c, r))  # getpixel takes (x, y)

>>> v=arr.sum(axis=0)

>>> v.mean()
5726.8618436970719

>>> arr.shape
(5847, 4133)

But this "workaround" is not very satisfactory as copying every pixel takes way too long - maybe there is a faster method?
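One loop-free alternative, sketched here under stated assumptions: take the packed 1-bit raster as bytes and unpack it with `np.unpackbits`. The assumption is that a mode "1" image stores each row packed eight pixels per byte, most significant bit first, with every row padded to a whole byte. The helper name `unpack_bilevel` and the sample bytes are hypothetical and only serve the illustration.

```python
import numpy as np

def unpack_bilevel(raw, width, height):
    """Turn a packed 1-bit raster into a (height, width) boolean array.

    Assumes MSB-first packing with each row padded to a whole byte.
    """
    row_bytes = (width + 7) // 8          # bytes per padded row
    packed = np.frombuffer(raw, dtype=np.uint8).reshape(height, row_bytes)
    bits = np.unpackbits(packed, axis=1)  # one uint8 (0 or 1) per bit
    return bits[:, :width].astype(bool)   # drop the padding bits

# Hand-built 2 x 10 raster: two bytes per row, last 6 bits are padding.
raw = bytes([0b10100000, 0b01000000,
             0b11111111, 0b11000000])
arr = unpack_bilevel(raw, width=10, height=2)
col_sums = arr.sum(axis=0)
```

With a real image this would presumably be called as `unpack_bilevel(img.tobytes(), *img.size)`. Whether it sidesteps the crash on the affected setup is untested.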

  • A segmentation fault always indicates a bug. Even if Python were running out of memory, it would be a bug for it to crash with a segmentation fault instead of throwing an out of memory error. Commented Mar 17, 2015 at 18:00
  • It is conceivable that you are running out of stack space. That you do not do so in the 10000 x 10000 random case could point to a difference in the algorithm used for array sections vs. the one used for whole arrays. If a recursive algorithm were used for sections, then an array section with many discontinuous segments might recurse too deeply and exhaust the stack. This is all speculative, of course. Commented Mar 17, 2015 at 18:09
  • The first case also crashes when not sectioned, and the second case does not crash even when sectioned. Commented Mar 17, 2015 at 18:10
  • My best guess, then, is that numpy.asarray() is generating an array backed by part or all of the Image object (as opposed to copying all the pixel values to a separate internal representation), and numpy and PIL disagree about some aspect of the expected behavior of the Image (or perhaps PIL is just buggy). You could probe and/or work around that by manually extracting a pixel raster from the Image object, and building your numpy array around that. Commented Mar 17, 2015 at 18:33
  • Could you tell us which versions of PIL and numpy you are using? Commented Mar 17, 2015 at 18:37
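To find out where the interpreter is when the crash happens, Python 3.3 and later ship the standard-library `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal such as SIGSEGV. A minimal sketch follows; it only reports where the fault occurred and does nothing to prevent it.

```python
import faulthandler

# Dump the Python traceback of all threads to stderr if the process
# receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable()

# Confirm the handler is installed before running the code that crashes.
handler_active = faulthandler.is_enabled()
```

The same effect is available without editing any code by starting the interpreter as `python3 -X faulthandler`.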

1 Answer


I can reproduce your segfault using numpy v1.8.2/PIL v1.1.7 installed from the Ubuntu repositories.

  • If I install numpy 1.8.2 in a virtualenv using pip (still using PIL v1.1.7 from the Ubuntu repos) then I no longer see the segfault.

  • If I do the opposite (installing PIL v1.1.7 using pip, and using numpy v1.8.2 from the Ubuntu repos), I still get the segfault.

This leads me to believe that it's caused by an old bug in numpy. I haven't been able to find a good candidate in numpy's issue tracker, but I suspect that updating numpy (e.g. from the current source or via pip) would probably resolve the issue.

One workaround is to convert the image to mode "P" (8-bit palette mode, which numpy sees as unsigned 8-bit ints) before creating the array, then convert the array back to boolean:

arr2 = np.asarray(img.convert("P")).astype(bool)  # builtin bool works across NumPy versions; the np.bool alias does not
v = arr2[1:800,1:100].sum(axis=0)
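For anyone who wants to try the idea without the original TIFF, here is a self-contained sketch that builds a small bilevel image in memory. It rests on a few assumptions of its own: it uses mode "L" (8-bit grayscale) rather than "P", `np.array` rather than `np.asarray` to force a copy, and a made-up 4 x 3 image. It has not been run against the numpy build that crashes.

```python
import numpy as np
from PIL import Image

# Hypothetical test image: 4 pixels wide, 3 pixels high, all black (0).
img = Image.new("1", (4, 3), 0)
for xy in [(0, 0), (0, 1), (3, 2)]:   # (x, y) positions set to white
    img.putpixel(xy, 1)

# Widen to one byte per pixel before handing the data to numpy, then
# reduce back to booleans. np.array copies, so the result owns its memory.
arr = np.array(img.convert("L")).astype(bool)

col_sums = arr.sum(axis=0)   # number of white pixels in each column
```

The array comes out with shape `(height, width)`, so `axis=0` sums down the rows exactly as in the snippets above.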