I'm working on Ubuntu 14.04 with Python 3.4 (Numpy 1.9.2 and PIL.Image 1.1.7). Here's what I do:
>>> from PIL import Image
>>> import numpy as np
>>> img = Image.open("./tifs/18015.pdf_001.tif")
>>> arr = np.asarray(img)
>>> np.shape(arr)
(5847, 4133)
>>> arr.dtype
dtype('bool')
# all four of the following cases, where I incrementally increase
# the number of rows up to 700, finish instantly
>>> v = arr[1:100,1:100].sum(axis=0)
>>> v = arr[1:500,1:100].sum(axis=0)
>>> v = arr[1:600,1:100].sum(axis=0)
>>> v = arr[1:700,1:100].sum(axis=0)
# but suddenly this line makes Python crash
>>> v = arr[1:800,1:100].sum(axis=0)
fish: Job 1, “python3” terminated by signal SIGSEGV (Address boundary error)
It seems to me like Python suddenly runs out of memory. If that is the case - how can I allocate more memory to Python? As far as I can see from htop, my 32 GB of memory is not even remotely depleted.
You may download the TIFF image here.
If I create an empty boolean array, set the pixels explicitly and then apply the summation - then it works:
>>> w, h = img.size
>>> arr = np.empty((h, w), dtype=bool)
>>> for r in range(h):
...     for c in range(w):
...         arr.itemset((r, c), img.getpixel((c, r)))
...
>>> v = arr.sum(axis=0)
>>> v.mean()
5726.8618436970719
>>> arr.shape
(5847, 4133)
But this "workaround" is not very satisfactory as copying every pixel takes way too long - maybe there is a faster method?
numpy.asarray() generates an array backed by part or all of the Image object's internal buffer (as opposed to copying all the pixel values into a separate internal representation), and numpy and PIL disagree about some aspect of the expected behavior of that buffer (or perhaps PIL is simply buggy). You can probe and/or work around this by manually extracting a pixel raster from the Image object and building your numpy array from that copy.
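One way to force that copy without the slow per-pixel double loop is to go through Image.getdata(), which hands numpy a flat sequence of pixel values to own outright. A minimal sketch, using a small in-memory bilevel image as a stand-in for the TIFF from the question:

```python
import numpy as np
from PIL import Image

# Small mode "1" (bilevel) image standing in for the question's TIFF.
img = Image.new("1", (4, 3))
img.putpixel((1, 0), 1)
img.putpixel((2, 2), 1)

w, h = img.size
# getdata() copies the pixel values out of PIL, so the resulting numpy
# array owns its own buffer instead of wrapping PIL's internal one.
arr = np.asarray(img.getdata(), dtype=bool).reshape(h, w)

v = arr.sum(axis=0)  # safe: the data is a genuine copy
```

getdata() does the copying at the C level, so this should be far faster than calling getpixel() once per pixel from Python.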