0

I have large file 9600x7000 pixel jpg file I am trying to see if I can do a edge detection. I tried loading the large (25Mb) file using:

from PIL import Image
image = Image.open("C:\\pathtofile\\test-tac.jpg")
image.show()

However python interpreter will crash. I am using Pycharm running Python 2.7.

So, I used a GDAL (used for large GEO refererencing files) to load the file. It will load the file into memory without any problem.

#reference http://www.gdal.org/gdal_tutorial.html
import gdal
from gdalconst import *

dataset = gdal.Open("C:\\pathtofile\\test-tac.jpg", GA_ReadOnly )
if dataset is None:
   print "error loading file in gdal"

This will load file. However, I am trying to run following edge detection on it:

from matplotlib import pyplot as plt

from skimage import data
from skimage.feature import corner_harris, corner_subpix, corner_peaks
from skimage.transform import warp, AffineTransform
from skimage.draw import ellipse

# running corner Harris on the image object to detect image corners. 
#(reference http://scikit-image.org/docs/dev/auto_examples/plot_corner.html)
coords = corner_peaks(corner_harris(image), min_distance=3) #5
coords_subpix = corner_subpix(image, coords, window_size=13)

plt.gray()
plt.imshow(image, interpolation='nearest')
plt.plot(coords[:, 1], coords[:, 0], '.b', markersize=9)  # dots
plt.plot(coords_subpix[:, 1], coords_subpix[:, 0], '+r', markersize=15) # +
plt.plot(coords_subpix[:, 1][1], coords_subpix[:, 0][1], '*r', markersize=20)  #X_Point1=Subpix[:,1][1], Y_Point1=Subpix[:,0][1]

N=len(coords_subpix[:,0])
labels = ['point{0}'.format(i) for i in range(N)]

#Label corners in image
for label, x, y in zip(labels, coords_subpix[:,1], coords_subpix[:,0]):
   plt.annotate(label,
    xy=(x,y), xytext = (-10,10),
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

   plt.axis((0, 9672, 7272, 0))            # (Y_start, Y_Stop, X_Stop, X_Start) ((0, 9672, 7272, 0))
   plt.show()

This would work if I generate image using following code:

 tform = AffineTransform(scale=(1.3, 1.1), rotation=1, shear=0.8,
                    translation=(210, 50))
 image = warp(data.checkerboard(), tform.inverse, output_shape=(350, 350))
 rr, cc = ellipse(310, 175, 10, 100)
 image[rr, cc] = 1
 image[180:230, 10:60] = 1
 image[230:280, 60:110] = 1

My problem is I am not understanding Python much about the data format from the 'image' variable versus dataset variable generated by GDAL. My end goal is to be able to run edge detection on large (10000x7000) pixel jpg image using Python scikit-image library. If there is better way that GDAL to read large jpg images I am open to it.

If I set:

image=dataset

and run it, I get following error:

coords = corner_peaks(corner_harris(image), min_distance=3) #5
 File "C:\Python27\lib\site-packages\skimage\feature\corner.py", line 171, in corner_harris
 Axx, Axy, Ayy = _compute_auto_correlation(image, sigma)
 File "C:\Python27\lib\site-packages\skimage\feature\corner.py", line 54, in _compute_auto_correlation
 if image.ndim == 3:
 AttributeError: 'Dataset' object has no attribute 'ndim'

This error message points that I am not understanding the datatype between dataset and image variables.

type(dataset)

Gives:

<class 'osgeo.gdal.Dataset'>

and

type(image)

Gives:

(350,350) float64.

For your large source file use: http://www.lib.utexas.edu/maps/tpc/txu-pclmaps-oclc-22834566_a-2c.jpg to give it a try.

1
  • Have you tried using the scipy.ndimage library? Commented Dec 20, 2013 at 21:48

2 Answers 2

2

All scikit-image algorithms require Numpy arrays as input. You therefore need to convert your dataset variable to an ndarray. The easiest way to do this is to use the gdal plugin to read the file (or look at the plugin source -- it shows how to do the conversion).

Sign up to request clarification or add additional context in comments.

2 Comments

If GDAL can open the file with ds = gdal.Open() you can simply do image = ds.ReadAsArray(). It will give you a Numpy array, i think the layout will be [bands, y, x], so perhaps some reshaping is needed.
@Rutger Kassies Please provide a working example with Numpy array and gdal.Open(). I am new thins python image processing.
0
import cv2
image = cv2.imread('txu-pclmaps-oclc-22834566_a-2c.jpg')

opencv can load the image without a problem. Although i suspect loading the image was not the problem. The first step of your algorithm was peeking at around 6 gigs of memory usage. So if you're not on a 64bit version of python it will crash. Also there looks to be something wrong with your code. When i tried to run it, it failed at the second function.

ValueError: operands could not be broadcast together with shapes (0,13) (13,13)

This was with the (8003, 10859, 3) image. I also tried with just one channel with the same error message.

2 Comments

I tried this on my machine with 8GB of memory. I am getting MemoryError. I need help using the GDAL library for large files. How do I check if my python is 32 bit vs 64 bit?
open the command line and type python. The first lines printed out should tell you the python version and architecture.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.