Detecting text in an image using python

Question

I have 100+ images, each showing one of two texts. The images are below: one reads "occupied" and the other reads "unoccupied".

Is there a way in Python to tell these images apart by detecting the text in them?

If so, I want to keep the occupied images and delete the unoccupied ones. I am new to Python; can anyone help me do this?

This is a very complicated task to accomplish, more so if you are completely new to Python. I would recommend checking out OpenCV for image detection.
– Aeolus, Commented Aug 10, 2018 at 4:23
@The For someone actually called Tesseract, I'm surprised you didn't recommend Google Tesseract character recognition.
– Alistair Carscadden, Commented Aug 10, 2018 at 4:25
In addition to software such as OpenCV (or tesseract-ocr), you will need to "clean" the image: in this case, crop it so that it contains just the text, and make the image as clear as possible. That is complicated too.
– rask004, Commented Aug 10, 2018 at 4:27
@AlistairCarscadden Perhaps I was a little too severe in my comment. Tesseract seems pretty easy to use even for a beginner. I falsely assumed text image recognition would be beyond the abilities of a new programmer. Apologies.
– Aeolus, Commented Aug 10, 2018 at 4:29
I was just making a joke about your name, as well as recommending the software to the asker. Here's a link to the pytesseract project.
– Alistair Carscadden, Commented Aug 10, 2018 at 4:30

kavko · Accepted Answer · 2018-08-11 00:18:49Z

This answer assumes that the images contain only the two texts shown in the question, so the number of characters and the color of the text are always the same ("Room status: Unoccupied" and "Room status: Occupied", in red). Given that, I would try a simpler way to tell the two types apart. The characters in these images sit very close to each other, so in my opinion it would be very difficult to separate each character and identify it with OCR.

A simpler approach is to find the area containing the text and measure its length: "unoccupied" has two more characters than "occupied", so its text is longer. You can transform the image to the HSV color space and use the cv2.inRange() function to extract the red text. Then you can merge the characters into one contour with cv2.morphologyEx() and get its length with cv2.minAreaRect(). Hope it helps, or at least gives you a new perspective on how to find your solution. Cheers!

Example code:

import cv2
import numpy as np

# Read the image and transform to HSV colorspace.
img = cv2.imread('ocupied.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Extract the red text.
lower_red = np.array([0,150,50])
upper_red = np.array([40,255,255])
mask_red = cv2.inRange(hsv, lower_red, upper_red)

# Search for contours on the mask.
# findContours returns 3 values in OpenCV 3 and 2 in OpenCV 4; [-2:] works with both.
contours, hierarchy = cv2.findContours(mask_red, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]

# Create a new mask for further processing.
mask = np.ones(img.shape, np.uint8)*255

# Draw contours on the mask with size and ratio of borders for threshold (to remove other noises from the image).
for cnt in contours:
    size = cv2.contourArea(cnt)
    x,y,w,h = cv2.boundingRect(cnt)
    if 10000 > size > 50 and w*2.5 > h:
        cv2.drawContours(mask, [cnt], -1, (0,0,0), -1)

# Connect neighbour contours and select the biggest one (text).
kernel = np.ones((50,50),np.uint8)
opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
gray_op = cv2.cvtColor(opening, cv2.COLOR_BGR2GRAY)
_, threshold_op = cv2.threshold(gray_op, 150, 255, cv2.THRESH_BINARY_INV)
contours_op, hierarchy_op = cv2.findContours(threshold_op, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]
cnt = max(contours_op, key=cv2.contourArea)

# Create rotated rectangle to get the 4 points of the rectangle.
rect = cv2.minAreaRect(cnt)

# Get the rectangle's 4 corner points and calculate the "length" of the text.
# (np.int0 was removed in NumPy 2.0, so cast with astype instead.)
box = cv2.boxPoints(rect).astype(int)
x1, y1 = (int(v) for v in box.min(axis=0))
x2, y2 = (int(v) for v in box.max(axis=0))

# Draw the rectangle.
cv2.rectangle(img,(x1,y1),(x2,y2),(0,255,0),1)

# Identify the room status (the 200 px threshold is specific to these example images).
if x2 - x1 > 200:
    print('unoccupied')
else:
    print('occupied')

# Display the result and keep the window open until a key is pressed.
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Result:

occupied

unoccupied
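
The width test at the heart of this answer can also be tried in isolation. The sketch below is not the answer's method: instead of cv2.minAreaRect() it measures the span of non-empty columns in a binary mask with plain NumPy, which assumes the text is horizontal and the mask is already free of noise. The helper names `text_width` and `room_status` are hypothetical, and the synthetic masks only stand in for the output of cv2.inRange().

```python
import numpy as np


def text_width(mask):
    """Width in pixels of the span of columns that contain any non-zero pixel."""
    cols = np.flatnonzero(mask.any(axis=0))
    if cols.size == 0:
        return 0
    return int(cols[-1] - cols[0] + 1)


def room_status(mask, threshold=200):
    """Apply the same width comparison as the final if/else in the answer."""
    return "unoccupied" if text_width(mask) > threshold else "occupied"


# Synthetic masks standing in for a cleaned-up red-text mask.
short_mask = np.zeros((40, 300), np.uint8)
short_mask[10:30, 20:200] = 255   # text region 180 px wide
long_mask = np.zeros((40, 300), np.uint8)
long_mask[10:30, 20:250] = 255    # text region 230 px wide

print(room_status(short_mask), room_status(long_mask))
```

Any stray red pixels outside the text would widen the measured span, which is why the answer filters contours by size and shape before measuring.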

mahesh · Accepted Answer · 2018-08-10 07:59:04Z

Using the Tesseract OCR engine and its Python wrapper pytesseract, this takes only a few lines:

import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
img = Image.open('D:\\tmp2.jpg').crop((0,0,250,35))
print(pytesseract.image_to_string(img, config='--psm 7'))

I have tested this on Windows 7. Of course, I have assumed that the text appears at the same position in every image (from your example, that does seem to be the case). Otherwise, you will need a better cropping mechanism.
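
To cover the part of the question about deleting the unoccupied images, here is a minimal sketch of a batch loop built around the snippet above. The function names, the `*.jpg` pattern, and the `dry_run` switch are assumptions for illustration, and the crop box is simply reused from the snippet. Note that "unoccupied" contains "occupied", so the longer word has to be checked first.

```python
from pathlib import Path


def status_from_text(text):
    """Map raw OCR output to 'occupied', 'unoccupied', or None if unclear."""
    cleaned = text.lower()
    # Check the longer word first: "unoccupied" contains "occupied".
    if "unoccupied" in cleaned:
        return "unoccupied"
    if "occupied" in cleaned:
        return "occupied"
    return None


def ocr_status_line(path):
    """Read the status line with pytesseract, using the crop box from the snippet above."""
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img.crop((0, 0, 250, 35)), config="--psm 7")


def remove_unoccupied(folder, read_text=ocr_status_line, dry_run=True):
    """Return the unoccupied images in `folder`; delete them only if dry_run is False."""
    unoccupied = []
    for path in sorted(Path(folder).glob("*.jpg")):
        if status_from_text(read_text(path)) == "unoccupied":
            unoccupied.append(path)
            if not dry_run:
                path.unlink()
    return unoccupied
```

Calling `remove_unoccupied(folder)` with the default `dry_run=True` only lists the files that would be removed, which makes it possible to check the OCR results before passing `dry_run=False`. Images whose text cannot be read are left untouched.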

