I am building a project that reads text from images. I also need to determine the color in which the text is written. The images are computer-generated and always consist of numbers. I am using PyTesseract for OCR. Can anyone suggest how I can do this?

Sample Image

For example, I need my Python code to produce information like: 429.05 Green

My code is below:

import pytesseract
import cv2

pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
img = cv2.imread("D:\\test2.png")
text = pytesseract.image_to_string(img)

print(text)
  • what is the unit for 429.05 . Commented Aug 2, 2020 at 13:23
  • It is text in sample image Commented Aug 2, 2020 at 13:39
  • you can use webcolors to get the color. ref pypi.org/project/webcolors/1.3 Commented Aug 2, 2020 at 13:43
  • Thanks AnonyMouze, can you show with python example. Image would only be having numbers and max possible colors are green, red and black and that too one color for one image Commented Aug 2, 2020 at 13:53
  • Requesting libraries/software is off-topic for StackOverflow. Commented Aug 2, 2020 at 13:57

2 Answers

This could be done with the Pillow library.

First import the required library and use the getcolors method to obtain the color palette, then sort it by pixel count in ascending order.

from PIL import Image
i = Image.open("D:\\test2.png")

colors = sorted(i.getcolors())

For your image, colors is now a list of tuples: the first item in each tuple is the number of pixels of that colour, and the second item is another tuple holding the RGB colour code.

The last item in the list is that with the most pixels (white):

>>> colors[-1]
(2547, (255, 255, 255))

The second-to-last item is probably the colour you want:

>>> colors[-2]
(175, (76, 175, 80))

This can then be converted to a hex code:

>>> '#%02x%02x%02x' % colors[-2][1]
'#4caf50'

You can quickly confirm this with a web-based hex picker:

calculated color

This looks correct for your test image, but you may need to tweak slightly if the images you are working on vary.
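If the images do vary, exact colour values may shift slightly (for example through anti-aliasing), so matching against the nearest known colour can be more forgiving than comparing exact values. The following is a minimal sketch, not a definitive implementation: the helper names and the REFERENCE_COLORS table are hypothetical, and the reference values are simply the hex codes reported in this thread (#4caf50, #91949a, #ff9d9d). It also passes maxcolors to getcolors, because getcolors returns None when an image has more distinct colours than maxcolors (256 by default).

```python
from PIL import Image

# Assumed reference palette, taken from the hex codes reported in this thread.
REFERENCE_COLORS = {
    "Green": (76, 175, 80),    # #4caf50
    "Black": (145, 148, 154),  # #91949a
    "Red": (255, 157, 157),    # #ff9d9d
}


def second_most_common_color(img):
    """Return the RGB tuple of the second most frequent colour in img."""
    rgb = img.convert("RGB")  # normalise palette/RGBA images to RGB tuples
    # Allow one entry per pixel so getcolors() never returns None.
    counts = rgb.getcolors(maxcolors=rgb.width * rgb.height)
    if len(counts) < 2:
        raise ValueError("image contains only one colour")
    counts.sort(key=lambda item: item[0])  # ascending by pixel count
    return counts[-2][1]


def nearest_color_name(rgb, palette=REFERENCE_COLORS):
    """Return the palette name with the smallest squared RGB distance to rgb."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, palette[name]))
    return min(palette, key=distance)


# Synthetic stand-in for a real image: white background, 20 "text" pixels
# in a slightly off green, and 5 lighter edge pixels.
demo = Image.new("RGB", (10, 10), (255, 255, 255))
for x in range(10):
    demo.putpixel((x, 0), (80, 170, 85))
    demo.putpixel((x, 1), (80, 170, 85))
for x in range(5):
    demo.putpixel((x, 2), (160, 210, 165))

text_rgb = second_most_common_color(demo)
print(text_rgb, nearest_color_name(text_rgb))
```

The assumption that the text is the second most common colour is inherited from the steps above; it holds only when the background dominates and the text colour outnumbers any edge-blending shades.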


Thanks to all for the support. I cropped the image to the region containing the first character, then applied the steps suggested by @v25. The code is below.

import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
img = Image.open("D:\\test1.png")

# image_to_boxes returns one "char left bottom right top page" line per
# character, measured from the bottom-left corner; use the first character.
first_box = pytesseract.image_to_boxes(img).splitlines()[0].split(" ")
left, bottom, right, top = (int(v) for v in first_box[1:5])
# PIL measures y from the top, so flip the vertical coordinates; pad by 8 px.
im_crop = img.crop((left, img.height - top - 8, right, img.height - bottom + 8))
colors = sorted(im_crop.getcolors())
hex_code = '#%02x%02x%02x' % colors[-2][1]
# Hex codes observed in the sample images.
names = {'#91949a': "Black", '#4caf50': "Green", '#ff9d9d': "Red"}
color = names.get(hex_code, "Unknown")
number = pytesseract.image_to_string(img).strip()
print("Number is: " + number + " Color is: " + color)
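One detail worth checking when cropping from image_to_boxes output: Tesseract's box format lists each character as "char left bottom right top page" with the origin at the bottom-left of the image, while PIL's crop expects (left, upper, right, lower) measured from the top-left. A rough sketch of the conversion is shown here; the function name tesseract_boxes_to_pil and its pad parameter are hypothetical, and the sample string is made up so the snippet runs without Tesseract installed.

```python
def tesseract_boxes_to_pil(box_text, image_height, pad=0):
    """Convert image_to_boxes text into (char, (left, upper, right, lower)) pairs.

    Vertical coordinates are flipped from Tesseract's bottom-left origin to
    PIL's top-left origin, padded by `pad` pixels and clamped to the image.
    """
    boxes = []
    for line in box_text.splitlines():
        # Split from the right so the five numeric fields are always separated.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(v) for v in parts[1:5])
        upper = max(image_height - top - pad, 0)
        lower = min(image_height - bottom + pad, image_height)
        boxes.append((char, (left, upper, right, lower)))
    return boxes


# Made-up box text for a 100-pixel-high image containing "42".
sample = "4 10 20 30 50 0\n2 35 20 55 50 0\n"
boxes = tesseract_boxes_to_pil(sample, image_height=100, pad=8)
print(boxes)
```

Each resulting tuple can be passed directly to img.crop(...), so the colour of every character could be checked rather than only the first.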
