How do I read color of text from image in Python

Question

I am building a project that reads text from images. I also need to determine the color in which the text is written. The images are computer generated and always consist of numbers. I am using PyTesseract for OCR. Can anyone suggest how I can do this?

Sample Image

For example, I need my Python code to produce something like: 429.05 Green

My code is below:

import pytesseract
import cv2

pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
img = cv2.imread("D:\\test2.png")
text = pytesseract.image_to_string(img)

print(text)

You can use webcolors to get the color. Ref: pypi.org/project/webcolors/1.3
– AnonyMouze, Commented Aug 2, 2020 at 13:43
Thanks AnonyMouze, can you show a Python example? The image will contain only numbers, the only possible colors are green, red and black, and each image uses a single color.
– Gaurav Shah, Commented Aug 2, 2020 at 13:53
Requesting libraries/software is off-topic for StackOverflow.
– DisappointedByUnaccountableMod, Commented Aug 2, 2020 at 13:57

v25 · Accepted Answer · 2020-08-02 14:11:54Z

This could be done with the Pillow library.

First import the library and use the getcolors method to obtain the color palette, sorted by pixel count in ascending order.

from PIL import Image
i = Image.open("D:\\test2.png")

colors = sorted(i.getcolors())

For your image colors is now a list of tuples, where the first item in each tuple is the number of pixels containing said colour, and the second item is another tuple indicating the RGB colour code.
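To make the shape of that list concrete, here is a minimal sketch on a synthetic image. The 4x4 image and its pixel values are made up for illustration and are not the question's image. One thing worth knowing: getcolors returns None when the image has more distinct colours than maxcolors (256 by default), so the sketch passes the total pixel count as a safe upper bound.

```python
from PIL import Image

# Hypothetical 4x4 white image with three green "text" pixels.
img = Image.new("RGB", (4, 4), (255, 255, 255))
for xy in [(0, 0), (1, 0), (2, 0)]:
    img.putpixel(xy, (76, 175, 80))

# maxcolors = total pixel count, so getcolors() cannot return None here.
colors = sorted(img.getcolors(maxcolors=img.width * img.height))
print(colors)  # [(3, (76, 175, 80)), (13, (255, 255, 255))]
```

The list for a real image is longer but has the same (count, (R, G, B)) shape.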

The last item in the list is the one with the most pixels (white):

>>> colors[-1]
(2547, (255, 255, 255))

The second-to-last item is probably the colour you want:

>>> colors[-2]
(175, (76, 175, 80))

This can then be converted to a hex code:

>>> '#%02x%02x%02x' % colors[-2][1]
'#4caf50'

You can quickly confirm this with a web-based hex picker.

This looks correct for your test image, but you may need to tweak it slightly if the images you are working with vary.
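If the images do vary, one possible tweak is to wrap the steps above in a small helper. This is only a sketch under two assumptions: the most common colour is the background, and the runner-up is the text. The function name text_color_hex and the synthetic demo image are hypothetical, not part of the answer.

```python
from PIL import Image


def text_color_hex(img):
    """Return the second most common colour of img as '#rrggbb'.

    Assumes the most common colour is the background and the
    runner-up is the text.
    """
    # Normalise palette or RGBA images so colours come back as RGB tuples.
    rgb = img.convert("RGB")
    # Raise the limit so getcolors() does not return None on busy images.
    counts = rgb.getcolors(maxcolors=rgb.width * rgb.height)
    if len(counts) < 2:
        raise ValueError("image has only one colour; no text found")
    counts.sort()
    return "#%02x%02x%02x" % counts[-2][1]


# Synthetic stand-in for a real image (hypothetical data).
demo = Image.new("RGB", (10, 10), (255, 255, 255))
for x in range(2, 8):
    demo.putpixel((x, 5), (76, 175, 80))
print(text_color_hex(demo))  # #4caf50
```

Anti-aliased text can spread the glyph colour over several nearby shades, so the runner-up may be an edge shade rather than the exact text colour; treat the result as an approximation.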

Gaurav Shah · Accepted Answer · 2020-08-02 15:44:45Z

Thanks to all for the support. I cropped the image to the first character and then applied the steps suggested by @v25. The code is below.

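import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
img = Image.open("D:\\test1.png")

# image_to_boxes returns one line per character: "char left bottom right top page".
# Only the first character's box is used here.
text = pytesseract.image_to_boxes(img).split(" ")
# Pad the box by 8 pixels vertically before cropping.
# Note: Tesseract measures y from the bottom of the image while crop() measures
# from the top, so this crop may need flipping if the text is not vertically centred.
(left, upper, right, lower) = (int(text[1]), int(text[2]) - 8, int(text[3]), int(text[4]) + 8)
im_crop = img.crop((left, upper, right, lower))
# The most common colour is the background; the runner-up is taken as the text colour.
colors = sorted(im_crop.getcolors())
hex_code = '#%02x%02x%02x' % colors[-2][1]  # renamed from "hex" to avoid shadowing the builtin
color = None
if hex_code == '#91949a':
    color = "Black"
elif hex_code == '#4caf50':
    color = "Green"
elif hex_code == '#ff9d9d':
    color = "Red"
number = pytesseract.image_to_string(img).strip()
# An f-string also works when color is still None (no match).
print(f"Number is: {number} Color is: {color}")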
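The exact hex comparison above only matches when the crop contains precisely those shades. A more forgiving variant, sketched here as an assumption rather than as part of the answer, picks the closest of the three reference colours by squared RGB distance. The reference values are the hex codes from the code above; the name nearest_color_name is hypothetical.

```python
# Reference colours taken from the hex codes in the answer's code.
REFERENCE = {
    "Black": (0x91, 0x94, 0x9A),
    "Green": (0x4C, 0xAF, 0x50),
    "Red": (0xFF, 0x9D, 0x9D),
}


def nearest_color_name(rgb):
    """Return the name of the reference colour closest to rgb."""
    def dist2(ref):
        # Squared Euclidean distance in RGB space.
        return sum((a - b) ** 2 for a, b in zip(rgb, ref))
    return min(REFERENCE, key=lambda name: dist2(REFERENCE[name]))


print(nearest_color_name((76, 175, 80)))    # Green (exact match)
print(nearest_color_name((250, 150, 150)))  # Red (slightly off the reference)
```

With the earlier code, this could be called as nearest_color_name(colors[-2][1]) in place of the if/elif chain. Plain RGB distance is a rough measure of similarity, but it is likely adequate for three well-separated colours.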
