24

I have an .xls Excel file with cells having background colors. I am reading that file into pandas with read_excel. Is there any way to get the background colors of cells?

2
  • You may want to consider dropping down into xlrd (which Pandas uses) to get the colour information. Commented Dec 17, 2017 at 16:28
  • Will try and get back here if I get lucky! Commented Dec 17, 2017 at 16:30

4 Answers

15

The solution suggested above works only for .xls files, not for .xlsx files. With an .xlsx file, xlrd raises NotImplementedError: formatting_info=True not yet implemented. The xlrd library has not been updated to support this for .xlsx files, so you would have to Save As and change the format every time, which may not work for you.
Here is a solution for .xlsx files using the openpyxl library. A2 is the cell whose color code we need to find.

from openpyxl import load_workbook
excel_file = 'color_codes.xlsx'
wb = load_workbook(excel_file, data_only=True)
sh = wb['Sheet1']
# For a plain RGB fill this is an 8-character aRGB hex string, e.g. 'FFFF0000'
color_in_hex = sh['A2'].fill.start_color.index
print('HEX =', color_in_hex)
# Skip the two leading alpha characters, then read the R, G and B pairs
print('RGB =', tuple(int(color_in_hex[i:i+2], 16) for i in (2, 4, 6)))

4 Comments

In my case this approach was giving inaccurate results (i.e. not capturing colors correctly).
I checked it for my project, it worked just fine. Can you share your example ?
Unfortunately excel has multiple ways of colouring in cells. This only finds one of them.
This was useful openpyxl.readthedocs.io/en/stable/… -- for understanding non-hex (indexed) results which were labeled in this link as legacy support.
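
As the comments note, the value returned by fill.start_color.index is not always a hex string: it can also be an integer for indexed (legacy) or theme colors. A minimal sketch of a more defensive reading follows, assuming the color object exposes `.type` and `.value` the way openpyxl's Color objects do. The helper names `argb_to_rgb` and `describe_color` are hypothetical, not part of openpyxl.

```python
def argb_to_rgb(argb):
    """Convert an 8-character aRGB hex string such as 'FFFF0000' to (R, G, B)."""
    if not isinstance(argb, str) or len(argb) != 8:
        raise ValueError(f"expected an 8-character aRGB string, got {argb!r}")
    # Characters 0-1 are the alpha channel; the colour itself starts at offset 2.
    return tuple(int(argb[i:i + 2], 16) for i in (2, 4, 6))


def describe_color(color):
    """Return (type, value) for a colour object, decoding RGB colours.

    `color` is anything exposing `.type` and `.value` (assumption: this is
    how openpyxl Color objects such as cell.fill.start_color behave).
    """
    if color.type == "rgb":
        return "rgb", argb_to_rgb(color.value)
    # Indexed and theme colours are integers that need a palette or the
    # workbook theme to resolve, so report them instead of guessing.
    return color.type, color.value
```

With the workbook from the answer, this would be called as describe_color(sh['A2'].fill.start_color). Resolving indexed or theme values to actual RGB is left out here because it depends on the workbook's palette and theme.
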
13

Brute-forced it through xlrd, as per Mark's suggestion:

import numpy as np
import pandas as pd
from xlrd import open_workbook
wb = open_workbook('wb.xls', formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
# create an empty color mask matrix
bgcol = np.zeros([sheet.nrows, sheet.ncols])
# cycle through all cells to get their colors
for row in range(sheet.nrows):
    for column in range(sheet.ncols):
        cell = sheet.cell(row, column)
        fmt = wb.xf_list[cell.xf_index]
        bgcol[row, column] = fmt.background.background_colour_index
# return a pandas mask of colors
colormask = pd.DataFrame(bgcol)

Yet there must be a better way through pandas directly...

3 Comments

Unfortunately formatting_info=True only works on .xls but not on .xlsx files :-( A good hint anyway - thanks!
Unfortunately it does not work properly for me with xls either. I have several background colours and the matrix results in all cells having colour index 64..
@JoePhi Please look at my solution below. It works for XLSX file. It gives the color code, both in Hex and RGB tuple.
6

Improving on Sumit's answer (which should be the accepted one in my opinion), you can obtain the colors for a whole column by using a list comprehension:

from openpyxl import load_workbook
excel_file = 'my_file.xlsx'
wb = load_workbook(excel_file, data_only=True)
sh = wb['my_sheet']
# extract the fill color of every cell in column A
color_in_hex = [cell.fill.start_color.index for cell in sh['A:A']]
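
Since the question is about pandas, the same comprehension can be extended to the whole sheet and wrapped in a DataFrame. The sketch below is only illustrative: `fill_color_grid` is a hypothetical helper name, and it assumes each cell exposes fill.start_color.index as in the snippet above.

```python
def fill_color_grid(rows):
    """Build a list of lists holding each cell's fill colour value.

    `rows` is an iterable of rows of cells, e.g. sh.iter_rows() from
    openpyxl. Each entry is whatever cell.fill.start_color.index returns:
    an aRGB string for RGB fills, or an integer for indexed/theme colours.
    """
    return [[cell.fill.start_color.index for cell in row] for row in rows]


# Usage sketch (assumes openpyxl and pandas are installed, and that
# 'my_file.xlsx' contains a sheet named 'my_sheet'):
#
#   import pandas as pd
#   from openpyxl import load_workbook
#   sh = load_workbook('my_file.xlsx')['my_sheet']
#   colors = pd.DataFrame(fill_color_grid(sh.iter_rows()))
```

Note that read_excel treats the first row as the header by default, so row 0 of this color frame would correspond to the header row rather than to the first data row.
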


1

I edited the code snippet from @csaladenes's response above based on this link, and it works for my xls file (the original resulted in all cells showing the same color index, though they have different background colors):

import xlrd
import numpy as np
file = 'wb.xls'  # placeholder: path to your .xls workbook
wb = xlrd.open_workbook(file, formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
bgcol = np.zeros([sheet.nrows, sheet.ncols])
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        # look up the cell's extended format record
        cif = sheet.cell_xf_index(row, col)
        iif = wb.xf_list[cif]
        # use the pattern colour index rather than the background colour index
        cbg = iif.background.pattern_colour_index
        bgcol[row, col] = cbg
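
The matrix above holds palette indices, not colors. A minimal sketch of turning those indices into RGB tuples follows, assuming the workbook's colour_map attribute (a dict from colour index to an (R, G, B) tuple or None, populated by xlrd when formatting_info=True). The helper name `indices_to_rgb` is hypothetical.

```python
def indices_to_rgb(index_rows, colour_map):
    """Translate a grid of colour indices into (R, G, B) tuples.

    `colour_map` is a dict shaped like xlrd's Book.colour_map:
    index -> (R, G, B), or None for entries without a fixed colour.
    Indices missing from the map also come back as None.
    """
    # int() also accepts the floats stored in a NumPy array of zeros.
    return [[colour_map.get(int(idx)) for idx in row] for row in index_rows]


# Usage sketch with the variables from the snippet above:
#
#   rgb = indices_to_rgb(bgcol, wb.colour_map)
```

Cells that come back as None have no fixed RGB value in the map, which may be what the system default indices such as 64 correspond to.
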
