I have several Excel files that store a lot of information in cell comments. For example, one cell has the value 2, and the comment attached to it says "2008:2#2009:4". It seems that the value 2 is for the current year (2010), while the comment keeps the values for all previous years, separated by '#'. I would like to build a dictionary that holds all of this, like {2008:2, 2009:4, 2010:2}, but I don't know how to read (or parse) the comment attached to a cell. Does any Python module for reading Excel files support reading comments?
3 Answers
You can do this without an Excel COM object using openpyxl:
from openpyxl import load_workbook

workbook = load_workbook('/tmp/data.xlsx')
# Take the first sheet; sheetnames replaces the deprecated get_sheet_names().
worksheet = workbook[workbook.sheetnames[0]]
for row in worksheet.iter_rows():
    for cell in row:
        if cell.comment:
            print(cell.comment.text)
The comment text itself can then be parsed the same way as in Steven Rumbalski's answer.
(example adapted from here)
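If you want the comments collected rather than printed, something like the following might work. It is only a sketch: `comments_by_cell` and `comments_in_workbook` are names I made up, and the first function just assumes each cell exposes `.coordinate` and `.comment` (with `.comment.text`) the way openpyxl cells do.

```python
def comments_by_cell(rows):
    """Map cell coordinates (e.g. 'A1') to their comment text.

    `rows` is any iterable of rows of cell-like objects that have a
    `.coordinate` and a `.comment` attribute (None when there is no comment).
    """
    found = {}
    for row in rows:
        for cell in row:
            if cell.comment:
                found[cell.coordinate] = cell.comment.text
    return found


def comments_in_workbook(path):
    """Hypothetical convenience wrapper: comments of the first sheet in `path`."""
    # openpyxl is a third-party package; it is imported here so the helper
    # above stays usable without it.
    from openpyxl import load_workbook

    worksheet = load_workbook(path).worksheets[0]
    return comments_by_cell(worksheet.iter_rows())
```

Calling `comments_in_workbook('/tmp/data.xlsx')` should then give you a dictionary of raw comment strings keyed by cell, which you can parse afterwards.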
Normally for reading from Excel, I would suggest using xlrd, but xlrd does not support comments. So instead use the Excel COM object:
from win32com.client import Dispatch
xl = Dispatch("Excel.Application")
xl.Visible = True
wb = xl.Workbooks.Open("Book1.xls")
sh = wb.Sheets("Sheet1")
comment = sh.Cells(1,1).Comment.Text()
And here's how to parse the comment:
comment = "2008:2#2009:4"
d = {}
for item in comment.split('#'):
    key, val = item.split(':')
    d[int(key)] = int(val)  # store years and values as integers
Often, Excel comments span two lines, with the first line noting who created the comment. If so, your code would look more like this:
comment = """Steven:
2008:2#2009:4"""
# Drop the first line, which holds the author's name.
_, comment = comment.split('\n')
d = {}
for item in comment.split('#'):
    key, val = item.split(':')
    d[int(key)] = int(val)
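To get the exact dictionary from the question, the two variants above could be folded into one helper that also adds the current year's value. This is a rough sketch, not a tested recipe: `parse_year_comment` is a made-up name, and it assumes the data sits on the last non-empty line of the comment and that every year and value is an integer.

```python
def parse_year_comment(text, current_year=None, current_value=None):
    """Turn a comment such as '2008:2#2009:4' into {2008: 2, 2009: 4}.

    Assumptions (mine, not from the question): an optional author line may
    precede the data, and the data is on the last non-empty line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return {}
    history = {}
    for item in lines[-1].split('#'):
        if not item.strip():
            continue  # tolerate a stray leading or trailing '#'
        year, value = item.split(':', 1)
        history[int(year)] = int(value)  # raises ValueError on non-numeric data
    if current_year is not None:
        # The cell's own value belongs to the current year.
        history[current_year] = current_value
    return history


print(parse_year_comment("Steven:\n2008:2#2009:4", 2010, 2))
# {2008: 2, 2009: 4, 2010: 2}
```

Here 2010 and 2 stand in for the current year and the cell's own value; in practice you would pass whatever you read from the cell.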
2 Comments
After running the last code posted here, can you store that information later in a Word document?