Data: I have a fairly large Excel file with more than 20 columns. Each cell contains comments.
Desired Goal: I am trying to read all the comments from column M named 'Engine' from the first row till the last row.
Desired Output: I want to extract all the comments in column M and save them in a list or pandas data frame.
Below is what I tried after reading others' threads:
# load the worksheet for iteration
from win32com.client import Dispatch
xlApp = Dispatch("Excel.Application")
workbook = xlApp.Workbooks.Open('My_large_data_file.xls')
worksheet = workbook.Sheets('Mysheet')
# get the row counts for iteration
from openpyxl import load_workbook
wb = load_workbook('My_large_data_file.xls', read_only=True)
sheet = wb.get_sheet_by_name('Mysheet')
row_count = sheet.max_row
comments = []
# iteration
for i in range(2, row_count + 1):  # first row is column names
    print(i)
    comment = worksheet.Cells(i, 13).Comment.Text()  # Column M = #13
    comments.append(comment)
However, this method only works for cells whose comments are visible by default. If a cell's comment is invisible, it is read as a NoneType, and I get an error like this:
Traceback (most recent call last):
  File "<ipython-input-64-dead2ed27460>", line 5, in <module>
    comment = worksheet.Cells(i, 13).Comment.Text() # Column M = #13
AttributeError: 'NoneType' object has no attribute 'Text'
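The traceback says that Comment itself is None for at least one cell, which is also what Excel's object model returns for a cell that has no comment at all. A minimal sketch of a guard follows, assuming that skipping such cells is acceptable. The helper names comment_text and collect_comments are hypothetical, and the stand-in objects at the bottom are made up so the sketch runs without Excel.

```python
from types import SimpleNamespace


def comment_text(cell):
    """Return the cell's comment text, or None when Comment is None."""
    comment = cell.Comment
    if comment is None:
        return None
    return comment.Text()


def collect_comments(worksheet, column, first_row, last_row):
    """Collect one entry per row so list positions still line up with rows."""
    return [
        comment_text(worksheet.Cells(row, column))
        for row in range(first_row, last_row + 1)
    ]


# Made-up stand-ins that mimic the attributes used above. With win32com,
# the worksheet would come from workbook.Sheets('Mysheet') as in the question.
_fake_cells = {
    2: SimpleNamespace(Comment=SimpleNamespace(Text=lambda: "first comment")),
    3: SimpleNamespace(Comment=None),
}
fake_sheet = SimpleNamespace(Cells=lambda row, col: _fake_cells[row])
result = collect_comments(fake_sheet, 13, 2, 3)
```

Rows without a readable comment end up as None in the list, so they can be counted or inspected afterwards instead of stopping the loop.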
Problem:
1) How can I set all the cells' comments visible so that I can extract them? I am not sure whether this requires running some VBA code from Python.
2) My current code is not efficient, especially since I am dealing with 60+ such Excel files, each containing 70000+ rows. Any suggestions to improve it?
Thanks in advance!
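On the efficiency point, one possibility might be to walk the worksheet's Comments collection instead of visiting every row, since it only contains cells that actually carry a comment and needs no separate row count. This is a sketch under assumptions rather than a tested solution: it relies on each Comment exposing Parent (the cell it is attached to) and Text() through COM, the function name comments_in_column is hypothetical, and the stand-in objects are made up so the code runs without Excel.

```python
from types import SimpleNamespace


def comments_in_column(worksheet, column):
    """Map row number -> comment text for every comment in one column."""
    found = {}
    for comment in worksheet.Comments:
        cell = comment.Parent  # the cell the comment is attached to
        if cell.Column == column:
            found[cell.Row] = comment.Text()
    return found


def _fake_comment(row, column, text):
    # Made-up stand-in that mimics the attributes used above.
    return SimpleNamespace(
        Parent=SimpleNamespace(Row=row, Column=column),
        Text=lambda: text,
    )


fake_sheet = SimpleNamespace(Comments=[
    _fake_comment(2, 13, "first"),
    _fake_comment(5, 2, "other column"),
    _fake_comment(7, 13, "second"),
])
engine_comments = comments_in_column(fake_sheet, 13)
```

The resulting dict is keyed by row number, so it could be passed to pandas.Series(engine_comments) if a pandas object is preferred over a list.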
#####################################
Comments in Excel files can be in one of several states:
1) completely hidden without an indicator (double-clicking displays the comment)
2) hidden with a red indicator (hovering the mouse displays the comment)
3) displayed.
worksheet.Cells(i, j).Comment.Text()
This method works fine for cases #2 and #3, but it does not work for case #1 (hidden without an indicator).
iter_rows method on read-only mode.
.Cells(i, j).Comment.Text() works fine for me even when comments are hidden. What did you do to make them invisible enough for Comment to become None?