I'm trying to extract numbers from two columns of an Excel file and write them into the next columns.
Matching criteria: any five-digit number, whether or not it is prefixed with “PB”.
I’ve limited the length of the number match to five digits, yet a “16” is still extracted (row 2, column D).
How can I improve it? Thank you.
import xlwt, xlrd, re
from xlutils.copy import copy
workbook = xlrd.open_workbook("C:\\Documents\\num.xlsx")
old_sheet = workbook.sheet_by_name("Sheet1")
wb = copy(workbook)
sheet = wb.get_sheet(0)
number_of_ships = old_sheet.nrows
for row_index in range(0, old_sheet.nrows):
    Column_a = old_sheet.cell(row_index, 0).value
    Column_b = old_sheet.cell(row_index, 1).value
    a_b = Column_a + Column_b
    found_PB = re.findall(r"[PB]+(\d{5})", a_b, re.I)
    list_of_numbers = re.findall(r'\d+', a_b)
    for f in found_PB:
        if len(f) == 5:
            sheet.write(row_index, 2, ";".join(found_PB))
    for l in list_of_numbers:
        if len(l) == 5:
            sheet.write(row_index, 3, ";".join(list_of_numbers))
wb.save("C:\\Documents\\num-1.xls")

With \d+, the regex just extracts chunks of one or more digits, so you have not restricted anything. The len(l) == 5 check does not help either, because the write call joins the whole list_of_numbers, including shorter matches such as 16. If you need numbers after PB, write PB, not [PB] (a character class matching either P or B).

PB+ does not work the way you think. It will match PB, PBB, PBBB, PBBBB, etc., and cannot match a number unless it starts with PB (or PBBBB, ...). The + affects only the previous character or group. If you want it to apply to both letters, wrap them in a non-capturing group: (?:PB). Also, + means 1 or more times. You'll probably want * (0 or more times) or even ? (0 or 1 time).
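
A minimal sketch of how the two extractions could be written so that only five-digit numbers are ever matched. The names PB_NUMBER, FIVE_DIGITS and extract_numbers, as well as the sample string, are made up for illustration and do not come from the question. The lookarounds (?<!\d) and (?!\d) are one possible way to reject longer digit runs; the spreadsheet reading and writing is left out.

```python
import re

# "PB" (any letter case) followed by exactly five digits.
# (?!\d) rejects matches that are only the start of a longer digit run.
PB_NUMBER = re.compile(r"PB(\d{5})(?!\d)", re.IGNORECASE)

# Exactly five digits, with no digit directly before or after them.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")


def extract_numbers(text):
    """Return (numbers that follow PB, all five-digit numbers) found in text."""
    return PB_NUMBER.findall(text), FIVE_DIGITS.findall(text)


# Hypothetical cell content, standing in for Column_a + Column_b.
sample = "PB12345 ref 16, 67890 and 1234567"
pb_numbers, five_digit_numbers = extract_numbers(sample)
print(";".join(pb_numbers))          # 12345
print(";".join(five_digit_numbers))  # 12345;67890
```

Because the patterns themselves enforce the length, the len(...) == 5 checks become unnecessary, and each joined result can be written to its cell once per row.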