I am trying to parse information from XML files using Python's xml module. The problem is that when I pass a list of files and start parsing, the first file is (supposedly) parsed successfully, but then I get the following error:
Parsing 20586908.xml ..
Parsing 20586934.xml ..
Traceback (most recent call last):
  File "<ipython-input-72-0efdae22e237>", line 11, in parse
    xmlTree = ET.parse(xmlFilePath, parser = self.parser)
  File "C:\Users\StefanCepa995\miniconda3\envs\dl4cv\lib\xml\etree\ElementTree.py", line 1202, in parse
    tree.parse(source, parser)
  File "C:\Users\StefanCepa995\miniconda3\envs\dl4cv\lib\xml\etree\ElementTree.py", line 601, in parse
    parser.feed(data)
xml.etree.ElementTree.ParseError: parsing finished: line 1755, column 0
Here is the code I am using to parse XML files:
import logging
import os
import traceback
import xml.etree.ElementTree as ET

logger = logging.getLogger(__name__)


class INBreastXMLParser:
    def __init__(self, xmlRootDir):
        # A single parser instance is created here and shared by every parse() iteration
        self.parser = ET.XMLParser(encoding="utf-8")
        # Collect every .xml file under xmlRootDir, recursively
        self.xmlAnnotations = [os.path.join(root, f)
                               for root, dirs, files in os.walk(xmlRootDir)
                               for f in files if f.endswith('.xml')]

    def parse(self):
        for xmlFilePath in self.xmlAnnotations:
            logger.info(f"Parsing {os.path.basename(xmlFilePath)} ..")
            try:
                xmlTree = ET.parse(xmlFilePath, parser = self.parser)
                root = xmlTree.getroot()
            except Exception as err:
                logger.error(f"Could not parse {xmlFilePath}. Reason - {err}")
                traceback.print_exc()
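
A possible explanation, stated as an assumption rather than a confirmed diagnosis: the first file parses and the second one fails, which is consistent with the single `ET.XMLParser` created in `__init__` being reused for every file. `ET.parse` feeds the parser and then closes it, and a closed parser appears not to accept further input, which would match the "parsing finished" message raised from `parser.feed(data)`. A minimal sketch that builds a fresh parser for each file instead (the helper name `parse_all` and its return shape are hypothetical, not part of the code above):

```python
import xml.etree.ElementTree as ET


def parse_all(xml_paths):
    """Parse each file with its own parser and return {path: root element}."""
    roots = {}
    for path in xml_paths:
        # Assumption: an XMLParser is single-use, so create a new one per file
        # instead of sharing one instance across the whole loop.
        parser = ET.XMLParser(encoding="utf-8")
        roots[path] = ET.parse(path, parser=parser).getroot()
    return roots
```

If this assumption holds, the same change in the class would be to drop `self.parser` and construct the parser inside the loop in `parse()`, or to omit the `parser` argument entirely when the default settings are enough.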
And here is a screenshot of the part of the file where parsing fails:
