My S1000D XML file declares a DOCTYPE that references a public URL, and the file at that URL in turn references a number of other files that declare all the valid character entities. I've tried parsing it with both xml.etree.ElementTree and lxml, and both fail with a parse error indicating:
undefined entity &minus;: line 82, column 652
This happens even though &minus; is a valid entity according to the entity files referenced in the DOCTYPE.
The top of the XML file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dmodule [
<!ENTITY % ISOEntities PUBLIC 'ISO 8879-1986//ENTITIES ISO Character Entities 20030531//EN//XML' 'http://www.s1000d.org/S1000D_4-1/ent/ISOEntities'>
%ISOEntities;]>
If you fetch http://www.s1000d.org/S1000D_4-1/ent/ISOEntities, you'll see that it includes 20 other .ent files, one of which, iso-tech.ent, contains the line:
<!ENTITY minus "−"> <!-- MINUS SIGN -->
Line 82 of the XML file, near column 652, contains the following:
...Refer to 70&minus;41...
How can I parse this file in a Python script without getting the undefined entity error?
Sorry, I don't want to set parser.entity['minus'] = chr(0x2212) (for example) for every entity. I did that as a quick fix, but there are many character entity references.
I would like the parser to resolve entity references using the entity declarations that the XML itself references.
I'm surprised, but I've gone around the sun and back and still haven't found how to do this (or maybe I have but couldn't follow it).
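The parser.entity quick fix doesn't have to be typed by hand; the same dictionary can be filled from local copies of the .ent files. A minimal sketch, assuming the files that ISOEntities pulls in have been downloaded into a local directory (ent_dir is a hypothetical name) and that each declaration has the simple form <!ENTITY name "value"> with numeric character references in the value. load_entities, expand_char_refs and parse_with_entities are illustrative helpers, not ElementTree API. As far as I can tell this works because expat never fetches the external parameter entity, so it hands unresolved references such as &minus; back to ElementTree, which looks them up in parser.entity.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Simple general-entity declarations, e.g. <!ENTITY minus "&#x2212;">.
# Parameter entities (<!ENTITY % ...>) and SYSTEM/PUBLIC entities don't match.
ENTITY_DECL = re.compile(r"""<!ENTITY\s+([A-Za-z_][\w.-]*)\s+(["'])(.*?)\2\s*>""", re.DOTALL)
# Numeric character references such as &#x2212; or &#8722;.
CHAR_REF = re.compile(r"&#([xX][0-9A-Fa-f]+|[0-9]+);")


def expand_char_refs(value):
    """Replace numeric character references with the characters they name."""
    def repl(match):
        ref = match.group(1)
        code = int(ref[1:], 16) if ref[0] in "xX" else int(ref)
        return chr(code)
    return CHAR_REF.sub(repl, value)


def load_entities(ent_dir):
    """Map entity name -> replacement text for every *.ent file in ent_dir.

    Assumes the .ent files were downloaded locally and are UTF-8/ASCII.
    """
    entities = {}
    for ent_file in sorted(Path(ent_dir).glob("*.ent")):
        text = ent_file.read_text(encoding="utf-8")
        for name, _quote, value in ENTITY_DECL.findall(text):
            entities[name] = expand_char_refs(value)
    return entities


def parse_with_entities(xml_path, ent_dir):
    """Parse xml_path, resolving entities declared in the local .ent files."""
    parser = ET.XMLParser()
    # Same mechanism as the manual quick fix, just filled in bulk.
    parser.entity.update(load_entities(ent_dir))
    return ET.parse(xml_path, parser=parser)
```

Usage would be something like tree = parse_with_entities(os.path.join(pth, fn), "ent") if the .ent files were saved in a folder called ent. The regex only understands simple declarations, so this is a stopgap rather than real DTD processing.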
If I update my XML file and add
<!ENTITY minus "−">
it doesn't fail, so the XML itself isn't the problem.
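The behaviour described above can be reproduced with the standard library alone. In this sketch the document is a cut-down stand-in for the real data module: it parses when minus is declared inline, but fails with the same undefined entity error when the only declaration sits behind the external parameter entity, since Python's expat-based parser doesn't download external DTD files. The inline declaration is placed before %ISOEntities; on purpose; as far as I know, expat ignores entity declarations that come after an external parameter entity it hasn't read. try_parse is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

EXTERNAL_DECL = (
    "<!ENTITY % ISOEntities PUBLIC "
    "'ISO 8879-1986//ENTITIES ISO Character Entities 20030531//EN//XML' "
    "'http://www.s1000d.org/S1000D_4-1/ent/ISOEntities'>\n%ISOEntities;"
)
BODY = "<dmodule><para>Refer to 70&minus;41</para></dmodule>"


def try_parse(inline_decl=""):
    """Parse a cut-down data module; return the para text or the error message."""
    # The inline declaration goes first: expat appears to skip declarations
    # that follow an external parameter entity it has not read.
    doc = f"<!DOCTYPE dmodule [\n{inline_decl}\n{EXTERNAL_DECL}]>\n{BODY}"
    try:
        return ET.fromstring(doc).find("para").text
    except ET.ParseError as err:
        return f"ParseError: {err}"


print(try_parse())                              # ParseError: undefined entity &minus;: line ..., column ...
print(try_parse('<!ENTITY minus "&#x2212;">'))  # Refer to 70−41
```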
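The failure happens during parsing. Here's the code I use for ElementTree:
import os
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError

fl = os.path.join(pth, fn)
try:
    root = ET.parse(fl)
except ParseError as p:
    print("ParseError : ", p)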
Here's the code I use for lxml:
import os
from lxml import etree

fl = os.path.join(pth, fn)
try:
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    root = etree.parse(fl, parser=parser)
except etree.XMLSyntaxError as pe:
    print("lxml XMLSyntaxError: ", pe)
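For lxml, one likely culprit is that etree.XMLParser defaults to no_network=True, so libxml2 won't download http://www.s1000d.org/S1000D_4-1/ent/ISOEntities even with load_dtd=True. Passing no_network=False may be enough if the server is reachable and the libxml2 build behind lxml still supports HTTP downloads, but a more robust option is to serve local copies through a custom etree.Resolver. The sketch below assumes ISOEntities and the .ent files it includes have been saved under their original file names in one local directory; LocalEntityResolver, parse_s1000d and ent_dir are hypothetical names, and matching on the last segment of the URL is a deliberate simplification so the nested includes resolve whether they are written as absolute or relative URLs.

```python
import os

from lxml import etree


class LocalEntityResolver(etree.Resolver):
    """Serve DTD/entity files from a local directory instead of the network.

    Assumes the files were downloaded beforehand under their original names.
    """

    def __init__(self, ent_dir):
        super().__init__()
        self.ent_dir = ent_dir

    def resolve(self, system_url, public_id, context):
        if not system_url:
            return None
        # Match on the last path segment, e.g. ".../ent/ISOEntities" -> "ISOEntities".
        name = system_url.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]
        local_path = os.path.join(self.ent_dir, name)
        if os.path.isfile(local_path):
            return self.resolve_filename(local_path, context)
        return None  # let lxml fall back to its default loading


def parse_s1000d(xml_path, ent_dir):
    """Parse xml_path, loading the external entity files from ent_dir."""
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    parser.resolvers.add(LocalEntityResolver(ent_dir))
    return etree.parse(xml_path, parser=parser)
```

XML catalogs are the other standard way to map those public identifiers to local files, if you'd rather configure that outside the script.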
I would like the parser to load the entity files referenced by the DOCTYPE, so that it recognizes &minus; and all the other character entities declared in those files as valid.
Thank you so much for your advice and help.