26

I am trying to parse an XML file into an ElementTree:

>>> import xml.etree.cElementTree as ET
>>> tree = ET.ElementTree(file='D:\Temp\Slikvideo\JPEG\SV_4_1_mask\index.xml')

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Anaconda2\lib\xml\etree\ElementTree.py", line 611, in __init__
    self.parse(file)
  File "<string>", line 38, in parse
ParseError: junk after document element: line 3, column 0

The XML file starts like this:

<?xml version="1.0" encoding="UTF-8" ?>
<Version Writer="E:\d\src\Modules\SceneSerialization\src\mitkSceneIO.cpp" Revision="$Revision: 17055 $" FileVersion="1" />
<node UID="OBJECT_2016080819041580480127">
    <source UID="OBJECT_2016080819041550469454" />
    <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
    <properties file="sicaaa" />
</node>
<node UID="OBJECT_2016080819041512769572">
    <source UID="OBJECT_2016080819041598947781" />
    <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
    <properties file="ticaaa" />
</node>

followed by many more nodes.

I do not see any junk in line 3, column 0? I assume there must be another reason for the error.

The .xml file is generated by external software (MITK), so I assumed it should be OK.

Working on Win 7, 64 bit, VS2015, Anaconda

4
  • That XML isn't well-formed. There is no root element that contains all other elements. Commented Aug 9, 2016 at 14:36
  • Unrelated to the question, you should consider either escaping the backslashes in the Windows path string literal ("...\\...") or using raw strings (r"...\..."). Commented Aug 9, 2016 at 14:39
  • @Martin: thanks, agree. Done that in other parts of the code. Commented Aug 9, 2016 at 14:41
  • In my case, the simple solution was embedding the tree caller in a try: ... / except: pass block, for anyone who simply does not care about one out of 100s of files. :)) Commented Nov 20, 2020 at 15:16

3 Answers

41

As @Matthias Wiehl said, ElementTree expects a single root node. Your document has several top-level elements, so it is not well-formed XML, and this should be fixed at its origin. As a workaround, you can add a fake root node to the document:

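import re
import xml.etree.ElementTree as ET

# Read the rootless document as text.
with open("index.xml") as f:
    xml = f.read()

# Insert <root> right after the XML declaration and append </root> at the end,
# so every top-level element becomes a child of a single root element.
tree = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml, count=1) + "</root>")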
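
A variation on the same workaround, as a minimal sketch: it also copes with files that have no XML declaration at all, where the regex substitution above would append `</root>` without ever opening `<root>`. The helper name `parse_with_fake_root` and the abbreviated sample document are assumptions made for illustration and are not part of MITK or ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Matches an optional XML declaration at the very start of the text.
XML_DECLARATION = re.compile(r"\s*<\?xml[^>]*\?>")

def parse_with_fake_root(text, root_tag="root"):
    """Wrap all top-level elements in one fake root and parse the result.

    Hypothetical helper: the declaration (if any) is kept in front,
    everything after it is enclosed in <root_tag>...</root_tag>.
    """
    match = XML_DECLARATION.match(text)
    declaration = match.group(0).strip() if match else ""
    body = text[match.end():] if match else text
    wrapped = "%s<%s>%s</%s>" % (declaration, root_tag, body, root_tag)
    return ET.fromstring(wrapped)

# Abbreviated stand-in for the index.xml from the question.
sample = """<?xml version="1.0" encoding="UTF-8" ?>
<Version FileVersion="1" />
<node UID="OBJECT_1">
    <data type="LabelSetImage" file="a.nrrd" />
</node>
<node UID="OBJECT_2">
    <data type="LabelSetImage" file="b.nrrd" />
</node>
"""

root = parse_with_fake_root(sample)
# Collect the file attribute of each node's <data> child.
files = [node.find("data").get("file") for node in root.findall("node")]
print(files)
```

Since the fake root is only a container, later code can iterate over `root.findall("node")` and ignore the wrapper itself.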

1 Comment

Martin, that's an elegant fix. This works when importing etree.ElementTree. If I use cElementTree, I get an error in cElementTree.py: un(shallow)copyable object of type <type 'Element'>. I need to figure out why.
3

The root node of your document (Version) is opened and closed on line 2. The parser does not expect any nodes after the root node, which is why it reports "junk" at the start of line 3. The solution is to remove the closing forward slash.
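
To see this behaviour in isolation, here is a small sketch that reproduces the error with a three-line document of the same shape. The document content is an abbreviated stand-in for the real file, not a copy of it.

```python
import xml.etree.ElementTree as ET

# Line 2 holds a complete, self-closed root element;
# line 3 starts a second top-level element.
doc = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n'
    '<Version FileVersion="1" />\n'
    '<node UID="OBJECT_1" />\n'
)

try:
    ET.fromstring(doc)
    error = None
except ET.ParseError as exc:
    error = exc

if error is not None:
    # ParseError carries the message plus a (line, column) position.
    print(error)
    print(error.position)
```

The reported position points at the first element after the root, not at any stray characters, which matches the traceback in the question.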

2 Comments

Assuming I need to parse this file (I cannot generate a different format), what would be a quick fix? Copy the file and create a dummy that is properly formatted and then parse that? What should I change? Should I put the closing forward slash at the end of the document?
As was pointed out correctly, the document is not well-formed. The software that generated it is broken. You should file a bug report.
0

Try repairing the document like this: open the Version element on line 2 without self-closing it, and close it at the end of the document.

<?xml version="1.0" encoding="UTF-8" ?>
<Version Writer="E:\d\src\Modules\SceneSerialization\src\mitkSceneIO.cpp" Revision="$Revision: 17055 $" FileVersion="1">
    <node UID="OBJECT_2016080819041580480127">
        <source UID="OBJECT_2016080819041550469454" />
        <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
        <properties file="sicaaa" />
    </node>
    <node UID="OBJECT_2016080819041512769572">
        <source UID="OBJECT_2016080819041598947781" />
        <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
        <properties file="ticaaa" />
    </node>
</Version>
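
If the file cannot be edited by hand, the same repair could be applied in memory before parsing. The following is only a sketch under the assumption that the file contains exactly one self-closed `<Version ... />` element near the top; `enclose_in_version` is a hypothetical helper name and the sample is an abbreviated stand-in for the real file.

```python
import re
import xml.etree.ElementTree as ET

def enclose_in_version(text):
    """Reopen the self-closed <Version ... /> and close it at the very end."""
    # Drop the "/" of the first self-closed Version tag, keeping its attributes.
    opened, count = re.subn(r"(<Version\b[^>]*?)\s*/>", r"\1>", text, count=1)
    if count != 1:
        raise ValueError("no self-closed <Version ... /> element found")
    return opened + "</Version>"

# Abbreviated stand-in for the index.xml from the question.
sample = """<?xml version="1.0" encoding="UTF-8" ?>
<Version FileVersion="1" />
<node UID="OBJECT_1">
    <data type="LabelSetImage" file="a.nrrd" />
</node>
<node UID="OBJECT_2">
    <data type="LabelSetImage" file="b.nrrd" />
</node>
"""

root = ET.fromstring(enclose_in_version(sample))
print(root.tag, len(root.findall("node")))
```

Note that this makes the nodes children of Version, which changes the meaning of the document; wrapping everything in a neutral fake root avoids that.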

1 Comment

This solution certainly works, but doing so is semantically wrong.
