If I try to parse broken XML, the exception shows only the line and column number. Is there a way to show the XML context as well?

I want to see the XML tags before and after the broken part.

Example:

import xml.etree.ElementTree as ET
tree = ET.fromstring('<a><b></a>')

Exception:

Traceback (most recent call last):
  File "tmp/foo.py", line 2, in <module>
    tree = ET.fromstring('<a><b></a>')
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8

Something like this would be nice:

ParseError:
<a><b></a>
=====^

2 Answers

You could make a helper function to do this:

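import sys
import io
import itertools as IT
import xml.etree.ElementTree as ET
PY2 = sys.version_info[0] == 2
# A Python 2 str is a byte string; a Python 3 str is text.
StringIO = io.BytesIO if PY2 else io.StringIO

def myfromstring(content):
    try:
        tree = ET.fromstring(content)
    except ET.ParseError as err:
        lineno, column = err.position
        # lineno counts from 1, so skip lineno - 1 lines to reach the bad one.
        line = next(IT.islice(StringIO(content), lineno - 1, None), '')
        line = line.rstrip('\r\n')
        # Right-align '^' in a field of width `column`, padded with '='.
        caret = '{:=>{}}'.format('^', column)
        err.msg = '{}\n{}\n{}'.format(err, line, caret)
        raise
    return tree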

myfromstring('<a><b></a>')

yields

xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8
<a><b></a>
=======^
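
If you also want the lines before and after the broken one, as the question asks, the same idea can be extended. This is only a minimal sketch: format_parse_error and its context parameter are names made up for illustration, it assumes content is a text string, and it assumes str.splitlines() splits the input into the same lines the parser counted (which may not hold for unusual line separators). The caret uses the same format as the helper above.

```python
import xml.etree.ElementTree as ET


def format_parse_error(content, err, context=2):
    """Return the error message followed by nearby source lines.

    Hypothetical helper: `context` is the number of lines to show
    on each side of the line reported in err.position.
    """
    lineno, column = err.position  # lineno counts from 1
    lines = content.splitlines()
    first = max(lineno - 1 - context, 0)
    last = min(lineno + context, len(lines))
    out = [str(err)]
    for idx in range(first, last):
        out.append(lines[idx])
        if idx == lineno - 1:
            # Caret line goes directly under the offending line.
            out.append('{:=>{}}'.format('^', column))
    return '\n'.join(out)


broken = '<root>\n  <a>\n    <b>\n  </a>\n</root>'
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print(format_parse_error(broken, err, context=1))
```

If the reported line lies past the end of the input (for example, an unclosed element at end of file), the sketch just prints the last available lines without a caret.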
4 Comments

up-vote for using err.position didn't know about that.
@KobiK: I didn't know either, but a good introspection tool, like IPython, comes in handy to discover what's available in objects like err.
Thank you very much. Works nice.
In Python 3, you will need to replace io.BytesIO with io.StringIO.
It's not the best option, but it's simple: parse the ParseError message, extract the line and column, and use them to show where the problem is.

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError
my_string = '<a><b><c></b></a>'
try:
    tree = ET.fromstring(my_string)
except ParseError as e:
    # The message has the form "mismatched tag: line 1, column 8".
    formatted_e = str(e)
    line = int(formatted_e[formatted_e.find("line ") + 5: formatted_e.find(",")])
    column = int(formatted_e[formatted_e.find("column ") + 7:])
    split_str = my_string.split("\n")
    # print() with a single argument works on both Python 2 and Python 3.
    print("{}\n{}^".format(split_str[line - 1], len(split_str[line - 1][0:column])*"-"))

Note: splitting on \n is just for this example; split your input in whatever way matches its line endings (str.splitlines() is one option).
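
If you stay with parsing the message, a regular expression is a little less fragile than slicing on find(). The following is only a sketch under assumptions: position_from_message and underline are hypothetical names, and the pattern assumes the message keeps the "line N, column M" form shown in the question's traceback.

```python
import re

# Assumes the "line N, column M" wording seen in the question's traceback.
_POSITION_RE = re.compile(r'line (\d+), column (\d+)')


def position_from_message(message):
    """Return (line, column) parsed from the message, or None if absent."""
    match = _POSITION_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def underline(text, line, column):
    """Return the given line of text with a dashed pointer beneath it."""
    source_line = text.splitlines()[line - 1]
    return '{}\n{}^'.format(source_line, '-' * len(source_line[:column]))


message = 'mismatched tag: line 1, column 8'  # message from the question
line, column = position_from_message(message)
print(underline('<a><b></a>', line, column))
```

Returning None when the pattern is missing lets the caller fall back to printing the plain message instead of failing inside the error handler.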
