1

I am sniffing packets on the network and recovering the XML data from the raw payload using Scapy and Python. The XML data I get has a few tags missing when I assemble the frames.Thus, I cannot parse the XML file using etree.parse() function. Is there any method by which I can parse the broken XML file and use XPATH expressions to traverse and get the data I want.

1 Answer 1

2

I'm sure my solution is far too simple to cover all cases, but it should be able to cover simple cases when closing tags are missing:

>>> def fix_xml(string):
    """
    Tries to insert missing closing XML tags
    """
    error = True
    while error:
        try:
            # Put one tag per line
            string = string.replace('>', '>\n').replace('\n\n', '\n')
            root = etree.fromstring(string)
            error = False
        except etree.XMLSyntaxError as exc:
            text = str(exc)
            pattern = "Opening and ending tag mismatch: (\w+) line (\d+) and (\w+), line (\d+), column (\d+)"
            m = re.match(pattern, text)
            if m:
                # Retrieve where error took place
                missing, l1, closing, l2, c2 = m.groups()
                l1, l2, c2 = int(l1), int(l2), int(c2)
                lines = string.split('\n')
                print 'Adding closing tag <{0}> at line {1}'.format(missing, l2)
                missing_line = lines[l2 - 1]
                # Modified line goes back to where it was
                lines[l2 - 1] = missing_line.replace('</{0}>'.format(closing), '</{0}></{1}>'.format(missing, closing))
                string = '\n'.join(lines)
            else:
                raise
    print string

This seems to add correctly missing tags B and C:

>>> s = """<A>
  <B>
    <C>
  </B>
  <B></A>"""
>>> fix_xml(s)
Adding closing tag <C> at line 4
Adding closing tag <B> at line 7
<A>
  <B>
    <C>
  </C>
</B>
  <B>
</B>
</A>
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.