I have an XML file in which one of the nodes has an '&' inside its text:

<uid>JAMES&001</uid>

Now, when I try to read the whole XML using the following code:

tree = et.parse(fileName)
root = tree.getroot()
ids = root.findall("uid")

I get this error on the line of the above-mentioned node:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 17, column 21

The code works fine on other instances where there is no '&'. I guess it's breaking the string.

Can it be fixed with encoding? How? I searched through other questions but couldn't find an answer.

TIA

    No, you don't have an XML. You have a not-XML. Commented Sep 22, 2022 at 20:11

1 Answer

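You need to sanitize your XML first, since it isn't well-formed: in element text, a literal & is only allowed as the start of an entity reference such as &amp;.

Replace the offending & with &amp; before parsing, for example with .replace("&", "&amp;"). Note that this blanket replacement also rewrites any ampersand that already belongs to an entity (an existing &amp; becomes &amp;amp;), so it is only safe when the file contains no valid entity references.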

One way to use it:

import xml.etree.ElementTree as ET

with open(fileName, 'r') as f:
    read_data = f.read()

# Escape the bare '&' so the parser accepts the document
doc = ET.fromstring(read_data.replace("&", "&amp;"))
print(doc.find('./uid').text)

Output, given your sample, should be

JAMES&001
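
If the file might already contain valid entity references, a narrower replacement may be safer than the blanket .replace(). The following is only a sketch, not a general XML repair tool: the helper name escape_bare_ampersands and the sample string are made up for illustration, and it assumes that any & followed by something that looks like a named or numeric reference (for example &amp; or &#38;) is intentional and should be left alone.

```python
import re
import xml.etree.ElementTree as ET

# Matches an '&' that does NOT start a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(text):
    """Hypothetical helper: escape only ampersands that are not part of a reference."""
    return BARE_AMP.sub("&amp;", text)

# Illustrative sample: one bare '&' and one already-escaped '&amp;'
sample = "<root><uid>JAMES&001</uid><name>A &amp; B</name></root>"

doc = ET.fromstring(escape_bare_ampersands(sample))
print(doc.find("./uid").text)   # JAMES&001
print(doc.find("./name").text)  # A & B
```

This still won't rescue text that merely looks like a reference (something like AT&T; would be left untouched and fail to parse), so fixing whatever produces the file is the better long-term option.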