This question appears related to this one from 2013, but it didn't help me.

I'm about to parse a large (2 GB) XML file with Python 3.5.2 and ElementTree. I'm new to Python, but parsing works well until it reaches a named character entity such as:

<author>Sanjeev Sax&ouml;na</author>

which raises:

test.xml
  File "<string>", line unknown
ParseError: undefined entity &ouml;: line 5, column 19

My code looks something like this:

import xml.etree.ElementTree as etree

for event, elem in etree.iterparse('test_esc.xml'):
    # do something with the node
    pass

What's the best way to deal with this? Parsing the unescaped 'ö' actually works fine:

<author>Sanjeev Saxöna</author>

Is there an easy way to programmatically unescape the whole XML file?

1 Answer

As suggested by the answer linked by Soulaimane Sahmi, I added an inline DTD that declares the missing entities to the XML file. It may not be the best solution, but it works for now.
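
For illustration, here is a minimal sketch of what such an inline DTD can look like. The root element name `records` and the in-memory sample document are assumptions made for this example, and only the `ouml` entity is declared; a real file would need one `<!ENTITY ...>` declaration for each named entity it uses. ElementTree's expat-based parser expands entities declared in the internal DTD subset, so `iterparse` should no longer stop at `&ouml;`:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document (not the original file). The DOCTYPE block
# is the inline DTD; it maps the named entity to a numeric character reference.
XML_WITH_DTD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE records [
  <!ENTITY ouml "&#246;">
]>
<records>
  <author>Sanjeev Sax&ouml;na</author>
</records>
"""

authors = []
# iterparse accepts a filename or a file-like object; by default it
# reports "end" events, so elem.text is complete when we read it.
for event, elem in ET.iterparse(io.BytesIO(XML_WITH_DTD)):
    if elem.tag == "author":
        authors.append(elem.text)
    # Drop the element's content once handled to keep memory use low.
    elem.clear()

print(authors)
```

In the real file, the DOCTYPE block goes right after the XML declaration and before the root element, and the name after `DOCTYPE` conventionally matches the root element.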
