
My job is to parse XML files and retrieve various reports. I also create and edit XML files using etree in Python. Most of the time, I get stuck on files that use custom entities like mdash, nbsp, and so on.

I searched and found one solution in the question "Python ElementTree support for parsing unknown XML entities?"

So I added an entity definition such as <!ENTITY nbsp "&#160;"> and it works, but I have to read each file as a string, add the entity definition to it, and only then carry on with my work.
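For reference, the approach described above can be sketched roughly as follows. This is only a minimal illustration, not the exact code from the linked question: the `ENTITIES` table and the `add_entity_declarations` helper are hypothetical names, and the sketch assumes the input has no DOCTYPE of its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity table; extend it with the names your files use.
ENTITIES = {"nbsp": "&#160;", "mdash": "&#8212;"}

def add_entity_declarations(xml_text, root_tag):
    """Return xml_text with a DOCTYPE whose internal subset declares ENTITIES.

    Assumes xml_text does not already contain a DOCTYPE.
    """
    declarations = "".join(
        '<!ENTITY {} "{}">'.format(name, value)
        for name, value in ENTITIES.items()
    )
    doctype = "<!DOCTYPE {} [{}]>".format(root_tag, declarations)
    # A DOCTYPE must come after the XML declaration, if there is one.
    if xml_text.startswith("<?xml"):
        end = xml_text.index("?>") + 2
        return xml_text[:end] + doctype + xml_text[end:]
    return doctype + xml_text

# The file on disk is untouched; only the in-memory string is changed.
root = ET.fromstring(add_entity_declarations("<doc>a&nbsp;b&mdash;c</doc>", "doc"))
```

Here `root.text` contains the actual no-break space and em dash characters rather than the entity names.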

Is this the only way? If I want to parse XML files with custom entities without adding the definitions to the files, can I do that?

Is there a way to define those entities in the script and then parse the XML files?

  • If the files that you work with contain entity references like &nbsp; but no corresponding entity declarations, then the files are ill-formed and therefore not really XML files. Commented Mar 21, 2018 at 16:11
  • I understand, but I can't help it. I have 100k+ XML files, with a lot more still to come. Adding those entities back to the XML files would be my last resort. I would like to know whether there is anything I can do other than adding them to the files or using the approach I mentioned. Commented Mar 21, 2018 at 16:38
  • This workaround works with lxml but not with ElementTree, unfortunately: stackoverflow.com/a/9128457/407651 Commented Mar 21, 2018 at 16:40
