python xml won't parse

Asked 4 years, 7 months ago

Modified 4 years, 7 months ago

Viewed 97 times

I'm trying to read bulk data from the US Patent and Trademark Office (USPTO). I have tried several XML files from here, and I get the same result each time:

import xml.etree.ElementTree as ET
import re
file = 'ipgb20210105.xml'
tree = ET.parse(file)

yields: "ParseError: junk after document element: line 862, column 0"

I have tried the recommendation to wrap the content in a fake root node, but this doesn't work either:

with open(file) as f:
    xml = f.read()
tree = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml) + "</root>")

yields: "ParseError: not well-formed (invalid token): line 2, column 2"

Any help much appreciated!

asked Apr 15, 2021 at 17:00

Dan Brendel

ipgb20210105.xml is not one big well-formed XML document. It consists of thousands of small XML documents (each with its own XML declaration) squashed together.

– mzjn
Commented Apr 15, 2021 at 17:36
Try Python 3: Split concatenated XML files.

– urznow
Commented Apr 16, 2021 at 8:09
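
To illustrate the point made in the comments, here is a minimal sketch of splitting a file of concatenated XML documents and parsing each piece separately. It assumes that every embedded document starts with its own XML declaration at the beginning of a line, as described above. The helper name iter_xml_documents and the tiny sample data are hypothetical and stand in for the real USPTO file; they are not taken from the question.

```python
import io
import xml.etree.ElementTree as ET


def iter_xml_documents(stream):
    """Yield each embedded XML document from a binary stream as bytes.

    Assumption: every document begins with an XML declaration ("<?xml")
    at the start of a line.
    """
    chunk = []
    for line in stream:
        # A new declaration marks the start of the next document,
        # so emit whatever has been collected so far.
        if line.startswith(b"<?xml") and chunk:
            yield b"".join(chunk)
            chunk = []
        chunk.append(line)
    if chunk:
        yield b"".join(chunk)


# Hypothetical in-memory stand-in for a file such as 'ipgb20210105.xml'.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE doc SYSTEM "doc.dtd" [ ]>\n'
    b'<doc id="1"><title>First</title></doc>\n'
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE doc SYSTEM "doc.dtd" [ ]>\n'
    b'<doc id="2"><title>Second</title></doc>\n'
)

# Each chunk is a well-formed document on its own, so it parses cleanly.
roots = [ET.fromstring(doc) for doc in iter_xml_documents(io.BytesIO(sample))]
ids = [root.get("id") for root in roots]
```

For a real file, the same generator could be fed a handle opened in binary mode, for example open(file, "rb"), so the whole file never has to be held in memory at once. Reading bytes rather than text also leaves the encoding named in each XML declaration for the parser to handle.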
