I'm trying to parse a large file (>100 MB) as described at http://effbot.org/zone/element-iterparse.htm#incremental-parsing

But if the file contains namespaces, lxml fails with this error:

lxml.etree.XMLSyntaxError: Namespace default prefix was not found

It works fine if I remove elem.clear(), but then it uses a lot of memory. Here is an example of the XML file:

<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="NS">
  <offer>
    <type>type1</type>
    <name>name1</name>
  </offer>
</feed>

I'm using lxml 3.2.0, because newer versions segfault after parsing finishes.

  • Could you provide sample code? Commented Mar 25, 2014 at 16:19
  • I've tried lxml>3.3 and all is OK now. Commented Jan 29, 2015 at 9:14

1 Answer

Did you read this? In my experience, parsing 100 MB+ files takes over 2 GB of RAM (e.g. my 160 MB files use up to 4.5 GB). Are you using 64-bit Python?
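
Since no sample code was posted, here is a minimal sketch of the incremental pattern for the XML in the question. It uses the standard library's xml.etree.ElementTree rather than lxml, which is my substitution; lxml.etree.iterparse has the same basic interface, but I have not checked whether this avoids the error on lxml 3.2.0. The helper name iter_offers and the in-memory SAMPLE are made up for illustration. The two points it shows are matching the namespaced tag as "{NS}offer" and calling clear() only on "end" events, once the element is complete.

```python
import io
import xml.etree.ElementTree as ET

# The sample document from the question, held in memory for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="NS">
  <offer>
    <type>type1</type>
    <name>name1</name>
  </offer>
</feed>
"""

# With a default namespace, tags are reported as {namespace-uri}localname.
OFFER_TAG = "{NS}offer"


def iter_offers(source):
    """Yield one dict per <offer> (hypothetical helper, not from the question)."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != OFFER_TAG:
            continue
        # Strip the "{NS}" prefix from child tags to get plain keys.
        yield {child.tag.split("}", 1)[-1]: child.text for child in elem}
        # Free the element's children only after its end tag has been seen.
        elem.clear()


offers = list(iter_offers(io.BytesIO(SAMPLE)))
print(offers)
```

For a real file, pass the file path or an open binary file object instead of the BytesIO. Note that the cleared elements stay attached to the root as empty shells, so for very large files you may also need to clear the root, as the effbot page describes.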
