python lxml iterparse fails on large files containing namespaces

Question

I'm trying to parse a large file (>100 MB) as described at http://effbot.org/zone/element-iterparse.htm#incremental-parsing

But if the file contains namespaces, lxml fails with the error

lxml.etree.XMLSyntaxError: Namespace default prefix was not found

It works fine if I remove elem.clear(), but then it uses a lot of memory. Example XML file:

<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="NS">
  <offer>
    <type>type1</type>
    <name>name1</name>
  </offer>
</feed>

The lxml version is 3.2.0, because newer versions segfault after parsing finishes.
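For reference, here is a minimal sketch of the incremental pattern from the linked article, applied to the sample file above. It is an assumption-laden illustration, not the asker's actual code: it uses the standard library's xml.etree.ElementTree rather than lxml, so it says nothing about how lxml 3.2.0 behaves, and the names SAMPLE, NS and iter_offers are made up for the example. The main point is that with a default namespace the tags have to be matched in Clark notation ({NS}offer), and that the root is cleared after each offer so processed children do not pile up in memory.

```python
import io
import xml.etree.ElementTree as ET  # standard library, used here instead of lxml

# The sample document from the question, inlined so the sketch is self-contained.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="NS">
  <offer>
    <type>type1</type>
    <name>name1</name>
  </offer>
</feed>
"""

# Default namespace of the sample, written in Clark notation.
NS = "{NS}"


def iter_offers(source):
    """Yield one dict per <offer>, clearing the root after each one (hypothetical helper)."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a reference to it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == NS + "offer":
            yield {
                "type": elem.findtext(NS + "type"),
                "name": elem.findtext(NS + "name"),
            }
            # Drop the children processed so far so memory stays bounded.
            root.clear()


offers = list(iter_offers(io.BytesIO(SAMPLE)))
```

For a real file, a path or an open binary file object would be passed to iter_offers instead of the BytesIO wrapper.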

Could you provide sample code?

– maxbublis, Commented Mar 25, 2014 at 16:19
I've tried lxml > 3.3 and everything works now.

– vitalii, Commented Jan 29, 2015 at 9:14

Community · Accepted Answer · 2017-05-23 12:28:48Z

0

Did you read this? In my experience, parsing 100 MB+ files takes over 2 GB of RAM (e.g. with my 160 MB files I'm up to 4.5 GB). Are you using 64-bit Python?

answered Mar 25, 2014 at 15:25 by Stabby
edited May 23, 2017 at 12:28 by CommunityBot
