4

I've been messing around with the lxml library for a little while and maybe I'm not understanding it correctly or I'm missing something but I can't seem to figure out how to edit the file after I catch a certain xpath and then be able to write that back out into xml while I'm parsing element by element.

Say we have this xml as an example:

<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>

What I would like to do while parsing is when I hit that xpath of "/xml/items/pie" is to add an element before pie, so it will turn out like this:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
  </items>
</xml>

That output would need to be done by writing to a file line by line as I hit each tag and edit the xml at certain xpaths. I mean I could have it print the starting tag, the text, the attribute if it exists, and then the ending tag by hard coding certain parts, but that would be very messy and it be nice if there was a way to avoid that if possible.

Here's my guess code at this:

from lxml import etree

path=[]
count=0

context=etree.iterparse(file,events=('start','end'))
for event, element in context:
    if event=='start':
       path.append(element.tag)
       if /'+'/'.join(path)=='/xml/items/pie':
          itemnode=etree.Element('item',id=str(count))
          itemnode.text=""
          element.addprevious(itemnode)#Not the right way to do it of course
          #write/print out xml here.
    else:
        element.clear()
        path.pop()

Edit: Also, I need to run through fairly big files, so I have to use iterparse.

1

3 Answers 3

2

Here's a solution using iterparse(). The idea is to catch all tag "start" events, remember the parent (items) tag, then for every pie tag create an item tag and put the pie into it:

from StringIO import StringIO
from lxml import etree
from lxml.etree import Element

data = """<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""

stream = StringIO(data)
context = etree.iterparse(stream, events=("start", ))

for action, elem in context:
    if elem.tag == 'items':
        items = elem
        index = 1
    elif elem.tag == 'pie':
        item = Element('item', {'id': str(index)})
        items.replace(elem, item)
        item.append(elem)
        index += 1

print etree.tostring(context.root)

prints:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
   </items>
</xml>
Sign up to request clarification or add additional context in comments.

1 Comment

Ah, that's fantastic thank you. Also, didn't know after you run through iterparse you could have it print out like that.
1

There is a more clean way to make modifications you need:

  • iterate over pie elements
  • make an item element
  • use replace() to replace a pie element with item

replace(self, old_element, new_element)

Replaces a subelement with the element passed as second argument.


from lxml import etree
from lxml.etree import XMLParser, Element

data = """<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""


tree = etree.fromstring(data, parser=XMLParser())
items = tree.find('.//items')
for index, pie in enumerate(items.xpath('.//pie'), start=1):
    item = Element('item', {'id': str(index)})
    items.replace(pie, item)
    item.append(pie)

print etree.tostring(tree, pretty_print=True)

prints:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
   </items>
</xml>

4 Comments

Huh, didn't even think about replace that's great. But how would I work that into my code? I need to go through fairly big files so I wouldn't be able to have the document all at once using iterparse.
@Oron well, this is a game changer then.
Yea sorry about that, I should of mentioned why I was using iterparse.
@Oron never mind, see another answer which uses iterparse().
0

I would suggest you to use an XSLT template, as it seems to match better for this task. Initially XSLT is a little bit tricky until you get used to it, if all you want is to generate some output from an XML, then XSLT is a great tool.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.