11

I have this XML file:

<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>270336</currentMemory>
  <vcpu placement='static'>10</vcpu>

Now I want to parse this and fetch its attribute values. For instance, I want to fetch the uuid field. What is the proper way to fetch it in Python?

2 Comments

  • What have you tried? Googling "python xml" yields quite a few really useful results that should point you in the right direction. Commented Sep 5, 2012 at 21:34
  • There are a lot of examples, but none point in the direction I want to go. I want to fetch an attribute's value. The examples I am seeing convert to an XML file or convert from an XML file. Commented Sep 5, 2012 at 21:36

7 Answers

30

Here's an lxml snippet that extracts an attribute as well as element text (your question was a little ambiguous about which one you needed, so I'm including both):

from lxml import etree
doc = etree.parse(filename)  # filename is the path to the XML file

memoryElem = doc.find('memory')
print(memoryElem.text)         # element text
print(memoryElem.get('unit'))  # attribute

You asked (in a comment on Ali Afshar's answer) whether minidom (2.x, 3.x) is a good alternative. Here's the equivalent code using minidom; judge for yourself which is nicer:

import xml.dom.minidom as minidom
doc = minidom.parse(filename)  # filename is the path to the XML file

memoryElem = doc.getElementsByTagName('memory')[0]
print(''.join(node.data for node in memoryElem.childNodes))  # element text
print(memoryElem.getAttribute('unit'))                       # attribute

lxml seems like the winner to me.
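The same find/get calls also exist in the standard library's xml.etree.ElementTree, so here is a minimal sketch that needs no extra install. It assumes the XML from the question with the closing </domain> tag added, and the helper name text_and_attr is made up for illustration. Note that find() with a bare tag name only looks at direct children of the element it is called on and returns None when nothing matches, which is where a "'NoneType' object has no attribute 'text'" error typically comes from.

```python
import xml.etree.ElementTree as ET

# The question's document, with the closing </domain> tag added.
XML = """<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>270336</currentMemory>
  <vcpu placement='static'>10</vcpu>
</domain>"""


def text_and_attr(root, path, attr):
    """Return (element text, attribute value), or None if path matches nothing."""
    elem = root.find(path)
    if elem is None:  # find() returns None instead of raising
        return None
    return elem.text, elem.get(attr)


root = ET.fromstring(XML)

memory = text_and_attr(root, 'memory', 'unit')   # direct child of <domain>
missing = text_and_attr(root, 'type', 'arch')    # <type> is nested, so no match
nested = text_and_attr(root, 'os/type', 'arch')  # spell out the path instead

print(memory, missing, nested)
```

If the element you want is nested, either give the path from the root ('os/type') or search all descendants with './/type'.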

2 Comments

This method is also compatible with xml.etree.ElementTree, which is included with Python 2 and 3.
The first attempt with lxml does not work for me. When I try to access .text, it says that a 'NoneType' object has no attribute 'text'.
13

XML

<data>
    <items>
        <item name="item1">item1</item>
        <item name="item2">item2</item>
        <item name="item3">item3</item>
        <item name="item4">item4</item>
    </items>
</data>

Python:

from xml.dom import minidom
xmldoc = minidom.parse('items.xml')
itemlist = xmldoc.getElementsByTagName('item')
print("Len :", len(itemlist))
# First item only: attribute value, then element text
print("Attribute Name :", itemlist[0].attributes['name'].value)
print("Text :", itemlist[0].firstChild.nodeValue)
# Same two values for every item
for s in itemlist:
    print("Attribute Name :", s.attributes['name'].value)
    print("Text :", s.firstChild.nodeValue)

3

Use etree, probably with lxml:

from lxml import etree

root = etree.XML(MY_XML)  # MY_XML holds the XML document as a string
uuid = root.find('uuid')
print(uuid.text)

0

Other people can tell you how to do it with the Python standard library. I'd recommend my own mini-library, which makes this completely straightforward.

>>> obj = xml2obj.xml2obj("""<domain type='kmc' id='007'>
... <name>virtual bug</name>
... <uuid>66523dfdf555dfd</uuid>
... <os>
... <type arch='xintel' machine='ubuntu'>hvm</type>
... <boot dev='hd'/>
... <boot dev='cdrom'/>
... </os>
... <memory unit='KiB'>524288</memory>
... <currentMemory unit='KiB'>270336</currentMemory>
... <vcpu placement='static'>10</vcpu>
... </domain>""")
>>> obj.uuid
u'66523dfdf555dfd'

http://code.activestate.com/recipes/534109-xml-to-python-data-structure/

0

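I would use lxml and pull the value out with the XPath expression //uuid (XPath is case-sensitive, so the expression has to match the lowercase tag name).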
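A rough sketch of the XPath idea follows. To keep it runnable without installing anything, it uses the limited XPath subset of the standard library's xml.etree.ElementTree rather than lxml, so the paths are written as './/uuid' instead of '//uuid'; with lxml, the element's xpath() method accepts the full '//uuid' form. The XML string is a shortened copy of the question's document with the closing tag added, and the tag is lowercase because XPath matching is case-sensitive.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's document, with </domain> added.
XML = """<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
</domain>"""

root = ET.fromstring(XML)

# Element text: first <uuid> anywhere below the root.
uuid_text = root.findtext('.//uuid')

# Attribute values: one 'dev' per <boot> element, in document order.
boot_devs = [boot.get('dev') for boot in root.findall('.//boot')]

# Predicate: first <type> element that carries an 'arch' attribute.
arch = root.find(".//type[@arch]").get('arch')

# Attributes on the root element itself are read directly.
domain_id = root.get('id')

print(uuid_text, boot_devs, arch, domain_id)
```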

0

The XML above is missing its closing tag, so parsing it will give:

etree parse error: Premature end of data in tag

Correct XML is:

<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>270336</currentMemory>
  <vcpu placement='static'>10</vcpu>
</domain>
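To see the difference in practice, here is a small sketch using the standard library's xml.etree.ElementTree (not lxml, so the error message differs from the one quoted above). The two strings are cut-down versions of the question's XML, and try_parse is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the question's XML: the root element is never closed.
TRUNCATED = "<domain type='kmc' id='007'>\n  <uuid>66523dfdf555dfd</uuid>\n"
# Same text with the closing tag added.
COMPLETE = TRUNCATED + "</domain>"


def try_parse(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        print("parse failed:", exc)
        return None


broken_root = try_parse(TRUNCATED)  # fails: <domain> is still open
fixed_root = try_parse(COMPLETE)    # succeeds

if fixed_root is not None:
    print(fixed_root.findtext('uuid'))
```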

0

You can try parsing it with recover=True, which lets the parser work through broken XML. You can do something like this:

from lxml import etree

parser = etree.XMLParser(recover=True)
tree = etree.parse('your xml file', parser)

I used this recently and it worked for me, so you can try it and see. If you need to do any more complicated XML data extraction, you can take a look at this code I wrote for a project that handles complex XML data extraction.
