I'm writing my first Python script using libxml2 to retrieve data from an XML file. The file looks like the following:

<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>

There are many similar groups in the file. My intention is to get the attributes "name" and "help" and write them, in a different format, to another file. But with the following code I can only get as far as the myGroups1 element.

doc = libxml2.parseFile(cmmfilename)
root2 = doc.children
child = root2.children
while child is not None:
    if not child.isBlankNode():
        if child.type == "element":
            print "\t Element ", child.name, " with ", child.lsCountNode(), "child(ren)"
            print "\t and content ", repr(child.content)
    child = child.next

How can I iterate deeper into the elements and get their attributes? Any help would be deeply appreciated.

  • It looks like you'll have to ask for the children of myGroups1 somehow. Are you aware that you can have loops inside other loops? Let me know if you'd like an example of this. Commented Sep 13, 2013 at 6:20
  • I'm looking for exactly that: a loop that goes deeper until there are no more elements left. It would be a great help if you could provide some examples. Regarding the children of myGroups1, I'm only able to extract the first-level children, i.e. myGrpContents in my example above. Also, I don't know which methods to use to extract their attributes. Commented Sep 13, 2013 at 6:43
  • For Getting Things Done, you probably want to read about XPath. But are you familiar with recursion? That's a common way to do several layers of loops. If not, could you imagine a while loop that keeps asking for a node's children; if it has none, goes on to the next sibling node; and if it has no siblings, asks for parents until one of the parents has a sibling? Commented Sep 13, 2013 at 6:50
  • The problem is that when I try to extract the properties from the children of the root element (by iterating through child.properties in the above code), it throws the error "TypeError: iteration over non-sequence". Should I use some other method to extract the attributes from the children of an XML node? Currently, .properties only works for the root node for me. Meanwhile, can you share the examples you mentioned in your first comment? Commented Sep 13, 2013 at 8:01
  • The code is as follows:

    doc = libxml2.parseFile(filename)
    root2 = doc.children
    child = root2.children
    while child is not None:
        if not child.isBlankNode():
            if child.type == "element":
                print "\t Element ", child.name, " with ", child.lsCountNode(), "child(ren)"
                for property in child.properties:
                    if property.type == "attribute":
                        print property.name, "= ", property.content

    This is actually what throws the error I mentioned in the comment above. Commented Sep 13, 2013 at 8:03
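The non-recursive walk described in the comment above (go to a node's children first, then to the next sibling, then climb parents until one of them has a sibling) might look roughly like the sketch below. It is a minimal sketch, assuming only the libxml2 Python node attributes `children`, `next`, `parent`, `type` and `prop()`; the names `walk_iteratively` and `name_help_pairs` are hypothetical helpers, not part of libxml2.

```python
def walk_iteratively(start):
    """Yield `start` and every node below it in document order, without recursion.

    Assumes libxml2-style nodes: `.children` is the first child (or None),
    `.next` is the next sibling (or None) and `.parent` is the parent node.
    """
    node = start
    while node is not None:
        yield node
        if node.children is not None:
            # 1. Go down to the first child if there is one.
            node = node.children
            continue
        # 2. No children: climb until we reach a node that has a next sibling,
        #    but never climb above (or step sideways from) the start node.
        while node is not start and node.next is None:
            node = node.parent
        if node is start:
            return
        # 3. Move on to that sibling.
        node = node.next


def name_help_pairs(start):
    """Collect (name, help) attribute pairs from every element at or under `start`."""
    pairs = []
    for node in walk_iteratively(start):
        # prop() returns None when the attribute is absent, so text nodes and
        # elements without a "name" attribute are skipped.
        if node.type == "element" and node.prop("name") is not None:
            pairs.append((node.prop("name"), node.prop("help")))
    return pairs
```

With the question's code, `name_help_pairs(doc.children)` would start at the myGroups1 element and descend through every group below it, without ever stepping to that element's own siblings.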

2 Answers

The question "python. how to get attribute value with libxml2" is probably the kind of answer you're looking for.

When faced with a problem like this, and I'd rather not read the docs for some reason, I find that exploring a library interactively helps. I suggest using an interactive Python REPL (I like bpython) to try this. Here's the session in which I came up with a solution:

>>> import libxml2
>>> xml = """<myGroups1>
... <myGrpContents name="ABC" help="abc_help">
...      <myGrpKeyword name="abc1" help="help1"/>
...      <myGrpKeyword name="abc2" help="help2"/>
...      <myGrpKeyword name="abc3" help="help3"/>
... </myGrpContents>
... </myGroups1>"""
>>> tree = libxml2.parseMemory(xml, len(xml)) # I found this method by looking through `dir(libxml2)`
>>> tree.children
<xmlNode (myGroups1) object at 0x10aba33b0>
>>> a = tree.children
>>> a
<xmlNode (myGroups1) object at 0x10a919ea8>
>>> a.children
<xmlNode (text) object at 0x10ab24368>
>>> a.properties
>>> b = a.children
>>> b.children
>>> b.properties
>>> b.next
<xmlNode (myGrpContents) object at 0x10a921290>
>>> b.next.content
'\n     \n     \n     \n'
>>> b.next.next.content
'\n'
>>> b.next.next.next.content
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'content'
>>> b.next.next.next
>>> b.next.properties
<xmlAttr (name) object at 0x10aba32d8>
>>> b.next.properties.children
<xmlNode (text) object at 0x10ab40f38>
>>> b.next.properties.children.content
'ABC'
>>> b.next.properties.children.name
'text'
>>> b.next.properties.next
<xmlAttr (help) object at 0x10ab40fc8>
>>> b.next.properties.next.name
'help'
>>> b.next.properties.next.content
'abc_help'
>>> list(tree)
[<xmlDoc (None) object at 0x10a921248>, <xmlNode (myGroups1) object at 0x10aba32d8>, <xmlNode (text) object at 0x10aba3878>, <xmlNode (myGrpContents) object at 0x10aba3d88>, <xmlNode (text) object at 0x10aba3950>, <xmlNode (myGrpKeyword) object at 0x10aba3758>, <xmlNode (text) object at 0x10aba3320>, <xmlNode (myGrpKeyword) object at 0x10aba3f38>, <xmlNode (text) object at 0x10aba3560>, <xmlNode (myGrpKeyword) object at 0x10aba3998>, <xmlNode (text) object at 0x10aba33f8>, <xmlNode (text) object at 0x10aba38c0>]
>>> good = list(tree)[5]
>>> good.properties
<xmlAttr (name) object at 0x10aba35f0>
>>> good.prop('name')
'abc1'
>>> good.prop('help')
'help1'
>>> good.prop('whoops')
>>> good.hasProp('whoops')
>>> good.hasProp('name')
<xmlAttr (name) object at 0x10ab40ef0>
>>> good.hasProp('name').content
'abc1'
>>> for thing in tree:
...     if thing.hasProp('name') and thing.hasProp('help'):
...         print thing.prop('name'), thing.prop('help')
...         
...     
... 
ABC abc_help
abc1 help1
abc2 help2
abc3 help3

Because it's bpython, I cheated a little: it has a rewind key, so I actually mistyped more than this, but otherwise the session above is pretty close to what I typed.
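To turn the last loop of that session into something the question's script could use (writing the values in a different format to another file), a minimal sketch might look like this. It assumes, as the session shows, that iterating a parsed libxml2 document visits every node depth-first and that `hasProp()` returns None for a missing attribute; the function name `write_name_help` and the tab-separated output format are hypothetical choices.

```python
def write_name_help(nodes, out):
    """Write one "name<TAB>help" line per element that has both attributes.

    `nodes` is any iterable of libxml2 nodes (e.g. the parsed document itself);
    `out` is any object with a write() method, such as a file opened for writing.
    Returns the number of lines written.
    """
    written = 0
    for node in nodes:
        # Skip the document node and text nodes; only elements carry these attributes.
        if node.type != "element":
            continue
        # hasProp() returns None (falsy) when the attribute is missing.
        if node.hasProp("name") and node.hasProp("help"):
            out.write("%s\t%s\n" % (node.prop("name"), node.prop("help")))
            written += 1
    return written
```

With a real file this would be called as something like `write_name_help(libxml2.parseFile(cmmfilename), out)`, where `out` is a file opened for writing.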

2 Comments

Thanks a lot :) In the meantime, after going through the XPath usage, I found another solution:

result = doc.xpathEval('//*')
for node in result:
    if node.type == "element":
        if node.prop("name") != None:
            print node.prop("name")
        if node.prop("help") != None:
            print node.prop("help")

This other one is better, as it can print all attributes without explicitly coding their names:

print "Node Name = ", node.name
if node.properties != None:
    for attribute in node.properties:
        print "Attr: ", attribute.name, ", value: ", attribute.content

I haven't used libxml2, but I dug into it and found this.

Try either:

if child.type == "element":
    if child.name == "myGrpKeyword":
        print child.prop('name')
        print child.prop('help')

or

if child.type == "element":
    if child.name == "myGrpKeyword":
        for property in child.properties:
            if property.type=='attribute':
                # check which attribute this is
                if property.name == 'name':
                    print property.content
                if property.name == 'help':
                    print property.content

Refer to http://ukchill.com/technology/getting-started-with-libxml2-and-python-part-1/
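Instead of looping over `child.properties` (which raised "TypeError: iteration over non-sequence" for the asker, and which, judging by the `property.type=='attribute'` check above, can also yield nodes that are not attributes), the attributes can be read by following the `.next` chain of xmlAttr objects, just as the REPL session in the other answer does with `b.next.properties.next`. This is a minimal sketch, assuming only the `properties`, `name`, `content` and `next` attributes; `attribute_pairs` is a hypothetical helper.

```python
def attribute_pairs(node):
    """Return the (name, value) pairs of all attributes of a libxml2 element node.

    node.properties is the first xmlAttr (or None when the element has no
    attributes); each xmlAttr links to the following one through .next.
    """
    pairs = []
    attr = node.properties
    while attr is not None:
        # .content of an xmlAttr is the attribute's text value, e.g. 'abc_help'.
        pairs.append((attr.name, attr.content))
        attr = attr.next
    return pairs
```

Inside the question's loop, `for name, value in attribute_pairs(child): ...` would then work the same way for elements with and without attributes.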

Update:

Try a recursive function:

import libxml2

def explore(child):
    # Walk a chain of sibling nodes, recursing into each element's children.
    while child is not None:
        if not child.isBlankNode():
            if child.type == "element":
                print child.prop('name')
                print child.prop('help')
                explore(child.children)
        child = child.next

doc = libxml2.parseFile(cmmfilename)
root2 = doc.children
child = root2.children
explore(child)

3 Comments

Thanks for the reply. But here I don't want to compare the element name (like "myGrpKeyword") to get the attribute. I need to iterate through all the children of the root element in the entire file. Also, when I try to iterate through child.properties, it shows the error "TypeError: iteration over non-sequence"
I updated the answer with a recursive function; check whether it fits your problem.
Yes, thanks a lot. I used the following code to print all the attributes:

def printAllAttributes(node):
    print "Node Name = ", node.name
    if node.properties != None:
        for attribute in node.properties:
            if attribute.name != "text":
                print attribute.name, ":", attribute.content
    nodeList.append(node.name); print "\n"
