I am trying to parse an XML file with Python using lxml, but I get an error on even basic attempts. I used this post and the lxml tutorials to get started.

My XML file is basically built from records below (I trimmed it down so that it is easier to read):

<?xml version="1.0" ?>
<?xml-stylesheet href="file:///usr/share/nmap/nmap.xsl" type="text/xsl"?>
<nmaprun scanner="nmap" args="nmap -sV -p135,12345 -oX 10.232.0.0.16.xml 10.232.0.0/16" start="1340201347" startstr="Wed Jun 20 16:09:07 2012" version="5.21" xmloutputversion="1.03">
<host>
  <hostnames>
    <hostname name="host1.example.com" type="PTR"/>
  </hostnames>
</host>
</nmaprun>

I run it through this complicated script:

from lxml import etree

d = etree.parse("myfile.xml")
for host in d.findall("host"):
    aa = host.find("hostnames/hostname")
    print(aa.attrib["name"])

I get AttributeError: 'NoneType' object has no attribute 'attrib' on the print line. I checked the value of d, host and aa and they are all defined as Elements.

Upfront apologies if this is something obvious (and it probably is).

EDIT: I added the header of the XML file as requested (I am still reading and rereading the answers :))

Thanks!

  • Having aa be a NoneType means that find wasn't able to, well, find anything. As such, this isn't so much an error in the XML-specific code as it is (1) a slightly miswritten search, and (2) a lack of error-checking in handling the output of the lxml library. Commented Jun 20, 2012 at 16:13
  • Also, when you say "built from the records below", I take this to mean that you're leaving things out, i.e. that there's a root, a header, etc. that you aren't disclosing. These things are important; please be sure that you're at least telling us what the root of your document looks like. Commented Jun 20, 2012 at 16:54
  • @Charles Duffy: sorry, I updated the XML file. The find was successful (in the sense that it did not return an error, the only one was on the print). When printing "aa" I get a bunch of Elements which match the file, it's the attribute part that is not working. Commented Jun 20, 2012 at 19:05
  • For the iteration where it fails with the NoneType error, there clearly is no element, even if the find successfully locates them during other iterations. Commented Jun 20, 2012 at 19:08
  • @Charles Duffy: ah, I got it now. It may indeed be possible that a given record does not have it. I will add a check and retest, updating the script above if needed. Also thanks for your answer below, I will test it as well and be back. Thanks! Commented Jun 20, 2012 at 19:16

3 Answers

You can solve this with an xpath expression.

d.xpath('//hostname/@name') # thank you for comment

Alternatively

for host in d.xpath('//hostname'):
    print(host.get('name'), host.get('whatever else etc...'))
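
To make the two XPath variants above concrete, here is a minimal sketch. The sample document is made up for illustration (the second host deliberately has no hostnames, and the status element is only a placeholder child), and I parse from a bytes literal with etree.fromstring rather than from a file, so `root` here plays the role of `d` in the question.

```python
from lxml import etree

# Hypothetical sample: the second <host> has no <hostnames> child at all.
SAMPLE = b"""<?xml version="1.0" ?>
<nmaprun scanner="nmap">
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <status state="up"/>
  </host>
</nmaprun>"""

root = etree.fromstring(SAMPLE)

# Attribute-selecting XPath: the result is a list of attribute values.
names = root.xpath('//hostname/@name')
print(names)

# Element-selecting XPath: hosts without a <hostname> simply contribute
# nothing, so there is no None to guard against inside the loop.
for hostname in root.xpath('//hostname'):
    print(hostname.get('name'), hostname.get('type'))
```

Either way, the host without a hostname never shows up in the results, which is why the AttributeError from the question cannot occur here.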

2 Comments

Actually, //hostname/@name.
@larsmans ...well, Jon's solution is correct if we still want the line below to be able to do an attrib lookup, but yes, just going straight to the string (and removing the variable assignment altogether) makes more sense.

Though it would make more sense to use XPath, your code already works fine when standing alone, so long as one handles the case where a host has no hostnames found:

import lxml.etree

doc = lxml.etree.XML("""
  <nmaprun>
    <host>
      <hostnames>
        <hostname name="host1.example.com" type="PTR"/>
      </hostnames>
    </host>
  </nmaprun>""")
for host in doc.findall('host'):
    host_el = host.find('hostnames/hostname')
    # find() returns None when this host has no matching <hostname>
    if host_el is not None:
        print(host_el.attrib['name'])

With XPath (doc.xpath() rather than doc.find() or doc.findall()), one could do better, filtering only for hostnames with a name and thus avoiding the faulty records altogether:

  • host[hostnames/hostname/@name] will find hosts which have at least one hostnames element containing a hostname with a name attribute.
  • //hostnames/hostname/@name will directly return only the names themselves (if using lxml, exposing these as strings).
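
As a rough sketch of the two expressions above, assuming a made-up document in which only the first of three hosts carries a named hostname (the status element is just a placeholder child):

```python
from lxml import etree

# Hypothetical sample: one usable host, one with an empty <hostnames/>,
# and one with no <hostnames> element at all.
SAMPLE = b"""<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <hostnames/>
  </host>
  <host>
    <status state="up"/>
  </host>
</nmaprun>"""

root = etree.fromstring(SAMPLE)
all_hosts = root.findall('host')

# Predicate filter: keep only the <host> elements that have a
# hostnames/hostname descendant path ending in a name attribute.
named_hosts = root.xpath('host[hostnames/hostname/@name]')

# Relative XPath evaluated from each surviving host, so the names
# stay grouped by the host they belong to.
names_per_host = [h.xpath('hostnames/hostname/@name') for h in named_hosts]

print(len(all_hosts), len(named_hosts), names_per_host)
```

Filtering on the host rather than jumping straight to the names is useful when other per-host data is needed next to the hostname.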

4 Comments

The XML in the question does not have <root> as the root element.
@mzjn He clearly describes his XML file as "built from the records below", which is distinct from containing *only* the records below. A file "containing" given records can certainly have an undisclosed root element.
@WoJ If you do what I'm doing here, checking if aa is None before each time you try to access and print its attributes, does your problem go away?
Thank you for all answers pointing to the idea of an empty record even though findall was successful. I think I finally understood how the parsing works (the lack of parent/child relationship built-in was particularly painful)

It looks like you might have some <host> elements that have either no <hostnames> or no <hostname> sub-element defined.

As suggested in a comment on your question by @Charles Duffy, you need to check that your call to find() actually found an element:

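for host in d.findall("host"):
    aa = host.find("hostnames/hostname")
    # Test against None explicitly: an element with no children
    # (like <hostname .../>) can evaluate as false in lxml.
    if aa is not None:
        print(aa.attrib["name"])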
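
One caveat on how the check is written: as far as I know, lxml treats an element with no children as false in a boolean context (and warns about it), so a bare truth test on a leaf such as <hostname .../> can skip elements that were actually found. Comparing against None avoids this. Below is a minimal sketch under that assumption; the sample data and the first_hostname helper are made up for illustration, and the import falls back to the standard library's ElementTree, which offers the same find/findall calls.

```python
try:
    from lxml import etree
except ImportError:  # assumption: the stdlib API is close enough for this sketch
    import xml.etree.ElementTree as etree

# Hypothetical sample: the second <host> has no <hostname>.
SAMPLE = b"""<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <hostnames/>
  </host>
</nmaprun>"""

root = etree.fromstring(SAMPLE)


def first_hostname(host):
    """Return the name of the first <hostname> under host, or None."""
    el = host.find('hostnames/hostname')
    # Explicit None comparison; the element's truth value is not used.
    if el is not None:
        return el.get('name')
    return None


found = [first_hostname(h) for h in root.findall('host')]
print(found)

# The matched element exists but has zero children, which is exactly
# the case where a bare truth test is unreliable.
leaf = root.find('host/hostnames/hostname')
print(leaf is not None, len(leaf))
```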
