
I am trying to extract some elements from the following XML file (trimmed down nmap output):

<?xml version="1.0"?>
<nmaprun>
  <host starttime="1381245200" endtime="1381245316">
    <address addr="192.168.1.5" addrtype="ipv4"/>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
    <os>
      <osmatch>
        <osclass type="general purpose" vendor="Linux" osfamily="Linux" osgen="2.6.X" accuracy="100">
          <cpe>cpe:/o:linux:linux_kernel:2.6</cpe>
        </osclass>
      </osmatch>
    </os>
  </host>
</nmaprun>

with the following code:

import xml.etree.ElementTree as ET

d = [
        {'path': 'address', 'el': 'addr'},
        {'path': 'hostnames/hostname', 'el': 'name'},
        {'path': 'os/osmatch/osclass', 'el': 'osfamily'}
]

tree = ET.parse('testnmap.xml')
root = tree.getroot()
for i in root.iter('host'):
        for h in d:
                if i.find(h['path']): print i.find(h['path']).get(h['el'])
                else: print "UNKNOWN ", (h['path'])

The idea being to extract the IP, hostname and OS. The output gives me

UNKNOWN  address
UNKNOWN  hostnames/hostname
Linux

So the innermost path worked (osfamily), while the others (hostname) failed. What should be the proper call to reach them?

    As an alternative, consider using the parser included in the Ndiff Python script that is distributed with Nmap. It is specifically designed for parsing Nmap XML and returning the results in Python objects. Commented Oct 10, 2013 at 13:45

1 Answer

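I think the problem is the boolean test on i.find(h['path']). An Element's truth value depends on whether it has children, so the test is true only for <osclass>, which contains <cpe>. <address> and <hostname> are found too, but they have no children, so they evaluate as false. Check explicitly whether find() returned None instead: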

...
e = i.find(h['path'])
if e is not None: print(e.get(h['el']))
...

It yields:

192.168.1.5
host1.example.com
Linux
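
For completeness, here is a minimal, self-contained sketch of the whole loop with the None check applied. It inlines the sample XML from the question instead of reading testnmap.xml, and the names NMAP_XML, FIELDS and extract_fields are mine, chosen for illustration; they are not part of the original code.

```python
import xml.etree.ElementTree as ET

# Sample from the question, inlined so the sketch runs without testnmap.xml.
NMAP_XML = """<?xml version="1.0"?>
<nmaprun>
  <host starttime="1381245200" endtime="1381245316">
    <address addr="192.168.1.5" addrtype="ipv4"/>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
    <os>
      <osmatch>
        <osclass type="general purpose" vendor="Linux" osfamily="Linux" osgen="2.6.X" accuracy="100">
          <cpe>cpe:/o:linux:linux_kernel:2.6</cpe>
        </osclass>
      </osmatch>
    </os>
  </host>
</nmaprun>
"""

# Same path/attribute pairs as the list d in the question.
FIELDS = [
    {'path': 'address', 'el': 'addr'},
    {'path': 'hostnames/hostname', 'el': 'name'},
    {'path': 'os/osmatch/osclass', 'el': 'osfamily'},
]


def extract_fields(host):
    """Return one value per entry in FIELDS for a single <host> element."""
    values = []
    for field in FIELDS:
        element = host.find(field['path'])
        # find() returns None only when nothing matches the path;
        # a childless element is still a valid match.
        if element is not None:
            values.append(element.get(field['el']))
        else:
            values.append('UNKNOWN ' + field['path'])
    return values


root = ET.fromstring(NMAP_XML)
results = [extract_fields(host) for host in root.iter('host')]
for values in results:
    for value in values:
        print(value)
```

With the sample above this prints the same three lines as shown: the IP address, the hostname and the OS family.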

3 Comments

I am not sure I understand: what is the difference between calling i.find("os/osmatch/osclass") and i.find("hostnames/hostname")? The .get() afterwards reaches for an attribute of the tag in both cases (that is, it gets the "bbb" value from <tag aaa="bbb">). Your code works and solves the problem, it's just that I do not understand why it works :)
@Woj: As I understand it, find() returns the element when it exists and None when it doesn't. An element that exists but has no children evaluates to False in a boolean context, and so does None, so an explicit comparison with None is needed to tell a found element from a missing one. That is why <hostname> was UNKNOWN: it has no children, whereas <osclass> has one, the <cpe>.
All is clear now. I did not realize that a found element without children and a missing element both count as False in an if. Thanks for the clarification!
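
The point discussed in these comments can be checked directly. The sketch below is only an illustration on a cut-down copy of the question's XML; the path 'no/such/path' is made up to force a miss. It uses len() rather than bool() because, as far as I know, the truth value of an Element has historically followed its number of children, and recent Python versions (3.12 onward) emit a DeprecationWarning when an Element is truth-tested.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the <host> element from the question.
host = ET.fromstring(
    '<host>'
    '<hostnames><hostname name="host1.example.com" type="PTR"/></hostnames>'
    '<os><osmatch><osclass osfamily="Linux">'
    '<cpe>cpe:/o:linux:linux_kernel:2.6</cpe>'
    '</osclass></osmatch></os>'
    '</host>'
)

hostname = host.find('hostnames/hostname')   # found, no children
osclass = host.find('os/osmatch/osclass')    # found, one child (<cpe>)
missing = host.find('no/such/path')          # hypothetical path, not found

# len() of an Element is its number of direct children.
child_counts = (len(hostname), len(osclass))
print(child_counts)  # (0, 1)

# Only the explicit None comparison separates "found" from "not found".
print(hostname is not None, osclass is not None, missing is None)  # True True True
```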
