Python parse NMAP XML output "elem key=" NodeList

Question

I am trying to parse a specific value from an Nmap XML file. The relevant portion of the XML file looks like this:

<nmaprun scanner="nmap" args="nmap -A -P0 -oA scanoutput 192.168.1.5" start="1445258532" startstr="Mon Oct 19 08:42:12 2015" version="6.47" xmloutputversion="1.04">
    <hostscript>
        <script id="smb-os-discovery" output="&#10;  OS: Windows Server 2008 R2 Standard 7601 Service Pack 1 (Windows Server 2008 R2 Standard 6.1)&#10;  OS CPE: cpe:/o:microsoft:windows_server_2008::sp1&#10;  Computer name: SOMEHOSTNAME&#10;  NetBIOS computer name: SOMEHOSTNAME&#10;  Domain name: domain.local&#10;  Forest name: domain.local&#10;  FQDN: SOMEHOSTNAME.domain.local&#10;  System time: 2015-10-19T08:50:07-04:00&#10;">
            <elem key="os">Windows Server 2008 R2 Standard 7601 Service Pack 1</elem>
            <elem key="lanmanager">Windows Server 2008 R2 Standard 6.1</elem>
            <elem key="server">SOMEHOSTNAME\x00</elem>
            <elem key="date">2015-10-19T08:50:07-04:00</elem>
            <elem key="fqdn">SOMEHOSTNAME.domain.local</elem>
            <elem key="domain_dns">domain.local</elem>
            <elem key="forest_dns">domain.local</elem>
            <elem key="workgroup">HOME\x00</elem>
            <elem key="cpe">cpe:/o:microsoft:windows_server_2008::sp1</elem>
        </script>
    </hostscript>
</nmaprun>

I am trying to get the value for each key, but I am not sure how to address it. For example, how do I get just the value of elem key="os"? So far I can get the full output, but it gets messy later on when I write it to CSV, where I need each value in a separate field. Here is the code I have:

serveros = [script.getAttribute('output') for script in hosttag.getElementsByTagName('script') if script.getAttribute('id') == 'smb-os-discovery']

If I change it to:

serveros = [script.getElementsByTagName('os') for script in hosttag.getElementsByTagName('script') if script.getAttribute('id') == 'smb-os-discovery']

I get this error:

TypeError: sequence item 0: expected string, NodeList found

Thanks in advance!

Are you using the lxml module? If so, it allows XPath: //elem[@key='os']
– Parfait, Commented Oct 20, 2015 at 1:44
Well, consider lxml, as minidom is limited in querying XML by nodes and attributes.
– Parfait, Commented Oct 21, 2015 at 0:21
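
For readers staying with minidom, here is a minimal sketch of the hand-written filter it requires. In the XML above, "os" is the value of the key attribute, not a tag name, so getElementsByTagName('os') matches nothing and returns a NodeList rather than a string. The TypeError most likely comes from later joining those NodeList objects as if they were strings. The helper name elem_value and the trimmed XML sample are assumptions made for illustration, not part of the original code.

```python
from xml.dom import minidom

# Trimmed, hypothetical stand-in for the scan file shown in the question.
XML = """<nmaprun><hostscript>
  <script id="smb-os-discovery">
    <elem key="os">Windows Server 2008 R2 Standard 7601 Service Pack 1</elem>
    <elem key="fqdn">SOMEHOSTNAME.domain.local</elem>
  </script>
</hostscript></nmaprun>"""


def elem_value(script, key):
    """Return the text of the first <elem key="..."> under script, or None."""
    for elem in script.getElementsByTagName('elem'):
        # minidom has no attribute predicate, so compare the attribute by hand.
        if elem.getAttribute('key') == key and elem.firstChild is not None:
            return elem.firstChild.data
    return None


doc = minidom.parseString(XML)
serveros = [elem_value(script, 'os')
            for script in doc.getElementsByTagName('script')
            if script.getAttribute('id') == 'smb-os-discovery']
print(serveros)
```

Each entry in serveros is now a plain string (or None when the key is missing), so it can be written to a CSV column directly.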

sls · Accepted Answer · 2016-06-21 15:54:55Z

Maybe something like this:

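# Python 3 version. The script and xml_host_element arguments are expected to be
# ElementTree-style elements (xml.etree.ElementTree or lxml), since the code
# relies on .attrib and .findall().

class ScriptResult:
    def __init__( self, script, port_number ):
        self.port_number = port_number
        # Copy every attribute of the <script> element (id, output, ...) onto this object.
        for k,v in script.attrib.items():
            self.__dict__[k] = v

    def __str__( self ):
        # One "name : value" line per stored attribute.
        d = '\n'
        for k,v in self.__dict__.items():
            d += '    %-30s :   %s\n' % (k,v)
        return "ScriptResult(%s)\n" % d

class Host:
    def __init__( self ):
        self.script_results = []   # define list of script results

    def print_results( self ):
        for i in self.script_results:
            print(i)


class XML_Parser:

    def get_hostscripts( self, host, xml_host_element ):
        # Host-level scripts live under <hostscript>; tag them with the pseudo-port 'host'.
        for hs in xml_host_element.findall('hostscript'):
            for s in hs.findall('script'):
                host.script_results.append( ScriptResult( s, 'host' ) )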
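
The classes above keep only the attributes of each script element (id and output). To reach the individual elem key="..." children that the question asks about, something along these lines may help. This is a sketch that assumes the standard library's xml.etree.ElementTree, whose limited XPath support includes the [@attr='value'] predicate. The function name script_elems and the trimmed XML sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the scan file shown in the question.
XML = """<nmaprun><hostscript>
  <script id="smb-os-discovery">
    <elem key="os">Windows Server 2008 R2 Standard 7601 Service Pack 1</elem>
    <elem key="lanmanager">Windows Server 2008 R2 Standard 6.1</elem>
    <elem key="domain_dns">domain.local</elem>
  </script>
</hostscript></nmaprun>"""


def script_elems(root, script_id):
    """Return {key: text} for the direct <elem> children of the named script."""
    script = root.find(".//script[@id='%s']" % script_id)
    if script is None:
        return {}
    # e.text is None for an empty element, so fall back to ''.
    return {e.get('key'): (e.text or '') for e in script.findall('elem')}


root = ET.fromstring(XML)
values = script_elems(root, 'smb-os-discovery')
print(values.get('os'))
```

Because the result is an ordinary dict, values.get('workgroup', '') returns an empty string when a host did not report that key, which keeps CSV rows aligned.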

edited Jun 21, 2016 at 15:54

answered Jun 6, 2016 at 23:14


2 Comments

sls Over a year ago

Okay... The original poster was trying to read Nmap XML files. These files are messy: each script populates the XML with whatever information it considers important. The original poster was trying to deal with this at the XML level using DOM lookups. Instead of working at the XML level, the example I gave copies the attributes of each script element into a class object. Then, at a high level in Python, you can pull out the pieces and format them as required, particularly if you use getattr, which tolerates missing values and can supply a default.
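
As a small illustration of the getattr point, the sketch below uses a simplified stand-in for the ScriptResult class. It takes a plain dict instead of an XML element, which is an assumption made so the example runs on its own.

```python
class ScriptResult:
    """Simplified stand-in: stores each key/value pair as an instance attribute."""

    def __init__(self, attributes):
        for k, v in attributes.items():
            setattr(self, k, v)


result = ScriptResult({'id': 'smb-os-discovery', 'output': 'OS: Windows'})

# Present attribute: the stored value is returned.
script_id = getattr(result, 'id', 'unknown')

# Missing attribute: the default is returned instead of raising AttributeError.
workgroup = getattr(result, 'workgroup', 'n/a')

print(script_id, workgroup)
```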

sls Over a year ago

        self.analyze_script_smb_os_discovery( project, s, script_name, script_output )

    def analyze_script_smb_os_discovery( self, project, s, script_name, script_output ):
        if script_name != 'smb-os-discovery':
            return
        script_output = script_output.strip()
        olist = [ i.partition(':') for i in script_output.split('\n') ]
        odict = dict( [ (i[0].strip(),i[2].strip()) for i in olist ] )
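
The fragment above can be tried in isolation with a sketch like the following. The function name parse_script_output is hypothetical, and the sample string is a hand-shortened copy of the output attribute from the question, with each &#10; already decoded to a newline.

```python
def parse_script_output(script_output):
    """Split 'Name: value' lines into a dict, cutting at the first colon only."""
    lines = script_output.strip().split('\n')
    pairs = [line.partition(':') for line in lines]
    return {name.strip(): value.strip() for name, _, value in pairs}


output = (
    "\n  OS: Windows Server 2008 R2 Standard 7601 Service Pack 1"
    " (Windows Server 2008 R2 Standard 6.1)"
    "\n  OS CPE: cpe:/o:microsoft:windows_server_2008::sp1"
    "\n  Computer name: SOMEHOSTNAME"
    "\n  System time: 2015-10-19T08:50:07-04:00\n"
)

fields = parse_script_output(output)
print(fields['OS CPE'])
```

Because partition splits only at the first colon, values that contain colons themselves (the CPE string, the timestamp) come through intact.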
