I am trying to parse a specific value from an Nmap XML file. The relevant portion of the file looks like this:

<nmaprun scanner="nmap" args="nmap -A -P0 -oA scanoutput 192.168.1.5" start="1445258532" startstr="Mon Oct 19 08:42:12 2015" version="6.47" xmloutputversion="1.04">
    <hostscript>
        <script id="smb-os-discovery" output="&#10;  OS: Windows Server 2008 R2 Standard 7601 Service Pack 1 (Windows Server 2008 R2 Standard 6.1)&#10;  OS CPE: cpe:/o:microsoft:windows_server_2008::sp1&#10;  Computer name: SOMEHOSTNAME&#10;  NetBIOS computer name: SOMEHOSTNAME&#10;  Domain name: domain.local&#10;  Forest name: domain.local&#10;  FQDN: SOMEHOSTNAME.domain.local&#10;  System time: 2015-10-19T08:50:07-04:00&#10;">
            <elem key="os">Windows Server 2008 R2 Standard 7601 Service Pack 1</elem>
            <elem key="lanmanager">Windows Server 2008 R2 Standard 6.1</elem>
            <elem key="server">SOMEHOSTNAME\x00</elem>
            <elem key="date">2015-10-19T08:50:07-04:00</elem>
            <elem key="fqdn">SOMEHOSTNAME.domain.local</elem>
            <elem key="domain_dns">domain.local</elem>
            <elem key="forest_dns">domain.local</elem>
            <elem key="workgroup">HOME\x00</elem>
            <elem key="cpe">cpe:/o:microsoft:windows_server_2008::sp1</elem>
        </script>
    </hostscript>
</nmaprun>

I am trying to get the value for each key, but I am not sure how to address it. For example, how do I get just the value of elem key="os"? So far I can get the full output attribute, but that gets messy later when I write it to CSV, because I need each value in a separate column. Here is the code I have:

serveros = [script.getAttribute('output') for script in hosttag.getElementsByTagName('script') if script.getAttribute('id') == 'smb-os-discovery']

If I change it to:

serveros = [script.getElementsByTagName('os') for script in hosttag.getElementsByTagName('script') if script.getAttribute('id') == 'smb-os-discovery']

I get this error:

TypeError: sequence item 0: expected string, NodeList found

Thanks in advance!
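
For reference, the TypeError arises because getElementsByTagName('os') returns a NodeList rather than a string, and presumably the list is later joined into a string. That NodeList is also empty here, since os is the value of the key attribute on an elem tag, not a tag name. A minimal sketch of one way to collect the values with xml.dom.minidom, assuming a trimmed copy of the sample above (the helper name elem_values is hypothetical):

```python
from xml.dom import minidom

# Trimmed copy of the sample document, for illustration only.
SAMPLE = """<nmaprun>
  <hostscript>
    <script id="smb-os-discovery">
      <elem key="os">Windows Server 2008 R2 Standard 7601 Service Pack 1</elem>
      <elem key="fqdn">SOMEHOSTNAME.domain.local</elem>
    </script>
  </hostscript>
</nmaprun>"""


def elem_values(script_node):
    """Map each <elem key="..."> under a <script> node to its text."""
    values = {}
    for elem in script_node.getElementsByTagName('elem'):
        # firstChild is the text node; it is None for an empty <elem/>.
        text = elem.firstChild.data if elem.firstChild else ''
        values[elem.getAttribute('key')] = text
    return values


doc = minidom.parseString(SAMPLE)
smb = {}
for script in doc.getElementsByTagName('script'):
    if script.getAttribute('id') == 'smb-os-discovery':
        smb = elem_values(script)

# dict.get supplies a default when a key is missing from the scan.
print(smb.get('os', 'unknown'))
```

With the values in a dict, each key can go into its own CSV column.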

3 Comments
  • Are you using the lxml module? If so, it allows XPath: //elem[@key='os'] Commented Oct 20, 2015 at 1:44
  • Using xml.dom.minidom Commented Oct 20, 2015 at 1:48
  • Well, consider lxml, as minidom is limited in querying XML by nodes and attributes. Commented Oct 21, 2015 at 0:21
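
The attribute lookup suggested in these comments does not strictly require lxml: the standard library's xml.etree.ElementTree supports a limited XPath subset that includes [@attrib='value'] predicates. A minimal sketch, assuming a trimmed copy of the sample from the question:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document, for illustration only.
SAMPLE = """<nmaprun>
  <hostscript>
    <script id="smb-os-discovery">
      <elem key="os">Windows Server 2008 R2 Standard 7601 Service Pack 1</elem>
      <elem key="domain_dns">domain.local</elem>
    </script>
  </hostscript>
</nmaprun>"""

root = ET.fromstring(SAMPLE)

# Find the script by its id attribute, then the elem by its key attribute.
script = root.find(".//script[@id='smb-os-discovery']")
os_elem = script.find("elem[@key='os']") if script is not None else None

# find() returns None when nothing matches, so guard before reading .text.
server_os = os_elem.text if os_elem is not None else None
print(server_os)
```

The same path expressions should also work with lxml, which additionally offers a full xpath() method.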

1 Answer

Maybe something like this (it works on xml.etree.ElementTree elements rather than minidom nodes):

class ScriptResult:
    """Holds the XML attributes of one <script> element as instance attributes."""

    def __init__(self, script, port_number):
        self.port_number = port_number
        # Copy every XML attribute (id, output, ...) onto the object.
        for k, v in script.attrib.items():
            self.__dict__[k] = v

    def __str__(self):
        d = '\n'
        for k, v in self.__dict__.items():
            d += '    %-30s :   %s\n' % (k, v)
        return "ScriptResult(%s)\n" % d


class Host:
    def __init__(self):
        self.script_results = []   # list of ScriptResult objects

    def print_results(self):
        for i in self.script_results:
            print(i)


class XML_Parser:

    def get_hostscripts(self, host, xml_host_element):
        # xml_host_element is an xml.etree.ElementTree element.
        for hs in xml_host_element.findall('hostscript'):
            for s in hs.findall('script'):
                host.script_results.append(ScriptResult(s, 'host'))
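
A usage sketch of this idea, not part of the original answer: the compact ScriptResult below is a variant that also copies each elem child onto the object (an assumption added here so the os value becomes reachable), and getattr supplies a default for keys a scan did not report. The sample XML is a trimmed, illustrative document.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative host element.
SAMPLE = """<host>
  <hostscript>
    <script id="smb-os-discovery" output="OS: Windows Server 2008 R2">
      <elem key="os">Windows Server 2008 R2 Standard 7601 Service Pack 1</elem>
      <elem key="domain_dns">domain.local</elem>
    </script>
  </hostscript>
</host>"""


class ScriptResult:
    """Compact variant of the class above that also keeps <elem> children."""

    def __init__(self, script, port_number):
        self.port_number = port_number
        # XML attributes (id, output) become instance attributes.
        for k, v in script.attrib.items():
            setattr(self, k, v)
        # Assumption: each <elem key="..."> child is stored under its key.
        for elem in script.findall('elem'):
            setattr(self, elem.get('key'), elem.text)


host_element = ET.fromstring(SAMPLE)
results = [ScriptResult(s, 'host')
           for hs in host_element.findall('hostscript')
           for s in hs.findall('script')]

smb = next(r for r in results if r.id == 'smb-os-discovery')
server_os = getattr(smb, 'os', 'unknown')
workgroup = getattr(smb, 'workgroup', 'unknown')   # key absent in SAMPLE
print(server_os, workgroup)
```

One caveat of this design: an elem key that matches an XML attribute name (such as id) would overwrite it, so a separate dict for the elem values may be safer in real code.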

2 Comments

Okay... The original poster was trying to read Nmap XML files. These files are messy: each script populates the XML with whatever information it considers important. The original poster was trying to deal with this at the XML level using the DOM lookup methods (getElementsByTagName). Instead of working at the XML level, the example I gave just copies each script element's attributes onto a class object. Then at a high level, in Python, you can pull out the pieces and format them as required, particularly if you use the built-in getattr function, which tolerates missing values and can supply a default.
self.analyze_script_smb_os_discovery( project, s, script_name, script_output )

def analyze_script_smb_os_discovery( self, project, s, script_name, script_output ):
    if script_name != 'smb-os-discovery':
        return
    script_output = script_output.strip()
    olist = [ i.partition(':') for i in script_output.split('\n') ]
    odict = dict( [ (i[0].strip(),i[2].strip()) for i in olist ] )
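
The fragment in this comment splits the script's output attribute into lines and then splits each line on its first colon. A standalone sketch of the same technique follows; the function name parse_script_output is hypothetical, and skipping lines without a colon is an added assumption.

```python
def parse_script_output(script_output):
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in script_output.strip().split('\n'):
        key, sep, value = line.partition(':')
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


# A few lines from the output attribute in the question (&#10; is a newline).
output = (
    "\n  OS: Windows Server 2008 R2 Standard 7601 Service Pack 1"
    "\n  OS CPE: cpe:/o:microsoft:windows_server_2008::sp1"
    "\n  System time: 2015-10-19T08:50:07-04:00\n"
)
fields = parse_script_output(output)
print(fields.get('OS', 'unknown'))
print(fields['System time'])
```

Because partition splits only at the first colon, values that contain colons themselves (the CPE string, the timestamp) stay intact.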
