I'm having problems getting my head around this Python data structure:

 data = {'nmap': {'command_line': u'ls',
                  'scaninfo': {u'tcp': {'method': u'connect',
                                        'services': u'80,443'}},
                  'scanstats': {'downhosts': u'0',
                                'elapsed': u'1.18',
                                'timestr': u'Wed Mar 19 21:37:54 2014',
                                'totalhosts': u'1',
                                'uphosts': u'1'}},
         'scan': {u'url': {'addresses': {u'ipv6': u'2001:470:0:63::2'},
                                        'hostname': u'abc.net',
                                        'status': {'reason': u'syn-ack',
                                                   'state': u'up'},
                                        u'tcp': {80: {'conf': u'3',
                                                      'cpe': '',
                                                      'extrainfo': '',
                                                      'name': u'http',
                                                      'product': '',
                                                      'reason': u'syn-ack',
                                                      'state': u'open',
                                                      'version': ''},
                                                 443: {'conf': u'3',
                                                       'cpe': '',
                                                       'extrainfo': '',
                                                       'name': u'https',
                                                       'product': '',
                                                       'reason': u'syn-ack',
                                                       'script': {
                                                           u'ssl-cert': u'place holder'},
                                                       'state': u'open',
                                                       'version': ''}},
                                        'vendor': {}
         }
         }
 }

Basically I need to iterate over the 'tcp' key values and extract the contents of the 'script' item if it exists.

This is what I've tried:

items = data["scan"]
for item in items['url']['tcp']:
    if t["script"] is not None:
        print t  

However I can't seem to get it to work.

  • You mixed up item and t btw. Commented Mar 19, 2014 at 22:47

3 Answers

This will find any dictionary items with the key 'script' anywhere in the data structure:

def find_key(data, search_key, out=None):
    """Find all values from a nested dictionary for a given key."""
    if out is None:
        out = []
    if isinstance(data, dict):
        if search_key in data:
            out.append(data[search_key])
        for key in data:
            find_key(data[key], search_key, out)
    return out

For your data, I get:

>>> find_key(data, 'script')
[{'ssl-cert': 'place holder'}]

To find the ports, too, modify slightly:

tcp_dicts = find_key(data, 'tcp') # find all values for key 'tcp'
ports = [] # list to hold ports
for d in tcp_dicts: # iterate through values for key 'tcp'
    if all(isinstance(port, int) for port in d): # ensure all are port numbers
        for port in d:
            ports.append((port, 
                          d[port].get('script'))) # extract number and script

Now you get something like:

[(80, None), (443, {'ssl-cert': 'place holder'})]
3 Comments

One more question here. If I wanted to include the 'tcp' ports i.e. 80 and 443 in the out data, how could I do that?
Not sure if I'm with you here! I assume I keep the original function as is? I can't seem to get this to work! Basically, the bits I need in my output data are: the "ipv6" address, "hostname", "tcp" port numbers, "status" and the contents of "script".
Ah, I see there are places where the key 'tcp' doesn't give a dictionary of port numbers. I have updated to deal with this. And yes, this is using the existing implementation of find_key. All those things that weren't in your question you can still use find_key for.
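
For the specific fields named in the comments above (IPv6 address, hostname, status, port numbers and script contents), a direct walk over data['scan'] may be simpler than a recursive search. This is only a sketch: the function name summarize_hosts and the shape of the returned dictionaries are made up for illustration, and sample is a trimmed copy of the structure in the question.

```python
def summarize_hosts(data):
    """Collect address, hostname, state and per-port script output for each host."""
    hosts = []
    for host_key, host in data.get('scan', {}).items():
        tcp = host.get('tcp', {})
        # Pair every port number with its 'script' entry (None if absent).
        ports = [(port, tcp[port].get('script')) for port in sorted(tcp)]
        hosts.append({
            'host': host_key,
            'ipv6': host.get('addresses', {}).get('ipv6'),
            'hostname': host.get('hostname'),
            'state': host.get('status', {}).get('state'),
            'ports': ports,
        })
    return hosts


# Trimmed, illustrative copy of the data from the question.
sample = {
    'scan': {
        'url': {
            'addresses': {'ipv6': '2001:470:0:63::2'},
            'hostname': 'abc.net',
            'status': {'reason': 'syn-ack', 'state': 'up'},
            'tcp': {
                80: {'name': 'http', 'state': 'open'},
                443: {'name': 'https', 'state': 'open',
                      'script': {'ssl-cert': 'place holder'}},
            },
        },
    },
}

summary = summarize_hosts(sample)
print(summary)
```

Using .get() with an empty-dict default at each level means a host that lacks 'addresses', 'status' or 'tcp' yields None or an empty list instead of raising KeyError.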

data['scan']['url']['tcp'] is a dictionary, so iterating over it directly gives you the keys (the port numbers), not the values. To iterate over the values, use .values():

for t in data['scan']['url']['tcp'].values():
    if 'script' in t and t['script'] is not None:
        print(t)

If you need the key as well, iterate over the items instead:

for k, t in data['scan']['url']['tcp'].items():
    if 'script' in t and t['script'] is not None:
        print(k, t)

You also need to change your test to check 'script' in t first; otherwise, accessing t['script'] on an entry without that key raises a KeyError.
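
To see these behaviours side by side, here is a small self-contained sketch. The tcp dictionary is a trimmed, made-up stand-in for data['scan']['url']['tcp'], and the variable names are illustrative only.

```python
tcp = {
    80: {'name': 'http'},
    443: {'name': 'https', 'script': {'ssl-cert': 'place holder'}},
}

# Iterating over the dict itself yields only the keys (port numbers).
keys = sorted(tcp)

# .items() yields (key, value) pairs, so the membership test can be applied
# to each port's entry before reading 'script'.
with_script = {port: entry['script']
               for port, entry in tcp.items()
               if 'script' in entry}

# Reading a missing key without the membership test raises KeyError.
try:
    tcp[80]['script']
    missing_raises = False
except KeyError:
    missing_raises = True

print(keys)
print(with_script)
print(missing_raises)
```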


Don't you mean if item["script"]?

Really, though, if the key might not exist, use the get method provided by dict. It returns None instead of raising an error when the key is missing.

So try instead

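items = data["scan"]
# Iterate over the values: looping over the dict itself yields the port
# numbers (ints), which have no get method.
for item in items['url']['tcp'].values():
    script = item.get('script')
    if script:
        print(script)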
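
As a quick illustration of how dict.get behaves, the sketch below uses two made-up port entries modelled on the question's data. The second argument to get is an optional default that is returned when the key is missing.

```python
entry_http = {'name': 'http'}
entry_https = {'name': 'https', 'script': {'ssl-cert': 'place holder'}}

no_script = entry_http.get('script')            # key missing: returns None
default_script = entry_http.get('script', {})   # key missing: returns the default
found = entry_https.get('script')               # key present: returns its value

print(no_script)
print(default_script)
print(found)
```

Note that a truthiness test such as `if script:` skips an empty dict as well as None; use `if script is not None:` if empty script results should still be reported.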
