
This is a snippet of the XML I am trying to parse:

<DSMs>
<DSM class="ACE" order="320"/>
<DSM class="ACS" order="1900"/>
<DSM class="Aironet" order="1050"/>
<DSM class="Apache" order="4700"/>
<DSM class="AppSecDbProtect" order="1477"/>
<DSM class="ArborNetworksPravail" order="1554">
  <Thresholds>
    <Threshold name="MinNumEvents" value="5"/>
    <Threshold name="AbandonAfterSuccessiveFailures" value="3"/>
  </Thresholds>
  <Templates>
    <Template name="DeviceName" value="Arbor Networks Pravail @ $$SOURCE_ADDRESS$$"/>
  </Templates>
</DSM>
<DSM class="ARN" order="2000"/>
<DSM class="ArpeggioSIFTIT" order="1553"/>
<DSM class="ArubaClearPass" order="545">
  <Thresholds>
    <Threshold name="MinNumEvents" value="5"/>
    <Threshold name="AbandonAfterSuccessiveFailures" value="3"/>
  </Thresholds>
  <Templates>
    <Template name="DeviceName" value="Aruba ClearPass Policy Manager @ $$SOURCE_ADDRESS$$"/>
    <Template name="DeviceDescription" value="Aruba ClearPass Policy Manager Device"/>
  </Templates>
 </DSM>
</DSMs>  

What I did so far (part of the code):

ta_dsms = []
for level1 in root:
   if level1.tag == 'DSMs':
       for level2 in level1:
           ta_dsm = level2.attrib
           ta_dsms.append(ta_dsm)
print(ta_dsms)

The current output of ta_dsms is like:

 [{'class': 'ACE', 'order': '320'}, 
  {'class': 'ACS', 'order': '1900'}, 
 ...]

What is an elegant way to get the Thresholds and Templates info and add it to each entry in my list? Only some DSMs have children. I've been stuck on this all day. Thank you for saving my life!

  • I just updated my answer. As I understand it, Thresholds and Templates should be lists in each DSM, correct? Commented Feb 5, 2018 at 23:15
  • @briancaffey In this case, only ArborNetworksPravail and ArubaClearPass have Thresholds and Templates. Thanks. Commented Feb 5, 2018 at 23:32
  • Why are you trying to convert the XML into a nested list? The XML is already a fine nested data structure; there is no real need to convert it to a list to work with it. Commented Feb 5, 2018 at 23:50
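
To illustrate the point in the last comment, here is a minimal sketch of reading values straight from the parsed tree without building a list first. It uses only the standard library's `xml.etree.ElementTree`; the XML is a trimmed copy of the snippet from the question, inlined as a string (an assumption, so the example runs on its own), and `<DSMs>` is assumed to be the root element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch runs alone.
XML = """
<DSMs>
  <DSM class="ACE" order="320"/>
  <DSM class="ArubaClearPass" order="545">
    <Thresholds>
      <Threshold name="MinNumEvents" value="5"/>
      <Threshold name="AbandonAfterSuccessiveFailures" value="3"/>
    </Thresholds>
    <Templates>
      <Template name="DeviceName" value="Aruba ClearPass Policy Manager @ $$SOURCE_ADDRESS$$"/>
    </Templates>
  </DSM>
</DSMs>
"""

dsms = ET.fromstring(XML)  # assumption: <DSMs> itself is the root element

# Look up one DSM by its class attribute, then read a threshold from it.
aruba = dsms.find("DSM[@class='ArubaClearPass']")
min_events = aruba.find("Thresholds/Threshold[@name='MinNumEvents']").get("value")
print(aruba.get("order"), min_events)
```

Whether this is preferable to a list of dicts depends on what the data is used for afterwards; for one-off lookups the path expressions may be enough.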

2 Answers


Does this get what you want?

import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()  # assumes <DSMs> is the root element of data.xml

ta_dsms = []
for dsm in root:
    if dsm.tag != 'DSM':
        continue
    # Copy the DSM's own attributes (class, order).
    d = dict(dsm.attrib)
    # Only some DSMs have Thresholds/Templates; each becomes a list of
    # attribute dicts, one per child element. Iterating the element directly
    # replaces getchildren(), which was removed in Python 3.9.
    for group in dsm:
        if group.tag in ('Thresholds', 'Templates'):
            d[group.tag] = [dict(child.attrib) for child in group]
    # Append once per DSM, outside the inner loops, to avoid duplicates.
    ta_dsms.append(d)

print(ta_dsms)

The result is:

[  
   {  
      "class":"ACE",
      "order":"320"
   },
   {  
      "class":"ACS",
      "order":"1900"
   },
   {  
      "class":"Aironet",
      "order":"1050"
   },
   {  
      "class":"Apache",
      "order":"4700"
   },
   {  
      "class":"AppSecDbProtect",
      "order":"1477"
   },
   {  
      "class":"ArborNetworksPravail",
      "Thresholds":[  
         {  
            "name":"MinNumEvents",
            "value":"5"
         },
         {  
            "name":"AbandonAfterSuccessiveFailures",
            "value":"3"
         }
      ],
      "Templates":[  
         {  
            "name":"DeviceName",
            "value":"Arbor Networks Pravail @ $$SOURCE_ADDRESS$$"
         }
      ],
      "order":"1554"
   },
   {  
      "class":"ARN",
      "order":"2000"
   },
   {  
      "class":"ArpeggioSIFTIT",
      "order":"1553"
   },
   {  
      "class":"ArubaClearPass",
      "Thresholds":[  
         {  
            "name":"MinNumEvents",
            "value":"5"
         },
         {  
            "name":"AbandonAfterSuccessiveFailures",
            "value":"3"
         }
      ],
      "Templates":[  
         {  
            "name":"DeviceName",
            "value":"Aruba ClearPass Policy Manager @ $$SOURCE_ADDRESS$$"
         },
         {  
            "name":"DeviceDescription",
            "value":"Aruba ClearPass Policy Manager Device"
         }
      ],
      "order":"545"
   }
]
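
As a variation on the loop above, the same structure can be built with path lookups instead of tag checks. This is only a sketch: `dsm_to_dict` is a hypothetical helper name, the XML is a trimmed inline copy of the question's snippet, and `root.iter("DSM")` is used so it does not matter whether `<DSMs>` is the root or nested one level down.

```python
import xml.etree.ElementTree as ET

def dsm_to_dict(dsm):
    """Hypothetical helper: turn one <DSM> element into a plain dict."""
    d = dict(dsm.attrib)
    for group, item in (("Thresholds", "Threshold"), ("Templates", "Template")):
        # findall returns an empty list when the DSM has no such children.
        items = [dict(e.attrib) for e in dsm.findall(f"{group}/{item}")]
        if items:  # leave the key out entirely for childless DSMs
            d[group] = items
    return d

# Trimmed copy of the XML from the question, inlined so the sketch runs alone.
XML = """
<DSMs>
  <DSM class="ACE" order="320"/>
  <DSM class="ArborNetworksPravail" order="1554">
    <Thresholds>
      <Threshold name="MinNumEvents" value="5"/>
      <Threshold name="AbandonAfterSuccessiveFailures" value="3"/>
    </Thresholds>
    <Templates>
      <Template name="DeviceName" value="Arbor Networks Pravail @ $$SOURCE_ADDRESS$$"/>
    </Templates>
  </DSM>
</DSMs>
"""

root = ET.fromstring(XML)
ta_dsms = [dsm_to_dict(dsm) for dsm in root.iter("DSM")]
print(ta_dsms)
```

One difference from the loop version: an empty `<Thresholds/>` element would produce no key here, whereas the loop would store an empty list.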

4 Comments

Do you know why there are duplicated lists in the array?
@ZoeSun Oh, I just noticed that. let me have a look
The format looks perfect. The only issue is the dups.
@ZoeSun OK, I think I fixed the duplicate error by moving ta_dsms.append(d). Is it OK?
1
from lxml import etree

class XmlParser(object):
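    def __init__(self, filename, **kwargs):
        self.__dict__.update(kwargs)
        self.filename = filename
        # Per-instance list; a class-level list would be shared by all parsers.
        self.results = []
        self._process()

    def _process(self):
        # Read as bytes so lxml can honour any XML encoding declaration,
        # and close the file when done.
        with open(self.filename, "rb") as f:
            self.data = f.read()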

    def get_result_dict(self):
        self._parse()
        return self._map_to_dict( )

    def _map_to_dict(self):
        for row in self.root:
            self.results.append(self.map_by_keys(row))
        return self.results

    def _parse(self):
        self.root = etree.fromstring(self.data)

    def map_by_keys(self, row):
        """Map one element to Python data.

        Handles three cases: a leaf Threshold/Template (has 'name'), a
        Thresholds/Templates container (no 'name', no 'class'), and a DSM.
        """
        if row.get('name') is not None:
            # Leaf element: Threshold or Template
            return (row.tag, {'name': row.get('name'), 'value': row.get('value')})

        elif row.get('class') is None:
            # Container element: Thresholds or Templates
            children = []
            for child in row:
                key, values = self.map_by_keys(child)
                children.append({key: values})
            return (row.tag, children)

        else:
            # Parent DSM
            unit = {'class': row.get('class'), 'order': row.get('order')}
            for child in row:
                key, values = self.map_by_keys(child)
                unit[key] = values

            return unit


file = './x.xml'
parser = XmlParser(file)
print(parser.get_result_dict())

prints:

[{'class': 'ACE', 'order': '320'}, {'class': 'ACS', 'order': '1900'}, {'class': 'Aironet', 'order': '1050'}, {'class': 'Apache', 'order': '4700'}, {'class': 'AppSecDbProtect', 'order': '1477'}, {'class': 'ArborNetworksPravail', 'order': '1554', 'Thresholds': [{'Threshold': {'value': '5', 'name': 'MinNumEvents'}}, {'Threshold': {'value': '3', 'name': 'AbandonAfterSuccessiveFailures'}}], 'Templates': [{'Template': {'value': 'Arbor Networks Pravail @ $$SOURCE_ADDRESS$$', 'name': 'DeviceName'}}]}, {'class': 'ARN', 'order': '2000'}, {'class': 'ArpeggioSIFTIT', 'order': '1553'}, {'class': 'ArubaClearPass', 'order': '545', 'Thresholds': [{'Threshold': {'value': '5', 'name': 'MinNumEvents'}}, {'Threshold': {'value': '3', 'name': 'AbandonAfterSuccessiveFailures'}}], 'Templates': [{'Template': {'value': 'Aruba ClearPass Policy Manager @ $$SOURCE_ADDRESS$$', 'name': 'DeviceName'}}, {'Template': {'value': 'Aruba ClearPass Policy Manager Device', 'name': 'DeviceDescription'}}]}]

in order to understand recursion you must first understand recursion
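
For comparison, the recursive idea can be reduced to a few lines with the standard library alone. This is a sketch, not a replacement for the class above: `element_to_dict` is a hypothetical helper, it groups children by tag into lists (so the output shape differs from the one printed above), and it assumes no attribute name collides with a child tag name.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Hypothetical helper: attributes plus children grouped by tag."""
    node = dict(elem.attrib)
    for child in elem:
        # Recurse into each child; siblings with the same tag share one list.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node

# Trimmed copy of the XML from the question, inlined so the sketch runs alone.
XML = """
<DSMs>
  <DSM class="ACE" order="320"/>
  <DSM class="ArubaClearPass" order="545">
    <Thresholds>
      <Threshold name="MinNumEvents" value="5"/>
    </Thresholds>
  </DSM>
</DSMs>
"""

result = element_to_dict(ET.fromstring(XML))
print(result["DSM"])
```

Because the function never inspects tag or attribute names, it works for any depth of nesting, at the cost of an extra list level around each group of children.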

1 Comment

I just noticed you need the children as well
