
I am new to XML parsing and to Python. I need to reach the subelements of the tree and print all of them.

I have an XML file with the structure outlined below. Here is my file: https://gofile.io/?c=OXcdue

  • allocations
    -- queue
    --- subelements of queue
    --- queue (subelement)
    ---- subelements of this queue
    ---- queue
    ---- queue

My requirement is to read all the queues that have subqueues, together with those subqueues.

  • I only want to print the queue pireporting_q1 with all of its attributes and subelements, including "atscale_rtam_mr_sq1" and "atscale_spark_sq1" with all of their subelements. The desired result is: <queue name="pireporting_q1"> <minResources>6960000 mb,1160 vcores,87 disks</minResources> <maxResources>10440000 mb,1740 vcores,130 disks</maxResources> <queue name="atscale_rtam_mr_sq1"> </queue> <queue name="atscale_spark_sq1"> </queue> </queue> Commented Oct 10, 2019 at 10:29
  • Please edit the question to clarify what you want. The comment is very hard to read. Add the content of the XML file. Commented Oct 10, 2019 at 15:03
  • Please check the file in the link. The site is not allowing me to add the file here. This is my first post. Let me know how I can add my code here without getting an error. Commented Oct 12, 2019 at 11:21

2 Answers


You can use the lxml library to parse any XML content. One reason to prefer it over the standard xml library is that it makes the namespaces of an XML document easy to inspect if necessary (not needed in your case).

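from lxml import etree

# path_to_xml_file is a placeholder for the path to your XML file
tree = etree.parse(path_to_xml_file)
root = tree.getroot()

# Iterate over the root directly; getchildren() is deprecated
for children in root:
    print(children.tag)

    # Print the tag and text of each grandchild
    for child in children:
        print(child.tag, child.text)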

Refer to the documentation here for more information on how to access the various parts of your XML file and how to find all subelements recursively. This documentation is for the standard xml library, but it also applies to lxml, because lxml.etree implements the same ElementTree API (lxml itself is built on the libxml2 C library, not on the standard xml package).
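As a rough illustration of finding all subelements recursively, here is a minimal sketch using the standard library; the same loop should work with lxml.etree, since it follows the ElementTree API. The SAMPLE document and the walk helper are made up for this example and are not part of the question's file.

```python
import xml.etree.ElementTree as ET

# Small stand-in document (an assumption); the real file is not reproduced here.
SAMPLE = """<allocations>
  <queue name="parent_q">
    <minResources>100 mb</minResources>
    <queue name="child_q">
      <label>allnodes</label>
    </queue>
  </queue>
</allocations>"""


def walk(element, depth=0):
    """Yield (depth, tag, attributes, text) for element and every descendant."""
    text = (element.text or "").strip()
    yield depth, element.tag, dict(element.attrib), text
    for child in element:
        # Recurse into each child, one level deeper
        yield from walk(child, depth + 1)


root = ET.fromstring(SAMPLE)
for depth, tag, attrib, text in walk(root):
    # Indent by depth so the nesting is visible in the printout
    print("  " * depth + tag, attrib, text)
```

If only a flat list of all descendants is needed, root.iter() does the same traversal without tracking depth.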


3 Comments

A namespace in XML is a way of assigning elements and attributes to a group. This lets you have elements with the same name without conflict, because they belong to different groups. Namespaces are declared on an element, usually at the top of the document, like so: <element xmlns:name="URL">. If you parse this with the standard xml library, the prefix declarations are not exposed on the parsed elements, whereas lxml makes them available (for example through an element's nsmap attribute).
Just wanted to give a reason for using lxml over xml.
The xml library doesn't deal with namespaces very well in my experience, especially if the namespace is None.
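For comparison, this is a small sketch of how the standard library represents namespaces: it folds the namespace URI into each tag, and find/findall accept a prefix mapping that you supply yourself. The document, the cfg prefix, and the example.com URI below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document; the prefix and URI are made up.
DOC = """<root xmlns:cfg="http://example.com/cfg">
  <cfg:queue name="q1"/>
  <queue name="q2"/>
</root>"""

root = ET.fromstring(DOC)

# ElementTree stores the namespace URI inside the tag as "{uri}localname".
tags = [child.tag for child in root]
print(tags)

# A prefix-to-URI mapping lets find/findall use readable prefixes.
ns = {"cfg": "http://example.com/cfg"}
namespaced = root.findall("cfg:queue", ns)
print([q.get("name") for q in namespaced])
```

The mapping has to be written by hand here, which is the inconvenience the comments above describe.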

Below is a solution that uses no external library:

import xml.etree.ElementTree as ET

xml = '''<allocations>
    <queue name="bdpaas_express_q1">
      <minResources>12000 mb,2 vcores,1 disks</minResources>
      <maxResources>18000 mb,3 vcores,2 disks</maxResources>
      <aclSubmitApps> xyz</aclSubmitApps>
      <aclAdministerApps> xyz</aclAdministerApps>
      <label>allnodes</label>
    </queue>
    <queue name="dl_priority_q1">
      <minResources>8496000 mb,1416 vcores,108 disks</minResources>
      <maxResources>12768000 mb,2128 vcores,162 disks</maxResources>
      <aclSubmitApps> dla_grp</aclSubmitApps>
      <aclAdministerApps> dla_grp</aclAdministerApps>
      <label>fastnodes</label>
    </queue>
    <queue name="pireporting_q1">
      <minResources>6960000 mb,1160 vcores,87 disks</minResources>
      <maxResources>10440000 mb,1740 vcores,130 disks</maxResources>
      <queue name="atscale_rtam_mr_sq1">
        <minResources>6000000 mb,1000 vcores,75 disks</minResources>
        <maxResources>9000000 mb,1500 vcores,112 disks</maxResources>
        <aclSubmitApps> atscalep</aclSubmitApps>
        <aclAdministerApps> atscalep</aclAdministerApps>
        <label>allnodes</label>
      </queue>
      <queue name="atscale_spark_sq1">
        <minResources>960000 mb,160 vcores,12 disks</minResources>
        <maxResources>1440000 mb,240 vcores,18 disks</maxResources>
        <aclSubmitApps> atscalep</aclSubmitApps>
        <aclAdministerApps> atscalep</aclAdministerApps>
        <label>allnodes</label>
      </queue>
    </queue>
  <queuePlacementPolicy>
    <rule create="false" name="specified" />
    <rule name="reject" />
  </queuePlacementPolicy>
</allocations>
'''


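root = ET.fromstring(xml)
# Collect every <queue> element at any depth
queues = root.findall('.//queue')
for queue in queues:
    # find() returns None when there is no child <queue>; compare explicitly,
    # because the truth value of an Element depends on whether it has children
    if queue.find('./queue') is not None:
        # tostring() returns bytes here, so decode before printing
        print(ET.tostring(queue, encoding='utf8', method='xml').decode('utf8'))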

Output (reformatted and abbreviated; the code also prints the child elements of the two sub-queues):

<?xml version="1.0" encoding="UTF-8"?>
<queue name="pireporting_q1">
   <minResources>6960000 mb,1160 vcores,87 disks</minResources>
   <maxResources>10440000 mb,1740 vcores,130 disks</maxResources>
   <queue name="atscale_rtam_mr_sq1" />
   <queue name="atscale_spark_sq1" />
</queue>
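The same selection can probably be written as a single path expression, since ElementTree's limited XPath support includes a [tag] predicate that matches elements having a child with that tag. The sketch below uses a made-up SAMPLE document with invented queue names rather than the file from the question.

```python
import xml.etree.ElementTree as ET

# Stand-in document (an assumption), with one leaf queue and one parent queue.
SAMPLE = """<allocations>
  <queue name="leaf_q"><label>allnodes</label></queue>
  <queue name="parent_q">
    <queue name="child_a"><label>allnodes</label></queue>
    <queue name="child_b"><label>allnodes</label></queue>
  </queue>
</allocations>"""

root = ET.fromstring(SAMPLE)

# ".//queue[queue]" selects every queue, at any depth,
# that has at least one direct child named queue.
parents = root.findall(".//queue[queue]")
parent_names = [q.get("name") for q in parents]
print(parent_names)
```

This avoids the explicit find() check inside the loop, at the cost of a slightly less obvious path expression.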

4 Comments

Hi, I have a very big XML file; what I gave was just an example. What are my options other than the fromstring method?
fromstring is used only for the answer. You can use 'parse' to read the XML from a file.
Could you also please tell me how I can iterate over the parent and subqueue tags separately here? Thanks for helping.
@user2910022 I am not sure I understand. Does my code solve your problem and you get the output you are looking for? What else are you looking for?
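To tie the last few comments together, here is a hedged sketch that reads the XML from a file with ET.parse and then handles parent queues and their direct sub-queues in separate loops. The temporary file, the SAMPLE content, and the hierarchy dictionary are assumptions made so the example runs on its own; in practice you would pass the path of your real file to ET.parse.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Stand-in content (an assumption); replace with your real file on disk.
SAMPLE = """<allocations>
  <queue name="leaf_q"/>
  <queue name="parent_q">
    <minResources>100 mb</minResources>
    <queue name="child_a"/>
    <queue name="child_b"/>
  </queue>
</allocations>"""

# Write the sample to a temporary file so ET.parse has something to read.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(SAMPLE)
    path = handle.name

try:
    root = ET.parse(path).getroot()
finally:
    os.remove(path)

# Map each top-level queue name to the names of its direct sub-queues.
hierarchy = {}
for parent in root.findall("queue"):      # top-level queues only
    subqueues = parent.findall("queue")   # direct children only
    hierarchy[parent.get("name")] = [sq.get("name") for sq in subqueues]

for parent_name, sub_names in hierarchy.items():
    print(parent_name, "->", sub_names)
```

For files too large to hold in memory comfortably, ET.iterparse is the usual standard-library alternative, though it needs a different loop structure than the one shown here.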
