How to find a specific node in xml in python, while checking its tree structure?

Question

My XML is as follows.

<?xml version="1.0" encoding="UTF-8"?>
<ServiceResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://xx.xx.xx/xx/xx/x.x/xx/xx.xsd">
  <responseCode>SUCCESS</responseCode>
  <count>100</count>
  <hasMoreRecords>true</hasMoreRecords>
  <lastId>12345</lastId>
  <data>
    <Main>
      <sub1>1</sub1>
      <sub2>a</sub2>
    </Main>
    <Main>
      <sub1>2</sub1>
      <sub2>b</sub2>
    </Main>
  </data>
</ServiceResponse>

My code is as below.

import csv
import xml.etree.ElementTree as etree
    
xml_file_name = 'blah.xml'
csv_file_name = 'blah.csv'
main_tag_name = 'Main'
fields = ['sub1', 'sub2']

tree = etree.parse(xml_file_name)

with open(csv_file_name, 'w', newline='', encoding="utf-8") as csv_file:
    csvwriter = csv.writer(csv_file)
    csvwriter.writerow(fields)
    for host in tree.iter(tag=main_tag_name):
        data = []
        for field in fields:
            if host.find(field) is not None:
                data.append(host.find(field).text)
            else:
                data.append('')
        csvwriter.writerow(data)

I suspect this is not the right way to parse the XML, because tree.iter() matches 'Main' anywhere in the tree rather than following a specific path. If a 'Main' element happens to appear somewhere else in the document, the program will not behave as intended.

Please suggest the most efficient approach you know for this use case, preferably something built in rather than heavily customized.

Note:
I want to use this as a common script for multiple XML files, each with different tags leading to the main tag and different sub-tags beneath it. The tree structure therefore must not be hardcoded; it needs to be configurable.

abhilb · Accepted Answer · 2019-11-20 12:58:18Z

1

You can try an XPath-based approach.

For example:

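import csv
import xml.etree.ElementTree as ET

# ET.parse accepts a filename directly
root = ET.parse('test.xml').getroot()

# Select the sub-elements that sit under data/Main
sub1_nodes = root.findall('.//data/Main/sub1')
sub2_nodes = root.findall('.//data/Main/sub2')

with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # zip pairs nodes by position, so it assumes every Main has both fields
    for a, b in zip(sub1_nodes, sub2_nodes):
        writer.writerow([a.text, b.text])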
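
A row-wise variant of the same idea, as a minimal sketch: select each Main element through a configurable path, then look up the sub-elements relative to that row, so a missing field becomes an empty cell instead of shifting later values. The function name rows_from_xml and the parameters row_path and fields are hypothetical names chosen for illustration; findall and findtext are standard xml.etree.ElementTree methods. The sample document is a trimmed-down version of the one in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_from_xml(root, row_path, fields):
    """Yield one list of field values per element matched by row_path."""
    for row in root.findall(row_path):
        # findtext returns the default when the child element is absent
        yield [row.findtext(field, default='') for field in fields]


# Small in-memory sample; the second Main has no sub2
sample = """<ServiceResponse>
  <data>
    <Main><sub1>1</sub1><sub2>a</sub2></Main>
    <Main><sub1>2</sub1></Main>
  </data>
</ServiceResponse>"""

root = ET.fromstring(sample)
fields = ['sub1', 'sub2']
rows = list(rows_from_xml(root, './data/Main', fields))

# Written to an in-memory buffer here; a real script would use
# open(csv_file_name, 'w', newline='') instead
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(fields)
writer.writerows(rows)
print(buffer.getvalue())
```

Because the path starts at the root ('./data/Main') rather than with './/', a Main element elsewhere in the document is not matched. For a different file, only row_path and fields should need to change.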


2 Comments

SmiP Over a year ago

Thanks, I will try this. On a side note, if there are many tags such as sub1, sub2 and so on, root.findall() will be called many times. Also, if the XML has rows like (Main: sub1, sub2), (Main: sub1), (Main: sub2), then the sub1 of the second row and the sub2 of the third row would be merged by zip. So I am looking for something that walks down to Main sequentially and then reads the available subs of each row separately.

abhilb Over a year ago

OK. I just wanted to show an example of an XPath-based approach.
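
The misalignment raised in the first comment can be reproduced with a small in-memory document. This is only an illustrative sketch: the sample data is made up to match the "main sub1 sub2, main sub1, main sub2" shape from that comment, and the variable names zipped and per_row are hypothetical.

```python
import xml.etree.ElementTree as ET

# Rows: (sub1 and sub2), (sub1 only), (sub2 only)
sample = """<data>
  <Main><sub1>1</sub1><sub2>a</sub2></Main>
  <Main><sub1>2</sub1></Main>
  <Main><sub2>c</sub2></Main>
</data>"""

root = ET.fromstring(sample)

# Two independent node lists, paired purely by position
sub1_nodes = root.findall('./Main/sub1')
sub2_nodes = root.findall('./Main/sub2')
zipped = [(a.text, b.text) for a, b in zip(sub1_nodes, sub2_nodes)]

# Per-row lookup keeps each value attached to its own Main element
per_row = [(m.findtext('sub1', ''), m.findtext('sub2', ''))
           for m in root.findall('./Main')]

print(zipped)
print(per_row)
```

With zip, the three source rows collapse into two, and the value 'c' from the third row ends up next to '2' from the second. The per-row version produces three rows with empty strings where a field is missing.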
