My XML is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<ServiceResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://xx.xx.xx/xx/xx/x.x/xx/xx.xsd">
  <responseCode>SUCCESS</responseCode>
  <count>100</count>
  <hasMoreRecords>true</hasMoreRecords>
  <lastId>12345</lastId>
  <data>
    <Main>
      <sub1>1</sub1>
      <sub2>a</sub2>
    </Main>
    <Main>
      <sub1>2</sub1>
      <sub2>b</sub2>
    </Main>
  </data>
</ServiceResponse>

My code is as below.

import csv
import xml.etree.ElementTree as etree
    
xml_file_name = 'blah.xml'
csv_file_name = 'blah.csv'
main_tag_name = 'Main'
fields = ['sub1', 'sub2']

tree = etree.parse(xml_file_name)

with open(csv_file_name, 'w', newline='', encoding="utf-8") as csv_file:
    csvwriter = csv.writer(csv_file)
    csvwriter.writerow(fields)
    for host in tree.iter(tag=main_tag_name):
        data = []
        for field in fields:
            if host.find(field) is not None:
                data.append(host.find(field).text)
            else:
                data.append('')
        csvwriter.writerow(data)

I suspect this is not the right way to parse the XML, because tree.iter() matches 'Main' anywhere in the tree rather than following a specific path. If a 'Main' element happens to appear somewhere else in the document, the program will not work as intended.
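To make the concern concrete, here is a minimal sketch. The document in it is made up for illustration (the <meta> element is not part of my real XML); it only shows how iter() and a path-based findall() differ.

```python
import xml.etree.ElementTree as ET

# Made-up document: one <Main> under <data>, plus an unrelated <Main> under <meta>.
doc = """<ServiceResponse>
  <meta><Main><sub1>stray</sub1></Main></meta>
  <data><Main><sub1>1</sub1></Main></data>
</ServiceResponse>"""
root = ET.fromstring(doc)

# iter() walks the whole tree, so it picks up both <Main> elements.
anywhere = [m.findtext('sub1') for m in root.iter('Main')]

# An explicit path from the root only matches the rows under <data>.
by_path = [m.findtext('sub1') for m in root.findall('./data/Main')]
```

With iter(), the stray element would end up as an extra row in the CSV.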

Please suggest the most efficient approach you know for this use case, preferably something built-in rather than heavily customized.

Note:
I want to use this as a common script for multiple XML files, each of which has different tags above the main tag and different sub-tags below it. The tree structure should therefore be configurable rather than hardcoded.

1 Answer

You can try an XPath-based approach.

For example:

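import csv
import xml.etree.ElementTree as ET

# ET.parse accepts a file name, so no second open() (which reused the name f) is needed.
root = ET.parse('test.xml').getroot()

# Select each column by an explicit path below <data>.
sub1_nodes = root.findall('.//data/Main/sub1')
sub2_nodes = root.findall('.//data/Main/sub2')

with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # zip pairs nodes by position, so this assumes every Main has both children.
    for a, b in zip(sub1_nodes, sub2_nodes):
        writer.writerow([a.text, b.text])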
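
If some Main elements may lack a field, or the path and field list need to be configurable, a per-row variant might fit better. The following is only a sketch under assumptions: ROW_PATH, FIELDS, xml_to_rows and convert are hypothetical names introduced here, and the path './data/Main' is taken from the sample XML in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical configuration: adjust per XML layout.
ROW_PATH = './data/Main'      # path from the root element to each row element
FIELDS = ['sub1', 'sub2']     # child tags to export, in column order


def xml_to_rows(root, row_path, fields):
    """Yield one list of field values per element matched by row_path."""
    for row in root.findall(row_path):
        # findtext returns the default when the child is absent, so a
        # missing field becomes an empty cell instead of shifting columns.
        yield [row.findtext(field, default='') for field in fields]


def convert(xml_source, csv_target, row_path=ROW_PATH, fields=FIELDS):
    """Parse xml_source (file name or file object) and write rows to csv_target."""
    root = ET.parse(xml_source).getroot()
    writer = csv.writer(csv_target)
    writer.writerow(fields)
    writer.writerows(xml_to_rows(root, row_path, fields))
```

To use it, open the CSV file as in the question (with newline='' and the desired encoding) and call convert('blah.xml', csv_file). Each row is read from its own Main element, so findall() runs once for the rows rather than once per field, and values from different rows cannot be paired by accident.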

2 Comments

Thanks, I will try this. On a side note, if there are many tags (sub1, sub2, and so on), root.findall() will be called many times. Also, if the XML looks like (Main: sub1, sub2), (Main: sub1), (Main: sub2), then zip would merge the sub1 of the 2nd row with the sub2 of the 3rd row. Hence I am looking for something that walks down to Main sequentially and then reads the available subs of each row separately.
OK. I just wanted to show an example of an XPath-based approach.
