0

I'm trying to take an API call response and parse the XML data into list, but I am struggling with the multiple child/parent relationships.

My hope is to export a new XML file that would line up each job ID and tracking number, which I could then import into Excel.


Here is what I have so far

The source XML file looks like this:

<project>
   <name>October 2019</name>
   <jobs>
      <job>
      <id>5654206</id>
      <tracking>
         <mailPiece>
             <barCode>00270200802095682022</barCode>
             <address>Accounts Payable,1661 Knott Ave,La Mirada,CA,90638</address>
             <status>En Route</status>
             <dateTime>2019-10-12 00:04:21.0</dateTime>
             <statusLocation>PONTIAC,MI</statusLocation>
        </mailPiece>
     </tracking>...

Code:

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement

tree = ET.parse('mailings.xml')
root = tree.getroot()
print(root.tag)

for x in root[1].findall('job'):
    id=x.find('id').text
    tracking=x.find('tracking').text
    print(root[1].tag,id,tracking)

The script currently returns the following:

jobs 5654206 None
jobs 5654203 None
2
  • For your xml snippet, are you looking for an out put like id: ['5654206'] barCode: ['00270200802095682022']? Commented Oct 12, 2019 at 23:29
  • Could you add an example of the xml output you would like to have? Commented Oct 13, 2019 at 20:55

1 Answer 1

1

Debugging is your friend...

I am struggling with the multiple child/parent relationships.

The right way to resolve this yourself is through using a debugger. For example, with VS Code, after applying a breakpoint and running the script with the debugger, it will stop at the breakpoint and I can inspect all the variables in memory, and run commands at the debug console just as if they were in my script. The Variable windows output looks like this:

enter image description here

There are various ways to do this at the command-line, or with a REPL like iPython, etc., but I find using debugging in a modern IDE environment like VS Code or PyCharm are definitely the way to go. Their debuggers remove the need to pepper print statements everywhere to test out your code, rewriting your code to expose more variables that must be printed to the console, etc.

A debugger allows you to see all the variables as a snapshot, exactly how the Python interpreter sees them, at any point in your code execution. You can:

  • step through your code line-by-line and watch the variables changes in real-time in the window
  • setup a separate watch window with only the variables you care about
  • and setup breakpoints that will only happen if variables are set to particular values, etc.

Child Hierarchy with the XML find method

Inspecting the variables as I step through your code, it appears that the find() method was walking the children within an Element at all levels, not just at the top level. When you used x.find('tracking') it is finding the mailPiece nodes directly. If you print the tag property, instead of the text property, you will see it is 'mailPiece' (see the debug windows above).

So, one way to resolve your issue is to store each mailPiece element as a variable, then pull out the individual attributes you want from it (i.e. BarCode, address, etc.) using find.

Here is some code that pulls all of this into a combined hierarchy of lists and dictionaries that you can then use to build your Excel outputs.

Note: The most efficient way to do this is line-by-line as you read the xml, but this is better for readability, maintainability, and if you need to do any post-processing that requires knowledge of more than one node at a time.

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement
from types import SimpleNamespace

tree = ET.parse('mailings.xml')
root = tree.getroot()

jobs = []
for job in root[1].findall('job'):
    jobdict = {}
    jobdict['id'] = job.find('id').text
    jobdict['trackingMailPieces'] = []
    for tracking in job.find('tracking'):
        if tracking.tag == 'mailPiece':
            mailPieceDict = {}
            mailPieceDict['barCode'] = tracking.find('barCode').text
            mailPieceDict['address'] = tracking.find('address').text
            mailPieceDict['status'] = tracking.find('status').text
            mailPieceDict['dateTime'] = tracking.find('dateTime').text
            mailPieceDict['statusLocation'] = tracking.find('statusLocation').text
            jobdict['trackingMailPieces'].append(mailPieceDict)
    jobs.append(jobdict)

for job in jobs:
    print('Job ID: {}'.format(job['id']))
    for mp in job['trackingMailPieces']:
        print('  mailPiece:')
        for key, value in mp.items():
            print('    {} = {}'.format(key, value))

The result is:

Job ID: 5654206
  mailPiece:
    barCode = 00270200802095682022
    address = Accounts Payable,1661 Knott Ave,La Mirada,CA,90638
    status = En Route
    dateTime = 2019-10-12 00:04:21.0
    statusLocation = PONTIAC,MI

Output?

I didn't address what to do with the output as that is beyond the scope of this question, but consider writing out to a CSV file, or even directly to an Excel file, if you don't need to pass on the XML to another program for some reason. There are Python packages that handle writing CSV and Excel files.

No need to create an intermediate format that you then need to manipulate after bringing it into Excel, for example.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.