Parsing nested XML in Python

Question

I have this XML file:

<?xml version="1.0" ?><XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <ProcessToolName>sample</ProcessToolName>
    <RecipeName>sample</RecipeName>
    <PalletName>sample</PalletName>
    <PalletPosition>sample</PalletPosition>
    <IsControl>sample</IsControl>
    <LoadPosition>sample</LoadPosition>
    <HolderJob>sample</HolderJob>
    <IsSPC>sample</IsSPC>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <ProcessToolName>sample</ProcessToolName>
    <RecipeName>sample</RecipeName>
    <PalletName>sample</PalletName>
    <PalletPosition>sample</PalletPosition>
    <IsControl>sample</IsControl>
    <LoadPosition>sample</LoadPosition>
    <HolderJob>sample</HolderJob>
    <IsSPC>sample</IsSPC>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>

And this is my code for parsing the XML:

for data in xml.getElementsByTagName('TechDataParams'):
    #parse xml
    runnum=data.getElementsByTagName('RunNumber')[0].firstChild.nodeValue
    hold=data.getElementsByTagName('Holder')[0].firstChild.nodeValue
    processtn=data.getElementsByTagName('ProcessToolName')[0].firstChild.nodeValue
    recipedata=data.getElementsByTagName('RecipeName')[0].firstChild.nodeValue
    palletna=data.getElementsByTagName('PalletName')[0].firstChild.nodeValue
    palletposi=data.getElementsByTagName('PalletPosition')[0].firstChild.nodeValue
    control = data.getElementsByTagName('IsControl')[0].firstChild.nodeValue
    loadpos=data.getElementsByTagName('LoadPosition')[0].firstChild.nodeValue
    holderjob=data.getElementsByTagName('HolderJob')[0].firstChild.nodeValue
    spc = data.getElementsByTagName('IsSPC')[0].firstChild.nodeValue
    mestype = data.getElementsByTagName('MeasurementType')[0].firstChild.nodeValue

But when I print each node, I only get one set of 'TechDataParams'. I want to get all of the 'TechDataParams' elements from the XML.

Let me know if my question is unclear.

alecxe · Accepted Answer · 2015-01-07 08:01:50Z

Please don't dive into parsing XML with minidom unless you want to tear your hair out.

I would use the xmltodict module here. One line gives you a list of dicts with all the data you need:

import xmltodict

data = """your xml here"""

data = xmltodict.parse(data)['XMLSchemaPalletLoadTechData']['TechDataParams']
for params in data:
    print(dict(params))  # parenthesized form runs on both Python 2 and 3

Prints (Python 2 output, hence the u'' prefixes):

{u'PalletPosition': u'sample', u'HolderJob': u'sample', u'RunNumber': u'sample', u'ProcessToolName': u'sample', u'RecipeName': u'sample', u'IsControl': u'sample', u'PalletName': u'sample', u'LoadPosition': u'sample', u'MeasurementType': u'sample', u'Holder': u'sample', u'IsSPC': u'sample'}
{u'PalletPosition': u'sample', u'HolderJob': u'sample', u'RunNumber': u'sample', u'ProcessToolName': u'sample', u'RecipeName': u'sample', u'IsControl': u'sample', u'PalletName': u'sample', u'LoadPosition': u'sample', u'MeasurementType': u'XRF', u'Holder': u'sample', u'IsSPC': u'sample'}
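
One caveat with this approach: xmltodict returns a list for a repeated tag but a single dict when the tag occurs only once, so the loop above would iterate over keys if a file held just one TechDataParams. Below is a minimal Python 3 sketch, assuming the installed xmltodict version supports the `force_list` argument of `xmltodict.parse`. The names `load_params` and `SINGLE_XML` are hypothetical and exist only for this illustration.

```python
# Hypothetical one-record sample, shortened from the XML in the question.
SINGLE_XML = """<?xml version="1.0" ?>
<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>001</RunNumber>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""


def load_params(xml_text):
    """Return every TechDataParams block as a plain dict, always in a list."""
    # Imported here because xmltodict is third-party (pip install xmltodict).
    import xmltodict

    # force_list makes TechDataParams a list even when it occurs only once.
    parsed = xmltodict.parse(xml_text, force_list=('TechDataParams',))
    blocks = parsed['XMLSchemaPalletLoadTechData']['TechDataParams']
    return [dict(block) for block in blocks]
```

Calling `load_params(SINGLE_XML)` should then give a one-element list rather than a bare dict, so the same `for params in data` loop works for one record or many.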

Stephen Lin · Accepted Answer · 2015-01-07 08:01:49Z

Here is an example for you. Replace file_path with your own path.

I replaced the RunNumber values with 001 and 002.

#!/usr/bin/python
# -*- coding: utf-8 -*-

from xml.dom import minidom

file_path = 'C:\\temp\\test.xml'

# Parse the file and visit every TechDataParams element in turn
doc = minidom.parse(file_path)
TechDataParams = doc.getElementsByTagName('TechDataParams')
for t in TechDataParams:
    num = t.getElementsByTagName('RunNumber')[0]
    # %-formatting prints the same text on Python 2 and 3
    print('num is %s' % num.firstChild.data)

OUTPUT:

num is 001
num is 002
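
The same loop can collect every field instead of just RunNumber. One possible reason for seeing only one set in the question is that the variables are overwritten on each iteration, so printing after the loop shows only the last record; that is an assumption about the asker's code, not something the question confirms. A minimal Python 3 sketch, with `collect_params` and `SAMPLE_XML` as hypothetical names and a shortened sample document:

```python
from xml.dom import minidom

# Hypothetical two-record sample, shortened from the XML in the question.
SAMPLE_XML = """<?xml version="1.0" ?>
<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>001</RunNumber>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>002</RunNumber>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""


def collect_params(xml_text):
    """Return one dict per TechDataParams element, keyed by child tag name."""
    doc = minidom.parseString(xml_text)
    records = []
    for block in doc.getElementsByTagName('TechDataParams'):
        record = {}
        for child in block.childNodes:
            # Skip the whitespace text nodes that sit between elements.
            if child.nodeType != child.ELEMENT_NODE:
                continue
            text_node = child.firstChild
            record[child.tagName] = text_node.nodeValue if text_node else None
        # Append inside the loop so earlier records are not overwritten.
        records.append(record)
    return records


records = collect_params(SAMPLE_XML)
for record in records:
    print(record)
```

Because each record is appended inside the loop, every TechDataParams block is still available after the loop ends.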

1 Comment

ellabells Over a year ago

thank you! i will also try this method and see what works best!

Vivek Sable · Accepted Answer · 2015-01-07 08:16:42Z

This can also be done with the lxml.etree module.

The input contains a namespace, i.e. http://tempuri.org/XMLSchemaPalletLoadTechData.xsd, so map it to a prefix.
Use the xpath method to find the target TechDataParams tags.
For each TechDataParams tag, iterate over its children and build a dictionary whose keys are the tag names and whose values are the tag text.
Append each dictionary to the list variable TechDataParams_info.

code:

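from lxml import etree

# content is assumed to hold the XML document from the question as a string
root = etree.fromstring(content)
namespaces = {"a": 'http://tempuri.org/XMLSchemaPalletLoadTechData.xsd'}
TechDataParams_info = []
for i in root.xpath("//a:XMLSchemaPalletLoadTechData/a:TechDataParams", namespaces=namespaces):
    temp = dict()
    # Iterate over the element directly; getchildren() is deprecated
    for j in i:
        # Strip the "{namespace}" prefix from the tag name
        temp[j.tag.split("}", 1)[-1]] = j.text
    TechDataParams_info.append(temp)

print(TechDataParams_info)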

output:

[{'PalletPosition': 'sample', 'HolderJob': 'sample', 'RunNumber': 'sample', 'ProcessToolName': 'sample', 'RecipeName': 'sample', 'IsControl': 'sample', 'PalletName': 'sample', 'LoadPosition': 'sample', 'MeasurementType': 'sample', 'Holder': 'sample', 'IsSPC': 'sample'}, {'PalletPosition': 'sample', 'HolderJob': 'sample', 'RunNumber': 'sample', 'ProcessToolName': 'sample', 'RecipeName': 'sample', 'IsControl': 'sample', 'PalletName': 'sample', 'LoadPosition': 'sample', 'MeasurementType': 'XRF', 'Holder': 'sample', 'IsSPC': 'sample'}]
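
If installing lxml is not an option, roughly the same namespace-aware lookup can be sketched with the standard library's xml.etree.ElementTree. This swaps in ElementTree for lxml and uses `findall` with a prefix map rather than the `xpath` method; `parse_params` and `SAMPLE_XML` are hypothetical names, and the sample document is shortened.

```python
import xml.etree.ElementTree as ET

# Prefix "a" is arbitrary; it only has to map to the document's namespace URI.
NS = {'a': 'http://tempuri.org/XMLSchemaPalletLoadTechData.xsd'}

# Hypothetical two-record sample, shortened from the XML in the question.
SAMPLE_XML = """<?xml version="1.0" ?>
<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""


def parse_params(xml_text):
    """Return a list of dicts, one per TechDataParams child of the root."""
    root = ET.fromstring(xml_text)
    rows = []
    for params in root.findall('a:TechDataParams', NS):
        # Tags look like "{namespace}RunNumber"; keep only the local name.
        rows.append({child.tag.split('}', 1)[-1]: child.text for child in params})
    return rows


rows = parse_params(SAMPLE_XML)
print(rows)
```

Without the prefix map, `findall('TechDataParams')` would match nothing, because the default namespace is part of every element's tag name.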

1 Comment

ellabells Over a year ago

thank you! i will also try this method and see what works best!
