I am a beginner with XML in Python, and I am having trouble extracting data from the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<martif xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="en">
   <cat>
      <desc type="No">1</desc>
      <desc type="Main">DES1.1</desc>
      <desc type="Sub">DES1.2</desc>
      <lang xml:lang="EN">
         <t>
            <term>T1.1</term>
            <Typ type="TermType">main</Typ>
         </t>
         <t>
            <term>T1.2</term>
            <Typ type="TermType">option</Typ>
         </t>
      </lang>
      <lang xml:lang="FR">
         <t>
            <term>T1.3</term>
            <Typ type="TermType">main</Typ>
         </t>
         <t>
            <term>T1.4</term>
            <Typ type="TermType">option</Typ>
         </t>
      </lang>
   </cat>
   <cat>
      <desc type="No">2</desc>
      <desc type="Main">DES2.1</desc>
      <desc type="Sub">DES2.2</desc>
      <lang xml:lang="EN">
         <t>
            <term>T2.1</term>
            <Typ type="TermType">main</Typ>
         </t>
         <t>
            <term>T2.2</term>
            <Typ type="TermType">option</Typ>
         </t>
      </lang>
      <lang xml:lang="FR">
         <t>
            <term>T2.3</term>
            <Typ type="TermType">main</Typ>
         </t>
         <t>
            <term>T2.4</term>
            <Typ type="TermType">option</Typ>
         </t>
      </lang>
   </cat>
</martif>

The desired result should be:

Type:  Main      Category: DES1.1
Type:  Sub       Category: DES1.2
lang:  EN
Term:  T1.1
TermType: main
Term:  T1.2
TermType: option
lang:  FR
Term:  T1.3
TermType: main
Term:  T1.4
TermType: option

Type:  Main      Category: DES2.1
Type:  Sub       Category: DES2.2
lang:  EN
Term:  T2.1
TermType: main
Term:  T2.2
TermType: option
lang:  FR
Term:  T2.3
TermType: main
Term:  T2.4
TermType: option

I tried the code below, but I cannot get the desired result. My problem is how to extract the data so that the output follows the structure of the XML.

Here is my code:

from xml.dom import minidom

doc = minidom.parse("data.xml")
descs = doc.getElementsByTagName("desc")

for desSetElem in descs:
      type = desSetElem.getAttribute("type")
      if type!='No':
        print('Type: ',type,'     Category:',desSetElem.firstChild.nodeValue)
        lang_termSetElem = doc.getElementsByTagName('lang')
        for lang_term in lang_termSetElem:
             # for lang_tig in lang_tigSetElem:
               lang_type=lang_term.getAttribute(('xml:lang'))
               print('lang: ',lang_type)
               print('Term: ',lang_term.getElementsByTagName("term")[0].firstChild.nodeValue)
               print('Term Type:',lang_term.getElementsByTagName("Typ")[0].firstChild.nodeValue)

Here is the result I got:

Type:  Main      Category: DES1.1
lang:  EN
Term:  T1.1
Term Type: main
lang:  FR
Term:  T1.3
Term Type: main
lang:  EN
Term:  T2.1
Term Type: main
lang:  FR
Term:  T2.3
Term Type: main
Type:  Sub      Category: DES1.2
lang:  EN
Term:  T1.1
Term Type: main
lang:  FR
Term:  T1.3
Term Type: main
lang:  EN
Term:  T2.1
Term Type: main
lang:  FR
Term:  T2.3
Term Type: main
Type:  Main      Category: DES2.1
lang:  EN
Term:  T1.1
Term Type: main
lang:  FR
Term:  T1.3
Term Type: main
lang:  EN
Term:  T2.1
Term Type: main
lang:  FR
Term:  T2.3
Term Type: main
Type:  Sub      Category: DES2.2
lang:  EN
Term:  T1.1
Term Type: main
lang:  FR
Term:  T1.3
Term Type: main
lang:  EN
Term:  T2.1
Term Type: main
lang:  FR
Term:  T2.3
Term Type: main

1 Answer

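Consider walking down the three levels of the XML in your loops: <cat>, then <desc> and <lang>, then <t>. Specifically, since <lang> is a sibling of <desc>, not a child, the <lang> loop should follow the <desc> loop rather than nest inside it. Call getElementsByTagName on each <cat> element instead of on the document, so that each search stays inside that category. The <t> elements inside each <lang> also need their own loop, because indexing with [0] returns only the first term.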
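To illustrate why the scope of the search matters, here is a minimal sketch using a trimmed-down copy of the question's XML (the `XML` string and the variable names are made up for this illustration). It compares getElementsByTagName called on the document with the same call on a single <cat> element:

```python
from xml.dom import minidom

# Trimmed-down copy of the question's XML: two <cat> blocks, two <lang> each.
XML = """<martif>
  <cat>
    <desc type="Main">DES1.1</desc>
    <lang xml:lang="EN"><t><term>T1.1</term></t></lang>
    <lang xml:lang="FR"><t><term>T1.3</term></t></lang>
  </cat>
  <cat>
    <desc type="Main">DES2.1</desc>
    <lang xml:lang="EN"><t><term>T2.1</term></t></lang>
    <lang xml:lang="FR"><t><term>T2.3</term></t></lang>
  </cat>
</martif>"""

doc = minidom.parseString(XML)

# Called on the document, the search covers every <cat>.
all_langs = doc.getElementsByTagName("lang")

# Called on one <cat>, the search covers only that element's descendants.
first_cat = doc.getElementsByTagName("cat")[0]
cat_langs = first_cat.getElementsByTagName("lang")

print(len(all_langs))  # 4
print(len(cat_langs))  # 2
print([lang.getAttribute("xml:lang") for lang in cat_langs])  # ['EN', 'FR']
```

This is why the original code printed all four <lang> blocks under every <desc>: the search started from the document each time.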

Consider also using f-strings (Python 3.6+) and breaking long lines to stay within PEP 8's 79-character limit.

from xml.dom import minidom

doc = minidom.parse("data.xml")
cats = doc.getElementsByTagName("cat")

for cat_elem in cats:
    # Search within this <cat> only, not the whole document.
    descs = cat_elem.getElementsByTagName("desc")
    for desc_elem in descs:
        # Named desc_type so the built-in type() is not shadowed.
        desc_type = desc_elem.getAttribute("type")
        if desc_type != "No":
            print(f"Type: {desc_type.ljust(9)}"
                  f"Category: {desc_elem.firstChild.nodeValue}")

    # <lang> is a sibling of <desc>, so loop over it separately.
    langs = cat_elem.getElementsByTagName("lang")
    for lang_elem in langs:
        lang_code = lang_elem.getAttribute("xml:lang")
        print(f"lang: {lang_code}")

        # Visit every <t>, not just the first one.
        for t_elem in lang_elem.getElementsByTagName("t"):
            term = (t_elem.getElementsByTagName("term")[0]
                          .firstChild
                          .nodeValue)
            term_type = (t_elem.getElementsByTagName("Typ")[0]
                               .firstChild
                               .nodeValue)

            print(f"Term: {term}")
            print(f"Term Type: {term_type}")

Output

Type: Main     Category: DES1.1
Type: Sub      Category: DES1.2
lang: EN
Term: T1.1
Term Type: main
Term: T1.2
Term Type: option
lang: FR
Term: T1.3
Term Type: main
Term: T1.4
Term Type: option
Type: Main     Category: DES2.1
Type: Sub      Category: DES2.2
lang: EN
Term: T2.1
Term Type: main
Term: T2.2
Term Type: option
lang: FR
Term: T2.3
Term Type: main
Term: T2.4
Term Type: option
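
As an alternative, the same three-level walk can be sketched with the standard library's xml.etree.ElementTree instead of minidom. This is not part of the approach above, only a variant under a few assumptions: the function name `format_cats` and the `SAMPLE` string are hypothetical, and the sample is a shortened copy of the question's XML. One detail that differs from minidom is that ElementTree stores xml:lang under the predefined XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the question's XML (one <cat>, one <lang>).
SAMPLE = """<martif xml:lang="en">
  <cat>
    <desc type="No">1</desc>
    <desc type="Main">DES1.1</desc>
    <desc type="Sub">DES1.2</desc>
    <lang xml:lang="EN">
      <t><term>T1.1</term><Typ type="TermType">main</Typ></t>
      <t><term>T1.2</term><Typ type="TermType">option</Typ></t>
    </lang>
  </cat>
</martif>"""


def format_cats(xml_text):
    """Return the report lines for every <cat> in xml_text."""
    root = ET.fromstring(xml_text)
    lines = []
    for cat in root.iter("cat"):
        # findall() with a bare tag matches direct children only.
        for desc in cat.findall("desc"):
            desc_type = desc.get("type", "")
            if desc_type != "No":
                lines.append(f"Type: {desc_type.ljust(9)}"
                             f"Category: {desc.text}")
        for lang in cat.findall("lang"):
            lines.append(f"lang: {lang.get(XML_LANG)}")
            for t in lang.findall("t"):
                lines.append(f"Term: {t.findtext('term')}")
                lines.append(f"Term Type: {t.findtext('Typ')}")
    return lines


for line in format_cats(SAMPLE):
    print(line)
```

Because findall() only looks at direct children here, the scoping problem from the original code cannot occur by accident. Returning a list of lines rather than printing inside the loops also makes the result easier to check or write to a file.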