4

I have an XML document in the following format:

<root>
<H D="14/11/2017">
<FC>
    <F LV="0">The quick</F>
    <F LV="1">brown</F>
    <F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
    <F LV="0">The lazy</F>
    <F LV="1">fox</F>
</FC>
</H>
</root>

How can I extract the value of the 'D' attribute on each H tag, and also all the text inside the F tags?

3 Answers

13

From ElementTree docs:

We can import this data by reading from a file:

import xml.etree.ElementTree as ET

tree = ET.parse('country_data.xml')
root = tree.getroot()

Or directly from a string:

root = ET.fromstring(country_data_as_string)

and later in the same page, 20.5.1.4. Finding interesting elements:

for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)

Which translates to:

import xml.etree.ElementTree as ET

root = ET.fromstring("""
<root>
<H D="14/11/2017">
<FC>
    <F LV="0">The quick</F>
    <F LV="1">brown</F>
    <F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
    <F LV="0">The lazy</F>
    <F LV="1">fox</F>
</FC>
</H>
</root>""")
# root = tree.getroot()
for h in root.iter("H"):
    print(h.attrib["D"])
for f in root.iter("F"):
    print(f.attrib, f.text)

output:

14/11/2017
14/11/2017
{'LV': '0'} The quick
{'LV': '1'} brown
{'LV': '2'} fox
{'LV': '0'} The lazy
{'LV': '1'} fox
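
The two loops above print the dates and the F texts separately, so the link between each H and its own F elements is lost. If that link matters, a minimal sketch is to call iter("F") on each H element instead of on the root. The names xml_text and records are only illustrative and are not part of the question:

```python
import xml.etree.ElementTree as ET

# Same sample document as in the question.
xml_text = """
<root>
<H D="14/11/2017">
<FC>
    <F LV="0">The quick</F>
    <F LV="1">brown</F>
    <F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
    <F LV="0">The lazy</F>
    <F LV="1">fox</F>
</FC>
</H>
</root>"""

root = ET.fromstring(xml_text)

# One (date, sentence) pair per H element.
records = []
for h in root.iter("H"):
    # Searching from h only visits the F elements nested inside this H.
    words = [f.text for f in h.iter("F")]
    records.append((h.attrib["D"], " ".join(words)))

for date, sentence in records:
    print(date, sentence)
```

This prints one line per H element, with the date followed by the joined F texts.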

4

You did not specify exactly which library you want to use, so I recommend lxml for Python. There are several ways to get the values you want:

With a loop:

from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot()
text = []
for element in root:
    # each child of root is an H element; read its D attribute
    text.append(element.get('D', None))
    for child in element:
        for grandchild in child:
            text.append(grandchild.text)
print(text)

Output: ['14/11/2017', 'The quick', 'brown', 'fox', '14/11/2017', 'The lazy', 'fox']

With xpath:

from lxml import etree
tree = etree.parse('XmlTest.xml')
root = tree.getroot() 
D = root.xpath("./H")
F = root.xpath(".//F")

for each in D:
  print(each.get('D',None))

for each in F:
  print(each.text)

Output:

14/11/2017
14/11/2017
The quick
brown
fox
The lazy
fox

Both have their own advantages, and either gives you a good starting point. I recommend the XPath approach, since it gives you more freedom when values are missing.
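
To illustrate the point about missing values, here is a minimal sketch. It swaps in the standard library's xml.etree.ElementTree, whose findall() accepts a limited XPath subset, instead of lxml's xpath(); the expressions "./H" and ".//F" are the same as above. The sample document, the "unknown" default, and the name rows are assumptions made up for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the second H has no D attribute and one F is empty.
sample = """<root>
<H D="14/11/2017"><FC><F LV="0">The quick</F><F LV="1"/></FC></H>
<H><FC><F LV="0">The lazy</F></FC></H>
</root>"""

root = ET.fromstring(sample)

rows = []
for h in root.findall("./H"):
    # get() returns the default instead of raising KeyError when D is absent.
    date = h.get("D", "unknown")
    # An empty element has text None; replace it with an empty string.
    texts = [f.text or "" for f in h.findall(".//F")]
    rows.append((date, texts))

print(rows)
```

An H element with no F descendants would simply yield an empty list here, so no extra checks are needed for that case.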

1

This should help you

import xml.etree.ElementTree as ET
data='''
<root>
<H D="14/11/2017">
<FC>
    <F LV="0">The quick</F>
    <F LV="1">brown</F>
    <F LV="2">fox</F>
</FC>
</H>
<H D="14/11/2017">
<FC>
    <F LV="0">The lazy</F>
    <F LV="1">fox</F>
</FC>
</H>
</root>
'''
#list created to store data
D_data=[]
F_data=[]

#data parsed
root= ET.XML(data)

#This will get the value of D
for sub in root:
    b=(sub.attrib.get('D'))
    D_data.append(b)

#This will get all the text for F tag in xml
for f in root.iter("F"):
    b=f.text
    #print f.tag,f.attrib,f.text
    F_data.append(b)

print(D_data)
print(F_data)
