How to read xml in python?

Question

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:georss="http://www.georss.org/georss">
<id>tag:blogger.com,1999:blog-5575774992011030841.archive</id>
<updated>2016-12-08T11:17:36.889+05:30</updated>
<title type="text">Rochak News</title>
<link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://www.rochaknews.com/feeds/archive">
<link rel="self" type="application/atom+xml" href="http://www.rochaknews.com/feeds/archive">
<link rel="http://schemas.google.com/g/2005#post" type="application/atom+xml" href="https://www.blogger.com/feeds/5575774992011030841/archive">
<link rel="alternate" type="text/html" href="http://www.rochaknews.com/">
<author>
<generator version="7.00" uri="https://www.blogger.com">Blogger</generator>
<entry>
<id>tag:blogger.com,1999:blog-5575774992011030841.settings.BLOG_BY_POST_ARCHIVING</id>
<published>2016-08-22T19:39:26.425+05:30</published>
<updated>2016-12-08T11:17:36.889+05:30</updated>
<category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/blogger/2008/kind#settings"></category>
<title type="text">Whether to provide an archive page for each post</title>
<content type="text">true</content>
<link rel="edit" type="application/atom+xml" href="https://www.blogger.com/feeds/5575774992011030841/settings/BLOG_BY_POST_ARCHIVING">
<link rel="self" type="application/atom+xml" href="https://www.blogger.com/feeds/5575774992011030841/settings/BLOG_BY_POST_ARCHIVING">
<author>
</entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry>
<id>tag:blogger.com,1999:blog-5575774992011030841.post-1104181070415553676</id>
<published>2016-09-06T13:47:00.006+05:30</published>
<updated>2016-09-06T13:47:55.331+05:30</updated>
<category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/blogger/2008/kind#post"></category>
<category scheme="http://www.blogger.com/atom/ns#" term="Bollywood"></category>
<category scheme="http://www.blogger.com/atom/ns#" term="rochaknews"></category>
<title type="text">OMG ! प्रियंका और दीपिका के बाद अब हॉलीवुड जाएगी ये एक्ट्रेस</title>
<content type="html"><div dir="ltr" style="text-align: left;" trbidi="on"><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-vBoYugcMUgc/V857M85aCkI/AAAAAAAAApY/vsIMDYcsy4AiVVxlkavmuG9wWl7mT8n1wCLcB/s1600/Sonam-kapoor-indian-fashion-pictures.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="384" src="https://3.bp.blogspot.com/-vBoYugcMUgc/V857M85aCkI/AAAAAAAAApY/vsIMDYcsy4AiVVxlkavmuG9wWl7mT8n1wCLcB/s640/Sonam-kapoor-indian-fashion-pictures.jpg" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"></div><br /><br />प्रियंका और दीपिका के बाद अब सोनम कपूर को भी लगा हॉलीवुड का चस्का। &nbsp;बॉलीवुड के बाद अब एक्ट्रेस सोनम कपूर हॉलीवुड में अपनी नयी जर्नी की शुरुआत करने जा रही हैं। स्टाइल डिवा सोनम अब हॉलीवुड फिल्मों में हाथ आज़माने जा रही हैं। सोनम ने इस बात की खबर खुद सोशल मीडिया पर ट्वीट के जरिये दी। उन्होने अपने फैंस को बताया की उन्हें यूनाइटेड टेलेंट एजेंसी ने सभी क्षेत्रों में प्रतिनिधित्व के लिए चुना है।<br /><br />हॉलीवुड में कपूर कोई पहला नाम नहीं है इससे पहले सोनम के पापा भी हॉलीवुड में ‘स्लमडॉग मिलेनियर’ और ‘मिशन इम्पॉसिबल: घोस्ट प्रोटोकॉल’ से अपनी एक्टिंग के जरिये कई दिल जीत चुके हैं। दीपिका पादुकोण और प्रियंका चोपड़ा के बाद अब सोनम भी हॉलीवुड अपने जलवे बिखेरने की तैयारी में हैं। फिलहाल सोनम अपनी नयी फिल्म ‘वीरे दी वैडिंग’ की शूटिंग में बिज़ि हैं।<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-mPIxgYqcXSY/V857TD4J2NI/AAAAAAAAApc/bdPmebhpSMM1L7dPGYDfZtLJ-5Bbt7AswCLcB/s1600/hot%2Bsonam%2Bkapur.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="450" src="https://1.bp.blogspot.com/-mPIxgYqcXSY/V857TD4J2NI/AAAAAAAAApc/bdPmebhpSMM1L7dPGYDfZtLJ-5Bbt7AswCLcB/s640/hot%2Bsonam%2Bkapur.jpg" width="640" /></a></div><br /><a href="http://www.rochaknews.com/2016/09/blog-post_80.html">Read more: ब्रेस्ट इम्प्लांट' के कारण सुर्ख़ियों में आईं ये एक्ट्रेस</a><br /><a href="http://www.rochaknews.com/2016/09/blog-post_22.html">Read more:‘पॉर्न हब’ ने नेत्रहीनों के लिए लॉन्च किया एक खास चैनल</a><br /><a href="http://www.rochaknews.com/2016/08/blog-post_94.html">Read more: अय्याशी के लिए यहाँ मिलते है एक से बढ़ कर एक लड़कियां</a><br /><a href="http://www.rochaknews.com/2016/09/blog-post_69.html">Read more: जानिए, पुरुष बनाना चाहते है ऐसी महिलाओ से सम्बन्ध</a></div></content>
<link rel="replies" type="application/atom+xml" href="http://www.rochaknews.com/feeds/1104181070415553676/comments/default" title="टिप्पणियाँ भेजें">
<link rel="replies" type="text/html" href="http://www.rochaknews.com/2016/09/omg_6.html#comment-form" title="0 टिप्पणियाँ">
<link rel="edit" type="application/atom+xml" href="https://www.blogger.com/feeds/5575774992011030841/posts/default/1104181070415553676">
<link rel="self" type="application/atom+xml" href="https://www.blogger.com/feeds/5575774992011030841/posts/default/1104181070415553676">
<link rel="alternate" type="text/html" href="http://www.rochaknews.com/2016/09/omg_6.html" title="OMG ! प्रियंका और दीपिका के बाद अब हॉलीवुड जाएगी ये एक्ट्रेस">
<author>
<media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://3.bp.blogspot.com/-vBoYugcMUgc/V857M85aCkI/AAAAAAAAApY/vsIMDYcsy4AiVVxlkavmuG9wWl7mT8n1wCLcB/s72-c/Sonam-kapoor-indian-fashion-pictures.jpg" height="72" width="72"></media:thumbnail>
<thr:total>0</thr:total>
</entry>
<entry></entry>

</feed>

I am reading this xml file in python. Above xml is only sample with having only some content.It have basically <feed> and child <entry> and sub child category like <title>,<published> and <content type="html">. Only some <entry> tag have content type = html. I want to read only <entry> tag which have <content type="html">. How can i read it? Please sugggest

i got <entry> and its child as:

import xml.etree.ElementTree as etree
tree = etree.parse('xmlread/template/blog-12-08-2016.xml')
entries = tree.findall('{http://www.w3.org/2005/Atom}entry')
print len(entries)



for child in entries:
  title_element = child.find('{http://www.w3.org/2005/Atom}title')
  published = child.find('{http://www.w3.org/2005/Atom}published')
  # print published.text
  # print title_element.text

But unable to read only specific <entry> tag which have only content type="html". Please look into the matter.

Community · Accepted Answer · 2017-05-23 11:45:40Z

1

I think your XML is funked up.

xml.etree.ElementTree.ParseError: mismatched tag: line 21, column 2

After that's fixed, here's how you do it:

parent = ET.fromstring(data)
print [child.text for child in parent.findall('.//child[@type="html"]')]

Example from SO:

import xml.etree.ElementTree as ET

data = """
<parent>
    <child attr="test">1</child>
    <child attr="something else">2</child>
    <child other_attr="other">3</child>
    <child>4</child>
    <child attr="test">5</child>
</parent>
"""

parent = ET.fromstring(data)
print [child.text for child in parent.findall('.//child[@attr]')]
print [child.text for child in parent.findall('.//child[@attr="test"]')]

edited May 23, 2017 at 11:45

CommunityBot

11 silver badge

answered Dec 10, 2016 at 8:08

paragbaxi

4,26511 gold badges48 silver badges60 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

How to read xml in python?

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related