I recently started working with Python and am trying to parse an XML document. Consider the following XML file for reference:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
Here I want to retrieve the first book element with all of its contents as a string, i.e.
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
I come from a Scala background, where I can easily do this with:
val node = scala.xml.XML.loadString(str)
val nodeSeq = node \\ "book"
nodeSeq.head.toString()
I have tried doing this with lxml and XPath, but it gets complicated: I end up recursively fetching the content of nested elements to rebuild the markup. Is there an easy way to do this in Python? Also, can it be extended to HTML?
TIA
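For comparison with the Scala snippet, here is a minimal sketch of one possible approach. It assumes the standard library's xml.etree.ElementTree is acceptable in place of lxml, and the helper name first_element_as_string is made up for illustration. The sample document is a shortened copy of the catalog above. root.iter(tag) walks the root and all descendants in document order, which is roughly what `\\` does in Scala, and ET.tostring serializes the element together with its nested children, so no manual recursion is needed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the catalog from the question, used only for illustration.
CATALOG = """<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
</book>
</catalog>"""


def first_element_as_string(xml_text, tag):
    """Return the first element named `tag`, serialized with all its children.

    Hypothetical helper: returns None when no such element exists.
    """
    root = ET.fromstring(xml_text)
    # iter() yields the root itself and every descendant in document order.
    element = next(root.iter(tag), None)
    if element is None:
        return None
    # tostring() also emits the whitespace that follows the closing tag
    # (the element's "tail"), so strip it off.
    return ET.tostring(element, encoding="unicode").strip()


first_book = first_element_as_string(CATALOG, "book")
print(first_book)
```

With lxml the shape should be much the same, since lxml.etree also provides fromstring and tostring; whether that covers the HTML case depends on how well-formed the input is, so treat this as a starting point rather than a complete answer.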