Using Python and ElementTree, I'm having trouble parsing XML into text line items such that each line item represents exactly one level, no more, no less. Each line item will eventually be one record in a database, so that the user can search on multiple terms within that field. Sample XML:
<?xml version="1.0" encoding="utf-8"?>
<root>
<mainTerm>
<title>Meat</title>
<see>protein</see>
</mainTerm>
<mainTerm>
<title>Vegetables</title>
<see>starch</see>
</mainTerm>
<mainTerm>
<title>Fruit</title>
<term level="1">
<title>Apple</title>
<code>apl</code>
</term>
<term level="1">
<title>Red Delicious</title>
<code>rd</code>
<term level="2">
<title>Large Red Delicious</title>
<code>lrd</code>
</term>
<term level="2">
<title>Medium Red Delicious</title>
<code>mrd</code>
</term>
<term level="2">
<title>Small Red Delicious</title>
<code>srd</code>
</term>
</term>
<term level="1">
<title>Grapes</title>
<code>grp</code>
</term>
<term level="1">
<title>Peaches</title>
<code>pch</code>
</term>
</mainTerm>
</root>
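ElementTree only accepts well-formed XML, so every tag in the sample has to be matched before any flattening can work; a mismatched closing tag makes `ET.fromstring` raise `ET.ParseError`. A small check, using a hypothetical helper name `is_well_formed`:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if ElementTree can parse the text, False on a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        print("ParseError:", err)  # the message includes the line and column
        return False
    return True


# A stray closing tag inside <title> is a mismatched tag.
broken = "<root><mainTerm><title>Fruit</nemod></title></mainTerm></root>"
fixed = "<root><mainTerm><title>Fruit</title></mainTerm></root>"
print(is_well_formed(broken))
print(is_well_formed(fixed))
```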
Desired Output:
Meat > protein
Vegetables > starch
Fruit > Apple > apl
Fruit > Apple > apl > Red Delicious > rd
Fruit > Apple > apl > Red Delicious > rd > Large Red Delicious > lrd
Fruit > Apple > apl > Red Delicious > rd > Medium Red Delicious > mrd
Fruit > Apple > apl > Red Delicious > rd > Small Red Delicious > srd
Fruit > Grapes > grp
Fruit > Peaches > pch
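Because each flattened line is meant to become one database record that users can search on several terms, here is a minimal sketch of that end use with Python's built-in `sqlite3` module. The table `terms(line TEXT)` and the helper `search` are hypothetical, a handful of the desired lines are inlined, and a line matches only if every search word appears in it (one `LIKE` per word, ANDed together; `%` and `_` inside search words are not escaped).

```python
import sqlite3

# A few of the desired output lines, stored one per record.
LINES = [
    "Meat > protein",
    "Fruit > Apple > apl",
    "Fruit > Grapes > grp",
    "Fruit > Peaches > pch",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (line TEXT)")  # hypothetical table and column
conn.executemany("INSERT INTO terms (line) VALUES (?)", [(line,) for line in LINES])


def search(conn, *words):
    """Return the stored lines that contain every word (LIKE is ASCII case-insensitive)."""
    # Only placeholders are formatted into the SQL; the words are bound parameters.
    where = " AND ".join("line LIKE ?" for _ in words) or "1"
    params = [f"%{word}%" for word in words]
    rows = conn.execute(f"SELECT line FROM terms WHERE {where}", params)
    return [row[0] for row in rows]


print(search(conn, "fruit", "apl"))
```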
It's easy enough to find each 'mainTerm' element, but the tricky part is limiting each line to a single level while still including the upper-level terms in the text. I'm basically trying to "flatten" the XML hierarchy into unique lines of text, each of which lists its parents (e.g. Fruit > Apple > apl) but not its siblings (e.g. the Large Red Delicious line should not also contain Medium Red Delicious or Small Red Delicious).
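One way to get exactly one line per level is a recursive walk that carries the ancestor path down as a list and emits one line per `<term>`. This is a minimal sketch, assuming a `<mainTerm>` holds either a `<see>` or nested `<term>` elements and that each term contributes its own `<title>` and `<code>`; a trimmed copy of the sample is inlined, and `flatten` and `flatten_terms` are hypothetical names. The path follows element nesting, not the `level` attribute, so because Red Delicious is a level-1 sibling of Apple in the sample, it comes out as `Fruit > Red Delicious > rd` rather than under Apple.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <mainTerm><title>Meat</title><see>protein</see></mainTerm>
  <mainTerm>
    <title>Fruit</title>
    <term level="1"><title>Apple</title><code>apl</code></term>
    <term level="1"><title>Red Delicious</title><code>rd</code>
      <term level="2"><title>Large Red Delicious</title><code>lrd</code></term>
    </term>
  </mainTerm>
</root>"""


def flatten_terms(term, parents):
    """Yield one line per <term>: the ancestor labels plus this term's title and code."""
    # findtext("title") looks only at direct children, so descendants are ignored.
    own = [term.findtext("title"), term.findtext("code")]
    path = parents + [text for text in own if text]  # new list; parents is untouched
    yield " > ".join(path)
    # Recurse into direct child terms only, handing them this term's path.
    for child in term.findall("term"):
        yield from flatten_terms(child, path)


def flatten(root):
    """Return the flattened lines for every <mainTerm> under root."""
    lines = []
    for main in root.findall("mainTerm"):
        title = main.findtext("title")
        see = main.findtext("see")
        if see is not None:
            lines.append(f"{title} > {see}")
        for term in main.findall("term"):
            lines.extend(flatten_terms(term, [title]))
    return lines


for line in flatten(ET.fromstring(SAMPLE)):
    print(line)
```

Siblings never leak into a line because each recursive call only extends its own copy of the path.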
I realize this could be done by first converting the data to a relational database format and then running a query, but I was hoping for a solution that works directly from the XML.
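Working directly from the XML can also be done in a single streaming pass with `ET.iterparse`, without building the whole tree or an intermediate table. A sketch, assuming (as in the sample) that every `<term>` has its `<code>` after its `<title>` and before any child terms, and that a `<mainTerm>` line is completed by its `<see>`; `stream_lines` is a hypothetical name. A stack holds the labels of each open `mainTerm`/`term`, and a line is emitted as soon as a `<code>` or `<see>` closes, so parents are printed before their children.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<root>
  <mainTerm><title>Vegetables</title><see>starch</see></mainTerm>
  <mainTerm>
    <title>Fruit</title>
    <term level="1"><title>Grapes</title><code>grp</code></term>
    <term level="1"><title>Red Delicious</title><code>rd</code>
      <term level="2"><title>Small Red Delicious</title><code>srd</code></term>
    </term>
  </mainTerm>
</root>"""


def stream_lines(source):
    """Yield flattened lines while parsing, keeping a stack of open labels."""
    stack = []  # one list of labels per currently open mainTerm/term element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag in ("mainTerm", "term"):
                stack.append([])
            continue
        # From here on event == "end", so the element's text is complete.
        if elem.tag == "title":
            stack[-1].append(elem.text or "")
        elif elem.tag in ("code", "see"):
            stack[-1].append(elem.text or "")
            # One line per level: all open ancestors' labels plus this level's.
            yield " > ".join(label for frame in stack for label in frame)
        elif elem.tag in ("mainTerm", "term"):
            stack.pop()
            elem.clear()  # free the finished subtree


for line in stream_lines(io.BytesIO(SAMPLE)):
    print(line)
```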
Hope this makes sense...thanks