I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsxlf="http://www.w3.org/ns/its-xliff/" its:version="2.0">
<file original="temp/file_conversion/tmp4a9kn6bn/69502fea-751c-4c3c-a38a-4fce9e13ebde.txt" source-language="en" target-language="ar" datatype="x-text/plain" okp:inputEncoding="UTF-8">
<body>
<trans-unit id="1idhasofh" xml:space="preserve">
<source xml:lang="en">foo<bpt id="0">&lt;bar&gt;</bpt>&lt;Instruction&gt;<ept id="0">&lt;crow&gt;</ept>&lt;grande&gt;</source>
<target xml:lang="ar">foo<bpt id="0">&lt;bar&gt;</bpt>&lt;Instruction&gt;<ept id="0">&lt;crow&gt;</ept>&lt;grande&gt;</target>
</trans-unit>
</body>
</file>
</xliff>
I'm trying to create a function that parses an XML file that I've read into an ElementTree.Element:
from xml.etree import ElementTree as ET


def parse_xml(ele: ET.Element):
    tag = ele.tag
    if not isinstance(tag, str) and tag is not None:
        return
    t = ele.text
    if t:
        yield t
    for e in ele:
        parse_xml(e)
        t = e.tail
        if t:
            yield t


def main():
    fp = "path/to/xml"
    tree = ET.parse(fp)
    root = tree.getroot()
    t_units = root.findall(".//{*}trans-unit")
    for source, target in t_units:
        for ele in parse_xml(source):
            print(ele)
I get:
foo
<Instruction>
<grande>
In my debugger, I see that parse_xml(e) gets skipped. When I replace the yields with print statements:
def parse_xml(ele: ET.Element):
    tag = ele.tag
    if not isinstance(tag, str) and tag is not None:
        return
    t = ele.text
    if t:
        print(t)
    for e in ele:
        parse_xml(e)
        t = e.tail
        if t:
            print(t)
I get the expected result (reaches all the tagged text):
foo
<bar>
<Instruction>
<crow>
<grande>
Why does this happen with yield?
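Use yield from parse_xml(e) instead of just calling parse_xml again. Because parse_xml is a generator function, a bare parse_xml(e) call only creates a new generator object and then discards it. Its body never runs, which is why the debugger appears to skip it, and nothing is done with the values it would have yielded. The print version works because there parse_xml is an ordinary function whose body executes on every call.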
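
A minimal sketch of the function with the recursive call delegated through yield from. The inline XML string is a simplified stand-in for the <source> element in the question (namespaces and the surrounding xliff structure are omitted), and the name fragments is only illustrative:

```python
from xml.etree import ElementTree as ET


def parse_xml(ele: ET.Element):
    """Yield every text fragment under ele in document order."""
    tag = ele.tag
    # Comments and processing instructions have a non-string tag; skip them.
    if not isinstance(tag, str) and tag is not None:
        return
    if ele.text:
        yield ele.text
    for e in ele:
        # Delegate to the child's generator so its values are re-yielded here.
        yield from parse_xml(e)
        if e.tail:
            yield e.tail


# Simplified stand-in for the <source> element (assumption: no namespaces).
source = ET.fromstring(
    '<source>foo<bpt id="0">&lt;bar&gt;</bpt>&lt;Instruction&gt;'
    '<ept id="0">&lt;crow&gt;</ept>&lt;grande&gt;</source>'
)

fragments = list(parse_xml(source))
print(fragments)
# ['foo', '<bar>', '<Instruction>', '<crow>', '<grande>']
```

An equivalent, more verbose spelling is a loop of the form for x in parse_xml(e): yield x. Either way, the point is that something has to iterate over the inner generator.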