I have the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsxlf="http://www.w3.org/ns/its-xliff/" its:version="2.0">
<file original="temp/file_conversion/tmp4a9kn6bn/69502fea-751c-4c3c-a38a-4fce9e13ebde.txt" source-language="en" target-language="ar" datatype="x-text/plain" okp:inputEncoding="UTF-8">
<body>
<trans-unit id="1idhasofh" xml:space="preserve">
<source xml:lang="en">foo<bpt id="0">&lt;bar&gt;</bpt>&lt;Instruction><ept id="0">&lt;crow&gt;</ept>&lt;grande&gt;</source>
<target xml:lang="ar">foo<bpt id="0">&lt;bar&gt;</bpt>&lt;Instruction><ept id="0">&lt;crow&gt;</ept>&lt;grande&gt;</target>
</trans-unit>
</body>
</file>
</xliff>

I'm trying to create a function that parses an XML file that I've read into an ElementTree.Element:

from xml.etree import ElementTree as ET


def parse_xml(ele: ET.Element):
    tag = ele.tag
    if not isinstance(tag, str) and tag is not None:
        return
    t = ele.text
    if t:
        yield t
    for e in ele:
        parse_xml(e)
        t = e.tail
        if t:
            yield t

def main():
    fp = "path/to/xml"
    tree = ET.parse(fp)
    root = tree.getroot()

    t_units = root.findall(".//{*}trans-unit")
    for source, target in t_units:
        for ele in parse_xml(source):
            print(ele)

I get:

foo
<Instruction>
<grande>

In my debugger, I see that parse_xml(e) gets skipped. When I replace the yields with print statements:

def parse_xml(ele: ET.Element):
    tag = ele.tag
    if not isinstance(tag, str) and tag is not None:
        return
    t = ele.text
    if t:
        print(t)
    for e in ele:
        parse_xml(e)
        t = e.tail
        if t:
            print(t)

I get the expected result (reaches all the tagged text):

foo
<bar>
<Instruction>
<crow>
<grande>

Why does this happen with yield?

  • You have a recursive call whose effects are visible with prints, but which does not yield its values to the top level. Try adding yield from parse_xml(e). Commented Oct 16, 2024 at 14:35
  • Recursion does not magically skip to the top level. Just like when returning, a call only passes its values to the function that directly called it, in this case parse_xml again, which does not do anything with the yielded values. Commented Oct 16, 2024 at 14:43

1 Answer

parse_xml is a generator function: calling it does not run its body. Instead, the call returns a generator object that has to be iterated over, just as the top-level call to parse_xml is iterated by the line for ele in parse_xml(source): in your main function.

There, too, the parse_xml call returns a generator immediately, without running any code inside the function. It is the for statement that actually runs the function body, advancing it to the next yield statement on each iteration.
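To make that concrete, here is a minimal sketch of the lazy behaviour. The countdown generator and the log list are hypothetical names used only for illustration; they are not part of the code in the question.

```python
log = []  # records which parts of the generator body have actually run


def countdown(n):
    # Hypothetical generator, used only to illustrate lazy execution.
    log.append("body started")
    while n > 0:
        yield n
        n -= 1
    log.append("body finished")


gen = countdown(2)          # creates a generator object; the body has not run
log_after_call = list(log)  # snapshot: still empty at this point

first = next(gen)           # runs the body up to the first yield
log_after_next = list(log)  # now contains "body started"

rest = list(gen)            # drives the generator to completion
```

A bare call such as parse_xml(e) inside the loop is in the same position as gen right after countdown(2): the generator object exists, but nothing ever iterates it, so its body never runs.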

So, you could just loop over the returned value in each recursive parse_xml call, like this:

def parse_xml(ele: ET.Element):
    ...
    for e in ele:
        for inner_e in parse_xml(e):
            yield inner_e
        t = e.tail
        if t:
            yield t

Alternatively, you can use Python's yield from syntax, which is meant for nested (or recursive) generator calls. It is a bit more efficient and has other advantages for generators that use further features, such as accepting values sent back by the caller that drives the generator. This is the recommended way to go:

def parse_xml(ele: ET.Element):
    tag = ele.tag
    if not isinstance(tag, str) and tag is not None:
        return
    t = ele.text
    if t:
        yield t
    for e in ele:
        yield from parse_xml(e)
        t = e.tail
        if t:
            yield t
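As a quick check, the sketch below runs this version against a trimmed copy of the <source> element from the question, next to the original version for comparison. SNIPPET and parse_xml_buggy are names assumed here for illustration, and the namespaces and xml:lang attribute are dropped to keep the snippet short.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the <source> element from the question (assumed simplification).
SNIPPET = (
    '<source>foo<bpt id="0">&lt;bar&gt;</bpt>&lt;Instruction&gt;'
    '<ept id="0">&lt;crow&gt;</ept>&lt;grande&gt;</source>'
)


def parse_xml(ele):
    tag = ele.tag
    if not isinstance(tag, str) and tag is not None:
        return
    if ele.text:
        yield ele.text
    for e in ele:
        yield from parse_xml(e)  # re-yield everything the child produces
        if e.tail:
            yield e.tail


def parse_xml_buggy(ele):
    # Same logic as the question's version: the recursive call is never iterated.
    if ele.text:
        yield ele.text
    for e in ele:
        parse_xml_buggy(e)  # creates a generator and immediately discards it
        if e.tail:
            yield e.tail


source = ET.fromstring(SNIPPET)
fixed = list(parse_xml(source))
buggy = list(parse_xml_buggy(source))
builtin = list(source.itertext())  # the standard library's equivalent traversal
```

Here fixed contains all five pieces of text, while buggy contains only foo, <Instruction> and <grande>, matching the output in the question. If all you need is the text in document order, Element.itertext() gives the same result as fixed for this input.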

Used on the return value of an inner call to parse_xml, as here, yield from does the "right thing": it suspends the outer (current) parse_xml call and passes each value yielded by the inner call through to the outer call's caller. When the inner call is finished (it raises StopIteration, the same mechanism that makes a for loop stop), execution of the outer call resumes.
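The delegation can be traced step by step with a small sketch. The inner and outer generators below are hypothetical and unrelated to the XML code; they only show the order of execution, that values passed with send() reach the inner generator, and that the inner generator's return value becomes the value of the yield from expression.

```python
def inner():
    # Hypothetical generator: reports back whatever the caller sends in.
    received = yield "inner-1"
    yield f"inner got {received!r}"
    return "inner result"  # carried to the outer generator via StopIteration


def outer():
    yield "outer-before"
    result = yield from inner()  # outer stays suspended while inner runs
    yield f"outer resumed with {result!r}"


gen = outer()
trace = [next(gen)]           # value yielded by outer itself
trace.append(next(gen))       # first value passed through from inner
trace.append(gen.send("hi"))  # send() is forwarded to inner
trace.append(next(gen))       # inner has finished, so outer resumes
leftover = list(gen)          # nothing is left once outer ends
```

A plain for loop that re-yields each item would pass the values through in the same way, but it would not forward send() to the inner generator or capture its return value.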
