I am trying to extract all the text from a single XML element. This usually works, but it fails when the element contains a nested child element. Here is a minimal example:

from lxml import etree  # assumption: lxml; the original snippet omits the import

xmlstring = """
<a>
    <b> TEXT 1 <c> PHRASE </c> TEXT 2</b>
</a>

"""

parser = etree.XMLParser()
root = etree.fromstring(xmlstring, parser)  # fromstring returns the root element, which is iterated below

What I tried is:

reslist = list(root.iter())
result = ' '.join([element.text for element in reslist]) 

The output is:

'\n\t  TEXT 1   PHRASE '

The desired output would be:

'\n\t TEXT 1   PHRASE   TEXT 2 '
  • You need to consider the tail property of the c element. Commented Aug 25, 2019 at 12:00
  • Thank you very much, and sorry for the duplicate question! Commented Aug 26, 2019 at 10:06
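
The hint about the tail property can be sketched roughly as follows. This is only an illustration, not necessarily what the commenter had in mind: it assumes the standard-library `xml.etree.ElementTree` (as far as I know, `lxml.etree` exposes the same `text`, `tail`, and `itertext()` API), and the helper name `text_with_tails` is made up for this example.

```python
# Assumption: standard-library parser; lxml.etree should behave the same
# for the attributes used here (text, tail, itertext).
import xml.etree.ElementTree as etree

xmlstring = """
<a>
    <b> TEXT 1 <c> PHRASE </c> TEXT 2</b>
</a>
"""

root = etree.fromstring(xmlstring)
b = root.find("b")
c = b.find("c")

# Text that follows </c> but still sits inside <b> is stored on c.tail,
# not on b.text, which is why joining only element.text drops " TEXT 2".


def text_with_tails(element):
    """Hypothetical helper: collect an element's text, its children's text,
    and each child's tail, in document order."""
    parts = [element.text or ""]
    for child in element:
        parts.append(text_with_tails(child))
        parts.append(child.tail or "")
    return "".join(parts)


manual = text_with_tails(b)

# itertext() walks text and tail values in document order, so it gives
# the same string without a hand-written helper.
builtin = "".join(b.itertext())

print(repr(manual))
print(repr(builtin))
```

If the surrounding whitespace is not wanted, something like `" ".join(builtin.split())` would collapse it to single spaces.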
