I'm trying to extract the following text from the XML below:

title_text = word1 Word2 word3 word4

The problem is that the code below only gives me title_text = 'word1'.

How can I get the full text, including the text inside the nested <hlword> elements?

XML:

<response>...<results>...<grouping>...<group>...
    <doc>...
         <title>
             word1
             <hlword>Word2</hlword>
             <hlword>word3</hlword>
             word4
          </title>
          ...
    </doc>
</group>...</grouping>...</results>...</response>...

Code for parse:

from lxml import objectify
...
tree = objectify.fromstring(xml)
nodes = tree.response.results.grouping.group
for node in nodes:
    title_element = node.doc.title
    title_text = title_element.text
    print(title_text)

1 Answer

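.text returns only the text that precedes the first child element, which is why you get just 'word1'. To collect the text of the element and all of its descendants, iterate over .itertext():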

>>> for node in nodes:
...    print(' '.join(node.doc.title.itertext()))
...
word1 Word2 word3 word4
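
If the real document is indented the way the snippet in the question is, the pieces yielded by .itertext() will carry the surrounding newlines and spaces with them. Below is a minimal sketch of one way to normalize that. It uses the standard library's xml.etree.ElementTree rather than lxml.objectify so it runs without extra packages; lxml elements expose an itertext() method as well. SAMPLE_XML and full_text are hypothetical names made up for this illustration, and the sample document is a cut-down stand-in for the one in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the <title> element from the question.
SAMPLE_XML = """
<doc>
    <title>
        word1
        <hlword>Word2</hlword>
        <hlword>word3</hlword>
        word4
    </title>
</doc>
"""


def full_text(element):
    """Return all nested text of `element` with whitespace runs collapsed."""
    # itertext() yields the element's own text, each descendant's text,
    # and every tail text, in document order.
    raw = "".join(element.itertext())
    # str.split() with no arguments splits on any run of whitespace and
    # drops empty strings, so indentation and newlines collapse to spaces.
    return " ".join(raw.split())


title = ET.fromstring(SAMPLE_XML.strip()).find("title")
print(repr(title.text.strip()))  # only the text before the first child
print(full_text(title))          # text of the element and its descendants
```

Joining the pieces with an empty string and then splitting keeps words together when a child element starts in the middle of a word, at the cost of relying on whitespace already being present between separate words.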