Python lxml HTML xpath query code not working

Question

I am trying to scrape a page using the code below. When I run it, I get an error on the first assignment to the titles variable: AttributeError: 'NoneType' object has no attribute 'split'.

If I replace the assignment with print(tag.text), it works as expected. The second assignment, to the commands variable, also works as expected. Why does the first assignment raise the error?

Code:

import requests
import lxml.html as LH

s = requests.Session()
r = s.get('http://www.rebootuser.com/?page_id=1721')

root = LH.fromstring(r.text)
def getTags():
    commands = []
    titles = []

    for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
        titles += tag.text.split(',')

    for tag in root.xpath('//*/td/span/code'):
        commands += tag.text.split(',')

    zipped = zip(titles, commands)

    for item in zipped:
        print(item)
getTags()

falsetru · Accepted Answer · 2014-01-07 15:48:15Z

In the document, some tags that match the XPath //*/tr/td[@width="54%"]/span contain a b tag as a child instead of direct text.

Accessing the text attribute of such a tag returns None, and calling split on None raises the error:

>>> None.split(',')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'split'
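
To see where the None comes from, here is a minimal sketch on a made-up HTML fragment (the markup and titles are assumptions for illustration, not taken from the scraped page). The second span starts with a b child, so its text attribute is None:

```python
import lxml.html as LH

# Hypothetical markup: the second span holds its text inside a <b> child.
html = (
    '<html><body><table>'
    '<tr><td width="54%"><span>plain title</span></td></tr>'
    '<tr><td width="54%"><span><b>bold title</b></span></td></tr>'
    '</table></body></html>'
)
root = LH.fromstring(html)

spans = root.xpath('//*/tr/td[@width="54%"]/span')

# .text only holds the text that appears before the first child element.
texts = [span.text for span in spans]
print(texts)  # ['plain title', None]
```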

Use the text_content() method instead of the text attribute to get the text of such a tag together with the text of its children:

for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
    #titles += tag.text.split(',')
    titles += tag.text_content().split(',')
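
As a self-contained check, the same loop can be run against a small inline document. This is only a sketch: the sample markup and the extract_titles helper are hypothetical names introduced here, and the strip() call is an extra step that is not part of the original code.

```python
import lxml.html as LH

def extract_titles(root):
    """Collect comma-separated titles from every matching span (hypothetical helper)."""
    titles = []
    for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
        # text_content() joins the element's own text with that of its descendants,
        # so it returns a string even when the span starts with a child element.
        titles += [part.strip() for part in tag.text_content().split(',')]
    return titles

# Hypothetical sample: one span with plain text, one with a <b> child.
sample = (
    '<html><body><table>'
    '<tr><td width="54%"><span>alpha, beta</span></td></tr>'
    '<tr><td width="54%"><span><b>gamma</b> delta</span></td></tr>'
    '</table></body></html>'
)

titles = extract_titles(LH.fromstring(sample))
print(titles)  # ['alpha', 'beta', 'gamma delta']
```

With tag.text in place of tag.text_content(), the second span would raise the AttributeError from the question.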

edited Jan 7, 2014 at 15:48

answered Jan 7, 2014 at 15:34
