I am trying to scrape a page using the code below. When I run it, I get an error on the first assignment to the titles variable: AttributeError: 'NoneType' object has no attribute 'split'.

If I replace that assignment with print(tag.text), it works as expected. The second assignment, to the commands variable, also works as expected. Why does the first assignment raise the error?

Code:

import requests
import lxml.html as LH

s = requests.Session()
r = s.get('http://www.rebootuser.com/?page_id=1721')

root = LH.fromstring(r.text)
def getTags():
    commands = []
    titles = []

    for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
        titles += tag.text.split(',')

    for tag in root.xpath('//*/td/span/code'):
        commands += tag.text.split(',')

    zipped = zip(titles, commands)

    for item in zipped:
        print(item)
getTags()

1 Answer

In the document, some of the elements matched by the XPath //*/tr/td[@width="54%"]/span contain a b element as their child instead of direct text.

The text attribute of such an element is None, so calling split on it fails:

>>> None.split(',')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'split'
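
The same behaviour can be reproduced on a small fragment. The sketch below uses made-up markup (it is not copied from the page in the question) with one span that holds text directly and one that wraps its text in a b element:

```python
import lxml.html as LH

# Hypothetical markup for illustration only; the real page may differ.
html = (
    '<html><body><table><tr>'
    '<td><span>plain text</span></td>'
    '<td><span><b>bold text</b></span></td>'
    '</tr></table></body></html>'
)
root = LH.fromstring(html)
plain, bold = root.xpath('//span')

plain_text = plain.text    # text sits directly inside the span
bold_text = bold.text      # no text before the first child, so this is None
inner_text = bold[0].text  # the text belongs to the b child instead

print(repr(plain_text), repr(bold_text), repr(inner_text))
```

Here only the second span would trigger the AttributeError, which is consistent with print(tag.text) appearing to work: printing None does not fail, but calling split on it does.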

Use the text_content() method instead of the text attribute to get the text of such an element together with the text of its children:

for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
    # titles += tag.text.split(',')  # fails when tag.text is None
    titles += tag.text_content().split(',')
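
As a self-contained check, here is a minimal sketch of the same loop run against made-up markup. The table rows and their contents are assumptions for illustration, not the real page. The strip() call is an optional addition that removes whitespace left around each piece after splitting:

```python
import lxml.html as LH

# Hypothetical rows: one span with direct text, one with a b child.
html = '''<html><body><table>
<tr><td width="54%"><span>Kernel, Version</span></td></tr>
<tr><td width="54%"><span><b>Users, Groups</b></span></td></tr>
</table></body></html>'''
root = LH.fromstring(html)

titles = []
for tag in root.xpath('//tr/td[@width="54%"]/span'):
    # text_content() gathers the text of the element and its descendants,
    # so it returns a string even when tag.text is None.
    titles += [part.strip() for part in tag.text_content().split(',')]

print(titles)
```

Both rows contribute their titles here, whereas tag.text.split(',') would stop with the AttributeError at the second row.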