
I am having trouble with a web scraping function. The XPaths for the two things I am trying to get are:

/html/body/div/table[2]/tbody/tr[5]/td[1]/div[1]/ul/li[1]/text()
/html/body/div/table[2]/tbody/tr[5]/td[1]/div[1]/ul/li[1]/a

The HTML is:

<li><a href="http://www.acu.edu/" target="_blank" class="institution">Abilene Christian University</a> (TX)</li>

I am trying to write a function that loops through each li in tr[5]. The problem I am having is getting the text(). I have tried a number of different variations of this function:

from lxml.html import parse
from urllib2 import urlopen
def _clean(lst):
    for elm in lst:
        lnk=elm.findall('.//a')
        for this in lnk:
            lnk_txt.append(this.text_content())
        state_txt.append(elm.findall('.//text()'))

This specific function raises a KeyError on the '()'. If I remove the (), it returns a list of empty elements. The lnk_txt part works.

What I am trying to get are two lists. One is the name of the university. The other is the location of the university. The ultimate goal is to make tuples (name, state).

Comment: It is the (TX). I added the sample and my packages to the post. (Sep 18, 2015 at 14:52)

1 Answer


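findall() only accepts the limited ElementPath subset, which does not support text() as a path step; that is why the '()' raises a KeyError. Use xpath() instead and select the text node that follows the a element (here lnk stands for a single a element, such as "this" in your inner loop):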

lnk.xpath("following-sibling::text()")

Demo:

>>> import lxml.html
>>> data = '<li><a href="http://www.acu.edu/" target="_blank" class="institution">Abilene Christian University</a> (TX)</li>'
>>> li = lxml.html.fromstring(data)
>>> li.xpath("//a[@class='institution']/following-sibling::text()")[0].strip()
'(TX)'
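To get from the single-element demo to the (name, state) tuples the question asks for, the same following-sibling lookup can be applied to every institution link. The following is only a sketch under stated assumptions: the helper name extract_pairs, the second li entry, and the choice to strip the surrounding parentheses from the state are illustrative and not part of the original page or answer.

```python
import lxml.html

# Sample markup. The first <li> is from the question; the second is a
# hypothetical entry added only to show the loop over several items.
SAMPLE = """
<ul>
  <li><a href="http://www.acu.edu/" target="_blank" class="institution">Abilene Christian University</a> (TX)</li>
  <li><a href="http://www.example.edu/" target="_blank" class="institution">Example College</a> (NY)</li>
</ul>
"""


def extract_pairs(root):
    """Return a list of (name, state) tuples, one per institution link."""
    pairs = []
    for link in root.xpath(".//a[@class='institution']"):
        name = link.text_content().strip()
        # Text nodes that come after the <a> inside the same parent <li>.
        trailing = link.xpath("following-sibling::text()")
        # Assumption: the state is wrapped in parentheses, e.g. " (TX)".
        state = trailing[0].strip().strip("()") if trailing else ""
        pairs.append((name, state))
    return pairs


pairs = extract_pairs(lxml.html.fromstring(SAMPLE))
print(pairs)
```

On a real page, root would be the parsed document (or the tr[5] element from the question) rather than this sample string. A link with no trailing text yields an empty state instead of raising an IndexError.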

2 Comments

Thank you, it worked. Is there a resource you used for the answer, or did you know it from experience?
@lost I would say this is a specific skill: locating elements in the HTML. Study XPath syntax and CSS selectors; there is a lot of information out there on the web. But above all, practice and practice more.
