Xpath Not Returning Values lxml Python

Question

I'm working on a project where I'm trying to get lxml to pull stock data from separate tables on separate web pages. When I run my program and print the values I'm trying to pull, I get empty lists:

('Cash_and_short_term_investments:', [])
('EPSNextYear:', [])

Here is how I am calling this:

# the url at this point is http://finviz.com/quote.ashx?t=RAIL (confirmed with a print statement)
url = driver.current_url
page2 = requests.get(url)
tree2 = html.fromstring(page2.content)
EPSNextYear = tree2.xpath('/html/body/table[3]/tr[1]/td/table/tr[7]/td/table/tr[2]/td[6]/b')
# Original XPath: /html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[2]/td[6]/b
print('EPSNextYear:', EPSNextYear)

and:

#the url at this point is https://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA I've confirmed this with a print
url = driver.current_url
page3 = requests.get(url)
tree3 = html.fromstring(page3.content)
Cash_and_Short_Term_Investments = tree3.xpath('//*[@id="fs-table"]/tr[3]/td[2]/text()')
print('Cash_and_short_term_investments:', Cash_and_Short_Term_Investments)

I have removed the tbody steps from the XPath, as some similar questions suggested. Any help or suggestions would be greatly appreciated, thanks!

Markus · Accepted Answer · 2016-11-08 08:04:19Z

When asking questions like this, you need to provide a short but complete example which demonstrates the problem.

Looking at your second example, the XPath expression you are using is incorrect: it is missing the tbody element. (You might also prefer to select the correct table row by matching the actual label text you are looking for, rather than by position.)

Given the following code:

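from urllib.request import urlopen

from lxml import etree

url = "http://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA"
parser = etree.HTMLParser()
tree = etree.parse(urlopen(url), parser)
# Keep the tbody step and pick the rows by their label text rather than by position.
result = tree.xpath('//*[@id="fs-table"]/tbody/tr[normalize-space(td) = "Cash and Short Term Investments"]')
for x in result:
    # tostring() returns bytes on Python 3, so decode before printing.
    print(etree.tostring(x).decode())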

When running this like so:

> python test.py

You get the following output:

<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">39.78</td>
<td class="r">78.45</td>
<td class="r">91.21</td>
<td class="r">110.02</td>
<td class="r rm">125.01</td>
</tr>

<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">110.02</td>
<td class="r">161.49</td>
<td class="r">184.49</td>
<td class="r rm">140.49</td>
</tr>

I'm sure you will be able to figure out what is wrong with your first example once you have turned it into a self-contained reproducer of the problem.
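As a rough, offline sketch of why the tbody step matters: lxml matches the markup exactly as it was served and does not add tbody elements on its own, so a child-by-child path has to mirror what is actually in the HTML. The SAMPLE markup below is invented for illustration (loosely modelled on the output shown above, with a made-up second row) and is not the real page source.

```python
from lxml import html

# Hypothetical markup for illustration only; not the real page source.
SAMPLE = """
<html><body>
<table id="fs-table">
  <tbody>
    <tr><td class="lft lm">Cash and Short Term Investments
    </td><td class="r">39.78</td><td class="r">78.45</td></tr>
    <tr><td class="lft lm">Some Other Row</td><td class="r">1.00</td><td class="r">2.00</td></tr>
  </tbody>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# tbody is present in the markup, so a path that skips it matches nothing.
without_tbody = tree.xpath('//*[@id="fs-table"]/tr[1]/td[2]/text()')

# Mirroring the markup, including tbody, reaches the cell.
with_tbody = tree.xpath('//*[@id="fs-table"]/tbody/tr[1]/td[2]/text()')

# A descendant step (//tr) works whether or not tbody is present,
# and matching on the label avoids counting rows by position.
by_label = tree.xpath(
    '//*[@id="fs-table"]//tr[normalize-space(td[1]) = '
    '"Cash and Short Term Investments"]/td[2]/text()'
)

print(without_tbody)
print(with_tbody)
print(by_label)
```

The same check can be applied to the first example: save the HTML that requests actually receives and compare its table nesting, step by step, against the XPath copied from the browser.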

1 Comment

Marc Over a year ago

This is a good solution for getting the strings. I then used regular expressions to isolate the numbers.
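The comment does not say which pattern was used, so the following is only a minimal sketch of that idea. It assumes the serialized row looks like the first one in the answer's output; the NUMBER_IN_CELL pattern is a hypothetical choice, not the commenter's actual expression.

```python
import re

# Serialized row as printed in the answer's output.
row = """<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">39.78</td>
<td class="r">78.45</td>
<td class="r">91.21</td>
<td class="r">110.02</td>
<td class="r rm">125.01</td>
</tr>"""

# Hypothetical pattern: a cell whose only content is a number, with an
# optional minus sign, optional thousands commas and optional decimals.
NUMBER_IN_CELL = re.compile(r'<td[^>]*>\s*(-?\d[\d,]*(?:\.\d+)?)\s*</td>')

# Strip thousands separators before converting each match to a float.
values = [float(m.replace(',', '')) for m in NUMBER_IN_CELL.findall(row)]
print(values)
```

The label cell is skipped because its content does not start with a digit or a minus sign, so only the numeric cells end up in values.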
