I'm working on a project that uses lxml to pull stock data from tables on separate web pages. When I print the values I am trying to extract, I get empty lists:

('Cash_and_short_term_investments:', [])
('EPSNextYear:', [])

Here is how I am calling it:

import requests
from lxml import html

# driver is created earlier in the program (not shown).
# At this point the URL is http://finviz.com/quote.ashx?t=RAIL (confirmed with a print statement).
url = driver.current_url
page2 = requests.get(url)
tree2 = html.fromstring(page2.content)
# Original XPath:
# /html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[2]/td[6]/b
EPSNextYear = tree2.xpath('/html/body/table[3]/tr[1]/td/table/tr[7]/td/table/tr[2]/td[6]/b')
print('EPSNextYear:', EPSNextYear)

and:

#the url at this point is https://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA I've confirmed this with a print
url = driver.current_url
page3 = requests.get(url)
tree3 = html.fromstring(page3.content)
Cash_and_Short_Term_Investments = tree3.xpath('//*[@id="fs-table"]/tr[3]/td[2]/text()')
print('Cash_and_short_term_investments:', Cash_and_Short_Term_Investments)

I have removed tbody from the XPath expressions, as some similar questions suggest. Any help or suggestions would be greatly appreciated, thanks!

1 Answer

When asking questions like this, you need to provide a short but complete example which demonstrates the problem.

Looking at your second example, it is clear that the XPath expression you are using is incorrect. You are missing the tbody element from your XPath. (And you might like to select the correct table row by matching the actual string you are searching for, rather than by row position.)

Given the following code:

from lxml import etree
from urllib.request import urlopen

url = "http://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA"
parser = etree.HTMLParser()
tree = etree.parse(urlopen(url), parser)
# Select the rows whose first cell holds the label, instead of relying on row position.
result = tree.xpath('//*[@id="fs-table"]/tbody/tr[normalize-space(td) = "Cash and Short Term Investments"]')
for x in result:
    print(etree.tostring(x, encoding="unicode"))

When running this like so:

> python test.py 

You get the following output:

<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">39.78</td>
<td class="r">78.45</td>
<td class="r">91.21</td>
<td class="r">110.02</td>
<td class="r rm">125.01</td>
</tr>

<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">110.02</td>
<td class="r">161.49</td>
<td class="r">184.49</td>
<td class="r rm">140.49</td>
</tr>

I'm sure you will be able to figure out what is wrong with your first example once you have turned it into a self-contained reproducer of the problem.
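As a rough idea of what such a self-contained reproducer could look like, here is a minimal sketch that needs no network access. The markup, the table ids and the 0.52 figure are made up for illustration and are not taken from either site. It shows that lxml matches the markup as it was served: a tbody step only matches when the source actually contains a tbody element.

```python
from lxml import html

# Hypothetical markup standing in for a downloaded page (not the real
# finviz or Google Finance HTML). The first table has an explicit
# <tbody>, the second one does not.
SAMPLE = """
<html><body>
<table id="with-tbody">
  <tbody>
    <tr><td>Cash and Short Term Investments
</td><td class="r">39.78</td></tr>
  </tbody>
</table>
<table id="no-tbody">
  <tr><td>EPS next Y</td><td><b>0.52</b></td></tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Source contains <tbody>: the XPath has to include it.
with_tbody_direct = tree.xpath('//*[@id="with-tbody"]/tr/td[2]/text()')
with_tbody_full = tree.xpath('//*[@id="with-tbody"]/tbody/tr/td[2]/text()')

# Source has no <tbody>: an XPath that includes one matches nothing,
# because lxml does not add the element the way a browser does.
no_tbody_direct = tree.xpath('//*[@id="no-tbody"]/tr/td[2]/b/text()')
no_tbody_browser = tree.xpath('//*[@id="no-tbody"]/tbody/tr/td[2]/b/text()')

# Looking a row up by its label avoids counting rows altogether.
by_label = tree.xpath('//tr[normalize-space(td) = "EPS next Y"]/td[2]/b/text()')

print('with tbody, tbody omitted:', with_tbody_direct)
print('with tbody, tbody included:', with_tbody_full)
print('no tbody, tbody omitted:', no_tbody_direct)
print('no tbody, tbody included:', no_tbody_browser)
print('lookup by label:', by_label)
```

To apply this to the real pages, one option is to save page2.content to a file and check whether the table markup there contains tbody elements before deciding which form of the XPath to use.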

1 Comment

This is a good solution for getting the strings. I then used regular expressions to isolate the numbers.
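The pattern the commenter used is not shown. As one possible sketch, assuming the row has been serialized with etree.tostring as in the answer, something like this would pull the numeric cells out (the NUMBER pattern and the variable names are illustrative):

```python
import re

# Serialized row, copied from the first <tr> in the answer's output.
ROW = """<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">39.78</td>
<td class="r">78.45</td>
<td class="r">91.21</td>
<td class="r">110.02</td>
<td class="r rm">125.01</td>
</tr>"""

# Assumed pattern: a number (optional sign, optional thousands commas,
# optional decimals) that makes up the whole text between two tags.
NUMBER = re.compile(r'>\s*(-?\d[\d,]*\.?\d*)\s*<')

values = [float(m.replace(',', '')) for m in NUMBER.findall(ROW)]
print(values)
```

An XPath such as td[position() > 1]/text() on the selected row would return the same cell strings without a regular expression, which may be less fragile if the markup changes.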
