I am trying to extract data from the following website:

https://www.tipranks.com/stocks/sui/stock-analysis

I am targeting the value "6" in the octagon:

[screenshot: the octagon widget on the page, displaying the value "6"]

I believe I am targeting the correct xpath.

Here is my code:

import sys
import os
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium import webdriver

os.environ['MOZ_HEADLESS'] = '1'
binary = FirefoxBinary('C:/Program Files/Mozilla Firefox/firefox.exe', log_file=sys.stdout)

browser = webdriver.PhantomJS(service_args=["--load-images=no", '--disk-cache=true'])

url = 'https://www.tipranks.com/stocks/sui/stock-analysis'
xpath = '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'
browser.get(url)

element = browser.find_element_by_xpath(xpath)

print(element)

Here is the error that I get back:

Traceback (most recent call last):
  File "C:/Users/jaspa/PycharmProjects/ig-markets-api-python-library/trader/market_signal_IV_test.py", line 15, in <module>
    element = browser.find_element_by_xpath(xpath)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:51786","User-Agent":"selenium/3.141.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"/h3/div/span\", \"sessionId\": \"d8e91c70-9139-11e9-a9c9-21561f67b079\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d8e91c70-9139-11e9-a9c9-21561f67b079/element"}}
Screenshot: available via screen

I can see that the issue is due to an incorrect xpath, but I can't figure out why.

I should also point out that selenium occurred to me as the best method to scrape this site; I intend to extract other values and repeat these queries for different stocks across a number of pages. If anybody thinks I would be better off with BeautifulSoup, lxml, etc., then I am happy to hear suggestions!

Thanks in advance!

3 Answers

You don't even need to declare the full path. The octagon is in a div with the class client-components-ValueChange-shape__Octagon, so search for that div.

octagons = browser.find_elements_by_css_selector("div[class='client-components-ValueChange-shape__Octagon']")  # select by exact class
for octagon in octagons:
    print(octagon.text)

Output :

6
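A side note, as an assumption about newer environments: Selenium 4 removed the find_elements_by_css_selector style of helper, so on current versions the same lookup would go through By. A sketch (headless Firefox, geckodriver assumed on PATH):

```python
def octagon_selector():
    """The exact-class CSS selector from this answer, kept in one place."""
    return "div[class='client-components-ValueChange-shape__Octagon']"

def fetch_octagon_texts(url):
    """Open the page in headless Firefox and return the text of each matching div."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    opts = webdriver.FirefoxOptions()
    opts.add_argument("-headless")
    browser = webdriver.Firefox(options=opts)
    try:
        browser.get(url)
        # Selenium 4 style: find_elements(By.CSS_SELECTOR, ...) instead of the old helper.
        return [el.text for el in browser.find_elements(By.CSS_SELECTOR, octagon_selector())]
    finally:
        browser.quit()
```

Note this still has the page-load timing caveat discussed in the other answer, so an explicit wait may be needed before the find.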


You can try the CSS selector [class$='shape__Octagon'] to target the content. If I went for pyppeteer, I would do it like the following:

import asyncio
from pyppeteer import launch

async def get_content(url):
    browser = await launch({"headless":True})
    [page] = await browser.pages()
    await page.goto(url)
    await page.waitForSelector("[class$='shape__Octagon']")
    value = await page.querySelectorEval("[class$='shape__Octagon']","e => e.innerText")
    return value

if __name__ == "__main__":
    url = "https://www.tipranks.com/stocks/sui/stock-analysis"
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(get_content(url))
    print(result.strip())

Output:

6


You seem to have two issues here:

For the xpath, I just did:

xpath = '//div[@class="client-components-ValueChange-shape__Octagon"]'

And then do:

print(element.text)

And it gets the value you want. However, your code doesn't actually wait for the browser to finish loading the page before evaluating the xpath. For me, using Firefox, I only get the value about 40% of the time this way. There are many ways to handle this with Selenium; the simplest is probably to just sleep for a few seconds between the browser.get and the xpath statement.
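A sleep-free alternative is an explicit wait, which polls until the element is present instead of pausing for a fixed time. A sketch, assuming Firefox with geckodriver on PATH (the parse_score helper is my own addition, not part of the question):

```python
def parse_score(text):
    """Convert the octagon's rendered text (e.g. "6") to an int. Hypothetical helper."""
    return int(text.strip())

def get_score(url, timeout=30):
    """Open url in Firefox and explicitly wait for the octagon div before reading it."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Poll up to `timeout` seconds for the element, instead of a blind sleep.
        el = WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located(
                (By.XPATH, '//div[@class="client-components-ValueChange-shape__Octagon"]')
            )
        )
        return parse_score(el.text)
    finally:
        browser.quit()
```

Calling get_score('https://www.tipranks.com/stocks/sui/stock-analysis') should then return the score reliably, since the wait only proceeds once the AJAX-rendered element exists.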

You seem to be setting up Firefox but then using Phantom. I did not try this with Phantom; the sleep behavior may be unnecessary there.

2 Comments

Thanks Matt. That was a very helpful answer. The sleep statement seems necessary - is this the only way of handling scraping lots of pages? I guess I could scrape all the html and then extract content from this?
You're very welcome. Yes, I've found it necessary to do something like this for many pages on the web. The more complex AJAX stuff they do, the more it's necessary. To be clear, if you mean query what's been rendered in a proper browser by "scrape all the html", then yes, that's how I've done it. I've tried both BeautifulSoup and lxml extensively and settled on using selenium. The reality of the web is there's so much complicated AJAX stuff going on, you're better off just going straight to a real browser, and then examining the completely rendered doc like you are here.
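For the many-pages case discussed above, one browser session can be reused across tickers rather than launching a new one per page. A sketch, assuming the URL pattern from the question generalizes to other tickers (the ticker list and helper names here are hypothetical):

```python
def analysis_url(ticker):
    """Build a stock-analysis URL from a ticker, following the pattern in the question."""
    return "https://www.tipranks.com/stocks/{}/stock-analysis".format(ticker.lower())

def scrape_scores(tickers, timeout=30):
    """Reuse one Firefox session across tickers, waiting for the octagon on each page."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    browser = webdriver.Firefox()
    scores = {}
    try:
        for ticker in tickers:
            browser.get(analysis_url(ticker))
            # Explicit wait per page, since each load renders the octagon via AJAX.
            el = WebDriverWait(browser, timeout).until(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, "div[class$='shape__Octagon']")
                )
            )
            scores[ticker] = el.text.strip()
    finally:
        browser.quit()
    return scores
```

scrape_scores(["SUI", "AAPL"]) would then return a dict mapping each ticker to its rendered score text, with the browser started and quit exactly once.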
