I am trying to extract data from the following website:

https://www.tipranks.com/stocks/sui/stock-analysis

I am targeting the value "6" in the octagon:

[screenshot: the octagon widget on the page, displaying the value "6"]

I believe I am targeting the correct xpath.

Here is my code:

import sys
import os
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium import webdriver

os.environ['MOZ_HEADLESS'] = '1'
binary = FirefoxBinary('C:/Program Files/Mozilla Firefox/firefox.exe', log_file=sys.stdout)

browser = webdriver.PhantomJS(service_args=["--load-images=no", '--disk-cache=true'])

url = 'https://www.tipranks.com/stocks/sui/stock-analysis'
xpath = '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'
browser.get(url)

element = browser.find_element_by_xpath(xpath)

print(element)

Here is the error that I get back:

Traceback (most recent call last):
  File "C:/Users/jaspa/PycharmProjects/ig-markets-api-python-library/trader/market_signal_IV_test.py", line 15, in <module>
    element = browser.find_element_by_xpath(xpath)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:51786","User-Agent":"selenium/3.141.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"/h3/div/span\", \"sessionId\": \"d8e91c70-9139-11e9-a9c9-21561f67b079\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d8e91c70-9139-11e9-a9c9-21561f67b079/element"}}
Screenshot: available via screen

I can see that the issue is due to an incorrect xpath, but I can't figure out why.

I should also point out that selenium occurred to me as the best method to scrape this site; I intend to extract other values and repeat these queries for different stocks across a number of pages. If anybody thinks I would be better off with BeautifulSoup, lxml, etc., then I am happy to hear suggestions!

Thanks in advance!

3 Answers

You don't even need to declare the full path. The octagon is in a div with the class client-components-ValueChange-shape__Octagon, so search for that div.

octagons = browser.find_elements_by_css_selector("div[class='client-components-ValueChange-shape__Octagon']")  # select by exact class
for octagon in octagons:
    print(octagon.text)

Output :

6
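A side note, as an assumption about newer environments: Selenium 4 removed the find_elements_by_css_selector style of helper, so on current versions the same lookup would go through By. A sketch (headless Firefox, geckodriver assumed on PATH):

```python
def octagon_selector():
    """The exact-class CSS selector from this answer, kept in one place."""
    return "div[class='client-components-ValueChange-shape__Octagon']"

def fetch_octagon_texts(url):
    """Open the page in headless Firefox and return the text of each matching div."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    opts = webdriver.FirefoxOptions()
    opts.add_argument("-headless")
    browser = webdriver.Firefox(options=opts)
    try:
        browser.get(url)
        # Selenium 4 style: find_elements(By.CSS_SELECTOR, ...) instead of the old helper.
        return [el.text for el in browser.find_elements(By.CSS_SELECTOR, octagon_selector())]
    finally:
        browser.quit()
```

Note this still has the page-load timing caveat discussed in the other answer, so an explicit wait may be needed before the find.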


You can try the CSS selector [class$='shape__Octagon'] to target the content. If I went for pyppeteer, I would do it like the following:

import asyncio
from pyppeteer import launch

async def get_content(url):
    browser = await launch({"headless":True})
    [page] = await browser.pages()
    await page.goto(url)
    await page.waitForSelector("[class$='shape__Octagon']")
    value = await page.querySelectorEval("[class$='shape__Octagon']","e => e.innerText")
    return value

if __name__ == "__main__":
    url = "https://www.tipranks.com/stocks/sui/stock-analysis"
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(get_content(url))
    print(result.strip())

Output:

6


You seem to have two issues here:

For the xpath, I just did:

xpath = '//div[@class="client-components-ValueChange-shape__Octagon"]'

And then do:

print(element.text)

And it gets the value you want. However, your code doesn't actually wait for the browser to finish loading the page before evaluating the xpath. For me, using Firefox, I only get the value about 40% of the time this way. There are many ways to handle this with Selenium; the simplest is probably to just sleep for a few seconds between the browser.get and the xpath statement.
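A sleep-free alternative is an explicit wait, which polls until the element is present instead of pausing for a fixed time. A sketch, assuming Firefox with geckodriver on PATH (the parse_score helper is my own addition, not part of the question):

```python
def parse_score(text):
    """Convert the octagon's rendered text (e.g. "6") to an int. Hypothetical helper."""
    return int(text.strip())

def get_score(url, timeout=30):
    """Open url in Firefox and explicitly wait for the octagon div before reading it."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Poll up to `timeout` seconds for the element, instead of a blind sleep.
        el = WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located(
                (By.XPATH, '//div[@class="client-components-ValueChange-shape__Octagon"]')
            )
        )
        return parse_score(el.text)
    finally:
        browser.quit()
```

Calling get_score('https://www.tipranks.com/stocks/sui/stock-analysis') should then return the score reliably, since the wait only proceeds once the AJAX-rendered element exists.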

You seem to be setting up Firefox but then using Phantom. I did not try this with Phantom; the sleep behavior may be unnecessary there.

2 Comments

Thanks Matt. That was a very helpful answer. The sleep statement seems necessary - is this the only way of handling scraping lots of pages? I guess I could scrape all the html and then extract content from this?
You're very welcome. Yes, I've found it necessary to do something like this for many pages on the web. The more complex AJAX stuff they do, the more it's necessary. To be clear, if you mean query what's been rendered in a proper browser by "scrape all the html", then yes, that's how I've done it. I've tried both BeautifulSoup and lxml extensively and settled on using selenium. The reality of the web is there's so much complicated AJAX stuff going on, you're better off just going straight to a real browser, and then examining the completely rendered doc like you are here.
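For the many-pages case discussed above, one browser session can be reused across tickers rather than launching a new one per page. A sketch, assuming the URL pattern from the question generalizes to other tickers (the ticker list and helper names here are hypothetical):

```python
def analysis_url(ticker):
    """Build a stock-analysis URL from a ticker, following the pattern in the question."""
    return "https://www.tipranks.com/stocks/{}/stock-analysis".format(ticker.lower())

def scrape_scores(tickers, timeout=30):
    """Reuse one Firefox session across tickers, waiting for the octagon on each page."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    browser = webdriver.Firefox()
    scores = {}
    try:
        for ticker in tickers:
            browser.get(analysis_url(ticker))
            # Explicit wait per page, since each load renders the octagon via AJAX.
            el = WebDriverWait(browser, timeout).until(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, "div[class$='shape__Octagon']")
                )
            )
            scores[ticker] = el.text.strip()
    finally:
        browser.quit()
    return scores
```

scrape_scores(["SUI", "AAPL"]) would then return a dict mapping each ticker to its rendered score text, with the browser started and quit exactly once.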
