scraping a dynamic webpage from R - RSelenium issue


I am trying to scrape dynamically filled webpages like this in R.

I am trying to do that with RSelenium, but I am open to alternatives. For example, I would be happy to do everything with rvest only.

The issue with RSelenium is that it does not start at all (even when I try Chrome instead of Firefox). Right after loading the package, this is the output:

> rD <- rsDriver(browser = "firefox", port = 4545L, geckover = "latest")
checking Selenium Server versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking chromedriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking geckodriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking phantomjs versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
[1] "Connecting to remote server"
Could not open firefox browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to localhost port 4545 after 2259 ms: Couldn't connect to server
Check server log for further details.
Warning message:
In rsDriver(browser = "firefox", port = 4545L, geckover = "latest") :
  Could not determine server status.

I have seen a similar issue in a question on another forum, but the only solution in that case seemed to be specifying the port.

With Chrome, the problem appears to be that Chrome is now at version 130, while ChromeDriver only supports versions up to 113, if I understand correctly.
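A possible first diagnostic, sketched under assumptions rather than offered as a confirmed fix: the log above shows rsDriver() also checking chromedriver and phantomjs even though Firefox was requested, and the error only says that nothing answered on port 4545. The helper below (the name `start_firefox_only` is hypothetical) passes `chromever = NULL` and `phantomver = NULL` so that only geckodriver is set up, and it first checks for a `java` executable on the PATH, since the Selenium server that rsDriver() launches is a Java program. Whether either point is the cause here is an assumption.

```r
# Hypothetical helper: start a Selenium server for Firefox only.
start_firefox_only <- function(port = 4545L) {
  # The Selenium server is a Java program, so a missing Java runtime
  # would leave nothing listening on the port.
  if (!nzchar(Sys.which("java"))) {
    stop("No 'java' executable found on the PATH.")
  }
  RSelenium::rsDriver(
    browser    = "firefox",
    port       = port,
    geckover   = "latest",
    chromever  = NULL,  # skip the chromedriver lookup
    phantomver = NULL,  # skip the phantomjs lookup
    verbose    = TRUE
  )
}

# Usage (needs Firefox, Java and the RSelenium package installed):
# rD    <- start_firefox_only()
# remDr <- rD$client
# remDr$navigate("https://www.r-project.org")
# rD$server$stop()
```

If the call still fails, the server log mentioned in the error message is the next place to look.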

asked Oct 28, 2024 at 13:05

oibaFox


For rvest you might want to check previous Q&As targeting rvest::read_html_live() - stackoverflow.com/…

– margusl, Oct 28, 2024 at 13:41

By the way, in that specific case I don't think you actually need to scrape: the Download option lets you configure a file format and points you to something like https://live.euronext.com/en/pd_es/data/stocks/download?mics=dm_all_stock&initialLetter=&fe_type=csv&fe_decimal_separator=.&fe_date_format=d%2Fm%2FY for CSV export.

– margusl, Oct 28, 2024 at 16:42

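The export route from the comment above could be sketched in base R as follows. The URL and its query parameters are taken from the comment; the helper names `build_export_url` and `peek_export` are hypothetical, and it is an assumption that the server answers a plain GET request from R. Because the delimiter and header layout of the file are not stated in the thread, the sketch only returns the first raw lines rather than guessing at `read.csv()` arguments.

```r
# Rebuild the export URL quoted in the comment from its query parameters.
build_export_url <- function(fe_type = "csv", decimal_sep = ".",
                             date_format = "d/m/Y") {
  base <- "https://live.euronext.com/en/pd_es/data/stocks/download"
  query <- c(
    mics                 = "dm_all_stock",
    initialLetter        = "",
    fe_type              = fe_type,
    fe_decimal_separator = decimal_sep,
    fe_date_format       = date_format
  )
  # Percent-encode each value ("/" becomes "%2F").
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(query), "=", encoded, collapse = "&"))
}

# Hypothetical helper: download the export and return its first raw lines,
# so the delimiter and header layout can be checked before parsing.
peek_export <- function(url = build_export_url(), n = 5) {
  dest <- tempfile(fileext = ".csv")
  utils::download.file(url, dest, mode = "wb", quiet = TRUE)
  readLines(dest, n = n, warn = FALSE)
}

# Usage (needs network access):
# peek_export()
```

With its default arguments, `build_export_url()` reproduces the URL quoted in the comment.
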
@margusl yes, thanks. I have actually seen that I can download the list. I am using that page only as an example because it is the weirdest one I have seen: even after saving the page locally, the content I see in the browser does not appear. The page I am actually interested in is it.finance.yahoo.com/screener/new.

– oibaFox, Oct 28, 2024 at 16:56

Moreover, the screener I'd like to access can only be viewed after logging in.

– oibaFox, Oct 28, 2024 at 17:26

One example with read_html_live & login: stackoverflow.com/q/78948903/646761 . read_html_live() is basically an interface for chromote, so you might find this - github.com/rstudio/chromote/… - useful.

– margusl, Oct 29, 2024 at 12:50
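
A minimal sketch of the read_html_live() route mentioned in the comments, assuming rvest 1.0.4 or later, the chromote package and a local Chrome or Chromium install. The helper name `scrape_live_tables`, the `"table"` selector and the fixed wait are illustrative assumptions, and the sketch does not handle the login that the screener requires.

```r
# Hypothetical helper: render a page in headless Chrome and return its tables.
scrape_live_tables <- function(url, css = "table", wait_seconds = 3) {
  if (!requireNamespace("rvest", quietly = TRUE)) {
    stop("The rvest package is required.")
  }
  # read_html_live() loads the page in a real browser session, so content
  # filled in by JavaScript becomes part of the document being queried.
  page <- rvest::read_html_live(url)
  # Crude pause to give the scripts time to populate the page.
  Sys.sleep(wait_seconds)
  nodes <- rvest::html_elements(page, css)
  # One data frame per matched <table> element.
  rvest::html_table(nodes)
}

# Usage (placeholder URL, replace with the page of interest):
# tables <- scrape_live_tables("https://example.com/page-with-tables")
```

A fixed `Sys.sleep()` is the simplest choice here; a page that loads slowly or lazily may need a longer wait or scrolling before the elements exist.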
