
I'm trying to loop through a job listing website to collect its job listings and run a text analysis on them. I'm using RSelenium for this. The code I am working on is as follows:

#### REMOTE.COM ####
remDR$navigate('https://remote.com/jobs/all?query=marketing&country=anywhere')
# click on the cookies policy
remDR$findElement(using = 'xpath', '//*[@id="ccc-notify-accept"]')$clickElement()
# print all job listings
num_links <- 20
for(i in 1:num_links){
  remDR$findElement(using = 'xpath', 
                    paste('/html/body/div[2]/main/div/div/div[3]/article[',i,']', sep = ''))$clickElement()
  print(remDR$getCurrentUrl())
  remDR$goBack()
}

When I run the loop, two problems occur.

First, print(remDR$getCurrentUrl()) returns the original URL (https://remote.com/jobs/all?query=marketing&country=anywhere), not the URL of the page that was clicked on in the first part of the for loop. Second, when remDR$goBack() executes, it takes me back to the previous blank page, as if no link had been clicked.

To summarize, I think the loop runs faster than the browser needs to act on the click and load the new page.

EDIT

Solution was found thanks to a recommendation:

for(i in 1:5){
  remDR$findElement(using = 'xpath', 
                    paste('/html/body/div[2]/main/div/div/div[3]/article[',i,']', sep = ''))$clickElement()
  Sys.sleep(2) # add time for page to load
  print(remDR$getCurrentUrl())
  remDR$navigate('https://remote.com/jobs/all?query=marketing&country=anywhere') # navigate() reloads the listing page; this worked where goBack() did not
  Sys.sleep(2) # add time for page to load
}

The fix was to give Chrome time to load each page with Sys.sleep(2) and to use $navigate() instead of goBack(), because $navigate() reloads the listing page in the browser. Important note: the loop won't work without the final Sys.sleep(2), because the listing page has to load completely before the loop clicks on the next item.
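A fixed Sys.sleep(2) can be too short on a slow connection and wastes time on a fast one. As a rough sketch of an alternative, the loop body could poll until the page has actually changed. The helper names wait_until() and visit_listing() are made up for illustration, and the sketch assumes that clicking an article changes the browser URL, which I have only inferred from the behaviour described above.

```r
# Poll `condition` (a zero-argument function) until it returns TRUE
# or `timeout` seconds have passed. Returns TRUE on success, FALSE on timeout.
# Errors raised by `condition` are treated as "not ready yet".
wait_until <- function(condition, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    ok <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Hypothetical helper: click the i-th article, wait for the URL to change,
# record it, then reload the listing page and wait for the articles to reappear.
visit_listing <- function(remDR, i, list_url) {
  xpath <- paste0('/html/body/div[2]/main/div/div/div[3]/article[', i, ']')
  remDR$findElement(using = 'xpath', value = xpath)$clickElement()

  # Assumption: a successful click moves the browser away from list_url
  wait_until(function() remDR$getCurrentUrl()[[1]] != list_url)
  job_url <- remDR$getCurrentUrl()[[1]]

  remDR$navigate(list_url)
  # Wait until the i-th article exists again before the next iteration
  wait_until(function() length(remDR$findElements(using = 'xpath', value = xpath)) > 0)

  job_url
}
```

With an open session this would be called as visit_listing(remDR, i, 'https://remote.com/jobs/all?query=marketing&country=anywhere') inside the for loop. The timeout and interval defaults are arbitrary starting values, not tuned for this site.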

  • Define a data frame outside the loop and replace print() with rbind() to this data frame. For the delay, use Sys.sleep(5) or whatever value works. Instead of goBack(), script it to follow the small right arrow at the bottom of the page, or increment the number in the &page=2 part of the URL for as long as it returns results. Commented Jan 22, 2024 at 22:15
  • @GrzegorzSapijaszko I don't follow the goBack() alternative. Can you expand on that point? Commented Jan 22, 2024 at 23:12
  • @GrzegorzSapijaszko found a solution based on your recommendation! for(i in 1:5){ remDR$findElement(using = 'xpath', paste('/html/body/div[2]/main/div/div/div[3]/article[',i,']', sep = ''))$clickElement() Sys.sleep(2) # added time for page to load print(remDR$getCurrentUrl()) remDR$navigate('remote.com/jobs/all?query=marketing&country=anywhere') # .$navigate() is the better solution Sys.sleep(2) #added time for page to load } Commented Jan 23, 2024 at 15:59
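The first comment's suggestion to store results with rbind() instead of print() might look roughly like this. It is only a sketch: article_xpath() and collect_job_urls() are hypothetical names, the XPath is copied from the question, and the pause argument stands in for the Sys.sleep() calls used in the accepted fix.

```r
# Build the XPath of the i-th article on the listing page (path taken from the question)
article_xpath <- function(i) {
  paste0('/html/body/div[2]/main/div/div/div[3]/article[', i, ']')
}

# Hypothetical helper: visit the first n listings and return their URLs
# as a data frame instead of printing them.
collect_job_urls <- function(remDR, list_url, n, pause = 2) {
  # Data frame defined outside the loop, as the comment suggests
  jobs <- data.frame(index = integer(0), url = character(0),
                     stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    remDR$findElement(using = 'xpath', value = article_xpath(i))$clickElement()
    Sys.sleep(pause)  # give the job page time to load
    jobs <- rbind(jobs,
                  data.frame(index = i,
                             url = remDR$getCurrentUrl()[[1]],
                             stringsAsFactors = FALSE))
    remDR$navigate(list_url)
    Sys.sleep(pause)  # give the listing page time to load again
  }
  jobs
}
```

Calling collect_job_urls(remDR, 'https://remote.com/jobs/all?query=marketing&country=anywhere', 5) would then return a five-row data frame that can be passed on to the text analysis. Growing a data frame with rbind() is fine for a handful of rows; for many pages, filling a preallocated vector would be a reasonable variation.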
