
I'm trying to scrape this website using Python and Selenium. However, not all the information I need is on the main page, so how would I click the links in the 'Application number' column one by one, go to that page, scrape the information, and then return to the original page?

I've tried:

def getData():
  data = []
  select = Select(driver.find_elements_by_xpath('//*[@id="node-41"]/div/div/div/div/div/div[1]/table/tbody/tr/td/a/@href'))
  list_options = select.options
  for item in range(len(list_options)):
    item.click()
  driver.get(url)

URL: http://www.scilly.gov.uk/planning-development/planning-applications

Screenshot of the site: (image omitted)

3 Comments
  • Does it open the link in a new tab or the same window? Also, show us what you have tried so far. Commented Sep 11, 2018 at 15:43
  • No, it doesn't open a new tab, it opens in the same window, and I've edited to show what I tried. Commented Sep 11, 2018 at 15:53
  • Check my answer below. Commented Sep 11, 2018 at 16:06

3 Answers


To open the multiple hrefs within the webtable and scrape them through Selenium, you can use the following solution:

  • Code Block:

      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
    
      hrefs = []
      options = Options()
      options.add_argument("start-maximized")
      options.add_argument("disable-infobars")
      options.add_argument("--disable-extensions")
      options.add_argument("--disable-gpu")
      options.add_argument("--no-sandbox")
      driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
      driver.get('http://www.scilly.gov.uk/planning-development/planning-applications')
      windows_before  = driver.current_window_handle # Store the parent_window_handle for future use
      elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.views-field.views-field-title>a"))) # Induce WebDriverWait for the visibility of the desired elements
      for element in elements:
          hrefs.append(element.get_attribute("href")) # Collect the required href attributes and store in a list
      for href in hrefs:
          driver.execute_script("window.open('" + href +"');") # Open the hrefs one by one through execute_script method in a new tab
          WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2)) # Induce  WebDriverWait for the number_of_windows_to_be 2
          windows_after = driver.window_handles
          new_window = [x for x in windows_after if x != windows_before][0] # Identify the newly opened window
          # driver.switch_to_window(new_window) <!---deprecated>
          driver.switch_to.window(new_window) # switch_to the new window
          # perform your webscraping here
          print(driver.title) # print the page title or your perform your webscraping
          driver.close() # close the window
          # driver.switch_to_window(windows_before) <!---deprecated>
          driver.switch_to.window(windows_before) # switch_to the parent_window_handle
      driver.quit() #Quit your program
    
  • Console Output:

      Planning application: P/18/064 | Council of the ISLES OF SCILLY
      Planning application: P/18/063 | Council of the ISLES OF SCILLY
      Planning application: P/18/062 | Council of the ISLES OF SCILLY
      Planning application: P/18/061 | Council of the ISLES OF SCILLY
      Planning application: p/18/059 | Council of the ISLES OF SCILLY
      Planning application: P/18/058 | Council of the ISLES OF SCILLY
      Planning application: P/18/057 | Council of the ISLES OF SCILLY
      Planning application: P/18/056 | Council of the ISLES OF SCILLY
      Planning application: P/18/055 | Council of the ISLES OF SCILLY
      Planning application: P/18/054 | Council of the ISLES OF SCILLY
    

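As a follow-up note: if you would rather not manage window handles at all, here is a minimal sketch of the same idea that collects the href attributes first and then loads each one with driver.get() in the same window. It assumes the same CSS selector and page structure as the block above, and a chromedriver available on PATH; it is an alternative sketch, not part of the original answer.

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # configure Options/executable_path as in the block above if needed
    driver.get('http://www.scilly.gov.uk/planning-development/planning-applications')
    # Collect the href attributes up front so stale element references are not a problem after navigating away
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.views-field.views-field-title>a")))
    hrefs = [element.get_attribute("href") for element in elements]
    for href in hrefs:
        driver.get(href)      # navigate to the application page in the same window
        print(driver.title)   # perform your webscraping here
    driver.quit()             # quit once every application page has been visited

Because the list of hrefs is built before any navigation happens, there is no need to return to the listing page between applications.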

4 Comments

It works, thank you, but if you can, comments would help understand the code even more.
@LibanWest Updated the solution with the required comments for your convenience.
Hello, if you're available and free, I have a question I could use your help with: stackoverflow.com/questions/52364188/…
Oh, I'm sorry, I thought I did, but it looks like I just accepted :) Fixed it.

What you can do is the following:

from selenium import webdriver
import time

url = "http://www.scilly.gov.uk/planning-development/planning-applications"
browser = webdriver.Chrome() # or whatever driver you use
browser.get(url)
browser.find_element_by_css_selector("td.views-field.views-field-title > a").click()
# or use browser.find_element_by_xpath("xpath")
# Note: you will need to change the selector to click a different item in the table
time.sleep(5) # not the best way to do this but it's simple, just to make sure things load
# it is here that you will be able to scrape the new url; I will not post that as you can scrape what you want
# when you are done scraping you can return to the previous page with this
browser.execute_script("window.history.go(-1)")

Hope this is what you are looking for.
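If you want to avoid the fixed time.sleep(5), here is a small sketch using an explicit wait instead. It reuses the browser instance from the snippet above; the CSS selector and the 'Back to planning applications' link text are assumptions taken from elsewhere on this page, not part of this answer.

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    wait = WebDriverWait(browser, 10)
    # wait for the first application link to be clickable, then click it
    link = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "td.views-field.views-field-title > a")))
    link.click()
    # wait for the detail page to load before scraping, using its 'Back to planning applications' link as the signal
    wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Back to planning applications")))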

1 Comment

'<=' not supported between instances of 'int' and 'builtin_function_or_method'

When you navigate to a new page the DOM is refreshed, so you cannot reuse the element list here. Here is my approach for this action (I don't code much in Python, so the syntax and indentation may be broken):

count = driver.find_elements_by_xpath("//table[@class='views-table cols-6']/tbody/tr") # to count total number of links
j = 1
while j <= len(count):
    driver.find_element_by_xpath("//table[@class='views-table cols-6']/tbody/tr[" + str(j) + "]/td/a").click()

    #add wait here
    #do your scrape action here

    driver.find_element_by_xpath("//a[text()='Back to planning applications']").click() #to go back to main page

    #add wait here for main page to load.
    j += 1
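
For completeness, here is a sketch of the same loop with the two waits filled in. The explicit-wait conditions and the use of the 'Back to planning applications' link as a readiness signal are assumptions, not part of the original answer; it reuses the same driver instance and XPaths as above.

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    wait = WebDriverWait(driver, 10)
    rows = driver.find_elements_by_xpath("//table[@class='views-table cols-6']/tbody/tr")
    for j in range(1, len(rows) + 1):
        # re-locate the row link each time because the DOM is rebuilt after navigating back
        link = wait.until(EC.element_to_be_clickable((By.XPATH, "//table[@class='views-table cols-6']/tbody/tr[" + str(j) + "]/td/a")))
        link.click()
        # wait for the detail page before scraping
        wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Back to planning applications")))
        # do your scrape action here
        driver.find_element_by_link_text("Back to planning applications").click()  # go back to the main page
        # wait for the main page table to reappear before the next iteration
        wait.until(EC.presence_of_element_located((By.XPATH, "//table[@class='views-table cols-6']")))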

2 Comments

and driver.find_element_by_xpath("//table[@class='views-table cols-6']/tbody/tr["+j+"]/td/a").click() TypeError: can only concatenate str (not "int") to str
Try again now with the updated code. I have added str to j. Note the change here: str(j).
