In an earlier question I got help with handling a dropdown menu in a table. Now I would like to fetch the URL from the page source, which looks like this:

<a href="#" onclick="window.open('/consultas/util/pdf.php?type=rdd&amp;rdd=nYgT5Rcvs2I%3D');return false;">PDF</a>

and store it in a list, instead of clicking on it as the script currently does. The link in the above code is /consultas/util/pdf.php?type=rdd&rdd=nYgT5Rcvs2I%3D. To get a complete URL, I also need to prepend http://digesto.asamblea.gob.ni to each fetched link.

How can I achieve that?

This is my current script, and this is the website: http://digesto.asamblea.gob.ni/consultas/coleccion/

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# A small script to download issues of the Gaceta de Nicaragua (1843-1960) 19758 issues

import logging
from selenium.webdriver.remote.remote_connection import LOGGER
LOGGER.setLevel(logging.WARNING)

import os
import sys
import time
import shutil
import urllib
from subprocess import call
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains

profile = webdriver.FirefoxProfile() # profile to prevent download manager
profile.set_preference("network.cookie.cookieBehavior", 0) # accept all cookies
profile.set_preference("network.cookie.lifetimePolicy", 0) # accept cookies
profile.set_preference("network.cookie.alwaysAcceptSessionCookies", 1) # always allow sess
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.link.open_newwindow", 1) # open tabs in same window
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", 'Downloads/')
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'image/jpeg;application/jpeg;image/jpg;application/jpg')

url = 'http://digesto.asamblea.gob.ni/consultas/coleccion/' # web page
print('Opening digesto.asamblea.gob.ni...')

driver = webdriver.Firefox(firefox_profile=profile)
driver.get(url) # open url

driver.find_element_by_xpath('//*[@id="cavanzada"]').click() # advanced menu
driver.find_element_by_xpath("//select[@id='slcCollection']/option[text()='Diario Oficial']").click()
driver.find_element_by_xpath("//select[@id='slcMedio']/option[text()='Gaceta Oficial']").click() # change journal name here

inputElement = driver.find_element_by_xpath('//*[@id="txtDatePublishFrom"]')
inputElement.send_keys('01/01/1844') # change start date

inputElement = driver.find_element_by_xpath('//*[@id="txtDatePublishTo"]')
inputElement.send_keys('31/12/1860') # change end date

time.sleep( 5 ) # wait for Human Captcha Insertion

inputElement.send_keys(Keys.ENTER) # search

time.sleep( 2 ) # wait to load

select_element = Select(driver.find_element_by_xpath('//*[@id="slcResPage"]')) # page count
select_element.select_by_value('50') # max 50

time.sleep( 1 ) # wait to load

table_id = driver.find_element(By.ID, 'tableDocCollection')
rows = table_id.find_elements_by_css_selector("tbody tr") # get all table rows
for row in rows:
    row.find_element_by_css_selector('button').click()
    row.find_element_by_css_selector('li a[onclick*=pdf]').click() # .get_attribute("href")
    list_of_links = driver.current_url
    driver.close() # quit() #close window
    print(list_of_links)

Disclaimer: when running the script you need to type the captcha by hand during the 5-second pause, without pressing Enter; the script submits the search itself.

In the future I'd recommend including only the relevant parts of the code. The full listing did help me reproduce the problem (manually), but its sheer length probably put a lot of people off reading it. I'm speaking from personal experience: too often I don't bother reading the full question if it doesn't grab me immediately and there are two pages of code to skim through. Just a recommendation, IMHO. Commented Jan 12, 2019 at 7:22

1 Answer

Relative links starting with / are resolved against the site root (scheme plus host), e.g. http://digesto.asamblea.gob.ni in your case; links that don't start with / are resolved relative to the current page. Inside the loop where you're scraping the links, change the code to this:

import re

list_of_links = []    # will hold the scraped links
tld = 'http://digesto.asamblea.gob.ni'   # site root, prepended to links starting with /
current_url = driver.current_url   # for any links not starting with /
for row in rows:
    row.find_element_by_css_selector('button').click()   # open this row's dropdown
    anchor = row.find_element_by_css_selector('li a[onclick*=pdf]')
    # href is only "#"; the real path is the first argument of window.open() in onclick
    match = re.search(r"window\.open\('([^']+)'", anchor.get_attribute("onclick"))
    if match:
        link = match.group(1)
        if link.startswith('/'):
            list_of_links.append(tld + link)
        else:
            list_of_links.append(current_url + link)

    # at this point the dropdown is still open and would obscure the button in the next row;
    # click the button again so the menu closes
    row.find_element_by_css_selector('button').click()

print(list_of_links)
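
The anchor in the question carries its path in the onclick attribute, while href is only #, so the extraction step can be tried out without a browser. Below is a minimal sketch, assuming the handler always has the shape window.open('...') shown in the question. The helper name pdf_url_from_onclick is made up for illustration, and urljoin from the standard library applies the resolution rules for relative links described above.

```python
import re
from urllib.parse import urljoin

# Assumption: the handler always looks like window.open('<path>'...), as in the question.
ONCLICK_URL = re.compile(r"window\.open\(\s*'([^']+)'")


def pdf_url_from_onclick(onclick, page_url):
    """Return the absolute URL opened by an onclick handler, or None if there is none."""
    match = ONCLICK_URL.search(onclick or "")
    if match is None:
        return None
    # urljoin resolves paths starting with / against the site root,
    # and other relative paths against the directory of page_url
    return urljoin(page_url, match.group(1))


page = "http://digesto.asamblea.gob.ni/consultas/coleccion/"
onclick = "window.open('/consultas/util/pdf.php?type=rdd&rdd=nYgT5Rcvs2I%3D');return false;"
print(pdf_url_from_onclick(onclick, page))
```

Inside the Selenium loop, the first argument would be the value of the element's onclick attribute and the second would be driver.current_url.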

3 Comments

Thank you, Todor, for your recommendation and answer. Regarding the first, I did not do it intentionally; I thought the full script might make it easier to answer the question. Next time I will add less code. I have tried your code and get the error message selenium.common.exceptions.ElementClickInterceptedException: Message: Element <button class="btn btn-primary dropdown-toggle" type="button"> is not clickable at point (1199.658317565918,270.18333435058594) because another element <a href="#"> obscures it.
I've realized what the issue is: after the dropdown with the link is expanded, it stays open, so in the next cycle selenium cannot click the button on the next row because the dropdown is in the way. The "fix" I'm thinking of is to click the button once again, which will probably hide the dropdown (I can't check myself, typing on mobile). If that doesn't do it, find some other place/locator to click, so that the dropdown hides at the end of every cycle.
Thank you Todor, that fixed my issue.
