As a Python novice, I want to download old newspapers archived on a website (http://digesto.asamblea.gob.ni/consultas/coleccion/) with the script below.

However, I cannot get the script to go through each row of the table, select "PDF" in the dropdown menu, and save the corresponding link to a list (in order to download the files later).

My problem seems to be that the script cannot locate the PDF entry in each dropdown menu using the XPath I provide.

This must be the part of the source code that does not work:

table_id = driver.find_element(By.ID, 'gridTableDocCollection')
rows = table_id.find_elements(By.TAG_NAME, "tr") # get all table rows
for row in rows:
    elems = driver.find_elements_by_xpath('//ul[@class="dropdown-menu"]/a')
    for elem in elems:
        print(elem.get_attribute("href"))

Edit:

When I use this code:

list_of_links = driver.find_element_by_xpath('//ul[@class="dropdown-menu"]/li')
print(list_of_links)

I get selenium.webdriver.firefox.webelement.FirefoxWebElement (session="e6799ba5-5f0b-8b4f-817a-721326940b91", element="66c956f0-d813-a840-b24b-a12f92e1189b") instead of a link. What am I doing wrong?

Can anyone please help me? I have read through Stack Overflow for hours but was never able to get anything working (see the part of the code that is commented out).

Disclaimer: when using the script, you need to type the captcha by hand, without pressing Enter, for the script to continue.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# A small script to download issues of the Gaceta de Nicaragua (1843-1960) 19758 issues

import logging
from selenium.webdriver.remote.remote_connection import LOGGER
LOGGER.setLevel(logging.WARNING)

import os
import sys
import time
import shutil
from subprocess import call
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains

profile = webdriver.FirefoxProfile() # profile to prevent download manager
profile.set_preference("network.cookie.cookieBehavior", 0) # accept all cookies
profile.set_preference("network.cookie.lifetimePolicy", 0) # accept cookies
profile.set_preference("network.cookie.alwaysAcceptSessionCookies", 1) # always allow sess
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", 'Downloads/')
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'image/jpeg;application/jpeg;image/jpg;application/jpg')

url = 'http://digesto.asamblea.gob.ni/consultas/coleccion/' # web page
print('Opening digesto.asamblea.gob.ni...')

driver = webdriver.Firefox(firefox_profile=profile)
driver.get(url) # open url

driver.find_element_by_xpath('//*[@id="cavanzada"]').click() # advanced menu

driver.find_element_by_xpath("//select[@id='slcCollection']/option[text()='Diario Oficial']").click()
driver.find_element_by_xpath("//select[@id='slcMedio']/option[text()='Gaceta Oficial']").click() # change journal name here

inputElement = driver.find_element_by_xpath('//*[@id="txtDatePublishFrom"]')
inputElement.send_keys('01/01/1844') # change start date

inputElement = driver.find_element_by_xpath('//*[@id="txtDatePublishTo"]')
inputElement.send_keys('31/12/1860') # change end date

time.sleep( 5 ) # wait for Human Captcha Insertion

inputElement.send_keys(Keys.ENTER) # search

time.sleep( 2 ) # wait to load

select_element = Select(driver.find_element_by_xpath('//*[@id="slcResPage"]')) # page count
select_element.select_by_value('50') # max 50

time.sleep( 1 ) # wait to load

list_of_links = driver.find_elements_by_xpath('//ul[@class="dropdown-menu"]/a')
print(list_of_links)

#a=[];
#a = driver.find_elements_by_link_text("PDF");
#driver.find_element_by_link_text("PDF").click()
#a = driver.find_element_by_xpath("//select[@class='dropdown-menu']/option[text()='PDF']").click()
#a = driver.find_element_by_xpath('//*[contains(text(), '"dropdown-menu"')] | //*[@#='"PDF"']'); #[contains(@#, "PDF")]
#a = driver.find_elements_by_xpath("//*[contains(text(), 'PDF')]")
#a = driver.find_elements_by_xpath('//div[@class="dropdown-menu"][contains(@#, "PDF")]')
#print(a, sep='\n')
#print(*a, sep='\n')

#driver.find_element(By.CssSelector("a[title='Acciones']")).find_element(By.xpath(".//span[text()='PDF']")).click();

#select_element = Select(driver.find_element_by_xpath('//*[@id="gridTableDocCollection"]/html/body/div[3]/div[1]/div/div/form/div[3]/div[2]/table/tbody/tr[1]/td[5]/div/ul/li[1]/a'))
#select_element.select_by_text('PDF')

table_id = driver.find_element(By.ID, 'gridTableDocCollection')
rows = table_id.find_elements(By.TAG_NAME, "tr") # get all table rows
for row in rows:
    elems = driver.find_elements_by_xpath('//ul[@class="dropdown-menu"]/a')
    for elem in elems:
        print(elem.get_attribute("href"))

1 Answer

Think more about the steps you follow manually. Right now, you start a loop through all of the rows but never do anything with the "row" element. You'll want to click the dropdown button for the row, then choose the PDF option:

table_id = driver.find_element(By.ID, 'tableDocCollection')
rows = table_id.find_elements_by_css_selector("tbody tr") # get all table rows
for row in rows:
    # click on the button to get the dropdowns to appear
    row.find_element_by_css_selector('button').click()
    # now find the one that's the pdf (here, using the fact that the onclick attribute of the link has the text "pdf")
    row.find_element_by_css_selector('li a[onclick*=pdf]').click()
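A side note on why the XPath in the question returns nothing: `//ul[@class="dropdown-menu"]/a` only matches `a` elements that are direct children of the `ul`, while the links here sit one level deeper, inside `li`. The sketch below illustrates the difference with the standard library's ElementTree on a small, made-up fragment. The markup (including the `openDoc` handler name) is an assumption modelled on the selectors in this thread, not copied from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the structure follows the selectors discussed above,
# but attribute values such as openDoc(...) are invented for illustration.
SAMPLE_ROW = """
<tr>
  <td>
    <div class="btn-group">
      <button class="btn btn-primary dropdown-toggle">Acciones</button>
      <ul class="dropdown-menu">
        <li><a href="#" onclick="openDoc('pdf', 1)">PDF</a></li>
        <li><a href="#" onclick="openDoc('html', 1)">HTML</a></li>
      </ul>
    </div>
  </td>
</tr>
""".strip()

row = ET.fromstring(SAMPLE_ROW)

# "/a" asks for <a> elements that are direct children of the <ul>: none exist.
direct_children = row.findall(".//ul[@class='dropdown-menu']/a")

# Going through the <li> level reaches the links.
via_li = row.findall(".//ul[@class='dropdown-menu']/li/a")

# Keep only the link whose onclick attribute mentions "pdf".
pdf_links = [a for a in via_li if "pdf" in a.get("onclick", "")]

print(len(direct_children), len(via_li), [a.text for a in pdf_links])
```

If the real links carry their target in `onclick` rather than `href`, as the selector above suggests, printing `get_attribute("href")` may not give a usable download URL, which is one reason to click the link instead.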

From here, you'll need to go to the new window and download the pdf. Try working that out, then if you need help submit a new question.
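As a starting point for that next step, here is a minimal sketch of the window handling, assuming that clicking the PDF link opens a new browser window or tab. The helper names `switch_to_new_window` and `collect_pdf_url` are hypothetical; the driver attributes they use (`window_handles`, `current_window_handle`, `switch_to.window`, `current_url`, `close`) are standard Selenium WebDriver members. It does not wait for a slow window to appear, so you may need to add a wait before looking for the new handle.

```python
def switch_to_new_window(driver, handles_before):
    """Switch to the first window handle not in handles_before.

    Returns the new handle, or None if no new window has appeared.
    """
    for handle in driver.window_handles:
        if handle not in handles_before:
            driver.switch_to.window(handle)
            return handle
    return None


def collect_pdf_url(driver, open_pdf):
    """Call open_pdf(), read the URL of the window it opens, then go back.

    open_pdf is any callable that triggers the new window, for example
    a function that clicks the PDF link of one table row.
    """
    original = driver.current_window_handle
    before = set(driver.window_handles)
    open_pdf()
    if switch_to_new_window(driver, before) is None:
        return None  # nothing opened (yet); the caller decides what to do
    url = driver.current_url
    driver.close()                      # close the PDF window
    driver.switch_to.window(original)   # return to the results table
    return url
```

Inside the loop above, this could be called as `collect_pdf_url(driver, lambda: row.find_element_by_css_selector('li a[onclick*=pdf]').click())`, appending each non-None result to a list for downloading afterwards.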

4 Comments

Hi Breaks Software, thank you for your answer! I added your code to the script but get the same problem as before: selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: button. It does not find the dropdown menu/button. When I copy the CSS selector as a path, this is what I get: html body div#page-wrap.wrapper div.container div.row div#infoNorm form#frm div#bavanzada div#gridTableDocCollection table#tableDocCollection.footable.default.no-paging.footable-loaded tbody tr td.footable-visible.footable-last-column div.btn-group button.btn.btn-primary.dropdown-toggle
I had copied your code... the id on the table was wrong (first line of my answer code). From your comment, it should be tableDocCollection.
Thank you, Breaks Software. I have changed it now. Yet my basic problem of locating the dropdown menu persists, now with the following message: selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: button. I guess we are almost there.
I made another edit above to find only the rows in the table body, which are the ones that have a button. The old code also found the row in the header, which does not have a button.
