
I am fairly new to web scraping. I am trying to write something in Python with Selenium that will automatically log in to a website and select multiple options from a drop-down menu. Once all those options have been set, a button is clicked and a new page pops up with multiple hrefs. This is where I run into problems: I am trying to click all the hrefs, but they all have this structure

<a href="WebsiteName.asp?qt=1&amp;qa=0&amp;ben=1&amp;tpt=0&amp;cl=Something&amp;gl=1&amp;life=1&amp;smo=1">Export</a>  

where only 'life=1' and 'smo=1' may change to something else between the links.

Most similar problems that I have encountered here involve hrefs with a class or some other attribute that makes clicking the links more convenient.
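Since the links differ only in their query parameters, one way to tell them apart is to parse the query string. A minimal standard-library sketch, using the href copied from the example above:

```python
from urllib.parse import urlparse, parse_qs

# href copied from the example link above
href = "WebsiteName.asp?qt=1&qa=0&ben=1&tpt=0&cl=Something&gl=1&life=1&smo=1"
params = parse_qs(urlparse(href).query)

# Only 'life' and 'smo' vary between links, so these two values
# identify which Export link is which.
print(params["life"], params["smo"])  # ['1'] ['1']
```

This makes it possible to pick out a specific Export link (or label a downloaded file) by its `life`/`smo` combination even though every link has the same text.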

The code below is what I have so far.

import selenium,time
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup, SoupStrainer
import requests

#credentials
usernameStr = 'SomeUsername'
passwordStr = 'SomePassword'

browser = webdriver.Firefox(executable_path = r'C:\Users\Name\Downloads\geckodriver-v0.24.0-win64\geckodriver.exe')
url = 'http://somewebsite.com/something/'
browser.get(url)

username = browser.find_element_by_id('username')
username.send_keys(usernameStr)

password = browser.find_element_by_id('password')
password.send_keys(passwordStr)

loginInButton = browser.find_element_by_id("login")
loginInButton.click()

browser.find_element_by_xpath("//*[@id='LifeType']").send_keys("Dual")
browser.find_element_by_id("btnRefresh").click()
browser.find_element_by_id("btnExport").click()
 
other_url = 'http://somewebsite.com/something/exportToExcelChoice.asp?qt=1&qa=0&ben=1&tpt=0&gl=1&cl=CAESFFHIILNI'

Below is where I encounter the problems:

page = requests.get(other_url)    
data = page.text
soup = BeautifulSoup(data, features="html.parser")

for link in soup.find_all('a'):
    link.get('href')
    browser.find_element_by_link_text("Export").click()

With Beautiful Soup I can easily print out the required links, but I am not sure that even helps, since I cannot click them. I am still trying to work this one out.
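One thing to watch: the hrefs in that page are relative, so before handing one to `requests` (or to `browser.get`) it has to be resolved against the URL of the page it came from. A minimal sketch with the standard library, using URLs copied from the question:

```python
from urllib.parse import urljoin

# URL of the export-choice page (from the question)
page_url = "http://somewebsite.com/something/exportToExcelChoice.asp?qt=1&qa=0&ben=1&tpt=0&gl=1&cl=CAESFFHIILNI"
# one relative href scraped from that page
relative_href = "Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=1&smo=1"

# Resolve the relative href against the page it came from.
full_url = urljoin(page_url, relative_href)
print(full_url)
```

Note also that `requests.get(other_url)` starts a fresh, unauthenticated session: the logged-in Selenium browser and the `requests` call do not share cookies, so the `requests` fetch may land on a login page rather than the export-choice page.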

PS: I know this isn't strictly web scraping, since all I am doing is clicking buttons, with the ultimate goal of putting everything into a csv file.

HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head id="Head1">
    <title>
        Quote
    </title>
    <link href="StyleSheet.css" rel="stylesheet" type="text/css" />
    <link rel="StyleSheet" type="text/css" href="/include/arikibo.css" />
    <STYLE type="text/css">
        td {
            font-size: 14px
        }
    </STYLE>

</head>

<body>
    <span STYLE="font-family: Arial, Helvetica, Sans Serif; font-size:20px">

        <table cellpadding="3" cellspacing="0" border="0" >
            <tr>
                <td colspan="5">Please select the type of csv file you wish to generate<br><br>
                <b>Please be patient as this may take a few moments!</b><br><br></td>
            </tr>
            <tr>
                <td>Male s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=1&smo=1">Export</a></td>
                <td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
                <td>Male s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=1&smo=2">Export</a></td>
            </tr>
            <tr>
                <td>Female Non-s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=2&smo=1">Export</a></td>
                <td>&nbsp;</td>
                <td>Female s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=2&smo=2">Export</a></td>
            </tr>           
            <tr>
                <td>Joint Non-s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=3&smo=1">Export</a></td>
                <td>&nbsp;</td>
                <td>Joint s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=3&smo=2">Export</a></td>
            </tr>           
            <tr>
                <td>Dual Non-s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=4&smo=1">Export</a></td>
                <td>&nbsp;</td>
                <td>Dual s</td><td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=4&smo=2">Export</a></td>
            </tr>           
        </table>
    </span>
</body>

</html>
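Given that table, BeautifulSoup can pair each label cell with the Export link that follows it, which is more useful than a flat list of identical "Export" anchors. A sketch using a trimmed two-row copy of the HTML above:

```python
from bs4 import BeautifulSoup

# A trimmed copy of the table shown above (two of the eight rows)
html = """
<table>
  <tr><td>Male s</td>
      <td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=1&smo=1">Export</a></td></tr>
  <tr><td>Female Non-s</td>
      <td><a href="Name.asp?qt=1&qa=0&ben=1&tpt=0&cl=CAESFFHIILNI&gl=1&life=2&smo=1">Export</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Each <a> sits in a <td> whose previous sibling <td> holds the label,
# so walk from the anchor back to its label cell.
exports = {}
for a in soup.find_all("a"):
    label = a.find_parent("td").find_previous_sibling("td").get_text(strip=True)
    exports[label] = a["href"]

print(exports)
```

The resulting dict maps e.g. "Male s" to its href, so the script can fetch or click each export by name rather than blindly clicking every anchor on the page.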
  • What happens when you click the links? As I understand it, an export to Excel happens. Commented Feb 23, 2019 at 17:35
  • Sorry, yes, I should have made that clear. When I click a link, a csv file with some values is exported to a specific directory. Commented Feb 23, 2019 at 17:39

1 Answer


As I understand it, the popup is a new window, so you have to switch to it:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#...
other_url = 'http://somewebsite.com/something/exportToExcelChoice.asp?qt=1&qa=0&ben=1&tpt=0&gl=1&cl=CAESFFHIILNI'

wait = WebDriverWait(browser, 10)

#handles = driver.window_handles
browser.get(other_url)
#wait.until(EC.new_window_is_opened(handles))
#driver.switch_to.window(driver.window_handles[-1])

links = wait.until(EC.visibility_of_all_elements_located((By.TAG_NAME,"a")))
for link in links:
    link.click()
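One caveat with the loop above: if clicking an Export link navigates the page (rather than just triggering a download), the remaining WebElements go stale and the next `link.click()` raises a `StaleElementReferenceException`. A safer pattern is to read every href first and only then visit them one by one. A runnable sketch of that pattern, where `FakeLink` is a stand-in for a Selenium WebElement so the snippet runs without a browser:

```python
# FakeLink mimics WebElement.get_attribute for illustration only.
class FakeLink:
    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None

# Stand-in for the eight Export links found on the page
# (life 1-4 crossed with smo 1-2, as in the question's HTML).
links = [FakeLink(f"Name.asp?life={life}&smo={smo}")
         for life in (1, 2, 3, 4) for smo in (1, 2)]

# Step 1: grab all the href strings before any navigation
# can invalidate the elements.
hrefs = [a.get_attribute("href") for a in links]

# Step 2: visit each URL in turn; with a real driver this
# would be browser.get(h) for each h in hrefs.
for h in hrefs:
    print(h)
```

In the real script, `links` would come from `wait.until(...)` as in the answer, and step 2 would be `browser.get(h)` (with the relative hrefs already resolved by Selenium's `get_attribute("href")`, which returns absolute URLs).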
