I'm trying to use Python and Selenium to loop through a list of webpages and download a file from each page. I can open one page and download the first file with a while loop, but as soon as the loop reaches the second element in the list of webpages, Selenium errors out.

Here is my code:

path_to_chromedriver = 'path to chromedriver location'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)

browser.get("file:///path to html file")

#these are example webpages
all_trails = ['www.google.com', 'www.yahoo.com', 'www.bing.com']

index = 0

while (index <= 2):

    url = all_trails[index]
    browser.get(url)

    browser.find_element_by_link_text('Sign In').click()

    username = browser.find_element_by_xpath("//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')

    username.send_keys("username")
    password.send_keys("password")

    browser.find_element_by_xpath("//button[@type='submit' and @class='btn btn-primary btn-lg' and contains(text(), 'Log In')]").click()

    results_url = browser.find_element_by_xpath("//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1

    browser.quit()
    time.sleep(5)

I'm able to download the file from the first element in the list, www.google.com. The loop reaches the second element, www.yahoo.com, but the call to browser.get(url) fails with this error:

Traceback (most recent call last):
  File "trails_scraper.py", line 22, in <module>
    browser.get(url)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in get
    self.execute(Command.GET, {'url': url})
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 306, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 460, in execute
    return self._request(command_info[0], url, body=data)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 483, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1053, in request
    self._send_request(method, url, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1093, in _send_request
    self.endheaders(body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output
    self.send(msg)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send
    self.connect()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect
    self.timeout, self.source_address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 575, in create_connection
    raise err
socket.error: [Errno 61] Connection refused

Does anyone know what is going on? I know the more error prone method is to use a for loop but logically my code seems correct.

Any help would be magnificently appreciated :)

  • Did you try calling browser.quit() outside the loop? Commented Dec 15, 2017 at 9:01
  • This helped, but it doesn't open another webpage to download the file. Carlo 1585's answer below allows the webdriver to open another page; I didn't realize it needed the path to chromedriver to open another page. Thanks though! Commented Dec 15, 2017 at 18:56
  • What do you mean by "needed the path to chromedriver to open another page"? You need the path to chromedriver just to run chromedriver. If the chromedriver file is already on the PATH, you don't need to specify it explicitly. And it definitely cannot affect opening another page! Commented Dec 15, 2017 at 19:07

1 Answer

The problem is that you create the browser outside the loop but call browser.quit() inside it. At the end of the first iteration the browser is closed, so on the second iteration the call

browser.get(url)

fails because there is no longer a browser to talk to.
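To make the sequence of events concrete, here is a small simulation that needs no Selenium at all. StandInBrowser and run_loop are hypothetical names invented for this sketch, not Selenium classes; the stand-in only mimics the one behaviour that matters here, namely that nothing answers once quit() has been called. It is written for Python 3, where the refused connection surfaces as ConnectionRefusedError rather than the Python 2.7 socket.error.

```python
class StandInBrowser:
    """Hypothetical stand-in for a WebDriver; not Selenium's real class."""

    def __init__(self):
        self.driver_running = True  # the driver process is alive

    def get(self, url):
        if not self.driver_running:
            # Nothing is listening any more, so the request is refused.
            raise ConnectionRefusedError(61, "Connection refused")
        return url

    def quit(self):
        self.driver_running = False  # the driver process is shut down


def run_loop(urls):
    """Create the browser once, but quit it inside the loop (the bug)."""
    browser = StandInBrowser()
    loaded = []
    for url in urls:
        try:
            browser.get(url)
        except ConnectionRefusedError:
            # Report the pages loaded so far and the page that failed.
            return loaded, url
        loaded.append(url)
        browser.quit()  # closes the only browser after the first page
    return loaded, None


loaded, failed_at = run_loop(['www.google.com', 'www.yahoo.com', 'www.bing.com'])
print(loaded, failed_at)
```

The first page loads, and the second call to get() is the one that fails, which matches the behaviour described in the question.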

You have two solutions:

1) Create the browser inside the loop, so each iteration gets a fresh one:

from selenium import webdriver
import time

path_to_chromedriver = 'path to chromedriver location'


#these are example webpages
all_trails = ['www.google.com', 'www.yahoo.com', 'www.bing.com']

index = 0

while (index <= 2):
    browser = webdriver.Chrome(executable_path = path_to_chromedriver)

    browser.get("file:///path to html file")

    url = all_trails[index]
    browser.get(url)

    browser.find_element_by_link_text('Sign In').click()

    username = browser.find_element_by_xpath("//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')

    username.send_keys("username")
    password.send_keys("password")

    browser.find_element_by_xpath("//button[@type='submit' and @class='btn btn-primary btn-lg' and contains(text(), 'Log In')]").click()

    results_url = browser.find_element_by_xpath("//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1

    browser.quit()
    time.sleep(5)

2) Close the browser only after the loop has finished:

from selenium import webdriver
import time

path_to_chromedriver = 'path to chromedriver location'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)

browser.get("file:///path to html file")

#these are example webpages
all_trails = ['www.google.com', 'www.yahoo.com', 'www.bing.com']

index = 0

while (index <= 2):

    url = all_trails[index]
    browser.get(url)

    browser.find_element_by_link_text('Sign In').click()

    username = browser.find_element_by_xpath("//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')

    username.send_keys("username")
    password.send_keys("password")

    browser.find_element_by_xpath("//button[@type='submit' and @class='btn btn-primary btn-lg' and contains(text(), 'Log In')]").click()

    results_url = browser.find_element_by_xpath("//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1
    time.sleep(5)
browser.quit()
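
If you go with a single shared browser, one possible refinement is a for loop wrapped in try/finally, so that quit() runs exactly once even when a page raises partway through. This is only a sketch under assumptions: visit_all and handle_page are hypothetical names, not part of Selenium, and the function accepts any object that has get() and quit() methods.

```python
import time


def visit_all(browser, urls, handle_page, pause=0.0):
    """Visit every URL with one shared browser and quit it exactly once.

    `browser` is any object with get() and quit(), e.g. a Selenium WebDriver.
    `handle_page` is a hypothetical callback that does the per-page work
    (sign in, click the download link, and so on).
    """
    visited = []
    try:
        for url in urls:
            browser.get(url)
            handle_page(browser)
            visited.append(url)
            time.sleep(pause)  # optional wait between pages
    finally:
        # Runs once, whether the loop finished or a page raised.
        browser.quit()
    return visited
```

With Selenium this could be called as visit_all(webdriver.Chrome(executable_path=path_to_chromedriver), all_trails, download_gpx, pause=5), where download_gpx is a hypothetical function holding the sign-in and click steps from the loop body above.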

1 Comment

The first solution worked but I used a combination of both solutions.
