How to write a selenium loop in Python?

Question

I want to web-scrape data from many different websites that contain JavaScript code (which is why I am using Selenium to get the information). Everything works fine, but when I try to load the next URL I get a very long error message:

> Traceback (most recent call last):
  File "C:/Python27/air17.py", line 46, in <module>
    scrape(urls)
  File "C:/Python27/air17.py", line 28, in scrape
    browser.get(url)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 268, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 254, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 464, in execute
    return self._request(command_info[0], url, body=data)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 487, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "C:\Python27\lib\httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "C:\Python27\lib\httplib.py", line 1082, in _send_request
    self.endheaders(body)
  File "C:\Python27\lib\httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 882, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 844, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 821, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\socket.py", line 575, in create_connection
    raise err
error: [Errno 10061]

The data from the first website ends up in the CSV file, but when the code tries to open the next website it freezes and I get this error message. What am I doing wrong?

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import urllib2
import unicodecsv as csv
import os
import sys
import io
import datetime
import pandas as pd
import MySQLdb
import re
import contextlib
import selenium.webdriver.support.ui as ui

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
output.writerow(['TIME','FLIGHT','FROM','AIRLANE','AIRCRAFT','IHAVETODELETETHIS','STATUS'])


def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            datatable.append(temp_data)

        output.writerows(datatable)

        resultcsv.close()
        time.sleep(10) 
        browser.quit()

urls = ["https://www.flightradar24.com/data/airports/bud/arrivals", "https://www.flightradar24.com/data/airports/fco/arrivals"]
scrape(urls)

These have to be outside the loop (one indentation level less): resultcsv.close() and browser.quit(). (CrazyElf, commented Jul 26, 2017 at 10:02)
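A minimal sketch of the question's scraper with that fix applied, assuming Python 3 (the question uses Python 2.7 and unicodecsv). The browser and the CSV file are each opened once before the loop and closed once after it, using with and try/finally so they are released even if a page fails. The names scrape, make_driver, parse_rows and HEADER are hypothetical; make_driver defaults to webdriver.Firefox() and parse_rows to a BeautifulSoup version of the question's table parsing, and both can be swapped out, for example to run without a real browser.

```python
import csv

HEADER = ["TIME", "FLIGHT", "FROM", "AIRLANE", "AIRCRAFT", "IHAVETODELETETHIS", "STATUS"]


def default_make_driver():
    # Imported lazily so this module loads even where Selenium is not installed.
    from selenium import webdriver
    return webdriver.Firefox()


def default_parse_rows(html):
    # Same selectors as the question; returns [] if the table has not rendered yet.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table table-condensed table-hover data-table m-n-t-15")
    if table is None:
        return []
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]  # cell text, whitespace stripped
        for tr in table.find_all("tr", class_="hidden-xs hidden-sm ng-scope")
    ]


def scrape(urls, filename="output.csv", make_driver=default_make_driver,
           parse_rows=default_parse_rows):
    # Open the CSV once; the with block closes it after the loop, even on errors.
    # errors="replace" (an assumption) avoids crashes on characters outside latin-1.
    with open(filename, "w", newline="", encoding="latin-1", errors="replace") as f:
        output = csv.writer(f, delimiter=";", quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
        output.writerow(HEADER)
        browser = make_driver()  # one browser session shared by every URL
        try:
            for url in urls:
                browser.get(url)
                # If rows are rendered late by JavaScript, wait here before reading page_source.
                output.writerows(parse_rows(browser.page_source))
        finally:
            browser.quit()  # called exactly once, after the last URL
```

With the defaults, scrape(urls) opens Firefox once, writes the rows for every URL and quits the browser even if one page raises. Because these pages render their rows with JavaScript, the table may not exist yet when page_source is read; waiting for it (for example with Selenium's WebDriverWait, which the question already imports via selenium.webdriver.support.ui) is likely more reliable than a fixed time.sleep(10).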

Community · Accepted Answer · 2020-06-20 09:12:55Z


Not sure that the browser.quit() you have at the end of the loop body is such a good idea. According to the Selenium documentation:

quit()

Quits the driver and closes every associated window.

I think a browser.close() (as documented here) would be enough in the loop. Keep the browser.quit() outside the loop.

answered Jul 26, 2017 at 9:57 by Cédric Julien; edited Jun 20, 2020 at 9:12 by Community bot


3 Comments

CrazyElf Over a year ago

I don't think even browser.close() is needed inside the loop.

Alex Garcia Over a year ago

Indeed, the quit() call is killing the webdriver.

Cédric Julien Over a year ago

@CrazyElf Closing the current page is cleaner; it will release memory.
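A minimal sketch of closing each page inside the loop while keeping the session alive, assuming Selenium 4 or newer (driver.switch_to.new_window() was added in Selenium 4). Each URL is loaded in a fresh tab and only that tab is closed, so the original window, and with it the session, survives every close(). This matters because, per the W3C WebDriver specification, closing the last open window also ends the session, so calling close() on the browser's only window can fail much like quit() does. The names scrape_in_tabs and handle_page are hypothetical.

```python
def scrape_in_tabs(driver, urls, handle_page):
    """Load each URL in a new tab, pass its page_source to handle_page, then close the tab.

    The window that was focused on entry is never closed, so the session stays
    alive; call driver.quit() once after this function returns.
    """
    main_handle = driver.current_window_handle
    for url in urls:
        driver.switch_to.new_window("tab")  # Selenium 4+: opens a tab and focuses it
        try:
            driver.get(url)
            handle_page(url, driver.page_source)
        finally:
            driver.close()                        # closes only the current tab
            driver.switch_to.window(main_handle)  # refocus the window we kept open
```

For example, create browser = webdriver.Firefox(), call scrape_in_tabs(browser, urls, some_callback) inside a try block, and call browser.quit() in the matching finally.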
