
I am trying to scrape data from the Sunshine List website (http://www.sunshinelist.ca/) using the BeautifulSoup library and the Selenium package (in order to deal with the 'Next' button on the webpage). I know there are several related posts but I just can't identify where and how I should explicitly ask the driver to wait.

Error: StaleElementReferenceException: Message: The element reference is stale: either the element is no longer attached to the DOM or the page has been refreshed

This is the code I have written:

import numpy as np
import pandas as pd
import requests
import re
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

ffx_bin = FirefoxBinary(r'C:\Users\BhagatM\AppData\Local\Mozilla Firefox\firefox.exe')
ffx_caps = DesiredCapabilities.FIREFOX
ffx_caps['marionette'] = True
driver = webdriver.Firefox(capabilities=ffx_caps,firefox_binary=ffx_bin)
driver.get("http://www.sunshinelist.ca/")
driver.maximize_window()

tablewotags1=[]

while True:
    divs = driver.find_element_by_id('datatable-disclosures')
    divs1=divs.find_elements_by_tag_name('tbody')

    for d1 in divs1:
        div2=d1.find_elements_by_tag_name('tr')
        for d2 in div2:
            tablewotags1.append(d2.text)

    try:
        driver.find_element_by_link_text('Next →').click()
    except NoSuchElementException:
        break

year1=tablewotags1[0::10]
name1=tablewotags1[3::10]
position1=tablewotags1[4::10]
employer1=tablewotags1[1::10]  


df1=pd.DataFrame({'Year':year1,'Name':name1,'Position':position1,'Employer':employer1})
df1.to_csv('Sunshine List-1.csv', index=False)
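For context, the stride slicing above assumes the flat tablewotags1 list carries ten text entries per record, with year at offset 0, employer at offset 1, name at offset 3, and position at offset 4. A minimal self-contained illustration of that partitioning, using made-up data (the real page may lay its rows out differently):

```python
# Hypothetical flat list: two records, ten fields each, mirroring the
# [0::10], [1::10], [3::10], [4::10] offsets used above. The 'x'/'a'..'e'
# entries stand in for the columns the script does not keep.
flat = [
    '2016', 'City of Toronto', 'x', 'Jane Doe', 'Manager', 'a', 'b', 'c', 'd', 'e',
    '2016', 'Hydro One',       'x', 'John Roe', 'Analyst', 'a', 'b', 'c', 'd', 'e',
]

years = flat[0::10]      # every 10th item starting at index 0
employers = flat[1::10]  # every 10th item starting at index 1
names = flat[3::10]
positions = flat[4::10]

print(years)   # ['2016', '2016']
print(names)   # ['Jane Doe', 'John Roe']
```

If the table ever changes its column count, these offsets silently go wrong, which is one argument for collecting each row as a tuple instead of a flat list.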

1 Answer


I think you just need to point to the correct Firefox binary. Also, which version of Firefox are you using? It looks like one of the newer versions; if that's the case, this should do:

ffx_bin = FirefoxBinary(r'pathtoyourfirefox')
ffx_caps = DesiredCapabilities.FIREFOX
ffx_caps['marionette'] = True
driver = webdriver.Firefox(capabilities=ffx_caps,firefox_binary=ffx_bin)

Cheers

EDIT: So, to answer your new enquiry, "why is it not writing the CSV": you should do it like this:

import csv   # You are missing this import
ls_general_list = []

def csv_for_me(list_to_csv):
    with open(pathtocsv, 'a', newline='') as csvfile:
        sw = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for line in list_to_csv:
            sw.writerow(line)  # each line is one row (a tuple of fields)

Then replace this in your code, df=pd.DataFrame({'Year':year,'Name':name,'Position':position,'Employer':employer}),

with this one: ls_general_list.append((year, name, position, employer)),

then call csv_for_me(ls_general_list).
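A quick usage sketch of the helper above; pathtocsv is whatever output path you choose (the name here is made up), and the delimiter/quotechar settings are just the answer's example values:

```python
import csv

pathtocsv = 'sunshine_sample.csv'  # hypothetical output path

def csv_for_me(list_to_csv):
    # Append each tuple in the list as one CSV row.
    with open(pathtocsv, 'a', newline='') as csvfile:
        sw = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for line in list_to_csv:
            sw.writerow(line)

# Two dummy records in the (year, name, position, employer) shape
ls_general_list = [
    ('2016', 'Jane Doe', 'Manager', 'City of Toronto'),
    ('2016', 'John Roe', 'Analyst', 'Hydro One'),
]
csv_for_me(ls_general_list)
# The file now holds the two rows, one comma-separated line each.
```

Note the file is opened in append mode ('a'), so calling the function twice will duplicate rows rather than overwrite them.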

Please accept the answer if it's satisfactory; now you have a CSV.


7 Comments

Your input was very helpful. I have modified my question to a new issue I am facing.
The code to create the .csv file seems to be fine. The issue is that the code is scraping around 120k rows of data and I think Python is not able to handle that. The code worked fine when I tried scraping the first 1000 rows of data (the .csv file was created). Any idea how I could split the data into separate .csv files?
Also another thing I noticed was that even though the code clicks on the 'next' button, the list 'tablewotags' simply ends up storing the data on the first page, multiple times. I haven't been able to identify the problem but I have a feeling the code within the while loop is the issue.
Move tablewotags = [] out of the loop, and at the end use the function to write to the CSV. I think you only need some basic libraries to achieve what you want; in other words, Pandas is not necessary.
I have moved tablewotags = [], but I am now trying to resolve the StaleElementReferenceException.
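On the earlier comment about splitting ~120k scraped rows across several files: one simple approach (a sketch using only the stdlib csv module; the chunk size, function name, and file-name prefix are all made up for illustration) is to write the accumulated rows in fixed-size chunks, one numbered file per chunk:

```python
import csv

def write_in_chunks(rows, chunk_size=1000, prefix='sunshine_part'):
    """Write rows to numbered CSV files, at most chunk_size rows apiece.

    Returns the list of file paths written.
    """
    paths = []
    for i in range(0, len(rows), chunk_size):
        path = f'{prefix}_{i // chunk_size + 1}.csv'
        with open(path, 'w', newline='') as f:
            csv.writer(f).writerows(rows[i:i + chunk_size])
        paths.append(path)
    return paths

# e.g. 2500 dummy rows -> 3 files (1000 + 1000 + 500 rows)
demo_rows = [('2016', f'Person {n}', 'Role', 'Employer') for n in range(2500)]
print(write_in_chunks(demo_rows))
```

Writing each chunk as it is scraped (rather than holding all 120k rows in one list) would also keep memory use flat.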
