I am new to web crawling and I am trying to write a simple script to get course names from a University course catalog table:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary(r'C:\Program Files\Mozilla Firefox\firefox.exe')
driver = webdriver.Firefox(firefox_binary=binary)

url = 'https://courses.illinois.edu/schedule/2018/fall/CS'
driver.get(url)

course_names = []
for i in range(1, 69):
    if(float(i)%2 != 0): #odd row number
        curr_name = driver.find_element_by_css_selector('tr.odd:nth-child(i) > td:nth-child(2) > a:nth-child(1)').text
    else:
        curr_name = driver.find_element_by_css_selector('tr.even:nth-child(i) > td:nth-child(2) > a:nth-child(1)').text

    course_names.append(curr_name)
print(course_names)

driver.quit()

When I run this I get the following error:

InvalidSelectorException: Message: Given css selector expression "tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)" is invalid: InvalidSelectorError: 'tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)' is not a valid selector: "tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)"

I am completely lost on how to get around this. I am just trying to get it to go through the table. It just does not seem to like i. I know this works:

tr.odd:nth-child(1) > td:nth-child(2) > a:nth-child(1)
tr.even:nth-child(2) > td:nth-child(2) > a:nth-child(1)
tr.odd:nth-child(3) > td:nth-child(2) > a:nth-child(1)

Any suggestions?

  • I'm not experienced with Selenium, but it looks like the i inside the selector string is a literal character, not the variable i defined outside the string. I think you need something like 'nth-child('+i+')' Commented Mar 25, 2018 at 15:12
  • It seems your css selectors are incorrect. Did you evaluate those? Commented Mar 25, 2018 at 15:17
  • I tried both suggested replacements for i but I am still getting the same error. Any other tips? Commented Mar 25, 2018 at 16:58
  • nvm, 'nth-child('+str(i)+')' eventually worked :) Commented Mar 25, 2018 at 17:47

1 Answer

There are multiple issues with your code:

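  • i is a literal character inside your selector string, so the browser receives nth-child(i) rather than a number. Build the string from the loop variable instead: nth-child(" + str(i) + ")

  • You are filtering the odd and even rows both in your script and in the selector. Choose one, not both.

  • Locating elements and reading their text in a loop is expensive, since each call is a separate round trip to the browser. Scraping the text directly with some JavaScript would be a better approach.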

rows = driver.execute_script("""
    return [].map.call(document.querySelectorAll('#default-dt tbody tr'), row => [
       row.cells[0].innerText,             /* Course number */
       row.cells[1].innerText,             /* Course title  */
       row.querySelector('[href]').href    /* Course link   */
    ]);
    """)

for code, title, href in rows:
    print(code, title, href)
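
For the first two points, here is a minimal sketch of building the selector from the loop variable. The helper name row_link_selector is hypothetical, the #default-dt table id is borrowed from the script above, and it is an assumption about the page that every row keeps its course link in the second cell.

```python
def row_link_selector(i):
    """Return a CSS selector for the link in the second cell of row i (1-based)."""
    # The f-string substitutes the value of i. Writing "nth-child(i)" inside
    # the quotes would send the letter i to the browser instead of a number.
    # No .odd/.even class is needed: nth-child already picks the row by position.
    return f"#default-dt tbody tr:nth-child({i}) > td:nth-child(2) > a"


selectors = [row_link_selector(i) for i in range(1, 4)]
for selector in selectors:
    print(selector)
```

Each resulting string can be passed to the same element lookup call used in the question.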

4 Comments

I tried making the replacement you suggested, but it is still giving the same error.
Try the script in the console of your browser (F12) to figure out the issue.
How is that going to show anything different? I am running the script in a Jupyter notebook. Can the browser console run Python scripts?
Print all the selectors from Python and try them in the browser's console with document.querySelector('tr.odd:nth-child(1) > td:nth-child(2) > a:nth-child(1)')
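
That last suggestion could be sketched like this: build each selector exactly as the loop would, print it, and paste the printed string into document.querySelector in the browser console. The helper has_numeric_nth_child is hypothetical, and its regular expression only checks for a plain integer inside the first nth-child(...), which is much narrower than the full CSS grammar.

```python
import re


def has_numeric_nth_child(selector):
    """True if the first nth-child(...) in the selector holds a plain integer."""
    match = re.search(r"nth-child\(([^)]*)\)", selector)
    return match is not None and match.group(1).isdigit()


i = 3
candidates = [
    # i is a literal letter inside the quotes
    "tr.odd:nth-child(i) > td:nth-child(2) > a:nth-child(1)",
    # str(i) is still inside the quotes, so it is never evaluated
    "tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)",
    # the value of i is concatenated into the string
    "tr.odd:nth-child(" + str(i) + ") > td:nth-child(2) > a:nth-child(1)",
]

for selector in candidates:
    print(has_numeric_nth_child(selector), selector)
```

Printing the string before handing it to Selenium shows exactly what the browser will receive, which makes a stray i or str(i) inside the quotes easy to spot.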
