I'm trying to collect the URLs of all possible versions of this page (all combinations of Level, Event, and Season) using Selenium. I've been successful using driver.find_elements_by_xpath to navigate to the correct option and click it before saving the URL, but this has been very slow and I'm wondering if there's a better alternative.

There doesn't seem to be any href attribute I can steal the link from without clicking on the actual option. Using the Select class and trying to loop through the options is cleaner, but I still have to generate the Select object every time - trying to do this:

s = Select(driver.find_element_by_xpath("//label[contains(text(), 'Level')]/../select"))
for option in s.options:
    option.click()

works for the first option, but then gives me the error stale element reference: element is not attached to the page document. I'm stumped - is there a better way to collect these links? Below is my snippet of code:

driver.get("https://athletic.net/TrackAndField/Division/Event.aspx?DivID=89120&Event=1")
for i in range(0, len(driver.find_elements_by_xpath("//label[contains(text(), 'Level')]/../select/option"))):
    driver.find_elements_by_xpath("//label[contains(text(), 'Level')]/../select/option")[i].click()
    for j in range(0, len(driver.find_elements_by_xpath("//optgroup//option[contains(text(), 'Meters')]"))):
        driver.find_elements_by_xpath("//optgroup//option[contains(text(), 'Meters')]")[j].click()
        for k in range(0, len(driver.find_elements_by_xpath("//label[contains(text(), 'Season')]/..//option[contains(text(), 'Indoor')]/../option"))):
            driver.find_elements_by_xpath("//label[contains(text(), 'Season')]/..//option[contains(text(), 'Indoor')]/../option")[k].click()
            for l in range(0, len(driver.find_elements_by_xpath("//label[contains(text(), 'Season')]/..//option[contains(text(), '2018')]/../option"))):
                driver.find_elements_by_xpath("//label[contains(text(), 'Season')]/..//option[contains(text(), '2018')]/../option")[l].click()
                with open("links.txt", 'a+') as f:
                    f.write(driver.current_url + ";")
2 Comments
  • Instead of clicking on every possible combination, just grab the "value" attribute of each option tag for each level. The DivID is simply the location's value, as in <option value="12345">, and the event works the same way. Commented Jul 17, 2018 at 3:27
  • Thanks! But as I mentioned in my comment to @GPT14, the DivID also seems to change whenever I modify the Level or Season - these menus do not store the DivID as an attribute in each option. I'm not sure how to figure out what the DivID would be if I clicked on one of these options without actually clicking on it. Commented Jul 17, 2018 at 21:09

1 Answer

The URL is a combination of the location, identified by the 'DivID' parameter, and the event, identified by the 'Event' parameter.

So you can use find_elements_by_xpath (plural) to find all the options in both drop-down lists, then use a list comprehension to extract the value attribute from each option:

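# Assumes `driver` is already on the Event.aspx page from the question.
# Read the value attribute of every option in the location drop-down.
location_option_list = driver.find_elements_by_xpath("//select[@ng-model='appC.locationDivId']/option")
location_values = [location_option.get_attribute('value') for location_option in location_option_list]

# Do the same for the event drop-down (its options sit inside optgroups, hence '//option').
event_option_list = driver.find_elements_by_xpath("//select[@ng-model='appC.params.eventId']//option")
event_values = [event_option.get_attribute('value') for event_option in event_option_list]

# Build one URL per (location, event) pair, separated by ';' as in the question.
urls = ""
for location_value in location_values:
    for event_value in event_values:
        urls += "https://www.athletic.net/TrackAndField/Division/Event.aspx?DivID=%s&Event=%s;" \
                % (location_value, event_value)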

The above code only works for the 'High School' and 'Middle School' levels. You can easily modify it to handle the 'Youth Clubs' and 'College' levels.
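
The URL-building step above can also be kept separate from the Selenium calls, which makes it easy to check without a browser. The following is only a minimal sketch: the helper name build_urls and the sample values are hypothetical, and skipping empty values is an assumption about placeholder options rather than something confirmed for this page.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://www.athletic.net/TrackAndField/Division/Event.aspx"


def build_urls(location_values, event_values):
    """Return one URL per (DivID, Event) pair.

    Empty values are skipped on the assumption that they belong to
    placeholder options; drop that check if it does not apply.
    """
    urls = []
    for div_id, event_id in product(location_values, event_values):
        if not div_id or not event_id:
            continue
        query = urlencode({"DivID": div_id, "Event": event_id})
        urls.append(BASE_URL + "?" + query)
    return urls


# Hypothetical values standing in for the scraped 'value' attributes.
example_urls = build_urls(["89120", "89121"], ["1", "2"])
print(";".join(example_urls))
```

In practice you would pass in the location_values and event_values lists collected with get_attribute('value') instead of the sample lists.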

2 Comments

Thank you, this was very helpful! However, choosing different options for the "Level" and "Season" menus also seems to change the DivID to values that are not accessible in the values for each option. Level seems to assign powers of 2 for the value, while Season assigns the year to each value. How can I use these value attributes to determine what the DivID would be if I clicked on them?
@kreesh: You basically have to run the code snippet in a loop over all combinations of levels and seasons. Collect the generated URLs in a list, then convert that list to a set so that you keep only the unique URLs. Create a new question with your code if you get stuck; I would be happy to take a look.
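
The deduplication step described in the comment above might look roughly like this. It is a sketch under assumptions: unique_urls is a hypothetical helper, the batches are made-up stand-ins for the URL lists produced for each Level/Season combination, and a dict is used in place of the suggested set only so that first-seen order is kept (a plain set works equally well if order does not matter).

```python
def unique_urls(url_batches):
    """Flatten several batches of URLs, dropping duplicates.

    A dict preserves insertion order, so each URL stays at the
    position where it was first seen.
    """
    seen = {}
    for batch in url_batches:
        for url in batch:
            seen.setdefault(url, None)
    return list(seen)


# Made-up batches: one list of URLs per Level/Season combination.
batches = [
    ["Event.aspx?DivID=1&Event=1", "Event.aspx?DivID=1&Event=2"],
    ["Event.aspx?DivID=1&Event=2", "Event.aspx?DivID=2&Event=1"],
]
deduplicated = unique_urls(batches)
print(";".join(deduplicated))
```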
