Finding an element by partial href (Python Selenium)

Question

I'm trying to access text from elements that have different xpaths but very predictable href schemes across multiple pages in a web database. Here are some examples:

<a href="/mathscinet/search/mscdoc.html?code=65J22,(35R30,47A52,65J20,65R30,90C30)">
65J22 (35R30 47A52 65J20 65R30 90C30) </a>

In this example I would want to extract "65J22 (35R30 47A52 65J20 65R30 90C30)"

<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)">
05C80 (05C15) </a>

In this example I would want to extract "05C80 (05C15)". My scraper can't locate these elements by XPath directly, because their XPaths change from page to page, so I am looking for a more roundabout approach.

My main idea is to use the fact that every href contains "/mathscinet/search/mscdoc.html?code=". Selenium can't directly search for hrefs, but I was thinking of doing something similar to this C# implementation:

Driver.Instance.FindElement(By.XPath("//a[contains(@href, 'long')]"))

To port this over to Python, the only analogous method I could think of would be the in operator, but I am not sure how the syntax works when everything is nested inside find_element_by_xpath. How would I bring all of these ideas together to obtain my desired text?
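Python's in operator cannot be used inside the XPath string, because Selenium hands the whole expression to the browser's XPath engine rather than evaluating it in Python. The XPath 1.0 counterpart of Python's "fragment in href" is contains(@href, fragment). To make the filtering logic concrete offline, here is a minimal sketch that uses only the standard library's html.parser (not Selenium) on the two anchors from the question; the names AnchorCollector and texts_for_href, and the extra /help link, are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._chunks = []   # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace, roughly like Selenium's .text does.
            text = " ".join("".join(self._chunks).split())
            self.anchors.append((self._href, text))
            self._href = None


def texts_for_href(html, fragment):
    """Return link texts whose href contains `fragment` (Python's `in`)."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [text for href, text in parser.anchors if fragment in href]


SAMPLE = """
<a href="/mathscinet/search/mscdoc.html?code=65J22,(35R30,47A52,65J20,65R30,90C30)">
65J22 (35R30 47A52 65J20 65R30 90C30) </a>
<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)">
05C80 (05C15) </a>
<a href="/help">Help</a>
"""

print(texts_for_href(SAMPLE, "/mathscinet/search/mscdoc.html?code="))
# ['65J22 (35R30 47A52 65J20 65R30 90C30)', '05C80 (05C15)']
```

In Selenium the same filter is written inside the XPath itself, as the answers below show.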

driver.find_element_by_xpath("//a['/mathscinet/search/mscdoc.html?code=' in @href]").text

Andrei · Accepted Answer · 2018-07-17 05:08:52Z

7

If I understand correctly, you want to locate all elements that share the same partial href. You can use this:

elements = driver.find_elements_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]")
for element in elements:
    print(element.text)

Or, if you only want the first matching element:

driver.find_element_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text

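find_elements_by_xpath returns a list of all matching elements (an empty list if there are none), whereas find_element_by_xpath returns only the first match and raises NoSuchElementException if nothing matches.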
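If you only need one link but it may be missing on some pages, one common pattern is to call find_elements and check for an empty list instead of catching NoSuchElementException. The sketch below assumes Selenium 4's driver.find_elements(by, value) signature; first_link_text and all_link_texts are hypothetical helper names, and the XPATH constant is simply the string value of Selenium's By.XPATH.

```python
# Same string value as selenium.webdriver.common.by.By.XPATH
XPATH = "xpath"

# Locator taken from the answer above.
MSC_LINKS = "//a[contains(@href, '/mathscinet/search/mscdoc.html')]"


def first_link_text(driver, xpath=MSC_LINKS):
    """Return the text of the first element matching `xpath`, or None.

    find_elements returns an empty list when nothing matches, whereas
    find_element would raise NoSuchElementException.
    """
    matches = driver.find_elements(XPATH, xpath)
    return matches[0].text if matches else None


def all_link_texts(driver, xpath=MSC_LINKS):
    """Return the text of every element matching `xpath`."""
    return [element.text for element in driver.find_elements(XPATH, xpath)]
```

With a real driver (for example webdriver.Chrome() after driver.get(...)), first_link_text(driver) gives either the code string or None, so a page without the link does not stop the scraper.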

edited Jul 17, 2018 at 5:08

answered Jul 17, 2018 at 5:03

Andrei



1 Comment

Viragos Over a year ago

Update for Selenium 4.18.1:

from selenium.webdriver.common.by import By

driver.find_element(by=By.XPATH, value="//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text

undetected Selenium · Accepted Answer · 2018-07-17 06:52:41Z

1

Based on the HTML you shared, @AndreiSuvorkov's answer should cover your current requirement. You can, however, make the XPath more specific by:

Using starts-with instead of contains
Including the ?code= part of the @href attribute

The resulting code would be:

all_elements = driver.find_elements_by_xpath("//a[starts-with(@href,'/mathscinet/search/mscdoc.html?code=')]")
for elem in all_elements:
    print(elem.get_attribute("innerHTML"))
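To see what the starts-with suggestion changes in practice: XPath's contains(@href, s) behaves like Python's "s in href", and starts-with(@href, s) like href.startswith(s). XPath's @href is the attribute value as written in the page source, so a relative prefix will not match with starts-with if a page happens to use absolute URLs. A minimal sketch with hypothetical href values (the example.org URL is made up):

```python
PREFIX = "/mathscinet/search/mscdoc.html?code="

# Hypothetical href values, as they might appear in the page source.
HREFS = [
    "/mathscinet/search/mscdoc.html?code=05C80,(05C15)",           # relative link
    "https://example.org/mathscinet/search/mscdoc.html?code=05C80", # absolute link
    "/mathscinet/search/mscdoc.html",                               # no ?code= part
]


def xpath_contains(value, fragment):
    """Python equivalent of XPath 1.0 contains(value, fragment)."""
    return fragment in value


def xpath_starts_with(value, prefix):
    """Python equivalent of XPath 1.0 starts-with(value, prefix)."""
    return value.startswith(prefix)


for href in HREFS:
    print(xpath_contains(href, PREFIX), xpath_starts_with(href, PREFIX), href)
# True True /mathscinet/search/mscdoc.html?code=05C80,(05C15)
# True False https://example.org/mathscinet/search/mscdoc.html?code=05C80
# False False /mathscinet/search/mscdoc.html
```

Including ?code= in the pattern excludes links that point at the page without a classification code, while starts-with additionally ties the match to the relative form of the URL.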

answered Jul 17, 2018 at 6:52

undetected Selenium



Viragos · Accepted Answer · 2024-05-01 15:52:08Z

1

Andrei Goldmann's answer is correct; here it is updated for the newer Selenium syntax (Selenium 4.18.1):

from selenium.webdriver.common.by import By

elements = driver.find_elements(by=By.XPATH, value="//a[contains(@href, '/mathscinet/search/mscdoc.html')]")
for element in elements:
    print(element.text)

Or, if you only want the first matching element:

from selenium.webdriver.common.by import By

driver.find_element(by=By.XPATH, value="//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text

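find_elements returns a list of all matching elements (an empty list if there are none), whereas find_element returns only the first match and raises NoSuchElementException if nothing matches.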
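If the href fragment comes from a Python variable instead of being hard-coded, the condition still has to live inside the XPath string; Python only builds that string. XPath 1.0 string literals have no escape character, so a fragment containing both kinds of quote has to be assembled with concat(). A minimal sketch; xpath_literal and href_locator are hypothetical helper names:

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal.

    XPath 1.0 has no escape sequences, so a value containing both quote
    characters is assembled with concat().
    """
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Split on single quotes and rejoin the pieces with "'" inside concat().
    parts = [f"'{part}'" for part in text.split("'")]
    return "concat(" + ", \"'\", ".join(parts) + ")"


def href_locator(fragment, predicate="contains"):
    """Build //a[contains(@href, ...)] or //a[starts-with(@href, ...)]."""
    if predicate not in ("contains", "starts-with"):
        raise ValueError(f"unsupported predicate: {predicate!r}")
    return f"//a[{predicate}(@href, {xpath_literal(fragment)})]"


print(href_locator("/mathscinet/search/mscdoc.html"))
# //a[contains(@href, '/mathscinet/search/mscdoc.html')]
```

The returned string can be passed as value= to driver.find_elements(by=By.XPATH, value=...) in the same way as the hard-coded locator.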

answered May 1, 2024 at 15:52

Viragos

