3

I'm trying to access text from elements that have different xpaths but very predictable href schemes across multiple pages in a web database. Here are some examples:

<a href="/mathscinet/search/mscdoc.html?code=65J22,(35R30,47A52,65J20,65R30,90C30)">
65J22 (35R30 47A52 65J20 65R30 90C30) </a>

In this example I would want to extract "65J22 (35R30 47A52 65J20 65R30 90C30)"

<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)">
05C80 (05C15) </a>

In this example I would want to extract "05C80 (05C15)". My web scraper would not be able to search by xpath directly due to the xpaths of my desired elements changing between pages, so I am looking for a more roundabout approach.

My main idea is to use the fact that every href contains "/mathscinet/search/mscdoc.html?code=". Selenium can't directly search for hrefs, but I was thinking of doing something similar to this C# implementation:

Driver.Instance.FindElement(By.XPath("//a[contains(@href, 'long')]"))

To port this over to python, the only analogous method I could think of would be to use the in operator, but I am not sure how the syntax will work when everything is nested in a find_element_by_xpath. How would I bring all of these ideas together to obtain my desired text?

driver.find_element_by_xpath("//a['/mathscinet/search/mscdoc.html?code=' in @href]").text

3 Answers

7

If I understand correctly, you want to locate all elements that share the same partial href. You can use this:

elements = driver.find_elements_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]")
for element in elements:
    print(element.text)

Or, if you only want the first matching element:

driver.find_element_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text

The find_elements call returns a list of every matching element, while find_element returns only the first match.

1 Comment

Update for Selenium 4.18.1 (requires from selenium.webdriver.common.by import By):

driver.find_element(by=By.XPATH, value="//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text
1

Given the HTML you have shared, @AndreiSuvorkov's answer should cover your current requirement. You can make the XPath more specific, and so less likely to match unrelated links, by:

  • Using starts-with instead of contains
  • Including the ?code= part of the @href attribute
  • Your effective code block will be:

    all_elements = driver.find_elements_by_xpath("//a[starts-with(@href,'/mathscinet/search/mscdoc.html?code=')]")
    for elem in all_elements:
        print(elem.get_attribute("innerHTML"))
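
To see what the stricter prefix check buys you without launching a browser, here is a minimal offline sketch. It uses Python's built-in html.parser instead of Selenium, so it only mimics the starts-with test on the raw href attribute. The class name MscLinkText and the SAMPLE markup (including the second, deliberately misleading link) are made up for illustration.

```python
from html.parser import HTMLParser

PREFIX = "/mathscinet/search/mscdoc.html?code="


class MscLinkText(HTMLParser):
    """Collect the text of <a> elements whose href starts with PREFIX."""

    def __init__(self):
        super().__init__()
        self.texts = []       # finished link texts
        self._buffer = None   # text pieces while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith(PREFIX):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Collapse newlines and repeated spaces inside the link text
            self.texts.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


# Hypothetical sample: the second href merely *contains* the path,
# so a contains() check would match it but a prefix check does not.
SAMPLE = """
<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)">
05C80 (05C15) </a>
<a href="/other/page.html?next=/mathscinet/search/mscdoc.html?code=x">skip</a>
"""

parser = MscLinkText()
parser.feed(SAMPLE)
print(parser.texts)
```

Only the first link is collected. A contains() test on the same markup would also pick up the second one, which is the case starts-with guards against.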
    

1

Andrei Goldmann's answer is correct; here it is updated for the newer Selenium syntax (Selenium 4.18.1):

from selenium.webdriver.common.by import By

elements = driver.find_elements(by=By.XPATH, value="//a[contains(@href, '/mathscinet/search/mscdoc.html')]")
for element in elements:
    print(element.text)

Or, if you only want the first matching element:

from selenium.webdriver.common.by import By

driver.find_element(by=By.XPATH, value="//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text

The find_elements call returns a list of every matching element, while find_element returns only the first match.
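
If you need this on several pages, the lookup can be wrapped in a small helper. This is only a sketch: the function name msc_link_texts and its prefix parameter are my own, and it assumes driver is an already-created Selenium 4 WebDriver. It passes the locator strategy as the plain string "xpath", which is the value of By.XPATH, so the snippet itself needs no imports.

```python
def msc_link_texts(driver, prefix="/mathscinet/search/mscdoc.html?code="):
    """Return the stripped text of every <a> whose href starts with prefix.

    Assumes `driver` is a Selenium 4 WebDriver. The prefix must not
    contain a single quote, since it is placed inside a quoted XPath string.
    """
    xpath = f"//a[starts-with(@href, '{prefix}')]"
    # "xpath" is the string behind By.XPATH; By.XPATH works equally well here
    elements = driver.find_elements("xpath", xpath)
    # find_elements returns an empty list when nothing matches,
    # so this yields [] rather than raising on pages without such links
    return [element.text.strip() for element in elements]
```

After driver.get(...) on a page, msc_link_texts(driver) would give you the list of link texts; taking the first item (after checking the list is not empty) replaces the single-element variant.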
