3

This is a somewhat backwards approach to web scraping: I need to locate the XPath of a web element AFTER I have already found it with a text()= predicate.

Because the XPath values differ depending on what information shows up, I need to use predictable labels inside the row to locate the span text next to the found element. A simple and reliable way I found is to locate the keyword label and then increase the td index inside the XPath by one.

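    import re

    from selenium.webdriver.common.by import By


    def x_label(self, contains):
        # Find the label span by its exact text.
        mls_data_xpath = f"//span[text()='{contains}']"
        label = self.driver.find_element(By.XPATH, mls_data_xpath)
        # Pseudocode: "xpath" is not a real attribute, so this returns None.
        string = label.get_attribute("xpath")
        # Add 1 to the td index to reach the neighbouring cell.
        digits = string.split("td[")[1]
        num = int(re.findall(r'(\d+)', digits)[0]) + 1
        labeled_data = f'{string.split("td[")[0]}td[{num}]/span'
        print(labeled_data)
        labeled_text = self.driver.find_element(By.XPATH, labeled_data).text
        return labeled_text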
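The string manipulation in x_label (split on "td[", add 1 to the number, rebuild the path) can be isolated into a plain function and tested without a browser. This is a minimal sketch, not part of Selenium: next_cell_xpath is a hypothetical name, the sample path is made up, and it assumes the label sits in the last td step of the path and the value is a child span of the next cell. Unlike split("td[")[1], it targets the last td step, so an index on an outer table cell is never bumped by mistake.

```python
import re

# One "td" step: "/td" or "/td[n]", followed by "/" or the end of the path.
TD_STEP = re.compile(r"/td(?:\[(\d+)\])?(?=/|$)")


def next_cell_xpath(xpath: str, child: str = "span") -> str:
    """Point at `child` inside the cell right after the last td step of `xpath`.

    Hypothetical helper: it only rewrites the string and never touches a page.
    A bare "td" step is treated as "td[1]" (XPath positions start at 1).
    """
    matches = list(TD_STEP.finditer(xpath))
    if not matches:
        raise ValueError(f"no td step in {xpath!r}")
    last = matches[-1]
    index = int(last.group(1) or 1)
    # Drop everything from the last td step onwards and rebuild it one cell later.
    return f"{xpath[:last.start()]}/td[{index + 1}]/{child}"


label_path = '//*[@id="wrapperTable"]/tbody/tr/td/table/tbody/tr[26]/td[5]/span'
print(next_cell_xpath(label_path))
# //*[@id="wrapperTable"]/tbody/tr/td/table/tbody/tr[26]/td[6]/span
```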

I cannot find much information on .get_attribute() and .get_property(), and I haven't been able to find anything like .get_attribute("xpath").

Basically, I take in a string like "ApprxTotalLivArea", which I can rely on, and then increase the integer inside td[...] by 1 to find the span data in the cell next door. I am hoping there is something like get_attribute("xpath") that returns the XPath string of the element I locate through my text()='{contains}' search.
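An alternative that avoids reconstructing an absolute XPath at all is to navigate from the label cell to its neighbour. In XPath 1.0 this is the following-sibling axis, for example //span[text()='ApprxTotalLivArea']/parent::td/following-sibling::td[1]/span, which can be passed to driver.find_element(By.XPATH, ...) directly. The sketch below shows the same navigation offline with Python's xml.etree.ElementTree, which lacks that axis, so it looks the label cell up in its row instead. The sample row and value_next_to_label are made up, and the sketch assumes the value span is a direct child of the next td.

```python
import xml.etree.ElementTree as ET

# Made-up listing row: each label cell is followed by its value cell.
ROW = """
<table><tbody>
  <tr>
    <td><span>Beds</span></td><td><span>3</span></td>
    <td><span>ApprxTotalLivArea</span></td><td><span>1850</span></td>
  </tr>
</tbody></table>
"""


def value_next_to_label(root: ET.Element, label: str) -> str:
    """Return the span text in the cell right after the cell whose span reads label.

    Offline stand-in for the XPath
    //span[text()='<label>']/parent::td/following-sibling::td[1]/span
    """
    # [span='...'] selects a td that has a span child with exactly this text.
    label_cell = root.find(f".//td[span='{label}']")
    if label_cell is None:
        raise LookupError(f"label {label!r} not found")
    # No following-sibling axis in ElementTree: find the row, then the next td in it.
    parents = {child: parent for parent in root.iter() for child in parent}
    cells = [c for c in parents[label_cell] if c.tag == "td"]
    position = cells.index(label_cell)
    if position + 1 == len(cells):
        raise LookupError(f"no cell after {label!r}")
    return cells[position + 1].find("span").text


root = ET.fromstring(ROW.strip())
print(value_next_to_label(root, "ApprxTotalLivArea"))  # 1850
```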


  • Have you checked this thread? Is it helpful? Commented Mar 31, 2022 at 20:59
  • That is a great starting point, but I have to make it work for Python. Commented Mar 31, 2022 at 21:21

4 Answers

2

The remote WebElement does include the following methods:

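But xpath isn't an attribute or property of a WebElement, so get_attribute("xpath") returns None in Python (null in Java) unless the page happens to put an attribute literally named xpath on the element.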


1 Comment

That is pseudocode that obviously doesn't work, but it explains what I am looking for. I haven't found anything in get_attribute or get_property that is relevant to XPath values, but it seems like I should be able to generate an XPath from the element somehow.
2

This function iteratively gets the parent until it reaches the html element at the top:

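    from selenium import webdriver
    from selenium.webdriver.common.by import By


    def get_xpath(elm):
        """Build an absolute XPath for elm by walking up to <html>."""
        e = elm
        # Note: the starting element gets no [n] index, so if it has same-tag
        # siblings the resulting path may resolve to an earlier sibling.
        xpath = elm.tag_name
        while e.tag_name != "html":
            e = e.find_element(By.XPATH, "..")  # step up to the parent
            # Siblings of e with the same tag, including e itself.
            neighbours = e.find_elements(By.XPATH, "../" + e.tag_name)
            level = e.tag_name
            if len(neighbours) > 1:
                # XPath positions are 1-based.
                level += "[" + str(neighbours.index(e) + 1) + "]"
            xpath = level + "/" + xpath
        return "/" + xpath

    driver = webdriver.Chrome()
    driver.get("https://www.stackoverflow.com")
    login = driver.find_element(By.XPATH, "//a[text() ='Log in']")
    xpath = get_xpath(login)
    print(xpath)

    assert login == driver.find_element(By.XPATH, xpath)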

Hope this helps!
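To experiment with the parent-walking idea without a browser, here is an offline sketch over Python's xml.etree.ElementTree; the page snippet and absolute_xpath are made up for illustration and are not Selenium APIs. ElementTree elements cannot step to their parent, so the sketch builds a child-to-parent map first. It also gives every step a 1-based [n] index, including the starting element: the path gets longer, but same-tag siblings can no longer be confused.

```python
import xml.etree.ElementTree as ET

# Made-up page: two <a> in the first div, one in the second.
PAGE = """
<html><body>
  <div><a>first</a><a>second</a></div>
  <div><a>third</a></div>
</body></html>
"""


def absolute_xpath(root: ET.Element, target: ET.Element) -> str:
    """Return an absolute XPath like /html[1]/body[1]/div[1]/a[2] for target."""
    # ElementTree has no ".." from an element, so map every child to its parent once.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        # 1-based position among siblings that share the node's tag.
        same_tag = [sib for sib in parent if sib.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    steps.append(f"{root.tag}[1]")  # the document element is always unique
    return "/" + "/".join(reversed(steps))


page = ET.fromstring(PAGE.strip())
second = page.findall(".//a")[1]
xpath = absolute_xpath(page, second)
print(xpath)  # /html[1]/body[1]/div[1]/a[2]

# ElementTree cannot evaluate an absolute path from an element,
# so drop the leading "/html[1]" step before checking the round trip.
assert page.find(xpath.split("/", 2)[2]) is second
```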

Comments

0

I was able to find a Python version of the execute_script approach in this post, which was based on a JavaScript answer in another forum. I had to make a lot of .replace() calls on the string this function creates, but I was able to reliably find the label string I need, increase the td index in the XPath by 1 to reach the column data, and retrieve it regardless of differences in XPath values between page listings.

    import re

    from selenium.webdriver.common.by import By


    def x_label(self, contains):
        # Locate the label span by (partial) text.
        label_contains = f"//span[contains(text(), '{contains}')]"
        print(label_contains)
        labeled_element = self.driver.find_element(By.XPATH, label_contains)
        print(labeled_element)
        element_label = labeled_element.text
        print(element_label)

        # Define a JavaScript helper in the page that builds a path such as
        # id("wrapperTable")/TBODY[1]/... by walking up the DOM.
        self.driver.execute_script("""
        window.getPathTo = function (element) {
            if (element.id!=='')
                return 'id("'+element.id+'")';
            if (element===document.body)
                return element.tagName;

            var ix= 0;
            var siblings= element.parentNode.childNodes;
            for (var i= 0; i<siblings.length; i++) {
                var sibling= siblings[i];
                if (sibling===element)
                    return window.getPathTo(element.parentNode)+'/'+element.tagName+'['+(ix+1)+']';
                if (sibling.nodeType===1 && sibling.tagName===element.tagName)
                    ix++;
            }
        }
        """)

        generated_xpath = self.driver.execute_script("return window.getPathTo(arguments[0]);", labeled_element)
        # Turn id("...") into //*[@id...] and lowercase the tag names.
        # .lower() also lowercases the id value, which is repaired below.
        generated_xpath = f'//*[@{generated_xpath}'.lower().replace('tbody[1]', 'tbody')

        print(f'generated_xpath = {generated_xpath}')

        # Path copied from the browser for one known listing; used only as a sanity check.
        expected_path = r'//*[@id="wrapperTable"]/tbody/tr/td/table/tbody/tr[26]/td[6]/span'

        # Page-specific clean-up so the result matches the browser's copied path.
        generated_xpath = generated_xpath.replace('[@id("wrappertable")', '[@id="wrapperTable"]').replace('tr[1]', 'tr')
        clean_path = generated_xpath.replace('td[1]', 'td').replace('table[1]', 'table').replace('span[1]', 'span')
        print(f'clean_path = {clean_path}')
        print(f'expected_path = {expected_path}')
        # Read the index of the label's cell and add 1 to get the neighbouring cell.
        digits = generated_xpath.split("]/td[")[1]
        print(digits)
        num = int(re.findall(r'(\d+)', digits)[0]) + 1
        print(f'Number = {num}')
        labeled_data = f'{clean_path.split("td[")[0]}td[{num}]/span'
        print(f'labeled_data = {labeled_data}')
        print(f'expected_path = {expected_path}')

        if labeled_data == expected_path:
            print('Congrats')
        else:
            print('Rats')

        labeled_text = self.driver.find_element(By.XPATH, labeled_data).text
        print(labeled_text)
        return labeled_text
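Several of the .replace() calls above only repair side effects: .lower() runs over the whole string, so the id value has to be fixed back afterwards, and the [1] indices are stripped to match the browser's copied path. A hedged alternative is to parse the getPathTo output step by step and lowercase only the tag names. normalize_get_path_to is a hypothetical helper and the sample input is made up; it assumes the id contains no "/", and it keeps the [1] indices, which XPath accepts as-is and which are never less precise than a bare step.

```python
import re

# getPathTo emits either id("...") or BODY first, then steps like TD[5].
ID_HEAD = re.compile(r'id\("([^"]*)"\)')
STEP = re.compile(r"([A-Za-z][\w-]*)\[(\d+)\]")


def normalize_get_path_to(raw: str) -> str:
    """Convert getPathTo output into an XPath with lowercase tag names.

    The id value keeps its original case and [n] indices are kept as-is.
    Assumes the id itself contains no "/".
    """
    head, *steps = raw.split("/")
    id_match = ID_HEAD.fullmatch(head)
    if id_match:
        parts = [f'//*[@id="{id_match.group(1)}"]']
    else:
        # getPathTo stops at document.body and returns its bare tagName.
        parts = ["/html/" + head.lower()]
    for step in steps:
        step_match = STEP.fullmatch(step)
        if step_match is None:
            raise ValueError(f"unexpected step {step!r}")
        parts.append(f"{step_match.group(1).lower()}[{step_match.group(2)}]")
    return "/".join(parts)


print(normalize_get_path_to('id("wrapperTable")/TBODY[1]/TR[26]/TD[5]/SPAN[1]'))
# //*[@id="wrapperTable"]/tbody[1]/tr[26]/td[5]/span[1]
```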

Comments

0

An upgrade of Tom Fuller's function. It also finds the correct XPath when the parent element contains several elements with the same tag_name (and, for example, the same class) as the target element:

    from selenium.webdriver.common.by import By


    def get_xpath(elm):
        e = elm
        xpath = elm.tag_name
        i = 0  # Final-element counter: stays 0 until the target's parent is saved
        while e.tag_name != "html":
            if i == 0:  # Save the parent of the final (target) element (first iteration only)
                parent_elm = e.find_element(By.XPATH, "..")
                i += 1
            e = e.find_element(By.XPATH, "..")
            neighbours = e.find_elements(By.XPATH, "../" + e.tag_name)
            level = e.tag_name
            if len(neighbours) > 1:
                level += "[" + str(neighbours.index(e) + 1) + "]"
            xpath = level + "/" + xpath

        # 1-based position of the target among the parent's children with its tag.
        elm_count = 1
        other_elements = parent_elm.find_elements(By.XPATH, elm.tag_name)
        for other_element in other_elements:
            if other_element == elm:
                final_element_count = elm_count
            else:
                elm_count += 1
        if final_element_count > 1:
            final_xpath = "/" + xpath + f'[{str(final_element_count)}]'
        else:
            final_xpath = "/" + xpath
        return final_xpath
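Why the final index matters can be seen offline with Python's xml.etree.ElementTree, whose find() also returns the first match, like find_element: a path ending in a bare tag resolves to the first same-tag sibling, not to the target. The snippet is made up; the 1-based position is computed the same way final_element_count is above, with a list comprehension in place of the counter loop.

```python
import xml.etree.ElementTree as ET

# Made-up parent cell with two same-tag, same-class children; we want the second one.
parent = ET.fromstring('<td><span class="v">label</span><span class="v">value</span></td>')
target = parent.findall("span")[1]

# A bare last step matches both spans, and find() returns the first one.
assert parent.find("span") is not target

# 1-based position among same-tag siblings (what final_element_count holds).
position = [s for s in parent if s.tag == target.tag].index(target) + 1
assert parent.find(f"span[{position}]") is target
print(position)  # 2
```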

Comments
