11

I am parsing a JS-generated webpage like so:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('https://www.consumerbarometer.com/en/graph-builder/?question=M1&filter=country:singapore,canada,mexico,brazil,argentina,united_states,bulgaria,austria,belgium,croatia,czech_republic,denmark,estonia,finland,france,germany,greece,hungary,italy,ireland,latvia,lithuania,norway,netherlands,poland,portugal,russia,romania,serbia,slovakia,spain,slovenia,sweden,switzerland,ukraine,united_kingdom,australia,china,israel,hong_kong_sar,japan,korea,new_zealand,malaysia,taiwan,turkey,vietnam')

# wait for the svg to appear
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.TAG_NAME, 'svg')))

for text in driver.find_elements_by_class_name('bar-text-label'):
    print(text.text)

driver.close()

Besides getting the text from the class bar-text-label, I would also like to get values from an HTML5 data attribute. For example, given <rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="76" class="bar"></rect>, I would like to be able to parse 76 from it.

Is this possible to do in Selenium?

I tried both of the below, with no success:

for text in driver.find_elements_by_class_name('bar'): 
    print(data_value.text)

for data in driver.find_elements_by_xpath('//*[contains(@data-value)]/@data-value'): 
    print(data.text)
1
  • Have you tried using the .get_attribute() method on the element after it has been located? Commented Feb 4, 2015 at 17:08

2 Answers

13

If you have elements like the following:

<rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="75" class="bar">bar1</rect>
<rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="76" class="bar">bar2</rect>

You can get the text value and the attribute value as follows:

elements = driver.find_elements_by_class_name('bar')
for element in elements:
    print(element.text)
    print(element.get_attribute('data-value'))

This prints out:

bar1
75
bar2
76
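
For readers on a newer Selenium release: as far as I know, the find_elements_by_* helpers were dropped in Selenium 4 in favour of find_elements(by, value). Here is a minimal sketch of the same loop in that style. The function name read_bars is made up for illustration, and it assumes driver is an already-open WebDriver with the page loaded.

```python
def read_bars(driver):
    """Return a (text, data-value) pair for each element with class "bar"."""
    # "class name" is the locator string behind selenium's By.CLASS_NAME,
    # so By.CLASS_NAME can be passed here instead once selenium is imported.
    bars = driver.find_elements("class name", "bar")
    return [(bar.text, bar.get_attribute("data-value")) for bar in bars]
```

Calling read_bars(driver) after the WebDriverWait from the question should give a list of pairs that you can print or process further.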

3

You mention you tried:

for text in driver.find_elements_by_class_name('bar'): 
    print(data_value.text)

Since data_value is not defined anywhere, this raises a NameError. If you did print(text.text) instead, you should get the text of each element that has the bar class. (This is essentially what you do in your first snippet.)

You also mention this:

for data in driver.find_elements_by_xpath('//*[contains(@data-value)]/@data-value'): 
    print(data.text)

This cannot work because Selenium's find_element(s)... functions cannot return anything other than elements or lists of elements. You are trying to get it to return an attribute, which won't work. XPath itself allows selecting attributes, but when you use XPath through Selenium you cannot get anything other than elements.
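
If you want to stay with XPath, one way around this is to select the elements that carry the attribute and then read it with get_attribute(). This is a rough sketch rather than a tested recipe for this page; the function name data_values is invented here, and driver is assumed to be an open WebDriver.

```python
def data_values(driver):
    """Collect the data-value attribute of every element that has one."""
    # The XPath selects elements (not attribute nodes), which is what
    # Selenium can return. "xpath" is the locator string behind By.XPATH.
    elements = driver.find_elements("xpath", "//*[@data-value]")
    return [el.get_attribute("data-value") for el in elements]
```

The attribute values come back as strings, so convert them yourself if you need numbers.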

You could do what Jessamyn Smith suggested or:

results = driver.execute_script("""
    var els = document.getElementsByClassName("bar");
    var ret = [];
    for (var i =0, el; (el = els[i]); ++i) {
        ret.push([el.textContent, el.attributes["data-value"].value]);
    }
    return ret;
""")
for r in results:
    print(r[0], r[1])

This takes one round-trip between your script and the browser. Looping and using .text and .get_attribute() involves two round-trips per iteration. The JavaScript builds a list of pairs of results. Each pair contains the text of the element in the first position, and the value of data-value in the second position.
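
If you use this pattern often, it may be worth wrapping it in a small helper. The sketch below is one possible shape, not the only one: BAR_SCRIPT and fetch_bars are names I made up, and the int() conversion assumes every bar has an integer data-value (as in the question's data-value="76"), so adjust it if your data differs.

```python
# JavaScript run in the page: one [textContent, data-value] pair per bar.
BAR_SCRIPT = """
    return Array.prototype.map.call(
        document.getElementsByClassName("bar"),
        function (el) { return [el.textContent, el.getAttribute("data-value")]; }
    );
"""


def fetch_bars(driver):
    """Fetch all bars with a single execute_script call.

    Returns a list of (text, value) tuples with value converted to int.
    Assumes every bar carries an integer data-value attribute.
    """
    rows = driver.execute_script(BAR_SCRIPT)
    return [(text, int(value)) for text, value in rows]
```

The browser is still contacted only once, no matter how many bars are on the page.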

2 Comments

This is very interesting. I did not know you could execute js like that.
I did not know either at first. If you run everything locally, the difference is not great, but if you use Sauce Labs, BrowserStack or something similar to run tests remotely, the round-trips add up a lot. I've cut the time it takes to run large test suites in half by combining multiple Selenium calls into a single execute_script (or execute_async_script) call.
