
Here is the HTML code that I am trying to extract the text from:

<fieldset>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CNPJ:</label>011234560083
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CIDADE:</label>TAUBATE
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">ESTADO:</label>SP
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">TOTAL BRUTO: </label>2.407,09
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">LIQ: </label>2.344,09
    </div>
</fieldset>

This code,

print browse.find_element_by_xpath("//div[@class='grid-3-12 form-no-lbl']").text

returns just the first element: 011234560083

How can I read the value for each label, for example "LIQ:" = 2.344,09?

3 Answers


It seems odd that your code doesn't work; I haven't run into a case quite like this. I think the code below should work. For each DIV, it grabs the text inside the LABEL and prepends it to the DIV text you are already finding. Together they should give you the text you are looking for.

# Each matching DIV holds one label/value pair.
lines = browse.find_elements_by_css_selector("div.grid-3-12.form-no-lbl")
for line in lines:
    # Prepend the LABEL text to the DIV text.
    print(line.find_element_by_css_selector("label.form-lbl").text + line.text)

If you have the luxury of having both Selenium and lxml available, you could use Selenium to navigate to the desired page(s) and then use lxml to parse the HTML. For example,

import lxml.html as LH
# content = browser.page_source
content = '''\
<fieldset>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CNPJ:</label>011234560083
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CIDADE:</label>TAUBATE
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">ESTADO:</label>SP
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">TOTAL BRUTO: </label>2.407,09
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">LIQ: </label>2.344,09
    </div>
</fieldset>'''

root = LH.fromstring(content)
labels = root.xpath('//fieldset/div[@class="grid-3-12 form-no-lbl"]/label')
data = [[item.strip() for item in [elt.text, elt.tail]] for elt in labels]

yields

[['CNPJ:', '011234560083'],
 ['CIDADE:', 'TAUBATE'],
 ['ESTADO:', 'SP'],
 ['TOTAL BRUTO:', '2.407,09'],
 ['LIQ:', '2.344,09']]
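
To look a value up by its label, as the question asks, the pairs can be turned into a dictionary. This is a minimal sketch that starts from the list shown above; the name `values` and the choice to drop the trailing colon from each key are assumptions made for illustration, not part of the original answer.

```python
# Pairs copied from the lxml output shown above.
data = [
    ['CNPJ:', '011234560083'],
    ['CIDADE:', 'TAUBATE'],
    ['ESTADO:', 'SP'],
    ['TOTAL BRUTO:', '2.407,09'],
    ['LIQ:', '2.344,09'],
]

# Key each value by its label, dropping the trailing colon.
values = {label.rstrip(':').strip(): value for label, value in data}

print(values['LIQ'])          # 2.344,09
print(values['TOTAL BRUTO'])  # 2.407,09
```

The values stay as strings here, so the Brazilian number format (dot for thousands, comma for decimals) is preserved as scraped.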

2 Comments

Perfect, but sorry, how can I get the entire HTML from this element?

>>> elem = brf.find_element_by_xpath("//div[@class='grid-3-12 form-no-lbl']")
>>> print elem
<selenium.webdriver.remote.webelement.WebElement (session="a513f6c7-ecf8-4ab4-8245-208018507295", element="{b87c7a58-e899-4b01-a029-60ecb89d54b7}")>

@Lara sure, use elem.get_attribute("outerHTML") (or "innerHTML" if you don't need the element's own tag in the output).

This is a rather common problem in Selenium, because you cannot directly match text nodes with the find_element_by_* commands.

In your case, I assume you know the labels (LIQ, ESTADO, etc.) beforehand and need to get a value by its label.

The idea is to locate a label by its text, move up the tree to the parent, get the parent's text, split it on ":" and take the last element, which corresponds to the desired value:

label = "ESTADO"
text = driver.find_element_by_xpath("//label[starts-with(., '%s:')]/.." % label).text
print(text.split(":")[-1].strip())
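
One caveat with `split(":")[-1]` is that it keeps only the text after the last colon, so a value that itself contains a colon would be cut short. A possible variant, sketched below, splits on the first colon only with `str.partition`. The helper name `value_after_label` and the "HORA" sample are hypothetical and do not appear in the page above; the function works on the plain string that `.text` returns, so it can be tried without a browser.

```python
def value_after_label(text):
    """Return the part of `text` after the first colon, stripped.

    Returns an empty string when `text` contains no colon.
    """
    _label, _sep, value = text.partition(":")
    return value.strip()


print(value_after_label("ESTADO:SP"))      # SP
print(value_after_label("LIQ: 2.344,09"))  # 2.344,09
print(value_after_label("HORA: 12:30"))    # 12:30 (hypothetical label)
```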
