How can I extract text after an element using Python and Selenium?

Question

Here is the HTML code that I am trying to extract the text from:

<fieldset>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CNPJ:</label>011234560083
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CIDADE:</label>TAUBATE
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">ESTADO:</label>SP
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">TOTAL BRUTO: </label>2.407,09
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">LIQ: </label>2.344,09
    </div>
</fieldset>

This code,

from selenium.webdriver.common.by import By
print(browse.find_element(By.XPATH, "//div[@class='grid-3-12 form-no-lbl']").text)

returns only the first match: 011234560083

How can I read the value for each label, e.g. "LIQ:" = 2.344,09?

JeffC · Accepted Answer · 2016-06-29 18:00:58Z

1

It seems odd that your code doesn't give you what you expect; I haven't run into a case quite like this. Note that find_element returns only the first matching element, so you need find_elements (plural) to get every div. The code below should work: for each div, I grab the text inside the LABEL and prepend it to the text you are already getting. The combination should give you the text you are looking for.

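from selenium.webdriver.common.by import By

lines = browse.find_elements(By.CSS_SELECTOR, "div.grid-3-12.form-no-lbl")
for line in lines:
    # Prepend the label's text (e.g. "LIQ: ") to the text of its div
    print(line.find_element(By.CSS_SELECTOR, "label.form-lbl").text + line.text)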
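One caveat, which depends on how the page is rendered: WebElement.text normally includes the text of child elements, so line.text may already start with the label, and prepending the label again would duplicate it. A minimal sketch of a helper that handles both cases and returns a (label, value) pair; the name pair_label_value is hypothetical and not part of Selenium:

```python
from typing import Tuple


def pair_label_value(label_text: str, div_text: str) -> Tuple[str, str]:
    """Split a div's text into a (label, value) pair.

    label_text: text of the <label>, e.g. "LIQ: ".
    div_text: text of the enclosing <div>; depending on how the page is
    rendered it may or may not already start with the label.
    """
    label = label_text.strip()
    value = div_text.strip()
    # Remove the label prefix if the div text already contains it,
    # so the label is not duplicated in the value.
    if value.startswith(label):
        value = value[len(label):].strip()
    return label, value
```

Inside the loop you would call it with the label element's text and line.text, and could collect the pairs into a dict keyed by label.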

answered Jun 29, 2016 at 18:00

JeffC


Peter Mortensen · Accepted Answer · 2022-11-16 02:14:33Z

1

If you have the luxury of having both Selenium and lxml available, you could use Selenium to navigate to the desired page(s) and then use lxml to parse the HTML. For example:

import lxml.html as LH
# content = browser.page_source
content = '''\
<fieldset>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CNPJ:</label>011234560083
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CIDADE:</label>TAUBATE
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">ESTADO:</label>SP
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">TOTAL BRUTO: </label>2.407,09
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">LIQ: </label>2.344,09
    </div>
</fieldset>'''

root = LH.fromstring(content)
labels = root.xpath('//fieldset/div[@class="grid-3-12 form-no-lbl"]/label')
data = [[item.strip() for item in [elt.text, elt.tail]] for elt in labels]

yields

[['CNPJ:', '011234560083'],
 ['CIDADE:', 'TAUBATE'],
 ['ESTADO:', 'SP'],
 ['TOTAL BRUTO:', '2.407,09'],
 ['LIQ:', '2.344,09']]
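If lxml is not installed, roughly the same label/tail pairing can be sketched with the standard library's html.parser module instead. This swaps in a different parser than the answer uses; the names LabelTailParser and extract_pairs are hypothetical, and the sketch assumes, as on this page, that each value is plain text directly following its <label class="form-lbl">:

```python
from html.parser import HTMLParser
from typing import List, Optional


class LabelTailParser(HTMLParser):
    """Collect [label text, text after the label] pairs for
    <label class="form-lbl">, similar to lxml's elt.text / elt.tail."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[List[str]] = []
        self._in_label = False
        # The pair currently receiving text (label text, then its tail)
        self._current: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "label" and "form-lbl" in classes:
            self._in_label = True
            self._current = ["", ""]
            self.pairs.append(self._current)
        else:
            # Any other opening tag ends the previous label's tail
            self._current = None

    def handle_endtag(self, tag):
        if tag == "label" and self._in_label:
            # Text after </label> now belongs to the tail
            self._in_label = False
        elif tag != "label":
            # e.g. </div> ends the tail
            self._current = None

    def handle_data(self, data):
        if self._current is None:
            return
        index = 0 if self._in_label else 1
        self._current[index] += data


def extract_pairs(html: str) -> List[List[str]]:
    parser = LabelTailParser()
    parser.feed(html)
    parser.close()
    return [[label.strip(), tail.strip()] for label, tail in parser.pairs]
```

You could feed it browser.page_source (or an element's outerHTML); for the HTML above, dict(extract_pairs(content))["LIQ:"] gives "2.344,09".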

edited Nov 16, 2022 at 2:14

Peter Mortensen


answered Jun 18, 2016 at 2:50

unutbu


2 Comments

Lara Over a year ago

Perfect, but sorry, how can I get the entire HTML from this?

>>> elem = brf.find_element_by_xpath("//div[@class='grid-3-12 form-no-lbl']")
>>> print elem
<selenium.webdriver.remote.webelement.WebElement (session="a513f6c7-ecf8-4ab4-8245-208018507295", element="{b87c7a58-e899-4b01-a029-60ecb89d54b7}")>

alecxe Over a year ago

@Lara sure, use elem.get_attribute("outerHTML"), or "innerHTML" if you don't need the element's own opening and closing tags in the output.

alecxe · Accepted Answer · 2016-06-18 02:39:30Z

0

This is a rather common problem in Selenium, because you cannot directly match text nodes with the find_element* methods: a locator has to resolve to an element, so the bare text after the label cannot be selected on its own.

In your case, I assume you know the labels (LIQ, ESTADO, etc.) beforehand and need to get the value for a given label.

The idea is to locate the label by its text, move up the tree to its parent div, get the parent's text, split it on ":" and take the last part, which corresponds to the desired value:

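from selenium.webdriver.common.by import By

label = "ESTADO"
# Find the label whose text starts with "ESTADO:" and step up to its parent div
text = driver.find_element(By.XPATH, "//label[starts-with(., '%s:')]/.." % label).text
print(text.split(":")[-1].strip())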
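split(":")[-1] keeps only the text after the last colon, so a value that itself contains a colon (a time such as 12:30, say) would be truncated. A minimal sketch of a hypothetical helper that splits on the first colon only, using str.partition, assuming each label ends at the first colon in the parent's text:

```python
def value_after_label(parent_text: str) -> str:
    """Return the text after the first ':' in a "LABEL: value" string.

    Unlike text.split(":")[-1], partition() splits only on the first
    colon, so a value that itself contains ':' stays whole.
    """
    _, sep, value = parent_text.partition(":")
    # If there is no colon at all, fall back to the whole text
    return value.strip() if sep else parent_text.strip()
```

You would pass it the parent element's .text in place of the split on the last line above.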

answered Jun 18, 2016 at 2:39

alecxe

