Scraping based on a specific string [python selenium]

Question

I am using Selenium in Python to scrape a website. Most pages work fine, but there is one case I can't seem to capture. The HTML:

<div class="parablock">
  <p>De Hoge Raad acht geen termen aanwezig voor een veroordeling in de proceskosten.<span class="linebreak1"> </span></p>

  <p>
    <span class="emphasis" style="font-weight:bold;">4 Beslissing</span>    </p>
  <p>De Hoge Raad verklaart het beroep in cassatie ongegrond.</p>
</div>

What I am after is the last bit of text: "De Hoge Raad verklaart het beroep in cassatie ongegrond." The problem is that there are several divs with class parablock, and there are also multiple spans with class emphasis.

The only unique thing is the span containing "Beslissing". However, this is not set as a class or anything. Is there an easy way to scrape the required text by matching the string "Beslissing"?

Or do I have to soup the whole page, turn it into a string, and regex everything to get the text after "Beslissing"?

You can try this expression: //div[@class="parablock"]/p[span[contains(., "Beslissing")]]/following-sibling::p
– vold, Commented Apr 11, 2017 at 9:13

Andersson · Accepted Answer · 2017-04-11 09:14:16Z

2

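Try driver.find_element_by_xpath('//p[span[contains(text(),"Beslissing")]]/following-sibling::p'), where driver is your WebDriver instance (in Selenium 4, the equivalent call is driver.find_element(By.XPATH, ...)).

This finds the <p> element that is a following sibling of the <p> containing a <span> whose text contains "Beslissing".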
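As a rough sketch of how that XPath might be wrapped for reuse, assuming driver is an already-created Selenium WebDriver that has loaded the page: the names DECISION_XPATH and find_decision_text are illustrative, not part of Selenium. The sketch uses find_elements so that a page without a "Beslissing" heading yields None instead of an exception.

```python
# XPath from the answer: the <p> that follows the <p> whose <span> mentions "Beslissing".
DECISION_XPATH = '//p[span[contains(text(), "Beslissing")]]/following-sibling::p'


def find_decision_text(driver):
    """Return the text of the first <p> after the "Beslissing" heading, or None."""
    # "xpath" is the value of Selenium's By.XPATH constant, so no import is needed here.
    # find_elements returns an empty list (rather than raising) when nothing matches.
    matches = driver.find_elements("xpath", DECISION_XPATH)
    return matches[0].text.strip() if matches else None
```

After something like driver.get(url) for the page in question, find_decision_text(driver) should return the sentence following the heading, provided the page follows the structure shown in the question.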


Maresh · Accepted Answer · 2017-04-11 09:18:10Z

1

I think you could use the regex selector (.re()) from Scrapy.

Or you can just select all the .parablock elements and apply your own logic, such as:

for el in response.css('.parablock'):
    # Check every .emphasis span; a substring test tolerates the "4 " prefix
    # and avoids an IndexError on blocks that have no .emphasis element.
    if any('Beslissing' in text for text in el.css('.emphasis::text').extract()):
        my_value = el.css('p::text').extract()[-1]  # last text node inside a <p> of this block
        break

This is just an example, but I'd go for something similar if the regex selector doesn't cut it.
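The same "collect the blocks and apply your own logic" idea does not depend on Scrapy. Below is a minimal sketch using only the standard library's html.parser, assuming the page HTML is already available as a string (for example from Selenium's driver.page_source). ParagraphCollector and text_after_marker are hypothetical names made up for this illustration. Note that it looks at paragraphs in document order and does not check that both paragraphs sit in the same div.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element, in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraph texts
        self._parts = None    # text pieces of the <p> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            # Join the pieces and collapse runs of whitespace.
            self.paragraphs.append(" ".join("".join(self._parts).split()))
            self._parts = None


def text_after_marker(html, marker="Beslissing"):
    """Return the paragraph following the first one that contains marker, or None."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    paragraphs = collector.paragraphs
    for index, text in enumerate(paragraphs[:-1]):
        if marker in text:
            return paragraphs[index + 1]
    return None
```

With Selenium this could be called as text_after_marker(driver.page_source). The XPath approach is shorter; this variant is mainly useful when the matching rule needs to be more elaborate than XPath comfortably allows.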
