I want to scrape the text "Feb 2016 - Present", which sits in the span right after <span class="visually-hidden">Dates Employed</span>, but I can't seem to get to it. Here's the HTML:

<div class="pv-entity__summary-info">

<h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>

<h4>
  <span class="visually-hidden">Company Name</span>
  <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
</h4>


  <div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%">
      <span class="visually-hidden">Dates Employed</span>
      <span>Feb 2016 – Present</span>
    </h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
        <span class="visually-hidden">Employment Duration</span>
        <span class="pv-entity__bullet-item">1 yr 2 mos</span>
      </h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
      <span class="visually-hidden">Location</span>
      <span class="pv-entity__bullet-item">London, United Kingdom</span>
    </h4></div>

</div>

And here is what I'm currently doing with Selenium:

        date= browser.find_element_by_xpath('.//div[@class = "pv-entity__duration de Sans-15px-black-55% ml0"]').text
        print date

But this gives no results. How would I go about pulling the date?

  • Which text are you trying to extract: the Feb 2016 - Present one or 1 yr 2 mos? Commented Mar 27, 2017 at 14:21
  • Updated original message. Feb 2016 - Present is what I'm trying to scrape Commented Mar 27, 2017 at 14:33

2 Answers

There is no div with class="pv-entity__duration de Sans-15px-black-55% ml0"; that class is on an h4, so your XPath matches nothing. If you want the text of the whole div, try:

date = browser.find_element_by_xpath('.//div[@class = "pv-entity__position-info detail-facet m0"]').text
print(date)

If you want only "Feb 2016 - Present", select the second span inside the date-range h4:

date = browser.find_element_by_xpath('//h4[@class="pv-entity__date-range Sans-15px-black-55%"]/span[2]').text
print(date)
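
To check that XPath without opening a browser, here is a minimal sketch that runs it against a trimmed copy of the markup from the question. It uses Python's standard-library xml.etree.ElementTree as a stand-in for Selenium, which is an assumption made for illustration: ElementTree supports only a subset of XPath and needs well-formed markup, so treat this as a quick offline check rather than a replacement for the Selenium call.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question (well-formed, so ElementTree can parse it).
snippet = """
<div class="pv-entity__position-info detail-facet m0">
  <h4 class="pv-entity__date-range Sans-15px-black-55%">
    <span class="visually-hidden">Dates Employed</span>
    <span>Feb 2016 – Present</span>
  </h4>
  <h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
    <span class="visually-hidden">Employment Duration</span>
    <span class="pv-entity__bullet-item">1 yr 2 mos</span>
  </h4>
</div>
"""

root = ET.fromstring(snippet.strip())

# Same idea as the Selenium XPath: find the h4 (not a div) carrying the
# date-range class string, then take its second span child.
date_span = root.find('.//h4[@class="pv-entity__date-range Sans-15px-black-55%"]/span[2]')
date_text = date_span.text.strip()
print(date_text)
```

The first span holds the hidden label "Dates Employed", which is why the second span is the one to read.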


You can rewrite your XPath along these lines, shown here with lxml on the HTML from the question:

# -*- coding: utf-8 -*-
from lxml import html


html_str = """
<div class="pv-entity__summary-info">

<h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>

<h4>
  <span class="visually-hidden">Company Name</span>
  <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
</h4>


  <div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%">
      <span class="visually-hidden">Dates Employed</span>
      <span>Feb 2016 – Present</span>
    </h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
        <span class="visually-hidden">Employment Duration</span>
        <span class="pv-entity__bullet-item">1 yr 2 mos</span>
      </h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
      <span class="visually-hidden">Location</span>
      <span class="pv-entity__bullet-item">London, United Kingdom</span>
    </h4></div>

</div>
"""

root = html.fromstring(html_str)
# Fetch "Feb 2016 – Present": text() yields one string per span, and index 1 is the second span
txt = root.xpath('//h4[@class="pv-entity__date-range Sans-15px-black-55%"]/span/text()')[1]
# Fetch "1 yr 2 mos" the same way
txt1 = root.xpath('//h4[@class="pv-entity__duration de Sans-15px-black-55% ml0"]/span/text()')[1]
print(txt)
print(txt1)

This will result in:

Feb 2016 – Present
1 yr 2 mos
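
The XPaths above compare the entire class attribute, so they would stop matching if any of the styling classes on the h4 changed. One possible alternative is to look only for the pv-entity__date-range token. The sketch below does this with the standard-library ElementTree instead of lxml, which is an assumption made to keep it dependency-free, and find_by_class_token is a hypothetical helper written for this example, not part of any library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question.
snippet = """
<div class="pv-entity__position-info detail-facet m0">
  <h4 class="pv-entity__date-range Sans-15px-black-55%">
    <span class="visually-hidden">Dates Employed</span>
    <span>Feb 2016 – Present</span>
  </h4>
  <h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
    <span class="visually-hidden">Employment Duration</span>
    <span class="pv-entity__bullet-item">1 yr 2 mos</span>
  </h4>
</div>
"""


def find_by_class_token(root, tag, token):
    """Hypothetical helper: first <tag> whose class list contains `token` as a whole word."""
    for el in root.iter(tag):
        if token in el.get("class", "").split():
            return el
    return None


root = ET.fromstring(snippet.strip())
h4 = find_by_class_token(root, "h4", "pv-entity__date-range")

# Keep only the spans that are not marked "visually-hidden".
visible = [
    span.text.strip()
    for span in h4.findall("span")
    if "visually-hidden" not in span.get("class", "").split()
]
print(visible[0])
```

This also avoids relying on the position of the span, at the cost of a little more Python.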
