Python: Selenium & PhantomJS

Question

I am trying to scrape the following website: https://www.linkedin.com/jobs/search/?keywords=coach%20&location=United%20States&locationId=us%3A0

The text I want to get is:

Showing 114,877 results

The HTML code:

<div class="jobs-search-results__count-sort pt3">
            <div class="jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4">
                Showing 114,877 results
            </div>

My Python code is:

from bs4 import BeautifulSoup
from selenium import webdriver

index_url = 'https://www.linkedin.com/jobs/search/?keywords=coach%20&location=United%20States&locationId=us%3A0'

# JavaScript snippet executed in the page after it loads
java = '!function(i,n){void 0!==i.addEventListener&&void 0!==i.hidden&&(n.liVisibilityChangeListener=function(){i.hidden&&(n.liHasWindowHidden=!0)},i.addEventListener("visibilitychange",n.liVisibilityChangeListener))}(document,window);'
browser = webdriver.PhantomJS()
browser.get(index_url)
browser.execute_script(java)
soup = BeautifulSoup(browser.page_source, "html.parser")
# Full class string of the div that holds the result count
link = "jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4"
div = soup.find('div', {"class":link})
text = div.text

So far my code is not working. I think it has something to do with the execution of the JavaScript.

I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-33-7cdc1c4e0894> in <module>()
      6 link = "jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4"
      7 div = soup.find('div', {"class":link})
----> 8 text = div.text

AttributeError: 'NoneType' object has no attribute 'text'

soup output:

<html><head>\n<script type="text/javascript">\nwindow.onload = function() {\n  // Parse the tracking code from cookies.\n  var trk = "bf";\n  var trkInfo = "bf";\n  var cookies = document.cookie.split("; ");\n  for (var i = 0; i < cookies.length; ++i) {\n    if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {\n      trk = cookies[i].substring(8);\n    }\n    else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) {\n      trkInfo = cookies[i].substring(8);\n    }\n  }\n\n  if (window.location.protocol == "http:") {\n    // If "sl" cookie is set, redirect to https.\n    for (var i = 0; i < cookies.length; ++i) {\n      if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {\n        window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);\n        return;\n      }\n    }\n  }\n\n  // Get the new domain. For international domains such as\n  // fr.linkedin.com, we convert it to www.linkedin.com\n  var domain = "www.linkedin.com";\n  if (domain != location.host) {\n    var subdomainIndex = location.host.indexOf(".linkedin");\n    if (subdomainIndex != -1) {\n      domain = "www" + location.host.substring(subdomainIndex);\n    }\n  }\n\n  window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo +\n      "&originalReferer=" + document.referrer.substr(0, 200) +\n      "&sessionRedirect=" + encodeURIComponent(window.location.href);\n}\n</script>\n</head><body></body></html>
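
The soup output above is not the search results page: its only content is a script that redirects the browser to an "/authwall" URL, which is why the target div is missing. The following is a minimal sketch of how one might detect that case before parsing, assuming the redirect script contains the "/authwall" path as it does in the output above. The helper name looks_like_authwall and the shortened sample string are hypothetical, for illustration only.

```python
def looks_like_authwall(page_source):
    """Return True if the page source appears to be the login-wall redirect page.

    Assumption: the redirect script mentions the "/authwall" path,
    as in the soup output shown in the question.
    """
    return "/authwall" in page_source


# Shortened sample modeled on the soup output in the question
sample = '<html><head><script>window.location.href = "https://" + domain + "/authwall?trk=" + trk;</script></head><body></body></html>'

if looks_like_authwall(sample):
    print("Got the login-wall redirect page instead of the job results.")
```

In the scraping code, the same check could be applied to browser.page_source before handing it to BeautifulSoup.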

Curiously enough, when accessing the page using the Chrome webdriver, the text in question is inside div = soup.find('div', {"class":"result-context"}). PhantomJS may be running into a login modal dialog.
– Vinícius Figueiredo, Commented Aug 2, 2017 at 4:18

ksai · Accepted Answer · 2017-08-02 07:50:41Z

My solution uses webdriver.Chrome, because I have never used PhantomJS. There are two cases to consider when getting the results text: either you are logged in to LinkedIn in the driver instance, or you are not.

Suppose you are not logged in. Then the following code will do the job:

from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
url = 'https://www.linkedin.com/jobs/search/?keywords=coach%20&location=United%20States&locationId=us%3A0'
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
text = soup.find('div',{'class':'results-context'}).text
print(text)
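
Note that soup.find returns None when no element matches, so calling .text directly raises the AttributeError shown in the question whenever a different page is served. A minimal sketch of a guard, assuming the argument is either None or a BeautifulSoup tag; text_or_none is a hypothetical helper name, not part of BeautifulSoup.

```python
def text_or_none(node):
    """Return the stripped text of a matched tag, or None if nothing matched.

    `node` is expected to be the result of soup.find(...): either a
    BeautifulSoup tag (which provides get_text) or None.
    """
    if node is None:
        return None
    return node.get_text(strip=True)


# Intended usage with the code above (not executed here):
# text = text_or_none(soup.find('div', {'class': 'results-context'}))
# if text is None:
#     print('Result count element not found; check which page was served.')
```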

Now suppose you are logged in:

from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
url = 'https://www.linkedin.com/jobs/search/?keywords=coach%20&location=United%20States&locationId=us%3A0'
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')

# "class" is a reserved word in Python, so use a different variable name
class_name = 'jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4'
text = soup.find('div', {'class': class_name}).text.split('\n')[1].lstrip()
print(text)
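
If the number itself is needed rather than the whole string, it can be pulled out of the text afterwards. A minimal sketch, assuming the text has the form "Showing 114,877 results" with comma thousands separators as in the question; parse_result_count is a hypothetical helper name.

```python
import re


def parse_result_count(text):
    """Extract the first comma-grouped number from text as an int.

    Returns None when the text contains no digits.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


count = parse_result_count("Showing 114,877 results")
print(count)
```

Because the regular expression searches anywhere in the string, surrounding whitespace and newlines in the div text do not need to be stripped first.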
