I am trying to scrape a few values from a LinkedIn profile page. I can get the name, education, and headline, but the code I added for the profile picture and summary returns nothing.

Any hints would be much appreciated.

def getLinkedinData(self):
    result = {}
    driver = webdriver.PhantomJS('/usr/local/bin/phantomjs' ,service_args=service_args)
    driver.set_window_size(1124, 850)
    google_news_trends = []
    driver.get("https://www.linkedin.com/in/joymerrillsti")
    driver.page_source.encode("utf-8")
    try:
        print driver.find_element_by_class_name('full-name').text#
    except:
        pass
    #This does not give link to profile picture
    try:
        img = driver.find_element_by_class_name('profile-picture')
        for s in img:
            print s
            print s.find_element_by_tag_name('img').get_attribute('src')
    except:
        pass

    try:
        head = driver.find_element_by_id('headline-container')
        print head.text
        for s in head:
            print s.find_element_by_tag_name('p').text
    except:
        pass

    try:
        location = driver.find_element_by_id('location-container')
        for s in location:
            print s.find_element_by_tag_name('a').text
    except:
        pass
    #This does not give summary
    try:
        summary = driver.find_element_by_id('summary-item')
        for s in summary:
            print s.text
            print s.find_element_by_tag_name('p').text
    except:
        pass
    #This is fine, but is there any way to get only value for Education
    try:
        ed = driver.find_element_by_id('overview-summary-education') #Here how to get only education value?
        print ed.text
    except:
        pass

2 Answers

I would first evaluate the LinkedIn API and see whether it can provide the information you need.

If you insist on web-scraping the page, I think you are missing only one thing here - an Explicit Wait to wait for the page to load:

from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.PhantomJS('/usr/local/bin/phantomjs')
driver.get("https://www.linkedin.com/in/joymerrillsti")
driver.set_window_size(1124, 850)

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "full-name")))

print driver.find_element_by_class_name('full-name').text
print driver.find_element_by_css_selector('div.profile-picture img').get_attribute('src')
print driver.find_element_by_id('headline-container').text
print driver.find_element_by_id('location-container').text
print driver.find_element_by_id('summary-item').text
print driver.find_element_by_id('overview-summary-education').text
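
For readers unfamiliar with explicit waits, the idea is simply to poll a condition until it succeeds or a timeout expires. Below is a rough, library-free sketch of that idea. `wait_until` is a hypothetical helper written for illustration, not part of Selenium, and `LookupError` stands in for the "element not found yet" error. The real `WebDriverWait` has its own exception types and polling defaults.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Hypothetical helper: a simplified picture of what an explicit wait does.
    """
    deadline = time.time() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except LookupError:
            # Stand-in for "element is not in the page yet": keep polling.
            pass
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Usage: a condition that only becomes truthy on the third call.
attempts = []

def page_ready():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None

result = wait_until(page_ready, timeout=2.0, poll_interval=0.01)
```

The point is that a lookup made immediately after `driver.get(...)` can run before the element exists, while a polled lookup gives the page time to finish rendering.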

I've also cleaned things up a bit.
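
Part of that cleanup is dropping the `for s in ...` loops from the question. `find_element_by_*` returns a single element, which is not iterable, so each loop raises a `TypeError` that the bare `except: pass` then hides. The sketch below reproduces this without a browser. `FakeElement` is a hypothetical stand-in for a Selenium element, and both helper functions are illustrative names only.

```python
class FakeElement(object):
    """Hypothetical stand-in for a single Selenium element (not iterable)."""
    text = "example text"


def collect_silently(element):
    """Mirrors the question's pattern: the loop fails and the error is hidden."""
    collected = []
    try:
        for child in element:  # raises TypeError: object is not iterable
            collected.append(child.text)
    except:
        pass
    return collected


def collect_loudly(element):
    """Same loop, but the error is reported instead of swallowed."""
    try:
        return [child.text for child in element]
    except TypeError as exc:
        return "TypeError: %s" % exc


silent = collect_silently(FakeElement())  # empty list, no hint of what went wrong
loud = collect_loudly(FakeElement())      # message naming the real problem
```

Either read `.text` or call `get_attribute` on the single element directly, as in the code above, or use the plural `find_elements_by_*` methods when a list is actually wanted. Catching a specific exception instead of using a bare `except` would have surfaced the error immediately.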

You can find the image this way:

Note: You can find attributes of an element with the get_attribute function.

# 'profile-picture>a>img' is a CSS selector, not a class name, so use the CSS lookup
img = driver.find_element_by_css_selector('.profile-picture > a > img').get_attribute("src")

You can find the summary this way:

summary = driver.find_element_by_class_name('description').text
