Getting different attribute value using selenium

Question

I am trying to scrape a few values from a LinkedIn profile page. I can get the name, education and headline, and I added code for the profile picture and summary, but I cannot get those.

Any hints would be much appreciated.

def getLinkedinData(self):
    result = {}
    driver = webdriver.PhantomJS('/usr/local/bin/phantomjs', service_args=service_args)
    driver.set_window_size(1124, 850)
    google_news_trends = []
    driver.get("https://www.linkedin.com/in/joymerrillsti")
    driver.page_source.encode("utf-8")
    try:
        print driver.find_element_by_class_name('full-name').text
    except:
        pass
    # This does not give the link to the profile picture
    try:
        img = driver.find_element_by_class_name('profile-picture')
        for s in img:
            print s
            print s.find_element_by_tag_name('img').get_attribute('src')
    except:
        pass

    try:
        head = driver.find_element_by_id('headline-container')
        print head.text
        for s in head:
            print s.find_element_by_tag_name('p').text
    except:
        pass

    try:
        location = driver.find_element_by_id('location-container')
        for s in location:
            print s.find_element_by_tag_name('a').text
    except:
        pass
    # This does not give the summary
    try:
        summary = driver.find_element_by_id('summary-item')
        for s in summary:
            print s.text
            print s.find_element_by_tag_name('p').text
    except:
        pass
    # This is fine, but is there any way to get only the value for Education?
    try:
        ed = driver.find_element_by_id('overview-summary-education')  # How do I get only the education value here?
        print ed.text
    except:
        pass

alecxe · Accepted Answer · 2015-07-22 13:47:51Z

I would first evaluate the LinkedIn API and see whether it can provide the information you need.

If you insist on scraping the page, I think you are missing only one thing here: an explicit wait for the page to load.

from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.PhantomJS('/usr/local/bin/phantomjs')
driver.get("https://www.linkedin.com/in/joymerrillsti")
driver.set_window_size(1124, 850)

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "full-name")))

print driver.find_element_by_class_name('full-name').text
print driver.find_element_by_css_selector('div.profile-picture img').get_attribute('src')
print driver.find_element_by_id('headline-container').text
print driver.find_element_by_id('location-container').text
print driver.find_element_by_id('summary-item').text
print driver.find_element_by_id('overview-summary-education').text

I've also cleaned things up a bit.

omri_saadon · Accepted Answer · 2015-07-22 14:08:06Z

You can find the image like this:

Note: you can read an element's attributes with the get_attribute() method.

img = driver.find_element_by_css_selector('.profile-picture > a > img').get_attribute("src")

You can find the summary like this:

summary = driver.find_element_by_class_name('description').text
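A related point about the question's code: find_element_by_* returns a single WebElement, not a list, so loops such as "for s in img:" raise a TypeError, and the bare "except: pass" silently swallows it; this may be why the picture and summary never print. find_elements (plural) returns a list that is simply empty when nothing matches. The helpers below are a minimal sketch of that pattern; first_text and first_attribute are hypothetical names, not Selenium API, and they rely only on the driver's find_elements(by, value) method.

```python
def first_text(driver, by, value, default=None):
    """Return the .text of the first element matching (by, value), or `default`.

    find_elements() returns an empty list when nothing matches, so no
    try/except is needed to handle a missing element.
    """
    matches = driver.find_elements(by, value)
    return matches[0].text if matches else default


def first_attribute(driver, by, value, name, default=None):
    """Return attribute `name` of the first match (None if the attribute is unset),
    or `default` if no element matches."""
    matches = driver.find_elements(by, value)
    return matches[0].get_attribute(name) if matches else default
```

With By imported from selenium.webdriver.common.by, calls would look like first_text(driver, By.CLASS_NAME, "description") or first_attribute(driver, By.CSS_SELECTOR, ".profile-picture > a > img", "src"). Unlike the bare except, this only covers the "element not found" case, so genuine bugs such as a TypeError still surface.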

edited Jul 22, 2015 at 14:08

answered Jul 22, 2015 at 13:50
