Extract data from web page using CSS selector - Selenium Python

Question

I want to extract some dates for my devices from Dell's website. I tried to download the pages with urllib, but the site is protected by a captcha that I can't bypass for now. I am now using Selenium to open a browser, solve the captcha manually, and then open the pages and extract the dates automatically. The problem is that the CSS selector returns some weird elements instead of the desired output.

My code:

from selenium import webdriver
import time
driver = webdriver.Chrome()


def scrape(codes):
    dates = []
    for i in range(len(codes)):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % codes[i])

        # Solve captcha manually
        if i == 0:
            print("You now have 120\" seconds to solve the captcha")
            time.sleep(120)
            print("120\" Passed")
        # Extract data
        expdate = driver.find_element_by_css_selector("#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        print(expdate)
    driver.close()

codes = ['1FMR762', '15FDBG2', '10V8YZ1']
scrape(codes)

Expected output:

June 22, 2018
October 15, 2017
April 19, 2017

Given output:

<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.21873872382745052-1")>
<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.06836824093097027-1")>
<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.6642161898702734-1")>

Heiko Becker · Accepted Answer · 2018-09-20 08:39:10Z


According to the API documentation, find_element_by_css_selector returns a WebElement object, not a string. See https://selenium-python.readthedocs.io/api.html.

The web element's content needs to be converted into a string before printing, as explained in "Python and how to get text from Selenium element WebElement object?".

So it should help to change your line print(expdate) to print(expdate.text).
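A minimal sketch of the loop with that change applied, collecting the strings instead of printing WebElement objects. It assumes an already-created driver whose captcha has been solved, so the 120-second wait from the question is left out. The name scrape_dates is hypothetical. It uses find_element(...) with a locator strategy because newer Selenium 4 releases removed the find_element_by_* helpers; the string "css selector" is the value of By.CSS_SELECTOR.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"

URL = ("https://www.dell.com/support/home/us/en/19/product-support/"
       "servicetag/%s/warranty?ref=captchasuccess")
SELECTOR = ("#printdivid > div > div.not-annotated.hover > "
            "table:nth-child(3) > tbody > tr > td:nth-child(3)")


def scrape_dates(driver, codes, selector=SELECTOR):
    """Return the text of the selected table cell for each service tag."""
    dates = []
    for code in codes:
        driver.get(URL % code)
        cell = driver.find_element(CSS, selector)  # a WebElement, not a string
        dates.append(cell.text)                    # .text holds the visible text
    return dates
```

With a real browser this would be called as scrape_dates(webdriver.Chrome(), ['1FMR762', '15FDBG2', '10V8YZ1']) once the captcha has been handled.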


8 Comments

kubectlgetpods Over a year ago

I changed the line to print(expdate.get_attribute('innerText')) because the text is hidden.
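The difference matters because WebElement.text only returns text that is displayed, so it comes back empty for a hidden element. One way to handle both cases is a small helper that tries .text first and falls back to the innerText attribute. This is a sketch, and element_text is a hypothetical name, not part of Selenium.

```python
def element_text(element):
    """Return an element's text, using innerText when .text is empty."""
    text = element.text
    if not text:
        # .text is empty when the element is not displayed,
        # so read the innerText attribute from the DOM instead.
        text = element.get_attribute("innerText") or ""
    return text.strip()
```

In the question's loop this would replace print(expdate) with print(element_text(expdate)).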

kubectlgetpods Over a year ago

dell.com/support/home/yu/en/yubsdt1/product-support/servicetag/… The problem is on this page: I have to extract the date from the row that contains "Onsite Service After Remote Diagnosis". Is there a way to check for that?

Heiko Becker Over a year ago

What does your current program output? Is it already at the right table column?

kubectlgetpods Over a year ago

For the link that I posted it outputs "October 15, 2017", but it should output "October 15, 2019". It seems to be in the right column but not in the right row.

Heiko Becker Over a year ago

Have you tried changing your selector to ... > table > tbody > tr:nth-child(2) > ...? An explanation can be found at stackoverflow.com/questions/4494708/…
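A fixed tr:nth-child(2) only works if the wanted service is always in the second row. A rough alternative is to walk the rows and pick the one whose cells mention the service name. The sketch below assumes each service is a tr with td cells and that the date sits in the third cell; date_for_service, table_selector, and date_column are hypothetical names, and the page's real structure may differ. The string "css selector" is the value of By.CSS_SELECTOR.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def date_for_service(driver, table_selector, label, date_column=3):
    """Return the date cell of the first row mentioning label, else None."""
    rows = driver.find_elements(CSS, table_selector + " > tbody > tr")
    for row in rows:
        cells = row.find_elements(CSS, "td")
        # innerText is used because the cell text may be hidden.
        texts = [(c.get_attribute("innerText") or "").strip() for c in cells]
        if len(texts) >= date_column and any(label in t for t in texts):
            return texts[date_column - 1]  # date_column is 1-based
    return None
```

It would be called with the table part of the question's selector, for example date_for_service(driver, "#printdivid > div > div.not-annotated.hover > table:nth-child(3)", "Onsite Service After Remote Diagnosis").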
