I want to extract some dates from Dell's website for my own devices. I tried to download the pages using urllib, but the site is protected by a captcha that I can't bypass for now. So I am using Selenium to open a browser, solve the captcha manually, and then automatically open the pages and extract the dates. The problem is that the CSS selector returns some weird element objects instead of the desired output.

My code:

from selenium import webdriver
import time
driver = webdriver.Chrome()


def scrape(codes):
    dates = []
    for i in range(len(codes)):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % codes[i])

        # Solve captcha manually (only needed on the first page)
        if i == 0:
            print("You now have 120\" seconds to solve the captcha")
            time.sleep(120)
            print("120\" Passed")
        # Extract data
        expdate = driver.find_element_by_css_selector("#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        print(expdate)
    driver.close()

codes = ['1FMR762', '15FDBG2', '10V8YZ1']
scrape(codes)

Expected output:

June 22, 2018
October 15, 2017
April 19, 2017

Given output:

<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.21873872382745052-1")>
<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.06836824093097027-1")>
<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.6642161898702734-1")>

1 Answer

According to the API documentation, the find_element_by_css_selector function returns a WebElement object, not a string. See https://selenium-python.readthedocs.io/api.html.

The web element's content needs to be converted into a string before printing, as explained in "Python and how to get text from Selenium element WebElement object?".

So changing your line print(expdate) to print(expdate.text) should help.
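As a minimal sketch of that change, the helper below reads the text of an element instead of printing the object itself. The name element_text is hypothetical; it only assumes that the object passed in has a text attribute and a get_attribute method, as Selenium's WebElement does. The fallback to the innerText attribute is an assumption for pages where the cell is not visibly rendered, in which case text may come back empty.

```python
def element_text(element):
    """Return the text of a Selenium WebElement-like object.

    `element` is assumed to expose `.text` and `.get_attribute(name)`,
    as selenium's WebElement does.
    """
    # .text only returns text that is visibly rendered on the page.
    visible = element.text.strip()
    if visible:
        return visible
    # Fallback (assumption): hidden elements may still expose innerText.
    hidden = element.get_attribute("innerText")
    return hidden.strip() if hidden else ""
```

In the loop from the question this would be used as print(element_text(expdate)) in place of print(expdate).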

8 Comments

I changed the line to print(expdate.get_attribute('innerText')) because the text is hidden.
dell.com/support/home/yu/en/yubsdt1/product-support/servicetag/… The problem is that on this page I need to extract the date from the row that contains "Onsite Service After Remote Diagnosis". Is there a way to check for that?
What does your current program output? Is it already at the right table column?
For the link that I posted, it outputs "October 15, 2017", but it should output "October 15, 2019". It seems to be in the right column but not in the right row.
Have you tried changing your selector to ... > table > tbody > tr:nth-child(2) > ...? An explanation can be found at stackoverflow.com/questions/4494708/…
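If the row position varies between service tags, a fixed tr:nth-child(n) may not always land on the wanted row. One possible alternative, sketched here with only the standard library, is to pick the row by its label text. The names RowCollector and expiry_for_service are hypothetical, and the SAMPLE markup (including the "Basic Service" label and the start dates) is invented for illustration and not copied from Dell's page; only the two expiry dates come from the comments above.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def expiry_for_service(table_html, service_name, date_column=2):
    """Return the cell at date_column from the first row mentioning service_name."""
    parser = RowCollector()
    parser.feed(table_html)
    for row in parser.rows:
        if len(row) > date_column and any(service_name in cell for cell in row):
            return row[date_column]
    return None


# Hypothetical markup for illustration only, NOT Dell's real page structure.
SAMPLE = """
<table><tbody>
<tr><td>Basic Service</td><td>October 16, 2014</td><td>October 15, 2017</td></tr>
<tr><td>Onsite Service After Remote Diagnosis</td><td>October 16, 2014</td><td>October 15, 2019</td></tr>
</tbody></table>
"""

print(expiry_for_service(SAMPLE, "Onsite Service After Remote Diagnosis"))
```

With Selenium, the table markup could presumably be obtained from the table element via get_attribute("outerHTML") and passed in as table_html; the default date_column=2 mirrors td:nth-child(3) from the question's selector.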