I want to get part of a website's dynamically generated HTML content. I can see this content in "Inspect Element" but not in "View Source". I tried the BeautifulSoup and Selenium libraries with no success, because after loading the page I need to press some on-screen buttons to load the content.

For example, on the website http://play.typeracer.com I can load the HTML source, but I can't load the content (tables and text) that shows up after pressing "Practice" on the page.

Hope I was explicit, thanks for your attention

5 Comments
  • Check out the requests-html package. It allows you to render a page before extracting data. Commented Jul 29, 2018 at 13:09
  • When using selenium's webdriver I was able to open firefox and press the key with "driver = webdriver.Firefox()" and " driver.get("website.com")" . But if I load the content after making any key press it gives me an error and crashes the program. I will check it. Commented Jul 29, 2018 at 13:15
  • It is unclear to me what you are actually trying to achieve. Do you want to scrape the content, or do you want to automate/simulate a website user? Please give a full example; otherwise I can't help you. Commented Jul 29, 2018 at 13:55
  • I want to scrape content, for example getting the text that you have to write into a .txt file. Commented Jul 29, 2018 at 14:05
  • By "text that you have to write" I mean the sentences that are part of the TypeRacer game, i.e. the ones that show up when you press "Practice". Commented Jul 29, 2018 at 14:06

1 Answer

Here is a solution using Selenium and Firefox:

  1. Open a browser window and navigate to the URL.
  2. Wait until the "Practice" link appears, then click it.
  3. Extract all span elements that hold part of the text.
  4. Build the output string. If the first word has only one letter, there will be only 2 span elements; if it has more than one letter, there will be 3 span elements.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    
    url = 'http://play.typeracer.com/'
    browser = webdriver.Firefox()
    browser.get(url)
    
    try:  # waiting till link is loaded
        element = WebDriverWait(browser, 30).until(
            EC.presence_of_element_located((By.LINK_TEXT, 'Practice')))
    finally:  # link loaded -> click it
        element.click()
    
    try:  # wait till text is loaded
        WebDriverWait(browser, 30).until(
            EC.presence_of_element_located((By.XPATH, '//span[@unselectable="on"]')))
    finally:  # extract text 
        # find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
        spans = browser.find_elements(By.XPATH, '//span[@unselectable="on"]')
        if len(spans) == 2:  # first word has only one letter
            text = f'{spans[0].text} {spans[1].text}'
        elif len(spans) == 3:  # first word has more than one letter
            text = f'{spans[0].text}{spans[1].text} {spans[2].text}'
        else:
            text = ' '.join([span.text for span in spans])
            print(f'special case that is not handled yet: {text}')
    
    
    print(text)
    # Example output:
    # Scissors cuts paper. Paper covers rock. Rock crushes lizard. Lizard poisons Spock. Spock smashes scissors. Scissors decapitates lizard. Lizard eats paper. Paper disproves Spock. Spock vaporizes rock. And as it always has, rock crushes scissors.
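
The joining rule from step 4 can also be isolated into a small function that takes plain strings, which makes it easy to check without a browser. This is only a sketch: `join_span_texts` is a hypothetical name, and the sample inputs assume the page splits the text into the first letter, the rest of the first word, and the remaining text, as the span counts above suggest.

```python
def join_span_texts(parts):
    """Rebuild the sentence from the texts of the span elements.

    Assumption: 2 parts = one-letter first word + rest of the text,
    3 parts = first letter + rest of first word + rest of the text.
    """
    if len(parts) == 2:
        return f'{parts[0]} {parts[1]}'
    if len(parts) == 3:
        return f'{parts[0]}{parts[1]} {parts[2]}'
    # Unhandled layout: fall back to joining everything with spaces
    return ' '.join(parts)


# Hypothetical sample inputs, not scraped from the site
print(join_span_texts(['A', 'quick test.']))             # A quick test.
print(join_span_texts(['S', 'cissors', 'cuts paper.']))  # Scissors cuts paper.
```

With Selenium, it could be called as `join_span_texts([span.text for span in spans])`.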
    

Update

Just in case you also want to automate the typing afterwards ;)

    try:
        txt_input = WebDriverWait(browser, 30).until(
            EC.presence_of_element_located((By.XPATH,
                '//input[@class="txtInput" and @autocorrect="off"]')))
    finally:
        for letter in text:
            txt_input.send_keys(letter)
    

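The try: ... finally: ... blocks wrap the WebDriverWait calls, which are needed because we have to wait until the content is loaded, and that can sometimes take quite a while. Keep in mind that the finally part runs even if the wait times out; in that case the code inside it fails because the element was never found.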
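
To make the control flow easier to see, here is a minimal sketch that does not need Selenium or a browser. `wait_until` is a hypothetical helper that only mimics the polling idea behind WebDriverWait (it is not Selenium's implementation), and `act_after_wait` is likewise an illustrative name. It shows that `finally` runs whether or not the wait succeeds, and that `except`/`else` is one way to separate the two cases.

```python
import time


def wait_until(condition, timeout=1.0, poll=0.05):
    """Hypothetical helper: call condition() repeatedly until it returns
    something truthy, or raise TimeoutError once the timeout has elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met in time')
        time.sleep(poll)


def act_after_wait(condition, timeout=0.2):
    """Return a log of which blocks ran, to make the control flow visible."""
    log = []
    try:
        value = wait_until(condition, timeout=timeout)
    except TimeoutError:
        log.append('except: wait timed out')  # only on failure
    else:
        log.append(f'else: got {value}')      # only on success
    finally:
        log.append('finally: always runs')    # in both cases
    return log


print(act_after_wait(lambda: 'ok'))
print(act_after_wait(lambda: None, timeout=0.1))
```

The first call logs the `else` and `finally` entries; the second logs the `except` and `finally` entries. In the Selenium code, the same shape would catch the timeout exception raised by the wait and put the click or extraction in the `else` part.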

4 Comments

Thanks, amazing answer! Can you explain why searching for an element requires "//" at the start, and what exactly WebDriverWait does? Shouldn't it wait automatically, so that the try/finally is not required?
// is part of the XPath syntax. WebDriverWait waits up to 30 s in our case until the element is found. try/finally makes sure that the code within finally is executed after the try block, whether or not the try block raised an exception.
My code is not meant to be perfect, just a quick and dirty solution that should give you enough to understand how it works ;)
Yes, it helped me understand the webdriver. Thanks!
