driver.page_source doesn't return all of the source code. It prints some parts of the page in detail, but a large part of the code is missing. How can I fix this?

This is my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

def htmlToLuna():
    url = 'https://codefights.com/tournaments/Xph7eTJQssbXjDLzP/A'
    driver = webdriver.Chrome('C:\\Python27\\chromedriver\\chromedriver.exe')
    driver.get(url)
    web = open('web.txt', 'w')
    web.write(driver.page_source)
    print driver.page_source
    web.close()

print htmlToLuna()
I opened the URL in your question. Do you see that the spinner on the page keeps spinning even after the page has loaded? WebDriver won't wait unless you tell it to. Commented Sep 2, 2017 at 4:33

1 Answer

Here is a simple script: it opens the URL, prints the length of the page source, waits five seconds, and then prints the length of the page source again.

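import time

from selenium import webdriver

if __name__ == "__main__":
    browser = webdriver.Chrome()
    browser.get("https://codefights.com/tournaments/Xph7eTJQssbXjDLzP/A")
    # Length of the source right after get() returns
    initial = len(browser.page_source)
    print(initial)
    time.sleep(5)  # blind wait while the page keeps loading
    new_source = browser.page_source
    print(len(new_source))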

The output is 15722 followed by 48800.

Do you see that the length of the page source grows after the wait? You must make sure the page is fully loaded before reading the source. A fixed sleep is not a proper implementation, though, because it waits blindly.
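If you don't know which element to wait for, one option is to poll until the length of the page source stops changing. This is only a rough sketch of that idea: the helper name wait_for_stable_source and its parameters are my own (hypothetical, not part of Selenium), it needs Python 3 for time.monotonic, and a stable length is a heuristic, not proof that the page has finished loading.

```python
import time


def wait_for_stable_source(get_source, timeout=10.0, poll=0.5, stable_polls=3):
    """Poll get_source() until its length is unchanged for `stable_polls`
    consecutive polls, then return the last source seen.

    Raises RuntimeError if the source has not settled within `timeout` seconds.
    get_source is any zero-argument callable returning the page source.
    """
    deadline = time.monotonic() + timeout
    last = get_source()
    unchanged = 0
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = get_source()
        if len(current) == len(last):
            unchanged += 1
            if unchanged >= stable_polls:
                return current
        else:
            # The page is still growing or shrinking, so start counting again.
            unchanged = 0
        last = current
    raise RuntimeError("page source did not settle within %s seconds" % timeout)
```

With the browser object from the script above, it could be called as wait_for_stable_source(lambda: browser.page_source). It still polls on a timer, but it stops as soon as the source settles instead of always sleeping for the full five seconds.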

A better way is an explicit wait: the browser waits until an element of your choice is found. The timeout is set to 10 seconds.

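from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

if __name__ == "__main__":
    browser = webdriver.Chrome()
    browser.get("https://codefights.com/tournaments/Xph7eTJQssbXjDLzP/A")
    try:
        # Wait up to 10 seconds for the editor's textarea to be present in the DOM
        WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.CodeMirror > div:nth-child(1) > textarea:nth-child(1)')))
        print("Result:")
        print(len(browser.page_source))
    except TimeoutException:
        print("Your exception message here!")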

The output: Result: 52195

Reference:

https://stackoverflow.com/a/26567563/7642415

http://selenium-python.readthedocs.io/locating-elements.html

Hold on! Even that does not guarantee that you get the full page source, because individual elements are loaded dynamically. As soon as the browser finds the element, it moves on. So pick an element that only appears once the page has fully loaded.
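If no single element is a good signal, you could wait for several at once. WebDriverWait.until accepts any callable that takes the driver, so a small custom condition works. A minimal sketch, assuming you know which elements matter: the helper name all_present is my own (hypothetical), and the second selector in the usage comment is a placeholder, not something taken from the actual page.

```python
def all_present(*locators):
    """Build a wait condition that is true only when every locator
    matches at least one element.

    Each locator is a (by, value) tuple, e.g. (By.CSS_SELECTOR, '.CodeMirror').
    """
    def condition(driver):
        for by, value in locators:
            # find_elements returns an empty list when nothing matches
            if not driver.find_elements(by, value):
                return False
        return True
    return condition


# Usage with Selenium (selectors are placeholders for illustration):
# WebDriverWait(browser, 10).until(all_present(
#     (By.CSS_SELECTOR, '.CodeMirror'),
#     (By.CSS_SELECTOR, '.some-late-loading-element'),
# ))
```

This only checks that the elements are present in the DOM, so choose elements that are rendered late in the page's loading sequence.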

P.S. I am using Python 3, and the webdriver is on my PATH environment variable, so my code needs small changes to work on Python 2.x. I believe only the print statements need to be modified.

12 Comments

Thank you. Does Selenium have an option to open a new tab instead of opening the browser again?
@AvoAsatryan Yes, it does. Google gives you thousands of answers; here is one: stackoverflow.com/a/28432939/7642415. If you are working with a new URL and won't go back to the previous one, you can call the webdriver.get method again without closing the browser.
Yes, that does the same. It won't close the browser. All it does is send Ctrl + T to the currently open browser, which opens a new tab.
No, I meant something else. For example, I have opened Chrome and then I run my script. I want Selenium to open a new tab in that already open session, the one that was running before the script started.
To my knowledge, no. Try PhantomJS instead. It lets you open a browser virtually. phantomjs.org