I want to scrape some websites with a headless browser (Chrome driven by Selenium) and need to route the traffic through a proxy server.
I'm a bit lost and would appreciate some help.
When I disable the proxy, it works perfectly every time.
When I disable headless mode, I get an empty browser window, but if I press Enter in the URL bar, which already contains "https://www.whatsmyip.org", the page loads (through the proxy server, showing a different IP).
I get the same result on other websites as well; it is not specific to whatsmyip.org.
I am running CentOS 7, Python 3.6 and Selenium 3.14.0.
I have also tried it on a Windows machine running Anaconda and get the same results.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy, ProxyType
my_proxy = "x.x.x.x:xxxx"  # I have a real proxy address here
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': my_proxy,
    'ftpProxy': my_proxy,
    'sslProxy': my_proxy,
    'noProxy': ''
})
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--allow-insecure-localhost')
chrome_options.add_argument('--allow-running-insecure-content')
chrome_options.add_argument('--ignore-ssl-errors')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--ssl-protocol=any')
chrome_options.add_argument('--window-size=800,600')  # Chrome expects width,height
chrome_options.add_argument('--disable-application-cache')
# Copy the default Chrome capabilities and attach the proxy settings to them
capabilities = dict(DesiredCapabilities.CHROME)
proxy.add_to_capabilities(capabilities)
capabilities['acceptSslCerts'] = True
capabilities['acceptInsecureCerts'] = True
browser = webdriver.Chrome(
    executable_path=r'/home/glen/chromedriver',
    chrome_options=chrome_options,
    desired_capabilities=capabilities,
)
browser.get('https://www.whatsmyip.org/')
print(browser.page_source)
browser.quit()  # quit() also shuts down the chromedriver process
When I run the code, the following is returned:
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>
That is an empty document, not the website's content.
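For comparison, Chrome also accepts a proxy through its `--proxy-server` command-line switch, as an alternative to the capabilities-based setup above. The following is only a minimal sketch of that variant, assuming the Selenium 3.x API used in this post. The helper names `proxy_args` and `make_browser` and the default `http` scheme are illustrative assumptions, and whether this changes the headless behaviour described here is untested.

```python
def proxy_args(proxy_address, scheme="http"):
    """Return Chrome switches that route traffic through a proxy.

    proxy_address is a "host:port" string. This helper and its default
    scheme are hypothetical, for illustration only; they are not part
    of Selenium.
    """
    return ["--proxy-server={}://{}".format(scheme, proxy_address)]


def make_browser(proxy_address, driver_path):
    """Start headless Chrome with the proxy passed as a switch (Selenium 3.x)."""
    # Imported lazily so that proxy_args can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    for arg in proxy_args(proxy_address):
        options.add_argument(arg)
    return webdriver.Chrome(executable_path=driver_path, options=options)


# Show the switch that would be passed to Chrome for the placeholder address.
print(proxy_args("x.x.x.x:xxxx"))
```

Calling `make_browser("x.x.x.x:xxxx", "/home/glen/chromedriver")` with a real proxy address would then replace the `Proxy` object and the `desired_capabilities` argument in the script above.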