5

I'm wondering how to imitate, in Python, the way a browser such as Chrome detects a website's protocol. For example, when we type "stackoverflow.com" in the address bar and press Enter, the browser automatically detects the protocol and changes the URL to "https://stackoverflow.com". How can we do the same in Python, something like this:

url = "stackoverflow.com"
browser = Browser(url)  # Browser is a hypothetical class that fetches a site's content from a URL, reports its protocol, ...
print(browser.protocol)

https

Is there any library or package that helps with this? Thanks a lot.

Edit: My question is different from the others, which ask how to redirect to https when we enter http. As I mentioned, can we detect the protocol automatically up front, without supplying a dummy protocol?


3 Answers

16

It works for stackoverflow.com because when you first visit it on port 80 (the http port), Stack Overflow's servers respond with a redirect telling the browser that the URL has permanently moved to https.

To detect the same in Python, use the requests library, like this:

>>> import requests
>>> r = requests.get('http://stackoverflow.com') # first we try http
>>> r.url # check the actual URL for the site
'https://stackoverflow.com/'

To find out how the URL changed, look at the history object, and you will see a 301 response, which means the URI has moved permanently to a new address.

>>> r.history[0]
<Response [301]>
>>> r.history[0].url # this is the original URL we tried
'http://stackoverflow.com/'
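
The two checks above can be wrapped in small helpers. This is only a sketch: `redirect_chain` and `final_scheme` are hypothetical names, and they assume an object with the `status_code`, `url` and `history` attributes that a requests response exposes. A stand-in object is used here so the example runs without network access.

```python
from types import SimpleNamespace
from urllib.parse import urlsplit


def redirect_chain(response):
    """Return [(status_code, url), ...] for each redirect hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def final_scheme(response):
    """Return the scheme ('http' or 'https') of the URL the request ended up at."""
    return urlsplit(response.url).scheme


# Offline stand-in for the response shown above (assumption: same attribute names).
first_hop = SimpleNamespace(status_code=301, url="http://stackoverflow.com/", history=[])
fake_response = SimpleNamespace(
    status_code=200, url="https://stackoverflow.com/", history=[first_hop]
)

print(redirect_chain(fake_response))
print(final_scheme(fake_response))
```

With requests installed, passing the result of `requests.get('http://stackoverflow.com', timeout=10)` to `final_scheme` should give the same kind of answer for a live site.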

4 Comments

This wouldn't work for a site like imgur.com, which (at the time of this comment) doesn't redirect http to https. EDIT: However, it seems Chrome doesn't try https first either.
Cameron: Blurie wanted to imitate the browser, so I think this will work.
Some websites do not redirect to HTTPS if you first visit them with HTTP, so this is not a reliable method to check.
This doesn't work for a site hosted on HostGator.
5

When you enter a url without http:// or https:// the browser automatically assumes that you're using http:// and sends a request on port 80.

If the site redirects you to an https site, the response of note is a redirect status, typically 301 (Moved Permanently), which is a non-error response whose Location header points to the https:// URL. A 101 (Switching Protocols) response, by contrast, signals a protocol upgrade on an existing connection (for example to WebSocket) and is not part of the http-to-https redirect.

You can see this happen if you open a new tab, load http://stackexchange.com and watch the requests as they come in on the Network tab of your web browser's developer tools.

Note:

The redirect depends on the host supporting this behavior. Not all websites will automatically redirect you to an https:// site.

If you really want to determine if https:// is the preferred option, you may want to manually check if it exists when you don't get a redirect.
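
One way to do that manual check is to try https:// first and fall back to http://. The following is a minimal sketch using only the standard library; `can_fetch` and `preferred_scheme` are hypothetical names, and treating any HTTP answer (even an error status) as "the scheme works" is an assumption you may want to tighten.

```python
import urllib.error
import urllib.request


def can_fetch(url, timeout=5.0):
    """Return True if the server at `url` answers at all over that scheme."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 403 or 404),
        # so the connection over this scheme itself succeeded.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, TLS failure, timeout or malformed URL.
        return False


def preferred_scheme(host, probe=can_fetch):
    """Return 'https' if it works, else 'http' if that works, else None."""
    for scheme in ("https", "http"):
        if probe(f"{scheme}://{host}"):
            return scheme
    return None
```

Calling `preferred_scheme("stackoverflow.com")` performs real network requests. The `probe` parameter exists so that the lookup can be replaced, for example with a fake in tests.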


3

Since you mentioned "browser" and "Chrome" behaviour, you can get the same result as in @BurkhanKhalid's really good answer using Selenium:

from selenium import webdriver

driver = webdriver.Chrome()
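driver.get("http://stackoverflow.com")  # try http first
url = driver.current_url  # URL after the browser has followed any redirects
driver.quit()  # close the browser window
print(url[:url.find(":")])  # everything before the first ":" is the scheme

https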
