Python Scripting Question

Question

I am writing a small tool that prints the HTTP response headers for any website. I am new to Python and wanted to know whether there is anything other than encoding that I need to account for in the code. A rough draft is shown below. Any pointers from the Python coders?

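#!/usr/bin/python
# Python 2 script: print the response headers for a URL.
import sys, urllib

# Require exactly one argument (the URL) instead of failing with an IndexError.
if len(sys.argv) != 2:
  print "Usage: %s <url>" % sys.argv[0]
  sys.exit(1)

website = urllib.urlopen(sys.argv[1])
if website.code != 200:
  print "Something went wrong here"
  print website.code
  sys.exit(1)  # a non-zero exit status signals the failure

print 'Printing the headers'
print '-----------------------------------------'
for header, value in website.headers.items():
    print header + ' : ' + value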

What does this have to do with security? Also, why not just use curl?
– ek9, Commented Apr 25, 2014 at 7:31

dr jimbob · Accepted Answer · 2014-04-25 07:34:31Z

This seems like a fairly straightforward script (though the question seems a better fit for Stack Overflow). A couple of comments. First, curl -I is a useful command-line tool to compare against. Second, even when you don't get a 200 status, there are often still useful headers or content you may want to display. E.g.,

$ curl -I http://security.stackexchange.com/asdf
HTTP/1.1 404 Not Found
Cache-Control: private
Content-Length: 24068
Content-Type: text/html; charset=utf-8
X-Frame-Options: SAMEORIGIN
Set-Cookie: prov=678b5b9c-0130-4398-9834-673475961dc6; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Date: Fri, 25 Apr 2014 07:24:00 GMT
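
As a rough sketch of the same idea in Python, assuming Python 3 (where urlopen lives in urllib.request and raises urllib.error.HTTPError for 4xx/5xx responses, unlike the Python 2 urllib.urlopen in the question): the HTTPError object still carries the status code and headers, so you can print them instead of bailing out. The helper names open_any_status and format_response are made up for illustration.

```python
import urllib.error
import urllib.request


def open_any_status(url):
    """Return a response-like object even when the status is 4xx/5xx."""
    try:
        return urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        # HTTPError exposes .code and .headers just like a normal response.
        return err


def format_response(resp):
    """Render the status code and headers as text, one header per line."""
    lines = ["Status: %d" % resp.code]
    for name, value in resp.headers.items():
        lines.append("%s: %s" % (name, value))
    return "\n".join(lines)


# Example use (needs network access):
#   print(format_response(open_any_status("http://security.stackexchange.com/asdf")))
```

Note that this still follows redirects and issues a GET rather than the HEAD request that curl -I sends, so the output will not match curl exactly.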

Also note urllib follows redirects automatically. E.g., with curl you'll see:

$ curl -I http://www.security.stackexchange.com
HTTP/1.1 301 Moved Permanently
Content-Length: 157
Content-Type: text/html; charset=UTF-8
Location: http://security.stackexchange.com/
Date: Fri, 25 Apr 2014 07:26:52 GMT

while your tool will just give:

$ python user3567119.py http://www.security.stackexchange.com
Printing the headers
-----------------------------------------
content-length : 68639
set-cookie : prov=9bf4f3d4-e3ae-4161-8e34-9aaa83f0aa4b; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
expires : Fri, 25 Apr 2014 07:29:32 GMT
vary : *
last-modified : Fri, 25 Apr 2014 07:28:32 GMT
connection : close
cache-control : public, no-cache="Set-Cookie", max-age=60
date : Fri, 25 Apr 2014 07:28:31 GMT
x-frame-options : SAMEORIGIN
content-type : text/html; charset=utf-8
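
If you want to stay with the standard library and still see the 301 itself, one option is to install a redirect handler that refuses to follow redirects. This is only a sketch, assuming Python 3's urllib.request; NoRedirect and first_response are hypothetical names, not part of urllib.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not build a follow-up request", so
        # urllib falls through to its default handler and raises HTTPError.
        return None


def first_response(url):
    """Return the first response for url without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        return opener.open(url)
    except urllib.error.HTTPError as err:
        # For a redirect, err.code is the 3xx status and
        # err.headers includes the Location header.
        return err


# Example use (needs network access):
#   resp = first_response("http://www.security.stackexchange.com")
#   print(resp.code, resp.headers.get("Location"))
```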

Third, if you continue playing around with HTTP requests in Python, I highly recommend using the requests library. With requests, you'll be able to see the 301 if you do:

In [1]: import requests

In [2]: r=requests.get('http://www.security.stackexchange.com')

In [3]: r
Out[3]: <Response [200]>

In [4]: r.history
Out[4]: (<Response [301]>,)

It's also worth trying out some HTTP requests in just plain old telnet. E.g., telnet security.stackexchange.com 80 then quickly type:

GET / HTTP/1.1
Host: security.stackexchange.com

followed by a blank line. Then you'll see the actual HTTP response on the wire (rather than a version reconstructed after urllib has processed it):

HTTP/1.1 200 OK
Cache-Control: public, no-cache="Set-Cookie", max-age=60
Content-Type: text/html; charset=utf-8
Expires: Fri, 25 Apr 2014 07:38:37 GMT
Last-Modified: Fri, 25 Apr 2014 07:37:37 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Set-Cookie: prov=a75de1f2-678b-4a9d-bbfd-39e933e60237; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Date: Fri, 25 Apr 2014 07:37:36 GMT
Content-Length: 68849

<!DOCTYPE html>
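
The telnet session can be approximated from Python with a plain socket, which is handy if you want to script it. This is a minimal sketch, assuming Python 3 and plain HTTP on port 80; build_request, read_head and raw_head are illustrative names, and the Connection: close header is an addition that was not in the telnet example.

```python
import socket


def build_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: close",  # ask the server to close after responding
        "",
        "",  # the two empty items produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def read_head(sock, limit=65536):
    """Read from sock until the blank line that ends the headers."""
    data = b""
    while b"\r\n\r\n" not in data and len(data) < limit:
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    # Keep only the status line and headers, exactly as sent on the wire.
    return data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")


def raw_head(host, port=80, path="/"):
    """Send a GET request and return the raw status line and headers."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        return read_head(sock)


# Example use (needs network access):
#   print(raw_head("security.stackexchange.com"))
```

Unlike urllib, this keeps the header names, order and capitalization the server actually sent.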
