I was thinking that if I access a password-protected site using Python's mechanize, I would get a 401 Unauthorized error, which indicates that authentication data is required.

So in my script I tried to access my Yahoo mailbox, which obviously requires a username and password. I expected a 401, but I didn't get one.

Code:

import mechanize

yahoo_mail = 'http://mail.cn.yahoo.com'
br = mechanize.Browser()
r = br.open(yahoo_mail)
print(r.info())  # here I got 200, which is expected

br.select_form(nr=0)  # select the login form
r = br.submit()  # submit the form without providing a username or password
print(r.info())  # but I didn't get a 401, why?

Question:

  1. Why didn't I get a 401 when I provided no authentication info?
  2. If not my mailbox, is there any other website that would give me a 401?
  • I think you mean 401 Unauthorized, not 410 Gone. (Commented Oct 2, 2011 at 10:54)

3 Answers

Most web sites these days do not use HTTP Authentication. So 401 is not returned if you fail to log in; instead, a normal 200 successful response is returned, and the text inside the web page says you did not log in.

Instead, sites use cookies. This means that your browser does not actually know what sites it is logged into; when you finally provide a successful password to Yahoo!, it either changes the cookie it has stored on your browser, or maybe even keeps the cookie the same but just changes the database record on their end that is associated with the cookie.
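The cookie behaviour described above can be reproduced locally. The following is only a sketch under stated assumptions: it uses the Python 3 standard library rather than mechanize, and the handler class, the `run_demo` helper, the password and the session id are all hypothetical names made up for illustration. It is not how Yahoo! works internally; it only shows a server that answers 200 whether or not the login succeeded.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSIONS = set()     # session ids the toy server treats as logged in
PASSWORD = "secret"  # hypothetical password, for illustration only


class CookieLoginHandler(BaseHTTPRequestHandler):
    def _reply(self, text, cookie=None):
        body = text.encode("utf-8")
        self.send_response(200)  # always 200, even when the login failed
        if cookie:
            self.send_header("Set-Cookie", "session={}; Path=/".format(cookie))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # The page content depends on the cookie, not on HTTP authentication.
        cookie = self.headers.get("Cookie", "")
        session = cookie.partition("session=")[2].split(";")[0]
        self._reply("inbox" if session in SESSIONS else "please log in")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("utf-8"))
        if form.get("password") == [PASSWORD]:
            SESSIONS.add("abc123")
            self._reply("logged in", cookie="abc123")
        else:
            self._reply("wrong password, please log in")

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo(password):
    """Submit the login form, then fetch the page; return both statuses and the body."""
    server = HTTPServer(("127.0.0.1", 0), CookieLoginHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:{}/".format(server.server_address[1])
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any proxy settings for localhost
        urllib.request.HTTPCookieProcessor(CookieJar()))
    try:
        data = urllib.parse.urlencode({"password": password}).encode("ascii")
        with opener.open(base, data) as login:
            login_status = login.status
            login.read()
        with opener.open(base) as page:
            return login_status, page.status, page.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo("wrong")` and `run_demo("secret")` should give status 200 for every request; only the returned page text differs between the two.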

So HTTP status codes are generally useless during the login process. Instead, you will have to scrape the text of the "200 OK" page that comes back to see whether it congratulates you on logging in or repeats the form. Alternatively, you can check the URL of the page you get back and see whether it is the login form again or the destination that you wanted to visit.
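Both checks can be combined in a small helper. This is a minimal sketch, assuming Python 3; the function name `looks_logged_in` and the `failure_markers` parameter are hypothetical, and the marker strings would have to be chosen by looking at what the real site prints on a failed login.

```python
from urllib.parse import urlsplit


def looks_logged_in(final_url, page_text, login_url, failure_markers=()):
    """Guess whether a form login worked, from the URL and body that came back."""
    final, login = urlsplit(final_url), urlsplit(login_url)
    # Landing back on the login page usually means the form was rejected.
    same_host = final.netloc.lower() == login.netloc.lower()
    same_path = final.path.rstrip("/") == login.path.rstrip("/")
    if same_host and same_path:
        return False
    # Otherwise, look for text the site is known to show on failure.
    text = page_text.lower()
    return not any(marker.lower() in text for marker in failure_markers)
```

With mechanize, the inputs would typically come from `r.geturl()` and `r.read()` on the response returned by `br.submit()`. This is a heuristic only: a site that redirects elsewhere on failure, or that shows its error with different wording, would need different checks.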

  1. A failed authentication doesn't mean you're not allowed to see the page behind the authentication. It means you won't see the version of the page that takes your credentials into account. If you're on a homepage and fail to authenticate, you can still see the homepage.

  2. Search engines don't seem to index 401 pages, so a site that returns one can be a bit hard to find...
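If no public site that answers with a 401 is at hand, one can be run locally. The sketch below assumes the Python 3 standard library; the handler class, the `fetch_status` and `run_demo` helpers, the credentials and the realm name are all made up for illustration. It starts a tiny server protected by HTTP Basic authentication and requests it once without and once with credentials.

```python
import base64
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, for illustration only.
USERNAME, PASSWORD = "alice", "secret"
EXPECTED = "Basic " + base64.b64encode(
    "{}:{}".format(USERNAME, PASSWORD).encode("utf-8")).decode("ascii")


class BasicAuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"welcome"
            self.send_response(200)
        else:
            body = b"authentication required"
            self.send_response(401)
            # This header is what tells a client to ask for credentials.
            self.send_header("WWW-Authenticate", 'Basic realm="demo"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(url, authorization=None):
    """Return the HTTP status of a GET, with an optional Authorization header."""
    request = urllib.request.Request(url)
    if authorization:
        request.add_header("Authorization", authorization)
    # An empty ProxyHandler ignores any proxy settings for localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises on 4xx/5xx; the status is on the exception


def run_demo():
    """Return the status without credentials and the status with them."""
    server = HTTPServer(("127.0.0.1", 0), BasicAuthHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:{}/".format(server.server_address[1])
    try:
        return fetch_status(url), fetch_status(url, EXPECTED)
    finally:
        server.shutdown()
        server.server_close()
```

`run_demo()` should return the pair of status codes for the unauthenticated and the authenticated request, which is the 401 behaviour the question expected from a login form.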


It looks like Yahoo just handles the password authentication in their code. Try adding the following two lines to your code:

with open('a.html', 'wb') as f:  # binary mode, so the raw response body is written as-is
    f.write(r.read())

When you open a.html, you will see the same login page again.

It looks like they just have a bit of JavaScript that tells you your password was wrong.

1 Comment

You're on the right track in realizing the authentication probably isn't done via HTTP, but password authentication via JavaScript wouldn't be secure at all. As Brandon's answer suggests, they do password authentication in server-side code and store a cookie on the client side.
