Login to website via Python Requests

Question

For a university project I am trying to log in to a website and scrape a small detail (a list of news articles) from my user profile.

I am new to Python, but I have done this before with another website. My first two approaches produce different HTTP errors. I have considered problems with the headers my request is sending; however, my understanding of this site's login process appears to be insufficient.

This is the login page: http://seekingalpha.com/account/login

My first approach looks like this:

import requests

with requests.Session() as c:
    requestUrl ='http://seekingalpha.com/account/orthodox_login'

    USERNAME = 'XXX'
    PASSWORD = 'XXX'

    userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'

    login_data = {
        "slugs[]":None,
        "rt":None,
        "user[url_source]":None,
        "user[location_source]":"orthodox_login",
        "user[email]":USERNAME,
        "user[password]":PASSWORD
        }

    c.post(requestUrl, data=login_data, headers = {"referer": "http://seekingalpha.com/account/login", 'user-agent': userAgent})

    page = c.get("http://seekingalpha.com/account/email_preferences")
    print(page.content)

This results in "403 Forbidden"

My second approach looks like this:

from requests import Request, Session

requestUrl ='http://seekingalpha.com/account/orthodox_login'

USERNAME = 'XXX'
PASSWORD = 'XXX'

userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'

# c.get(requestUrl) 
login_data = {
    "slugs[]":None,
    "rt":None,
    "user[url_source]":None,
    "user[location_source]":"orthodox_login",
    "user[email]":USERNAME,
    "user[password]":PASSWORD
    }
headers = {
    "accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language":"de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
    "origin":"http://seekingalpha.com",
    "referer":"http://seekingalpha.com/account/login",
    "Cache-Control":"max-age=0",
    "Upgrade-Insecure-Requests":1,
    "user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
    }

s = Session()
req = Request('POST', requestUrl, data=login_data, headers=headers)

prepped = s.prepare_request(req)
prepped.body ="slugs%5B%5D=&rt=&user%5Burl_source%5D=&user%5Blocation_source%5D=orthodox_login&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX"

resp = s.send(prepped)

print(resp.status_code)

In this approach I was trying to prepare the header exactly as my browser would do it. Sorry for redundancy. This results in HTTP error 400.

Does anyone have an idea what went wrong? Probably a lot.

Web sites try to protect themselves from bots by adding a hidden field to their login forms, which contains an identification code. If you don't send the identification code, they don't let you in. You must first get their login page, find the hidden field, copy its value and post it with the request. There are variations on this. So, study carefully the sequence of requests the browser sends when you log in manually.
– Cyb3rFly3r, Commented Apr 16, 2016 at 14:31
Thanks, yes, I have seen other websites do this. But I could not identify a token or anything like it in the form data. Have a look: pasteboard.co/eoXubQx.png
– MCH, Commented Apr 16, 2016 at 14:48
Have you tried the Mechanize module? Is the requests module your only option?
– estebanpdl, Commented Apr 16, 2016 at 16:27
I could use other modules. Unfortunately Mechanize is not available under Python 3. Is there any other alternative you can suggest?
– MCH, Commented Apr 16, 2016 at 17:08
@estebanpdl, I chose Mechanize and it works like a charm. Sadly it is not available for py3, but it worked. Thx!
– MCH, Commented Apr 17, 2016 at 15:28
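
The hidden-field approach described in the first comment can be sketched roughly as follows. This is only an illustration under assumptions: the thread does not confirm that this site uses such a token, the field name `authenticity_token` and the sample HTML are made up, and `login` is a hypothetical helper that expects a `requests.Session` to be passed in.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)  # attribute values may be None, hence the `or ""`
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def login(session, login_page_url, post_url, credentials):
    """Hypothetical flow: GET the form, merge its hidden fields, then POST.

    `session` is assumed to be a requests.Session, so that cookies set by
    the GET are sent along with the POST.
    """
    page = session.get(login_page_url)
    payload = extract_hidden_fields(page.text)
    payload.update(credentials)
    return session.post(post_url, data=payload,
                        headers={"Referer": login_page_url})


# Made-up login form, used only to demonstrate the parsing step.
sample = ('<form><input type="hidden" name="authenticity_token" value="abc123">'
          '<input type="text" name="user[email]"></form>')
print(extract_hidden_fields(sample))  # {'authenticity_token': 'abc123'}
```

Only the parsing step runs here; `login` would be called with a real session and the site's actual URLs and field names, which have to be read from the browser's network tab.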

Aminah Nuraini · Accepted Answer · 2016-04-17 15:52:02Z

3

Instead of spending a lot of energy on manually logging in and playing with Session, I suggest you just scrape the pages right away using your cookie.

When you log in, the site usually sets a cookie that is sent with each subsequent request to identify you.

Your code would look like this:

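import requests

# Cookie names and values are placeholders; copy the real ones from your logged-in browser session.
response = requests.get("http://www.example.com", cookies={
    "c_user": "my_cookie_part",
    "xs": "my_other_cookie_part",
})
print(response.content)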
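
If the cookies are copied from the browser as a single `Cookie` header string, they can be split into the dictionary form used above. The following is a minimal sketch using only the standard library; the helper name `cookies_from_header` and the cookie values are placeholders, not something the site or the requests library defines.

```python
from http.cookies import SimpleCookie


def cookies_from_header(raw_header):
    """Turn 'name=value; other=value' into a plain dict of cookie values."""
    jar = SimpleCookie()
    jar.load(raw_header)
    return {name: morsel.value for name, morsel in jar.items()}


# Placeholder header value, as it might be copied from the browser's dev tools.
raw = "c_user=my_cookie_part; xs=my_other_cookie_part"
cookies = cookies_from_header(raw)
print(cookies)  # {'c_user': 'my_cookie_part', 'xs': 'my_other_cookie_part'}
```

The resulting dictionary can be passed as `cookies=cookies` to `requests.get`, or loaded once into a session with `session.cookies.update(cookies)` so that every later request sends it. Such cookies expire, so they usually need to be copied again after a while.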

