For a university project, I am trying to log in to a website and scrape a small detail (a list of news articles) from my user profile.

I am new to Python, but I have done this before with another website. My first two approaches produce different HTTP errors. I have considered problems with the headers my request is sending, but my understanding of this site's login process appears to be insufficient.

This is the login page: http://seekingalpha.com/account/login

My first approach looks like this:

import requests

with requests.Session() as c:
    requestUrl ='http://seekingalpha.com/account/orthodox_login'

    USERNAME = 'XXX'
    PASSWORD = 'XXX'

    userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'

    login_data = {
        "slugs[]":None,
        "rt":None,
        "user[url_source]":None,
        "user[location_source]":"orthodox_login",
        "user[email]":USERNAME,
        "user[password]":PASSWORD
        }

    c.post(requestUrl, data=login_data, headers = {"referer": "http://seekingalpha.com/account/login", 'user-agent': userAgent})

    page = c.get("http://seekingalpha.com/account/email_preferences")
    print(page.content)

This results in "403 Forbidden".

My second approach looks like this:

from requests import Request, Session

requestUrl ='http://seekingalpha.com/account/orthodox_login'

USERNAME = 'XXX'
PASSWORD = 'XXX'

userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'

# c.get(requestUrl) 
login_data = {
    "slugs[]":None,
    "rt":None,
    "user[url_source]":None,
    "user[location_source]":"orthodox_login",
    "user[email]":USERNAME,
    "user[password]":PASSWORD
    }
headers = {
    "accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language":"de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
    "origin":"http://seekingalpha.com",
    "referer":"http://seekingalpha.com/account/login",
    "Cache-Control":"max-age=0",
    "Upgrade-Insecure-Requests":1,
    "user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
    }

s = Session()
req = Request('POST', requestUrl, data=login_data, headers=headers)

prepped = s.prepare_request(req)
prepped.body ="slugs%5B%5D=&rt=&user%5Burl_source%5D=&user%5Blocation_source%5D=orthodox_login&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX"

resp = s.send(prepped)

print(resp.status_code)

In this approach I tried to prepare the headers exactly as my browser does. Sorry for the redundancy. This results in HTTP error 400.

Does anyone have an idea what went wrong? Probably a lot.
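One detail of the second approach may be worth checking. As far as I know, requests leaves out form keys whose value is None, so the prepared body (and the Content-Length header computed from it) is shorter than the string later assigned to prepped.body. A mismatch like that is one plausible cause of a 400, though this has not been verified against this site. A minimal sketch, assuming the browser sent empty strings for the blank fields; build_form_body is a hypothetical helper name and the e-mail address is the placeholder from the question:

```python
from urllib.parse import urlencode


def build_form_body(fields):
    """Hypothetical helper: URL-encode form fields, sending None as an empty value."""
    # Replace None with "" so the key is still transmitted,
    # as a browser does for blank form inputs.
    normalized = {key: "" if value is None else value for key, value in fields.items()}
    return urlencode(normalized)


login_data = {
    "slugs[]": None,
    "rt": None,
    "user[url_source]": None,
    "user[location_source]": "orthodox_login",
    "user[email]": "XXX@XXX.com",
    "user[password]": "XXX",
}

body = build_form_body(login_data)
print(body)
```

The printed string has the same shape as the hand-written body in the second approach. Passing a dictionary with empty strings as data= (instead of overwriting prepped.body) should let requests compute matching headers on its own.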

  • Many websites try to protect themselves from bots by adding a hidden field containing an identification code to their login forms. If you don't send that code back, they don't let you in. You must first fetch the login page, find the hidden field, copy its value, and include it in your POST request. There are variations on this, so study carefully the sequence of requests the browser sends when you log in manually. Commented Apr 16, 2016 at 14:31
  • Thanks, yes, I have seen other websites do this. But I could not identify a token or anything like it in the form data. Have a look: pasteboard.co/eoXubQx.png Commented Apr 16, 2016 at 14:48
  • Have you tried the Mechanize module? Or is the requests module your only option? Commented Apr 16, 2016 at 16:27
  • I could use other modules. Unfortunately Mechanize is not available under Python 3. Is there any other alternative you can suggest? Commented Apr 16, 2016 at 17:08
  • @estebanpdl, I chose Mechanize and it works like a charm. Sadly it is not available for Python 3, but it worked. Thanks! Commented Apr 17, 2016 at 15:28
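
The hidden-field flow described in the first comment can be sketched roughly as follows. This is only an illustration under assumptions: whether this particular site uses such a field is not confirmed in the thread, the class and function names are hypothetical, and the session argument is assumed to behave like requests.Session (it only needs get and post).

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def login_with_hidden_fields(session, login_page_url, post_url, credentials):
    """Fetch the login page, copy its hidden fields, then post them with the credentials.

    `session` is assumed to offer get() and post() like requests.Session.
    """
    login_page = session.get(login_page_url)
    payload = extract_hidden_fields(login_page.text)
    payload.update(credentials)
    return session.post(post_url, data=payload, headers={"referer": login_page_url})
```

With requests installed, this could be called as login_with_hidden_fields(requests.Session(), "http://seekingalpha.com/account/login", "http://seekingalpha.com/account/orthodox_login", {"user[email]": USERNAME, "user[password]": PASSWORD}). Reusing the same session afterwards keeps any cookies the site set during login.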

1 Answer

Instead of spending a lot of energy on manually logging in and playing with Session, I suggest you just scrape the pages right away using your cookie.

When you log in, the site usually adds a cookie to your requests to identify you. For example:

[Image: My cookie]

Your code will look like this:

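import requests

# The cookie names and values are placeholders; copy the real ones
# from your logged-in browser session.
response = requests.get("http://www.example.com", cookies={
    "c_user": "my_cookie_part",
    "xs": "my_other_cookie_part",
})
print(response.content)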
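
If the cookie is copied from the browser as one raw "Cookie" header string, it has to be split into name/value pairs before it can be passed as cookies=. A small sketch, assuming the usual "name=value; name=value" layout; cookie_header_to_dict is a hypothetical helper, and the cookie names are the placeholders used above:

```python
def cookie_header_to_dict(cookie_header):
    """Hypothetical helper: turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in cookie_header.split(";"):
        # Split on the first "=" only, since cookie values may contain "=".
        name, separator, value = part.strip().partition("=")
        if separator and name:
            cookies[name] = value
    return cookies


cookies = cookie_header_to_dict("c_user=my_cookie_part; xs=my_other_cookie_part")
print(cookies)
```

The resulting dictionary can be passed directly as the cookies argument of requests.get. Such cookies typically expire, so the copied values may stop working after a while.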