
My goal is to create an authenticated session on GitHub so I can use the advanced search (which limits functionality for non-authenticated users). Currently, the POST request returns a web page that says "What? Your browser did something unexpected. Please contact us if the problem persists."

Here is the code I am using to try to accomplish my task.

import requests
from lxml import html

s = requests.Session()
payload = (username, password)
_ = s.get('https://www.github.com/login')
p = s.post('https://www.github.com/login', auth=payload)

url = "https://github.com/search?l=&p=0&q=language%3APython+extension%3A.py+sklearn&ref=advsearch&type=Code"
r = s.get(url, auth=payload)
text = r.text
tree = html.fromstring(text)
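As an aside, the advanced-search URL above does not need to be hand-encoded: requests can build the same query string from a dict passed as params. The sketch below only prepares the request (it makes no network call), so the encoded URL can be compared with the one in the question; the parameter names and values are simply copied from that URL.

```python
import requests

# Query parameters copied from the advanced-search URL in the question
params = {
    "l": "",
    "p": "0",
    "q": "language:Python extension:.py sklearn",
    "ref": "advsearch",
    "type": "Code",
}

# Prepare (but do not send) the GET request so the encoded URL can be inspected
prepared = requests.Request("GET", "https://github.com/search", params=params).prepare()
print(prepared.url)
```

With an authenticated session s, the same dict could be passed as s.get("https://github.com/search", params=params), with no auth argument needed.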

Is what I'm trying to do possible? I would prefer not to use the GitHub v3 API, since it is rate limited and I want to do more of my own scraping of the advanced search. Thanks.

  • I assume you would need OAuth login, but I could be wrong Commented Sep 5, 2017 at 2:59
  • Thanks for the response, I'll check that out. Commented Sep 5, 2017 at 3:01
  • You're trying to use HTTP Basic authentication, but GitHub uses a form-based login mechanism. You would need to inspect the login page to determine the endpoint to which you should POST the necessary fields (which may include static fields included in the login form itself). Commented Sep 5, 2017 at 3:30
  • The REST API may be useful Commented Sep 5, 2017 at 4:03
  • Read GitHub OAuth 2 Tutorial Commented Sep 5, 2017 at 8:25
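To make the HTTP Basic versus form-login distinction concrete: passing auth= to requests adds an Authorization header, whereas data= puts the fields in a form-encoded request body, which is what an HTML login form submits. The sketch below prepares one request of each kind without sending either; the credentials are dummy values, and the field names login and password are the ones named in the answer.

```python
import requests

# Dummy credentials, for illustration only
username, password = "user", "pass"

# What the question does: HTTP Basic auth, sent as an Authorization header
basic = requests.Request("POST", "https://github.com/login",
                         auth=(username, password)).prepare()
print(basic.headers["Authorization"])  # prints: Basic dXNlcjpwYXNz

# What a login form does: fields sent as an application/x-www-form-urlencoded body
form = requests.Request("POST", "https://github.com/session",
                        data={"login": username, "password": password}).prepare()
print(form.headers["Content-Type"])
print(form.body)
```

GitHub's login page ignores the Basic header, which is why the question's POST fails even with correct credentials.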

1 Answer


As mentioned in the comments, GitHub uses form POST data for authentication, so your credentials belong in the data parameter rather than in auth.
The fields you have to submit are 'login', 'password', and 'authenticity_token'. The value of 'authenticity_token' is dynamic, but you can scrape it from '/login'.
Finally, submit the data to /session, and you should have an authenticated session.

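import requests
from lxml import html  # tree.cssselect() also requires the cssselect package

s = requests.Session()
# GET the login page to obtain cookies and the hidden form fields
r = s.get('https://github.com/login')
tree = html.fromstring(r.content)
# Collect every named <input>, including the dynamic authenticity_token
data = {i.get('name'): i.get('value') for i in tree.cssselect('input') if i.get('name')}
data['login'] = username        # your GitHub username
data['password'] = password     # your GitHub password
# POST the form to /session; the cookies stored on s keep you logged in
r = s.post('https://github.com/session', data=data)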
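The scraping step can be tried offline. This sketch applies the same idea to a hypothetical, heavily simplified login page (not GitHub's real markup) and restricts the lookup to the form whose action is /session, so inputs from other forms on the page, such as a search box, are not mixed in. It uses lxml's XPath support with a variable binding, which, unlike cssselect(), needs no extra package. The helper name login_form_fields is made up for illustration.

```python
from lxml import html

# Hypothetical, simplified stand-in for the /login page; GitHub's real markup differs
SAMPLE_LOGIN_PAGE = """
<html><body>
  <form action="/search"><input name="q" value=""></form>
  <form action="/session" method="post">
    <input type="hidden" name="authenticity_token" value="abc123">
    <input type="text" name="login">
    <input type="password" name="password">
    <input type="submit" name="commit" value="Sign in">
  </form>
</body></html>
"""

def login_form_fields(page_html, action="/session"):
    """Return {name: value} for every named <input> in the form posting to `action`."""
    tree = html.fromstring(page_html)
    # $action is bound from the keyword argument, so the value needs no manual quoting
    inputs = tree.xpath("//form[@action=$action]//input[@name]", action=action)
    # Inputs without a value attribute (login, password) map to None
    return {i.get("name"): i.get("value") for i in inputs}

fields = login_form_fields(SAMPLE_LOGIN_PAGE)
print(fields)
```

Against the live site you would pass r.content from the GET above, then set fields['login'] and fields['password'] before posting. requests omits keys whose value is None when form-encoding a dict, so the empty placeholders are harmless.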