
I am trying to access an intranet site with HTTP Basic Authentication enabled.

Here's the code I'm using:

from bs4 import BeautifulSoup
import urllib.request, base64, urllib.error

request = urllib.request.Request(url)
string = '%s:%s' % ('username','password')

base64string = base64.standard_b64encode(string.encode('utf-8'))

request.add_header("Authorization", "Basic %s" % base64string)
try:
    u = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print(e)
    print(e.headers)

soup = BeautifulSoup(u.read(), 'html.parser')

print(soup.prettify())

But it doesn't work: the request fails with 401 Authorization required, and I can't figure out why.

Does nobody have an answer? Commented Nov 8, 2017 at 9:04

3 Answers


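The solution given here works without any modifications: instead of building the Authorization header by hand, let an HTTPBasicAuthHandler answer the server's 401 challenge using credentials from a password manager.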

from bs4 import BeautifulSoup
import urllib.request

# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
username = "username"  # placeholder credentials
password = "password"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL (any URL under top_level_url is covered)
url = top_level_url
u = opener.open(url)

soup = BeautifulSoup(u.read(), 'html.parser')
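The password manager is what ties the credentials to the site: with this setup, HTTPBasicAuthHandler sends them only after the server answers with a 401 challenge, and it looks them up by realm and URL prefix. Registering them under the realm None, as above, lets HTTPPasswordMgrWithDefaultRealm fall back to them for any realm. A quick offline check of that lookup (the realm name "Intranet" and the credentials are placeholders, not values from the question):

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Placeholder credentials registered for everything under /foo/, with realm None
password_mgr.add_password(None, "http://example.com/foo/", "username", "password")

# The lookup tries the given realm first, then falls back to the None realm
inside = password_mgr.find_user_password("Intranet", "http://example.com/foo/page.html")
print(inside)  # ('username', 'password')

# URLs outside the registered prefix get no credentials
outside = password_mgr.find_user_password("Intranet", "http://example.com/bar/")
print(outside)  # (None, None)
```

If the server sends a 401 without a WWW-Authenticate: Basic header, the handler has nothing to answer, so the request still fails.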

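The code from the question works as well. You just have to decode the Base64 result, which is a bytes object; otherwise "Basic %s" % base64string inserts the bytes repr, and the header value becomes Basic b'dXNlcm5hbWU6cGFzc3dvcmQ=' instead of Basic dXNlcm5hbWU6cGFzc3dvcmQ=, which the server rejects with 401.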

from bs4 import BeautifulSoup
import urllib.request, base64, urllib.error

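url = "http://example.com/foo/"  # placeholder: your intranet URL
request = urllib.request.Request(url)
string = '%s:%s' % ('username','password')

base64string = base64.standard_b64encode(string.encode('utf-8'))

# decode() turns the Base64 bytes into str; without it the header reads "Basic b'...'"
request.add_header("Authorization", "Basic %s" % base64string.decode('utf-8'))
try:
    u = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print(e)
    print(e.headers)
else:
    # only parse the page if the request succeeded (u is unset after an HTTPError)
    soup = BeautifulSoup(u.read(), 'html.parser')
    print(soup.prettify())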
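Since Base64 output is always plain ASCII, decoding it with 'utf-8' (as above) or 'ascii' gives the same string. A minimal sketch that wraps the fix in a reusable helper (the name basic_auth_header is hypothetical, not part of urllib):

```python
import base64


def basic_auth_header(username, password):
    """Return the value for an HTTP Basic Authorization header, as str."""
    credentials = "%s:%s" % (username, password)
    encoded = base64.standard_b64encode(credentials.encode("utf-8"))  # bytes
    # Base64 output is pure ASCII, so this decode cannot fail
    return "Basic %s" % encoded.decode("ascii")


print(basic_auth_header("username", "password"))
# Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

It would be used as request.add_header("Authorization", basic_auth_header("username", "password")).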

UTF-8 encoding might not be what the server expects. You can try ASCII or ISO-8859-1 encoding instead.
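Whether switching encodings can change anything depends on the credentials: ASCII, UTF-8 and ISO-8859-1 encode ASCII characters identically, so the header only differs when the username or password contains a non-ASCII character. A small sketch to compare them (the helper basic_auth_value and the sample password "pässword" are hypothetical):

```python
import base64


def basic_auth_value(username, password, encoding):
    """Build a Basic auth header value using the given text encoding."""
    raw = ("%s:%s" % (username, password)).encode(encoding)
    return "Basic " + base64.b64encode(raw).decode("ascii")


# ASCII-only credentials: all three encodings give the same header
ascii_values = {enc: basic_auth_value("username", "password", enc)
                for enc in ("ascii", "utf-8", "iso-8859-1")}
print(ascii_values)

# A non-ASCII character is encoded differently, so the headers differ
utf8_value = basic_auth_value("username", "pässword", "utf-8")
latin1_value = basic_auth_value("username", "pässword", "iso-8859-1")
print(utf8_value == latin1_value)  # False
```

Encoding "pässword" with "ascii" would raise UnicodeEncodeError, which is why the last comparison leaves it out.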

Also, try to access the intranet site with a web browser and check how the Authorization header is different from the one you are generating.
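Besides the browser's dev tools, you can print the header your own code is about to send and compare the two side by side. Request.get_header() returns what add_header() stored, and nothing goes over the network until urlopen() is called. A sketch using the question's code with a placeholder URL:

```python
import base64
import urllib.request

url = "http://intranet.example/"  # placeholder; no connection is made below
request = urllib.request.Request(url)
string = "%s:%s" % ("username", "password")
base64string = base64.standard_b64encode(string.encode("utf-8"))
request.add_header("Authorization", "Basic %s" % base64string)

# Inspect the stored header value before sending anything
print(request.get_header("Authorization"))
# Basic b'dXNlcm5hbWU6cGFzc3dvcmQ='
```

A b'...' wrapper in the printed value means a bytes object was formatted into the header.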

5 Comments

Thanks. I got the authentication to work using the instructions given here. But for learning purposes, how do I check what header is being generated by the browser?
Great that you got it working. You should submit the solution as an answer and accept it so others may benefit from it.
You can see the headers in the browser's dev tools, for example the Network tab in Chrome DevTools. stackoverflow.com/questions/4423061/…
OK, I checked, and the Authorization header I'm generating with the code above is exactly the same as the one the browser sends, so I don't understand what the problem is.
OK, I figured out what the problem is. Updating my answer with the solution.

Encode using "ascii". This worked for me.

import base64
import urllib.request

url = "http://someurl/path"
username = "someuser"
token = "239487svksjdf08234"

request = urllib.request.Request(url)
base64string = base64.b64encode((username + ":" + token).encode("ascii"))
request.add_header("Authorization", "Basic {}".format(base64string.decode("ascii")))
response = urllib.request.urlopen(request)

response.read()  # final response body, as bytes
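To see the difference end to end without an intranet site, here is a sketch that starts a throwaway local server and sends the header with and without .decode("ascii"). The server, its handler class and the someuser:secret credentials are all hypothetical test scaffolding; it simply returns 401 unless the Authorization value matches exactly.

```python
import base64
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials accepted by the throwaway test server
EXPECTED = "Basic " + base64.b64encode(b"someuser:secret").decode("ascii")


class BasicAuthDemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"<html><body>ok</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Challenge the client, as a Basic-auth server would
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="demo"')
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_status(url, header_value):
    """Send a GET with the given Authorization value; return the HTTP status."""
    # ProxyHandler({}) bypasses any proxy configured in the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url)
    request.add_header("Authorization", header_value)
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as e:
        return e.code


server = HTTPServer(("127.0.0.1", 0), BasicAuthDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

base64string = base64.b64encode(b"someuser:secret")
good = fetch_status(url, "Basic {}".format(base64string.decode("ascii")))
bad = fetch_status(url, "Basic {}".format(base64string))  # sends b'...'
print(good, bad)  # 200 401

server.shutdown()
server.server_close()
```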
