Parse an HTTP request Authorization header with Python

Question

I need to take a header like this:

 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"

And parse it into this using Python:

{'protocol':'Digest',
  'qop':'chap',
  'realm':'[email protected]',
  'username':'Foobear',
  'response':'6629fae49393a05397450978507c4ef1',
  'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}

Is there a library to do this, or something I could look at for inspiration?

I'm doing this on Google App Engine, and I'm not sure if the Pyparsing library is available, but maybe I could include it with my app if it is the best solution.

Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It's working, but very fragile.

Brilliant solution by nadia below:

import re

reg = re.compile('(\w+)[=] ?"?(\w+)"?')

s = """Digest
realm="stackoverflow.com", username="kixx"
"""

print(str(dict(reg.findall(s))))

So far this solution has proven not only to be super clean, but also very robust. While not the most "by the book" implementation of the RFC, I've yet to build a test case that returns invalid values. However, I am only using this to parse the Authorization header; none of the other headers I'm interested in need parsing, so this may not be a good solution as a general HTTP header parser.
– Kris Walker, Commented Sep 4, 2009 at 11:35
I came here looking for a full-fledged RFC-ified parser. Your question and the answer by @PaulMcG got me on the right path (see my answer below). Thank you both!
– biscuit314, Commented Sep 23, 2018 at 2:02

Nadia Alramli · Accepted Answer · 2009-08-28 21:40:19Z

14

A little regex:

import re
reg=re.compile('(\w+)[:=] ?"?(\w+)"?')

>>>dict(reg.findall(headers))

{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}

answered Aug 28, 2009 at 21:40

Nadia Alramli


6 Comments

Kris Walker Over a year ago

Wow, I love Python. "Authorization:" is not actually part of the header string, so I did this instead:

#!/usr/bin/env python
import re

def mymain():
    reg = re.compile('(\w+)[=] ?"?(\w+)"?')
    s = """Digest realm="fireworksproject.com", username="kristoffer" """
    print(str(dict(reg.findall(s))))

if __name__ == '__main__':
    mymain()

I'm not getting the "Digest" protocol declaration, but I don't need it anyway. Essentially 3 lines of code... Brilliant!!!

Bastien Léonard Over a year ago

I think it would be more explicit to use a raw string or \\.

Rudie Over a year ago

Actually the " aren't mandatory (algorithm for example usually doesn't delimit its value with ") and a value itself can contain escaped ". "? is a bit risky =) (I asked the same question for PHP.)

Sam Alba Over a year ago

More tolerant version: re.compile(r'(\w+)[:=][\s"]?([^",]+)"?')

alitayyeb Over a year ago

In my case (Docker API) this regex pattern is better: ([^\s,]+) ?[=] ?"?([^\s,"]+)"? because the parameter values may contain : and . for example:

Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:bitnamicharts/nginx:pull",error="insufficient_scope"
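
The objections raised in the comments above (values without quotes, values containing characters such as : . - or commas, and backslash-escaped quotes) can be handled together with a slightly longer pattern. The following is only a sketch using the standard re module. PARAM_RE and parse_auth_params are names made up for this illustration, and the code does not try to validate the header against the RFC grammar.

```python
import re

# Hypothetical helper, not taken from any answer on this page.
# A value is either a quoted string (which may contain commas and
# backslash-escaped quotes) or a bare token without spaces or commas.
PARAM_RE = re.compile(r'''
    ([\w-]+)\s*=\s*            # parameter name
    (?:
        "((?:[^"\\]|\\.)*)"    # quoted value
      | ([^\s,"]+)             # unquoted token
    )
''', re.VERBOSE)


def parse_auth_params(value):
    """Return (scheme, params) for a header value such as 'Digest a="b", c=d'."""
    scheme, _, rest = value.strip().partition(' ')
    params = {}
    for name, quoted, bare in PARAM_RE.findall(rest):
        # Undo backslash escapes inside quoted values; bare tokens are kept as-is.
        params[name] = re.sub(r'\\(.)', r'\1', quoted) if quoted else bare
    return scheme, params


scheme, params = parse_auth_params(
    'Digest qop=auth-int, uri="/dir/index.html", username="Foo,bear"')
print(scheme, params)
```

Keeping the quoted and unquoted alternatives in separate groups avoids the optional "? on both sides of the value, which is what lets the simpler patterns stop early or swallow a stray quote.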


Glorfindel · Accepted Answer · 2021-01-14 06:07:00Z

10

You can also use urllib2 as [CherryPy][1] does.

Here is the snippet:

input= """
 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2
field, sep, value = input.partition("Authorization: Digest ")
if value:
    items = urllib2.parse_http_list(value)
    opts = urllib2.parse_keqv_list(items)
    opts['protocol'] = 'Digest'
    print opts

it outputs:

{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': '[email protected]', 'response': '6629fae49393a05397450978507c4ef1'}

[1]: https://web.archive.org/web/20130118133623/http://www.google.com:80/codesearch/p?hl=en#OQvO9n2mc04/CherryPy-3.0.1/cherrypy/lib/httpauth.py&q=Authorization Digest http lang:python

edited Jan 14, 2021 at 6:07

Glorfindel


answered Aug 28, 2009 at 22:11

Piotr Czapla


2 Comments

kbolino Over a year ago

In Python 3, these functions still exist (though they aren't documented) but they're in urllib.request instead of urllib2

Pyprohly Over a year ago

Warning: urllib.request is one of the heaviest imports in the Python standard library. If you’re just using these two functions it might not be worth it.
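
For Python 3, the urllib2 approach from this answer can be sketched as follows. As the comment above notes, parse_http_list and parse_keqv_list live in urllib.request and are undocumented, so treat their availability as an assumption to check on your Python version. parse_authorization_value is a name invented for this example, and it expects the header value without the leading "Authorization:" field name.

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_authorization_value(value):
    """Parse the part after 'Authorization:' into a dict of parameters."""
    scheme, _, rest = value.strip().partition(" ")
    # parse_http_list splits on commas but leaves commas inside quotes alone;
    # parse_keqv_list turns each key="value" item into a dict entry.
    params = parse_keqv_list(parse_http_list(rest))
    params["protocol"] = scheme
    return params


header_value = '''Digest qop="chap",
     username="Foo,bear",
     response="6629fae49393a05397450978507c4ef1"'''
print(parse_authorization_value(header_value))
```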

PaulMcG · Accepted Answer · 2009-09-04 09:40:06Z

3

Here's my pyparsing attempt:

text = """Authorization: Digest qop="chap",
    realm="[email protected]",     
    username="Foobear",     
    response="6629fae49393a05397450978507c4ef1",     
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """

from pyparsing import *

AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)

valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict

print(authentry.parseString(text).dump())

which prints:

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', '[email protected]'],
 ['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
 ['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: [email protected]
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear

I'm not familiar with the RFC, but I hope this gets you rolling.

answered Sep 4, 2009 at 9:40

PaulMcG


1 Comment

Kris Walker Over a year ago

This solution is the use of pyparsing that I was originally thinking of, and, as far as I can tell, it produces nice results.

Community · Accepted Answer · 2021-10-07 10:53:52Z

An older question but one I found very helpful.

(edit after recent upvote) I've created a package that builds on this answer (link to tests to see how to use the class in the package).
pip install authparser

I needed a parser to handle any properly formed Authorization header, as defined by RFC7235 (raise your hand if you enjoy reading ABNF).

Authorization = credentials

BWS = <BWS, see [RFC7230], Section 3.2.3>

OWS = <OWS, see [RFC7230], Section 3.2.3>

Proxy-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS
 challenge ] )
Proxy-Authorization = credentials

WWW-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge
 ] )

auth-param = token BWS "=" BWS ( token / quoted-string )
auth-scheme = token

challenge = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *(
 OWS "," [ OWS auth-param ] ) ] ) ]
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param )
 *( OWS "," [ OWS auth-param ] ) ] ) ]

quoted-string = <quoted-string, see [RFC7230], Section 3.2.6>

token = <token, see [RFC7230], Section 3.2.6>
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" )
 *"="

Starting with PaulMcG's answer, I came up with this:

import pyparsing as pp

tchar = '!#$%&\'*+-.^_`|~' + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas

token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

scheme = token('scheme')

header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

credentials = scheme + (token68('token') ^ params('params'))

auth_parser = header + pp.Suppress(':') + credentials

This allows for parsing any Authorization header:

parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))

which outputs:

Authenticating with Basic scheme, token: Zm9vOmJhcg==

Bringing it all together into an Authenticator class:

import pyparsing as pp
from base64 import b64decode
import re

class Authenticator:
    def __init__(self):
        """
        Use pyparsing to create a parser for Authentication headers
        """
        tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
        t68char = '-._~+/' + pp.nums + pp.alphas

        token = pp.Word(tchar)
        token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

        scheme = token('scheme')

        auth_header = pp.Keyword('Authorization')
        name = pp.Word(pp.alphas, pp.alphanums)
        value = pp.quotedString.setParseAction(pp.removeQuotes)
        name_value_pair = name + pp.Suppress('=') + value
        params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

        credentials = scheme + (token68('token') ^ params('params'))

        # the moment of truth...
        self.auth_parser = auth_header + pp.Suppress(':') + credentials


    def authenticate(self, auth_header):
        """
        Parse auth_header and call the correct authentication handler
        """
        authenticated = False
        try:
            parsed = self.auth_parser.parseString(auth_header)
            scheme = parsed['scheme']
            details = parsed['token'] if 'token' in parsed.keys() else parsed['params']

            print('Authenticating using {0} scheme'.format(scheme))
            try:
                # Escape the hyphen so it is a literal, not a '+'-to-'.' range.
                safe_scheme = re.sub(r"[!#$%&'*+\-.^_`|~]", '_', scheme.lower())
                handler = getattr(self, 'auth_handle_' + safe_scheme)
                authenticated = handler(details)
            except AttributeError:
                print('This is a valid Authorization header, but we do not handle this scheme yet.')

        except pp.ParseException as ex:
            print('Not a valid Authorization header')
            print(ex)

        return authenticated


    # The following methods are fake, of course.  They should use what's passed
    # to them to actually authenticate, and return True/False if successful.
    # For this demo I'll just print some of the values used to authenticate.
    @staticmethod
    def auth_handle_basic(token):
        print('- token is {0}'.format(token))
        try:
            username, password = b64decode(token).decode().split(':', 1)
        except Exception:
            raise ValueError('Basic token is not base64-encoded "username:password"')
        print('- username is {0}'.format(username))
        print('- password is {0}'.format(password))
        return True

    @staticmethod
    def auth_handle_bearer(token):
        print('- token is {0}'.format(token))
        return True

    @staticmethod
    def auth_handle_digest(params):
        print('- username is {0}'.format(params['username']))
        print('- cnonce is {0}'.format(params['cnonce']))
        return True

    @staticmethod
    def auth_handle_aws4_hmac_sha256(params):
        print('- Signature is {0}'.format(params['Signature']))
        return True

To test this class:

tests = [
    'Authorization: Digest qop="chap", realm="[email protected]", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
    'Authorization: Bearer cn389ncoiwuencr',
    'Authorization: Basic Zm9vOmJhcg==',
    'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
    'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]

authenticator = Authenticator()

for test in tests:
    authenticator.authenticate(test)
    print()

Which outputs:

Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41

Authenticating using Bearer scheme
- token is cn389ncoiwuencr

Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar

Authenticating using AWS4-HMAC-SHA256 scheme
- Signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024

Authenticating using CrazyCustom scheme 
This is a valid Authorization header, but we do not handle this scheme yet.

In future if we wish to handle CrazyCustom we'll just add

def auth_handle_crazycustom(params):
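
To show why adding a single method is enough, here is a stripped-down sketch of the name-based dispatch, without the pyparsing part. SchemeDispatcher, CustomDispatcher and the foo == 'bar' check are invented for this illustration and are not part of the Authenticator class or the authparser package.

```python
import re


class SchemeDispatcher:
    """Look up a handler method named after the normalised auth scheme."""

    def dispatch(self, scheme, details):
        # Replace every character that cannot appear in a Python identifier.
        safe_scheme = re.sub(r'[^a-z0-9_]', '_', scheme.lower())
        handler = getattr(self, 'auth_handle_' + safe_scheme, None)
        if handler is None:
            return False  # well-formed scheme, but no handler defined yet
        return handler(details)


class CustomDispatcher(SchemeDispatcher):
    @staticmethod
    def auth_handle_crazycustom(params):
        # Placeholder check for illustration only; real code would verify
        # the credentials properly.
        return params.get('foo') == 'bar'


dispatcher = CustomDispatcher()
print(dispatcher.dispatch('CrazyCustom', {'foo': 'bar', 'fizz': 'buzz'}))
print(dispatcher.dispatch('Unknown-Scheme', {}))
```

Because the lookup happens through getattr at call time, a subclass (or a later edit of the class) only needs to define a method with the matching name; no registry has to be updated.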

Ned Batchelder · Accepted Answer · 2009-08-28 21:36:41Z

1

If those components will always be there, then a regex will do the trick:

test = '''Authorization: Digest qop="chap", realm="[email protected]", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''

import re

re_auth = re.compile(r"""
    Authorization:\s*(?P<protocol>[^ ]+)\s+
    qop="(?P<qop>[^"]+)",\s+
    realm="(?P<realm>[^"]+)",\s+
    username="(?P<username>[^"]+)",\s+
    response="(?P<response>[^"]+)",\s+
    cnonce="(?P<cnonce>[^"]+)"
    """, re.VERBOSE)

m = re_auth.match(test)
print(m.groupdict())

produces:

{ 'username': 'Foobear', 
  'protocol': 'Digest', 
  'qop': 'chap', 
  'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 
  'realm': '[email protected]', 
  'response': '6629fae49393a05397450978507c4ef1'
}

answered Aug 28, 2009 at 21:36

Ned Batchelder


1 Comment

Kris Walker Over a year ago

This solution produces correct results as far as I've been able to see.

Piotr Czapla · Accepted Answer · 2009-08-28 21:38:11Z

I would recommend finding a proper library for parsing HTTP headers. Unfortunately, I can't recall any. :(

In the meantime, check the snippet below (it should mostly work):

header = """
 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foob,ear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""

field, sep, value = header.partition(":")
if field.endswith('Authorization'):
    protocol, sep, opts_str = value.strip().partition(" ")

    opts = {}
    # Splitting on ",\n" relies on one parameter per line, so a comma
    # inside a value (as in username above) does not break the parse.
    for opt in opts_str.split(",\n"):
        key, value = opt.strip().split('=', 1)  # values may contain '='
        key = key.strip(" ")
        value = value.strip(' "')
        opts[key] = value

    opts['protocol'] = protocol

    print(opts)

Community · Accepted Answer · 2017-05-23 12:30:26Z

1

Your original concept of using PyParsing would be the best approach. What you've implicitly asked for is something that requires a grammar... that is, a regular expression or simple parsing routine is always going to be brittle, and that sounds like it's something you're trying to avoid.

It appears that getting pyparsing on google app engine is easy: How do I get PyParsing set up on the Google App Engine?

So I'd go with that, and then implement the full HTTP authentication/authorization header support from rfc2617.

edited May 23, 2017 at 12:30

CommunityBot


answered Aug 28, 2009 at 21:42

Jason R. Coombs


3 Comments

Jason R. Coombs Over a year ago

I decided to take this approach and tried to implement a fully-compliant parser for the Authorization header using the RFC spec. This task appears to be much more daunting than I had anticipated. Your choice of the simple regex, while not rigorously correct, is probably the best pragmatic solution. I'll report back here if I eventually get a fully-functional header parser.

Kris Walker Over a year ago

Yeah, it would be nice to see something more rigorously correct.

biscuit314 Over a year ago

Hi Jason - if you're still looking, see my answer. PyParsing is amazing!

Community · Accepted Answer · 2021-10-07 06:09:01Z

1

The HTTP Digest Authorization header field is a bit of an odd beast. Its format is similar to that of RFC 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you're still looking for a library that's a little smarter and more readable than the regex, you might try removing the Authorization: Digest part with str.split() and parsing the rest with parse_dict_header() from Werkzeug's http module. (Werkzeug can be installed on App Engine.)

edited Oct 7, 2021 at 6:09

CommunityBot


answered May 14, 2010 at 0:13

ʇsәɹoɈ


1 Comment

Kris Walker Over a year ago

Thanks a lot. I may replace that regex with this. It seems more robust.

Brian McFarland · Accepted Answer · 2011-09-13 15:09:58Z

Nadia's regex only matches alphanumeric characters for the value of a parameter. That means it fails to parse at least two fields. Namely, the uri and qop. According to RFC 2617, the uri field is a duplicate of the string in the request line (i.e. the first line of the HTTP request). And qop fails to parse correctly if the value is "auth-int" due to the non-alphanumeric '-'.

This modified regex allows the URI (or any other value) to contain anything but ' ' (space), '"' (quote), or ',' (comma). That's probably more permissive than it needs to be, but shouldn't cause any problems with correctly formed HTTP requests.

reg = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')

Bonus tip: From there, it's fairly straightforward to convert the example code in RFC-2617 to Python. Using Python's md5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()" and "MD5Final()" becomes "m.digest()".
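
The md5 module mentioned in the tip only exists in Python 2; in Python 3 the same calls are available through hashlib. Below is a minimal sketch of the response calculation for the common case (MD5 algorithm, qop="auth"), assuming the formula from RFC 2617. md5_hex and digest_response are names chosen for this example, and the input values are the ones from the worked example in the RFC.

```python
import hashlib


def md5_hex(text):
    """Hex-encoded MD5 of a string, as used in Digest header values."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop):
    """Compute the request-digest for qop="auth" with the MD5 algorithm."""
    ha1 = md5_hex('{0}:{1}:{2}'.format(username, realm, password))
    ha2 = md5_hex('{0}:{1}'.format(method, uri))
    return md5_hex(':'.join([ha1, nonce, nc, cnonce, qop, ha2]))


# Values from the worked example in RFC 2617, section 3.5.
print(digest_response(
    username='Mufasa', realm='testrealm@host.com', password='Circle Of Life',
    method='GET', uri='/dir/index.html',
    nonce='dcd98b7102dd2f0e8b11d0f600bfb0c093',
    nc='00000001', cnonce='0a4f113b', qop='auth'))
```

With these inputs the function should print 6629fae49393a05397450978507c4ef1, the response value quoted in the RFC example. hexdigest() is used instead of digest() because the header carries the hash as a hex string rather than raw bytes.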

Pinochle · Accepted Answer · 2009-08-28 21:49:25Z

0

If your response comes in a single string that never varies and has as many lines as there are expressions to match, you can split it on the newlines into an array called authentication_array and use regexps:

import re

pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
parsed_dict = {}

# Pair each line with the parameter name expected on that line.
for name, line in zip(pattern_array, authentication_array):
    pattern = "(" + name + ")" + "=(\".*\")"  # build a matching pattern
    match = re.search(pattern, line)          # make the match
    if match:
        # Note: group(2) still includes the surrounding double quotes.
        parsed_dict[match.group(1)] = match.group(2)

edited Aug 28, 2009 at 21:49

answered Aug 28, 2009 at 21:38

Pinochle
