14

I need to take a header like this:

 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"

And parse it into this using Python:

{'protocol':'Digest',
  'qop':'chap',
  'realm':'[email protected]',
  'username':'Foobear',
  'response':'6629fae49393a05397450978507c4ef1',
  'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}

Is there a library to do this, or something I could look at for inspiration?

I'm doing this on Google App Engine, and I'm not sure if the Pyparsing library is available, but maybe I could include it with my app if it is the best solution.

Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It's working, but very fragile.

Brilliant solution by nadia below:

import re

reg = re.compile('(\w+)[=] ?"?(\w+)"?')

s = """Digest
realm="stackoverflow.com", username="kixx"
"""

print str(dict(reg.findall(s)))
2
  • So far this solution has proven on only to be super clean, but also very robust. While not the most "by the book" implementation of the RFC, I've yet to build a test case that returns invalid values. However, I am only using this to parse the Authorization header, nonce of the other headers I'm interested in need parsing, so this may not be a good solution as a general HTTP header parser. Commented Sep 4, 2009 at 11:35
  • I came here looking for an full-fledged RFC-ified parser. Your question and the answer by @PaulMcG got me on the right path (see my answer below). Thank you both! Commented Sep 23, 2018 at 2:02

10 Answers 10

14

A little regex:

import re
reg=re.compile('(\w+)[:=] ?"?(\w+)"?')

>>>dict(reg.findall(headers))

{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}
Sign up to request clarification or add additional context in comments.

6 Comments

Wow, I love Python. "Authorization:" is not actually part of the header string, so I did this instead: #! /usr/bin/env python import re def mymain(): reg = re.compile('(\w+)[=] ?"?(\w+)"?') s = """Digest realm="fireworksproject.com", username="kristoffer" """ print str(dict(reg.findall(s))) if name == 'main': mymain() I'm not getting the "Digest" protocol declaration, but I don't need it anyway. Essentially 3 lines of code... Brilliant!!!
I think it would be more explicit to use a raw string or \\.
Actually the " aren't mandatory (algorithm for example usually doesn't delimit its value with ") and a value itself can contain escaped ". "? is a bit risky =) (I asked the same question for PHP.)
More tolerant version: re.compile(r'(\w+)[:=][\s"]?([^",]+)"?')
In my case (Docker API) this regex pattern is better: ([^\s,]+) ?[=] ?"?([^\s,"]+)"? because the parameters values may contain : and . for example: Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:bitnamicharts/nginx:pull",error="insufficient_scope"
|
10

You can also use urllib2 as [CheryPy][1] does.

here is the snippet:

input= """
 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2
field, sep, value = input.partition("Authorization: Digest ")
if value:
    items = urllib2.parse_http_list(value)
    opts = urllib2.parse_keqv_list(items)
    opts['protocol'] = 'Digest'
    print opts

it outputs:

{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': '[email protected]', 'response': '6629fae49393a05397450978507c4ef1'}

[1]: https://web.archive.org/web/20130118133623/http://www.google.com:80/codesearch/p?hl=en#OQvO9n2mc04/CherryPy-3.0.1/cherrypy/lib/httpauth.py&q=Authorization Digest http lang:python

2 Comments

In Python 3, these functions still exist (though they aren't documented) but they're in urllib.request instead of urllib2
Warning: urllib.request is one of the heaviest imports in the Python standard library. If you’re just using these two functions it might not be worth it.
3

Here's my pyparsing attempt:

text = """Authorization: Digest qop="chap",
    realm="[email protected]",     
    username="Foobear",     
    response="6629fae49393a05397450978507c4ef1",     
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """

from pyparsing import *

AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)

valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict

print authentry.parseString(text).dump()

which prints:

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', '[email protected]'],
 ['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
 ['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: [email protected]
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear

I'm not familiar with the RFC, but I hope this gets you rolling.

1 Comment

This solution is the use of pyparsing that I was originally thinking of, and, as far as I can tell, it produces nice results.
2

An older question but one I found very helpful.

(edit after recent upvote) I've created a package that builds on this answer (link to tests to see how to use the class in the package).

pip install authparser

I needed a parser to handle any properly formed Authorization header, as defined by RFC7235 (raise your hand if you enjoy reading ABNF).

Authorization = credentials

BWS = <BWS, see [RFC7230], Section 3.2.3>

OWS = <OWS, see [RFC7230], Section 3.2.3>

Proxy-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS
 challenge ] )
Proxy-Authorization = credentials

WWW-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge
 ] )

auth-param = token BWS "=" BWS ( token / quoted-string )
auth-scheme = token

challenge = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *(
 OWS "," [ OWS auth-param ] ) ] ) ]
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param )
 *( OWS "," [ OWS auth-param ] ) ] ) ]

quoted-string = <quoted-string, see [RFC7230], Section 3.2.6>

token = <token, see [RFC7230], Section 3.2.6>
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" )
 *"="

Starting with PaulMcG's answer, I came up with this:

import pyparsing as pp

tchar = '!#$%&\'*+-.^_`|~' + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas

token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

scheme = token('scheme')

header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

credentials = scheme + (token68('token') ^ params('params'))

auth_parser = header + pp.Suppress(':') + credentials

This allows for parsing any Authorization header:

parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))

which outputs:

Authenticating with Basic scheme, token: Zm9vOmJhcg==

Bringing it all together into an Authenticator class:

import pyparsing as pp
from base64 import b64decode
import re

class Authenticator:
    def __init__(self):
        """
        Use pyparsing to create a parser for Authentication headers
        """
        tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
        t68char = '-._~+/' + pp.nums + pp.alphas

        token = pp.Word(tchar)
        token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

        scheme = token('scheme')

        auth_header = pp.Keyword('Authorization')
        name = pp.Word(pp.alphas, pp.alphanums)
        value = pp.quotedString.setParseAction(pp.removeQuotes)
        name_value_pair = name + pp.Suppress('=') + value
        params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

        credentials = scheme + (token68('token') ^ params('params'))

        # the moment of truth...
        self.auth_parser = auth_header + pp.Suppress(':') + credentials


    def authenticate(self, auth_header):
        """
        Parse auth_header and call the correct authentication handler
        """
        authenticated = False
        try:
            parsed = self.auth_parser.parseString(auth_header)
            scheme = parsed['scheme']
            details = parsed['token'] if 'token' in parsed.keys() else parsed['params']

            print('Authenticating using {0} scheme'.format(scheme))
            try:
                safe_scheme = re.sub("[!#$%&'*+-.^_`|~]", '_', scheme.lower())
                handler = getattr(self, 'auth_handle_' + safe_scheme)
                authenticated = handler(details)
            except AttributeError:
                print('This is a valid Authorization header, but we do not handle this scheme yet.')

        except pp.ParseException as ex:
            print('Not a valid Authorization header')
            print(ex)

        return authenticated


    # The following methods are fake, of course.  They should use what's passed
    # to them to actually authenticate, and return True/False if successful.
    # For this demo I'll just print some of the values used to authenticate.
    @staticmethod
    def auth_handle_basic(token):
        print('- token is {0}'.format(token))
        try:
            username, password = b64decode(token).decode().split(':', 1)
        except Exception:
            raise DecodeError
        print('- username is {0}'.format(username))
        print('- password is {0}'.format(password))
        return True

    @staticmethod
    def auth_handle_bearer(token):
        print('- token is {0}'.format(token))
        return True

    @staticmethod
    def auth_handle_digest(params):
        print('- username is {0}'.format(params['username']))
        print('- cnonce is {0}'.format(params['cnonce']))
        return True

    @staticmethod
    def auth_handle_aws4_hmac_sha256(params):
        print('- Signature is {0}'.format(params['Signature']))
        return True

To test this class:

tests = [
    'Authorization: Digest qop="chap", realm="[email protected]", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
    'Authorization: Bearer cn389ncoiwuencr',
    'Authorization: Basic Zm9vOmJhcg==',
    'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
    'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]

authenticator = Authenticator()

for test in tests:
    authenticator.authenticate(test)
    print()

Which outputs:

Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41

Authenticating using Bearer scheme
- token is cn389ncoiwuencr

Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar

Authenticating using AWS4-HMAC-SHA256 scheme
- signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024

Authenticating using CrazyCustom scheme 
This is a valid Authorization header, but we do not handle this scheme yet.

In future if we wish to handle CrazyCustom we'll just add

def auth_handle_crazycustom(params):

Comments

1

If those components will always be there, then a regex will do the trick:

test = '''Authorization: Digest qop="chap", realm="[email protected]", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''

import re

re_auth = re.compile(r"""
    Authorization:\s*(?P<protocol>[^ ]+)\s+
    qop="(?P<qop>[^"]+)",\s+
    realm="(?P<realm>[^"]+)",\s+
    username="(?P<username>[^"]+)",\s+
    response="(?P<response>[^"]+)",\s+
    cnonce="(?P<cnonce>[^"]+)"
    """, re.VERBOSE)

m = re_auth.match(test)
print m.groupdict()

produces:

{ 'username': 'Foobear', 
  'protocol': 'Digest', 
  'qop': 'chap', 
  'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 
  'realm': '[email protected]', 
  'response': '6629fae49393a05397450978507c4ef1'
}

1 Comment

This solution produces correct results as far as I've been able to see.
1

I would recommend finding a correct library for parsing http headers unfortunately I can't reacall any. :(

For a while check the snippet below (it should mostly work):

input= """
 Authorization: Digest qop="chap",
     realm="[email protected]",
     username="Foob,ear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""

field, sep, value = input.partition(":")
if field.endswith('Authorization'):
   protocol, sep, opts_str = value.strip().partition(" ")

   opts = {}
   for opt in opts_str.split(",\n"):
        key, value = opt.strip().split('=')
        key = key.strip(" ")
        value = value.strip(' "')
        opts[key] = value

   opts['protocol'] = protocol

   print opts

Comments

1

Your original concept of using PyParsing would be the best approach. What you've implicitly asked for is something that requires a grammar... that is, a regular expression or simple parsing routine is always going to be brittle, and that sounds like it's something you're trying to avoid.

It appears that getting pyparsing on google app engine is easy: How do I get PyParsing set up on the Google App Engine?

So I'd go with that, and then implement the full HTTP authentication/authorization header support from rfc2617.

3 Comments

I decided to take this approach and tried to implement a fully-compliant parser for the Authorization header using the RFC spec. This task appears to be much more daunting than I had anticpated. Your choice of the simple regex, while not rigorously correct, is probably the best pragmatic solution. I'll report back here if I eventually get a fully-functional header parser.
Yeah, it would be nice to see something more rigorously correct.
Hi Jason - if you're still looking, see my answer. PyParsing is amazing!
1

The http digest Authorization header field is a bit of an odd beast. Its format is similar to that of rfc 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you're still looking for a library that's a little smarter and more readable than the regex, you might try removing the Authorization: Digest part with str.split() and parsing the rest with parse_dict_header() from Werkzeug's http module. (Werkzeug can be installed on App Engine.)

1 Comment

Thanks a lot. I may replace that regex with this. It seems more robust.
1

Nadia's regex only matches alphanumeric characters for the value of a parameter. That means it fails to parse at least two fields. Namely, the uri and qop. According to RFC 2617, the uri field is a duplicate of the string in the request line (i.e. the first line of the HTTP request). And qop fails to parse correctly if the value is "auth-int" due to the non-alphanumeric '-'.

This modified regex allows the URI (or any other value) to contain anything but ' ' (space), '"' (qoute), or ',' (comma). That's probably more permissive than it needs to be, but shouldn't cause any problems with correctly formed HTTP requests.

reg re.compile('(\w+)[:=] ?"?([^" ,]+)"?')

Bonus tip: From there, it's fairly straight forward to convert the example code in RFC-2617 to python. Using python's md5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()" and "MD5Final()" becomes "m.digest()".

Comments

0

If your response comes in a single string that that never varies and has as many lines as there are expressions to match, you can split it into an array on the newlines called authentication_array and use regexps:

pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
i = 0
parsed_dict = {}

for line in authentication_array:
    pattern = "(" + pattern_array[i] + ")" + "=(\".*\")" # build a matching pattern
    match = re.search(re.compile(pattern), line)         # make the match
    if match:
        parsed_dict[match.group(1)] = match.group(2)
    i += 1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.