Reverse of Python string formatting: generating a dict from a string with named parameters

Question

I have a string like this, where symbol and property vary:

a = '/stock/%(symbol)s/%(property)s'

I have another string like this, where AAPL and price vary:

b = '/stock/AAPL/price'

I'm trying to generate a dict like this:

c = {
    'symbol': 'AAPL',
    'property': 'price'
}

With string formatting, I can do this:

>>> a % c == b
True

But I'm trying to go the other direction. Time for some regex magic?

Are you sure you don't want your dictionary to be D = {'AAPL': price} so you can look up price by symbol? Otherwise you will need a new dictionary for each stock.
– beroe, Commented Aug 21, 2013 at 16:26
I'm assuming (unlike other answers so far) that your first string doesn't necessarily say symbol and/or property, e.g., it might read /zog/%(evil)s=%(level)s,%(flavor)s. Is that the case?
– torek, Commented Aug 21, 2013 at 16:33
Do you have control of the format of a? If you use a more modern interpolation style, certain things become easier.
– DSM, Commented Aug 21, 2013 at 17:50
@DSM I might be able to control it. What format would be easier?
– nathancahill, Commented Aug 21, 2013 at 17:51
By control, I mean the %(symbol)s part. The slashes aren't changeable.
– nathancahill, Commented Aug 21, 2013 at 17:52

moliware · Accepted Answer · 2013-08-21 16:32:11Z

4

A solution with regular expressions:

>>> import re
>>> b = '/stock/AAPL/price'
>>> result = re.match('/.*?/(?P<symbol>.*?)/(?P<property>.*)', b)
>>> result.groupdict()
{'symbol': 'AAPL', 'property': 'price'}

You can refine the regular expression further, but in essence, this is the idea.
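One way to tighten it, assuming values never contain a slash: use [^/]+ so each group stops at the next '/', and re.fullmatch (Python 3.4+) so a path with extra segments is rejected instead of being swallowed by property. parse_stock_path is a hypothetical helper name, and the '/stock/' prefix is taken from the question's template.

```python
import re

# Each group takes one or more non-slash characters; fullmatch anchors both ends.
PATTERN = re.compile(r'/stock/(?P<symbol>[^/]+)/(?P<property>[^/]+)')


def parse_stock_path(path):
    """Return {'symbol': ..., 'property': ...}, or None if path doesn't fit."""
    m = PATTERN.fullmatch(path)
    return m.groupdict() if m else None


print(parse_stock_path('/stock/AAPL/price'))        # {'symbol': 'AAPL', 'property': 'price'}
print(parse_stock_path('/stock/AAPL/price/extra'))  # None
```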

answered Aug 21, 2013 at 16:32

moliware

Ashwini Chaudhary · Accepted Answer · 2013-08-21 17:43:02Z

2

This is similar to @moliware's solution, but it doesn't require hard-coding the keys:

import re

class mydict(dict):
    # Called for every key not yet in the dict: record it and format it as ''.
    def __missing__(self, key):
        self.setdefault(key, '')
        return ''

def solve(a, b):
    dic = mydict()
    a % dic  # formatting looks up each key once, so dic ends up holding all of them
    strs = a
    for x in dic:
        esc = re.escape(x)
        # Replace %(key)<conversion char> with a named capture group.
        strs = re.sub(r'(%\({}\).)'.format(esc), '(?P<{}>.*)'.format(esc), strs)
    return re.search(strs, b).groupdict()

if __name__ == '__main__':
    a = '/stock/%(symbol)s/%(property)s'
    b = '/stock/AAPL/price'
    print(solve(a, b))
    a = "Foo %(bar)s spam %(eggs)s %(python)s"
    b = 'Foo BAR spam 10 3.x'
    print(solve(a, b))

Output:

{'symbol': 'AAPL', 'property': 'price'}
{'bar': 'BAR', 'eggs': '10', 'python': '3.x'}

As @torek pointed out, when the output is ambiguous (no separator between keys), the answer can be wrong.

For example:

a = 'leading/%(A)s%(B)s/trailing'
b = 'leading/helloworld/trailing'

Looking at b alone, there is no way to tell the actual values of A and B.
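A sketch that keeps the idea of building the regex from a but closes a few gaps, assuming Python 3.4+ (for fullmatch). Literal text in a goes through re.escape, so a '.' in the template only matches a '.'. Keys are read from the template with a regex instead of a % dic, so %(n)d placeholders don't raise TypeError. A repeated key becomes a backreference, so every copy must agree. Finally, greedy and lazy matches are compared to flag ambiguous input like the example above. The placeholder pattern, the function names, and the "no %% and no width/precision flags" restriction are assumptions of this sketch, and captured values are always strings.

```python
import re

# A placeholder such as %(symbol)s or %(count)d. Assumption: names are
# identifiers, there are no width/precision flags (e.g. %(n)5d), and the
# template contains no literal '%%'.
_PLACEHOLDER = re.compile(r'%\((\w+)\)[a-zA-Z]')


def template_to_regex(template, group=r'.*'):
    """Compile a %-style template into a regex.

    Literal text is escaped, the first occurrence of each name becomes a
    named group using the `group` sub-pattern, and later occurrences become
    a backreference so all copies must match the same text.
    """
    parts, pos, seen = [], 0, set()
    for m in _PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            parts.append('(?P={})'.format(name))
        else:
            parts.append('(?P<{}>{})'.format(name, group))
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile(''.join(parts))


def reverse_format(template, s):
    """Return (values, ambiguous), or (None, False) if s can't come from template.

    ambiguous is True when greedy and lazy matching disagree, meaning at least
    two different dicts reproduce s. Agreement does not prove uniqueness.
    """
    greedy = template_to_regex(template, r'.*').fullmatch(s)
    if greedy is None:
        return None, False
    lazy = template_to_regex(template, r'.*?').fullmatch(s)
    return greedy.groupdict(), greedy.groupdict() != lazy.groupdict()


print(reverse_format('/stock/%(symbol)s/%(property)s', '/stock/AAPL/price'))
print(reverse_format('leading/%(A)s%(B)s/trailing', 'leading/helloworld/trailing'))
```

For the ambiguous example, this returns ({'A': 'helloworld', 'B': ''}, True): the greedy assignment is the one solve() picks, and the True flag says a different split also reproduces b.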

edited Aug 21, 2013 at 17:43

answered Aug 21, 2013 at 17:09

Ashwini Chaudhary

5 Comments

torek Over a year ago

Note: you'll need dic=mydict() and I get '/stock/None/None' as the value in the call used to populate dic. I'd just do dic = collections.defaultdict(str) though. (Oops, you fixed the missing dic= part while I was typing the comment.)

torek Over a year ago

BTW this works really well (it's the way to go here) but there are indistinguishable variations for which this just picks "any solution that works", e.g., a = 'leading/%(A)s%(B)s/trailing' and b = 'leading/helloworld/trailing'. This chooses A='helloworld' and B=''. (And if string b cannot be generated by format a regardless of dictionary values, the re.search() returns None.)

Ashwini Chaudhary Over a year ago

@torek Good test case. I think this one can be considered ambiguous too, because B can be either '' or 'helloworld' and A can be '' or 'helloworld' (so a space or some other character is required between two keys to get the correct answer). Another issue is that returning a str for missing keys would raise an error for %d or other directives; I am not sure how to fix that.

Ashwini Chaudhary Over a year ago

@torek I think I could use a couple of try-except blocks to catch those type mismatch errors and pass some other default value.

nathancahill Over a year ago

The last example won't be a problem, there will always be some sort of delimiter between keys.

tdelaney · Accepted Answer · 2013-08-21 16:58:11Z

2

Assuming well-behaved input, you could just split the string and zip the pieces into a dict:

keys = ('symbol', 'property')
b = '/stock/AAPL/price'
# b.split('/') gives ['', 'stock', 'AAPL', 'price']; [2:4] keeps the two variable parts
dict(zip(keys, b.split('/')[2:4]))

answered Aug 21, 2013 at 16:58

tdelaney

4 Comments

Kirk Strauser Over a year ago

I came up with the letter-for-letter same solution. str.split is almost always going to be many times more time-efficient than the re-based equivalent.

tdelaney Over a year ago

@KirkStrauser - yeah, there are a hundred ways to parse strings, but I like the simple solutions.

torek Over a year ago

As long as it's always slashes, and slashes don't appear in the output from some key(s). If the output might begin with, e.g., /nyse/stock/ (vs say /ftse/stock/ and just /stock/) sometimes, you'd need to adjust the indices too. In short, much depends on input constraints.

tdelaney Over a year ago

@torek - agreed. As more details of the input are learned, the script could be updated. But split and the dict constructor are fast, so it's a good start.
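A sketch in the same split-based spirit that addresses torek's point by deriving the positions from the template instead of hard-coding [2:4]. It assumes each placeholder fills an entire '/'-separated segment and that values never contain '/'. Literal segments must match exactly, so '/nyse/stock/...' is rejected rather than silently mis-parsed. split_parse is a hypothetical name.

```python
import re

# A whole segment that is exactly one placeholder, e.g. '%(symbol)s'.
_SEGMENT = re.compile(r'%\((\w+)\)[a-zA-Z]')


def split_parse(template, s, sep='/'):
    """Map placeholder names in template to the matching segments of s.

    Returns None if the segment counts differ or a literal segment differs.
    """
    t_parts = template.split(sep)
    s_parts = s.split(sep)
    if len(t_parts) != len(s_parts):
        return None
    result = {}
    for t, v in zip(t_parts, s_parts):
        m = _SEGMENT.fullmatch(t)
        if m:
            result[m.group(1)] = v  # placeholder segment: capture the value
        elif t != v:
            return None  # literal segment must match exactly
    return result


print(split_parse('/stock/%(symbol)s/%(property)s', '/stock/AAPL/price'))
print(split_parse('/stock/%(symbol)s/%(property)s', '/nyse/stock/AAPL/price'))  # None
```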
