1

I have a string like this, where symbol and property vary:

a = '/stock/%(symbol)s/%(property)s'

I have another string like this, where AAPL and price vary:

b = '/stock/AAPL/price'

I'm trying to generate a dict like this:

c = {
    'symbol': 'AAPL',
    'property': 'price'
}

With string formatting, I could do this:

> a % c == b
True

But I'm trying to go the other direction. Time for some regex magic?

8
  • Are you sure you don't want your dictionary to be D = {'AAPL': price} so you can look up price by symbol? Otherwise you will need a new dictionary for each stock. Commented Aug 21, 2013 at 16:26
  • I'm assuming (unlike other answers so far) that your first string doesn't necessarily say symbol and/or property, e.g., it might read /zog/%(evil)s=%(level)s,%(flavor)s. Is that the case? Commented Aug 21, 2013 at 16:33
  • Do you have control of the format of a? If you use a more modern interpolation style, certain things become easier. Commented Aug 21, 2013 at 17:50
  • @DSM I might be able to control it. What format would be easier? Commented Aug 21, 2013 at 17:51
  • By control, I mean the %(symbol)s part. The slashes aren't changeable. Commented Aug 21, 2013 at 17:52
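
The comment above doesn't say which interpolation style is meant. Assuming it refers to str.format-style fields such as '/stock/{symbol}/{property}', here is a minimal sketch of why that could be easier: the standard library's string.Formatter can split such a template into literal text and field names, which can then be turned into a regular expression. The function name template_to_pattern is hypothetical, and the sketch assumes every field is named and that values never contain a slash.

```python
import re
from string import Formatter

def template_to_pattern(template):
    """Build a compiled regex from a str.format-style template (hypothetical helper)."""
    parts = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    for literal, field, _spec, _conv in Formatter().parse(template):
        parts.append(re.escape(literal))
        if field is not None:
            # Assumption: fields are named and their values contain no '/'.
            parts.append('(?P<{}>[^/]+)'.format(field))
    return re.compile(''.join(parts))

template = '/stock/{symbol}/{property}'
path = '/stock/AAPL/price'
match = template_to_pattern(template).fullmatch(path)
fields = match.groupdict() if match else None
print(fields)
```

If the match succeeds, template.format(**fields) should reproduce the original path.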

3 Answers

4

A solution with regular expressions:

>>> import re
>>> b = '/stock/AAPL/price'
>>> result = re.match('/.*?/(?P<symbol>.*?)/(?P<property>.*)', b)
>>> result.groupdict()
{'symbol': 'AAPL', 'property': 'price'}

You can refine the regular expression further, but in essence this is the idea.
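
As one possible refinement, here is a sketch that anchors the pattern, pins the literal /stock/ prefix, and keeps each value from spanning a slash. It assumes the path always has the shape /stock/<symbol>/<property>; the names STOCK_PATH and parse_stock_path are made up for illustration.

```python
import re

# Assumption: the path is always /stock/<symbol>/<property>,
# and neither value contains a slash.
STOCK_PATH = re.compile(r'/stock/(?P<symbol>[^/]+)/(?P<property>[^/]+)')

def parse_stock_path(path):
    """Return the captured fields as a dict, or None if the path does not match."""
    match = STOCK_PATH.fullmatch(path)
    return match.groupdict() if match else None

print(parse_stock_path('/stock/AAPL/price'))
print(parse_stock_path('/bond/AAPL/price'))  # prefix differs, so no match
```

Returning None on a mismatch avoids the AttributeError that calling groupdict() on a failed match would raise.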

2

This is similar to @moliware's solution, but it doesn't require hard-coding the keys:

import re

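class mydict(dict):
    # Record every key the template asks for and answer with an empty string.
    def __missing__(self, key):
        self.setdefault(key, '')
        return ''

def solve(a, b):
    dic = mydict()
    a % dic  # formatting against the empty dict collects the template's keys
    strs = a
    for x in dic:
        esc = re.escape(x)
        # Turn each %(key)s placeholder into a named capture group.
        strs = re.sub(r'(%\({}\).)'.format(esc), '(?P<{}>.*)'.format(esc), strs)
    # Note: re.search returns None if b cannot be produced from a.
    return re.search(strs, b).groupdict()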

if __name__ == '__main__':
    a = '/stock/%(symbol)s/%(property)s'
    b = '/stock/AAPL/price'
    print(solve(a, b))
    a = "Foo %(bar)s spam %(eggs)s %(python)s"
    b = 'Foo BAR spam 10 3.x'
    print(solve(a, b))

Output:

{'symbol': 'AAPL', 'property': 'price'}
{'bar': 'BAR', 'eggs': '10', 'python': '3.x'}

As @torek pointed out, for cases with ambiguous output (no separator between keys), the answer can be wrong.

For example:

a = 'leading/%(A)s%(B)s/trailing'
b = 'leading/helloworld/trailing'

Here, looking at just b, it's impossible to tell the actual value of either A or B.
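
A small sketch of that ambiguity, using a pattern equivalent to the one solve() would build for this template. Greedy and non-greedy groups choose different splits, and both splits reproduce b when formatted back.

```python
import re

a = 'leading/%(A)s%(B)s/trailing'
b = 'leading/helloworld/trailing'

# Greedy groups: the first group takes everything it can.
greedy = re.search(r'leading/(?P<A>.*)(?P<B>.*)/trailing', b).groupdict()
print(greedy)  # {'A': 'helloworld', 'B': ''}

# Non-greedy groups: the first group takes as little as it can.
lazy = re.search(r'leading/(?P<A>.*?)(?P<B>.*?)/trailing', b).groupdict()
print(lazy)    # {'A': '', 'B': 'helloworld'}

# Both answers are consistent with the template.
print(a % greedy == b, a % lazy == b)  # True True
```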

5 Comments

Note: you'll need dic=mydict() and I get '/stock/None/None' as the value in the call used to populate dic. I'd just do dic = collections.defaultdict(str) though. (Oops, you fixed the missing dic= part while I was typing the comment.)
BTW this works really well (it's the way to go here) but there are indistinguishable variations for which this just picks "any solution that works", e.g., a = 'leading/%(A)s%(B)s/trailing' and b = 'leading/helloworld/trailing'. This chooses A='helloworld' and B=''. (And if string b cannot be generated by format a regardless of dictionary values, the re.search() returns None.)
@torek Good test case. I think this one can be considered ambiguous too, because either A or B can be '' or 'helloworld'. (So a space or some other character is required between two keys to get the correct answer.) Another issue is that returning a str for missing keys would raise an error for %d or other directives; I am not sure how to fix that.
@torek I think I could use a couple of try-except blocks to catch those type mismatch errors and pass some other default value.
The last example won't be a problem, there will always be some sort of delimiter between keys.
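
One way to sidestep both concerns raised in these comments (the %d directive and unescaped literal text) is to build the regular expression directly from the template instead of formatting it first. This is only a sketch under stated assumptions: it handles just the plain %(name)s and %(name)d forms with no flags or widths, it assumes slashes delimit the values, and it assumes each key appears once. The names FIELD, template_to_regex and unformat are hypothetical.

```python
import re

# Matches %(name)s or %(name)d; flags, widths and other conversions are not handled.
FIELD = re.compile(r'%\((\w+)\)([sd])')

def template_to_regex(template):
    """Escape the literal text and turn each placeholder into a named group."""
    parts = []
    pos = 0
    for m in FIELD.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))
        name, conv = m.groups()
        # Assumption: %d values are integers, %s values contain no '/'.
        value = r'-?\d+' if conv == 'd' else r'[^/]+'
        parts.append('(?P<{}>{})'.format(name, value))
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile(''.join(parts))

def unformat(template, text):
    """Return the captured values as strings, or None if text does not fit."""
    m = template_to_regex(template).fullmatch(text)
    return m.groupdict() if m else None

print(unformat('/stock/%(symbol)s/%(property)s', '/stock/AAPL/price'))
print(unformat('/item/%(id)d', '/item/42'))
```

Captured values are always strings, so a %d field would still need an int() call before being formatted back into the template.
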
2

Assuming well-behaved input, you could just split the string and zip the pieces into a dict:

keys = ('symbol', 'property')
b = '/stock/AAPL/price'
dict(zip(keys, b.split('/')[2:4]))

4 Comments

I came up with the letter-for-letter same solution. str.split is almost always going to be many times more time-efficient than the re-based equivalent.
@KirkStrauser - yeah, there's a hundred ways to parse strings, but I like the simple solutions.
As long as it's always slashes, and slashes don't appear in the output from some key(s). If the output might begin with, e.g., /nyse/stock/ (vs say /ftse/stock/ and just /stock/) sometimes, you'd need to adjust the indices too. In short, much depends on input constraints.
@torek - agreed. As more details of the input are learned, the script could be updated. But split and the dict constructor are fast, so it's a good start.
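
Following up on the constraints discussed in these comments, here is a sketch of a split-based variant that reads the keys from the template instead of hard-coding them and rejects paths whose literal segments differ. It assumes each placeholder occupies a whole slash-delimited segment and has the plain %(name)s form; unformat_by_split is a made-up name.

```python
def unformat_by_split(template, path, sep='/'):
    """Pair up template and path segments; return a dict, or None on mismatch."""
    t_parts = template.split(sep)
    p_parts = path.split(sep)
    if len(t_parts) != len(p_parts):
        return None  # e.g. an extra leading /nyse/ segment
    result = {}
    for t, p in zip(t_parts, p_parts):
        if t.startswith('%(') and t.endswith(')s'):
            result[t[2:-2]] = p  # strip '%(' and ')s' to get the key
        elif t != p:
            return None  # a literal segment does not match
    return result

a = '/stock/%(symbol)s/%(property)s'
print(unformat_by_split(a, '/stock/AAPL/price'))
print(unformat_by_split(a, '/nyse/stock/AAPL/price'))  # segment count differs
```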
