13

I'm creating a class that renames a file using a user-specified format. This format will be a simple string whose str.format method will be called to fill in the blanks.

It turns out that my procedure will require extracting the variable names contained in braces. For example, a string may contain {user}, which should yield user. Of course, there may be several sets of braces in a single string, and I'll need to collect the contents of each into a list, in the order in which they appear.

Thus, "{foo}{bar}" should yield ['foo', 'bar'].

I suspect that the easiest way to do this is to use re.split, but I know nothing about regular expressions. Can someone help me out?

Thanks in advance!

1
  • If you know all possible variables beforehand, you can just pass them all to str.format; it ignores keyword arguments that do not appear in the pattern. '{user}_{bar}'.format(user='Mike', foo=1, bar=2) outputs Mike_2. I happened to have the allowed variables fixed in a dict, so I could skip looking for variables in the pattern. Anyway, knowing about string.Formatter() is useful. Commented Mar 11, 2013 at 10:10
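A minimal sketch of the approach in the comment above, assuming the allowed variables live in a dict. The dict contents and the names allowed, pattern, and missing are made up for illustration. It also shows the flip side: a field that is not in the dict raises KeyError.

```python
# Hypothetical values for illustration; only keys named in the pattern are used.
allowed = {"user": "Mike", "foo": 1, "bar": 2}
pattern = "{user}_{bar}"

# str.format ignores keyword arguments that the pattern never mentions.
result = pattern.format(**allowed)
print(result)  # Mike_2

# A field name that is missing from the dict raises KeyError instead.
missing = None
try:
    "{user}_{baz}".format(**allowed)
except KeyError as exc:
    missing = exc.args[0]
    print("missing field:", missing)
```

This only helps when the set of variables is fixed in advance; if the pattern can name arbitrary fields, they still have to be extracted first.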

2 Answers

70

Another possibility is to use Python's actual Formatter itself to extract the field names for you:

>>> import string
>>> s = "{foo} spam eggs {bar}"
>>> string.Formatter().parse(s)
<formatteriterator object at 0x101d17b98>
>>> list(string.Formatter().parse(s))
[('', 'foo', '', None), (' spam eggs ', 'bar', '', None)]
>>> field_names = [name for text, name, spec, conv in string.Formatter().parse(s) if name is not None]
>>> field_names
['foo', 'bar']

or (shorter but less informative):

>>> field_names = [v[1] for v in string.Formatter().parse(s) if v[1] is not None]
>>> field_names
['foo', 'bar']

The is not None check is needed because literal text with no replacement field after it is reported with a field name of None:

>>> [v[1] for v in string.Formatter().parse("a{b}c")]
['b', None]
>>> [v[1] for v in string.Formatter().parse("a{b}c") if v[1] is not None]
['b']
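For use in a script rather than at the prompt, the same idea could be wrapped in a small function. This is only a sketch: the name extract_field_names is hypothetical, and it returns field names exactly as parse() reports them, so a field like {user.name} would come back as 'user.name' rather than 'user'.

```python
import string


def extract_field_names(template):
    """Return replacement-field names in order of appearance (hypothetical helper)."""
    return [
        field_name
        for _literal, field_name, _spec, _conversion in string.Formatter().parse(template)
        if field_name is not None  # skip literal-only chunks
    ]


names = extract_field_names("{foo} spam eggs {bar}")
print(names)  # ['foo', 'bar']

# Escaped braces, format specs and conversions are handled by the parser itself.
with_specs = extract_field_names("{{literal}} {foo:>10} {bar!r}")
print(with_specs)  # ['foo', 'bar']
```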

3 Comments

oooooh... I like this! I'll probably accept Ashwini Chaudhary's answer because I specifically asked for a regex solution, but I think I'll use yours since I understand it a bit better! Thank you!
Can this be modified to find %(name)s placeholders?
This is the right answer in my opinion. It uses the same mechanism as .format() does.
18

Using re.findall():

In [5]: import re

In [8]: strs = "{foo} spam eggs {bar}"

In [9]: re.findall(r"{(\w+)}", strs)
Out[9]: ['foo', 'bar']

3 Comments

Just a quick question. Are the results from re.findall guaranteed to be listed in the same order as they appear in the string?
@blz yes, as the string is parsed from left to right.
Beware, this does not account for format specifiers such as {spam:3f}. @DSM's answer should be the accepted one. Extending the \w to include more characters until it matches the full str.format syntax could work, but using the formatter itself is better (and not prone to breakage if the syntax evolves).
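To make the last comment concrete, here is a small comparison of the two answers on a string with a format specifier. The sample string and variable names are made up for illustration; the regex is the one from the answer above.

```python
import re
import string

sample = "{spam:3f} and {eggs}"

# The \w+ pattern cannot match the ':' in the first field, so that field is skipped.
regex_names = re.findall(r"{(\w+)}", sample)
print(regex_names)  # ['eggs']

# The formatter splits the name from the spec, so both fields are found.
formatter_names = [
    name
    for _text, name, _spec, _conv in string.Formatter().parse(sample)
    if name is not None
]
print(formatter_names)  # ['spam', 'eggs']
```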
