How can I find all placeholders for str.format in a python string using a regex? [duplicate]

Question

I'm creating a class that renames a file using a user-specified format. This format will be a simple string whose str.format method will be called to fill in the blanks.

It turns out that my procedure will require extracting variable names contained in braces. For example, a string may contain {user}, which should yield user. Of course, there will be several sets of braces in a single string, and I'll need to get the contents of each, in the order in which they appear and output them to a list.

Thus, "{foo}{bar}" should yield ['foo', 'bar'].

I suspect that the easiest way to do this is to use re.split, but I know nothing about regular expressions. Can someone help me out?

Thanks in advance!

If you know all possible variables beforehand, you can just pass them all to str.format; it ignores any that are not in the pattern. '{user}_{bar}'.format(user='Mike', foo=1, bar=2) outputs Mike_2. I happened to have the allowed variables fixed in a dict, so I could skip looking for variables in the pattern. Still, knowing about string.Formatter() is useful.
– yentsun, Commented Mar 11, 2013 at 10:10
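
A minimal sketch of the approach described in this comment, assuming the allowed variables live in a dict (the names allowed and pattern are made up for illustration). Extra keyword arguments are ignored, but a placeholder that is missing from the dict raises KeyError:

```python
# Hypothetical set of allowed variables; only some are used by the pattern.
allowed = {"user": "Mike", "foo": 1, "bar": 2}
pattern = "{user}_{bar}"

# Unused keys ("foo" here) are silently ignored by str.format.
result = pattern.format(**allowed)
print(result)  # Mike_2

# A placeholder that is not in the dict raises KeyError.
try:
    "{user}_{missing}".format(**allowed)
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]
print(missing_key)  # missing
```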

Markus Dutschke · Accepted Answer · 2024-01-30 18:08:35Z

70

Another possibility is to use Python's actual Formatter itself to extract the field names for you:

>>> import string
>>> s = "{foo} spam eggs {bar}"
>>> string.Formatter().parse(s)
<formatteriterator object at 0x101d17b98>
>>> list(string.Formatter().parse(s))
[('', 'foo', '', None), (' spam eggs ', 'bar', '', None)]
>>> field_names = [name for text, name, spec, conv in string.Formatter().parse(s) if name is not None]
>>> field_names
['foo', 'bar']

or (shorter but less informative):

>>> field_names = [v[1] for v in string.Formatter().parse(s) if v[1] is not None]
>>> field_names
['foo', 'bar']

The is not None check is needed because a string that ends in literal text yields a final tuple whose field name is None:

>>> [v[1] for v in string.Formatter().parse("a{b}c")]
['b', None]
>>> [v[1] for v in string.Formatter().parse("a{b}c") if v[1] is not None]
['b']
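
The snippets above could be wrapped in a small helper along these lines. This is only a sketch: the function name placeholder_names is made up here, and it relies on nothing beyond string.Formatter().parse as shown above.

```python
import string


def placeholder_names(template):
    """Return the field names found in template, in order of appearance."""
    return [
        field_name
        for _literal, field_name, _spec, _conversion
        in string.Formatter().parse(template)
        if field_name is not None  # chunks of pure literal text have no field
    ]


print(placeholder_names("{foo} spam eggs {bar}"))   # ['foo', 'bar']
print(placeholder_names("{user.name}_{size:>10}"))  # ['user.name', 'size']
print(placeholder_names("{{literal}} {x!r}"))       # ['x']
```

Note that the field name is returned as written, so attribute or index access such as {user.name} comes back as 'user.name', and an auto-numbered {} field comes back as an empty string. Escaped braces ({{ and }}) are treated as literal text and produce no field.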

edited Jan 30, 2024 at 18:08

Markus Dutschke

answered Dec 27, 2012 at 21:58

DSM


3 Comments

Louis Thibault Over a year ago

oooooh... I like this! I'll probably accept Ashwini Chaudhary's answer because I specifically asked for a regex solution, but I think I'll use yours since I understand it a bit better! Thank you!

TheDarkLord Over a year ago

Can this be modified to find %(name)s placeholders?
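
string.Formatter only parses brace-style fields, so it does not help with %(name)s placeholders directly. One possible workaround is a small regex; the sketch below is an assumption-laden simplification (the helper name is made up, and keys containing parentheses are not handled). It skips the escaped form %% so that "%%(name)s" is not reported as a placeholder.

```python
import re

# Match either an escaped "%%" (no key captured) or a "%(key)" mapping key.
_PERCENT_KEY = re.compile(r"%(?:%|\((?P<key>[^)]*)\))")


def percent_placeholder_names(template):
    """Return the mapping keys of %(key)-style placeholders, in order."""
    return [
        match.group("key")
        for match in _PERCENT_KEY.finditer(template)
        if match.group("key") is not None  # skip escaped "%%"
    ]


print(percent_placeholder_names("%(user)s_%(count)03d"))  # ['user', 'count']
print(percent_placeholder_names("100%% of %(x)s"))        # ['x']
print(percent_placeholder_names("%%(name)s"))             # []
```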

Gustavo Gonçalves Over a year ago

This is the right answer in my opinion. It uses the same mechanism as .format() does.

Ashwini Chaudhary · Accepted Answer · 2017-12-06 16:20:56Z

18

Using re.findall():

In [5]: import re

In [8]: strs = "{foo} spam eggs {bar}"

In [9]: re.findall(r"{(\w+)}", strs)
Out[9]: ['foo', 'bar']

edited Dec 6, 2017 at 16:20

answered Dec 27, 2012 at 21:48

Ashwini Chaudhary

3 Comments

Louis Thibault Over a year ago

Just a quick question. Are the results from re.findall guaranteed to be listed in the same order as they appear in the string?

Ashwini Chaudhary Over a year ago

@blz yes, as the string is parsed from left to right.

Gwenn Over a year ago

Beware, this does not account for format specifiers such as {spam:3f}. @DSM's answer should be the accepted one. Modifying the \w to include more characters until it matches the full spec of str.format could work, but using the formatter itself is better (and not prone to breakage if the syntax evolves)
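
To illustrate the point about format specifiers, here is a short side-by-side comparison of the two approaches on a string containing {spam:3f}. The variable names are chosen only for this example.

```python
import re
import string

template = "{spam:3f} and {eggs}"

# The regex from this answer only matches braces that contain word characters.
regex_names = re.findall(r"{(\w+)}", template)

# The Formatter-based approach separates the field name from the format spec.
formatter_names = [
    name
    for _literal, name, _spec, _conversion in string.Formatter().parse(template)
    if name is not None
]

print(regex_names)      # ['eggs']
print(formatter_names)  # ['spam', 'eggs']
```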
