Edit regex strings in Python using format method

Question

I want to develop a regex in Python where a component of the pattern is defined in a separate variable and combined into a single string on the fly using Python's .format() string method. A simplified example will help to clarify. I have a series of strings where the space between words may be represented by a space, an underscore, a hyphen, etc. As an example:

new referral
new-referal
new - referal
new_referral

I can define a regex string to match these possibilities as:

space_sep = r'[\s\-_]+'

(The hyphen is escaped to ensure it is not interpreted as defining a character range.)

I can now build a bigger regex to match the strings above using:

myRegexStr = "new{spc}referral".format(spc = space_sep)

The advantage of this method for me is that I need to define lots of reasonably complex regexes in which several commonly occurring strings may appear multiple times and in an unpredictable order; defining commonly used patterns beforehand makes the regexes easier to read and allows the strings to be edited very easily.

However, a problem occurs if I want to define the number of occurrences of other characters using the {m,n} or {n} structure. For example, to allow for a common typo in the spelling of 'referral', I need to allow either 1 or 2 occurrences of the letter 'r'. I can edit myRegexStr to the following:

myRegexStr = "new{spc}refer{1,2}al".format(spc = space_sep)

However, now all sorts of things break due to confusion over the use of curly braces (either a KeyError in the case of {1,2} or an IndexError: tuple index out of range in the case of {n}).
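
The two failures can be reproduced with a minimal sketch like the one below. It assumes the raw-string form of space_sep; try_format is a hypothetical helper introduced only for illustration and is not part of the original code.

```python
space_sep = r"[\s\-_]+"


def try_format(template):
    """Return the name of the exception str.format raises, or None on success.

    Hypothetical helper for illustration only.
    """
    try:
        template.format(spc=space_sep)
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None


# {1,2} is parsed as a replacement field named "1,2", which is looked up
# among the keyword arguments and not found.
print(try_format("new{spc}refer{1,2}al"))  # KeyError

# {2} is parsed as positional argument 2, but no positional arguments
# were passed to .format().
print(try_format("new{spc}refer{2}al"))    # IndexError
```

In both cases .format() treats the quantifier braces as a replacement field, so the error is raised before the regex engine ever sees the pattern.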

Is there a way to use the .format() string method to build longer regexes whilst still being able to define number of occurrences of characters using {n,m}?

hurlenko · Accepted Answer · 2020-04-13 14:07:26Z

You can double the { and } to escape them or you can use the old-style string formatting (% operator):

my_regex = "new{spc}refer{{1,2}}al".format(spc="hello")
my_regex_old_style = "new%(spc)srefer{1,2}al" % {"spc": "hello"}

print(my_regex)           # newhellorefer{1,2}al
print(my_regex_old_style) # newhellorefer{1,2}al
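
Applied to the pattern from the question, the same two approaches might look like the sketch below. It assumes space_sep is defined as a raw string and that a full match against each sample string is the intended check; the names pattern_new, pattern_old, and samples are illustrative.

```python
import re

space_sep = r"[\s\-_]+"

# Doubled braces come out of .format() as single literal braces.
pattern_new = "new{spc}refer{{1,2}}al".format(spc=space_sep)

# %-formatting ignores braces entirely (a literal % would need to be %%).
pattern_old = "new%(spc)srefer{1,2}al" % {"spc": space_sep}

# Both approaches produce the same regex source.
assert pattern_new == pattern_old == r"new[\s\-_]+refer{1,2}al"

regex = re.compile(pattern_new)
samples = ["new referral", "new-referal", "new - referal", "new_referral"]
matches = [bool(regex.fullmatch(s)) for s in samples]
print(matches)  # [True, True, True, True]
```

Note that the substituted value is inserted as-is: the brackets and backslashes inside space_sep are not re-parsed by .format(), so only braces written directly in the template need doubling.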

1 Comment

user1718097 Over a year ago

Perfect solution. Thanks.
