I want to develop a regex in Python where a component of the pattern is defined in a separate variable and combined to a single string on-the-fly using Python's .format() string method. A simplified example will help to clarify. I have a series of strings where the space between words may be represented by a space, an underscore, a hyphen etc. As an example:
new referral
new-referal
new - referal
new_referral
I can define a regex string to match these possibilities as:
space_sep = '[\s\-_]+'
(The hyphen is escaped to ensure it is not interpreted as defining a character range.)
I can now build a bigger regex to match the strings above using:
myRegexStr = "new{spc}referral".format(spc = space_sep)
The advantage of this method for me is that I need to define lots of reasonably complex regexes where there may be several different commonly-occurring stings that occur multiple times and in an unpredictable order; defining commonly-used patterns beforehand makes the regexes easier to read and allows the strings to be edited very easily.
However, a problem occurs if I want to define the number of occurrences of other characters using the {m,n} or {n} structure. For example, to allow for a common typo in the spelling of 'referral', I need to allow either 1 or 2 occurrences of the letter 'r'. I can edit myRegexStr to the following:
myRegexStr = "new{spc}refer{1,2}al".format(spc = space_sep)
However, now all sorts of things break due to confusion over the use of curly braces (either a KeyError in the case of {1,2} or an IndexError: tuple index out of range in the case of {n}).
Is there a way to use the .format() string method to build longer regexes whilst still being able to define number of occurrences of characters using {n,m}?