
I want to develop a regex in Python where a component of the pattern is defined in a separate variable and combined into a single string on the fly using Python's .format() string method. A simplified example will help to clarify. I have a series of strings where the space between words may be represented by a space, an underscore, a hyphen, etc. For example:

new referral
new-referal
new - referal
new_referral

I can define a regex string to match these possibilities as:

space_sep = r'[\s\-_]+'

(The hyphen is escaped to ensure it is not interpreted as defining a character range.)
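A quick sanity check of the separator pattern on its own, as a minimal sketch. It is written as a raw string (r'...') so that Python passes \s and \- to the regex engine unchanged instead of treating them as string escape sequences; Python 3.12 and later emit a SyntaxWarning for these in ordinary string literals. The list of sample separators is only illustrative.

```python
import re

# Raw string so the backslashes reach the regex engine unchanged
space_sep = r'[\s\-_]+'

# Illustrative separators: space, hyphen, spaced hyphen, underscore
for sep in [" ", "-", " - ", "_"]:
    assert re.fullmatch(space_sep, sep), sep

# The + quantifier means at least one separator character is required
assert re.fullmatch(space_sep, "") is None
```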

I can now build a bigger regex to match the strings above using:

myRegexStr = "new{spc}referral".format(spc=space_sep)
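A small sketch of what the combined pattern looks like and how it behaves on the four sample strings, assuming the raw-string version of space_sep. Only the correctly spelled "referral" samples match at this stage, which is what motivates the {1,2} quantifier discussed below.

```python
import re

space_sep = r'[\s\-_]+'

# Substitute the separator fragment into the larger pattern
myRegexStr = "new{spc}referral".format(spc=space_sep)
print(myRegexStr)  # new[\s\-_]+referral

samples = ["new referral", "new-referal", "new - referal", "new_referral"]
for s in samples:
    # True for 'new referral' and 'new_referral'; False for the 'referal' spellings
    print(repr(s), bool(re.fullmatch(myRegexStr, s)))
```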

The advantage of this method for me is that I need to define lots of reasonably complex regexes in which several commonly occurring strings may appear multiple times and in an unpredictable order. Defining the common patterns beforehand makes the regexes easier to read and allows them to be edited very easily.
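One way to organise the reusable pieces, sketched under the assumption that they live in a plain dict (the fragment names and the "ref no" pattern are hypothetical, chosen only for illustration). Because .format() accepts the same placeholder any number of times, **fragments fills every occurrence.

```python
import re

# Hypothetical library of reusable fragments (names are illustrative)
fragments = {
    "spc": r'[\s\-_]+',
    "num": r'\d+',
}

# The same placeholder can appear as often as needed, in any order
pattern = "ref{spc}no{spc}{num}".format(**fragments)
print(pattern)  # ref[\s\-_]+no[\s\-_]+\d+

assert re.fullmatch(pattern, "ref_no - 42")
```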

However, a problem occurs if I want to define the number of occurrences of other characters using the {m,n} or {n} structure. For example, to allow for a common typo in the spelling of 'referral', I need to allow either 1 or 2 occurrences of the letter 'r'. I can edit myRegexStr to the following:

myRegexStr = "new{spc}refer{1,2}al".format(spc=space_sep)

However, this now fails because .format() treats every pair of curly braces as a replacement field: {1,2} raises a KeyError, and {n} raises an IndexError ("tuple index out of range").
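To reproduce the failure in isolation, here is a minimal sketch using the two templates described above. .format() reads {1,2} as a keyword field literally named "1,2", and {2} as positional argument number 2, which does not exist.

```python
space_sep = r'[\s\-_]+'

errors = {}
for template in ["new{spc}refer{1,2}al", "new{spc}refer{2}al"]:
    try:
        template.format(spc=space_sep)
    except (KeyError, IndexError) as exc:
        # Record only the exception type; the message text varies by version
        errors[template] = type(exc).__name__
        print(template, "->", type(exc).__name__, exc)
```

Only the exception type is recorded because the wording of the IndexError message differs between Python versions.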

Is there a way to use the .format() string method to build longer regexes whilst still being able to define number of occurrences of characters using {n,m}?

1 Answer

You can double the { and } characters to escape them, or you can use old-style string formatting (the % operator):

my_regex = "new{spc}refer{{1,2}}al".format(spc="hello")
my_regex_old_style = "new%(spc)srefer{1,2}al" % {"spc": "hello"}

print(my_regex)           # newhellorefer{1,2}al
print(my_regex_old_style) # newhellorefer{1,2}al
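Applying the same idea to the regex from the question, as a sketch that assumes the raw-string space_sep: the doubled braces come out of .format() as a literal {1,2}, so the quantifier reaches the regex engine intact and all four sample strings match. The old-style version produces an identical string.

```python
import re

space_sep = r'[\s\-_]+'

# Doubled braces {{1,2}} come out of .format() as a literal {1,2}
myRegexStr = "new{spc}refer{{1,2}}al".format(spc=space_sep)
print(myRegexStr)  # new[\s\-_]+refer{1,2}al

# %-formatting leaves braces alone, so {1,2} needs no escaping here
old_style = "new%(spc)srefer{1,2}al" % {"spc": space_sep}
assert old_style == myRegexStr

samples = ["new referral", "new-referal", "new - referal", "new_referral"]
# r{1,2} accepts both "referal" and "referral"
assert all(re.fullmatch(myRegexStr, s) for s in samples)
```

With the % operator the trade-off is reversed: braces are safe, but any literal % in the pattern must be written as %%.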

1 Comment

Perfect solution. Thanks.
