What I am trying to achieve is to replace part of a string with the contents of a variable, using a Python regex. Since I need to retain some of the matched expression, I use the \1 and \3 group references.

My regex/sub looks like this:

pattern = "\1" + id + "\3"
out = re.sub(r'(;11=)(\w+)(;)',r'%s' % pattern, line)

What appears to be happening is \1 and \3 do not get added to the output.

I have also tried this with the substitution expression:

r'\1%s\3'%orderid

But I got similar results. Any suggestion on what might fix this?

  • Why do you even need to store those matches in \1 and \3? They are always the same values, so just write them as strings: re.sub(r'(;11=)(\w+)(;)', ";11=" + id + ";", line). Or remove the captures completely: re.sub(r';11=\w+;', ";11=" + id + ";", line) (you don't seem to be using the \w+ anyway). Commented Oct 8, 2013 at 20:58
  • I tried the others, but this is what worked for me eventually. Thanks Jerry. I can't believe I didn't see that. Commented Oct 8, 2013 at 21:07

2 Answers

You need to use raw strings or double the backslashes:

pattern = r"\1" + id + r"\3"

or

pattern = "\\1" + id + "\\3"

In a regular Python string literal, \number is interpreted as an octal character code instead:

>>> '\1'
'\x01'

while the backslash has no special meaning in a raw string literal:

>>> r'\1'
'\\1'

Raw string literals are just a notation, not a type. Both r'' and '' produce strings, and only differ in how they interpret backslashes in source code.
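
To make the difference concrete, here is a small sketch comparing the three spellings. The variable names are illustrative and not taken from the question.

```python
# Compare how the same characters are read in normal and raw literals.
normal = "\1"      # octal escape: a single character with code point 1
raw = r"\1"        # raw literal: a backslash followed by the digit 1
doubled = "\\1"    # escaped backslash: the same two characters as the raw form

print(len(normal), repr(normal))   # 1 '\x01'
print(len(raw), repr(raw))         # 2 '\\1'
print(raw == doubled)              # True
print(type(normal) is type(raw))   # True: both are ordinary str objects
```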

Note that since group 1 and group 3 match literal text, you don't need to use substitutions at all; simply use:

out = re.sub(r';11=\w+;', ';11=%s;' % id, line)

or use a lookbehind and a lookahead so that you don't have to repeat the literals:

out = re.sub(r'(?<=;11=)\w+(?=;)', id, line)

Demo:

>>> import re
>>> line = 'foobar;11=spam;hameggs'
>>> id = 'monty'
>>> re.sub(r';11=\w+;', ';11=%s;' % id, line)
'foobar;11=monty;hameggs'
>>> re.sub(r'(?<=;11=)\w+(?=;)', id, line)
'foobar;11=monty;hameggs'
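
A related caveat that the thread does not cover: when the replacement value begins with a digit, a template such as \1 followed by that value becomes ambiguous, because the digits run into the group number. The sketch below assumes a hypothetical numeric value (order_id is an illustrative name) and shows two ways around that: the \g<1> form of group reference, and a function used as the replacement.

```python
import re

line = "foobar;11=spam;hameggs"  # sample input reused from the demo above
order_id = "42"                  # hypothetical value that starts with a digit

# \g<1> and \g<3> delimit the group numbers explicitly, so the digits of
# order_id cannot be read as part of a group reference.
template = r"\g<1>" + order_id + r"\g<3>"
out_template = re.sub(r"(;11=)(\w+)(;)", template, line)

# A function replacement bypasses template parsing entirely; its return
# value is inserted as-is.
def replace(match):
    return match.group(1) + order_id + match.group(3)

out_function = re.sub(r"(;11=)(\w+)(;)", replace, line)

print(out_template)  # foobar;11=42;hameggs
print(out_function)  # foobar;11=42;hameggs
```

The function form is also the safer choice if the value might contain backslashes, since those would otherwise be processed as escapes in the template.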

This isn't going to work:

pattern = "\1" + id + "\3"
# ...
r'%s' % pattern

The r prefix only affects how the literal is interpreted. So r'%s' means that the % and s are interpreted raw, but that's the same way they'd be interpreted without the r. Meanwhile, pattern is built from the non-raw literals "\1" and "\3", so it already contains a control-A and a control-C before you even get to the %.

What you want is:

pattern = r"\1" + id + r"\3"
# ...
'%s' % pattern

However, you really don't need the % formatting at all; just use pattern itself and you'll get the exact same thing.
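
Putting the pieces together, the following sketch contrasts the original non-raw pattern with the corrected one. The sample line is borrowed from the demo in the other answer, and order_id is a stand-in name for the question's id variable.

```python
import re

line = "foobar;11=spam;hameggs"
order_id = "monty"  # stand-in for the question's id variable

broken = "\1" + order_id + "\3"    # control characters, not group references
fixed = r"\1" + order_id + r"\3"   # backslash-digit pairs that re.sub understands

pattern = r"(;11=)(\w+)(;)"
print(repr(re.sub(pattern, broken, line)))  # 'foobar\x01monty\x03hameggs'
print(repr(re.sub(pattern, fixed, line)))   # 'foobar;11=monty;hameggs'

# The % formatting step is a no-op for a plain string.
print("%s" % fixed == fixed)  # True
```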
