I am using Python's re module to replace a substring, for example:

>>> import re
>>> re.sub(r"a.*b","ab","acbacbacb")
'ab'

This matches .* with cbacbac, but I want it to match c three times, so that the output is ababab.

Could anybody tell me how to do it?

4 Answers

2

The easiest solution would be to use the lazy (non-greedy) *? operator:

>>> re.sub(r"a.*?b","ab","acbacbacb")
'ababab'
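
To see what each quantifier actually captures, it can help to list the matches with re.findall. This is only a small sketch on the string from the question; the variable names are illustrative.

```python
import re

text = "acbacbacb"

# Greedy: .* runs to the end of the string, then backtracks to the last b.
greedy = re.findall(r"a.*b", text)

# Lazy: .*? grows one character at a time and stops at the first b.
lazy = re.findall(r"a.*?b", text)

print(greedy)  # ['acbacbacb']
print(lazy)    # ['acb', 'acb', 'acb']
```

The greedy pattern finds one match spanning the whole string, which is why re.sub returns a single 'ab'.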

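The lazy quantifier might, however, have an impact on performance. Because of the structure of this regex, you can just as well use the following pattern, which gives the same result on this input (note that [^b] also matches a newline, which . does not by default):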

re.sub(r"a[^b]*b","ab","acbacbacb")

which could perform better, depending on how good the optimizer is.
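
Whether the negated character class is actually faster is best measured rather than assumed. The following is a rough sketch using timeit; the repeated input string and the repetition count are arbitrary assumptions, and the timings will vary by machine and Python version. It also shows the one case where the two patterns behave differently.

```python
import re
import timeit

# Hypothetical larger input, built by repeating the string from the question.
text = "acbacbacb" * 1000

lazy = re.compile(r"a.*?b")
negated = re.compile(r"a[^b]*b")

# Both patterns give the same result on this input.
same_result = lazy.sub("ab", text) == negated.sub("ab", text)
print(same_result)  # True

lazy_time = timeit.timeit(lambda: lazy.sub("ab", text), number=100)
negated_time = timeit.timeit(lambda: negated.sub("ab", text), number=100)
print(f"lazy:    {lazy_time:.4f}s")
print(f"negated: {negated_time:.4f}s")

# They differ when a newline sits between a and b:
# . does not match "\n" by default, but [^b] does.
lazy_newline = lazy.sub("ab", "a\nb")
negated_newline = negated.sub("ab", "a\nb")
print(repr(lazy_newline))     # 'a\nb' (no match, string unchanged)
print(repr(negated_newline))  # 'ab'
```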

If you have even more a priori knowledge about the structure of the .* part, you should make the pattern even more explicit. If, for example, you already know that only c's can appear between the a and the b, you can use

re.sub(r"ac*b","ab","acbacbacb")
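
The more explicit pattern also changes what gets replaced. As a quick illustration, here is a sketch on a made-up input (not from the question) that contains something other than c between an a and a b:

```python
import re

# Hypothetical input: "adb" has a d between a and b.
text = "acbadbaccb"

explicit = re.sub(r"ac*b", "ab", text)   # only runs of c are allowed
lazy = re.sub(r"a.*?b", "ab", text)      # any characters are allowed

print(explicit)  # abadbab
print(lazy)      # ababab
```

The explicit pattern leaves adb untouched, which is the desired behaviour only if that really is what you know about the input.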

1

Regex quantifiers are greedy by default. Use the non-greedy .*? instead:

>>> import re
>>> re.sub(r"a.*?b","ab","acbacbacb")
'ababab'

From http://docs.python.org/library/re.html:

The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
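
The example from the quoted documentation can be checked directly. A minimal sketch, with illustrative variable names:

```python
import re

html = "<H1>title</H1>"

# Greedy: matches from the first < to the last >.
greedy_match = re.match(r"<.*>", html).group()

# Non-greedy: stops at the first >.
lazy_match = re.match(r"<.*?>", html).group()

print(greedy_match)  # <H1>title</H1>
print(lazy_match)    # <H1>
```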

0

Use a non-greedy match:

>>> re.sub(r"a.*?b","ab","acbacbacb")
'ababab'

From http://docs.python.org/library/re.html:

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired. [...] Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched.

0
>>> re.sub(r"a.b","ab","acbacbacb")
'ababab'

2 Comments

This will not match accb, for example. How can you know that this is intended?
Yes, it doesn't work for accb. The pattern above works only when exactly one character sits between the a and the b.
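
The limitation raised in these comments is easy to reproduce. A short sketch comparing the single-character pattern with the non-greedy one on accb:

```python
import re

# a.b requires exactly one character between a and b.
single_result = re.sub(r"a.b", "ab", "accb")

# a.*?b allows any number of characters, matching as few as possible.
lazy_result = re.sub(r"a.*?b", "ab", "accb")

print(single_result)  # accb (no match, string unchanged)
print(lazy_result)    # ab
```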
