String substitution using regex in Python with overlapping pattern

Question

I am trying to find && using regex and substitute it with and using Python. Here is my regex:

r"(?=( && ))"

test input: x&& &&& && && x || | ||\|| x, expected output: x&& &&& and and x or | ||\|| x. My Python code:

import re

input = "x&& &&& && && x || | ||\|| x"
result = re.sub(r"(?=( && ))", " and ", input)
print(result)

My output is: x&& &&& and && and && x || | ||\|| x. This actually works, but instead of substitution it leaves the original pattern just adds my substitution string when it finds pattern. This is really confusing.

re.sub(r"(?<!\S)&&(?!\S)", "and", text)? You only have a trailing space that overlaps. — Wiktor Stribiżew
– Wiktor Stribiżew, Commented Jun 7, 2020 at 20:43
overlap can be control by lookahead, but yuo can limit the overlap to a single end char i.e [ ]&&(?=[ ]) replace ` and` this way endchar can be reuse (overlap) adjacent ` &&` if needed. of course you can get fancy and im so great stuiff afnd do assertunz back forward and upside down, but that is up to you — user13469682
– user13469682, Commented Jun 7, 2020 at 20:53
hundreds of overlapping regex examples on this site, did look a little, yes ? — user13469682
– user13469682, Commented Jun 9, 2020 at 21:45

Wiktor Stribiżew · Accepted Answer · 2020-06-07 20:48:08Z

2

A capturing group inside a positive lookahead can be used to extract overlapping patterns. To replace them, you actually need to consume the text, lookarounds do not consume the text they match.

In this case, you only have an overlapping trailing space, so you might use either of the two approaches:

text = re.sub(r"( )&&(?= )", r"\1and", text)

Or, if you need to replace any && that is neither preceded nor followed with any whitespace char, use

text = re.sub(r"(?<!\S)&&(?!\S)", r"and", text)

Note that input is a Python builtin, you should name your text variable in a different way, say, text.

answered Jun 7, 2020 at 20:48

Wiktor Stribiżew

631k41 gold badges502 silver badges632 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

AmacOS Over a year ago

Thanks a lot for quick reply. Just to clarify: re.sub(r"( )&&(?= )" means && should be followed by whitespace in order to match our pattern, but we are not going to consume this whitespace when substitution is done. How r"\1and" works? I assume ' &&' needs to be substituted by "and" right?

Wiktor Stribiżew Over a year ago

@AmacOS ( ) is a capturing group, \1 backreference refers to this group value. If that is a regular space, you surely can just use text = re.sub(r" &&(?= )", r" and", text). I provided a backreference solution in case you want to use \s instead of a literal space to match any whitespace.

Cary Swoveland · Accepted Answer · 2020-06-07 21:18:19Z

0

Your output is consistent with your statement of the problem (sic):

I am trying to find " && " and substitute it with " and "

Since the output is not what you want there must be a problem with your statement of the problem. I believe you actually want:

I am trying to find " &&" followed by a space and substitute it with " and"

Once you get that right it's just a matter of translating it to a regular expression:

import re

s = 'x&& &&& && && x || | ||\|| x'
print re.sub(r' &&(?= )', ' and', s)
  #=> x&& &&& and and x || | ||\|| x

Demo

(?= ) is a positive lookahead that asserts that the match is followed by a space.

answered Jun 7, 2020 at 21:18

Cary Swoveland

111k6 gold badges69 silver badges105 bronze badges

1 Comment

Wiktor Stribiżew Over a year ago

See stackoverflow.com/questions/62251399/…

Collectives™ on Stack Overflow

String substitution using regex in Python with overlapping pattern

2 Answers 2

2 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related