I've been trying to do the following: given a char like "i", find and replace the second of every pair of "i" (without overlapping).

"I am so irritated with regex. Seriously" -> "I am so rritated wth regex. Seriously". 

I almost found a solution using positive lookbehind, but it's overlapping :(

Can anyone help me?

My best was this (I think) -> "(?<=i).*?(i)"

EDIT: My description was wrong. I am supposed to replace the SECOND item of each pair, so the result should have been: "I am so irrtated with regex. Serously"

  • So, what was your effort? Commented Apr 29, 2016 at 15:18
  • My best was this (I think) -> "(?<=i).*?(i)" Commented Apr 29, 2016 at 15:23
  • Still don't understand capture groups (can we replace stuff in capture groups...?) and the concept of "consuming chars". Commented Apr 29, 2016 at 15:23
  • What is your expected output? Commented Apr 29, 2016 at 15:47

1 Answer

Your regex matches overlapped substrings because of the lookbehind (?<=i). You need to use a consuming pattern for non-overlapping matches:

i([^i]*i)

Replace with the \1 backreference, which refers to the text captured with ([^i]*i).

The pattern matches:

  • i - a literal i. After matching it, the regex index advances 1 char to the right (the regex engine processes the string from left to right by default; in re, there is no other option).
  • ([^i]*i) - Group 1, matching 0+ characters other than i and then the next i. The captured value is available via .group(1). After this match, the regex index is right after the second i, which was matched and consumed by the whole pattern. Thus, no overlapping matches occur when the regex engine goes on to look for the remaining matches in the string.
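
To make the difference between the two patterns visible, here is a small sketch that lists what each one matches in the question's sample string. The variable names are only illustrative; the patterns are the ones quoted above.

```python
import re

text = "I am so irritated with regex. Seriously"

# The lookbehind attempt from the question: the "i" that ends one match
# also satisfies the lookbehind of the next match, so pairs share an "i".
lookbehind = re.compile(r"(?<=i).*?(i)")
# The consuming pattern: both "i" characters belong to the match itself.
consuming = re.compile(r"i([^i]*i)")

lookbehind_matches = [m.group(0) for m in lookbehind.finditer(text)]
consuming_matches = [m.group(0) for m in consuming.finditer(text)]

print(lookbehind_matches)  # ['rri', 'tated wi', 'th regex. Seri']
print(consuming_matches)   # ['irri', 'ith regex. Seri']
```

The lookbehind version finds three matches for four "i" characters because each closing "i" is reused as the opener of the next pair. The consuming version finds exactly two pairs.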

Python demo:

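import re

pat = re.escape("i")  # escape the char in case it is a regex metacharacter
# Consume the first "i", capture everything up to and including the next "i"
p = re.compile('{0}([^{0}]*{0})'.format(pat))
test_str = "I am so irritated with regex. Seriously"
# Keep only Group 1, i.e. drop the first "i" of every pair
result = re.sub(p, r"\1", test_str)
print(result)  # I am so rritated wth regex. Seriously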

Comments

@Robᵩ: Yes, but lazy matching is less efficient than character classes, and it requires a DOTALL modifier to match across lines.
What you meant after the literal "i" is that the regex index advances to the right, correct?
Sorry Wiktor, I stated the wrong issue. I've edited the question. The regex should be able to replace a second item, not a first.
And I just got it. It's (i[^i]*)i
No problem, readjust the grouping: (i[^i]*)i. I am on a mobile now, hence the slow response.
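
For reference, the regrouped pattern from the last two comments could be wrapped up roughly like this. It is only a sketch: the function name `drop_second_of_each_pair` is made up for illustration, and `re.escape` is added on the assumption that the character might be a regex metacharacter.

```python
import re

def drop_second_of_each_pair(text, ch):
    """Remove the second occurrence of `ch` in every non-overlapping pair."""
    c = re.escape(ch)  # assumption: guard against metacharacters such as "."
    # Group 1 keeps the first `ch` and the text up to the second `ch`;
    # the second `ch` is matched outside the group and is therefore dropped.
    pattern = re.compile('({0}[^{0}]*){0}'.format(c))
    return pattern.sub(r"\1", text)

print(drop_second_of_each_pair("I am so irritated with regex. Seriously", "i"))
# I am so irrtated with regex. Serously
```

This gives the output requested in the edited question. Like the original pattern, it only makes sense for a single character, since `ch` is placed inside a character class.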