Python regex - Find and replace the second item of a pair

Question

I've been trying to do the following : Given a char like "i", find and replace the second of every pair of "i" (without overlapping).

"I am so irritated with regex. Seriously" -> "I am so rritated wth regex. Seriously".

I almost found a solution using positive lookbehind, but it's overlapping :(

Can anyone help me?

My best was this (I think) -> "(?<=i).*?(i)"

EDIT : My description is wrong. I am supposed to replace the SECOND item of a pair, so the result should've been: "I am so irrtated with regex. Serously"

Still don't understand capture groups (can we replace stuff in capture groups...?) and the concept of "consuming chars".
– Eduardo Almeida, Commented Apr 29, 2016 at 15:23

Wiktor Stribiżew · Accepted Answer · 2016-04-29 15:54:48Z

Your regex matches overlapped substrings because of the lookbehind (?<=i). You need to use a consuming pattern for non-overlapping matches:

i([^i]*i)

Replace each match with the \1 backreference, which holds the text captured by ([^i]*i).

The pattern matches:

i - a literal i. After it is matched, the regex index advances one character to the right (the regex engine processes the string from left to right; in re there is no other option).
([^i]*i) - Group 1, which matches zero or more characters other than i, followed by the next i. The whole captured value is available via .group(1). After this group matches, the regex index sits just past the second i, which has been consumed as part of the overall match. Thus, no overlapping matches occur when the regex engine goes on to look for the remaining matches in the string.
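
To make the difference between asserting and consuming concrete, here is a small sketch that lists the match spans of the lookbehind pattern from the question next to those of the consuming pattern. The helper name spans is made up for this illustration and is not part of re.

```python
import re

text = "I am so irritated with regex. Seriously"

def spans(pattern, string):
    """Hypothetical helper: return (start, end, matched_text) for every match."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, string)]

# Lookbehind version from the question: the leading "i" is only asserted,
# never consumed, so the "i" that ends one match also anchors the next one.
overlapping = spans(r"(?<=i).*?(i)", text)

# Consuming version: both "i" characters are part of the match, so the
# next search starts after the second "i" of the pair.
consuming = spans(r"i([^i]*i)", text)

for label, found in (("lookbehind", overlapping), ("consuming", consuming)):
    print(label)
    for start, end, matched in found:
        print("  {0:2d}-{1:2d} {2!r}".format(start, end, matched))
```

With the lookbehind, every "i" after the first one ends a match, so the matches chain together; with the consuming pattern each "i" belongs to exactly one pair.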

Python demo:

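import re

pat = re.escape("i")  # escape so that characters such as "." or "^" stay literal
# Group 1 holds everything after the first "i" of a pair, including the second "i"
p = re.compile('{0}([^{0}]*{0})'.format(pat))
test_str = "I am so irritated with regex. Seriously"
result = p.sub(r"\1", test_str)  # keep only Group 1, which drops the first "i" of each pair
print(result)  # I am so rritated wth regex. Seriously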

edited Apr 29, 2016 at 15:54

answered Apr 29, 2016 at 15:23

9 Comments

Wiktor Stribiżew Over a year ago

@Robᵩ: Yes, but lazy matching is less efficient than character classes, and it requires a DOTALL modifier to match across lines.
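
The cross-line part of this comment can be checked with a short sketch. The two-line sample string is made up for the illustration, and the code only shows the DOTALL behaviour; it does not measure the efficiency claim.

```python
import re

# Made-up sample: the two "i" characters of the pair sit on different lines.
text = "one i\ntwo i"

# Lazy dot without DOTALL: "." cannot cross the newline, so nothing matches.
lazy = re.sub(r"i(.*?i)", r"\1", text)

# Lazy dot with DOTALL: "." now matches "\n", so the pair is found.
lazy_dotall = re.sub(r"i(.*?i)", r"\1", text, flags=re.DOTALL)

# Negated character class: [^i] matches "\n" without any flag.
negated = re.sub(r"i([^i]*i)", r"\1", text)

print(repr(lazy))
print(repr(lazy_dotall))
print(repr(negated))
```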

Eduardo Almeida Over a year ago

What you meant after the literal "i" is that the regex index advances to the right, correct?

Eduardo Almeida Over a year ago

Sorry Wiktor, I stated the wrong issue. I've edited the question. The regex should be able to replace a second item, not a first.

Eduardo Almeida Over a year ago

And I just got it. It's (i[^i]*)i

Wiktor Stribiżew Over a year ago

No problem, just readjust the grouping: (i[^i]*)i. I am on a mobile now, hence the slow response.
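
For the edited question (drop the second item of each pair), the readjusted grouping could be wrapped up roughly as follows. This is a sketch, not code from the thread: the function name drop_second_of_each_pair is invented here, it assumes char is a single character, and the re.escape call is an added precaution for characters that are special inside a regex.

```python
import re

def drop_second_of_each_pair(text, char):
    """Hypothetical helper: remove the second `char` of every non-overlapping pair."""
    c = re.escape(char)  # assumption: `char` is exactly one character
    # Group 1 = the first char plus everything up to (not including) the second one.
    pattern = re.compile("({0}[^{0}]*){0}".format(c))
    # Keeping only Group 1 discards the second char of the pair.
    return pattern.sub(r"\1", text)

result = drop_second_of_each_pair("I am so irritated with regex. Seriously", "i")
print(result)  # I am so irrtated with regex. Serously
```

The only change from the answer's pattern is which "i" sits outside the capture group: whatever is matched but not captured is what gets removed.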
