Python - re.sub without replacing a part of regex

Question

So, for example, I have the string "perfect bear hunts" and I want to replace the word before the occurrence of "bear" with the word "the".

So the resulting string would be "the bear hunts"

I thought I would use

re.sub("\w+ bear","the","perfect bear hunts")

but it replaces "bear" too. How do I exclude bear from being replaced while also having it used in matching?
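
To see why this happens, it can help to look at what the pattern actually matches: everything inside the match is consumed and then overwritten by the replacement. A minimal sketch using only the standard `re` module (the variable names are just for illustration):

```python
import re

text = "perfect bear hunts"

# The whole match, including " bear", is what re.sub replaces.
match = re.search(r"\w+ bear", text)
print(match.group())   # perfect bear

# So the replacement "the" overwrites "bear" as well.
result = re.sub(r"\w+ bear", "the", text)
print(result)          # the hunts
```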

@Rawing very good, edited it

– Gillian, Commented Oct 5, 2017 at 15:00

alexwlchan · Accepted Answer · 2017-10-06 16:17:40Z

2

Like the other answers, I’d use a positive lookahead assertion.

Then to fix the issue raised by Rawing in a couple of the comments (what about words like “beard”?), I’d add (\b|$). This matches a word boundary or the end of the string, so you only match on the word bear, and nothing longer.

So you get the following:

import re

def bear_replace(string):
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)

and test cases (using pytest):

import pytest

@pytest.mark.parametrize('string, expected', [
    ("perfect bear swims", "the bear swims"),

    # We only capture the first word before 'bear'
    ("before perfect bear swims", "before the bear swims"),

    # 'beard' isn't captured
    ("a perfect beard", "a perfect beard"),

    # We handle the case where 'bear' is the end of the string
    ("perfect bear", "the bear"),

    # 'bear' is followed by a non-space punctuation character
    ("perfect bear-string", "the bear-string"),
])
def test_bear_replace(string, expected):
    assert bear_replace(string) == expected

edited Oct 6, 2017 at 16:17

answered Oct 5, 2017 at 15:33

alexwlchan


1 Comment

Aran-Fey Over a year ago

Sorry for being nitpicky, but I want to point out that bear(\s|$) won't match if the word "bear" is followed by any kind of punctuation - "the bear." or "the bear, who" for example. I'd suggest using a word boundary \b instead (though admittedly that isn't a perfect solution either; it would match "bear-sized" for example).
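
The difference described in this comment can be checked directly. The sketch below compares a lookahead ending in `(\s|$)` with one ending in `\b`; the names `strict`, `boundary` and `samples` are made up for the illustration and are not from the answer.

```python
import re

# Two variants of the lookahead: whitespace-or-end versus a word boundary.
strict = re.compile(r"\w+ (?=bear(\s|$))")
boundary = re.compile(r"\w+ (?=bear\b)")

samples = [
    "perfect bear.",
    "perfect bear, who",
    "perfect bear-sized",
    "perfect beard",
]

# Map each sample to (result with strict, result with boundary).
results = {s: (strict.sub("the ", s), boundary.sub("the ", s)) for s in samples}

for s, (a, b) in results.items():
    print(f"{s!r}: strict={a!r}, boundary={b!r}")
```

With these inputs, the `(\s|$)` variant leaves all four strings unchanged, while the `\b` variant rewrites the first three (including "bear-sized") and leaves "beard" alone.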

Felk · Accepted Answer · 2017-10-05 14:57:55Z

1

An alternative to using lookaheads:

Capture the part you want to keep using a group () and reinsert it using \1 in the replacement.

re.sub(r"\w+ (bear)", r"the \1", "perfect bear swims")

answered Oct 5, 2017 at 14:57

Felk


1 Comment

Aran-Fey Over a year ago

Note that this will also match words like "beard". You should consider adding a word boundary \b.
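
One possible way to combine the capture group with the suggested word boundary is sketched below. The function name `replace_before_bear` is hypothetical, chosen only for this example.

```python
import re

def replace_before_bear(text):
    # "\b" after "bear" stops the pattern from matching "beard";
    # "\1" re-inserts the captured word into the replacement.
    return re.sub(r"\w+ (bear\b)", r"the \1", text)

print(replace_before_bear("perfect bear swims"))   # the bear swims
print(replace_before_bear("a perfect beard"))      # a perfect beard
```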

Igl3 · Accepted Answer · 2017-10-05 15:00:31Z

1

Use a Positive Lookahead to replace everything before bear:

re.sub(".+(?=bear )","the ","perfect bear swims")

.+ matches one or more of any character (except line terminators).

edited Oct 5, 2017 at 15:00

answered Oct 5, 2017 at 14:56

Igl3


3 Comments

Aran-Fey Over a year ago

This will replace literally everything before the characters "bear", not just the word before it. Try this on "my long beard" to see the problem...

Igl3 Over a year ago

Updated with a space. Thanks for the hint ;)

Aran-Fey Over a year ago

It still turns "a big bear " into "thebear " instead of "a the bear ". The OP said they want to replace the word before "bear", not the entire string. You went and changed OP's \w+ for absolutely no reason whatsoever.
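
The underlying issue is that `.+` is greedy and not limited to a single word. A small comparison, assuming the replacement string "the " as written in the answer above (the variable names are illustrative only):

```python
import re

text = "a big bear swims"

# ".+" is greedy, so it swallows every character up to the lookahead.
greedy = re.sub(r".+(?=bear )", "the ", text)

# "\w+ " matches only the single word (and space) directly before "bear".
single_word = re.sub(r"\w+ (?=bear )", "the ", text)

print(greedy)        # the bear swims
print(single_word)   # a the bear swims
```

Only the `\w+ ` version keeps the leading "a", which is the behaviour the question asks for.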

hspandher · Accepted Answer · 2017-10-05 15:15:38Z

1

Lookahead and lookbehind assertions are what you are looking for.

re.sub(".+(?=bear)", "the ", "perfect bear swims")

edited Oct 5, 2017 at 15:15

answered Oct 5, 2017 at 14:55

hspandher


2 Comments

Aran-Fey Over a year ago

This will replace literally everything before the characters "bear". Try this on "my long beard".

Igl3 Over a year ago

This will yield 'thebear swims'
