3

For example, I have the string "perfect bear hunts" and I want to replace the word before the occurrence of "bear" with the word "the".

So the resulting string would be "the bear hunts"

I thought I would use

re.sub(r"\w+ bear", "the", "perfect bear hunts")

but it replaces "bear" too. How do I exclude bear from being replaced while also having it used in matching?

1
  • @Rawing very good, edited it Commented Oct 5, 2017 at 15:00

4 Answers

2

Like the other answers, I’d use a positive lookahead assertion.

Then to fix the issue raised by Rawing in a couple of the comments (what about words like “beard”?), I’d add (\b|$). This matches a word boundary or the end of the string, so you only match on the word bear, and nothing longer.

So you get the following:

import re

def bear_replace(string):
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)

and test cases (using pytest):

import pytest

@pytest.mark.parametrize('string, expected', [
    ("perfect bear swims", "the bear swims"),

    # We only replace the word immediately before 'bear'
    ("before perfect bear swims", "before the bear swims"),

    # 'beard' isn't captured
    ("a perfect beard", "a perfect beard"),

    # We handle the case where 'bear' is the end of the string
    ("perfect bear", "the bear"),

    # 'bear' is followed by a non-space punctuation character
    ("perfect bear-string", "the bear-string"),
])
def test_bear_replace(string, expected):
    assert bear_replace(string) == expected

1 Comment

Sorry for being nitpicky, but I want to point out that bear(\s|$) won't match if the word "bear" is followed by any kind of punctuation - "the bear." or "the bear, who" for example. I'd suggest using a word boundary \b instead (though admittedly that isn't a perfect solution either; it would match "bear-sized" for example).
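To make the point in this comment concrete, here is a small sketch comparing the lookaheads. The names `space_or_end`, `word_boundary`, and `no_hyphen` are made up for illustration, and the `(?![\w-])` variant is only one possible way to reject "bear-sized"; it is an assumption, not something proposed in the answer.

```python
import re

# Earlier form discussed in the comment: "bear" must be followed by whitespace or end of string.
space_or_end = re.compile(r"\w+ (?=bear(\s|$))")
# Form used in the answer: "bear" must be followed by a word boundary (or end of string).
word_boundary = re.compile(r"\w+ (?=bear(\b|$))")
# Hypothetical stricter variant: "bear" must not be followed by a word character or a hyphen.
no_hyphen = re.compile(r"\w+ (?=bear(?![\w-]))")

samples = ["perfect bear.", "perfect bear, who", "perfect bear-sized", "a perfect beard"]

# Map each sample to the result of the three patterns, in the order defined above.
results = {
    s: (
        space_or_end.sub("the ", s),
        word_boundary.sub("the ", s),
        no_hyphen.sub("the ", s),
    )
    for s in samples
}

for sample, outcome in results.items():
    print(repr(sample), "->", outcome)
```

Which variant is appropriate depends on whether hyphenated compounds such as "bear-sized" should count as the word "bear" in your data.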
1

An alternative to using lookaheads:

Capture the part you want to keep using a group () and reinsert it using \1 in the replacement.

re.sub(r"\w+ (bear)", r"the \1", "perfect bear swims")

1 Comment

Note that this will also match words like "beard". You should consider adding a word boundary \b.
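A minimal sketch of the capture-group approach with the suggested word boundary added. The function name `replace_word_before_bear` is hypothetical and used only for illustration.

```python
import re

def replace_word_before_bear(text):
    # \b after "bear" stops longer words such as "beard" from matching;
    # \1 puts the captured "bear" back into the replacement.
    return re.sub(r"\w+ (bear\b)", r"the \1", text)

print(replace_word_before_bear("perfect bear swims"))
print(replace_word_before_bear("a perfect beard"))
```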
1

Use a Positive Lookahead to replace everything before bear:

re.sub(".+(?=bear )","the ","perfect bear swims")

.+ matches one or more of any character (except for line terminators).

3 Comments

This will replace literally everything before the characters "bear", not just the word before it. Try this on "my long beard" to see the problem...
Updated with a space. Thanks for the hint ;)
It still turns "a big bear " into "thebear " instead of "a the bear ". The OP said they want to replace the word before "bear", not the entire string. You went and changed OP's \w+ for absolutely no reason whatsoever.
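The difference between `.+` and `\w+ ` can be checked directly. This is a small sketch using the pattern as it is currently written in the answer next to the `\w+ ` form from the question; the variable names are illustrative only.

```python
import re

text = "a big bear "

# ".+" is greedy and matches everything up to the last place where "bear " follows,
# so both "a" and "big" are swallowed by the replacement.
greedy_any = re.sub(r".+(?=bear )", "the ", text)

# "\w+ " matches a single word plus the space after it, so only "big " is replaced.
single_word = re.sub(r"\w+ (?=bear )", "the ", text)

print(repr(greedy_any))
print(repr(single_word))
```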
1

Lookbehind and lookahead assertions are what you are looking for.

re.sub(".+(?=bear)", "the ", "perfect bear swims")

2 Comments

This will replace literally everything before the characters "bear". Try this on "my long beard".
This will yield 'thebear swims'
