Replace specific substring in match python [duplicate]

Question

I want to use regex to replace a substring of a matched string in a df series. I have looked through the documentation and found a pattern that captures the specific type of string I want to match. However, the replacement does not replace just the substring.

I have cases such as

data
initthe problem
nationthe airline
radicthe groups
professionthe experience
the cat in the hat

In this particular case, I am interested in substituting "the" with "al" in those cases where "the" is not a standalone string (i.e. one preceded and followed by whitespace).

I have tried the following solution:

patt = re.compile(r'(?:[a-z])(the)')
df['data'].str.replace(patt, r'al')

However, it also replaces the non-whitespace character preceding the "the".

Any suggestions on what I can do to replace just those specific cases of a substring?

But inithe will turn into inial, I guess you need initial? Even if you fix it to df['data'].str.replace(r'(?<=[a-z])the', r'al')
– Wiktor Stribiżew, Commented Oct 8, 2018 at 10:34

Tim Biegeleisen · Accepted Answer · 2018-10-08 10:37:16Z

1

Try using a lookbehind, which checks (asserts) for a character before the, but does not actually consume anything:

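import re

# Named `text` rather than `input` to avoid shadowing the Python built-in.
text = "data\ninitthe problem\nnationthe airline\nradicthe groups\nprofessionthe experience\nthe cat in the hat"

# (?<=[a-z]) asserts that a lowercase letter precedes "the" without consuming it.
output = re.sub(r'(?<=[a-z])the', 'al', text)
print(output)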

data
inital problem
national airline
radical groups
professional experience
the cat in the hat
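
To illustrate why the pattern in the question also removes the preceding letter, here is a minimal sketch comparing three patterns on the sample rows. The variable names (`lines`, `consuming`, `lookbehind`, `backref`) are only illustrative, and the capture-group variant is an alternative not mentioned above.

```python
import re

lines = [
    "initthe problem",
    "nationthe airline",
    "radicthe groups",
    "professionthe experience",
    "the cat in the hat",
]

# Pattern from the question: the non-capturing group (?:...) does not create
# a group, but it still consumes the letter before "the", so that letter is
# part of the match and gets replaced too.
consuming = re.compile(r'(?:[a-z])(the)')

# Lookbehind: asserts that a lowercase letter precedes "the" without
# including that letter in the match.
lookbehind = re.compile(r'(?<=[a-z])the')

# Alternative: capture the preceding letter and put it back with \1.
backref = re.compile(r'([a-z])the')

for line in lines:
    print(
        consuming.sub('al', line), '|',
        lookbehind.sub('al', line), '|',
        backref.sub(r'\1al', line),
    )
```

For the pandas column in the question, the same lookbehind pattern should carry over as `df['data'].str.replace(r'(?<=[a-z])the', 'al', regex=True)`, assuming `df['data']` holds strings. Passing `regex=True` explicitly is advisable because newer pandas versions treat the pattern as a literal string by default.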


edited Oct 8, 2018 at 10:37

answered Oct 8, 2018 at 10:34



4 Comments

Wiktor Stribiżew Over a year ago

Though it is what OP tries to use, the result will probably not be "final" since inithe will turn into inial.

Tim Biegeleisen Over a year ago

@WiktorStribiżew I interpret this as bad sample data, not a bad regex solution.

Wiktor Stribiżew Over a year ago

Well, another dupe anyway.

owwoow14 Over a year ago

Yes, sorry. There was an error in the sample data; I have updated it.
