Python Regex: How to find a substring

Question

I have a list of titles that I need to normalize. For example, if a title contains 'CTO', it needs to be changed to 'Chief Technology Officer'. However, I only want to replace 'CTO' if there is no letter directly to the left or right of 'CTO'. For example, 'Director' contains 'cto'. I obviously wouldn't want this to be replaced. However, I do want it to be replaced in situations where the title is 'Founder/CTO' or 'CTO/Founder'.

Is there a way to check if a letter is before 'CTO' using regex? Or what would be the best way to accomplish this task?

EDIT: My code is as follows...

import re

test = 'Co-Founder/CTO'
test = re.sub("[^a-zA-Z0-9]CTO", 'Chief Technology Officer', test)

The result is 'Co-FounderChief Technology Officer'. The '/' gets replaced for some reason. However, this doesn't happen if test = 'CTO/Co-Founder'.

Does this answer your question? Python regex lookbehind and lookahead
– Cireo, Commented Jun 14, 2021 at 19:11

Wes Hardaker · Accepted Answer · 2021-06-14 19:09:38Z

2

What you want is a regex that requires a character other than a letter or digit immediately before CTO:

"[^a-zA-Z0-9]CTO"

But you actually also need to check for when CTO occurs at the beginning of the line:

"^CTO"

To use the first expression within re.sub, you can add a grouping operator (()s) and then use it in the replacement to pull out the matching character (eg, space or /):

re.sub("([^a-zA-Z0-9])CTO","\\1Chief Technology Officer", "foo/CTO")

Will result in

'foo/Chief Technology Officer'
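
The two expressions above can probably be merged into one by letting the captured group match either the start of the string or a non-alphanumeric character. The following is a minimal sketch under that assumption; the names CTO_WITH_PREFIX and expand_cto are made up for illustration and are not part of the original answer.

```python
import re

# Group 1 captures either the empty string at the start of the input
# or the single non-alphanumeric character that precedes CTO.
CTO_WITH_PREFIX = re.compile(r"(^|[^a-zA-Z0-9])CTO")

def expand_cto(title):
    # \1 puts back whatever group 1 captured (e.g. '/' or ' '),
    # so the separator is no longer lost in the substitution.
    return CTO_WITH_PREFIX.sub(r"\1Chief Technology Officer", title)

for title in ("Founder/CTO", "CTO/Founder", "Director"):
    print(expand_cto(title))
```

Because the group always participates in the match, \1 is safe to use in the replacement even when CTO is the first thing in the title.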

edited Jun 14, 2021 at 19:09

answered Jun 14, 2021 at 18:34

Wes Hardaker

22.3k reputation · 2 gold badges · 42 silver badges · 70 bronze badges

3 Comments

codr Over a year ago

How would I implement this using re.sub()? In the case of 'Founder/CTO', the '/' gets replaced, so the end result is 'FounderChief Technology Officer'. Or is there a better way other than re.sub()?

codr Over a year ago

Thanks, much appreciated. Just to clarify, the '\\1' in the replacement references the "([^a-zA-Z0-9])" grouping?

Wes Hardaker Over a year ago

that's correct. You can group things in ()s and then extract whatever it matched later.

Cireo · Accepted Answer · 2021-06-14 19:22:02Z

1

Answer: "(?<=[^a-zA-Z0-9])CTO|^CTO"

Lookbehinds are perfect for this

cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO")

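but unfortunately this won't match at the start of a line, because the lookbehind needs an actual character in front of CTO. (Python's re module requires fixed-width lookbehinds, so the zero-width ^ cannot be added as an alternative inside the same lookbehind.)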

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))

Co-Founder/Chief Technology Officer
CTO/Bossy
aCTOrMan

You would have to check for that explicitly via |:

cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO|^CTO")

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))

Co-Founder/Chief Technology Officer
Chief Technology Officer/Bossy
aCTOrMan
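
The question also asks that no letter sit directly to the right of CTO, which the patterns above do not check (a title starting with "CTOr..." would still be rewritten). One possible way to cover both sides, and the start of the string, is a pair of negative lookarounds. This is a sketch rather than part of the original answer; BOUNDED_CTO and normalize_title are hypothetical names.

```python
import re

# (?<![a-zA-Z0-9]) succeeds when the previous character is not alphanumeric,
# including when there is no previous character at all (start of string).
# (?![a-zA-Z0-9]) applies the same rule to the character after CTO.
BOUNDED_CTO = re.compile(r"(?<![a-zA-Z0-9])CTO(?![a-zA-Z0-9])")

def normalize_title(title):
    # Lookarounds consume nothing, so separators such as '/' are left alone.
    return BOUNDED_CTO.sub("Chief Technology Officer", title)

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan", "CTOrMan":
    print(normalize_title(eg))
```

A negative lookbehind holds trivially at position 0, so no separate ^CTO alternative is needed here.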

answered Jun 14, 2021 at 19:22

Cireo

4,457 reputation · 2 gold badges · 22 silver badges · 27 bronze badges
