
I have a list of titles that I need to normalize. For example, if a title contains 'CTO', it needs to be changed to 'Chief Technology Officer'. However, I only want to replace 'CTO' if there is no letter directly to the left or right of 'CTO'. For example, 'Director' contains 'cto'. I obviously wouldn't want this to be replaced. However, I do want it to be replaced in situations where the title is 'Founder/CTO' or 'CTO/Founder'.

Is there a way to check if a letter is before 'CXO' using regex? Or what would be the best way to accomplish this task?

EDIT: My code is as follows...

import re

test = 'Co-Founder/CTO'
test = re.sub("[^a-zA-Z0-9]CTO", 'Chief Technology Officer', test)

The result is 'Co-FounderChief Technology Officer'. The '/' gets replaced for some reason. However, this doesn't happen if test = 'CTO/Co-Founder'.


2 Answers


What you want is a regex that rejects certain characters immediately before the match, using a negated character class:

"[^a-zA-Z0-9]CTO"

But you actually also need to check for when CTO occurs at the beginning of the line:

"^CTO"

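To use the first expression within re.sub, note that the character class consumes the character it matches, which is why the '/' disappeared. Wrap it in a capturing group (parentheses) and reference that group in the replacement to put the matched character (e.g., a space or '/') back: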

re.sub("([^a-zA-Z0-9])CTO","\\1Chief Technology Officer", "foo/CTO")

Will result in

'foo/Chief Technology Officer'
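The two cases above (a non-alphanumeric character before 'CTO', or 'CTO' at the start of the line) can probably be folded into one pattern by putting the start anchor inside the group as an alternative. This is a minimal sketch of that idea, not part of the original answer; the names CTO_GROUP_RE and expand_cto are made up for illustration.

```python
import re

# Assumption: one capturing group covers both cases from the answer,
# either the start of the string (empty match) or one non-alphanumeric char.
CTO_GROUP_RE = re.compile(r"(^|[^a-zA-Z0-9])CTO")


def expand_cto(title):
    """Hypothetical helper: expand 'CTO' while keeping the preceding character."""
    # \1 restores whatever the group matched: nothing at the start, or e.g. '/'.
    return CTO_GROUP_RE.sub(r"\1Chief Technology Officer", title)


for title in ("foo/CTO", "CTO/Founder", "Director", "aCTOr"):
    print(expand_cto(title))
```

Like the original expression, this only checks the left side of 'CTO', so something like 'CTOs' would still be rewritten.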

3 Comments

How would I implement this using re.sub()? In the case of 'Founder/CTO', the '/' gets replaced, so the end result is 'FounderChief Technology Officer'. Or is there a better way other than re.sub()?
Thanks, much appreciated. Just to clarify, the '\\1' in the replacement references the "([^a-zA-Z0-9])" grouping?
That's correct. You can group things in ()s and then refer back to whatever the group matched.

Answer: "(?<=[^a-zA-Z0-9])CTO|^CTO"

Lookbehinds are perfect for this

import re

cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO")

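but unfortunately this won't match at the start of a line: the lookbehind needs a character in front of 'CTO', and Python's re module requires lookbehinds to be fixed-width, so the start anchor can't be added as an alternative inside the lookbehind itself.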

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))

Co-Founder/Chief Technology Officer
CTO/Bossy
aCTOrMan

You would have to check for that explicitly via |:

cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO|^CTO")
for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))

Co-Founder/Chief Technology Officer
Chief Technology Officer/Bossy
aCTOrMan
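The question also asks that no letter sit directly to the right of 'CTO', which neither pattern above checks. One possible way to cover both sides is a negative lookbehind plus a negative lookahead; a negative lookbehind also succeeds at the start of the string, so no separate ^CTO alternative is needed. This is a sketch under the assumption that only ASCII letters should block the replacement (the patterns above also exclude digits); TITLE_RE and normalize_title are illustrative names.

```python
import re

# Assumption: "letter" means ASCII A-Z / a-z only; digits next to CTO are allowed.
# (?<![A-Za-z])  no letter immediately before (also true at the start of the string)
# (?![A-Za-z])   no letter immediately after (also true at the end of the string)
TITLE_RE = re.compile(r"(?<![A-Za-z])CTO(?![A-Za-z])")


def normalize_title(title):
    """Hypothetical helper: expand a standalone 'CTO' in a title."""
    return TITLE_RE.sub("Chief Technology Officer", title)


for eg in ("Co-Founder/CTO", "CTO/Bossy", "aCTOrMan", "CTOs"):
    print(normalize_title(eg))
```

Because lookarounds are zero-width, the surrounding characters are never consumed, so nothing has to be put back in the replacement string.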
