Say I have this dataframe:

import pandas as pd
df = pd.DataFrame({'Col': ['DDJFHGBC', 'AWDGUYABC']})

And I want to replace everything ending with ABC with ABC and everything ending with BC (except the ABC-cases) with BC. The output would look like:

    Col
0   BC
1   ABC

How can I achieve this using regular expressions? I've tried things like:

df.Col.str.replace(r'\w*BC\b', 'BC', regex=True)
df.Col.str.replace(r'\w*ABC\b', 'ABC', regex=True)

But these two lines conflict: whichever order I apply them in, I end up with just BC.

  • What is the goal here? Commented May 7, 2020 at 8:58
  • I fail to understand the purpose. Maybe add more examples so we can see the logic behind what you want. Commented May 7, 2020 at 8:59
  • To replace everything ending with ABC with ABC and everything ending with BC (except the ABC-cases) with BC. Commented May 7, 2020 at 8:59
  • Perhaps match A?BC$ or match \w*?(A?BC)\b and replace with group 1: regex101.com/r/fMcfHI/1 Commented May 7, 2020 at 9:00
  • I realize it should be sufficient to replace everything before BC or ABC with "". How can I do that? Commented May 7, 2020 at 9:08

3 Answers

You could match as few word characters as possible using \w*?, then capture in group 1 an optional A followed by BC, (A?BC), followed by a word boundary.

\w*?(A?BC)\b

In the replacement, use group 1:

df.Col.str.replace(r'\w*?(A?BC)\b', r'\1', regex=True)
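
For anyone who wants to run this end to end, here is a minimal sketch using the dataframe from the question. It assumes a recent pandas, where `regex=True` has to be passed explicitly because the default of `str.replace` changed to `regex=False` in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['DDJFHGBC', 'AWDGUYABC']})

# Lazily skip word characters, then keep only the captured suffix:
# group 1 is "ABC" when an A precedes the final BC, otherwise "BC".
df['Col'] = df['Col'].str.replace(r'\w*?(A?BC)\b', r'\1', regex=True)

print(df['Col'].tolist())  # ['BC', 'ABC']
```

The lazy `\w*?` matters here: a greedy `\w*` would swallow the A before backtracking, so the optional `A?` would always match empty and every row would come out as BC.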

You may use a replace solution like:

df['Col'].str.replace(r'(?s)^.*?(A?BC)$', r'\1', regex=True)
# 0     BC
# 1    ABC

Here, (?s)^.*?(A?BC)$ matches:

  • (?s) - a . will match any char including line break chars
  • ^ - start of string
  • .*? - any 0+ chars, as few as possible
  • (A?BC) - Group 1 (referred to with \1 from the replacement pattern): an optional A and then BC
  • $ - end of string.
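
To see how this anchored pattern behaves compared with a word-boundary pattern such as \w*?(A?BC)\b, here is a small check. It uses Python's `re` module directly, on the assumption that pandas applies the same regex syntax when `regex=True` is passed. The last two sample strings are illustrative and not from the question.

```python
import re

anchored = re.compile(r'(?s)^.*?(A?BC)$')   # whole string must end in (A)BC
word_based = re.compile(r'\w*?(A?BC)\b')    # any word ending in (A)BC

samples = ['DDJFHGBC', 'AWDGUYABC', '1ABC 2ABC', 'XXBC DEF']

anchored_out = [anchored.sub(r'\1', s) for s in samples]
word_out = [word_based.sub(r'\1', s) for s in samples]

print(anchored_out)  # ['BC', 'ABC', 'ABC', 'XXBC DEF']
print(word_out)      # ['BC', 'ABC', 'ABC ABC', 'BC DEF']
```

The two patterns agree on the question's data. They differ once a string contains spaces: the anchored version only rewrites strings that end in BC or ABC as a whole, while the word-based version rewrites each matching word.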

8 Comments

This is not the best solution as it would alter strings that end with neither BC nor ABC.
@CHRD Which one? I have just finished writing the answer. BTW, both "work" for your data, in your question, there is no indication what to do with no-matches.
I meant the first one. Thank you!
@CHRD What if you have 1ABC 2ABC? What will be the expected result?
They would still end up as ABC.

How about this?

df.Col.str.replace(r'\w*ABC\b', 'ABC_', regex=True).str.replace(r'\w*BC\b', 'BC', regex=True).str.replace(r'\w*ABC_\b', 'ABC', regex=True)

It first replaces \w*ABC\b with ABC_. ABC_ won't be affected by replace(r'\w*BC\b', 'BC').

Then it replaces ABC_ with ABC, removing the temporary underscore so the ABC cases end up as plain ABC.
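
The three steps can be traced on single strings with plain `re.sub`. This is only a sketch: `shorten` is a hypothetical helper name, and it assumes the placeholder `ABC_` never occurs in the real data.

```python
import re

def shorten(s):
    # Step 1: mark the ABC cases with a trailing underscore.
    s = re.sub(r'\w*ABC\b', 'ABC_', s)
    # Step 2: collapse the remaining BC cases; "ABC_" is skipped because
    # there is no word boundary between "C" and "_".
    s = re.sub(r'\w*BC\b', 'BC', s)
    # Step 3: strip the temporary underscore again.
    return re.sub(r'\w*ABC_\b', 'ABC', s)

results = [shorten(s) for s in ['DDJFHGBC', 'AWDGUYABC']]
print(results)  # ['BC', 'ABC']
```

If a value already ended in ABC_, step 3 would rewrite it as well, so a placeholder that cannot appear in the column is the safer choice.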

1 Comment

This works. But how about replacing everything before BC or ABC with ""? That would only require one .replace().
