
I want to remove occurrences of V, I, or VI only when they appear inside brackets, as in the examples below:

Input:

VINE(PCI); BLUE(PI)
BLACK(CVI)
CINE(PCVI)

Output desired:

VINE(PC); BLUE(P)
BLACK(C)
CINE(PC)

When I use df['col'].str.replace('[PC]+([VI]+)', "") it removes everything inside the brackets, and when I use just df['col'].str.replace('[VI]+', "") it of course doesn't work, because it also removes every other occurrence of V and I. Inside the brackets there will only be these 4 letters: P and/or C, followed by V and/or I. What am I doing wrong here?

Thanks

  • I tested the pattern on pythex.org, and it shows the capture group matching what I want, but the overall match is everything inside the brackets. Commented Nov 7, 2018 at 3:23

2 Answers

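Use str.replace with a capture group and a callback. Pass regex=True explicitly: a callable replacement only works with a regex pattern, and pandas 2.0 and later default to regex=False.

import re
df['col'] = df['col'].str.replace(
    r'\((.*?)\)',  # lazily match each bracketed group
    lambda x: re.sub('[VI]', '', f'({x.group(1)})'),  # drop V and I inside it
    regex=True)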

Or,

df['col'] = df['col'].str.replace(r'\((P|PC|C)[VI]+\)', r'(\1)', regex=True) # Credit, OP
print(df)
                 col
0  VINE(PC); BLUE(P)
1           BLACK(C)
2           CINE(PC)
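As for why the pattern in the question removed everything inside the brackets: unescaped parentheses in a regex create a capture group rather than matching literal brackets, and the replacement applies to the whole match, not just the group. The sketch below illustrates this with the standard-library re module on the question's sample strings; the variable names are illustrative only, and the same patterns can be passed to str.replace with regex=True.

```python
import re

samples = ["VINE(PCI); BLUE(PI)", "BLACK(CVI)", "CINE(PCVI)"]

# Unescaped parentheses are a capture group, so this pattern matches bare
# letter runs such as "PCI" (and even the "CI" in "CINE") and deletes the
# whole match, leaving the literal brackets empty.
question_attempt = [re.sub(r"[PC]+([VI]+)", "", s) for s in samples]

# Escaped parentheses match the literal brackets; group 1 captures the
# P/C letters so they can be written back with \1.
fixed = [re.sub(r"\(([PC]+)[VI]+\)", r"(\1)", s) for s in samples]

print(question_attempt)
print(fixed)
```

Note that the first pattern also damages text outside the brackets (CINE loses its leading "CI"), which is another reason to anchor the match on the escaped brackets.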

4 Comments

Actually, V will be treated the same way as I, so I will adjust for that too. But I see what you are doing; I am going to implement it and come back.
Great! I used df['col'].str.replace(r'\((P|PC|C)[VI]+\)',r'(\1)', regex=True) and it works. I will accept the answer and read up on the other ways. I actually have a case such as CASH(PVI;CVI), which is odd, as it should just have been (PCVI), so I will make adjustments for that.
Thank you! What does f'...' do? As in, why the 'f' prefix and not 'r' or anything else?
@spiff f-strings are Python 3.6 syntax for string formatting. Also, I wanted to mention that the first solution can easily be extended to remove other characters besides V and I later if needed.
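To make the f-string point concrete, here is a minimal sketch of the callback written out as a named function, using only the standard-library re module. The helper name strip_letters and its letters parameter are hypothetical, introduced for illustration; the sample string combines the CASH(PVI;CVI) case from the comments with one of the question's inputs.

```python
import re

def strip_letters(match, letters="VI"):
    """Return the bracketed group with the given letters removed."""
    inner = match.group(1)                       # text between the brackets
    cleaned = re.sub(f"[{letters}]", "", inner)  # f-string builds the class [VI]
    return f"({cleaned})"                        # same as "(" + cleaned + ")"

text = "CASH(PVI;CVI); VINE(PCI)"
# The lazy group (.*?) stops at the first closing bracket, so each
# bracketed group is handled separately and text outside is untouched.
result = re.sub(r"\((.*?)\)", strip_letters, text)
print(result)
```

Because the callback only strips characters and does not assume a fixed P/C prefix, it also copes with the semicolon-separated group, leaving (P;C). Merging that into (PC) would need an extra step.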

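Another solution using only pandas (it assumes exactly one bracketed group per value):

import pandas as pd
S = pd.Series(["VINE(PCI)", "BLUE(PI)", "BLACK(CVI)", 'CINE(PCVI)'])
# Split on either bracket, strip I and V from the middle part, then reassemble
S.str.split(r'[()]').apply(lambda x: x[0] + "(" + x[1].replace("I", "").replace("V", "") + ")" + x[2])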
0    VINE(PC)
1     BLUE(P)
2    BLACK(C)
3    CINE(PC)
dtype: object
