
I have a DataFrame; this is part of it:

   CodeID    Codes
0  'code1'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
1  'code2'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
2  'code3'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
...

What I'm trying to do is extract the part of the string in the Codes column that matches the pattern r"\[<code in CodeID column>[^][]*\]", where <code in CodeID column> is the CodeID value from the same row.

Something like:

df['Code'] = df['Codes'].str.find(r"\[<code in CodeID column>[^][]*\]")

This recent question seems to imply it's not possible in a vectorised way but it's not exactly the same situation.
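For comparison, here is a sketch of the case that already is vectorised. When every row should look for the same code, Series.str.extract takes one pattern for the whole column. The catch is exactly that single pattern: it cannot change with each row's CodeID. The frame below is an assumed reconstruction of the excerpt, using plain strings (no literal quote characters).

```python
import pandas as pd

# Assumed reconstruction of the excerpt (plain strings, no quote characters).
df = pd.DataFrame({
    "CodeID": ["code1", "code2", "code3"],
    "Codes": ["[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"] * 3,
})

# One fixed pattern for the whole column: every row gets the code2 chunk,
# regardless of what its CodeID says.
fixed = df["Codes"].str.extract(r"(\[code2[^\[\]]*\])", expand=False)
print(fixed.tolist())  # ['[code2(c,d,e)]', '[code2(c,d,e)]', '[code2(c,d,e)]']
```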

  • If it is possible, then the regex will look like r"\[<code in CodeID column>[^][]*\]" Commented Dec 3, 2015 at 17:55
  • Thanks. I'm always blind to regex and leave that part of debugging till last! Commented Dec 3, 2015 at 17:56
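The suggested pattern reads: a literal [, the code, then any run of characters that are neither ] nor [, then the closing ]. In a Python character class, a ] placed first (right after ^) is taken literally, so [^][] and the more explicit [^\[\]] mean the same thing. A quick check on a single string, using code2 as an example code:

```python
import re

codes = "[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"

# [^][]* : "]" right after "^" is literal, so this is "neither ] nor [".
m = re.search(r"\[code2[^][]*\]", codes)
# The same character class written with explicit escapes.
m2 = re.search(r"\[code2[^\[\]]*\]", codes)

print(m.group(0))   # [code2(c,d,e)]
print(m2.group(0))  # [code2(c,d,e)]
```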

1 Answer


You can use the string in one column to build a regex that is matched against another column, one row at a time, with apply(..., axis=1), as below.

In the lambda, x[0] is the row's CodeID value and x[1] is its Codes value.
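With axis=1, DataFrame.apply calls the function once per row and passes that row as a Series indexed by the column labels, which is why x[0] and x[1] pick out the two values. Selecting by label (x["CodeID"]) says the same thing more clearly, and recent pandas versions warn when an integer key like x[0] falls back to positional lookup on a label index. A small sketch on the same example rows, assumed to be plain strings:

```python
import pandas as pd

# Assumed reconstruction of the example rows (plain strings, no quotes).
df = pd.DataFrame({
    "CodeID": ["code1", "code2", "code3"],
    "Codes": ["[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"] * 3,
})

# With axis=1, apply passes in one Series per row...
kinds = df.apply(lambda row: type(row).__name__, axis=1)
# ...whose index is the column labels...
cols = df.apply(lambda row: ",".join(row.index), axis=1)
# ...so a value can be selected by column name instead of by position.
ids = df.apply(lambda row: row["CodeID"], axis=1)

print(kinds.tolist())  # ['Series', 'Series', 'Series']
print(ids.tolist())    # ['code1', 'code2', 'code3']
```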

import re
import pandas as pd

Out[20]: 
    CodeID                                         Codes
0  'code1'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
1  'code2'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
2  'code3'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'

df[['CodeID','Codes']].apply(lambda x: re.match(r"\[%s[^][]*\]"%x[0], x[1]),axis=1)
Out[21]: 
0    None
1    None
2    None
dtype: object

Well, it returns None for every row, so the matching step still needs fixing :)
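Part of the problem is the call rather than the pattern: re.match only tries to match at the very start of the string, so for code2 and code3 it can never succeed, whereas re.search scans the whole string. The printout above also shows quote characters around every value; if those are literally in the data, the pattern built from CodeID will not match either, and stripping them first (for example with df["CodeID"].str.strip("'")) would be needed. A corrected sketch, assuming plain strings; extract_code is a hypothetical helper name:

```python
import re
import pandas as pd

# Example data, assumed to be plain strings (no literal quote characters).
df = pd.DataFrame({
    "CodeID": ["code1", "code2", "code3"],
    "Codes": ["[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"] * 3,
})


def extract_code(code_id, codes):
    """Hypothetical helper: return the [code_id...] chunk of codes, or None."""
    # re.escape keeps any regex metacharacters in the ID literal;
    # [^\[\]]* is the same "neither [ nor ]" class as [^][]*.
    pattern = r"\[" + re.escape(code_id) + r"[^\[\]]*\]"
    # re.search scans the whole string; re.match would only try position 0.
    m = re.search(pattern, codes)
    return m.group(0) if m else None


# Row-wise apply, selecting columns by label instead of position.
df["Code"] = df.apply(lambda row: extract_code(row["CodeID"], row["Codes"]), axis=1)

# Same result without apply: zip the two columns in a list comprehension.
df["Code2"] = [extract_code(c, s) for c, s in zip(df["CodeID"], df["Codes"])]

print(df["Code"].tolist())  # ['[code1(a,b,c)]', '[code2(c,d,e)]', '[code3(e,f,g)]']
```

The zip version skips building a Series for every row, which is typically faster on large frames, but the regex still runs once per row, so neither version is vectorised in the str-accessor sense. Like the original pattern, code1 would also match a longer ID such as code10; if every code is followed by "(", adding \( after the escaped ID would rule that out.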
