Using regex and pandas in the DataFrame to replace the value

Question

import pandas as pd
import re

regexdf_data = {
    'STag': ['Title_1', 'Abs_1', 'Abs_3', 'Abs_4'],
    'E1': ['pacnes', 'acne|dfe|sac', 'pI', 'kera'],
    'E1_CUI': ['C3477', 'C2166', 'C9871', 'C2567']
}
df3 = pd.DataFrame(regexdf_data)
df3

    E1             E1_CUI    STag
0   pacnes         C3477     Title_1
1   acne|dfe|sac   C2166     Abs_1
2   pI             C9871     Abs_3
3   kera           C2567     Abs_4

Now I want only acne (from the acne|dfe|sac value in the E1 column) to replace C2166 in the E1_CUI column, for the row where the STag column is Abs_1.

I have tried this: df3.loc[df3['STag'] == 'Abs_1', 'E1_CUI'] = re.split("\|",df3['E1']), but it's not working.

Expected Output

    E1             E1_CUI    STag
0   pacnes         C3477     Title_1
1   acne|dfe|sac   acne      Abs_1
2   pI             C9871     Abs_3
3   kera           C2567     Abs_4

How do you determine it should be acne? Is it a match on a specific word, or simply the first split on |?
– ALollz, Commented Mar 16, 2021 at 17:45
@ALollz Yes, it should be taken as simply the first split on |.
– Sachin Sinkar, Commented Mar 16, 2021 at 17:52

Scott Boston · Accepted Answer · 2021-03-17 14:03:04Z


Try this, using the string accessor with split and the .str[0] shortcut to get the first element (improvements by @ShubhamSharma):

import pandas as pd

regexdf_data = {
    'STag': ['Title_1', 'Abs_1', 'Abs_3', 'Abs_4'],
    'E1': ['pacnes', 'acne|dfe|sac', 'pI', 'kera'],
    'E1_CUI': ['C3477', 'C2166', 'C9871', 'C2567']
}

df3 = pd.DataFrame(regexdf_data)

m = df3['STag'] == 'Abs_1'
df3.loc[m, 'E1_CUI'] = df3.loc[m, 'E1'].str.split('|').str[0]

Output:

print(df3)

      STag            E1 E1_CUI
0  Title_1        pacnes  C3477
1    Abs_1  acne|dfe|sac   acne
2    Abs_3            pI  C9871
3    Abs_4          kera  C2567
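Since the question asks about regex, here is a minimal sketch of a regex-based variant of the same idea. It assumes, as confirmed in the comments on the question, that the wanted value is simply everything before the first |. The name first_token is only illustrative and is not part of the original answer.

```python
import pandas as pd

df3 = pd.DataFrame({
    'STag': ['Title_1', 'Abs_1', 'Abs_3', 'Abs_4'],
    'E1': ['pacnes', 'acne|dfe|sac', 'pI', 'kera'],
    'E1_CUI': ['C3477', 'C2166', 'C9871', 'C2567'],
})

# Boolean mask selecting the rows to update.
m = df3['STag'] == 'Abs_1'

# Capture everything from the start of the string up to the first '|'.
# Inside a character class the pipe is literal, so no escaping is needed.
# expand=False returns a Series because there is a single capture group.
first_token = df3.loc[m, 'E1'].str.extract(r'^([^|]*)', expand=False)

# Assignment aligns on the index, so only the masked rows change.
df3.loc[m, 'E1_CUI'] = first_token

print(df3)
```

For a single-character separator such as |, str.split treats the pattern as a literal string, so the split version in the answer needs no escaping either; the regex form is mainly useful if the delimiter rule becomes more complex.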

edited Mar 17, 2021 at 14:03

answered Mar 16, 2021 at 17:50


3 Comments

Shubham Sharma Over a year ago

Hi Scott! Nice answer. Just a small optimization tip: maybe we can use boolean masking to slice only the relevant portion of column E1, to avoid splitting the entire column :)

Scott Boston Over a year ago

@ShubhamSharma Yes, you are correct. Please feel free to edit the solution and document with your name. Excellent idea!

Shubham Sharma Over a year ago

Edited the answer!
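The optimization discussed in these comments can be sketched as follows. This is only an illustration, assuming the same sample data as the question; the names full and masked are hypothetical. Both approaches produce the same value for the selected row, but the masked version splits only the rows that will actually be written.

```python
import pandas as pd

df3 = pd.DataFrame({
    'STag': ['Title_1', 'Abs_1', 'Abs_3', 'Abs_4'],
    'E1': ['pacnes', 'acne|dfe|sac', 'pI', 'kera'],
    'E1_CUI': ['C3477', 'C2166', 'C9871', 'C2567'],
})

m = df3['STag'] == 'Abs_1'

# Unmasked: split every row of E1, even rows that will never be assigned.
full = df3['E1'].str.split('|').str[0]

# Masked: split only the rows selected by the mask.
masked = df3.loc[m, 'E1'].str.split('|').str[0]

# Restricted to the masked rows, the two results are identical.
same = full[m].equals(masked)

# Either Series can be assigned; index alignment updates only the masked rows.
df3.loc[m, 'E1_CUI'] = masked
```

On a four-row frame the difference is negligible; the saving only matters when the mask selects a small fraction of a large column.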
