
I have a dataframe like this:

import pandas as pd

data = {'well': ['10B23', '10B23', '10B23', '10B23', '10B23', '10B23'],
        'tag': ['15B22|TestSep_OutletFlow', '15B22|TestSep_GasOutletFlow', '15B22|TestSep_WellNum', '15B22|TestSep_GasPresValve', '15B22|TestSep_Temp', 'WHT']}
df = pd.DataFrame(data)
print(df)

    well    tag
0   10B23   15B22|TestSep_OutletFlow
1   10B23   15B22|TestSep_GasOutletFlow
2   10B23   15B22|TestSep_WellNum
3   10B23   15B22|TestSep_GasPresValve
4   10B23   15B22|TestSep_Temp
5   10B23   WHT

Now I'd like to replace everything before the | in the tag column with a string such as 11A22, so that after the replacement the dataframe looks like this:

    well    tag
0   10B23   11A22|TestSep_OutletFlow
1   10B23   11A22|TestSep_GasOutletFlow
2   10B23   11A22|TestSep_WellNum
3   10B23   11A22|TestSep_GasPresValve
4   10B23   11A22|TestSep_Temp
5   10B23   WHT

I was thinking of using a regular expression with a capture group and replacing the group with a string, something like this:

df['tag2']=df['tag'].str.replace(r'([a-z0-9]*)|TestSep_[a-z0-9]*','11A22',regex=True)

but I got this result instead:

    well    tag tag2
0   10B23   15B22|TestSep_OutletFlow    11A2211A22B11A2211A22|11A2211A2211A22O11A2211A...
1   10B23   15B22|TestSep_GasOutletFlow 11A2211A22B11A2211A22|11A2211A2211A22G11A2211A...
2   10B23   15B22|TestSep_WellNum   11A2211A22B11A2211A22|11A2211A2211A22W11A2211A...
3   10B23   15B22|TestSep_GasPresValve  11A2211A22B11A2211A22|11A2211A2211A22G11A2211A...
4   10B23   15B22|TestSep_Temp  11A2211A22B11A2211A22|11A2211A2211A22T11A2211A22
5   10B23   WHT 11A22W11A22H11A22T11A22

Thanks for your help

1 Answer


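The pipe (|) is a special character in regex: it means alternation ("or"), so your pattern was read as ([a-z0-9]*) OR TestSep_[a-z0-9]*. Because [a-z0-9]* can also match an empty string, it matched at almost every position and inserted 11A22 there. You need to escape the pipe as \| to match it literally, and anchor the pattern with ^ so only the prefix at the start of the string is replaced: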

df["tag2"] = df["tag"].str.replace(r"^\w*\|", "11A22|", regex=True)

Output:

print(df)

    well                          tag                         tag2
0  10B23     15B22|TestSep_OutletFlow     11A22|TestSep_OutletFlow
1  10B23  15B22|TestSep_GasOutletFlow  11A22|TestSep_GasOutletFlow
2  10B23        15B22|TestSep_WellNum        11A22|TestSep_WellNum
3  10B23   15B22|TestSep_GasPresValve   11A22|TestSep_GasPresValve
4  10B23           15B22|TestSep_Temp           11A22|TestSep_Temp
5  10B23                          WHT                          WHT
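To see why the original attempt produced the repeated 11A22 output, here is a minimal sketch that runs both patterns through Python's re.sub directly. It assumes Python 3.7 or later, whose rules for replacing empty matches are what produce this exact output; pandas' str.replace with regex=True applies essentially the same substitution to each element of the column. The results match the tag2 values shown in the question.

```python
import re

# The question's pattern. An unescaped | is alternation, so this reads as
# "([a-z0-9]*)" OR "TestSep_[a-z0-9]*". The first branch can match the
# empty string, so it matches (and inserts 11A22) almost everywhere.
# [a-z0-9] is lowercase-only, so uppercase letters such as B and T survive
# between the insertions.
bad = r'([a-z0-9]*)|TestSep_[a-z0-9]*'

# The answer's pattern: ^ anchors at the start of the string, \w* takes the
# word characters before the delimiter, and \| is a literal pipe.
good = r'^\w*\|'

tags = ['15B22|TestSep_Temp', 'WHT']
bad_results = [re.sub(bad, '11A22', t) for t in tags]
good_results = [re.sub(good, '11A22|', t) for t in tags]

for tag, b, g in zip(tags, bad_results, good_results):
    print(f'{tag!r}: bad={b!r} good={g!r}')
```

Strings without a | (like WHT) never match the escaped pattern, which is why they come through unchanged.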

Comments

Oh yeah, it works beautifully! Thanks Timeless, I appreciate your help. So we don't need to use a group?
You're welcome. There's no need to capture a group in this case, since you only need to replace a word that sits at the beginning of the string, right before the delimiter.
If you do want a group, you could use df["tag"].str.replace(r"^\w*\|(.*)", r"11A22|\1", regex=True). Here, (.*) captures everything after the delimiter |, and the backreference \1 in the replacement puts that captured text right after 11A22|.
I see, that's nice and really helpful! I'll definitely learn this trick tonight. Thank you, Timeless, I appreciate it.
You're welcome roudan.
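As a quick check of the capture-group version from the comments, this sketch applies both the plain and the grouped pattern to every tag from the question and compares the results. It uses re.sub directly rather than pandas, on the assumption that df["tag"].str.replace(..., regex=True) performs the same per-element substitution.

```python
import re

# Tags copied from the question's dataframe.
tags = [
    '15B22|TestSep_OutletFlow',
    '15B22|TestSep_GasOutletFlow',
    '15B22|TestSep_WellNum',
    '15B22|TestSep_GasPresValve',
    '15B22|TestSep_Temp',
    'WHT',
]

# No group: replace the prefix plus the delimiter, then re-add the delimiter.
plain = [re.sub(r'^\w*\|', '11A22|', t) for t in tags]

# With a group: (.*) captures everything after the delimiter and \1 puts it
# back after the new prefix. A tag without a | (like WHT) matches neither
# pattern, so it is left unchanged either way.
grouped = [re.sub(r'^\w*\|(.*)', r'11A22|\1', t) for t in tags]

print(plain == grouped)
print(plain)
```

Both forms give the same column here; the group only earns its keep if you need to rearrange or reuse the text after the delimiter.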