python/pandas - substring replacement using regular expression

Question

I have inherited an old code file that contains the following code. It seems the last line of the code below removes all the opening ( and closing ) parentheses and the - character from the phone number field.

Question: But why is it using regex='\(' in the .replace(regex='\(',value='') part of that last line? Some other online examples I have seen (such as here and here) don't seem to use the regex keyword in their replacement function. What is regex='\(' doing in the replace function here?

import sqlalchemy as sq
import pandas as pd
import re

# dbutils is provided by the Databricks runtime
pw = dbutils.secrets.get(scope='SomeScope',key='sql')
engine = sq.create_engine('mssql+pymssql://SERVICE.Databricks.NONPUBLICETL:'+pw+'@MyAzureSQL.database.windows.net:1433/TEST', isolation_level="AUTOCOMMIT")

pandas_df = pd.read_sql('select * from SQLTable1', con=engine)

pandas_df['MOBILE_PHONE'].replace(regex='\(',value='').replace(regex='\)',value='').replace(regex='\-',value='').str.strip()
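The database part of the snippet only runs inside Databricks, so here is a minimal sketch of just the last line, assuming a hypothetical two-row DataFrame in place of the SQLTable1 data. When to_replace is omitted (left as None), pandas treats the string passed as regex= as the pattern, and every match inside each value is replaced by value. Raw strings (r'\(') are used so Python does not warn about the invalid escape sequence '\('.

```python
import pandas as pd

# Hypothetical stand-in for pandas_df; the real rows come from SQLTable1.
pandas_df = pd.DataFrame({'MOBILE_PHONE': ['(425) 555-1234', ' 206-555-0100 ']})

# to_replace is None, so each regex= string is the pattern itself;
# every match inside a value is replaced by value=''.
cleaned = (
    pandas_df['MOBILE_PHONE']
    .replace(regex=r'\(', value='')
    .replace(regex=r'\)', value='')
    .replace(regex=r'-', value='')
    .str.strip()  # trims only leading/trailing whitespace
)
print(cleaned.tolist())
```

Note that the space inside '425 5551234' survives, because str.strip() only trims the ends of each string.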

The links that you provided are related to the re package or to the string.replace() method. But in the inherited code, the replace() method refers to pandas.pydata.org/docs/reference/api/… , see the regex keyword.
– Marcel Preda, Commented Jan 5, 2022 at 22:20
@MarcelPreda So, in replace(regex='\)',value='') is it saying: my regex pattern is \), so find all substrings that match this pattern (a closing parenthesis in this case) and remove them (i.e., replace them with an empty string)?
– nam, Commented Jan 5, 2022 at 22:32
Yes, that is the behavior in your case: everything that matches the regex is replaced by value.
– Marcel Preda, Commented Jan 5, 2022 at 22:42
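To make the comment thread concrete, here is a small sketch (using a hypothetical one-value Series) comparing the form from the inherited code, where the pattern goes in regex= and to_replace stays None, with the more common form, where the pattern goes in to_replace and regex=True is just a flag. Both should produce the same result.

```python
import pandas as pd

s = pd.Series(['(425) 555-1234'])  # hypothetical sample value

# Form used in the inherited code: regex= holds the pattern, to_replace is None.
a = s.replace(regex=r'\)', value='')

# Common form: pattern in to_replace, regex=True tells pandas to treat it as a regex.
b = s.replace(r'\)', '', regex=True)

print(a.tolist(), b.tolist())
```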

wwnde · Accepted Answer · 2022-01-05 23:08:48Z

1

Coding precision depends on experience, logic and mastery of syntax. It's like mastering a natural language. The code you inherited achieves essentially what the line below does, except that the line below also removes the space between the area code and the number:

df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].str.replace(r'[^\d]', '', regex=True)

Explanation

\d is the regex for a digit.

[^...] is a negated character class: it matches any character except the ones listed.

[^\d] therefore matches everything except digits.

So, using the pandas str.replace API, I replace every non-digit character in the string with nothing.

Outcome

    MOBILE_PHONE  MOBILE_PHONE2
0  (425) 555-1234    4255551234
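A short sketch of the same idea, assuming a hypothetical second phone format to show that any non-digit separator is dropped. It also shows that \D, the built-in shorthand for "not a digit", gives the same result as [^\d].

```python
import pandas as pd

# Hypothetical sample data; the second row uses a different separator style.
df = pd.DataFrame({'MOBILE_PHONE': ['(425) 555-1234', '+1 425.555.1234']})

# [^\d]: negated character class, i.e. any character that is not a digit.
df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].str.replace(r'[^\d]', '', regex=True)

# \D is the predefined class for the same thing.
df['MOBILE_PHONE3'] = df['MOBILE_PHONE'].str.replace(r'\D', '', regex=True)

print(df)
```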

answered Jan 5, 2022 at 23:08

wwnde


1 Comment

nam Over a year ago

A good one-liner with a nice explanation. +1

Corralien · Accepted Answer · 2022-01-05 22:50:48Z

1

The signature of the replace function has changed.

Replace your last line with:

df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].replace('[()-]', '', regex=True).str.strip()
print(df)

# Output
     MOBILE_PHONE MOBILE_PHONE2
0  (425) 555-1234   425 5551234

This replaces any (, ) or - with ''.
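A minimal sketch checking that the single character class matches the three chained calls from the question, using a hypothetical one-value Series. Inside [...], ( and ) lose their grouping meaning, and a - placed last is a literal hyphen rather than a range. The last line is an illustrative variant (not from the answer) that also adds a space to the class.

```python
import pandas as pd

s = pd.Series(['(425) 555-1234'])  # hypothetical sample value

# [()-] matches exactly one '(' or ')' or '-' at each position.
one_pass = s.replace('[()-]', '', regex=True)

# The three chained replacements from the question, for comparison.
chained = (
    s.replace(regex=r'\(', value='')
     .replace(regex=r'\)', value='')
     .replace(regex=r'-', value='')
)

# Illustrative variant: a space inside the class removes the middle space too.
no_space = s.replace('[() -]', '', regex=True)

print(one_pass.tolist(), chained.tolist(), no_space.tolist())
```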

answered Jan 5, 2022 at 22:50

Corralien


SuperPineapple · Accepted Answer · 2022-01-05 22:24:36Z

0

You can use the regex keyword either as a boolean, telling .replace() whether to interpret to_replace as a regular expression, or as the regular expression itself (in which case to_replace is left as None).
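A small sketch of the difference, assuming a hypothetical Series with one full phone number and one value that is exactly "(". With the default regex=False, Series.replace only replaces values that are exactly equal to to_replace; with a regex (passed either way), it replaces matches inside each value.

```python
import pandas as pd

s = pd.Series(['(425) 555-1234', '('])  # hypothetical sample values

# regex=False (default): whole-value match, so only the cell equal to '(' changes.
exact = s.replace('(', '')

# regex=True: to_replace is a pattern, and matches inside values are replaced.
pattern = s.replace(r'\(', '', regex=True)

# regex given as the pattern itself, with to_replace left as None.
pattern2 = s.replace(regex=r'\(', value='')

print(exact.tolist(), pattern.tolist(), pattern2.tolist())
```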

answered Jan 5, 2022 at 22:24

SuperPineapple
