
I have inherited an old code file that contains the following code. It seems the last line of the code below removes all open ( and close ) parentheses and all - characters from the phone number field.

Question: Why is it using regex='\(' in the .replace(regex='\(',value='') part of that last line? Some other online examples I have seen (such as here and here) don't seem to use the regex keyword in their replacement function. What is regex='\(' doing in the replace function here?

import sqlalchemy as sq
import pandas as pd
import re

pw = dbutils.secrets.get(scope='SomeScope',key='sql')
# '@' separates the password from the host in the SQLAlchemy URL
engine = sq.create_engine('mssql+pymssql://SERVICE.Databricks.NONPUBLICETL:'+pw+'@MyAzureSQL.database.windows.net:1433/TEST', isolation_level="AUTOCOMMIT")

pandas_df = pd.read_sql('select * from SQLTable1', con=engine)

pandas_df['MOBILE_PHONE'].replace(regex='\(',value='').replace(regex='\)',value='').replace(regex='\-',value='').str.strip()
  • The links that you provided are related to the re package or to the str.replace() method. But in the inherited code, the replace() method refers to pandas.pydata.org/docs/reference/api/… ; see the regex keyword. Commented Jan 5, 2022 at 22:20
  • @MarcelPreda So, in replace(regex='\)',value='') is it saying: my regex pattern is \), so find all substrings that match this pattern (the close parenthesis in this case) and remove them (i.e., replace them with an empty string)? Commented Jan 5, 2022 at 22:32
  • Yes, that is the behavior in your case: everything that matches the regex is replaced by value. Commented Jan 5, 2022 at 22:42
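A minimal sketch of what the comments describe, using the sample value "(425) 555-1234" that appears in the answers. It assumes pandas 1.x or 2.x behavior as I understand it, so it is worth re-checking on your version. With to_replace left as None, a string passed as regex= is treated as the pattern, which is equivalent to passing the pattern as to_replace with regex=True. It also suggests why regex matching is needed at all: without it, Series.replace only matches whole cell values, whereas the examples linked in the question do substring replacement (Python's str.replace, or pandas' Series.str.replace).

```python
import pandas as pd

s = pd.Series(["(425) 555-1234"])

# Form used in the inherited code: to_replace stays None and the
# pattern itself is passed through the regex keyword.
via_regex_kw = s.replace(regex=r"\(", value="")

# Equivalent, more common form: pattern as to_replace, regex=True.
via_flag = s.replace(r"\(", "", regex=True)

# Without regex, Series.replace only replaces cells whose *whole* value
# equals "(", so the phone number is left untouched.
literal = s.replace("(", "")

# Series.str.replace works on substrings, like Python's str.replace.
str_method = s.str.replace("(", "", regex=False)

print(via_regex_kw.tolist())  # ['425) 555-1234']
print(via_flag.tolist())      # ['425) 555-1234']
print(literal.tolist())       # ['(425) 555-1234']
print(str_method.tolist())    # ['425) 555-1234']
```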

3 Answers


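Coding precision depends on experience, logic and mastery of syntax; it's like mastering a natural language. The code below achieves the same goal as the inherited chain in a single call. Note that it removes every non-digit character, including the space, so its result differs slightly from the original chain (which keeps the space between 425 and 555).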

df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].str.replace(r'[^\d]', '', regex=True)

Explanation

\d matches any digit

[^...] is a negated character class: it matches any single character not listed inside the brackets

[^\d] matches everything except digits

So, using the pandas API, I replace everything except digits in the string with an empty string

Outcome

     MOBILE_PHONE MOBILE_PHONE2
0  (425) 555-1234    4255551234
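A small sketch of the same one-liner applied to a few input formats. Only "(425) 555-1234" comes from the answer; the other two sample values are hypothetical, made up for illustration. It also shows that \D, the built-in shorthand for "not a digit", gives the same result as [^\d].

```python
import pandas as pd

# First value is from the answer; the other two are hypothetical formats.
df = pd.DataFrame({"MOBILE_PHONE": ["(425) 555-1234", "425.555.1234", " 425-555-1234 "]})

# [^\d] is a negated character class: any single character that is not a digit.
df["MOBILE_PHONE2"] = df["MOBILE_PHONE"].str.replace(r"[^\d]", "", regex=True)

# \D is the shorthand for the same negated class.
df["MOBILE_PHONE3"] = df["MOBILE_PHONE"].str.replace(r"\D", "", regex=True)

print(df["MOBILE_PHONE2"].tolist())  # ['4255551234', '4255551234', '4255551234']
```

Because the pattern removes everything that is not a digit, surrounding whitespace disappears too, so no separate .str.strip() is needed here.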

1 Comment

A good one-liner with a nice explanation. +1

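The pattern is more commonly passed as the first argument (to_replace) together with regex=True, and a single character class can handle all three characters in one call.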

Replace your last line with:

df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].replace('[()-]', '', regex=True).str.strip()
print(df)

# Output
     MOBILE_PHONE MOBILE_PHONE2
0  (425) 555-1234   425 5551234

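The character class [()-] matches any one of (, ) or -, and each match is replaced by ''. The - is treated literally because it is the last character inside the brackets.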
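To check that the single character class gives the same result as the original three-step chain, the sketch below runs both on the sample value from the output above. Raw strings (r"...") are my choice, not part of the original code: they let backslashes such as \( reach the regex engine unchanged without Python warning about invalid escape sequences.

```python
import pandas as pd

s = pd.Series(["(425) 555-1234"])

# The chain from the inherited code, written with raw strings.
chained = (
    s.replace(regex=r"\(", value="")
     .replace(regex=r"\)", value="")
     .replace(regex=r"\-", value="")
     .str.strip()
)

# One character class: "(" or ")" or "-" ("-" is literal as the last char).
single = s.replace(r"[()-]", "", regex=True).str.strip()

print(chained.tolist())  # ['425 5551234']
print(single.tolist())   # ['425 5551234']
```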


The regex keyword can be used in two ways: as a boolean that tells .replace() whether to interpret to_replace as a regular expression, or as the regular expression itself (in which case to_replace must be None, as in the inherited code).
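A sketch of the pattern form taken one step further. According to the pandas documentation, regex= may also be a list (or dict) of patterns, in which case to_replace must be None; supplying both a to_replace and a non-boolean regex raises a ValueError. I believe this holds for pandas 1.x and 2.x, but it is worth confirming on your version.

```python
import pandas as pd

s = pd.Series(["(425) 555-1234"])

# regex as a list of patterns (to_replace stays None): each pattern is
# applied and every match is replaced by value.
cleaned = s.replace(regex=[r"\(", r"\)", r"-"], value="")
print(cleaned.tolist())  # ['425 5551234']

# Passing both a to_replace and a non-boolean regex is rejected.
try:
    s.replace(r"\(", "", regex=r"\)")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```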
