I need a bit of help.

I'm pretty new to Python (I use version 3.0 bundled with Anaconda) and I want to use a regex to validate a list of numbers and return only the ones that match a criterion (say \d{11} for 11 digits). I'm getting the list using pandas:

import pandas as pd

df = pd.DataFrame(columns=['phoneNumber','count'], data=[
    ['08034303939',11],
    ['08034382919',11],
    ['0802329292',10],
    ['09039292921',11]])

When I return all the items using

for row in df.iterrows(): # dataframe.iterrows() returns tuple
    print(row[1][0])

it returns all items without regex validation, but when I try to validate with this

import re

for row in df.iterrows(): # dataframe.iterrows() returns tuple
    print(re.compile(r"\d{11}").search(row[1][0]).group())

it raises an AttributeError (since search() returns None for non-matching values, and None has no group() method).

How can I work around this, or is there an easier way?

1 Answer

If you want to validate, you can use str.match and convert the result to a boolean mask using Series.astype(bool):

x = df['phoneNumber'].str.match(r'\d{11}').astype(bool)
x

0     True
1     True
2    False
3     True
Name: phoneNumber, dtype: bool
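One caveat worth illustrating: str.match only anchors the pattern at the start of each string, so a value with more than 11 digits would also pass. The sketch below assumes pandas 1.1 or later (where Series.str.fullmatch is available) and adds a hypothetical 12-digit row, not present in the question's data, to show the difference.

```python
import pandas as pd

df = pd.DataFrame(columns=['phoneNumber', 'count'], data=[
    ['08034303939', 11],
    ['08034382919', 11],
    ['0802329292', 10],
    ['09039292921', 11],
    ['080343039391', 12],  # hypothetical extra row: 12 digits
])

# str.match anchors only at the start, so the 12-digit value still matches.
loose = df['phoneNumber'].str.match(r'\d{11}')

# str.fullmatch requires the whole string to match the pattern.
strict = df['phoneNumber'].str.fullmatch(r'\d{11}')

print(loose.tolist())
print(strict.tolist())
```

On older pandas versions, anchoring the pattern explicitly, as in str.match(r'\d{11}$'), should behave much the same way for this data.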

You can use boolean indexing to return only rows with valid phone numbers.

df[x]

   phoneNumber  count
0  08034303939     11
1  08034382919     11
3  09039292921     11
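If you would rather keep an explicit loop, the AttributeError can be avoided by checking the match object before calling group(). This is a minimal sketch using only the standard re module; the plain list phone_numbers stands in for the phoneNumber column, and the variable names are illustrative.

```python
import re

# Stand-in for df['phoneNumber'] from the question.
phone_numbers = ['08034303939', '08034382919', '0802329292', '09039292921']

# Compile once, outside the loop.
pattern = re.compile(r'\d{11}')

valid = []
for number in phone_numbers:
    match = pattern.fullmatch(number)  # None when the string is not exactly 11 digits
    if match is not None:
        valid.append(match.group())

print(valid)
```

The vectorized str.match approach is usually more concise on a DataFrame, but the guard on None is the direct fix for the loop in the question.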