Validate strings using regex in pandas

Question

I need a bit of help.

I'm pretty new to Python (I use version 3.0 bundled with Anaconda) and I want to use regex to validate and return only the numbers that match a criterion (say \d{11} for 11 digits). I'm getting the list using pandas:

import pandas as pd

df = pd.DataFrame(columns=['phoneNumber','count'], data=[
    ['08034303939',11],
    ['08034382919',11],
    ['0802329292',10],
    ['09039292921',11]])

When I return all the items using

for row in df.iterrows(): # dataframe.iterrows() returns tuple
    print(row[1][0])

it returns all items without regex validation, but when I try to validate with this

for row in df.iterrows(): # dataframe.iterrows() returns tuple
    print(re.compile(r"\d{11}").search(row[1][0]).group())

it raises an AttributeError, since the returned value for non-matching values is None.

How can I work around this, or is there an easier way?

cs95 · Accepted Answer · 2019-01-12 15:36:23Z

If you want to validate, you can use str.match, which tests the pattern against the start of each string, and convert the result to a boolean mask using astype(bool):

x = df['phoneNumber'].str.match(r'\d{11}').astype(bool)
x

0     True
1     True
2    False
3     True
Name: phoneNumber, dtype: bool
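
One caveat with the mask above: str.match only anchors the pattern at the start of the string, so a value with more than 11 digits would also pass. A minimal sketch of a stricter check, assuming pandas 1.1 or later (where str.fullmatch is available); the 12-digit row is a hypothetical addition, not part of the question's data:

```python
import pandas as pd

df = pd.DataFrame(columns=['phoneNumber', 'count'], data=[
    ['08034303939', 11],
    ['08034382919', 11],
    ['0802329292', 10],
    ['09039292921', 11],
    ['080343039391', 12],  # hypothetical extra row: 12 digits
])

# str.match anchors only at the start, so the 12-digit row still passes.
loose = df['phoneNumber'].str.match(r'\d{11}')

# str.fullmatch requires the whole string to match the pattern.
strict = df['phoneNumber'].str.fullmatch(r'\d{11}')

print(pd.DataFrame({'phoneNumber': df['phoneNumber'],
                    'loose': loose, 'strict': strict}))
```

On older pandas versions, anchoring the pattern yourself, as in str.match(r'\d{11}$'), should give a similar result.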

You can use boolean indexing to return only rows with valid phone numbers.

df[x]

   phoneNumber  count
0  08034303939     11
1  08034382919     11
3  09039292921     11
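
If you would rather keep the explicit loop from the question, the AttributeError can be avoided by checking the match object before calling .group(). This is only a sketch of that workaround; the names pattern and valid are illustrative, and the vectorised mask is usually the simpler choice in pandas:

```python
import re
import pandas as pd

df = pd.DataFrame(columns=['phoneNumber', 'count'], data=[
    ['08034303939', 11],
    ['08034382919', 11],
    ['0802329292', 10],
    ['09039292921', 11],
])

# Compile once, outside the loop.
pattern = re.compile(r"\d{11}")

valid = []
for number in df['phoneNumber']:
    m = pattern.fullmatch(number)  # returns None when the string does not match
    if m is not None:              # guard before calling .group()
        valid.append(m.group())

print(valid)
# ['08034303939', '08034382919', '09039292921']
```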

edited Jan 12, 2019 at 15:36

answered Jul 18, 2017 at 18:40
