
I am writing code to match US phone number formats.

So it should match:

123-333-1111
(123)111-2222
123-2221111

But should not match 1232221111

import re

matchThreeDigits = r"(?:\s*\(?\d{3}\)?\s*)"
matchFourDigits = r"(?:\s*\d{4}\s*)"
phoneRegex = '((' + matchThreeDigits + ')-?(' + matchThreeDigits + ')-?(' + matchFourDigits + '))'
matches = re.findall(phoneRegex, line)

The problem is that I need to make sure at least one of the parentheses or a '-' is present in the pattern (otherwise it could be a plain ten-digit number rather than a phone number). I don't want to run a second pattern search, for efficiency reasons. Is there any way to build this requirement into the regex pattern itself?

  • See the following section of Dive into Python: diveinto.org/python3/regular-expressions.html#phonenumbers Commented Nov 6, 2013 at 10:53
  • Not sure why you accepted the answer that you did, considering that mine is much simpler and just as correct. No big deal, just a little confused. Commented Nov 11, 2013 at 3:09

3 Answers


Something like this?

pattern = r'(\(?(\d{3})\)?(?P<A>-)?(\d{3})(?(A)-?|-)(\d{4}))'

Using it:

import re
regex = re.compile(pattern)
check = ['123-333-1111', '(123)111-2222', '123-2221111', '1232221111']
for number in check:
    match = regex.match(number)
    print(number, bool(match))
    if match:
        # show the digit groups (drops None and any group containing '-' or parentheses)
        print('nums:', tuple(filter(lambda x: x and x.isalnum(), match.groups())))

>>> 
123-333-1111 True
nums: ('123', '333', '1111')
(123)111-2222 True
nums: ('123', '111', '2222')
123-2221111 True
nums: ('123', '222', '1111')
1232221111 False

Note:

You requested an explanation of: (?P<A>-) and (?(A)-?|-)

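  • (?P<A>-) : A named capture group called A that matches a single hyphen. The general form is (?P<NAME> ... ), and the trailing ? in the pattern makes the group optional.
  • (?(A)-?|-) : A conditional that checks whether the named group A captured something. If it did, the YES pattern (-?, an optional hyphen) is used; otherwise the NO pattern (-, a required hyphen) is used. The general form is (?(NAME)YES|NO).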

All of this is covered if you run help(re) in the Python interpreter, or search for Python regular expressions.
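To make the conditional more concrete, here is a small sketch that reuses the pattern from this answer and reports whether group A took part in the match. The helper name first_hyphen_seen is made up for illustration and is not part of the original answer.

```python
import re

# Pattern copied from the answer; group "A" is the optional first hyphen.
pattern = r'(\(?(\d{3})\)?(?P<A>-)?(\d{3})(?(A)-?|-)(\d{4}))'
regex = re.compile(pattern)

def first_hyphen_seen(number):
    """Hypothetical helper: True/False for whether group A matched, None if no match."""
    match = regex.match(number)
    if match is None:
        return None
    return match.group('A') is not None

print(first_hyphen_seen('123-2221111'))    # A matched, so the second hyphen may be omitted
print(first_hyphen_seen('(123)111-2222'))  # A did not match, so the second hyphen is required
print(first_hyphen_seen('(123)1112222'))   # A did not match and no second hyphen follows: no match
```

This should print True, False and None in that order. One consequence of the conditional is that (123)1112222 is rejected by this pattern, because the parentheses alone do not satisfy the "second hyphen required" branch.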


2 Comments

Can you please elaborate on how this regex works, especially the (?P<A>-) and (?(A)-?|-) portions?
Added a note. Please consult your documentation next time.

You can use the following regex:

regex = r'(?:\d{3}-|\(\d{3}\))\d{3}-?\d{4}'

assuming that (123)1112222 is acceptable.

The | acts as an or, and \( and \) escape ( and ), respectively.
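As a rough sketch of how this regex could be used to pull numbers out of a line of text with findall, as in the question: the (?<!\d) and (?!\d) guards are an added assumption, not part of the regex above, so that a match cannot start or end in the middle of a longer run of digits. The sample line is made up.

```python
import re

# Regex from this answer, wrapped in digit guards (the guards are an added assumption).
phone = re.compile(r'(?<!\d)(?:\d{3}-|\(\d{3}\))\d{3}-?\d{4}(?!\d)')

# Hypothetical input line for illustration.
line = "Call 123-333-1111 or (123)111-2222, not 1232221111."

# With no capturing groups, findall returns the full text of each match.
matches = phone.findall(line)
print(matches)

# fullmatch validates a whole string instead of searching inside it.
print(bool(phone.fullmatch('(123)1112222')))
print(bool(phone.fullmatch('123456-7890')))
```

This should print ['123-333-1111', '(123)111-2222'], then True, then False. The non-capturing group (?:...) matters here: with a capturing group, findall would return only the captured part rather than the whole number.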

1 Comment

A question to OP: should 123456-7890 match?

import re
# raw string so that \( and \d reach the regex engine unchanged
phoneRegex = re.compile(r"(\({0,1}[\d]{3}\)(?=[\d]{3})|[\d]{3}-)([\d]{3}[-]{0,1}[\d]{4})")
numbers = ["123-333-1111", "(123)111-2222", "123-2221111", "1232221111", "(123)-111-2222"]
for number in numbers:
    print(bool(phoneRegex.match(number)))

Output

True
True
True
False
False

You can see an explanation of this regular expression here: http://regex101.com/r/bA4fH8
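For readers who prefer an inline explanation, the same pattern can be spelled out with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is only a sketch of one way to annotate it; the name phone_verbose and the comments are additions, while the pattern pieces are taken from the code above.

```python
import re

phone_verbose = re.compile(r"""
    (                       # group 1: area code, in one of two forms
        \({0,1}[\d]{3}\)    #   optional "(", three digits, mandatory ")"
        (?=[\d]{3})         #   lookahead: the next three characters must be digits
      |
        [\d]{3}-            #   or three digits followed by a hyphen
    )
    (                       # group 2: the rest of the number
        [\d]{3}             #   three digits
        [-]{0,1}            #   optional hyphen
        [\d]{4}             #   four digits
    )
""", re.VERBOSE)

numbers = ["123-333-1111", "(123)111-2222", "123-2221111", "1232221111", "(123)-111-2222"]
results = [bool(phone_verbose.match(number)) for number in numbers]
print(results)
```

This should print [True, True, True, False, False], the same results as the compact version. The lookahead is what rejects (123)-111-2222: after the closing parenthesis it insists on three digits, so a hyphen there fails.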
