Using if and else in regular expression

Question

I'm having difficulty understanding this particular regular expression (it is currently used to validate user input for phone numbers):

^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))(-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}$

I read that "?()" is used for if conditions in regular expressions, but the logic behind this regular expression is still not really clear to me, nor what kind of input it accepts and rejects.

Thanks

There would not be a better place for this link :) programmers.stackexchange.com/questions/10998/… (denis.solonenko, commented May 20, 2013 at 6:32)
The ? is used to denote zero or one of the preceding element. It is not a conditional in the standard imperative-programming sense. (basarat, commented May 20, 2013 at 6:34)
Go to regex-explain.googlecode.com/hg/explain.html and type in your regex, and it will explain it to you. (dave, commented May 20, 2013 at 6:34)
Try regexper.com, type in the regexp and it will draw you a state diagram... (Jimbo, commented May 20, 2013 at 6:46)
Thanks, both regex-explain.googlecode.com/hg/explain.html and especially regexper.com make it easier for me to understand the logic of the regular expression. (jsendo, commented May 20, 2013 at 7:44)

Jimbo · Accepted Answer · 2013-05-20 07:32:42Z

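Firstly, in regexp, ?() is not a conditional. ? matches the element to the left of it (a single character or a whole group) 0 or 1 times, and ( starts a new capture group, so wherever ?( shows up in this regexp it is just those two things side by side. There are no conditionals in this expression, I'm afraid :) The closest thing is alternation, (a|b), which matches either a or b. (Some flavors, such as PCRE, .NET and Python's re, do have a genuine conditional written (?(1)yes|no), but this regexp doesn't use one.)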
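To see that the ? in something like (-| )? is just a quantifier, and that (a|b) is plain alternation, here is a small sketch using Python's re module. Python is my choice for illustration only (the question doesn't say which engine checks the input), and the test strings are made up:

```python
import re

# '?' makes the element to its left optional (0 or 1 times).
# Here that element is the whole group (-| ): "a hyphen or a space".
optional_sep = re.compile(r"^\d(-| )?\d$")

# (a|b) is alternation: match a OR b. This is the closest thing to an
# if/else that the phone-number regexp actually uses.
x_or_ext = re.compile(r"^( x| ext)\d+$")

for s in ["12", "1-2", "1 2", "1--2", "1_2"]:
    print(repr(s), bool(optional_sep.match(s)))

for s in [" x12", " ext12", "x12", " ex12"]:
    print(repr(s), bool(x_or_ext.match(s)))
```

Only "12", "1-2" and "1 2" pass the first pattern; the second accepts " x12" and " ext12" but not "x12" (no leading space) or " ex12".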

The regexp is a little difficult to read as one long line:

^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))(-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}$

Try regexper.com, type in the regexp and it will draw you a state diagram...

Using some tabbing to break up the expression:

^(
    (\+\d{1,3}(-| )?
    \(?\d\)?(-| )?
    \d{1,3})
  |(
    \(?\d{2,3}\)?
   )
)

(-| )?(\d{1,4})
(-| )?(\d{6})
(
    ( x| ext)\d{1,5}
){0,1}$
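If the check happens to live in Python (an assumption; the question doesn't say), the same "tabbed" layout can be kept in the source itself with re.VERBOSE. This sketch builds both versions and confirms they agree on a few made-up inputs:

```python
import re

# The original one-line pattern from the question, split across two
# adjacent raw strings (Python joins them into one).
PHONE_RE = re.compile(
    r"^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))"
    r"(-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}$"
)

# The same pattern laid out with re.VERBOSE. Verbose mode ignores unescaped
# whitespace and allows '#' comments, so the literal spaces of the original
# have to be written as '\ ' here.
PHONE_RE_VERBOSE = re.compile(
    r"""
    ^(
        (\+\d{1,3}(-|\ )?     # '+', 1-3 digit country code, optional separator
         \(?\d\)?(-|\ )?      # one digit, brackets optional, optional separator
         \d{1,3})             # 1-3 more digits
      |
        (\(?\d{2,3}\)?)       # OR: 2-3 digits, brackets optional
    )
    (-|\ )?(\d{1,4})          # optional separator, 1-4 digits
    (-|\ )?(\d{6})            # optional separator, exactly 6 digits
    (
        (\ x|\ ext)\d{1,5}    # extension: ' x' or ' ext' plus 1-5 digits
    ){0,1}$                   # ...at most once; $ anchors the end
    """,
    re.VERBOSE,
)

for s in ["+1-(1)-1-1234-123456 ext12345", "(23)-1-123232", "123232121 x1", "12345"]:
    a, b = PHONE_RE.match(s), PHONE_RE_VERBOSE.match(s)
    same = (a is None and b is None) or (
        a is not None and b is not None and a.groups() == b.groups()
    )
    print(repr(s), "matches:", a is not None, "same result:", same)
```

The parentheses are kept exactly as in the original, so the capture-group numbers are unchanged too.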

(Note: this layout makes some of the literal spaces hard to see, so where it matters we'll refer back to the original.)

^ matches the start of the string (or of a line, in multiline mode)

The next group is ((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))

This has two parts, (X|Y), where X=(\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3}) and Y=(\(?\d{2,3}\)?). This will match either X or Y...

Breaking down X=(\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3}):

The outer () are a capture, so strip these...
\+ matches a literal plus sign. Note that it has to be escaped with the \ because + is a metacharacter meaning "match the previous element one or more times".
\d{1,3} matches any decimal digit either 1, 2 or 3 times, but no more or less
(-| )? matches either - or (space) zero or one times. The ? is what specifies zero or one times.
\(?\d\)? matches a literal ( (notice the escape) zero or one times, then a decimal digit, then a literal ) zero or one times
(-| )? we've seen before (matches either - or (space) zero or one times. The ? is what specifies zero or one times.)
\d{1,3} we've also seen before (matches any decimal digit either 1, 2 or 3 times, but no more or less)

So we can say that X matches (and captures; that's what the outer () is doing) any string that starts with a plus, has 1 to 3 digits, then possibly a space or a hyphen, a single digit optionally inside brackets, possibly another space or hyphen, and then another 1 to 3 digits. This is captured as capture group 2 (and, since it sits inside the outer group, also as part of group 1)... phew!
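A small Python sketch for checking X on its own, anchored with ^ and $ so it must cover the whole string (Python and the test strings are my choices, not from the question). It also shows why the + has to be escaped: unescaped at the start of a pattern, Python's re rejects it because there is nothing for it to repeat.

```python
import re

# X from the breakdown above, anchored so it has to match the whole string.
X = re.compile(r"^\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3}$")

for s in ["+44-(0)-20", "+44 (0) 20", "+440020", "+1(1)1",
          "+44(0-20",     # brackets are independent, so this passes too
          "+44(0",        # nothing after the bracketed digit
          "44-(0)-20",    # no leading '+'
          "+44-(12)-20"]: # only ONE digit may sit in the brackets
    print(repr(s), bool(X.match(s)))

# '+' must be escaped: unescaped at the start it has nothing to repeat.
try:
    re.compile(r"+44")
except re.error as e:
    print("re.error:", e)
```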

Breaking down Y=(\(?\d{2,3}\)?):

The outer () are a capture, so strip these...
\(? matches a literal ( zero or one times.
\d{2,3} matches any digit two or three times
\)? matches a literal ) zero or one times

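So we can say that Y matches a two- or three-digit number, possibly surrounded by brackets (each bracket is optional on its own, so something like (12 is accepted as well). This is captured as capture group 5 (and also as part of group 1). Jeez!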
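A quick check of Y on its own, in Python's re with made-up inputs. Because \(? and \)? are independent, Y accepts unbalanced brackets. For comparison, Y_BALANCED is a hypothetical stricter variant (not part of the original regexp) that uses a real regex conditional, (?(1)\)), to demand a closing bracket only when group 1 captured an opening one. That syntax works in Python's re, PCRE and .NET, but not in every flavor (JavaScript, for instance, has no conditionals).

```python
import re

# Y from the breakdown above: each bracket is optional on its own.
Y = re.compile(r"^\(?\d{2,3}\)?$")

# Hypothetical stricter variant: (?(1)\)) means "if group 1 matched '(',
# then require ')'; otherwise require nothing".
Y_BALANCED = re.compile(r"^(\()?\d{2,3}(?(1)\))$")

for s in ["12", "(12)", "(123)", "(12", "123)", "1234"]:
    print(repr(s), "Y:", bool(Y.match(s)), "balanced:", bool(Y_BALANCED.match(s)))
```

Y accepts everything except "1234"; the conditional version additionally rejects "(12" and "123)".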

Now we have X and Y we can see what the first chunk of the regexp matches (brain melting!).

The first chunk, call it CHUNK1, matches and captures either

any string that starts with a plus, has 1 to 3 digits, then possibly a space or a hyphen, a single digit optionally inside brackets, possibly another space or hyphen, and then another 1 to 3 digits, OR
any two- or three-digit number, possibly surrounded by brackets

Continuing... (-| )? we've seen before (matches either - or (space) zero or one times. The ? is what specifies zero or one times.)

(\d{1,4}) matches a string of digit characters that is 1, 2, 3 or 4 digits in length. This forms capture group 7.

(-| )? we've seen before (matches either - or (space) zero or one times. The ? is what specifies zero or one times.)

(\d{6}) matches a string of exactly 6 digits

So here you are matching a string with a possible space or hyphen, 1 to 4 digits, another possible space or hyphen and then 6 digits. Call this CHUNK2.

So far we have matched any string consisting of CHUNK1 followed immediately by CHUNK2...
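Capture groups are numbered by the position of their opening bracket, counting from the left, which gives this regexp eleven of them. A small Python sketch that prints them for two of the sample numbers makes the numbering visible (Python and the chosen strings are assumptions for illustration):

```python
import re

PHONE_RE = re.compile(
    r"^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))"
    r"(-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}$"
)

for s in ["(23)-1-123232", "+1-(1)-1-1234-123456 ext12"]:
    m = PHONE_RE.match(s)
    print(s)
    # Group i is the i-th unescaped '(' in the pattern, left to right;
    # groups that did not take part in the match come back as None.
    for i, g in enumerate(m.groups(), start=1):
        print(" ", i, repr(g))
```

For "(23)-1-123232" only groups 1, 5, 6, 7, 8 and 9 are set: the Y branch (group 5) matched, so X (group 2) and the extension groups (10 and 11) stay None.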

This concludes the main bit of the phone number, the rest appears to handle extensions...

The next bit is (( x| ext)\d{1,5}){0,1}. Let's break this down a little.

The surrounding brackets are the capture group.
( x| ext) matches either of the two literal strings ' x' or ' ext' - note the beginning space.
\d{1,5} matches any digit 1,2,3,4 or 5 times.
{0,1} matches the capture group zero or one times (the same as ?), i.e. the phone number does not need to have an extension
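Testing the extension part on its own (Python's re again, with made-up inputs) shows the leading-space requirement, the 5-digit limit, and that {0,1} behaves exactly like ?:

```python
import re

# Just the extension part, anchored so it has to cover the whole string.
EXT = re.compile(r"^(( x| ext)\d{1,5}){0,1}$")
# {0,1} is the same quantifier as ?, so this should behave identically.
EXT_Q = re.compile(r"^(( x| ext)\d{1,5})?$")

for s in ["", " x1", " ext12345", " ext123456", "x1", " x", " extension1"]:
    print(repr(s), bool(EXT.match(s)), bool(EXT_Q.match(s)))
```

Only "", " x1" and " ext12345" pass, and both patterns agree on every input.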

Finally, $ matches the end of the string (or of a line, in multiline mode).
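One subtlety if the check ends up in Python (an assumption; other engines may behave differently): there, $ without multiline mode matches at the end of the string or just before a final newline, so a trailing "\n" slips through. \Z or re.fullmatch avoid that, as this sketch shows:

```python
import re

s = "123456\n"  # e.g. a line read from a file, still carrying its newline

print(bool(re.match(r"^\d{6}$", s)))    # True: $ also matches just before a final "\n"
print(bool(re.match(r"^\d{6}\Z", s)))   # False: \Z only matches at the very end
print(bool(re.fullmatch(r"\d{6}", s)))  # False: fullmatch must consume the whole string
```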

Hopefully this has broken down the string well enough for you to work through :)

Wow.. thank you so much for the detailed and long explanation. It makes a lot more sense for me now. Thanks.

Pritesh Tayade · Accepted Answer · 2013-05-20 07:09:31Z

Just break the regular expression into pieces and see what they mean:

First group: ((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?)), which means: either match this group (\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3}) OR this group (\(?\d{2,3}\)?)

Now take: (\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})

\+ will match the "+" character

\d{1,3} will match 1 to 3 digits

(-| ) will match - or space, and the ? after it says the preceding token is optional, so (-| )? is an optional - or space

\(? will match an optional opening bracket

\d will match a digit

\)? will match an optional closing bracket

(-| )? will match - or space, which is optional

\d{1,3} will match 1 to 3 digits

Now take this group: (\(?\d{2,3}\)?)

\(? is optional opening bracket

\d{2,3} is 2 or 3 digits

\)? is optional closing bracket

Now take the later part of the regex: (-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}

(-| )? will match - or space (optional)

(\d{1,4}) will match 1 to 4 digits

(-| )? will match - or space (optional)

(\d{6}) will match exactly 6 digits

(( x| ext)\d{1,5}){0,1} will match " x" or " ext" followed by 1 to 5 digits, and the {0,1} makes this whole group occur 0 or 1 times.

Possible matches:

(23)-1-123232
23-1-123232
2312121321214
231212131
2312121311222
123232121 x1
123232121 ext1
+1-(1)-1-1234-123456
+1-(1)-1-1234-123456 x1
+1-(1)-1-1234-123456 ext12345
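The list above can be re-checked mechanically. A sketch in Python's re (the language is my choice; the rejected strings are my own examples, not from the question):

```python
import re

PHONE_RE = re.compile(
    r"^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,3})|(\(?\d{2,3}\)?))"
    r"(-| )?(\d{1,4})(-| )?(\d{6})(( x| ext)\d{1,5}){0,1}$"
)

# The "possible matches" listed in the answer.
listed_matches = [
    "(23)-1-123232", "23-1-123232", "2312121321214", "231212131",
    "2312121311222", "123232121 x1", "123232121 ext1",
    "+1-(1)-1-1234-123456", "+1-(1)-1-1234-123456 x1",
    "+1-(1)-1-1234-123456 ext12345",
]

# Hypothetical inputs that the pattern rejects.
rejected = [
    "123-4567",               # too few digits: the last block needs exactly 6
    "+1-(12)-1-123456",       # only one digit may sit in the brackets after '+1'
    "123232121x1",            # extension must start with " x" or " ext"
    "123232121 ext123456",    # extension has at most 5 digits
    "++1-(1)-1-1234-123456",  # only one leading '+'
]

for s in listed_matches + rejected:
    print(repr(s), bool(PHONE_RE.match(s)))
```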
