
I have an odd problem occurring with our regex for email addresses. Here is the expression:

^(\w)+(([(\.?)\w\-+])*[\w]+)*@((\[([\d]{1,3}\.){3}[\d]{1,3}\])|((\w)+((\.?)[\w\-]+)*\.[a-z]{2,6}))$
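A side observation about this expression, sketched in Python (an assumption: the question ran it on Rubular, which is Ruby, and in a JavaScript tester, but character classes behave the same way here). Inside square brackets, ( ) ? and . are literals, so [(\.?)\w\-+] does not make the dot optional. It simply also accepts parentheses and question marks in the local part. The helper class_accepts is hypothetical, written only for this check.

```python
import re

# The question's expression, copied verbatim.
EMAIL_RE = r"^(\w)+(([(\.?)\w\-+])*[\w]+)*@((\[([\d]{1,3}\.){3}[\d]{1,3}\])|((\w)+((\.?)[\w\-]+)*\.[a-z]{2,6}))$"

# The character class from the local part, on its own.
LOCAL_CLASS = r"[(\.?)\w\-+]"


def class_accepts(ch: str) -> bool:
    """Return True if the single character ch matches LOCAL_CLASS."""
    return re.fullmatch(LOCAL_CLASS, ch) is not None


# Every one of these is accepted, including '(', '?' and ')'.
for ch in "(?).a-+":
    print(repr(ch), class_accepts(ch))

# Consequence: parentheses and '?' are allowed inside the local part.
print(re.match(EMAIL_RE, "a(b)?c@example.com") is not None)
```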

Anything we've thrown at it that matches is fine. The problem is with failures: long strings cause the expression to hang, and on our web server it spikes the CPU. When people mistakenly enter long, invalid email addresses, it crashes the server. Some examples follow.

This is a failure that behaves correctly (it fails quickly):

Rubular failure 1: short@failure

This is a failure that causes the hang; you can see Rubular has issues with it as well:

Rubular failure 2: thisisamuchlonger@expressionleadingtofailure

Interestingly, if you make it a proper address:

Rubular pass: [email protected]

This passes easily.

Edit: I've also tried running this in a client-side JavaScript tester, and it fails and succeeds in the same ways. Something about this regex makes regex engines eat memory and fail; I'm just not sure which part it is.

  • This is an interesting read regarding this general topic: regular-expressions.info/email.html (Commented Aug 29, 2011 at 19:42)
  • If you cannot rule out that your string contains line breaks, you should use \A and \z instead of ^ and $. (Commented Aug 29, 2011 at 19:51)
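To illustrate that comment, here is a small sketch in Python (an assumption: in Ruby, where Rubular runs, ^ and $ always act as line anchors, which the sketch imitates with re.MULTILINE). A string with an embedded line break can pass a ^...$ check even though the whole string is not an address. Python spells Ruby's \z as \Z. The simple BODY pattern is borrowed only to demonstrate anchoring.

```python
import re

# A simple address shape, used only to demonstrate anchoring.
BODY = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}"

# re.MULTILINE makes ^ and $ match at line boundaries, as they always do in Ruby.
LINE_ANCHORED = re.compile(r"^" + BODY + r"$", re.MULTILINE)
# \A and \Z anchor to the start and end of the whole string.
STRING_ANCHORED = re.compile(r"\A" + BODY + r"\Z")

sneaky = "not an address\nuser@example.com"

print(LINE_ANCHORED.search(sneaky) is not None)    # True: the second line matches
print(STRING_ANCHORED.search(sneaky) is not None)  # False: the whole string must match

# Even without MULTILINE, Python's $ also matches just before a final newline.
print(re.search(r"^" + BODY + r"$", "user@example.com\n") is not None)  # True
```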

2 Answers


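Your regular expression repeatedly combines the worst case for backtracking regex engines: nested quantifiers. For example, in the domain part, (\w)+((\.?)[\w\-]+)* can divide a run of word characters in exponentially many ways, because \.? is optional. When the overall match fails, the engine gets stuck backtracking through those alternatives before it gives up. Take out the nested *s and ?s and your regular expression will perform admirably.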
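A rough way to see the exponential growth, sketched in Python as a counting argument rather than a model of any particular engine. The hypothetical helpers below count how many ways (\w)+((\.?)[\w\-]+)* can cover a dotless run of n word characters; the answer is 2^(n-1). When the required \. never appears, a backtracking engine may end up trying each of these splits (real engines apply some optimizations, and the local part multiplies the work further, so treat this as an illustration of the growth rate, not an exact step count).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def starred_splits(m: int) -> int:
    r"""Ways ((\.?)[\w\-]+)* can cover exactly m word characters (no dots).

    Each iteration consumes one or more characters, so this counts the
    compositions of m; zero iterations is the single way to cover m == 0.
    """
    if m == 0:
        return 1
    # The first iteration takes `first` characters; recurse on the rest.
    return sum(starred_splits(m - first) for first in range(1, m + 1))


def domain_splits(n: int) -> int:
    r"""Ways (\w)+((\.?)[\w\-]+)* can cover a dotless run of n word characters.

    (\w)+ takes k >= 1 characters (one way for each k), and the starred
    group covers the remaining n - k characters.
    """
    return sum(starred_splits(n - k) for k in range(1, n + 1))


for n in (5, 10, 20):
    print(n, domain_splits(n))  # 16, 512, 524288: doubling with each extra character

# The domain part of the failing example from the question (26 characters).
print(domain_splits(len("expressionleadingtofailure")))  # 33554432
```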

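See http://swtch.com/~rsc/regexp/regexp1.html for a thorough explanation of why backtracking regex engines, like the ones used by Ruby and JavaScript, perform so badly on patterns like this (the article also shows that automaton-based engines avoid the problem).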

My personal opinion is that you should just check for /@/ and send a confirmation e-mail, but you can probably find a regex elsewhere on the web that will perform adequately while matching most e-mail addresses.
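A minimal sketch of that suggestion in Python, assuming "plausible" means exactly one @ with non-empty text on each side and no whitespace; the helper name looks_like_email is hypothetical. It does no backtracking at all, so its cost is proportional to the length of the input. Note that the question's failing example passes this check; rejecting it is left to the confirmation e-mail.

```python
def looks_like_email(s: str) -> bool:
    """Cheap plausibility check: exactly one '@', non-empty text on both
    sides, and no whitespace anywhere. Real verification is the confirmation e-mail."""
    if any(ch.isspace() for ch in s):
        return False
    local, sep, domain = s.partition("@")
    # sep is empty when there is no '@'; a second '@' would land in domain.
    return bool(sep) and bool(local) and bool(domain) and "@" not in domain


print(looks_like_email("user@example.com"))  # True
print(looks_like_email("a@b@c"))             # False
```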


2 Comments

  • Thanks for your response. Unfortunately, we don't have the option of confirmation emails, as the system is unmonitored and runs autonomously. We also need to account for everything allowed under the RFC specifications; we found most rules were too strict.
  • @Jeremy, you're not really ever validating an e-mail, though. Even an e-mail that passes your RFC-compliant filter can still be an invalid address.

Try this, for example; Rubular handles it well:

^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$
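A quick check of that expression, sketched in Python (an assumption; the answer tested it on Rubular). It has no nested quantifiers, so rejecting a failing input takes time roughly proportional to the string's length instead of exponential time. The timings printed depend on the machine; the point is only that none of them explode, even for a 20,001-character failure.

```python
import re
import time

# The answer's expression, unchanged.
SIMPLE_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$")

samples = [
    "short@failure",
    "thisisamuchlonger@expressionleadingtofailure",
    "a" * 10_000 + "@" + "b" * 10_000,  # a deliberately huge failure
    "user@example.com",
]

for s in samples:
    start = time.perf_counter()
    matched = SIMPLE_RE.match(s) is not None
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"len={len(s):>6} matched={matched} time={elapsed_ms:.3f} ms")
```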

By the way, the first Google result leads to more examples: http://www.regular-expressions.info/email.html

1 Comment

  • This is a nice start, but you lopped off IP addresses.
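One way to address that comment, sketched in Python: keep the answer's pattern and add the question's bracketed IPv4 literal as a second alternative for the domain. This assumes "IP addresses" means IPv4 literals like [192.168.0.1], with octets not range-checked, just as in the question's \[...\] branch. The bounded {3} repetition is separated by literal dots, so it introduces no ambiguous nested quantifiers.

```python
import re

# The answer's pattern, with a bracketed IPv4 literal such as [192.168.0.1]
# added as an alternative domain (mirrors the question's \[...\] branch).
EMAIL_WITH_IP_RE = re.compile(
    r"^[A-Za-z0-9._%+-]+@"
    r"(?:[A-Za-z0-9.-]+\.[A-Za-z]{2,6}"   # host name ending in a 2-6 letter TLD
    r"|\[\d{1,3}(?:\.\d{1,3}){3}\])$"     # or [a.b.c.d]; octets are not range-checked
)

for s in ("user@example.com", "user@[192.168.0.1]", "user@[1.2.3]",
          "thisisamuchlonger@expressionleadingtofailure"):
    print(s, EMAIL_WITH_IP_RE.match(s) is not None)
```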
