I'm working on an HTML5 input pattern polyfill and I'm trying to validate an input type=url in JavaScript exactly as the browser (Chrome) does, but I can't find any documentation on a JavaScript- or Perl-compatible regular expression. As it's a polyfill, I don't particularly mind whether it matches all URLs exactly (which is impossible); what matters is that it imitates how the browser works.

Would anyone know of an identical pattern in Perl syntax?

Thanks

  • It probably depends on the browser. Commented May 16, 2012 at 20:36
  • Possibly. It's pretty difficult to digest the spec regarding input type url and what qualifies as a valid url. I mentioned Chrome in the original question, so would anyone have any ideas regarding that browser specifically? Commented May 16, 2012 at 20:48
  • Isn't that browser (partially? chromium?) open source? Commented May 16, 2012 at 21:03
  • Yes, and honestly I haven't looked through the source to know for sure, but Chrome is written in C++, Assembly, Python, and JavaScript, so even if I knew what to look for I wouldn't be guaranteed to find it in Perl syntax. If I can't find it anywhere else, then I may have to dig through the source of Chrome. Commented May 16, 2012 at 21:10

2 Answers

After searching through several HTML5 shivs on GitHub to see if anyone else has come across an ideal expression, I believe I found something that's very close but it doesn't match perfectly.

Alexander Farkas (https://github.com/aFarkas/webshim/blob/master/src/shims/form-shim-extend.js#L285) uses this pattern to test URLs:

/^([a-z]([a-z]|\d|\+|-|\.)*):(\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?((\[(|(v[\da-f]{1,}\.(([a-z]|\d|-|\.|_|~)|[!\$&'\(\)\*\+,;=]|:)+))\])|((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=])*)(:\d*)?)(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*|(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)|((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)|((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)){0})(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i;

Also, just for anyone who stumbles across this via Google, if you don't need the pattern, but just want to check if something's valid through JavaScript (perhaps onChange), you can use the formelement.checkValidity() method. Obviously this doesn't help with a polyfill (which assumes no native HTML5 validation support) but it is useful nonetheless.
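As a rough sketch of that check, the helper below defers to native validation when it exists and reports `null` otherwise, which is the case a polyfill would have to handle itself. The name `nativeValidity` is made up for illustration; only `checkValidity()` comes from the browser.

```javascript
// Hypothetical helper (not part of any library): returns the browser's own
// verdict when native validation is available, or null when it is not.
function nativeValidity(element) {
  if (element && typeof element.checkValidity === 'function') {
    return element.checkValidity();
  }
  return null; // no native HTML5 validation: the polyfill has to decide
}

// Possible use in a page, assuming `input` is an <input type="url"> element:
// input.onchange = function () {
//   var ok = nativeValidity(input);
// };
```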

Read the relevant specification at http://www.w3.org/TR/html5/forms.html#url-state-(type=url):

Your polyfill should start by sanitizing the input, i.e. removing line breaks and trimming the string. The sentence "User agents must not allow users to insert "LF" (U+000A) or "CR" (U+000D) characters" might also be of interest.
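A minimal sketch of that sanitizing step, assuming it amounts to stripping LF/CR and then trimming ASCII whitespace from both ends. The function name `sanitizeUrlValue` is made up for illustration, and the code sticks to ES5 syntax since a polyfill targets older browsers.

```javascript
// Hypothetical helper name; not taken from the spec or any library.
function sanitizeUrlValue(raw) {
  // Remove every LF (U+000A) and CR (U+000D) character.
  var noNewlines = String(raw).replace(/[\n\r]/g, '');
  // Trim leading and trailing space, tab, LF, FF and CR.
  return noNewlines.replace(/^[ \t\n\f\r]+|[ \t\n\f\r]+$/g, '');
}
```

The validity check would then run on the sanitized string rather than on the raw field value.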

The result should be a valid, absolute URL. RFCs 3986 and 3987, which the spec references, describe URL validation; the section about parsing URLs may also be of interest.

Your polyfill might not only validate URIs but also resolve relative URIs. In any case, validating a URI will be much simpler with an algorithm than with a single all-encompassing regexp. That said, RFC 3986 itself gives a regexp for parsing an already validated URI in Appendix B.
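For illustration, here is the Appendix B expression from RFC 3986 used to split a string into its components. The helper names `splitUri` and `looksAbsolute` are made up, and the absoluteness test is only a rough assumption (a scheme that matches the RFC's scheme grammar); it is not how Chrome decides validity.

```javascript
// Regular expression from RFC 3986, Appendix B. It splits a URI reference
// into components; it does not validate them.
var URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: group numbers follow the RFC's own numbering.
function splitUri(value) {
  var m = URI_PARTS.exec(value); // every group is optional, so this always matches
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

// Hypothetical, deliberately rough check: a value "looks absolute" when it
// has a scheme of the form ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).
function looksAbsolute(value) {
  var scheme = splitUri(value).scheme;
  return typeof scheme === 'string' && /^[a-z][a-z0-9+.\-]*$/i.test(scheme);
}
```

A fuller algorithm would go on to check each component (authority, path, query, fragment) against the grammar in the RFC instead of relying on one large pattern.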

2 Comments

Thanks for your help and suggestions. This definitely looks like a good place to start. I had grazed over the spec, but was thinking that someone had already blazed the trail and if so, I'd just use what was out there. Looks like I'm going to have to get my hands a little dirty. Again, thanks for the advice!
Hey @bergi! It looks like the W3C links in this answer are no longer available. Would you be so kind as to update them? Thank you!
