On the server I am sanitizing inputs by removing a list of characters like so.

FORBIDDEN_CHARS = %w[# % & * ( ) + = ; " , < > ? \\].freeze
'# % & * ( ) + valid  = ;  bit " , < > ? \\'.delete(FORBIDDEN_CHARS.join).strip.gsub(/\s{2,}/, ' ')
=> "valid bit"

I would like to preempt this with an HTML pattern on my input field. How can I easily convert this list of forbidden characters into a regular expression for the HTML5 pattern attribute?

I need something like

pattern='[^#%&*()+=;",<>?\\]+'

However, pattern: "[^#{FORBIDDEN_CHARS.join}]+" does not properly escape the backslash, and Firefox reports: Unable to check <input pattern='[^#%&*()+=;",<>?\]+'> because the pattern is not a valid regexp: unterminated character class.

pattern: "[^#{%w[# % & * ( ) + = ; " , < > ?].join}]+"

This does work without the backslash or if I add it in during concatenation...

pattern: "[^#{FORBIDDEN_CHARS.join}\\]+"

Using Regexp.quote seems to escape too many characters.

> "[^#{Regexp.quote FORBIDDEN_CHARS.join}\\]+"
 => "[^\\#%&\\*\\(\\)\\+=;\",<>\\?\\\\\\]+"

Update 2017-08-02 I've decided to go for a whitelist pattern. I now understand that the HTML5 pattern attribute is a JavaScript regular expression. I want to take an array of allowed symbols, escape those which need to be escaped in a JS regular expression, and create the pattern that includes letters, numbers, spaces and those symbols.

ALLOWED_SYMBOLS = %w[% & - : ' .]
  • Are you asking what the correct pattern is, or how to write Ruby code that will generate that pattern? Commented Jul 31, 2017 at 20:03
  • Furthermore, how are you generating your HTML? Are you using Rails? Commented Jul 31, 2017 at 20:07
  • I'm asking how to properly convert a ruby regular expression into an HTML 5 regular expression. Specifically, how do I make sure characters are properly escaped? I'm using Haml to generate HTML. I know the regex pattern I need but I would like to be able to edit my list of forbidden characters in one place and have the HTML pattern update in my views along with my server-side processing. Commented Jul 31, 2017 at 20:18
  • Whitelisting, NOT blacklisting, is the only safe choice for input fields. Commented Aug 1, 2017 at 1:05
  • It looks like the HTML pattern attribute uses JS regex: "The regular expression language is the same as JavaScript RegExp algorithm" - developer.mozilla.org/en-US/docs/Web/HTML/Element/Input Commented Aug 1, 2017 at 15:35

2 Answers

The error message is pretty clear, as error messages go:

Unable to check <input pattern='[^#%&*()+=;",<>?\]+'> because the pattern is not a valid regexp: unterminated character class

"Unterminated character class" means that the regex engine is looking for the ] that ends the character class but can't find it. That happens because, instead of an escaped backslash (\\), the pattern contains a single \ that escapes the ] (\]). As you already know, with \\] it works correctly.
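The same failure can be reproduced without a browser. This is only a sketch: it assumes Ruby's own regex engine as a stand-in for the browser's (the two agree on this particular point, but they are not the same engine), and valid_regexp? is a hypothetical helper, not something from the question.

```ruby
FORBIDDEN_CHARS = %w[# % & * ( ) + = ; " , < > ? \\].freeze

# The joined string ends in ONE backslash, even though the literal shows two.
joined = FORBIDDEN_CHARS.join

broken = "[^#{joined}]+"    # ends in \]  so the ] is escaped and the class never closes
fixed  = "[^#{joined}\\]+"  # ends in \\] so the backslash is escaped and the class closes

# Hypothetical helper: true if the source compiles as a regular expression.
def valid_regexp?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

puts valid_regexp?(broken) # false
puts valid_regexp?(fixed)  # true
```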

Using Regexp.quote seems to escape too many characters.

Well... no. Regexp.quote is for working with Ruby regular expressions. It's not for working with HTML5 (JavaScript) patterns. In the former, \# works. In the latter, it doesn't. There's no good way around this.

The core problem here is that you've come up with the cleverest solution instead of the best one. The best one is the one that's simple and easy for a human to understand and maintain. Half of that solution looks like this:

# Note to future me/other developers: If you change one of the below
# lines, you *must* also change the other.
FORBIDDEN_CHARS = '#%&*()+=;",<>?\\'
ALLOWED_CHARS_PATTERN = '[^#%&*()+=;",<>?\\\\]+'

The other half of the solution is, of course, unit tests. Your Ruby tests and your browser tests should throw the same test data at both of these, so if they're changed in some way that isn't consistent, your tests will fail.
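A minimal sketch of the Ruby side of such a check follows. It assumes Ruby's Regexp is a close enough stand-in for the browser here, which holds for this simple character class but does not replace a real browser test. The names server_accepts?, pattern_accepts? and SAMPLES are made up for illustration.

```ruby
FORBIDDEN_CHARS = '#%&*()+=;",<>?\\'
ALLOWED_CHARS_PATTERN = '[^#%&*()+=;",<>?\\\\]+'

# The pattern attribute has to match the whole value, so anchor it the same way.
ANCHORED = Regexp.new("\\A(?:#{ALLOWED_CHARS_PATTERN})\\z")

# Server-side view: the input is fine if deleting forbidden characters changes nothing.
def server_accepts?(input)
  input.delete(FORBIDDEN_CHARS) == input
end

# Client-side view (approximated in Ruby): the input is fine if the pattern matches.
def pattern_accepts?(input)
  ANCHORED.match?(input)
end

# Non-empty samples only; browsers skip pattern checks on empty values.
SAMPLES = ['valid bit', 'bad#input', 'a+b=c', 'back\\slash', 'say "hi"', "it's fine"].freeze

mismatches = SAMPLES.reject { |s| server_accepts?(s) == pattern_accepts?(s) }
puts mismatches.empty? ? 'consistent' : "mismatch: #{mismatches.inspect}"
```

If someone edits one constant and forgets the other, at least one sample should land in mismatches, provided the samples cover the character that changed.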

If you'd rather be clever, though, the only characters you need to escape inside a character class (square brackets) in JavaScript are \ and ]:

FORBIDDEN_CHARS = '#%&*()+=;",<>?\\'.freeze
ALLOWED_CHARS_PATTERN = "[^#{ FORBIDDEN_CHARS.gsub(/[\\\]]/, '\\\\\0') }]+".freeze

puts ALLOWED_CHARS_PATTERN
# => [^#%&*()+=;",<>?\\]+

Of course, you'll still need those unit tests.
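For the whitelist variant from the question's update, the same idea might look like the sketch below. It is an assumption-laden example rather than a definitive implementation: js_class_escape and ALLOWED_PATTERN are hypothetical names, and it escapes \, ], ^ and - because those are the characters that can change the meaning of a character class depending on where they appear.

```ruby
ALLOWED_SYMBOLS = %w[% & - : ' .].freeze

# Hypothetical helper: backslash-escape the characters that stay special
# inside a JavaScript character class (\ ] ^ -).
def js_class_escape(chars)
  chars.gsub(/[\\\]\^\-]/) { |c| "\\#{c}" }
end

# Letters, digits, spaces, plus the escaped symbols.
ALLOWED_PATTERN = "[A-Za-z0-9 #{js_class_escape(ALLOWED_SYMBOLS.join)}]+".freeze

puts ALLOWED_PATTERN
# => [A-Za-z0-9 %&\-:'.]+
```

The block form of gsub is used so the replacement is taken literally, which avoids counting backslashes in a replacement string. Note that the apostrophe still needs HTML attribute escaping if the tag is assembled by hand instead of through a template helper.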


7 Comments

Thank you for this detailed answer! I am going to go for a whitelist pattern in my HTML since that seems safer and easier for my use case.
There are more characters that need to be escaped for JS regex. developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/…
@HarlemSquirrel Inside a character class (square brackets)? Which characters?
MDN says \ ^ $ * + ? . are special regex characters.
Oh wait, I see now. "Special characters like the dot(.) and asterisk (*) are not special inside a character set, so they don't need to be escaped." Sorry, you're right! However, I'm assuming hyphens, square brackets and backslashes need to be escaped.

Try this:

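require 'sinatra'

# Defined once at load time. Assigning the constant inside the route block
# would emit an "already initialized constant" warning on every later request.
FORBIDDEN_CHARS = %w[# % & * ( ) + = ; " , < > ? \\].freeze

get '/' do
  # inspect doubles the backslash; drop the surrounding quotes and unescape the double quote.
  pattern = FORBIDDEN_CHARS.join.inspect[1..-2].gsub('\"', '"')
  "<input pattern='[^#{pattern}]+' />"
end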
