I wanted to write a JavaScript function to sanitize user input by removing any unwanted or dangerous characters.

It must allow only the following characters:

  • Alphanumeric characters (case insensitive): [a-z][0-9].
  • Inner whitespace, like "word1 word2".
  • Spanish characters (case insensitive): [áéíóúñü].
  • Underscore and hyphen [_-].
  • Dot and comma [.,].
  • Finally, the string must be trimmed with trim().

My first attempt was:

function sanitizeString(str){
    str = str.replace(/[^a-z0-9áéíóúñü_-\s\.,]/gim,"");
    return str.trim();
}

But when I called:

sanitizeString("word1\nword2")

it returned:

"word1
word2"

So I had to rewrite the function to explicitly remove \t, \n, \f, \r, \v and \0:

function sanitizeString(str){
    str = str.replace(/([^a-z0-9áéíóúñü_-\s\.,]|[\t\n\f\r\v\0])/gim,"");
    return str.trim();
}

I'd like to know:

  1. Is there a better way to sanitize input with JavaScript?
  2. Why doesn't the RegExp in the first version remove \n and \t?
  • 4
    Save yourself some time and don't bother, do it on the server. Javascript filtering is too easy to bypass Commented Apr 20, 2014 at 20:09
  • 1
    Actually I'm doing that on the server with SSJS because I'm using XPages on the backend. I could say the same if I was using Node.js. :) Commented Apr 20, 2014 at 20:15
  • 3
    You're allowing new line, tab etc. with \s. If you just want to allow spaces, use a space in the regular expression instead. Commented Apr 20, 2014 at 20:22
  • 3
    \s doesn't mean "a space". It includes "tab", "space", "carriage return", "new line", "vertical tab", and "form feed". Commented Apr 20, 2014 at 20:40
  • @RobG Add it as an answer, \s was the problem. You mentioned that before Derek (also I had to move the hyphen at the end). Commented Apr 20, 2014 at 20:53

1 Answer

37

The new version of the sanitizeString function:

function sanitizeString(str){
    str = str.replace(/[^a-z0-9áéíóúñü \.,_-]/gim,"");
    return str.trim();
}
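
As a quick check of the function above, here is a minimal sketch that runs it against a few inputs. The sample strings and the `results` variable are made up for illustration and are not part of the original code.

```javascript
// Same function as above, repeated so the example is self-contained.
function sanitizeString(str) {
    str = str.replace(/[^a-z0-9áéíóúñü \.,_-]/gim, "");
    return str.trim();
}

// Hypothetical sample inputs.
var results = [
    sanitizeString("word1\nword2"),               // the newline is no longer allowed
    sanitizeString("  Niño_1, año-2.  "),         // allowed characters survive, edges are trimmed
    sanitizeString("<script>alert(1)</script>")   // angle brackets, slash and parentheses are dropped
];

console.log(results);
// [ 'word1word2', 'Niño_1, año-2.', 'scriptalert1script' ]
```

Two details that do not change the result: the m flag only affects the ^ and $ anchors, which this pattern does not use, and the dot does not need a backslash inside a character class.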

The main problem was pointed out by @RobG and @Derek (@RobG, write your comment as an answer and I will accept it): \s doesn't mean what w3Schools currently says:

Find a whitespace character

It means what MDN says:

Matches a single white space character, including space, tab, form feed, line feed. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].

I trusted w3Schools when I wrote the function.
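
The difference between \s and a literal space can be checked directly. The sketch below tests a handful of the characters from the MDN definition against both patterns; the `candidates` object and the `namesMatching` helper are hypothetical names used only for this illustration.

```javascript
// A few of the characters listed in the MDN definition of \s.
var candidates = {
    "space": " ",
    "tab": "\t",
    "line feed": "\n",
    "carriage return": "\r",
    "form feed": "\f",
    "vertical tab": "\v",
    "no-break space": "\u00a0"
};

// Returns the names of the candidates that the given pattern matches.
function namesMatching(pattern) {
    return Object.keys(candidates).filter(function (name) {
        return pattern.test(candidates[name]);
    });
}

var matchedByWhitespaceClass = namesMatching(/\s/);
var matchedByLiteralSpace = namesMatching(/ /);

console.log(matchedByWhitespaceClass); // all seven names
console.log(matchedByLiteralSpace);    // [ 'space' ]
```

This is why putting \s inside the negated class kept newlines and tabs in the output, while a plain space keeps only the space character.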

A second change was to move the hyphen (-) to the end of the character class so that it cannot be read as a range separator.
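
To illustrate the general rule with a simpler class than the one in the function, the sketch below compares a hyphen placed between two characters with one placed at the end. The variable names are made up for this example.

```javascript
// Between two characters, the hyphen defines a range: a, b or c.
var hyphenAsRange = /^[a-c]$/;

// At the end of the class, the hyphen is a literal: a, c or -.
var hyphenAsLiteral = /^[ac-]$/;

console.log(hyphenAsRange.test("b"), hyphenAsRange.test("-"));     // true false
console.log(hyphenAsLiteral.test("b"), hyphenAsLiteral.test("-")); // false true
```

Escaping the hyphen as \- is another way to keep it literal wherever it appears in the class.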

  • Note 1: This is server-side validation using JavaScript.
  • Note 2: (for IBM Notes XPagers) I love JavaScript in XPages SSJS. This is simpler for me than the Java way.