1

I am trying to loop over an array of strings that I want to compare my input string against. My approach is to loop over the input string and check whether each character matches one of the elements in the array. If it does not, I replace that character with ''. Note: a regular expression is really not an option.

Here is what my JavaScript looks like:

var input = 'this is A [{}].-_+~`:; *6^123@#$%&*()?{}|\ ';
input.toLowerCase(input)

var allowed = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d', 'e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','s','à','â','ä','è','é','ê','ë','î','ï','ô','œ','ù','û','ü','ÿ','ç','À','Â','Ä','È','É','Ê','Ë','Î','Ï','Ô','Œ','Ù','Û','Ü','Ÿ','Ç', ' '] 

var cleanStr = '';
for(var i = 0; i < input.length; i++){
    for(var j = 0; j< allowed.length; j++){
    if(input[i] !== allowed[j]){
        cleanStr = input.replace(input[i], ' ');
      console.log(cleanStr);
    }
  }
}

The console log output doesn't appear to be any different than the input field. What am I missing?

Here is my fiddle

https://jsfiddle.net/sghoush1/nvnk7r9j/4/

5
  • 2
    Why is a regular expression really not an option? Commented Feb 12, 2016 at 16:24
  • Let's just say it is not an option. The reason "why not" is not in scope of this question. Commented Feb 12, 2016 at 16:27
  • 1
    @soum not good enough - no sane programmer would deploy an O(n^2) loop to achieve this when a pre-compiled regex of allowable characters would be vastly more efficient Commented Feb 12, 2016 at 16:29
  • @soum are you wanting the unmatched characters to be removed, or replaced with a space? Your question says the former, your code implies the latter. Commented Feb 12, 2016 at 16:45
  • 1. To update the string to lowercase, assign it input = input.toLowerCase() 2. You've s twice in the array. Commented Feb 12, 2016 at 16:50

3 Answers

4

You can do this in a single loop.

var input = 'this is A [{}].-_+~`:; *6^123@#$%&*()?{}|\ ';
input = input.toLowerCase(); // Note the syntax here

var allowed = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'à', 'â', 'ä', 'è', 'é', 'ê', 'ë', 'î', 'ï', 'ô', 'œ', 'ù', 'û', 'ü', 'ÿ', 'ç', 'À', 'Â', 'Ä', 'È', 'É', 'Ê', 'Ë', 'Î', 'Ï', 'Ô', 'Œ', 'Ù', 'Û', 'Ü', 'Ÿ', 'Ç', ' '];

var cleanStr = '';

// Loop over each character in the string
for (var i = 0; i<input.length; i++) {

    // Check if the character is allowed or not
    if (allowed.indexOf(input[i]) !== -1) {
        // Concat the allowed character to result string
        cleanStr += input[i];
    }
}

console.log(cleanStr);
document.body.innerHTML = cleanStr;


RegEx Approach:

You can create a regex from a string using the RegExp constructor. To replace the non-allowed characters, use a negated character class.

var regex = new RegExp('[^' + allowed.join('') + ']', 'g');
var cleanStr = input.replace(regex, '');

Note: You'll need to escape meta-characters that have special meaning in the Character class.

These are the meta-characters that need to be escaped with a preceding backslash \ inside a character class. Quoting from www.regular-expressions.info:

In most regex flavors, the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-).

var input = 'this is A [{}].-_+~`:; *6^123@#$%&*()?{}|\ ';
var allowed = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 's', 'à', 'â', 'ä', 'è', 'é', 'ê', 'ë', 'î', 'ï', 'ô', 'œ', 'ù', 'û', 'ü', 'ÿ', 'ç', 'À', 'Â', 'Ä', 'È', 'É', 'Ê', 'Ë', 'Î', 'Ï', 'Ô', 'Œ', 'Ù', 'Û', 'Ü', 'Ÿ', 'Ç', ' '];

var regex = new RegExp('[^' + allowed.join('') + ']', 'gi');
console.log(regex);

var cleanStr = input.replace(regex, '');
console.log(cleanStr);
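
As a sketch of the escaping mentioned in the note above, each allowed character can be passed through a small helper before the array is joined. The helper name escapeForCharClass, the shortened allowed list, and the sample input are assumptions made for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: prefixes the characters that are special inside a
// character class (], \, ^ and -) with a backslash.
function escapeForCharClass(ch) {
    return ch.replace(/[\]\\^-]/g, '\\$&');
}

// Shortened allowed list that deliberately contains meta-characters.
var allowedChars = ['a', 'b', 'c', '-', ']', '^', '\\', ' '];

// Escape every entry first, then build the negated character class.
var safeRegex = new RegExp('[^' + allowedChars.map(escapeForCharClass).join('') + ']', 'g');

var sample = 'a-b]c^d\\e f[g';
var cleaned = sample.replace(safeRegex, '');
console.log(cleaned); // a-b]c^\ (followed by one space)
```

Without the escaping step, an allowed entry such as ']' would close the character class early and '-' could be read as a range, so the resulting regex may match a different set of characters than intended.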

If the array of allowed characters is fixed, you can use the following regex to replace the non-allowed characters. There is also no need to convert the string to lower case; use the i flag for a case-insensitive match.

var regex = /[^0-9a-zàâäèéêëîïôœùûüÿç ]/gi;

RegEx101 Live Demo


6 Comments

I seriously like the regular expression approach
@soum so why did you say it "really is not an option" ?!
If you're using /i, why include both upper and lower case chars in the regex ?
@Alnitak Right, didn't notice that. Thanks, that'll make the regex more compact.
Yeah the allowed [] is really not gonna be a fixed array. So that might not work. Also I hate the for loop. Reason being the size of the loop. It will never be an efficient solution. Sometimes that input can be 200 character long...and even maybe more...so the loop really wont work. I think i will use the var regex = new RegExp('[^' + allowed.join('') + ']', 'gi'); console.log(regex); var cleanStr = input.replace(regex, ''); console.log(cleanStr);
4

Using ES6's Set class, available in all good browsers:

let input = 'this is A [{}].-_+~`:; *6^123@#$%&*()?{}|\ '.toLowerCase();
let allowed = new Set(['0','1','2','3','4','5','6','7','8','9','a','b','c','d', 'e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','s','à','â','ä','è','é','ê','ë','î','ï','ô','œ','ù','û','ü','ÿ','ç','À','Â','Ä','È','É','Ê','Ë','Î','Ï','Ô','Œ','Ù','Û','Ü','Ÿ','Ç', ' ']);

let cleanStr = [].map.call(input, c => allowed.has(c) ? c : ' ').join('');

The last line uses an efficient Set lookup operation to determine if the character is allowed or not.

The [].map.call(input, ...) allows the Array.prototype.map function to operate directly on the input string. Since the result is an array, it needs to be joined back together again afterwards.

In algorithmic complexity terms, this uses two O(n) array operations and n Set lookups. The ECMAScript specification only requires Set access to be sublinear on average, so each lookup is likely O(log n) or even O(1), depending on the implementation.

The creation of the initial Set has a computation cost too, of course, but it's trivial and should be done just once, e.g. when the program starts.

If instead you actually wanted to remove the non-matching characters, you could use .filter instead of .map:

let cleanStr = [].filter.call(input, c => allowed.has(c)).join('');
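
Since the question leaves open whether unmatched characters should be removed or replaced, the two variants above could be folded into one function. This is only a sketch: the name sanitize, its replacement parameter, and the shortened allowed set are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper: keeps characters found in the Set and substitutes
// `replacement` for everything else (an empty string removes them).
function sanitize(text, allowedSet, replacement = '') {
    return [].map.call(text, c => allowedSet.has(c) ? c : replacement).join('');
}

// Shortened allowed set, for illustration only.
const smallSet = new Set(['a', 'b', 'c', ' ']);

console.log(sanitize('a!b c?', smallSet));      // 'ab c'
console.log(sanitize('a!b c?', smallSet, ' ')); // 'a b c '
```

Mapping disallowed characters to an empty string and then joining gives the same result as the .filter version, so a single code path covers both behaviours.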

5 Comments

Wouldn't it be better to use filter?
@MinusFour the OP appears to want the illegal characters to be replaced with a space. filter would be more appropriate if they were to be removed altogether.
Well, that's odd. The description says '', whereas the code actually has a ' '.
Huh. It's never occurred to me that you could use [].map.call to iterate over a string like that. Nice trick @Alnitak
@Andy a string is a pseudo-array - it has numeric index array-style accessors for the individual characters and a .length property. All of the standard Array methods are invocable this way (since they're deliberately specified to only require those two characteristics)
1

The problem with your code is that every time the loop checks whether a character of the input is allowed, it assigns to cleanStr a copy of the input with only that one character replaced. Keep in mind that input never changes between iterations, and cleanStr only holds the result of the most recent replacement. So you are throwing away every replacement made so far, and at the end of the computation you have the input string with only the last replacement applied. What you want to do is build the resulting string incrementally, so that at the end of the loop you have the result you expected.

var input = 'this is A [{}].-_+~`:; *6^123@#$%&*()?{}|\ ';
input = input.toLowerCase(); // toLowerCase() returns a new string, so assign the result

var allowed = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d', 'e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','s','à','â','ä','è','é','ê','ë','î','ï','ô','œ','ù','û','ü','ÿ','ç','À','Â','Ä','È','É','Ê','Ë','Î','Ï','Ô','Œ','Ù','Û','Ü','Ÿ','Ç', ' '] 

var cleanStr = '';
for(var i = 0; i < input.length; i++){
    if(allowed.indexOf(input[i]) !== -1){
        cleanStr += input[i];
    }
}

console.log(cleanStr);

I thought it was important for you to understand what your mistake was. Beyond that, you can use built-in JavaScript functions such as indexOf to avoid a double for loop for such a simple task, although, as many have suggested, a regex would be much more efficient.
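
The behaviour described at the start of this answer can be reproduced in a few lines. This is a minimal sketch with a shortened sample string; the variable names original and result are chosen for illustration only.

```javascript
var original = 'a!b?c';
var result = '';

// replace() returns a new string and leaves `original` untouched, so each
// call starts again from the unmodified input.
result = original.replace('!', ' ');
console.log(result);   // 'a b?c'

result = original.replace('?', ' ');
console.log(result);   // 'a!b c' (the first replacement is gone)

console.log(original); // 'a!b?c' (never modified)

// With a string pattern, only the first occurrence is replaced.
console.log('a!a'.replace('a', '_')); // '_!a'
```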
