14

I am trying to build a regex function that removes any non-alphanumeric characters and removes all duplicate characters, e.g. aabcd*def%gGGhhhijkklmnoP\1223 would become abcddefgGhijklmnoPR3. I can remove the special characters easily, but can't for the life of me work out how to remove the duplicate characters. This is my current code for removing the special characters:

var oldString = "aabcd*def%gGGhhhijkklmnoP\122";
var filtered = oldString.replace(/[^\w\s]/gi, "");

How can I extend the above regex to also remove duplicate characters, including duplicates that are separated by non-alphanumeric characters?

4 Answers

35

The regex is /[^\w\s]|(.)\1/gi

Test here: http://jsfiddle.net/Cte94/

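It uses a backreference to match any character (.) followed by the same character \1. Both characters of the pair are removed, so aa => (nothing).

If by "check for duplicate characters" you meant collapsing a run, so that aaa => a, use a lookahead instead, so that only the characters followed by their twin are removed:

/[^\w\s]|(.)(?=\1)/gi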

Test here: http://jsfiddle.net/Cte94/1/
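
To make the difference between the two regexes concrete, here is a small sketch that runs both on the same string. The sample is a shortened version of the question's input, chosen for illustration rather than taken from the fiddles.

```javascript
// Shortened sample based on the question's input (an assumption for illustration).
var sample = "aabcd*def%gGGhhh";

// (.)\1 consumes both characters of a pair, so "aa" disappears entirely.
var pairsRemoved = sample.replace(/[^\w\s]|(.)\1/gi, "");

// (.)(?=\1) consumes only a character that is followed by its twin,
// so one copy of each run survives.
var runsCollapsed = sample.replace(/[^\w\s]|(.)(?=\1)/gi, "");

console.log(pairsRemoved);  // "bcddefGh"
console.log(runsCollapsed); // "abcddefGh"
```

Note that "dd" survives in both results: the two d's only become neighbours after the * is removed, and a single replace pass does not rescan text it has already produced.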

Be aware that neither regex distinguishes case: A == a, so Aa counts as a repetition. If you don't want that, drop the i from /gi.
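
A quick sketch of what the i flag changes, using a made-up sample string (not from the question):

```javascript
// Hypothetical sample chosen to mix upper and lower case.
var mixed = "AabBcc";

// With i, the backreference matches regardless of case: "Aa" and "bB" are runs.
var ignoringCase = mixed.replace(/(.)(?=\1)/gi, "");

// Without i, only the exact repeat "cc" is a run.
var respectingCase = mixed.replace(/(.)(?=\1)/g, "");

console.log(ignoringCase);   // "aBc"
console.log(respectingCase); // "AabBc"
```

In the case-insensitive version the character that survives is the last one of each run, which is why "Aa" becomes "a" rather than "A".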


9 Comments

Thank you so much. Is there any way I can ensure that duplicates separated by non-alphanumeric characters are removed as well?
@jonathanp Make it two regexes (one to remove non-alphanumeric characters and one to remove duplicates). It's pointless to build uber-complex regexes, especially if you then have to maintain or modify them.
@jonathanp var filtered = oldString.replace(/[^\w\s]/g, "").replace(/(.)(?=\1)/gi, "");
@Pascalius Yep. Because the question didn't require it. The question explicitly used \w, that is a-zA-Z0-9_.
@Faks /[^\w\s]|(.)(?=\1\1)/gi will remove only if there are at least three aaa, and will leave aa alone
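
The two-step approach from the comments could be wrapped up roughly like this. The function name stripAndCollapse is hypothetical, and the flags are copied from the comment above, so the second step ignores case.

```javascript
// Hypothetical helper name; not from the original thread.
function stripAndCollapse(input) {
  // Step 1: drop everything that is not a word character or whitespace.
  var cleaned = input.replace(/[^\w\s]/g, "");
  // Step 2: collapse runs. This sees duplicates that step 1 made adjacent.
  return cleaned.replace(/(.)(?=\1)/gi, "");
}

// Shortened sample based on the question's input (an assumption for illustration).
console.log(stripAndCollapse("aabcd*def%gGGhhh")); // "abcdefGh"
```

Because the separators are gone before the second replace runs, d*d is collapsed to a single d. Dropping the i from the second regex would keep g and G as distinct characters.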
5

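\1+ is the key: it matches one or more repeats of the captured character, so the whole run is replaced in one go.

"aabcdd".replace(/(\w)\1+/g, function (match, ch) {
    return ch; // ch is the captured character; keep a single copy of it
}); // "abcd"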
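
As a variation on the same idea, the callback can be swapped for a "$1" replacement string, which inserts the first captured group. This is a sketch of an equivalent form, not the answerer's code; the second sample string is made up.

```javascript
// "$1" in the replacement string stands for the first captured group,
// so each run of a repeated word character collapses to one copy.
var collapsedShort = "aabcdd".replace(/(\w)\1+/g, "$1");
var collapsedLong = "heeelllo".replace(/(\w)\1+/g, "$1"); // hypothetical sample

console.log(collapsedShort); // "abcd"
console.log(collapsedLong);  // "helo"
```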


2

Non regex version:

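var oldString = "aabcd*def%gGGhhhijkklmnoP\122";
var newString = "";

var len = oldString.length;
var c = oldString[0];
for ( var i = 1; i < len; ++i ) {
  // Emit the previous character only when the run it belongs to ends.
  if ( c !== oldString[i] ) {
    newString += c;
  }
  c = oldString[i];
}
// The loop never emits the final character, so append it here.
if ( len > 0 ) {
  newString += c;
}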
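
If the special characters should be dropped in the same pass, the loop can skip them before comparing. This is only a sketch: the function name collapseAlnumRuns and the [A-Za-z0-9] test are assumptions about what counts as alphanumeric here.

```javascript
// Hypothetical helper; the name and the character test are assumptions.
function collapseAlnumRuns(input) {
  var result = "";
  var previous = ""; // last alphanumeric character that was kept or skipped as a repeat
  for (var i = 0; i < input.length; i++) {
    var ch = input[i];
    if (!/[A-Za-z0-9]/.test(ch)) {
      continue; // ignore separators without resetting `previous`
    }
    if (ch !== previous) {
      result += ch;
    }
    previous = ch;
  }
  return result;
}

// Shortened sample based on the question's input (an assumption for illustration).
console.log(collapseAlnumRuns("aabcd*def%gGGhhh")); // "abcdefgGh"
```

Since separators do not reset the previous character, d*d collapses to one d, and the comparison is case-sensitive, so g and G are both kept.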

1 Comment

Hi, that does a great job of removing the duplicated characters. Would you suggest running my regex first and then executing the above?
1

Short and simple. Input: Brahmananda, output: Brahmnd. Ref: http://jsfiddle.net/p7yu8etz/5/

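var str = "Brahmananda";
// Match a character, the shortest stretch after it, and its next occurrence.
var reg = /(.)(.*?)(\1)/g;
// Each pass drops the later occurrence; repeat until no character occurs twice.
while (reg.test(str))
  str = str.replace(reg, "$1$2");
$("div").text(str);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div></div>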
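
For comparison, removing every later occurrence of a character (not just adjacent repeats) can also be done without a regex loop or jQuery. The sketch below swaps in a Set to remember which characters have been seen; the function name uniqueChars is hypothetical.

```javascript
// Hypothetical helper using a Set instead of the repeated regex replace.
function uniqueChars(input) {
  var seen = new Set();
  var out = "";
  for (var ch of input) {
    if (!seen.has(ch)) {
      seen.add(ch); // first occurrence: remember it and keep it
      out += ch;
    }
  }
  return out;
}

console.log(uniqueChars("Brahmananda")); // "Brahmnd"
```

This makes a single pass over the string, whereas the regex version rescans the string once per round of replacements.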

