Remove zero-width space characters from a JavaScript string

Question

I take user input (JS code) and execute (process) it in real time to show some output.

Sometimes the code has those zero-width spaces; it's really weird. I don't know how the users are inputting that. Example: "($".length === 3

I need to be able to remove that character from my code in JS. How do I do so? Or is there some other way to execute that JS code so that the browser ignores the zero-width space characters?

How did you infer that there is a zero-width character? From the length alone? The length of a non-BMP character is 2.
– Jukka K. Korpela, Commented Jul 3, 2012 at 8:23
When I go to the end of the string and hit the left arrow, at one point the cursor doesn't move left until I hit the left arrow key twice. That's how I inferred it.
– user1437328, Commented Jul 3, 2012 at 8:29
Then you need to analyze the characters, e.g. by writing out the numeric codes. The data may contain combining marks so that two or more characters are treated as a unit when moving to the left.
– Jukka K. Korpela, Commented Jul 3, 2012 at 15:26
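
One way to follow this advice is to print the code point of every character in the string. The sketch below assumes an ES2017+ engine (for padStart); describeCodePoints is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper: list the Unicode code point of every character.
// Array.from iterates by code point, so a non-BMP character (two UTF-16
// code units) is reported once rather than as two surrogates.
function describeCodePoints(str) {
  return Array.from(str, function (ch) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    return 'U+' + hex.padStart(4, '0');
  });
}

// The zero-width space is written as an escape here so it stays visible.
console.log(describeCodePoints('(\u200B$').join(' ')); // U+0028 U+200B U+0024
```

Any unexpected entry in the output (for example U+200B or a combining mark) shows which character is making the string longer than it looks.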

Mathias Bynens · Accepted Answer · 2012-07-03 06:58:01Z

173

Unicode has the following zero-width characters:

U+200B zero width space
U+200C zero width non-joiner
U+200D zero width joiner
U+FEFF zero width no-break space

To remove them from a string in JavaScript, you can use a simple regular expression:

var userInput = 'a\u200Bb\u200Cc\u200Dd\uFEFFe';
console.log(userInput.length); // 9
var result = userInput.replace(/[\u200B-\u200D\uFEFF]/g, '');
console.log(result.length); // 5

Note that there are many more symbols that may not be visible. Some of ASCII’s control characters, for example.
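
To cover more of these invisible characters at once, one option (not part of the original answer) is a Unicode property escape. The sketch below assumes an engine with ES2018 support for \p{...} and the u flag; stripFormatChars is a hypothetical name. It removes everything in the Unicode general category Cf (Format), which includes the four characters listed above as well as others such as U+200E, U+200F and U+2060. Be aware that this also strips joiners that are meaningful in some scripts and in emoji sequences, so it is probably better suited to source code than to natural-language text.

```javascript
// Hypothetical helper: remove every character in general category Cf (Format).
// Requires the `u` flag so that \p{...} is interpreted as a property escape.
function stripFormatChars(str) {
  return str.replace(/\p{Cf}/gu, '');
}

// Sample containing U+200B, U+200E, U+2060 and U+FEFF between the letters.
const sample = 'a\u200Bb\u200Ec\u2060d\uFEFFe';
console.log(sample.length); // 9
console.log(stripFormatChars(sample)); // abcde
```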

answered Jul 3, 2012 at 6:58


5 Comments

Mathias Bynens Over a year ago

@Iván Castellanos mentioned some other characters that may be considered for this: U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK. As I said, there may be other symbols that are not strictly visible by themselves.

klewis Over a year ago

How do we detect if these values actually exist on the page after the DOM loads? Thanks!

user7892745 Over a year ago

var HTMLe=document.getElementsByTagName('html')[0]; HTMLe.outerHTML = HTMLe.outerHTML.replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '');

... that's how

mplungjan Over a year ago

This does not work if the string is a string of unicode chars - it will give error just to create a var with them

var s = "\ud83d\ude0d\ud83d\ude0d\ud83d\ude0d\ud83d\ude0d\ud83d\ude0d‌\ud83d\ude0d\ud83d\u‌de0d\ud83d\ude0d\ud8‌3d\ude0d\ud83d\ude18‌\ud83d\ude18\ud83d\u‌de18"

<-- contains actual \u200c and d

Milad Xandi Over a year ago

You can see an example in Persian in this link

Technotronic · Accepted Answer · 2015-09-21 11:38:41Z

11

I had a problem where some invisible characters were corrupting my JSON and causing an "Unexpected token ILLEGAL" exception, which was crashing my site.

Here is my solution using a RegExp variable (the "g" flag makes it replace every occurrence, not just the first):

    // U+2028 is LINE SEPARATOR, U+2029 is PARAGRAPH SEPARATOR
    var re = new RegExp("\u2028|\u2029", "g");
    var result = text.replace(re, '');

You can find more about JavaScript and zero-width spaces here: Zero Width Spaces
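
A detail worth checking with any of these patterns: String.prototype.replace with a regular expression that lacks the g flag replaces only the first match. A small sketch of the difference, where text is a made-up sample value:

```javascript
// Made-up sample with three separators: U+2028, U+2029, U+2028.
const text = 'a\u2028b\u2029c\u2028d';

// Without the "g" flag only the first match is removed.
const firstOnly = text.replace(new RegExp('\u2028|\u2029'), '');

// With the "g" flag every match is removed.
const all = text.replace(new RegExp('\u2028|\u2029', 'g'), '');

console.log(text.length);      // 7
console.log(firstOnly.length); // 6
console.log(all);              // abcd
```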

edited Sep 21, 2015 at 11:38

answered Oct 19, 2014 at 13:50


2 Comments

Jack G Over a year ago

The or symbol would probably be slower (in IE) because it is optimised for multi-character matches. But, with google's V8, who knows, it probably runs just as fast.

Eric Leschinski Over a year ago

These invisible zero-width unicode characters can be used to hide metadata credentials for those users who dare Copy and Paste through a browser to another editor that knows to receive the message and convert the zero width metadata back to the absence of characters. So what happens is you copy and paste the word "hi" and what gets transmitted is the h, then the string of metadata credentials, then the i. But the source and destination just show the word hi. It's going to be a struggle to keep these zero width barbarians and their persian messengers out at the spartan moat. Sad!

Tarek Salah uddin Mahmud · Accepted Answer · 2016-07-26 12:50:14Z

6

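str = str.replace(/\u200B/g, '');

200B is the hexadecimal form of 8203, the code point of the zero-width space. Replace it with an empty string to remove it. Note that replace returns a new string, so the result has to be assigned.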
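
The relationship between the two notations can be checked directly; the constant names below are only illustrative:

```javascript
// 0x200B (hexadecimal) and 8203 (decimal) are the same number.
const zwspDecimal = parseInt('200B', 16);
const zwspHex = (8203).toString(16).toUpperCase();

// Both notations identify the same character.
const sameChar = String.fromCharCode(8203) === '\u200B';

console.log(zwspDecimal); // 8203
console.log(zwspHex);     // 200B
console.log(sameChar);    // true
```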

answered Jul 26, 2016 at 12:50


Yvonne Aburrow · Accepted Answer · 2022-01-06 22:03:16Z

If you are trying to do this in JavaScript, try this regex.

/([\u200B]+|[\u200C]+|[\u200D]+|[\u200E]+|[\u200F]+|[\uFEFF]+)/g

submit.onclick = evt => {
  const stringToTrim = stringValue.value;
  zeroWidthTrim(stringToTrim);
}

/**
 * Given a string, when it has zero-width spaces in it, then remove them
 *
 * @param {String} stringToTrim The string to be trimmed of unicode spaces
 *
 * @return the trimmed string
 *
 * Regex for zero-width space Unicode characters.
 *
 * U+200B zero-width space.
 * U+200C zero-width non-joiner.
 * U+200D zero-width joiner.
 * U+200E left-to-right mark.
 * U+200F right-to-left mark.
 * U+FEFF zero-width non-breaking space.
 */
function zeroWidthTrim(stringToTrim) {
  const ZERO_WIDTH_SPACES_REGEX = /([\u200B]+|[\u200C]+|[\u200D]+|[\u200E]+|[\u200F]+|[\uFEFF]+)/g;
  console.log('stringToTrim = ' + stringToTrim);
  const trimmedString = stringToTrim.replace(ZERO_WIDTH_SPACES_REGEX, '');
  console.log('trimmedString = ' + trimmedString);
  return trimmedString;
};

<form runat="server">
  <input name="stringValue" id="stringValue" type="text" placeholder="enter your string" value="[&#x200b;&#x200c;]" />
  <input type="button" value="remove zero-width characters" id="submit" />
</form>

(Once you have run the above code snippet, paste the stringToTrim value and the trimmedString value into the regex101 test window and you will see that the Unicode character has gone from the trimmedString value.)

Florian Margaine · Accepted Answer · 2012-07-03 06:54:12Z

4

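[].filter.call( str, function( c ) {
    return c.charCodeAt( 0 ) !== 8203;
} ).join( '' );

Filter each character to drop char code 8203 (the decimal code point of the zero-width space), then join the remaining characters back into a string, since filter returns an array.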

answered Jul 3, 2012 at 6:54


1 Comment

Grant Humphries Over a year ago

This is a clever solution, using modern JavaScript it could be reduced to this one-liner: [].filter.call(strVal, c => c.charCodeAt() !== 8203).join('')
