0

I have this PHP regex for remove utf8 special characters from strings:

[\x00-\x1F]|\xC2[\x80-\x9F]|\xE2[\x80-\x8F]{2}|\xE2\x80[\xA4-\xA8]|\xE2\x81[\x9F-\xAF]

I need to convert it to Javascript regex. I tryed this code:

str = str.replace(/[\x00-\x1F]|\xC2[\x80-\x9F]|\xE2[\x80-\x8F]{2}|\xE2\x80[\xA4-\xA8]|\xE2\x81[\x9F-\xAF]/g, '');

But it does nothing.

I need your help. Thank you.

2 Answers 2

3

Simple mistake, big effect:

strTest = strTest.replace(/your regex here/g, "$1");
// ----------------------------------------^

without the "global" flag, the replace occurs for the first match only.

Side note: To remove any character that does not fulfill some kind of complex condition, like falling into a set of certain Unicode character ranges, you can use negative lookahead:

var regex = /(?![\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})./g;
strTest = strTest.replace(regex, "")

where regex reads as

(?!      # negative look-ahead: a position *not followed by*:
  […]    #   any allowed character range from above
)        # end lookahead
.        # match this character (only if previous condition is met!)
Sign up to request clarification or add additional context in comments.

Comments

0

Try this :

str = str.replace(/[\x00-\x1F]|\xC2[\x80-\x9F]|\xE2[\x80-\x8F]{2}|\xE2\x80[\xA4-\xA8]|\xE2\x81[\x9F-\xAF]/gi, '');

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.