I need to rewrite this JavaScript regular expression in PHP for use with preg_replace:

var PATTERN = /([\ud800-\udbff])([\udc00-\udfff])/g;

If I use:

$strText = preg_replace("/([\ud800-\udbff])([\udc00-\udfff])/", "emoji", $strText);

I get:

Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 3

  • Try replacing \ud800 with \x{d800}. Commented Oct 15, 2014 at 18:33
  • If I use: preg_replace("/([\x{d800}-\x{dbff}])([\x{dc00}-\x{dfff}])/", "emoji", $strText); I get: Compilation failed: character value in \x{...} sequence is too large at offset 9. Commented Oct 15, 2014 at 18:36
  • Try adding a u at the end of your regex: ...)/u"... Commented Oct 15, 2014 at 18:50
  • As an aside, the capture groups are useless and the first range can be replaced with \p{Cs}: ~(*UTF8)\p{Cs}[\x{dc00}-\x{dfff}]~ Commented Oct 15, 2014 at 19:16

1 Answer

Try the following:

preg_replace("/([\x{d800}-\x{dbff}])([\x{dc00}-\x{dfff}])/u", "emoji", $strText);

PCRE doesn't support the \uXXXX escape, so use \x{XXXX} instead. You'll also need the u modifier (at the end of the regex) so that the pattern and subject are treated as UTF-8.
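
As a minimal sketch of that syntax on a code point outside the surrogate range (the euro sign, U+20AC, is chosen purely for illustration, and the sample string is an assumption rather than something from the question):

```php
<?php
// Assumed sample input: "5 €" written as raw UTF-8 bytes (E2 82 AC encodes U+20AC).
$strText = "5 \xE2\x82\xAC";

// \x{20AC} is the PCRE spelling of JavaScript's \u20AC; the trailing u
// makes PCRE treat both the pattern and the subject as UTF-8.
$result = preg_replace('/\x{20AC}/u', 'EUR', $strText);

echo $result, "\n"; // prints "5 EUR"
```

The pattern is in single quotes so that PHP passes the backslash sequence through to PCRE untouched.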


Information on syntax from http://www.regular-expressions.info/unicode.html

Perl and PCRE do not support the \uFFFF syntax. They use \x{FFFF} instead.

Information on u modifier from http://php.net/manual/en/reference.pcre.pattern.modifiers.php

u (PCRE_UTF8) This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern and the subject is checked since PHP 4.3.5. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
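
The validity check described in that quote can be observed with a small sketch. The malformed sample string is an assumption for illustration; in the PHP versions I am aware of, preg_replace signals the failure by returning NULL, and preg_last_error() reports the reason:

```php
<?php
// Assumed sample input: the byte FF can never occur in valid UTF-8,
// so this subject is malformed.
$invalidSubject = "abc\xFF";

// With the u modifier the subject is validated before any matching happens.
$result = preg_replace('/b/u', 'X', $invalidSubject);
$error  = preg_last_error();

var_dump($result === null);              // bool(true)
var_dump($error === PREG_BAD_UTF8_ERROR); // bool(true)
```

Checking preg_last_error() after a NULL result is a cheap way to tell a malformed subject apart from other failures.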

3 Comments

I get Compilation failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 9 with this.
@OneCleverMonkey, I searched that error and found this... stackoverflow.com/questions/21666895/…, it seems weird, but can you try replacing the leading D's with E's?
Or convert to UTF8 (presumably from UTF16) before doing your regex, wisercoder.com/php-preg_match-failing-reason-blame-utf-16
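
Following up on the last comment: once the text is UTF-8, what UTF-16 stores as a surrogate pair is a single code point in the range U+10000 to U+10FFFF, so the JavaScript pattern can be approximated by matching that range directly. A minimal sketch, assuming $strText is already valid UTF-8 (the sample string is made up for illustration):

```php
<?php
// Assumed sample input: "Hi 😀!" with U+1F600 written as raw UTF-8 bytes.
$strText = "Hi \xF0\x9F\x98\x80!";

// Any code point above U+FFFF is what UTF-16 would encode as a
// high surrogate followed by a low surrogate.
$result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', 'emoji', $strText);

echo $result, "\n"; // prints "Hi emoji!"
```

If the data really does arrive as UTF-16, a conversion such as mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE') would be needed before the replace. That call requires the mbstring extension, and the byte order shown is an assumption.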
