Why is this preg_replace not working?

FYI, the PHP script is saved as UTF-8 without BOM. For easier testing, the call below removes every character that matches the pattern; what I will actually do is the inverse, removing every character that does not match. Note also that the character ā is not in my regex, so it should be the only character left behind.

$string='The Story of Jewād';
echo preg_replace('@([!"#$&’\(\)\*\+,\-\./0123456789:;<=>\?ABCDEFGHIJKLMNOPQRSTUVWXYZ\[\\\]\^_‘abcdefghijklmnopqrstuvwxyz\{\|\}~¡¢£⁄¥ƒ§¤“«‹›fifl–†‡·¶•‚„”»…‰¿`´ˆ˜¯˘˙¨˚¸˝˛ˇ—ÆªŁØŒºæıłøœß÷¾¼¹×®Þ¦Ð½−çð±Çþ©¬²³™°µ ÁÂÄÀÅÃÉÊËÈÍÎÏÌÑÓÔÖÒÕŠÚÛÜÙÝŸŽáâäàåãéêëèíîïìñóôöòõšúûüùýÿž€\'])@u','',$string);

The result I get is $string unchanged. Why would this be?

  • Try with \pL+ instead of relisting accented letters individually. Commented Mar 16, 2013 at 15:54
  • Might it not be easier to write a regex that matches the characters you do want to allow, rather than listing all those non-allowed characters? Also, for digits you can use \d, and for contiguous ranges you can use things like A-Z. That will make the expression shorter and easier to manage. Commented Mar 16, 2013 at 15:56
  • @Spudley, yes, that is what I am doing. The above example is inverted for easy testing. Commented Mar 16, 2013 at 16:09
  • @mario, I can't use \pL+ because this list is specific. It is all the characters I can use in a specific font I am using. Commented Mar 16, 2013 at 16:09

1 Answer

The reverse case works, i.e. removing only the ā:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" >
<?php 

$string='The Story of Jewād';
echo preg_replace('@([ā])@u','',$string); // u modifier: treat pattern and subject as UTF-8 instead of raw bytes

?>

So there is just a syntax problem somewhere in your pattern. Listing every character individually in a regex is not a good idea; you can use ranges instead, something like this:

ltrChars : 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF'+'\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF';
rtlChars : '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC';
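
On the "syntax problem somewhere": one likely candidate, offered as a guess rather than a confirmed diagnosis, is the \[\\\] part of the pattern in the question. Inside a single-quoted PHP string, \\ collapses to a single backslash, so PCRE receives \[\\] which reads as an escaped [, an escaped backslash, and then a bare ] that closes the character class early. Everything after it becomes a literal sequence that never occurs in the subject, so nothing is replaced and no error is raised. The sketch below reproduces this with a cut-down, hypothetical character class (not the full list from the question):

```php
<?php
// Reduced, hypothetical test case: a short stand-in for the long class in the question.
// Save this file as UTF-8 so the literal ā is encoded the same way as in the question.
$string = 'The Story of Jewād';

// As in the question: '\\\]' in single quotes reaches PCRE as \\] ,
// i.e. a literal backslash followed by a ] that ends the class early.
$broken = '@([a-z\[\\\]\^_A-Z ])@u';

// Assumed fix: four backslashes in the PHP source give PCRE \\ (a literal
// backslash), and the following \] stays an escaped bracket inside the class.
$fixed = '@([a-z\[\\\\\]\^_A-Z ])@u';

$brokenResult = preg_replace($broken, '', $string);
$fixedResult  = preg_replace($fixed, '', $string);

echo $brokenResult, "\n"; // The Story of Jewād (unchanged)
echo $fixedResult, "\n";  // ā
```

If this is indeed the cause, the same change (four backslashes before \] instead of two) would be the thing to try in the full pattern.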

3 Comments

I need to list all the characters out specifically because these are all the characters I have in a font.
Well, at least I can see some ranges in there, like A-Z or 0-9.
Your method here did not work exactly, but with a small change it did: @([^\x{0020}-\x{007E}\x{FB01}\x{FB02}\x{00A1}-\x{00AC}\x{00AE}-\x{00FF}\x{0160}\x{0161}\x{0192}\x{2013}\x{2018}-\x{201A}\x{2020}-\x{2022}\x{2026}\x{2030}\x{2039}\x{2044}\x{201C}-\x{201E}\x{203A}\x{02C6}\x{02D8}-\x{02DD}\x{02C7}\x{2014}\x{0141}\x{0142}\x{0131}\x{0152}\x{0153}\x{2212}\x{2122}\x{0178}\x{017D}\x{017E}\x{20AC}])@u
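
The \uXXXX escapes in the answer's listing are JavaScript-style; PCRE, which backs preg_replace, writes code points as \x{XXXX} and needs the u modifier, which appears to be the small change this comment describes. A minimal sketch applying the pattern above to the sample string from the question; the function name strip_unsupported_chars is hypothetical, and the class is split across several strings only for readability:

```php
<?php
// Hypothetical helper: removes every character that is NOT in the font's
// character list, using the negated class from the comment above.
function strip_unsupported_chars(string $text): ?string
{
    $allowed =
          '\x{0020}-\x{007E}\x{FB01}\x{FB02}\x{00A1}-\x{00AC}\x{00AE}-\x{00FF}'
        . '\x{0160}\x{0161}\x{0192}\x{2013}\x{2018}-\x{201A}\x{2020}-\x{2022}'
        . '\x{2026}\x{2030}\x{2039}\x{2044}\x{201C}-\x{201E}\x{203A}\x{02C6}'
        . '\x{02D8}-\x{02DD}\x{02C7}\x{2014}\x{0141}\x{0142}\x{0131}\x{0152}'
        . '\x{0153}\x{2212}\x{2122}\x{0178}\x{017D}\x{017E}\x{20AC}';

    // The u modifier makes PCRE read pattern and subject as UTF-8.
    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
    return preg_replace('@([^' . $allowed . '])@u', '', $text);
}

echo strip_unsupported_chars('The Story of Jewād'), "\n"; // The Story of Jewd
```

Because ā (U+0101) falls outside every listed range, it is the only character removed from the sample string.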
