1

For example, trim() does not remove U+3000, the space character used in Chinese. It would be cumbersome to change every instance of trim() to include U+3000. Is it possible to modify trim()'s default parameter?

Also, \s in PHP's regular expressions doesn't match U+3000 either. Is it possible to somehow make \s match U+3000?

4
  • 4
    “Is it possible to modify trim()'s default parameter?” – only if you change the C source code, and compile your own PHP. Commented Jun 9, 2014 at 18:16
  • 1
    Create a myTrim() function that calls trim() with the additional arguments that you need, then use that instead Commented Jun 9, 2014 at 18:18
  • why not use str_replace? Commented Jun 9, 2014 at 18:22
  • OP is asking for trim(), not \trim() which makes it very possible. Commented Jun 9, 2014 at 18:34

3 Answers

3

Unfortunately, trim() is not part of mbstring's function set (mb_*). Otherwise you could simply enable mbstring's Function Overloading Feature (note that this feature was deprecated in PHP 7.2 and removed in PHP 8.0).

But thanks to PHP's namespace fallback policy it is possible:

For functions and constants, PHP will fall back to global functions or constants if a namespaced function or constant does not exist.

That is, you can override trim() (but not \trim()). You have to use namespaces and call trim() without explicitly prefixing the global namespace (i.e. no \ prefix).

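namespace myns;

// Shadows the global trim() for unqualified calls made inside this namespace.
// The default list is an ASCII space plus U+3000 (PHP 7+ escape syntax).
function trim($str, $charlist = " \u{3000}") {
    // Quote the characters for use in a character class; '/' is the delimiter.
    $pregCharacters = preg_quote($charlist, '/');
    // The u modifier makes PCRE treat U+3000 as one character, not three bytes.
    return preg_replace("/^[$pregCharacters]+|[$pregCharacters]+$/u", '', $str);
}

var_dump(trim("\u{3000} a b c \u{3000}"));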

Didn't think too much about that RegExp. It should just illustrate overriding of trim().

AFAIK the only thing you have to take care of is that the definition of \myns\trim() should happen before your first trim() call. This is a very attractive technique for mocking time() in unit tests.
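To make the time() remark concrete, here is a minimal sketch of the same fallback trick used for testing. The namespace App and the function isExpired() are hypothetical names chosen for illustration; only the resolution rule itself comes from the quoted documentation.

```php
<?php
namespace App;

// Hypothetical stand-in for time(), defined in the same namespace as the code under test.
function time(): int
{
    return 1000;
}

// Hypothetical code under test.
function isExpired(int $deadline): bool
{
    // Unqualified call: resolves to App\time() because it exists, otherwise to \time().
    return time() > $deadline;
}

var_dump(isExpired(999));   // bool(true)
var_dump(isExpired(1000));  // bool(false)
var_dump(\time() > 1000);   // bool(true): the fully qualified call still reaches the real clock
```

The catch is the same as with trim(): code that calls \time() with the leading backslash, or that imports it with "use function time;", bypasses the replacement.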


Regarding your second question, \s would match U+3000 if you turn on the u-switch (PCRE_UTF8):

var_dump(preg_match("/\s/u", "\u{3000}"));
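A slightly longer sketch of the same point, comparing the pattern with and without the u modifier. The variable names and the whitespace-collapsing step are illustrative assumptions, not part of the original answer, and the result without the modifier is left unannotated because it can depend on the locale in use.

```php
<?php
// U+3000 IDEOGRAPHIC SPACE, written with the PHP 7+ Unicode escape.
$ideographicSpace = "\u{3000}";

// Without the u modifier the subject is matched byte by byte.
var_dump(preg_match('/\s/', $ideographicSpace));

// With the u modifier the subject is decoded as UTF-8 and \s covers Unicode whitespace.
var_dump(preg_match('/\s/u', $ideographicSpace)); // int(1)

// Hypothetical follow-up: collapse runs of any whitespace into one ASCII space.
$normalized = preg_replace('/\s+/u', ' ', "a\u{3000}\u{3000}b");
var_dump($normalized); // string(3) "a b"
```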

0

No, it isn't possible to modify the internal workings of the trim() function without modifying the C source code. However, you could create a new function, say customTrim(), and write code in it that removes all the characters you want removed. This only works if you know beforehand which whitespace characters can occur in these strings.

If you need to do this with preg_replace(), you can use the following:

$str = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);

The regex is from this blog entry. It removes all leading and trailing whitespace characters (including the ones that \s matches) and control characters. This includes the Unicode character 'IDEOGRAPHIC SPACE' (U+3000).

Test case:

$str = "\u{3000}";
$str = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
var_dump($str, mb_strlen($str));

Output:

string(0) ""
int(0)
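Putting the two ideas together, a customTrim() helper could wrap this regex roughly as follows. This is a sketch only: the function name comes from the suggestion above, while the signature and the null fallback are assumptions. Note that \pC is broad and also covers format and unassigned characters, so it may strip more than plain whitespace.

```php
<?php
// Hypothetical helper; the name customTrim() follows the suggestion above.
function customTrim(string $str): string
{
    // \pZ = Unicode separators (includes U+3000), \pC = control and other "C" category characters.
    $result = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);

    // preg_replace() returns null on failure, for example on invalid UTF-8 input.
    return $result ?? $str;
}

// Leading and trailing spaces are removed; the U+3000 between "a" and "b" is kept.
var_dump(customTrim("\u{3000} a\u{3000}b \u{3000}") === "a\u{3000}b"); // bool(true)
```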


-2

I think you cannot overload functions in PHP (though I haven't used PHP in a long time). Instead, you could write your own function that first calls trim() if necessary. Afterwards, take a look at the str_replace() function; you might be able to replace the Chinese Unicode space character with an empty string (i.e. ''). How to write that in your code seems to depend on your character encoding; see also "Replace unicode character".

1 Comment

str_replace() will remove all occurrences of the search character(s), whereas the OP needs to remove them only from the beginning and/or end. They're not equivalent.
