I'm in the process of making my PHP site Unicode-aware. I'm wondering if anyone has experience with the mbstring.func_overload setting, which replaces the normal string functions (e.g. strlen) with their multi-byte equivalents (mb_strlen). There aren't any comments on the PHP manual page.

Are there any potential problems that I should be aware of? Any cases where calling the multi-byte version is a bad idea?

I suppose one example would be functions that deal with encryption, since they may expect to deal with strings of bytes, rather than strings of characters.
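To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that func_overload is off, so strlen() still counts bytes; the variable names are only illustrative.

```php
<?php
// "é" encoded as UTF-8 is one character but two bytes.
$text = "\xC3\xA9";

$byteCount = strlen($text);             // counts bytes
$charCount = mb_strlen($text, 'UTF-8'); // counts characters

// A raw SHA-256 digest is 32 bytes of binary data, not text.
$digest = hash('sha256', 'secret', true);
$digestBytes = mb_strlen($digest, '8bit'); // '8bit' forces byte counting

printf("%d bytes, %d character, digest: %d bytes\n", $byteCount, $charCount, $digestBytes);
```

Code that sizes keys, digests or padding needs the byte count, which is why character-based counting is a concern there.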

Also, the manual page includes a note: "It is not recommended to use the function overloading option in the per-directory context, because it's not confirmed yet to be stable enough in a production environment and may lead to undefined behaviour."

Does that mean that it's not stable in a per-directory context, or it's generally not stable? The wording is unclear.

2 Answers

My answer is: definitely not!

The problem is that there is no easy way to "reset" str* functions once they are overloaded.

For some time this can work well with your project, but almost surely you will run into an external library that uses string functions to, for example, implement a binary protocol, and they will fail. They will fail and you will spend hours trying to find out why they are failing.
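As an illustration of the kind of failure meant here, consider a length-prefixed message format. This is a sketch, not code from any real library: frameMessage() and readMessage() are hypothetical helpers, and the "bad" frame simulates what a library would produce if its strlen() call silently started counting characters.

```php
<?php
// Hypothetical framing helper: 4-byte big-endian length prefix, then the payload.
function frameMessage(string $payload): string
{
    return pack('N', strlen($payload)) . $payload;
}

// Hypothetical reader: takes the declared number of bytes after the prefix.
function readMessage(string $frame): string
{
    $length = unpack('N', substr($frame, 0, 4))[1];
    return substr($frame, 4, $length);
}

$payload = "h\xC3\xA9llo"; // "héllo" in UTF-8: 5 characters, 6 bytes

$good = frameMessage($payload);

// The same frame if the length had been counted in characters instead of bytes.
$bad = pack('N', mb_strlen($payload, 'UTF-8')) . $payload;

var_dump(readMessage($good) === $payload); // bool(true)
var_dump(readMessage($bad) === $payload);  // bool(false): the last byte is cut off
```

Nothing in the library's source looks wrong in this situation, which is what makes the bug so hard to track down.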

Once you have found that mbstring.func_overload is the cause, you don't have many options. You can use ini_set to switch mbstring.internal_encoding to a one-byte-per-character encoding every time you call the external library and switch it back right afterwards, but if the library makes callbacks into your application, those callbacks will run with the wrong encoding and things will break.
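A rough sketch of that switch-and-restore workaround follows. It uses mb_internal_encoding() rather than ini_set, and withSingleByteEncoding() is a made-up helper name. Since func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, mb_strlen() without an encoding argument stands in here for an overloaded strlen().

```php
<?php
// Hypothetical wrapper: run $call with a single-byte internal encoding,
// then restore the previous encoding even if $call throws.
function withSingleByteEncoding(callable $call)
{
    $previous = mb_internal_encoding();
    mb_internal_encoding('ISO-8859-1');
    try {
        return $call();
    } finally {
        mb_internal_encoding($previous);
    }
}

mb_internal_encoding('UTF-8');

// mb_strlen() without an encoding argument follows the internal encoding,
// which is how an overloaded strlen() would behave.
$insideWrapper = withSingleByteEncoding(function () {
    return mb_strlen("\xC3\xA9");
});
$outsideWrapper = mb_strlen("\xC3\xA9");

var_dump($insideWrapper, $outsideWrapper); // int(2), int(1)
```

Any application code that the wrapped call invokes as a callback also runs under the single-byte encoding, which is exactly the weakness described above.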

Another option is to patch the library manually, changing all str* functions to their mb_* counterparts and passing a one-byte-per-character encoding as the encoding parameter. This isn't a great idea either, because you lose the ability to easily update the external library, and you might cause some performance issues as well.
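For reference, a patched library function might look roughly like this. splitHeader() is a hypothetical example, and '8bit' is the mbstring encoding name that makes every length and offset count bytes regardless of the internal encoding.

```php
<?php
// Hypothetical library function, patched so that it no longer depends on
// the internal encoding: all lengths and offsets are measured in bytes.
function splitHeader(string $data, int $headerSize): array
{
    if (mb_strlen($data, '8bit') < $headerSize) {
        throw new InvalidArgumentException('Data shorter than header');
    }
    return [
        mb_substr($data, 0, $headerSize, '8bit'),
        mb_substr($data, $headerSize, null, '8bit'),
    ];
}

// 4-byte header followed by a UTF-8 body ("héllo", 6 bytes).
[$header, $body] = splitHeader("\x00\x00\x00\x06h\xC3\xA9llo", 4);
```

Every such call site has to be found and changed by hand, and changed again after each upstream release.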

So, again, don't use func_overload. If you work with multi-byte strings, use the appropriate mb_ functions.
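A small sketch of what that looks like in practice, assuming the application's text is UTF-8 and the encoding is passed explicitly rather than taken from configuration:

```php
<?php
$name = "h\xC3\xA9llo"; // "héllo" in UTF-8

// Character-aware operations: call the mb_ functions explicitly.
$firstTwoChars = mb_substr($name, 0, 2, 'UTF-8'); // "hé"
$upper = mb_strtoupper($name, 'UTF-8');           // "HÉLLO"

// The byte-based function stays available for binary data, but on text
// it can cut a multi-byte character in half.
$firstTwoBytes = substr($name, 0, 2);

var_dump(mb_check_encoding($firstTwoChars, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($firstTwoBytes, 'UTF-8')); // bool(false)
```

Both families of functions keep their documented meaning this way, so third-party code that relies on byte semantics is unaffected.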

1 Comment

mbstring.func_overload just bit me in a bad way, and I have to wonder how many of the currently unresolved issues I have received are due to this. I wrote a class to generate ePub files, and a companion class to handle Zip files. There were some reasons the built-in Zip functions weren't useful. I spent this entire weekend looking for the cause, until the person who reported the bug mentioned they had set up their server to use UTF-8. I didn't even know mbstring.func_overload existed, and now I'm in trouble, because setting mbstring to use ASCII is not possible either, as I also use UTF-8 with the mb_ functions.

One issue you should definitely watch for is third-party code (perhaps a library or a PEAR extension) that uses the non-multi-byte-aware versions of functions. For example, libraries that use strlen() to count bytes could break if you overload it.

Also, this bug report shows that the bleeding of overloaded mb functions across virtual hosts has been fixed in the 5.2/5.3 CVS versions. The bug is specific to per-directory configurations.
