I have PHP configured with mbstring.func_overload = 7, so all the single-byte-string functions are mapped to their multi-byte equivalents. But I still sometimes need to treat strings as byte arrays; for example, when calculating their size or doing encryption.

What's the best approach here? Can I just use the multi-byte functions and pass them a single-byte encoding, even though that's not actually how the string is encoded? For example:

mb_substr($utf8str, 0, 1, "latin1");
mb_strlen($utf8str, "latin1");

EDIT: I noticed when looking through PHP's source that they rename the original functions to mb_orig_X, as in mb_orig_strlen. Probably not safe to use, as they're not documented, but interesting.

1 Answer

I think you shouldn't be overriding these functions if you need to use the original ones (i.e., if you really need to operate on binary strings); it is quite a dirty solution. It forces you into an even dirtier workaround for the choice you made earlier, and it may break libraries you are using without your being aware of it (but the PHP team keeps inventing more and more stupid features like that).

But if you must keep it that way, you should:

  1. use a language-neutral encoding name like ASCII (not for the interpreter, but for those reading your code, even if that's you in 2 years), and
  2. thoroughly document why you did that, since it will be very confusing for everyone looking at that piece of code.
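
One way to follow both points is to hide the workaround behind a pair of small, documented helpers. This is only a sketch: the names byte_length and byte_slice are made up for illustration and are not part of PHP or mbstring, and it uses the '8bit' encoding name rather than ASCII. It assumes the mbstring extension is loaded. Note also that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so the workaround only matters on older setups.

```php
<?php
/**
 * Hypothetical helpers (these names are illustrative, not part of PHP).
 *
 * Why they exist: with mbstring.func_overload enabled, strlen() and substr()
 * count characters in the internal encoding, not bytes. Passing the '8bit'
 * encoding explicitly makes mbstring treat every byte as one character.
 */
function byte_length(string $data): int
{
    return mb_strlen($data, '8bit');
}

function byte_slice(string $data, int $start, ?int $length = null): string
{
    return mb_substr($data, $start, $length, '8bit');
}

// "héllo" written with explicit UTF-8 bytes: 5 characters, 6 bytes.
$utf8 = "h\xC3\xA9llo";

echo byte_length($utf8), "\n";                // 6 (bytes)
echo mb_strlen($utf8, 'UTF-8'), "\n";         // 5 (characters)
echo bin2hex(byte_slice($utf8, 1, 2)), "\n";  // c3a9 (the two bytes of "é")
```

Keeping the encoding argument in one place means the explanation only has to be written once, and callers never see the odd-looking encoding name.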

5 Comments

I don't think it's a dirty solution. Sometimes you just need to work with binary data. But I agree you have to be careful with it (see stackoverflow.com/questions/1647419/…). Also, an even better choice for the encoding name to use would be binary or 8bit.
Overriding the behaviour of a well-documented function is always a bad idea. Think of it this way: the function is lying to you, i.e. it does not do what it promises to do. Or here is another one: what would happen if your arrays stopped storing NULL values, silently ignoring them without even generating a key in the array? All because of a configuration value like array.store_null_values = false (I hope no one on the PHP team is reading this, I'm probably giving them bad ideas.)
Is binary a real encoding? I don't see it listed on php.net/manual/en/mbstring.supported-encodings.php, but it seems to work. Do you know what the differences are between binary, 8bit, and ascii?
Looked through the source. binary and 8bit seem to be the same. 7bit includes only 7-bit characters (of course), and ascii includes 0x20-0x80, plus 0, 0x09, 0x0a, and 0x0d.
binary is an alias of 8bit (bugs.php.net/bug.php?id=26699). All those differences shouldn't matter for just getting the string length in any case, except for readability, like soulmerge said.
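
A quick way to check the point about string length on your own install is to compare the encoding names mentioned in these comments. This is a rough sketch, assuming the mbstring extension is loaded and accepts all of these names; the sample string is arbitrary. For a UTF-8 string containing a two-byte character, each single-byte name should report the byte count, while UTF-8 reports the character count.

```php
<?php
// "naïve" written with explicit UTF-8 bytes: 5 characters, 6 bytes.
$data = "na\xC3\xAFve";

$lengths = [];
foreach (['8bit', 'binary', 'latin1', 'ASCII', 'UTF-8'] as $encoding) {
    // Single-byte encodings count one "character" per byte.
    $lengths[$encoding] = mb_strlen($data, $encoding);
    printf("%-7s %d\n", $encoding, $lengths[$encoding]);
}
```

If the single-byte names all agree, the choice between them comes down to which one tells the next reader most clearly that bytes are being counted.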
