I want to encode normal characters to html-entities like

a => &#x61;
A => &#x41;
b => &#x62;
B => &#x42;

but

echo htmlentities("a");

doesn't work. It outputs the normal characters (a A b B) in the HTML source code instead of the HTML entities.

How can I convert them?

1 Answer

You can build a function for this fairly easily using mb_ord or IntlChar::ord, either of which will give you the numeric value for a Unicode Code Point.

You can then convert that to a hexadecimal string using base_convert, and add the '&#x' and ';' around it to give an HTML entity:

function make_entity(string $char) {
    $codePoint = mb_ord($char, 'UTF-8'); // or IntlChar::ord($char); 
    $hex = base_convert($codePoint, 10, 16);
    return '&#x' . $hex . ';';
}
echo make_entity('a');  // &#x61;
echo make_entity('€');  // &#x20ac;
echo make_entity('🐘'); // &#x1f418;

You then need to run that for each code point in your UTF-8 string. It is not enough to loop over the string using something like substr, because PHP's string functions work with individual bytes, and each UTF-8 code point may be multiple bytes.
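
To make the byte-versus-code-point difference concrete, here is a small sketch. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$text = 'a€🐘';

// strlen() counts bytes: 1 (a) + 3 (€) + 4 (🐘)
$byteCount = strlen($text);

// mb_strlen() counts code points when told the encoding is UTF-8
$codePointCount = mb_strlen($text, 'UTF-8');

// substr() cuts at byte offsets, so this is only the first of the three bytes of '€'
$firstByteOfEuro = substr('€', 0, 1);

echo $byteCount, "\n";                 // 8
echo $codePointCount, "\n";            // 3
echo bin2hex($firstByteOfEuro), "\n";  // e2
```

Passing that lone byte to mb_ord would not give you the code point for '€', which is why a byte-wise loop breaks on anything outside ASCII.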

One approach would be to use a regular expression replacement with a pattern of /./u:

  • The . matches each single "character" (any character except a newline, unless you also add the /s modifier)
  • The /u modifier turns on Unicode mode, so that each "character" matched by the . is a whole code point

You can then run the above make_entity function for each match (i.e. each code point) with preg_replace_callback.


Since preg_replace_callback will pass your callback an array of matches, not just a string, you can make an arrow function which takes the array and passes element 0 to the real function:

$callback = fn($matches) => make_entity($matches[0]);

So putting it together, you have this:

echo preg_replace_callback('/./u', fn($m) => make_entity($m[0]), 'a€🐘');
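
If you want this wrapped up as a reusable function, something like the following might do. It is only a sketch: the function names are made up for this example, it uses dechex instead of base_convert (same lowercase hex digits), and it adds the /s modifier so that newlines are encoded too, which you may or may not want.

```php
<?php
// Hypothetical helper names; assumes mbstring is loaded (mb_ord needs PHP 7.2+)
// and that this file is saved as UTF-8.
function code_point_to_entity(string $char): string
{
    // dechex() gives the same lowercase hex digits as base_convert($n, 10, 16)
    return '&#x' . dechex(mb_ord($char, 'UTF-8')) . ';';
}

function encode_all_as_entities(string $text): string
{
    // /u: match whole code points; /s: let . match newlines as well
    $result = preg_replace_callback(
        '/./su',
        function (array $m): string { return code_point_to_entity($m[0]); },
        $text
    );
    // preg_replace_callback() returns null on error, e.g. for invalid UTF-8 input
    if ($result === null) {
        throw new RuntimeException('Encoding failed, PCRE error code ' . preg_last_error());
    }
    return $result;
}

echo encode_all_as_entities("aA\n€🐘"), "\n";
// &#x61;&#x41;&#xa;&#x20ac;&#x1f418;
```

The callback is written as a regular anonymous function here so the sketch also runs on versions before PHP 7.4.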

Arrow functions were introduced in PHP 7.4, so if you're stuck on an older version, you can write the same thing as a regular anonymous function:

echo preg_replace_callback('/./u', function($m) { return make_entity($m[0]); }, 'a€🐘');

Or of course, just a regular named function (or a method on a class or object; see the "callable" page in the manual for the different syntax options):

function make_entity_from_array_item(array $matches) {
    return make_entity($matches[0]);
}
echo preg_replace_callback('/./u', 'make_entity_from_array_item', 'a€🐘');
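
If the code lives inside a class with static methods, as in the comments, one possible layout is sketched below. The class and method names are invented for this example, and it assumes mbstring is loaded and the file is saved as UTF-8.

```php
<?php
// Hypothetical class name; assumes mbstring is loaded and this file is saved as UTF-8.
final class EntityEncoder
{
    public static function makeEntity(string $char): string
    {
        return '&#x' . dechex(mb_ord($char, 'UTF-8')) . ';';
    }

    // Returns null if PCRE reports an error, e.g. for invalid UTF-8 input
    public static function encode(string $text): ?string
    {
        // The anonymous function can use self:: because it is created inside the class
        return preg_replace_callback(
            '/./u',
            function (array $m): string { return self::makeEntity($m[0]); },
            $text
        );
    }
}

echo EntityEncoder::encode('äöüß'), "\n";
// &#xe4;&#xf6;&#xfc;&#xdf;
```

Passing the array callable [EntityEncoder::class, 'makeEntity'] directly would not work here, because preg_replace_callback hands the callback the matches array, not a string.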

4 Comments

I still don't understand the last line. It breaks for $char = "äöüß", if I call the function via: for ($i = 0; $i < strlen($char); $i++) { $result .= self::make_entity(substr($char, $i, 1)); }
@User123456789 Yes, substr() works with a single byte of the string, it doesn't know about UTF-8, where a single code point takes multiple bytes to represent; that's why I suggested preg_replace_callback. See updated answer.
What does "fn($m) => make_entity($m[0])" mean? If I copy the code, it says "syntax error, unexpected '=>' (T_DOUBLE_ARROW)". I also tried "self::make_entity($m[0])" since I copied it into a class and thus changed the function to static. But then it says "undefined variable m". The example in the php documentation says e.g.: "next_year" as function without fn($m) =>
@User123456789 Ah, that's just an "arrow function", a concise way to write simple anonymous functions introduced in PHP 7.4. If you're stuck on an older version, you can just use a different syntax; I've added some more explanation.
