I'm trying to get the UTF-8 bytes (as decimal integers) of a Unicode string. For instance, given:

function unicode_to_utf8_bytes($string) {
}

$text = 'Hello 😀';
$result = unicode_to_utf8_bytes($text);
var_dump($result);

I'd like the output to be:
array(10) {
[0]=>
int(72)
[1]=>
int(101)
[2]=>
int(108)
[3]=>
int(108)
[4]=>
int(111)
[5]=>
int(32)
[6]=>
int(240)
[7]=>
int(159)
[8]=>
int(152)
[9]=>
int(128)
}
The same byte breakdown can be seen here:
http://apps.timwhitlock.info/unicode/inspect?s=Hello+%F0%9F%98%80

I feel I'm close. This is what I've managed so far:
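A minimal sketch of one way to get this result, assuming the string is already UTF-8 encoded (true for a literal in a source file saved as UTF-8, or for one written with the "\u{1F600}" escape that PHP 7 and later support in double-quoted strings). PHP strings are plain byte sequences, so no per-character work is needed: unpack('C*', ...) reads every byte as an unsigned integer.

```php
<?php
// Assumes $string already holds UTF-8 bytes (e.g. a literal in a UTF-8 source file).
function unicode_to_utf8_bytes(string $string): array
{
    // 'C*' reads each byte as an unsigned char (0..255).
    // unpack() returns a 1-indexed array, so array_values() renumbers it from 0.
    return array_values(unpack('C*', $string));
}

// "\u{1F600}" produces the UTF-8 bytes of the emoji regardless of the file's encoding.
$text = "Hello \u{1F600}";
var_dump(unicode_to_utf8_bytes($text));
```

For this input the dump should match the expected array of 10 integers, ending in 240, 159, 152, 128.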
// Returns the code point of the character at $index (or null if it isn't valid UTF-8).
function utf8_char_code_at($str, $index) {
    $char = mb_substr($str, $index, 1, 'UTF-8');
    if (mb_check_encoding($char, 'UTF-8')) {
        $ret = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
        return hexdec(bin2hex($ret));
    }
    else
        return null;
}

function unicode_to_utf8_bytes($str) {
    $result = array();
    for ($i=0; $i<mb_strlen($str, '8bit'); $i++)
        $result[] = utf8_char_code_at($str, $i);
    return $result;
}

$string = 'Hello 😀';
var_dump(unicode_to_utf8_bytes($string));

But this outputs:
array(10) {
[0]=>
int(72)
[1]=>
int(101)
[2]=>
int(108)
[3]=>
int(108)
[4]=>
int(111)
[5]=>
int(32)
[6]=>
int(128512)
[7]=>
int(0)
[8]=>
int(0)
[9]=>
int(0)
}
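Two things go wrong in the attempt above. mb_strlen($str, '8bit') counts bytes (10 here), while mb_substr(..., 'UTF-8') indexes characters (7 here), so for $i = 7 to 9 it returns an empty string and hexdec(bin2hex('')) yields 0. And converting a character to UTF-32BE gives its code point (U+1F600 = 128512), not its UTF-8 bytes. Below is a sketch that keeps the per-character idea, loops over the character count instead, and encodes each code point with the UTF-8 bit layout from RFC 3629. The helper names are hypothetical, and mb_ord assumes PHP 7.2 or later.

```php
<?php
// Hypothetical helper: encode one code point as UTF-8 bytes (RFC 3629 layout).
// Assumes $cp is a valid Unicode scalar value (0..0x10FFFF, not a surrogate).
function code_point_to_utf8_bytes(int $cp): array
{
    if ($cp < 0x80) {
        return [$cp];                                    // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return [0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)]; // 110xxxxx 10xxxxxx
    }
    if ($cp < 0x10000) {
        return [0xE0 | ($cp >> 12),                      // 1110xxxx
                0x80 | (($cp >> 6) & 0x3F),
                0x80 | ($cp & 0x3F)];
    }
    return [0xF0 | ($cp >> 18),                          // 11110xxx
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F)];
}

// Hypothetical helper: walk the string character by character, not byte by byte.
function utf8_bytes_by_character(string $str): array
{
    $result = [];
    $length = mb_strlen($str, 'UTF-8');   // character count (7 for "Hello 😀")
    for ($i = 0; $i < $length; $i++) {
        $cp = mb_ord(mb_substr($str, $i, 1, 'UTF-8'), 'UTF-8');
        array_push($result, ...code_point_to_utf8_bytes($cp));
    }
    return $result;
}

var_dump(utf8_bytes_by_character("Hello \u{1F600}"));
```

Since a PHP string already holds its encoded bytes, reading them directly with unpack('C*', $str) gives the same array with less work; the manual encoder mainly shows where 240, 159, 152, 128 come from (0x1F600 split into 3 + 6 + 6 + 6 bits).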
Any help will be much appreciated!
UTF-8 is one possible representation of Unicode characters; others exist. Therefore a "conversion from Unicode to UTF-8" does not really make sense. So what do you actually mean when you say "Unicode"? What do you mean by "UTF-8 bytes"?
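To illustrate the comment's point: if the input is a Unicode string stored in some other encoding, it has to be converted to UTF-8 before its bytes mean anything as UTF-8. A minimal sketch, assuming the caller knows the source encoding (the function name to_utf8_bytes is hypothetical):

```php
<?php
// Hypothetical helper: convert from a known source encoding to UTF-8, then read the bytes.
function to_utf8_bytes(string $string, string $fromEncoding): array
{
    $utf8 = mb_convert_encoding($string, 'UTF-8', $fromEncoding);
    return array_values(unpack('C*', $utf8));  // 0-indexed list of byte values
}

// "Hi" plus U+1F600 in UTF-16BE: the emoji is the surrogate pair D83D DE00.
$utf16 = "\x00H\x00i\xD8\x3D\xDE\x00";
var_dump(to_utf8_bytes($utf16, 'UTF-16BE'));

// "café" in ISO-8859-1: é is the single byte 0xE9, which becomes C3 A9 in UTF-8.
var_dump(to_utf8_bytes("caf\xE9", 'ISO-8859-1'));
```

The same characters produce different byte sequences in each encoding, which is why "the bytes of a Unicode string" depends on which encoding is meant.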