PHP Unicode to UTF-8 code

Question

I'm trying to get the UTF-8 bytes (as decimal values) of a Unicode string. For instance:

function unicode_to_utf8_bytes($string) {

}

$text = 'Hello 😀';
$result = unicode_to_utf8_bytes($text);

var_dump($result);

array(10) {
  [0]=>
  int(72)
  [1]=>
  int(101)
  [2]=>
  int(108)
  [3]=>
  int(108)
  [4]=>
  int(111)
  [5]=>
  int(32)
  [6]=>
  int(240)
  [7]=>
  int(159)
  [8]=>
  int(152)
  [9]=>
  int(128)
}

An example of the result can be seen here:

http://apps.timwhitlock.info/unicode/inspect?s=Hello+%F0%9F%98%80

I feel I'm close; this is what I have so far:

function utf8_char_code_at($str, $index) {

    $char = mb_substr($str, $index, 1, 'UTF-8');

    if (mb_check_encoding($char, 'UTF-8')) {
        $ret = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
        return hexdec(bin2hex($ret));
    }
    else
        return null;

}

function unicode_to_utf8_bytes($str) { 

    $result = array();

    for ($i=0; $i<mb_strlen($str, '8bit'); $i++)
        $result[] = utf8_char_code_at($str, $i);

    return $result;

}

$string = 'Hello 😀';

var_dump(unicode_to_utf8_bytes($string));

array(10) {
  [0]=>
  int(72)
  [1]=>
  int(101)
  [2]=>
  int(108)
  [3]=>
  int(108)
  [4]=>
  int(111)
  [5]=>
  int(32)
  [6]=>
  int(128512)
  [7]=>
  int(0)
  [8]=>
  int(0)
  [9]=>
  int(0)
}

Any help will be much appreciated!

Sorry, but it is unclear what you are actually trying to do. UTF-8 is one possible representation of Unicode characters; others exist. Therefore a "conversion from Unicode to UTF-8" does not really make sense. So what do you actually mean when you say "unicode"? What do you mean by "UTF-8 bytes"?
– arkascha, Commented Jan 3, 2016 at 18:59
This may be of help. Just call that function in the answer on all characters in your string and it should work.
– segFault, Commented Jan 3, 2016 at 19:03

Community · Accepted Answer · 2017-05-23 10:28:21Z

This gets the results you were looking for:

$string = 'Hello 😀';
var_export(ascii_to_dec($string));

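function ascii_to_dec($str)
{
  // Start with an empty array so an empty string returns array() rather than an undefined variable.
  $dec_array = array();
  // strlen() counts bytes, so the loop visits every byte of the UTF-8 encoded string.
  for ($i = 0, $j = strlen($str); $i < $j; $i++) {
    // Use square brackets: the $str{$i} offset syntax was removed in PHP 8.0.
    $dec_array[] = ord($str[$i]);
  }
  return $dec_array;
}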

Results:

array (
  0 => 72,
  1 => 101,
  2 => 108,
  3 => 108,
  4 => 111,
  5 => 32,
  6 => 240,
  7 => 159,
  8 => 152,
  9 => 128,
)
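The same byte list can probably be produced without an explicit loop. The following is a minimal sketch, not part of the original answer: the function name `utf8_bytes` is hypothetical, and it assumes PHP 7.0 or later and that the input string is already UTF-8 encoded. It relies on `unpack('C*', ...)`, which reads each byte as an unsigned integer.

```php
<?php
// Hypothetical helper; assumes $str already holds UTF-8 encoded bytes.
function utf8_bytes(string $str): array
{
    // Guard the empty string explicitly so the result is always an array.
    if ($str === '') {
        return [];
    }
    // 'C*' unpacks every byte as an unsigned char. unpack() returns a
    // 1-indexed array, so array_values() re-indexes it from 0.
    return array_values(unpack('C*', $str));
}

// "\u{1F600}" is the escape for the emoji, so the file encoding does not matter.
var_export(utf8_bytes("Hello \u{1F600}"));
```

Writing the emoji as an escape sequence sidesteps the question of how the source file itself is encoded.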

edited May 23, 2017 at 10:28

answered Jan 3, 2016 at 19:08

segFault

1 Comment

roeland Over a year ago

You should add a bit of explanation to this one. I think that, assuming your source file is encoded as UTF-8, the string literal already contains UTF-8 encoded bytes. In this context the function would more accurately be named bytes_to_dec.
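To illustrate the point in this comment, here is a small sketch, assuming PHP 7.2 or later with the mbstring extension loaded (the variable names are only for illustration). It contrasts the code point of a character with the bytes of its UTF-8 encoding, which appears to be why the attempt in the question returned 128512 instead of four separate bytes.

```php
<?php
// The emoji from the question, written as an escape (U+1F600).
$char = "\u{1F600}";

// Code point: a single number for the whole character.
// mb_ord() requires PHP 7.2+ and the mbstring extension.
$codePoint = mb_ord($char, 'UTF-8');

// UTF-8 bytes: one number per byte of the encoded string.
// str_split() splits on bytes, and ord() converts each byte to an integer.
$bytes = array_map('ord', str_split($char));

echo $codePoint, "\n";          // 128512
echo implode(',', $bytes), "\n"; // 240,159,152,128
```

Converting to UTF-32BE, as the question's code does, yields the code point, whereas reading the string byte by byte yields the UTF-8 encoding.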
