
In PHP, I want to convert a string which contains non-ASCII characters into a sequence of hexadecimal numbers which represents the UTF-8 encoding of these characters. For instance, given this:

$text = 'ąćę';

I need to produce this:

C4=84=C4=87=C4=99

How do I do that?

  • What are those numbers exactly? What are you ultimately doing with them? You could use json_encode on them but you won't get the values you mentioned. Commented Mar 6, 2015 at 21:41
  • I took out some irrelevant text, clarified your ultimate goal as I understand it, and made it clear that the string involved is just an example. Now, we need some more information from you in order to answer the question. 1: Is this quoted-printable encoding you're going for? 2: There's supposed to be an equals sign before the first C4, yes? 3: What should happen to ASCII characters? (e.g. if the string was 'ącę' instead, should that come out =C4=84=63=C4=99, or =C4=84c=C4=99?) Commented Mar 7, 2015 at 2:25
  • @mkaatman I didn't check, but I am 99% sure that C4 84 C4 87 C4 99 is the hexadecimal representation of each byte in the UTF-8 encoding of the character sequence ąćę (that is, U+0105 U+0107 U+0119). And the =XX notation looks suspiciously like MIME quoted-printable encoding to me. Commented Mar 7, 2015 at 2:27
  • I'm thinking that the 84 should actually be 85; you can URL encode the text to check quickly. Commented Mar 7, 2015 at 3:09

1 Answer


As your question is written, and assuming that your text is properly UTF-8 encoded to start with, this should work:

$text = 'ąćę';
$result = implode('=', str_split(strtoupper(bin2hex($text)), 2));

If your text is not UTF-8, but some other encoding, then you can use

$utf8 = mb_convert_encoding($text, 'UTF-8', $yourEncoding);

to get it into UTF-8, where $yourEncoding is some other character encoding like 'ISO-8859-1'.
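As a rough sketch of that conversion step, assume the input arrives as ISO-8859-2 (an assumption chosen for illustration, since that encoding covers these Polish letters in one byte each). The variable name $latin2 and the byte values are hypothetical sample data, and mb_convert_encoding needs the mbstring extension:

```php
<?php
// Hypothetical input: 'ąćę' as ISO-8859-2 bytes (0xB1, 0xE6, 0xEA).
$latin2 = "\xB1\xE6\xEA";

// Convert to UTF-8 first, so the hex dump shows UTF-8 bytes.
$utf8 = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');

// Same one-liner as above, applied to the converted string.
$result = implode('=', str_split(strtoupper(bin2hex($utf8)), 2));

echo $result, PHP_EOL; // C4=85=C4=87=C4=99
```

Without the conversion, the same one-liner would print B1=E6=EA, the ISO-8859-2 bytes rather than the UTF-8 ones.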

This works because in PHP, strings are just sequences of bytes. So as long as your text is encoded properly to start with, you don't have to do anything special to treat it as bytes. The same code will hex-dump a string in any character encoding without modification; the output is the UTF-8 byte sequence only when the string itself is UTF-8.
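To make the byte-level view concrete, here is the one-liner split into its steps. This is only an illustration: the string is written with explicit byte escapes so the example does not depend on how the source file is saved, and the intermediate variable names are made up for this sketch.

```php
<?php
// 'ąćę' spelled out as its six UTF-8 bytes.
$text = "\xC4\x85\xC4\x87\xC4\x99";

$byteCount = strlen($text);               // strlen counts bytes, not characters: 6
$hex       = strtoupper(bin2hex($text));  // 'C485C487C499'
$pairs     = str_split($hex, 2);          // ['C4', '85', 'C4', '87', 'C4', '99']
$result    = implode('=', $pairs);        // 'C4=85=C4=87=C4=99'

echo $byteCount, ' ', $result, PHP_EOL;   // 6 C4=85=C4=87=C4=99
```

Note that ą (U+0105) comes out as C4 85, which agrees with the comment on the question suggesting that the 84 in the expected output should be 85.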

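Now, if what you actually want is MIME quoted-printable encoding, then that's another story. You could try the built-in function quoted_printable_encode (requires PHP 5.3 or higher), which leaves printable ASCII characters as they are and writes other bytes as =XX.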
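For comparison, a small sketch of what quoted_printable_encode returns for the same text, and for a string that mixes in a plain ASCII letter (the case raised in the comments). The variable names are only illustrative:

```php
<?php
// 'ąćę' as UTF-8 bytes: every byte is above 0x7F, so every byte is encoded.
$allNonAscii = quoted_printable_encode("\xC4\x85\xC4\x87\xC4\x99");

// 'ącę' as UTF-8 bytes: the ASCII letter 'c' is passed through unchanged.
$mixed = quoted_printable_encode("\xC4\x85c\xC4\x99");

echo $allNonAscii, PHP_EOL; // =C4=85=C4=87=C4=99
echo $mixed, PHP_EOL;       // =C4=85c=C4=99
```

Unlike the implode version, this output has an equals sign before the first byte as well, and long inputs are wrapped with soft line breaks as the quoted-printable format requires.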
