
I have a string with Unicode and ASCII characters.

I can use utf8_decode to convert ASCII to Unicode characters, but it also converts the characters that are already Unicode. How can I filter or convert only the ASCII characters to Unicode in a mixed string?

For example:

utf8_decode('á rỉ');
~> á rỉ

3 Answers


Two things. First, ASCII characters are 7-bit, 0x00 to 0x7F. UTF-8 encodes those code points as the same single bytes, so if you have a UTF-8 string, its ASCII characters don't need to be converted at all.
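A minimal sketch of that point, assuming PHP 7+ with the iconv extension (enabled in most builds): converting pure ASCII text to UTF-8 returns exactly the same bytes.

```php
<?php
// Bytes 0x00-0x7F mean the same thing in ASCII and in UTF-8,
// so "converting" pure ASCII text to UTF-8 changes nothing.
$ascii = 'Hello, world';

// iconv($from_encoding, $to_encoding, $string): source first, target second.
$utf8 = iconv('ASCII', 'UTF-8', $ascii);

var_dump($utf8 === $ascii); // bool(true): byte-for-byte identical
```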

Second, your á is 0xE1, which is outside the 7-bit range, so it's not ASCII; as a single byte it is á in ISO Latin 1 (ISO-8859-1). And you can't have two encodings in one string (or you're up shit creek...). So what you need is to convert from ISO Latin 1 to UTF-8.
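A sketch of that conversion, assuming the whole input really is ISO Latin 1 (for example a byte string read from a Latin 1 file or database column) and that the iconv extension is available. It deliberately leaves out ỉ (U+1EC9), which has no Latin 1 encoding; a string holding both a Latin 1 á byte and a UTF-8 ỉ is exactly the mixed-encoding case the answer warns about. mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') is an equivalent call if you use mbstring instead.

```php
<?php
// Assumption: the input is entirely ISO-8859-1 (Latin 1).
$latin1 = "\xE1 r"; // 0xE1 is á in Latin 1; " r" is plain ASCII

// Re-encode every byte from Latin 1 to UTF-8.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo bin2hex($latin1), "\n"; // e12072
echo bin2hex($utf8), "\n";   // c3a12072: á becomes C3 A1, ASCII bytes unchanged
echo $utf8, "\n";            // á r
```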


Comment: How can I convert from ISO Latin 1 to UTF-8?

á is not an ASCII character. If you check an ASCII charset table, ASCII only covers code points 0 to 127, while á is code point 225.
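One way to check this in PHP is a byte-range test. The helper isAscii below is hypothetical (not a built-in); it only uses preg_match and the PHP 7+ "\u{...}" string escape, which stores the character as UTF-8.

```php
<?php
// Hypothetical helper: true if every byte is in the 7-bit ASCII range 0x00-0x7F.
function isAscii(string $s): bool
{
    // Without the /u modifier the pattern matches raw bytes, not characters.
    return preg_match('/[^\x00-\x7F]/', $s) === 0;
}

$plain = 'Hello';
$accented = "\u{E1}"; // á, stored as the UTF-8 bytes C3 A1

var_dump(isAscii($plain));     // bool(true)
var_dump(isAscii($accented));  // bool(false)
echo bin2hex($plain), "\n";    // 48656c6c6f: one byte per character
echo bin2hex($accented), "\n"; // c3a1: two bytes, neither in the ASCII range
```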

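You can also try this. Note that with UTF-8 as both the source and the target encoding, a valid UTF-8 string comes back unchanged, so this is mainly useful for cleaning up invalid byte sequences.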

echo mb_convert_encoding('á rỉ', "UTF-8", "UTF-8");
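A sketch of what that call does and does not do, assuming the mbstring extension is installed. A valid UTF-8 string passes through unchanged. A lone Latin 1 byte 0xE1 is treated as invalid UTF-8, so it is not turned into a UTF-8 á; the exact replacement depends on the mb_substitute_character setting.

```php
<?php
// Requires mbstring. UTF-8 -> UTF-8 conversion of two inputs:
$valid = "\u{E1} r\u{1EC9}";     // "á rỉ" as proper UTF-8
$mixed = "\xE1 r\u{1EC9}";       // á as a lone Latin 1 byte, ỉ as UTF-8

$a = mb_convert_encoding($valid, 'UTF-8', 'UTF-8');
$b = mb_convert_encoding($mixed, 'UTF-8', 'UTF-8');

var_dump($a === $valid);                   // bool(true): valid input is untouched
var_dump(strpos($b, "\u{E1}") === false);  // bool(true): the Latin 1 á was not recovered
```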


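You can use iconv, which takes the source encoding first and the target encoding second; //TRANSLIT is a suffix for the target encoding only. For a Latin 1 string: $string = iconv('ISO-8859-1', 'UTF-8', $string);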
