
Suppose you have a string like this one:

Çë½ÌΪʲôÎÒÒ»½øÐв鶾ʱ¾Í·¢ÏÖϵͳÅÌ¿ÉÓÃ¿Õ ¼ä¾Í¼±¾ç¼õÉÙ£

It has been wrongly encoded. How can I detect that a string like this is, in fact, wrongly encoded? An example of a correctly encoded string would be:

Ciao mamm@ guardà come mi divertò

I thought that there are two major differences between the two:

  • Number of whitespaces / string_length
  • Number of vowels (aeiou) / string_length

Then the code would be something like:

if (number_of_whitespaces / string_length < 0.05    (fewer than 1 in every 20 characters)
    OR number_of_vowels / string_length < 0.2)      (fewer than 1 in every 5 characters)
  return WRONG
else
  return OK
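
The heuristic above can be sketched as follows. This is only an illustration in Python, not a tested PHP solution: the function name `looks_garbled` and its parameters are made up for this example, and the thresholds are the ones guessed in the question.

```python
def looks_garbled(text: str,
                  min_space_ratio: float = 0.05,
                  min_vowel_ratio: float = 0.2) -> bool:
    """Return True when the text has too few whitespaces or too few ASCII vowels."""
    if not text:
        # Nothing to judge; treating an empty string as fine is an assumption.
        return False
    length = len(text)
    # Count whitespace characters (spaces, tabs, newlines).
    spaces = sum(1 for ch in text if ch.isspace())
    # Count plain ASCII vowels only; accented vowels such as "à" are not counted.
    vowels = sum(1 for ch in text.lower() if ch in "aeiou")
    return spaces / length < min_space_ratio or vowels / length < min_vowel_ratio


good = "Ciao mamm@ guardà come mi divertò"
bad = "Çë½Ì¿ÉÓÃ¿Õ ¼ä¾Í¼±¾ç¼õÉÙ£"  # a fragment of the garbled sample

print(looks_garbled(good))
print(looks_garbled(bad))
```

Note that any single word without spaces (for example "rhythm") is also reported as garbled, so the check is only meaningful for sentence-length input in a language that uses ASCII vowels and spaces.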

Do you have a better idea? Maybe there is an already-tested PHP function that fits my case? Thanks!

  • Possible duplicate of "How to check the charset of string?" Commented Dec 29, 2011 at 14:12
  • No, it's a different question ;) Commented Dec 29, 2011 at 14:22
  • What, exactly, counts as "wrongly encoded"? Do you mean that character data is reported (e.g. in an HTTP header, or the character set of a database column) to have one encoding, but should have another? Is the sample string supposed to be the latin1 code points of the characters shown, or the UTF-8 code points that actually appear in the page? Commented Dec 29, 2011 at 16:23

1 Answer

If you know what the encoding should be, use mb_check_encoding. If you don't know what the encoding should be, try mb_detect_encoding, which returns FALSE if no valid encoding is found.
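
For readers who want to see what a validity check of this kind does and does not tell you, here is a rough Python analogue. It is not the PHP `mb_check_encoding` function itself; `is_valid` is a hypothetical helper written for this sketch. It only asks whether the bytes can be decoded in the given encoding.

```python
def is_valid(data: bytes, encoding: str) -> bool:
    """Return True when the bytes decode without error in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


raw = b"\xc7\xeb\xbd\xcc"  # the bytes that display as "Çë½Ì" in latin1

print(is_valid(raw, "utf-8"))    # these bytes are not well-formed UTF-8
print(is_valid(raw, "latin-1"))  # latin1 assigns a character to every byte value
```

Because a single-byte encoding such as latin1 maps every byte to some character, a check like this can confirm that the data is decodable, but not that the decoded text is meaningful.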


2 Comments

No, that won't work. The strings in this case are valid latin1, but they are meaningless. Also, mb_check_encoding doesn't check whether every character is valid; it only checks whether the byte stream is valid ;)
Then you'd need to base it on the frequency of characters in the expected language and/or use pspell.
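
One simple way to read the "frequency of characters in the expected language" suggestion is to measure how many visible characters belong to the alphabet you expect. The sketch below is in Python and makes several assumptions: the expected language is Italian, `ITALIAN_ALPHABET` and `expected_share` are names invented for this example, and a real check would need a tuned cutoff or proper letter-frequency tables.

```python
import string

# Assumed alphabet: ASCII letters plus the common Italian accented vowels.
ITALIAN_ALPHABET = set(string.ascii_lowercase) | set("àèéìòù")


def expected_share(text: str, alphabet: set = ITALIAN_ALPHABET) -> float:
    """Return the fraction of non-whitespace characters found in the alphabet."""
    visible = [ch for ch in text.lower() if not ch.isspace()]
    if not visible:
        # No visible characters; returning 1.0 here is an arbitrary choice.
        return 1.0
    return sum(ch in alphabet for ch in visible) / len(visible)


good = "Ciao mamm@ guardà come mi divertò"
bad = "Çë½Ì¿ÉÓÃ¿Õ ¼ä¾Í¼±¾ç¼õÉÙ£"  # a fragment of the garbled sample

print(round(expected_share(good), 2))
print(round(expected_share(bad), 2))
```

The correct sentence scores close to 1, while the garbled fragment scores far lower because most of its characters are symbols or letters that Italian does not use. Where to draw the line between the two is left open here.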
