
Some time ago I asked how to properly convert HTML encodings and got an answer. I've been using that answer since then, but today I noticed strange behavior from mb_convert_encoding: depending on where the script runs, the output differs:

Browser (tested in Chrome and Firefox):
W/o mb: 42Â sp
mb: 42 sp

PHP CLI:
W/o mb: 31 sp
mb: 31�sp

In the browser, the output with mb_convert_encoding is the correct one; on the command line, the output without mb_convert_encoding is the correct one. Am I missing something? Thanks in advance!
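A minimal sketch of what the bytes look like in each case, assuming (this is a guess based on the "Â " in the browser output) that the character between "42" and "sp" is a no-break space, U+00A0. The variable names are illustrative, and the `\u{...}` escapes need PHP 7 or later.

```php
<?php
// Assumption: the separator is U+00A0 (no-break space).
$utf8   = "42\u{00A0}sp";                                    // what nodeValue holds (UTF-8)
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // what $sp holds

echo bin2hex($utf8), PHP_EOL;   // 3432c2a07370 (U+00A0 is two bytes: C2 A0)
echo bin2hex($latin1), PHP_EOL; // 3432a07370   (U+00A0 is one byte: A0)

// Simulate a browser decoding the UTF-8 bytes as ISO-8859-1:
// C2 becomes "Â" and A0 becomes a no-break space, i.e. "42Â sp".
$browserView = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
var_dump($browserView === "42\u{00C2}\u{00A0}sp"); // bool(true)

// A lone A0 byte is not valid UTF-8, so a UTF-8 terminal shows U+FFFD (the "�").
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```

Read this way, neither output is "wrong": each environment is simply decoding the same bytes with a different charset.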

The code

$str_html = $this->curlHelper->file_get_contents_curl($page);

$dom = new DOMDocument();
$dom->resolveExternals = true;
$dom->substituteEntities = false;
@$dom->loadHTML($str_html); // @ hides the parser warnings typical of real-world HTML

$xpath = new DOMXPath($dom);

(...)

foreach ($table_lines as $line) {
    $tds = $line->childNodes;

    // DOMDocument returns nodeValue as UTF-8; this converts it to single-byte ISO-8859-1
    $sp = mb_convert_encoding($tds->item(0)->nodeValue, 'ISO-8859-1', 'UTF-8');

    echo "W/o mb: " . $tds->item(0)->nodeValue . PHP_EOL;
    echo "mb: " . $sp . PHP_EOL;
}
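The loop above converts `nodeValue` from UTF-8, which relies on DOMDocument handing text back as UTF-8 no matter what charset the page declares. A small self-contained check of that, using a hypothetical HTML string in place of the scraped page (the markup and the `&nbsp;` entity are assumptions, not the real site's content), and `libxml_use_internal_errors()` instead of `@`:

```php
<?php
// Hypothetical stand-in for the scraped page: declared as ISO-8859-1,
// with the space written as the &nbsp; entity.
$html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head>'
      . '<body><table><tr><td>42&nbsp;sp</td></tr></table></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of suppressing with @
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$td = $dom->getElementsByTagName('td')->item(0);
// libxml stores text internally as UTF-8, so nodeValue is UTF-8
// even though the document declared ISO-8859-1.
echo bin2hex($td->nodeValue), PHP_EOL; // 3432c2a07370
```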
  • Did you try utf8_decode? Commented Nov 17, 2013 at 15:25
  • @Vishnu Nope, but I just did. It gives the same output as mb_convert_encoding in both cases. Commented Nov 17, 2013 at 15:31
    Start reading here: kunststube.net/encoding Commented Nov 17, 2013 at 15:39
  • Hi @deceze, congratulations, it's an awesome article. Let's see if I understood it. The website is ISO-8859-1 and, when I call file_get_contents_post, the result is UTF-8. So I'm right to call mb_convert_encoding to convert it to ISO-8859-1. That said, I assume the browser interprets the ISO-8859-1 string correctly but the CLI doesn't (it interprets it as UTF-8). Right? Commented Nov 17, 2013 at 16:42
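Following that reading (the browser decodes the output as a single-byte charset, the terminal as UTF-8), one possible way to make both environments agree is to leave the text as UTF-8 and declare that to the browser, rather than converting per environment. This is a sketch under stated assumptions: the terminal uses UTF-8, nothing else has already sent a conflicting `Content-Type`, and `$value` is a hypothetical stand-in for `$tds->item(0)->nodeValue`.

```php
<?php
// Assumptions: the CLI runs in a UTF-8 terminal, and no other code has
// already sent a Content-Type header with a different charset.
if (PHP_SAPI !== 'cli') {
    // Declare the bytes as UTF-8 so the browser does not fall back
    // to a single-byte charset such as ISO-8859-1/Windows-1252.
    header('Content-Type: text/html; charset=UTF-8');
}

// Hypothetical stand-in for $tds->item(0)->nodeValue (UTF-8 from DOMDocument).
$value = "42\u{00A0}sp";

// No mb_convert_encoding: the bytes already match the declared encoding.
echo "value: " . $value . PHP_EOL;
```

The design choice is to pick one encoding (UTF-8, which is what DOMDocument already produces) end to end and only declare it, so the same bytes display correctly in both places.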