Some time ago I asked how to properly convert HTML encodings and got an answer. I have been using that answer ever since, but today I noticed strange behavior in mb_convert_encoding. Depending on where the script runs, the output differs:
Browser (Tested in Chrome and Firefox):
W/o mb: 42Â sp
mb: 42 sp
PHP CLI:
W/o mb: 31 sp
mb: 31�sp
In the browser, the string passed through mb_convert_encoding is displayed correctly. On the command line it is the other way around: the string without mb_convert_encoding is the correct one. Am I missing something? Thanks in advance!
The code
$str_html = $this->curlHelper->file_get_contents_curl($page);
$dom = new DOMDocument();
$dom->resolveExternals = true;
$dom->substituteEntities = false;
@$dom->loadHTML($str_html);
$xpath = new DOMXPath($dom);
(...)
foreach ($table_lines as $line) {
    $tds = $line->childNodes;
    // Convert the first cell's text from UTF-8 to ISO-8859-1
    $sp = mb_convert_encoding($tds->item(0)->nodeValue, 'ISO-8859-1', 'UTF-8');
    echo "W/o mb: " . $tds->item(0)->nodeValue . PHP_EOL;
    echo "mb: " . $sp . PHP_EOL;
}
ISO-8859-1 and, when I call file_get_contents_post, the result is UTF-8. So, I correctly call mb_convert_encoding to interpret it as ISO-8859-1. That said, I assume that the browser correctly interprets the ISO-8859-1 string but the CLI doesn't (it interprets it as UTF-8). Right?
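To check this assumption at the byte level, here is a minimal sketch. It assumes the character between the number and "sp" is a non-breaking space (U+00A0), which I have not confirmed from the page source. The variable names are only illustrative and the string is hard-coded rather than taken from the DOM.

```php
<?php
// Assumption: the cell text is "42", a non-breaking space (U+00A0), then "sp".
// In UTF-8 the non-breaking space is the two bytes C2 A0.
$utf8 = "42\xC2\xA0sp";

// Same conversion as in the loop above: UTF-8 -> ISO-8859-1.
// In ISO-8859-1 the non-breaking space is the single byte A0.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), PHP_EOL;   // 3432c2a07370
echo bin2hex($latin1), PHP_EOL; // 3432a07370

// What a viewer that assumes ISO-8859-1 makes of the UTF-8 bytes:
// C2 is read as "Â" and A0 as a non-breaking space, i.e. "42Â sp".
$utf8ReadAsLatin1 = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8ReadAsLatin1), PHP_EOL; // 3432c382c2a07370

// What a viewer that assumes UTF-8 makes of the ISO-8859-1 bytes:
// a lone A0 byte is not valid UTF-8, so it is typically shown as "�".
$latin1IsValidUtf8 = mb_check_encoding($latin1, 'UTF-8');
var_dump($latin1IsValidUtf8); // bool(false)
```

If the assumption holds, this would match both outputs: a browser reading the page as ISO-8859-1 shows the extra "Â" for the unconverted string, while a UTF-8 terminal shows "�" for the converted one.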