I'm trying to parse an HTML page, but the encoding is messing up my results. After some research I found a very popular solution using utf8_encode() and utf8_decode(), but it doesn't change anything. My code and its output are below.

Code

$str_html = $this->curlHelper->file_get_contents_curl($page);
$str_html = utf8_encode($str_html);

$dom = new DOMDocument();
$dom->resolveExternals = true;
$dom->substituteEntities = false;
@$dom->loadHTML($str_html);
$xpath = new DomXpath($dom);

(...)
$profile = array();
for ($index = 0; $index < $table_lines->length; $index++) {
    $desc = utf8_decode($table_lines->item($index)->firstChild->nodeValue);
}

Output

Testar é bom

Should be

Testar é bom

What I've tried

  • htmlentities():

    htmlentities($table_lines->item($index)->lastChild->nodeValue, ENT_NOQUOTES, ini_get('ISO-8859-1'), false);

  • htmlspecialchars():

    htmlspecialchars($table_lines->item($index)->lastChild->nodeValue, ENT_NOQUOTES, 'ISO-8859-1', false);

  • Change my file's charset as described here.

Some more information

  • Website encoding: <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

Thanks in advance!

1 Answer

Try using the following without a prior utf8_decode():

mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
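
As a minimal sketch of what that call does at the byte level (assuming the mbstring extension is enabled; the sample string and the variable names are only for illustration and are not from the question's code):

```php
<?php
// "Testar é bom" as UTF-8: é is the two bytes 0xC3 0xA9.
$utf8 = "Testar \xC3\xA9 bom";

// Argument order is (string, to_encoding, from_encoding).
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// In ISO-8859-1, é is the single byte 0xE9, so the result is one byte shorter.
echo bin2hex($utf8), "\n";
echo bin2hex($latin1), "\n";
```

Comparing the two hex dumps is a quick way to check which encoding a string is really in before deciding which conversion to apply.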

Alternatively, drop utf8_decode() and change your page's meta tag to:

<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
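
The reason this can work is that the DOM extension hands back strings in UTF-8, whatever the source page's charset was. A rough sketch, assuming the dom extension is available and that the fetched page declares its charset in a meta tag as in the question (the inline HTML here is a made-up stand-in for the fetched page):

```php
<?php
// Stand-in for the fetched page: ISO-8859-1 bytes (é is 0xE9) plus the meta tag from the question.
$html = '<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" /></head>'
      . "<body><p>Testar \xE9 bom</p></body></html>";

// Collect parser warnings instead of printing them (an alternative to the @ operator).
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html); // no utf8_encode() beforehand; the parser reads the declared charset

// nodeValue is returned as UTF-8, so é is now 0xC3 0xA9.
$text = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo bin2hex($text), "\n";
```

If the page that prints `$text` is itself served as UTF-8, the value can be echoed as-is with no further conversion.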

2 Comments

Thanks, it worked! Just to understand... since my HTML is ISO-8859-1, why is it passed as $to_encoding and UTF-8 as $from_encoding?
@Doon Because the string you're trying to print is in UTF-8 encoding but needs to be in ISO-8859-1 encoding to be properly printed on your ISO-8859-1 encoded website. So naturally, you need to convert from UTF-8 to ISO-8859-1 encoding.
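
To illustrate the comment above, here is a small sketch of where the "Ã©" in the question's output comes from. It simulates an ISO-8859-1 page displaying UTF-8 bytes by treating each byte as its own ISO-8859-1 character (assuming mbstring is enabled; the variable names are hypothetical):

```php
<?php
// UTF-8 bytes for "Testar é bom" (é = 0xC3 0xA9).
$utf8 = "Testar \xC3\xA9 bom";

// Read those same bytes as if they were ISO-8859-1 and re-encode as UTF-8.
// 0xC3 becomes "Ã" and 0xA9 becomes "©", which is what the browser showed.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $misread, "\n"; // view this in a UTF-8 terminal to see the two stray characters
```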
