PHP HTML encoding

Question

I'm trying to parse an HTML page, but the encoding is messing up my results. After some research I found a very popular solution using utf8_encode() and utf8_decode(), but it doesn't change anything. My code and its output are below.

Code

$str_html = $this->curlHelper->file_get_contents_curl($page); // raw page bytes
$str_html = utf8_encode($str_html); // ISO-8859-1 -> UTF-8

$dom = new DOMDocument();
$dom->resolveExternals = true;
$dom->substituteEntities = false;
@$dom->loadHTML($str_html); // @ hides libxml warnings about malformed HTML
$xpath = new DomXpath($dom);

(...)
$profile = array();
for ($index = 0; $index < $table_lines->length; $index++) {
    // nodeValue is UTF-8; convert it back to ISO-8859-1
    $desc = utf8_decode($table_lines->item($index)->firstChild->nodeValue);
}

Output

Testar Ã© bom

Should be

Testar é bom
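A minimal sketch of where "Ã©" comes from: UTF-8 stores "é" as the two bytes 0xC3 0xA9, and a decoder that reads those bytes as ISO-8859-1 shows 0xC3 as "Ã" and 0xA9 as "©". The helper name misread_utf8_as_latin1() is hypothetical, and the sketch assumes the mbstring extension is enabled.

```php
<?php
// Sketch: how "é" turns into "Ã©" when UTF-8 bytes are read as ISO-8859-1.
// misread_utf8_as_latin1() is a hypothetical helper for illustration.

function misread_utf8_as_latin1(string $utf8): string
{
    // Pretend the UTF-8 bytes are ISO-8859-1 and re-encode them to UTF-8,
    // mimicking a decoder that picked the wrong charset.
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

$text = "Testar \xC3\xA9 bom";               // "Testar é bom" in UTF-8
echo bin2hex("\xC3\xA9"), PHP_EOL;           // c3a9: UTF-8 needs two bytes for "é"
echo misread_utf8_as_latin1($text), PHP_EOL; // Testar Ã© bom
```

The byte escapes keep the example independent of the encoding the PHP file itself is saved in.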

What I've tried

htmlentities():

htmlentities($table_lines->item($index)->lastChild->nodeValue, ENT_NOQUOTES, 'ISO-8859-1', false);

htmlspecialchars():

htmlspecialchars($table_lines->item($index)->lastChild->nodeValue, ENT_NOQUOTES, 'ISO-8859-1', false);

Changing my file's charset, as described here.

Some more information

Website encoding: <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

Thanks in advance!

Refer to this link, which gives more options for handling such characters: php.net/manual/en/function.utf8-encode.php – Nitesh morajkar, commented Oct 12, 2013 at 14:18

Cobra_Fast · Accepted Answer · 2013-10-12 14:11:14Z


Try using the following without a prior utf8_decode():

mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
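A minimal sketch of applying this conversion to each nodeValue, under stated assumptions: the sample markup, the '//td' query and the function name extract_cells_latin1() are hypothetical stand-ins for the question's elided (...) code, the raw ISO-8859-1 bytes are handed to loadHTML() without utf8_encode() so libxml decodes them by the page's own meta tag, and mbstring is enabled. mb_convert_encoding() is also a reasonable choice here because utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.

```php
<?php
// Sketch only: parse an ISO-8859-1 page with DOMDocument, then convert each
// UTF-8 nodeValue back to ISO-8859-1 with mb_convert_encoding().
// Hypothetical: the sample markup, the '//td' query and the function name.

function extract_cells_latin1(string $latin1Html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings about messy real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Pass the raw bytes unchanged so libxml decodes them according to
    // the page's charset=ISO-8859-1 meta tag.
    $dom->loadHTML($latin1Html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $cells = [];
    foreach ($xpath->query('//td') as $td) {
        // DOM text is always UTF-8; convert it for an ISO-8859-1 output page.
        $cells[] = mb_convert_encoding($td->nodeValue, 'ISO-8859-1', 'UTF-8');
    }
    return $cells;
}

// "\xE9" is "é" in ISO-8859-1, i.e. the byte the site actually sends.
$html = '<html><head><meta http-equiv="content-type" '
      . 'content="text/html; charset=ISO-8859-1" /></head>'
      . "<body><table><tr><td>Testar \xE9 bom</td></tr></table></body></html>";

foreach (extract_cells_latin1($html) as $cell) {
    echo bin2hex($cell), PHP_EOL; // "é" is the single byte e9 again
}
```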

Alternatively, don't use utf8_decode() and change your website's meta tag to:

<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
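A minimal sketch of this second option, assuming the output page is generated by PHP: once the page declares UTF-8, the UTF-8 strings that DOMDocument returns can be echoed without any conversion. render_utf8_page() is a hypothetical helper, not part of any library.

```php
<?php
// Sketch of the second option: declare the output page as UTF-8 so the
// UTF-8 strings that DOMDocument returns can be echoed without conversion.
// render_utf8_page() is a hypothetical helper for illustration.

function render_utf8_page(string $utf8Text): string
{
    return '<html><head><meta http-equiv="content-type" '
         . 'content="text/html; charset=UTF-8" /></head><body><p>'
         // Escape <, > and & only; the UTF-8 text itself is left untouched.
         . htmlspecialchars($utf8Text, ENT_NOQUOTES, 'UTF-8')
         . "</p></body></html>\n";
}

echo render_utf8_page("Testar \xC3\xA9 bom"); // "Testar é bom" in UTF-8
```

If the page is served over HTTP, a charset in the Content-Type response header (for example header('Content-Type: text/html; charset=UTF-8')) takes precedence over the meta tag, so the two should agree.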



2 Comments

Doon Over a year ago

Thanks, it worked! Just so I understand: since my HTML is ISO-8859-1, why is ISO-8859-1 passed as $to_encoding and UTF-8 as $from_encoding?

Cobra_Fast Over a year ago

@Doon Because the string you're trying to print is in UTF-8 encoding but needs to be in ISO-8859-1 encoding to be properly printed on your ISO-8859-1 encoded website. So naturally, you need to convert from UTF-8 to ISO-8859-1 encoding.
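A minimal sketch of the argument order, assuming mbstring is enabled: mb_convert_encoding() takes the target encoding before the source encoding, and text read from DOMDocument (such as nodeValue) is UTF-8, so UTF-8 goes last.

```php
<?php
// Sketch: mb_convert_encoding($string, $to_encoding, $from_encoding)
// names the TARGET encoding before the SOURCE encoding.
$utf8 = "\xC3\xA9"; // "é" as DOMDocument's nodeValue returns it (UTF-8)

$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo strlen($utf8), ' ', bin2hex($utf8), PHP_EOL;     // 2 c3a9
echo strlen($latin1), ' ', bin2hex($latin1), PHP_EOL; // 1 e9

// Swapping the last two arguments converts in the opposite direction.
$back = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($back === $utf8); // bool(true)
```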
