
I am parsing an HTML page. At some point I get the text inside a div and run it through html_entity_decode before printing it.

The problem is that the page contains characters like this star ★, or other shapes such as ⬛︎, ◄, ◉, etc. I have checked, and these characters are not encoded as entities in the page source; they appear there exactly as you see them here.

The page is using charset="UTF-8"

So, when I use

html_entity_decode($string, ENT_QUOTES, 'UTF-8');

The star, for example, is "decoded" to â˜

$string is being obtained by using

document.getElementById("id-of-div").innerText

I would like to decode them correctly. How do I do that in PHP?

NOTE: I have tried htmlspecialchars_decode($string, ENT_QUOTES); and it produces the same result.

  • 1. Does the star have an equivalent HTML entity? 2. So, what does $string contain? 3. It seems like a character code issue to me. Commented Jan 5, 2014 at 21:32
  • 1. I don't have a clue. 2. in theory all the string contained in a specific div. 3. I am not sure. Commented Jan 5, 2014 at 21:34
  • 1
    "I have checked and these characters are not encoded on the source page ... I would like to decode them correctly." If they're not encoded, how exactly do you expect to decode them? html_entity_decode is purely about converting entities of the form &something; (including numeric values of something) to "real" characters. What you have here looks like a UTF-8 string which you're then echoing in a non-UTF-8 context. Commented Jan 5, 2014 at 21:36
  • 2
    Indeed. Part of the question is really, why are you trying to do this? If you've got some UTF-8 characters you want to print out, why are you doing html_entity_decode at all? Why not just, er, print them out? And can we see an example of the source document and your actual code? Commented Jan 5, 2014 at 21:39
  • 1
    I've just tested html_entity_decode on the characters in your question, and, as expected, it leaves them untouched. How are you creating your output, and how are you looking at it? My guess: html_entity_decode is a red herring, and you're actually outputting untouched UTF-8 characters, but your character encoding is wrong, so they get mangled on display. Commented Jan 5, 2014 at 21:48

1 Answer

5

I've tried to reproduce your issue with this simple bit of PHP:

<?php
  // Make sure our client knows we're sending UTF-8
  header('Content-Type: text/plain; charset=utf-8');
  $string = "The page contains characters like this star ★ or others like shapes like ⬛︎, ◄, ◉, etc. Here are some entities: This is a &quot;test&quot;.";
  echo 'String: ' . $string . "\n";
  echo 'Decoded: ' . html_entity_decode($string, ENT_QUOTES, 'UTF-8');

As expected, the output is:

String: The page contains characters like this star ★ or others like shapes like ⬛︎, ◄, ◉, etc. Here are some entities: This is a &quot;test&quot;.
Decoded: The page contains characters like this star ★ or others like shapes like ⬛︎, ◄, ◉, etc. Here are some entities: This is a "test".

If I change the charset in the header to iso-8859-1, I see this:

String: The page contains characters like this star â˜… or others like shapes like â¬›ï¸Ž, â—„, â—‰, etc. Here are some entities: This is a &quot;test&quot;.
Decoded: The page contains characters like this star â˜… or others like shapes like â¬›ï¸Ž, â—„, â—‰, etc. Here are some entities: This is a "test".

So, I'd say that your issue is a display issue. The "interesting" characters are being left completely untouched by html_entity_decode, as you'd expect. It's just that whatever code you've got, or whatever you're using to look at your output, is incorrectly using iso-8859-1 to display them.
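
If you want to confirm this without depending on how a browser or terminal renders the output, one option is to compare raw bytes. This is only a minimal sketch and was not part of the test above: the variable names are made up for illustration, the star is written as explicit byte escapes so the result does not depend on how the script file is saved, and the last step assumes the mbstring extension is installed.

```php
<?php
// The star (U+2605) as its three UTF-8 bytes, independent of file encoding.
$star  = "\xE2\x98\x85";
// One raw star, one named entity, and one numeric entity for the same star.
$input = $star . ' &quot;test&quot; &#9733;';

$decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

// The raw star keeps the same bytes before and after decoding.
echo bin2hex($star), "\n";                  // e29885
echo bin2hex(substr($decoded, 0, 3)), "\n"; // e29885

// Only the entities were converted.
echo $decoded, "\n";

// Simulate a viewer that wrongly treats the UTF-8 bytes as Windows-1252.
// (Assumption: mbstring is available; this is where the "â..." text comes from.)
$mojibake = mb_convert_encoding($star, 'UTF-8', 'Windows-1252');
echo $mojibake, "\n";
```

If the two hex dumps match, the string itself is fine and the mangling is happening wherever the bytes are being interpreted for display.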


1 Comment

you are right! T H A N K S! I forgot to add header('Content-Type: text/html; charset=utf-8'); to the beginning of the code, so it would force UTF-8 to the output. Thanks!
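
For anyone landing here later, the fix described in this comment could look roughly like the sketch below. The function name render_page and the markup are hypothetical and only for illustration; the header call is the one quoted in the comment, and the meta tag is an optional extra for cases where the page is saved and opened without HTTP headers.

```php
<?php
// Hypothetical helper: wraps text that is already UTF-8 in a small HTML page.
function render_page(string $text): string
{
    // Escape HTML metacharacters; UTF-8 characters such as the star pass through unchanged.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"></head>\n"
         . "<body><p>{$safe}</p></body>\n</html>\n";
}

// The header has to be sent before any output, so check first.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo render_page("Star: \xE2\x98\x85");
```

The important part is that the declared charset matches the bytes actually being sent; no decoding step is needed for characters that were never entities.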
