I am parsing an HTML page. At some point I get the text inside a div and use html_entity_decode to print that text.
The problem is that the page contains characters such as the star ★ and other shapes like ⬛︎, ◄, ◉, etc. I have checked, and these characters are not entity-encoded in the page source; they appear there literally, just as you see them here.
The page is using charset="UTF-8"
So, when I use
html_entity_decode($string, ENT_QUOTES, 'UTF-8');
The star, for example, is "decoded" to â˜
$string is being obtained by using
document.getElementById("id-of-div").innerText
I would like to decode them correctly. How do I do that in PHP?
NOTE: I have tried htmlspecialchars_decode($string, ENT_QUOTES); and it produces the same result.
$string contain?
It seems like a character code issue to me.
html_entity_decode is purely about converting entities of the form &something; (including numeric values of something) to "real" characters. What you have here looks like a UTF-8 string which you're then echoing in a non-UTF-8 context.
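To illustrate the point made in that last comment, here is a minimal sketch. It assumes the output is being served as an HTML response and that the mbstring extension is available. The sample string and the variable names $decoded, $fromEntity and $misread are made up for the demonstration, and Windows-1252 is only a guess at the encoding the output is being read as.

```php
<?php
// Assumption: the output is an HTTP response; declare UTF-8 before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Hypothetical sample input: a literal star, already valid UTF-8 (bytes E2 98 85).
$string = "\u{2605} rated";

// html_entity_decode only rewrites entities such as &amp; or &#9733;,
// so the literal star passes through unchanged.
$decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

// An actual numeric entity, by contrast, is converted to the same star.
$fromEntity = html_entity_decode('&#9733; rated', ENT_QUOTES, 'UTF-8');

// Reading the same bytes as Windows-1252 (a guess at the wrong charset)
// shows how correct UTF-8 turns into garbled text on display.
$misread = mb_convert_encoding($decoded, 'UTF-8', 'Windows-1252');

echo $decoded, "\n";
```

If $decoded is byte-for-byte identical to $string, the decoding step is not at fault, and the thing to check is the charset the browser or terminal is told to use (the Content-Type header, or a meta charset tag on the page that prints the result).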