13

I'm trying to decode a long dash from its numeric entity form into a string, but I can't find a function that does this properly.

The best I have found is mb_decode_numericentity(); however, for some reason it fails to decode the long dash and some other special characters.

$str = '&#x2013;';

$str = mb_decode_numericentity($str, array(0xFF, 0x2FFFF, 0, 0xFFFF), 'ISO-8859-1');

This will return "?".

Does anyone know how to solve this problem?

3 Comments

  • Is the long dash even present in ISO-8859-1? Commented May 4, 2010 at 11:30
  • @ColShrapnel: Indeed not. It's present in Windows cp1252, which is similar to, but not the same as, ISO-8859-1. Better: use UTF-8. Commented May 4, 2010 at 11:42
  • Definitely, there is no long dash in ISO/IEC 8859-1 (Latin-1). It is a Unicode character, and using UTF-8 helped. My mistake was forgetting to change the encoding in the browser. Thanks everyone! Commented May 4, 2010 at 12:21
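
A minimal sketch of the fix described in the comments above: keep mb_decode_numericentity() but target UTF-8, which can represent the en dash (U+2013). The convmap values are an assumption chosen to cover every code point above ASCII; they are not the asker's original values. The mbstring extension is required.

```php
<?php
// Assumed convmap: decode code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$str = '&#8211;'; // decimal numeric entity for the en dash (U+2013)
$decoded = mb_decode_numericentity($str, $convmap, 'UTF-8');

// The result is the three-byte UTF-8 sequence E2 80 93.
echo bin2hex($decoded), "\n"; // e28093
```

The page (or terminal) displaying the result also has to be read as UTF-8, otherwise the three bytes will show up as garbage or "?".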

2 Answers

19

The following code snippet (mostly stolen from here and improved) works for named, decimal numeric, and hexadecimal numeric entities:

header("content-type: text/html; charset=utf-8");

/**
* Decodes all HTML entities, including numeric and hexadecimal ones.
* 
* @param mixed $string
* @return string decoded HTML
*/

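function html_entity_decode_numeric($string, $quote_style = ENT_COMPAT, $charset = "utf-8")
{
    $string = html_entity_decode($string, $quote_style, $charset);
    // Hexadecimal entities such as &#x201D; (the i flag covers both letter cases).
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', "chr_utf8_callback", $string);
    // Decimal entities such as &#8221;. preg_replace()'s /e modifier no longer
    // exists in PHP 7+, so a callback is used here as well.
    $string = preg_replace_callback('~&#([0-9]+);~', function ($matches) {
        return chr_utf8((int) $matches[1]);
    }, $string);
    return $string;
}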

/** 
 * Callback helper 
 */

function chr_utf8_callback($matches)
{
    return chr_utf8(hexdec($matches[1]));
}

/**
* Multi-byte chr(): Will turn a numeric argument into a UTF-8 string.
* 
* @param mixed $num
* @return string
*/

function chr_utf8($num)
{
    // Reject anything that is not a Unicode scalar value (negatives, surrogates, > U+10FFFF).
    if ($num < 0 || $num > 0x10FFFF || ($num >= 0xD800 && $num <= 0xDFFF)) return '';
    if ($num < 128) return chr($num);
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
}


$string ="&#x201D;"; 

echo html_entity_decode_numeric($string);

Improvement suggestions are welcome.
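
To make the bit arithmetic in chr_utf8() concrete, here is a self-contained check of the three-byte case using the en dash (U+2013). The helper name utf8_three_bytes() is hypothetical and exists only for this illustration; mb_chr() is used as a reference and assumes PHP 7.2+ with mbstring.

```php
<?php
// Hypothetical helper: the three-byte branch of chr_utf8() in isolation.
function utf8_three_bytes(int $num): string
{
    return chr(($num >> 12) + 224)          // 1110xxxx: top 4 bits
         . chr((($num >> 6) & 63) + 128)    // 10xxxxxx: middle 6 bits
         . chr(($num & 63) + 128);          // 10xxxxxx: low 6 bits
}

$dash = utf8_three_bytes(0x2013);
echo bin2hex($dash), "\n";                    // e28093
var_dump($dash === mb_chr(0x2013, 'UTF-8'));  // bool(true)
```

On PHP 7.2 or later, mb_chr() could probably replace the hand-written encoder entirely.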


3 Comments

Though &apos; is not a valid HTML 4 entity reference, it often "spills over" from XML documents. To be completely watertight, add: $string = str_ireplace('&apos;', "'", $string);
Another improvement: this code has a terrible memory leak. Each call creates a new lambda function with create_function(), which then stays in memory. The manual page for preg_replace_callback() does suggest a lambda function as a "great idea" for cleaner-looking code, but that is wrong here. There is nothing wrong with defining a simple named function, function chr_utf8_callback($matches) { return chr_utf8(hexdec($matches[1])); }, and using it instead: $string = preg_replace_callback('~&#x([0-9a-fA-F]+);~i', 'chr_utf8_callback', $string); Memory leak gone.
Please revisit preg_replace() and case-insensitive pattern modifiers.
1

mb_decode_numericentity does not handle hexadecimal, only decimal. Do you get the expected result with:

$str = '&#8211;';

$str = mb_decode_numericentity($str, array(255, 3145727, 0, 65535), 'ISO-8859-1');

You can use hexdec to convert your hexadecimal to decimal.
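
A sketch of that idea, assuming the input contains hexadecimal entities such as &#x2013;: rewrite each one as a decimal entity with hexdec(), then decode. The function name hex_entities_to_decimal() and the convmap are illustrative assumptions, and UTF-8 is used as the target here because ISO-8859-1 cannot represent the en dash.

```php
<?php
// Hypothetical helper: rewrite &#xHHHH; entities as decimal &#DDDD; entities.
function hex_entities_to_decimal(string $s): string
{
    return preg_replace_callback(
        '~&#x([0-9a-f]+);~i',
        function (array $m): string {
            return '&#' . hexdec($m[1]) . ';';
        },
        $s
    );
}

$decimal = hex_entities_to_decimal('&#x2013;');
echo $decimal, "\n"; // &#8211;

// Assumed convmap covering all code points above ASCII.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$decoded = mb_decode_numericentity($decimal, $convmap, 'UTF-8');
echo bin2hex($decoded), "\n"; // e28093
```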

Also, out of curiosity, does the following work:

$str = '&#8211;';

$str = html_entity_decode($str);
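
For reference, a minimal sketch of that call with the charset given explicitly. Before PHP 5.4 the default charset of html_entity_decode() was ISO-8859-1, which cannot represent the en dash, so passing 'UTF-8' avoids depending on the default. The variable names are only for illustration.

```php
<?php
// Both decimal and hexadecimal numeric entities are decoded when the
// target charset can represent the character.
$fromDecimal = html_entity_decode('&#8211;', ENT_QUOTES, 'UTF-8');
$fromHex     = html_entity_decode('&#x2013;', ENT_QUOTES, 'UTF-8');

echo bin2hex($fromDecimal), "\n"; // e28093
echo bin2hex($fromHex), "\n";     // e28093
```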

3 Comments

Thanks for a quick reply, but this returns '?' as well.
$str = html_entity_decode($str); was the first thing I tried. It does not work either.
@Yuriy, please confirm or retract your comments on this answer, given that you later commented on the question that the encoding was your own mistake. I think html_entity_decode() is the simplest correct solution.
