htmlentities() double encoding entities in string

Question

I want only the unencoded characters to get converted to html entities, without affecting the entities which are already present. I have a string that has previously encoded entities, e.g.:

gaIUSHIUGhj>&hyphen; hjb&times;jkn.jhuh>hh> &hellip;

When I use htmlentities(), the & at the beginning of entities gets encoded again. This means &hyphen; and other entities have their & encoded to &amp;:

&amp;times;

I tried decoding the complete string, then encoding it again, but it does not seem to work properly. This is the code I tried:

header('Content-Type: text/html; charset=iso-8859-1');
...

$b = 'gaIUSHIUGhj>&hyphen; hjb&times;jkn.jhuh>hh> &hellip;';
$b = html_entity_decode($b, ENT_QUOTES, 'UTF-8');
$b = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $b);
$b = htmlentities($b, ENT_QUOTES, 'UTF-8');

But it does not seem to work the right way. Is there a way to prevent or stop this from happening?

Niet the Dark Absol · Accepted Answer · 2013-03-09 03:52:56Z


Set the optional $double_encode parameter (the fourth argument of htmlentities()) to false. See the documentation for more information.

Your resulting code should look like:

$b = htmlentities($b, ENT_QUOTES, 'UTF-8', false);
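To make the difference concrete, here is a minimal sketch that runs the same input through htmlentities() with and without the fourth argument. The sample string and the variable names ($raw, $double, $single) are made up for illustration; only the htmlentities() call itself comes from the answer above.

```php
<?php
// Hypothetical sample input: raw characters mixed with existing entities.
$raw = 'a > b &times; c &hellip;';

// Default ($double_encode = true): the & of each existing entity is encoded again.
$double = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// $double_encode = false: existing entities are left as they are.
$single = htmlentities($raw, ENT_QUOTES, 'UTF-8', false);

echo $double, "\n"; // a &gt; b &amp;times; c &amp;hellip;
echo $single, "\n"; // a &gt; b &times; c &hellip;
```

As far as I know, which entity names count as "existing" depends on the document-type flag in effect (ENT_HTML401 by default), so it is worth checking the output for less common names such as &hyphen;.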


Jared Farrish · Accepted Answer · 2013-03-09 05:05:54Z

You did good looking at the documentation, but you missed the best part. It can be hard to decipher this sometimes:

//     >    >    >    >    >    >    Scroll    >>>    >    >    >    >    >     Keep going.    >    >    >    >>>>>>  See below.  <<<<<<
string htmlentities ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' [, bool $double_encode = true ]]] )

(Look at the very end.)

I know. Confusing. I usually ignore the signature line and go straight down to the next block (Parameters) for the blurbs on each argument.

So you want to use the $double_encode argument at the end to tell htmlentities() not to re-encode existing entities (and you probably want to stick with UTF-8 unless you have a specific reason not to):

$str = "gaIUSHIUGhj>&hyphen; hjb&times;jkn.jhuh>hh> &hellip;";

// Double-encoded!
echo htmlentities($str, ENT_COMPAT, 'utf-8', true) . "\n";

// Not double-encoded!
echo htmlentities($str, ENT_COMPAT, 'utf-8', false);

https://ignite.io/code/513ab23bec221e4837000000
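If scrolling to the end of the signature is the annoying part, PHP 8.0 and later let you set just that one parameter by name. This is a small sketch that assumes PHP 8.0+ and leaves $flags and $encoding at their defaults; the sample string is hypothetical.

```php
<?php
// Requires PHP 8.0+ (named arguments). $str is a made-up sample.
$str = 'x > y &times; z';

// Skip $flags and $encoding and set only the last parameter by name.
$out = htmlentities($str, double_encode: false);

echo $out, "\n"; // x &gt; y &times; z
```

The positional form shown above works on older versions too, so this is only a readability option.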
