1

I have a string that contains HTML. How can I run htmlentities on the string so that everything except the tags is encoded? For example:

$foo = '<div class="link">Here\'s is a link: "<a href="http://www.example.com">Doors & windows</a>"</div>';

I want to convert it to

$out = '<div class="link">Here\'s is a link: &quot;<a href="http://www.example.com">Doors &amp; windows</a>&quot;</div>';

  • That'd have to be $out = "<div class=\"link\" etc. Anyway, otherwise both of your versions are syntax errors. Commented Jun 20, 2011 at 21:04
  • Can you explain why you need this? I see no sense in doing it. I know of only two cases where htmlentities is needed: (1) when regular text or HTML attribute values must not interfere with the HTML markup itself, and (2) when you need to show special symbols from a charset other than the page's. Commented Jun 20, 2011 at 21:28
  • Well, the answer to your question is simple: I want valid HTML. My string contains HTML which may contain characters that aren't valid. I want the HTML elements to remain intact, but run htmlentities on the data inside those elements. Commented Jun 20, 2011 at 21:33

3 Answers

1

This snippet shows a function that loads some XML (make sure every opened tag has a closing counterpart, otherwise you will see errors) and then applies htmlentities to all text nodes. I have no real clue what you need that for, but maybe it makes you happy:

$foo = '<div class="link">Here\'s is a link: <a href="http://www.example.com">Doors & windows</a></div>';

echo text_htmlentities(utf8_encode($foo));

/**
 * add htmlentities onto the text-nodes of an
 * xml fragment.
 * 
 * @param string $foo xml fragment (utf8)
 * @return string
 */
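function text_htmlentities($foo) {
    // Escape ampersands so the fragment parses as XML.
    $foo = str_replace('&', '&amp;', $foo);
    $dom = new DOMDocument;
    $dom->loadXML($foo);
    $xpath = new DOMXPath($dom);
    // Encode each text node; tags and attributes are left alone.
    foreach ($xpath->query('//text()') as $node) {
        $node->nodeValue = htmlentities($node->nodeValue, ENT_QUOTES, 'UTF-8', false);
    }
    // saveXML() escapes the ampersands of the new entities again; undo that.
    return str_replace('&amp;', '&', $dom->saveXML($dom->firstChild));
}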

Output:

<div class="link">Here&#039;s is a link: <a href="http://www.example.com">Doors &amp; windows</a></div>

3 Comments

I was beginning to wonder whether anybody understood what I wanted to do. This works, but only for XML. Is there a solution for HTML?
@nathanjosiah Which charset does your page use?
@nathanjosiah: Updated the code. The function expects a UTF-8 encoded string. utf8_encode encodes an ISO-8859-1 string to UTF-8.
1

First replace the brackets with another token, call htmlentities, then convert back.

$html = str_replace("<","***OPENBRACKET***",$html);
$html = str_replace(">","***CLOSEBRACKET***",$html);

$html = htmlentities($html);

$html = str_replace("***OPENBRACKET***","<",$html);
$html = str_replace("***CLOSEBRACKET***",">",$html);
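
One thing to watch with the token swap above: htmlentities also encodes the double quotes that sit inside the tags, so class="link" comes back as class=&quot;link&quot;. A variation on the same idea, sketched below, is to split the string into tag and non-tag pieces and encode only the non-tag pieces. The function name encode_outside_tags is made up for this example, and the pattern assumes no ">" appears inside an attribute value, comment, or script block.

```php
<?php
/**
 * Encode everything outside of tags and leave the tags untouched.
 * encode_outside_tags is a hypothetical helper name, not a PHP built-in.
 */
function encode_outside_tags(string $html): string
{
    // The capture group makes preg_split() keep the tags in the result,
    // so the array alternates: text, tag, text, tag, ..., text.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // Even indexes are text between tags; odd indexes are the tags.
        if ($i % 2 === 0) {
            // ENT_COMPAT encodes double quotes but not apostrophes;
            // the final false avoids double-encoding existing entities.
            $parts[$i] = htmlentities($part, ENT_COMPAT, 'UTF-8', false);
        }
    }

    return implode('', $parts);
}

$foo = '<div class="link">Here\'s is a link: "<a href="http://www.example.com">Doors & windows</a>"</div>';
echo encode_outside_tags($foo), "\n";
```

With ENT_COMPAT this should give the $out string from the question; switch to ENT_QUOTES if apostrophes should be encoded as well.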

1 Comment

Then you need to use an HTML parser, and run htmlentities on the text nodes.
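
For what it's worth, here is a rough sketch of that HTML-parser route using DOMDocument::loadHTML instead of loadXML. It reuses the double-encoding trick from the XML answer. The function name html_text_entities and the wrapper markup are assumptions made for this example, the input is assumed to be UTF-8, and the parser may normalise broken markup along the way.

```php
<?php
/**
 * Run htmlentities on the text nodes of an HTML fragment.
 * html_text_entities is a hypothetical helper name, not a PHP built-in.
 */
function html_text_entities(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a full document and declare the charset.
    $dom->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Encode each text node; tags and attributes are left alone.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//text()') as $node) {
        $node->nodeValue = htmlentities($node->nodeValue, ENT_QUOTES, 'UTF-8', false);
    }

    // Serialise only the children of <body>, dropping the wrapper.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    // saveHTML() escapes the ampersands of the new entities again; undo that.
    // Side effect: "&amp;" inside attribute values becomes a bare "&" too.
    return str_replace('&amp;', '&', $out);
}

echo html_text_entities('<p>Doors & "windows"</p>'), "\n";
```

Unlike the bracket-token approach, this never has to guess where a tag starts or ends, at the cost of letting the parser rewrite markup it considers invalid.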
0

Try using the html_entity_decode function.
