I am trying to process an HTML file with PHP as a DOMDocument. The processing works, but when I save the document with $html->saveHTMLFile("file_out.html"); all link tags are converted from:

Click here: <a title="editable" href="http://somewhere.net">somewhere.net</a>

to

Click here: &lt;a title="editable" href="http://somewhere.net"&gt; somewhere.net &lt;/a&gt; 

I process the links in a PHP script; maybe this makes a difference? I cannot convert the &lt; back to < with html_entity_decode() or similar. Is there any other conversion or encoding I can use?

The php script looks like the following:

<?php
$text = $_POST["textareaX"];
$id = $_GET["id"];
$ref = $_GET["ref"];
$html = new DOMDocument(); 
$html->preserveWhiteSpace = true;
$html->formatOutput       = false;
$html->substituteEntities = false;
$html->loadHTMLFile($ref.".html"); 
$elem = $html->getElementById($id); 
$elem->nodeValue = $innerHTML;

if ($text == "")
  { $text = "--- No details. ---"; }
$newtext = "";
$words = explode(" ",$text);
foreach ($words as $word) {
  if (strpos($word, "http://") !== false) {
    $newtext .= "<a alt=\"editable\" href=\"".$word."\">".$word."</a>"; 
    }
  else {$newtext .= $word." ";}
}

$text = $newtext;

function setInnerHTML($DOM, $element, $innerHTML) {
  $node = $DOM->createTextNode($innerHTML);
  $children = $element->childNodes;
  foreach ($children as $child) {
    $element->removeChild($child);
  }
  $element->appendChild($node);
}

setInnerHTML($html, $elem, $text);
$html->saveHTMLFile($ref.".html");
header('Location: '."tracking.php?ref=$ref&user=unLock");
?>

The script takes the file reference and the element id from the "ref" and "id" query parameters, and the input data from the POST field "textareaX". It then opens the file, finds the HTML element by id, and replaces its content (a link) with the input data from the textarea. I enter only the URL in the textarea and the script builds the hyperlink from it. Finally, it puts the result back into the document and overwrites the input file.

When I write the new file though, the link <a href= ...> </a> is converted to &lt;a href=...&gt; &lt;/a&gt;, which is a problem.

  • The same code works for me. I have used $html = new DOMDocument(); $html->loadHTMLFile("file_in.html"); $html->preserveWhiteSpace = true; $html->formatOutput = true; $html->saveHTMLFile("file_out.html"); Commented Jul 11, 2016 at 11:11
  • Hi @Jack! Maybe I did not provide enough information. I edited my question and now include the whole script. The problem persists. Commented Jul 11, 2016 at 15:44

1 Answer

Here is part of your code with the issue identified:

<?php

function setInnerHTML($DOM, $element, $innerHTML) {
  /*********************************
      Well, there's your problem:
  **********************************/
  $node = $DOM->createTextNode($innerHTML);
  $children = $element->childNodes;
  foreach ($children as $child) {
    $element->removeChild($child);
  }
  $element->appendChild($node);
}

?>

What you are doing is passing your new anchor (a) tag as a string and then creating a text node from it. A text node is just that: text, not HTML. Because the node holds plain character data, the < and > characters are escaped as &lt; and &gt; when the document is saved, so the tag appears as visible text in a browser (this is what lets you present HTML as visible code on your page if you choose to).
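
To see where the escaping comes from, here is a minimal sketch. The element id "target" and the sample markup are made up for illustration and are not taken from the question's file. It stores a markup string in a text node, then compares what the node holds with what gets serialized:

```php
<?php
// Minimal sketch: a markup string stored in a text node stays plain text.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p id="target"></p></body></html>');

$markup = '<a href="http://somewhere.net">somewhere.net</a>';
$node = $doc->createTextNode($markup);

$target = $doc->getElementById('target');
$target->appendChild($node);

// The node itself still holds the original characters...
$stored = $node->data;

// ...but no <a> element exists, and serialization escapes "<" and ">".
$anchorCount = $target->getElementsByTagName('a')->length;
$serialized = $doc->saveHTML($target);

echo $stored, "\n", $anchorCount, "\n", $serialized, "\n";
```

The string is never parsed as markup, so there is nothing to decode afterwards: the document simply contains text that happens to look like a tag.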

What you need to do instead is parse the string as markup into a document fragment (not a text node) and then append that fragment:

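<?php

function setInnerHTML($DOM, $element, $innerHTML) {
  // Remove the existing children first. Looping on firstChild avoids
  // skipping nodes, which can happen when removing inside a foreach
  // over the live childNodes list.
  while ($element->firstChild) {
    $element->removeChild($element->firstChild);
  }

  // Parse the string as markup instead of wrapping it in a text node.
  // appendXML() expects well-formed XML and returns false otherwise.
  $f = $DOM->createDocumentFragment();
  if ($f->appendXML($innerHTML)) {
    $element->appendChild($f);
  }
}

?>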
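
An alternative that avoids parsing a markup string at all is to build the link nodes through the DOM API. The sketch below is only one possible shape. The helper name setLinkedText, the element id "note", and the sample input are hypothetical. The title="editable" attribute follows the example link in the question. It may be worth considering because appendXML() requires well-formed XML, so a URL containing a bare & would make it fail.

```php
<?php
// Hypothetical helper: replace $element's content with $text, turning each
// space-separated word that contains "http://" into an <a> element.
function setLinkedText(DOMDocument $dom, DOMElement $element, string $text): void
{
    // Remove existing children; looping on firstChild avoids skipping nodes
    // while the live childNodes list shrinks.
    while ($element->firstChild !== null) {
        $element->removeChild($element->firstChild);
    }

    foreach (explode(' ', $text) as $word) {
        if (strpos($word, 'http://') !== false) {
            $link = $dom->createElement('a');
            $link->setAttribute('title', 'editable');
            $link->setAttribute('href', $word);
            // The link text is a text node, so it is escaped on output.
            $link->appendChild($dom->createTextNode($word));
            $element->appendChild($link);
            $element->appendChild($dom->createTextNode(' '));
        } else {
            $element->appendChild($dom->createTextNode($word . ' '));
        }
    }
}

// Usage with made-up input; the id "note" is an assumption for this sketch.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p id="note">old</p></body></html>');
$target = $doc->getElementById('note');
setLinkedText($doc, $target, 'Click here: http://somewhere.net/?a=1&b=2');
$result = $doc->saveHTML($target);
echo $result, "\n";
```

Since user input only ever ends up in attribute values and text nodes, it cannot introduce extra elements into the saved file. This is a side benefit when the text comes straight from $_POST.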

1 Comment

No problem, glad I could help
