
Example of markup:

```html
<p> a paragraph </p>
<pre lang="html">
  &lt;p&gt; a paragraph &lt;/p&gt;
</pre>
<code lang="html">
  &lt;p&gt; a paragraph &lt;/p&gt;
</code>
```

How can I select everything between <pre>...</pre> and <code>...</code> and run a function on it? The function needs 3 arguments: the selected part of the string (&lt;p&gt; a paragraph &lt;/p&gt;), the container type (pre or code), and the container's attributes (like lang="html").

The function should transform the selected part based on the other 2 arguments (where relevant, I want to run the GeSHi highlighter on it) and then replace that part of the original string, including the container, with the result. Something like:

```html
<p> a paragraph </p>
<div class="html pre">
  &lt;p&gt; a paragraph &lt;/p&gt;
</div>
<div class="html code">
  &lt;p&gt; a paragraph &lt;/p&gt;
</div>
```
  • (related) Best Methods to parse HTML Commented Apr 2, 2011 at 17:37
  • Is this a full HTML page with a root element or only a partial as shown above? Commented Apr 2, 2011 at 17:40
  • No, it's a partial HTML block: basically an article or a comment containing code samples... Commented Apr 2, 2011 at 17:43

1 Answer


I think it should go like this:

```php
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$elements = $xpath->query('//pre | //code');
```
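To see what the query actually returns, here is a self-contained run against the fragment from the question. It is only a sketch: the libxml_use_internal_errors(true) call is an addition (partial HTML often makes libxml emit warnings), and the $found array exists just to show each match's tag name and lang attribute.

```php
<?php
// Sample fragment taken from the question.
$html = <<<'HTML'
<p> a paragraph </p>
<pre lang="html">
  &lt;p&gt; a paragraph &lt;/p&gt;
</pre>
<code lang="html">
  &lt;p&gt; a paragraph &lt;/p&gt;
</code>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The "|" union matches both <pre> and <code> elements anywhere in the tree.
$elements = $xpath->query('//pre | //code');

$found = array();
foreach ($elements as $element) {
    // e.g. "pre html": tag name plus the value of its lang attribute
    $found[] = $element->tagName . ' ' . $element->getAttribute('lang');
}
print_r($found);
```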

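In some cases (e.g. if you use getElementsByTagName instead of XPath) you need to operate on an array to get the proper behaviour (see this question): getElementsByTagName returns a live DOMNodeList, so replacing nodes while you loop over it can make the loop skip elements. Copying the nodes to an array first avoids that. I'll do it for this example: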

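```php
// Copy the nodes into a plain array before modifying the DOM
// (iterator_to_array($elements) does the same thing).
$array = array();
foreach ($elements as $element) {
    $array[] = $element;
}

foreach ($array as $element) {
    $tag = $element->tagName;               // "pre" or "code"
    $content = $element->textContent;       // entities decoded, e.g. "<p> a paragraph </p>"
    $lang = $element->getAttribute('lang'); // "" if the attribute is missing
    $new_content = my_function($tag, $content, $lang);

    $new_element = $dom->createElement('div');
    // "html pre" order, as in the desired output from the question.
    $new_element->setAttribute('class', trim("$lang $tag"));
    // createTextNode() escapes &, < and > properly; assigning to nodeValue can
    // mishandle a literal "&". If my_function returns HTML (e.g. GeSHi output),
    // that HTML must be parsed and imported rather than added as text.
    $new_element->appendChild($dom->createTextNode($new_content));
    $element->parentNode->replaceChild($new_element, $element);
}
```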
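After the loop, the modified markup lives in $dom, not in the original $html string, which is why $new_content on its own only holds the code part. A minimal sketch of getting the processed string back, assuming the input was a fragment: loadHTML() wraps it in a <!DOCTYPE>, <html> and <body>, so $dom->saveHTML() with no argument returns the whole document, while serializing just the children of <body> gives back a fragment. The one-line input below simply stands in for a document the loop has already processed.

```php
<?php
libxml_use_internal_errors(true); // fragments can trigger libxml warnings

$dom = new DOMDocument;
// Stand-in for a document after the replacement loop has run.
$dom->loadHTML('<p> a paragraph </p><div class="html pre">x</div>');

// Serialize only the children of <body>, skipping the wrapper elements
// that loadHTML() added around the fragment.
$body = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child); // the node argument needs PHP >= 5.3.6
}

echo $result, "\n";
```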

Of course, my_function is undefined in the example above; it stands for whatever processing you need (such as running GeSHi). But it should give you a good idea of how to do it.
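One hypothetical shape for my_function, plus a helper for the case where it returns HTML. Both names are illustrative, not part of any library. The GeSHi branch assumes GeSHi's usual constructor-plus-parse_code() usage and assumes a lang-to-GeSHi name mapping (GeSHi's HTML language file is named html4strict, not html; check your GeSHi install). Because GeSHi returns HTML, append_html() parses that string in a scratch document and imports the resulting nodes, so the tags become real elements instead of escaped text.

```php
<?php
// Hypothetical my_function: returns an HTML string for the new <div>.
function my_function($tag, $content, $lang)
{
    // Assumed mapping from the lang attribute to GeSHi language names.
    $map = array('html' => 'html4strict');
    $language = isset($map[$lang]) ? $map[$lang] : $lang;

    if ($language !== '' && class_exists('GeSHi')) {
        $geshi = new GeSHi($content, $language);
        return $geshi->parse_code();
    }
    // Fallback without GeSHi: escape the source so it displays literally.
    return htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: appends an HTML string to $target as real nodes.
function append_html(DOMElement $target, $html)
{
    $scratch = new DOMDocument;
    $previous = libxml_use_internal_errors(true);
    // The meta tag makes libxml read the snippet as UTF-8.
    $scratch->loadHTML('<html><head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $scratch->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return;
    }
    foreach ($body->childNodes as $child) {
        // importNode() copies the node (deeply) into $target's document.
        $target->appendChild($target->ownerDocument->importNode($child, true));
    }
}

$dom = new DOMDocument;
$div = $dom->appendChild($dom->createElement('div'));
append_html($div, my_function('pre', '<p> a paragraph </p>', 'html'));
echo $dom->saveHTML($div), "\n";
```

In the answer's loop, append_html($new_element, $new_content) would then take the place of the createTextNode() line.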

Note that this won't work as expected on nested elements, because textContent drops the tags of child elements. For example:

```html
<pre lang="html">
  <p>some nested element</p>
  &lt;p&gt; a paragraph &lt;/p&gt;
</pre>
```

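If you want to handle nested elements, use a function that returns the element's inner HTML instead of $element->textContent. PHP's DOMElement class has no innerHTML property, so you have to build it yourself, e.g. by serializing each child node with $dom->saveHTML().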
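A minimal sketch of such a helper, assuming PHP 5.3.6 or later (when saveHTML() started accepting a node argument). The name inner_html is hypothetical, and the nested <b> element is used instead of <p> purely for illustration. It contrasts textContent, which loses the child element's tags, with the concatenated serialization of the children, which keeps them and keeps the escaped text escaped.

```php
<?php
// Hypothetical helper: serialize each child node and concatenate the results.
function inner_html(DOMNode $element)
{
    $html = '';
    foreach ($element->childNodes as $child) {
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML('<pre lang="html"><b>bold</b> &lt;p&gt;</pre>');

$pre = $dom->getElementsByTagName('pre')->item(0);
echo $pre->textContent, "\n"; // the <b> tags are gone, "&lt;p&gt;" is decoded
echo inner_html($pre), "\n";  // <b> is kept as markup, "&lt;p&gt;" stays escaped
```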


1 Comment

Thank you. Sorry if this is a stupid question, but how do I get the processed string? :) $new_content only contains the code part.
