-1

How can I search and replace a specific string (text + html tags) in a web page using the native PHP DOM Parser?

For example, search for

<p> <a href="google.com"> Check this site </a> </p>

This string is somewhere inside an HTML tree.

I would like to find it and replace it with another string. For example,

<span class="highligher"><p> <a href="google.com"> Check this site </a> </p></span>

Bear in mind that there is no ID to the <p> or <a> nodes. There can be many of those identical nodes, holding different pieces of text.

I tried str_replace, but it fails with complex HTML markup, so I have now turned to HTML parsers.

EDIT:

The string to be found and replaced might contain a variety of HTML tags, such as divs, headings, bold text, etc. So I am looking for a solution that can construct a regex or a DOM XPath query depending on the contents of the string being searched.

Thanks!

6
  • Aren't you better off using JavaScript and adding an id / class to <p>? Commented Nov 11, 2015 at 10:48
  • Have you tried: simplehtmldom.sourceforge.net Commented Nov 11, 2015 at 10:51
  • I have no control over the HTML document being parsed, so I cannot add any attributes. I read about Simple HTML DOM, however people say it is inferior to the native PHP DOM Parser Commented Nov 11, 2015 at 10:53
  • getElementsByTagName(..), then filter with getAttribute(..) on them? Commented Nov 11, 2015 at 10:56
  • This can return 20+ different <p> elements, how do you identify the right one and replace it ? Commented Nov 11, 2015 at 11:06

2 Answers

4

Is this what you wanted?

<?php
// load
$doc = new DOMDocument();
$doc->loadHTMLFile("filename.html");

// search p elements
$p_elements = $doc->getElementsByTagName('p');

// getElementsByTagName() returns a live list, so take a snapshot
// before the loop below starts moving nodes around
foreach (iterator_to_array($p_elements) as $p_element)
{
    // check the direct children of the p element for "a" nodes
    foreach ($p_element->childNodes as $node) {

        // found an a node - check must be defined better!
        if (strtolower($node->nodeName) === 'a')
        {
            // create the new span element
            $span_element = $doc->createElement('span');
            $span_element->setAttribute('class', 'highlighter');

            // replace the "p" element with the span
            $p_element->parentNode->replaceChild($span_element, $p_element);
            // append the "p" element to the span
            $span_element->appendChild($p_element);

            // wrap each p only once, even if it holds several links
            break;
        }
    }
}

// output
echo '<pre>';
echo htmlentities($doc->saveHTML());
echo '</pre>';

This HTML is the basis for conversion:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title>Your Title Here</title></head><body bgcolor="FFFFFF">
<hr><a href="http://somegreatsite.com">Link Name</a>
is a link to another nifty site
<h1>This is a Header</h1>
<h2>This is a Medium Header</h2>
<p> <a href="amazon.com"> Check this site </a> </p>
Send me mail at <a href="mailto:[email protected]">
[email protected]</a>.
<p> This is a new paragraph!
</p><hr><p> <a href="google.com"> Check this site </a> </p>
</body></html>

The output looks like this; it wraps the elements you mentioned:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title>Your Title Here</title></head><body bgcolor="FFFFFF">
<hr><a href="http://somegreatsite.com">Link Name</a>
is a link to another nifty site
<h1>This is a Header</h1>
<h2>This is a Medium Header</h2>
<span class="highlighter"><p> <a href="amazon.com"> Check this site </a> </p></span>
Send me mail at <a href="mailto:[email protected]">
[email protected]</a>.
<p> This is a new paragraph!
</p><hr><span class="highlighter"><p> <a href="google.com"> Check this site </a> </p></span>
</body></html>

2 Comments

Thanks, however that will not be able to identify the exact node which contains the "check this site" text. It seems it will pick up the first <p> containing an <a> element. There might be 20 other strings that meet these criteria but have different text inside. Additionally, the HTML to be replaced is dynamic. It might contain divs, bold text, header tags, etc.
You can use trim($node->textContent) === 'Check this site' to check for specific content. What do you mean by "the html to be replaced is dynamic"? Can you give more examples? I thought you wanted to wrap a <p> element that has an <a> element inside with the text "check this site" in a <span> element.
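The text check suggested in the comment above can also be pushed into the query itself. The following is a minimal sketch, not the answerer's code: it swaps the manual child-node loop for a DOMXPath query, and the sample markup and variable names are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup, modelled on the example in the question.
$html = '<html><body>'
      . '<p> <a href="amazon.com"> Other link </a> </p>'
      . '<p> <a href="google.com"> Check this site </a> </p>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Select only p elements that have a direct "a" child whose
// whitespace-normalized text is exactly "Check this site".
$query = '//p[a[normalize-space(.) = "Check this site"]]';

// Snapshot the result before changing the tree.
foreach (iterator_to_array($xpath->query($query)) as $p) {
    $span = $doc->createElement('span');
    $span->setAttribute('class', 'highlighter');

    // Put the span where the p was, then move the p into the span.
    $p->parentNode->replaceChild($span, $p);
    $span->appendChild($p);
}

$result = $doc->saveHTML();
echo $result;
```

Only the paragraph whose link text matches is wrapped; other <p> elements that contain a link are left alone.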
0

You could use a regular expression with preg_replace.

 preg_replace("/<\s*p[^>]*>(.*?)<\s*\/\s*p>/", '<span class="highligher"><p>$1</p></span>', '<p><a href="google.com"> Check this site</a></p>');

The fourth parameter of preg_replace ($limit) can be used to restrict the number of replacements.
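As a small sketch of limiting the replacements: in preg_replace the subject is the third argument, the limit is the fourth, and an optional fifth argument receives the number of replacements made. The input string and the "highlighter" class name below are assumptions for illustration.

```php
<?php
// Hypothetical input with two paragraphs that both match the pattern.
$html = '<p><a href="google.com">First</a></p>'
      . '<p><a href="google.com">Second</a></p>';

$pattern     = '/<\s*p[^>]*>(.*?)<\s*\/\s*p>/s';
$replacement = '<span class="highlighter"><p>$1</p></span>';

// Fourth argument: replace at most one match.
// Fifth argument: filled with the number of replacements performed.
$result = preg_replace($pattern, $replacement, $html, 1, $count);

echo $result, "\n";
echo $count, "\n";
```

With a limit of 1, only the first paragraph is wrapped and the second is left unchanged.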

http://php.net/manual/en/function.preg-replace.php
http://www.pagecolumn.com/tool/all_about_html_tags.htm - for more examples on regular expressions for HTML

You will need to edit the regular expression so that it only captures the p tags with the google href.

EDIT

preg_replace("/<\s*\w.*?>\s*<a href\s*=\s*\"?\s*(.*?)(google\.com)\s*\">(.*?)<\/a>\s*<\/\s*\w.*?>/", '<span class="highligher"><p><a href="$1$2">$3</a></p></span>', $string);

4 Comments

Thanks, it seems I will have to use regular expressions. However, the strings being searched and replaced can vary. It might be <div><a href="google.com"> <div style="color:red">Check this site</div></a></div>. So, I am looking for a more universal solution. Maybe a dynamic expression to handle all cases?
Also, does that mean that using the DOM parser for this task is not possible? It must be possible to load some HTML string and search for it in the already parsed file?
I'm not too familiar with the DOM parser, but I think it will be difficult if there's no class or id.
If you downvote, can you at least leave a comment? It's pretty pointless downvoting without an explanation, isn't it?
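On the question raised in the comments, whether the DOM parser can search for an arbitrary HTML string: one possible approach is to parse the search string itself, then compare its serialized form against every element with the same tag name in the target document. The sketch below is only an illustration under stated assumptions. The helper normalize_html() and all variable names are hypothetical, the sample markup is made up, and the whitespace normalization is deliberately crude (it drops all whitespace around tags, which may be too lenient for some documents).

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings out of the output

// Hypothetical helper: collapse whitespace so formatting differences
// between the search string and the document do not block a match.
function normalize_html(string $html): string
{
    $html = preg_replace('/\s+/', ' ', $html);              // collapse runs
    $html = preg_replace('/\s*(<[^>]+>)\s*/', '$1', $html); // trim around tags
    return trim($html);
}

// Hypothetical search string and target document.
$needle   = '<p><a href="google.com">Check <b>this</b> site</a></p>';
$haystack = '<html><body>'
          . '<p> <a href="google.com"> Other text </a> </p>'
          . '<p> <a href="google.com"> Check <b>this</b> site </a> </p>'
          . '</body></html>';

// Parse the search string; its root element is the first child of body.
$needleDoc = new DOMDocument();
$needleDoc->loadHTML(trim($needle));
$needleRoot = $needleDoc->getElementsByTagName('body')->item(0)->firstChild;
$needleHtml = normalize_html($needleDoc->saveHTML($needleRoot));

$doc = new DOMDocument();
$doc->loadHTML($haystack);

// Candidates: every element with the same tag name as the needle root.
// Snapshot the live list because matching nodes are moved below.
$candidates = iterator_to_array($doc->getElementsByTagName($needleRoot->nodeName));

$wrapped = 0;
foreach ($candidates as $node) {
    // Compare the serialized subtree with the serialized search string.
    if (normalize_html($doc->saveHTML($node)) !== $needleHtml) {
        continue;
    }
    $span = $doc->createElement('span');
    $span->setAttribute('class', 'highlighter');
    $node->parentNode->replaceChild($span, $node);
    $span->appendChild($node);
    $wrapped++;
}

$result = $doc->saveHTML();
echo $result;
```

Because both sides go through the same parser and serializer, differences in attribute quoting or tag case in the search string should matter less than with a plain str_replace. Attribute order and significant whitespace are not handled by this sketch.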
