Replacing end div tags using preg_replace_callback function

Question

I am trying to develop a PHP script that replaces all divs in an HTML string with paragraphs except those which have attributes (e.g. <div id="1">). The first thing my script currently does is use a simple str_replace() to replace all occurrences of <div> with <p>, and this leaves behind any div tags with attributes and end div tags (</div>). However, replacing the </div> tags with </p> tags is a bit more problematic.

So far, I have developed a preg_replace_callback function that is designed to convert some </div> tags into </p> tags to match the opening <p> tags, but ignore other </div> tags when they are ending a <div> with attributes. Below is the script that I am using:

<?php
$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
$input2 = str_replace("<div>", "<p>", $input);
$output = preg_replace_callback("/(<div )|(<\/div>)/", 'replacer', $input2);

function replacer($matches){
    static $count = 0;
    $counter=count($matches);
    for($i=0;$i<$counter;$i++){
        if($matches[$i]=="<div "){
            return "<div ";
            $count++;
        } elseif ($matches[$i]=="</div>"){
            $count--;
            if ($count>=0){
                return "</div>";
            } elseif ($count<0){
                return "</p>";
                $count++;
            }
        }
    }
}
echo $output;
?>

The script basically puts all the remaining <div> and </div> tags into an array and then loops through it. A counter variable is incremented when it encounters a <div> tag and decremented when it encounters a </div> within the array. When the counter is less than 0, a </p> tag is returned; otherwise a </div> is returned. The output of the script should be:

<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p>

Instead the output I am getting is:

<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</p></p><p>I am fine.</p>

I have spent hours making as many edits to the script as I can think of, and I keep getting the same output. Can anyone explain to me where I am going wrong or offer an alternative solution?

Any help would be appreciated.
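
For what it is worth, two things seem to go wrong in replacer(): both `$count++` statements sit after a `return`, so they never run, and the <div> tags that str_replace() already turned into <p> are invisible to the counter. Below is a minimal sketch of a single-pass alternative, assuming well-formed markup in which no attribute value contains a `>`. The function name convertPlainDivs and the pattern are illustrative, not part of the original script. It keeps a stack of the closing tags still owed, so each </div> is rewritten to match whatever opened it.

```php
<?php
// Hypothetical helper: turns attribute-less <div> elements into <p> elements.
function convertPlainDivs(string $html): string
{
    $stack = []; // closing tags owed by the currently open elements

    return preg_replace_callback(
        '~<(/?)div\b([^>]*)>~i',
        function (array $m) use (&$stack): string {
            if ($m[1] === '/') {
                // Closing tag: emit whatever the matching opener asked for.
                // An unbalanced </div> (empty stack) is left untouched.
                return array_pop($stack) ?? $m[0];
            }
            if (trim($m[2]) === '') {
                $stack[] = '</p>';   // plain <div> becomes <p>
                return '<p>';
            }
            $stack[] = '</div>';     // div with attributes stays as it is
            return $m[0];
        },
        $html
    );
}

$input = '<div>Hello world!</div><div><div id="1">How <div>are you</div> today?</div></div><div>I am fine.</div>';
echo convertPlainDivs($input), "\n";
```

Run on the input from the question, this should print the string listed above as the desired output. Because the opening and closing tags are handled in the same pass, the separate str_replace() step is no longer needed.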

See this SO favourite off-topic joke page. (It gets nag-posted needlessly everywhere, but for some reason never when it is actually relevant.) Read past the jokes, though it is still mostly incorrect: you can use a regex for such purposes. It just takes some effort and requires a recursive (?R) regex. Doable, but not worth answering individually every time someone asks. It's simpler to use a ready-made solution like phpquery or querypath instead (HTML traversal frontends).
– mario, Commented Jan 7, 2012 at 19:44
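
To illustrate the recursive approach mentioned in the comment, here is a minimal sketch of a (?R) pattern that matches one balanced div element, nested divs included. The pattern and the variable names are illustrative only, and the sketch assumes well-formed markup with no `>` inside attribute values. It only locates the elements; it does not perform the replacement.

```php
<?php
// Illustrative pattern: one complete <div>...</div> element, nesting included.
// Inside the element it accepts plain text, any "<" that does not start a
// div tag, or (via (?R)) another complete div element.
$balancedDiv = '~<div\b[^>]*>(?:[^<]+|<(?!/?div\b)|(?R))*+</div>~i';

$html = '<div>a<div id="1">b</div>c</div><div>d</div>';
preg_match_all($balancedDiv, $html, $matches);

// $matches[0] holds the two top-level div elements of $html.
print_r($matches[0]);
```

Each match is a whole top-level element, so the nested `<div id="1">` is part of the first match rather than a match of its own.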

hakre · Accepted Answer · 2012-01-07 20:35:40Z

1

Next to what mario commented, comparable to phpquery or querypath, you can use the PHP DOMDocument class to search for the <div> elements in question and replace them with <p> elements.

The cornerstones are the DOM (Document Object Model) and XPath:

$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";

$doc = new DOMDocument();
$doc->loadHTML("<div id='body'>{$input}</div>");
$root = $doc->getElementById('body');
$xp = new DOMXPath($doc);

$expression = './/div[not(@id)]';

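// Re-run the query until no matching <div> is left: replacing an outer div
// clones its children, so nested matches from the same result list are stale.
while (($r = $xp->query($expression, $root)) && $r->length) {
    foreach ($r as $div) {
        $new = $doc->createElement('p');
        foreach ($div->childNodes as $child) {
            $new->appendChild($child->cloneNode(true)); // deep copy
        }
        $div->parentNode->replaceChild($new, $div);
    }
}

// Serialize only the children of the wrapper element.
$html = '';
foreach ($root->childNodes as $child) {
    $html .= rtrim($doc->saveHTML($child));
}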

echo $html;

This will give you:

<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p>
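
The expression above protects only divs that carry an id. If, as the question puts it, any attribute should protect a div, the XPath predicate `not(@*)` is one candidate, since `@*` selects every attribute of an element. The following is a minimal sketch under that assumption; the function name and the wrapper id are made up for illustration, and it moves child nodes instead of cloning them, which is a variation on the code above rather than the same method.

```php
<?php
// Hypothetical helper: converts every <div> without any attribute into a <p>.
function divsWithoutAttributesToParagraphs(string $html): string
{
    $doc = new DOMDocument();
    // Wrapper element with a known id so the fragment can be found again.
    $doc->loadHTML("<div id='wrapper'>{$html}</div>");
    $root = $doc->getElementById('wrapper');
    $xp = new DOMXPath($doc);

    // Re-query after each replacement and always take the first remaining
    // match, so the loop ends once no attribute-less div is left.
    while (($divs = $xp->query('.//div[not(@*)]', $root)) && $divs->length) {
        $div = $divs->item(0);
        $p = $doc->createElement('p');
        while ($div->firstChild) {
            $p->appendChild($div->firstChild); // moves the child into <p>
        }
        $div->parentNode->replaceChild($p, $div);
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= rtrim($doc->saveHTML($child));
    }
    return $out;
}

echo divsWithoutAttributesToParagraphs('<div>a <div class="x">b <div>c</div></div></div>'), "\n";
```

With this predicate no ids need to be known in advance: a div is left alone as soon as it has any attribute at all, whether that is an id, a class, or something else.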

edited Jan 7, 2012 at 20:35

answered Jan 7, 2012 at 20:20

hakre


3 Comments

siberiantiger Over a year ago

I have run the code you suggested, and it works great for the code I have displayed. My only issue is that there could be multiple divs, each with a different id. Furthermore, there is no way of predicting how many divs there are or what ids they may have. I have tried editing the code you suggested to meet my needs, but without success. Still, thank you very much for answering my question.

hakre Over a year ago

@siberiantiger: That can be controlled with the XPath expression, which is even easier. I'll update the answer.

hakre Over a year ago

@siberiantiger: Please select the answer as it helped you, see meta.stackexchange.com/questions/5234/… - This is how this site works. Thanks!

v01pe · Accepted Answer · 2012-01-07 23:41:03Z

1

I took a different approach with multiple regular expressions:

$text = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div><div>an other <div id=\"2\">small</div>test</div><div>nested<div>divs</div>...</div>";
echo "before: " . $text . "\n";

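do
{
    // Step 1: turn innermost plain <div> elements into <p> elements.
    $count1 = 0;
    $text = preg_replace("/<div>((?![^<]*?<div).*?)<\/div>/", "<p>$1</p>", $text, -1, $count1);
    // Step 2: park innermost divs with attributes under a temporary tag name.
    // $count2 receives the number of replacements made in this step.
    $count2 = 0;
    $text = preg_replace("/<div ([^>]+)>((?![^<]*?<div).*?)<\/div>/", "<temporarytag $1>$2</temporarytag>", $text, -1, $count2);
} while ($count1 + $count2 > 0); // repeat until neither step changes anything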

$text = preg_replace("/(<[\/]?)temporarytag/", "$1div", $text);

echo "after: " . $text;

This will get you:

    before: <div>Hello world!</div><div><div id="1">How <div>are you</div> today?</div></div><div>I am fine.</div><div>an other <div id="2">small</div>test</div><div>nested<div>divs</div>...</div>
    after: <p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p><p>an other <div id="2">small</div>test</p><p>nested<p>divs</p>...</p>

Even if you don't need the snippet, at least I have learned something about regexps myself :P

answered Jan 7, 2012 at 23:41

v01pe


1 Comment

siberiantiger Over a year ago

Thanks very much. I too have learned something about regexp's.
