
I am trying to develop a PHP script that replaces every div in an HTML string with a paragraph, except divs that have attributes (e.g. <div id="1">). The script first uses a simple str_replace() to replace all occurrences of <div> with <p>, which leaves behind the div tags with attributes and all closing tags (</div>). Replacing the right </div> tags with </p> tags is more problematic.

So far, I have written a preg_replace_callback callback that is meant to convert a </div> tag into </p> when it matches an opening <p> tag, and to leave it alone when it closes a <div> with attributes. Below is the script that I am using:

<?php
$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
$input2 = str_replace("<div>", "<p>", $input);
$output = preg_replace_callback("/(<div )|(<\/div>)/", 'replacer', $input2);

function replacer($matches){
    static $count = 0;
    $counter=count($matches);
    for($i=0;$i<$counter;$i++){
        if($matches[$i]=="<div "){
            return "<div ";
            $count++;
        } elseif ($matches[$i]=="</div>"){
            $count--;
            if ($count>=0){
                return "</div>";
            } elseif ($count<0){
                return "</p>";
                $count++;
            }
        }
    }
}
echo $output;
?>

The script passes each remaining <div and </div> tag to the callback in turn. A counter variable is incremented when the callback sees a <div tag and decremented when it sees a </div>. When the counter is less than 0, a </p> tag is returned; otherwise a </div> is returned. The output of the script should be:

<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p>

Instead, the output I am getting is:

<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</p></p><p>I am fine.</p>

I have spent hours making as many edits to the script as I can think of, and I keep getting the same output. Can anyone explain to me where I am going wrong or offer an alternative solution?

Any help would be appreciated.
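
A note on the script above, offered as a sketch rather than a definitive fix: the `$count++` statements sit after `return`, so they never run, and every `</div>` ends up as `</p>`. Even with the increments moved before the returns, a single counter cannot tell whether a given `</div>` closes a converted `<p>` or an attributed `<div>`. One way to keep that information is a stack with one entry per opening tag. The function name `convertPlainDivs` is hypothetical, and the pattern assumes well-formed, balanced tags with no `>` inside attribute values.

```php
<?php
// Hypothetical helper: convert attribute-less <div> elements to <p>,
// using a stack so that each </div> knows which opening tag it closes.
function convertPlainDivs(string $html): string
{
    $stack = []; // one boolean per open div: true = has attributes

    $result = preg_replace_callback(
        '~<div(\s[^>]*)?>|</div>~i',
        function (array $m) use (&$stack) {
            if ($m[0][1] !== '/') {
                // Opening tag: remember whether it carries attributes.
                $hasAttributes = isset($m[1]) && trim($m[1]) !== '';
                $stack[] = $hasAttributes;
                return $hasAttributes ? $m[0] : '<p>';
            }
            // Closing tag: pop the matching opener (null if unbalanced).
            $hasAttributes = array_pop($stack);
            return $hasAttributes === false ? '</p>' : '</div>';
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";
echo convertPlainDivs($input);
```

Because the callback sees the original string, the initial str_replace() step is not needed in this variant. For the sample input it produces the output the question expects.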

  • See this SO favourite off-topic joke page. (That gets nag-posted needlessly everywhere, but for some reason never when it is actually relevant.) Read past the jokes, though it is still mostly incorrect. You can use a regex for such purposes. It just takes some effort and requires a recursing (?R) regex. Doable, but not worth answering individually every time someone asks. It's simpler if you just use a ready-made solution like phpquery or querypath instead (HTML traversal frontends). Commented Jan 7, 2012 at 19:44
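
The recursing regex mentioned in this comment could look roughly like the following sketch. The function name `replaceDivsRecursive` is hypothetical. The pattern assumes balanced tags, no `>` inside attribute values, and input short enough to stay within PCRE's backtracking and recursion limits; if PCRE reports an error, the input is returned unchanged.

```php
<?php
// Hypothetical helper: match one balanced <div>...</div> at a time with (?R),
// then process the captured inner HTML with a recursive PHP call.
function replaceDivsRecursive(string $html): string
{
    // Group 1: optional attribute text. Group 2: inner HTML, made of
    // characters that do not start a div tag, or a nested balanced div.
    $pattern = '~<div(\s[^>]*)?>((?:(?!</?div\b).|(?R))*)</div>~is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) {
            $inner = replaceDivsRecursive($m[2]);
            // Keep the div when it has attributes; otherwise emit a paragraph.
            return trim($m[1]) === ''
                ? "<p>{$inner}</p>"
                : "<div{$m[1]}>{$inner}</div>";
        },
        $html
    );

    return $result ?? $html;
}
```

The regex recursion only finds the matching closing tag; the conversion of nested divs happens in the PHP-level recursion on the captured inner HTML.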

2 Answers

In addition to what mario commented, and comparable to phpquery or querypath, you can use the PHP DOMDocument class to search for the <div> elements in question and replace them with <p> elements.

The cornerstones are the DOM (Document Object Model) and XPath:

$input = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div>";

$doc = new DOMDocument();
$doc->loadHTML("<div id='body'>{$input}</div>");
$root = $doc->getElementById('body');
$xp = new DOMXPath($doc);

$expression = './/div[not(@id)]';

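// Re-run the query until no matching div is left (nested divs need extra passes).
while (($r = $xp->query($expression, $root)) && $r->length) {
    foreach ($r as $div) {
        // Build a <p> holding deep copies of the div's children.
        $new = $doc->createElement('p');
        foreach ($div->childNodes as $child) {
            $new->appendChild($child->cloneNode(true));
        }
        $div->parentNode->replaceChild($new, $div);
    }
}

// Serialize only the children of the wrapper element.
$html = '';
foreach ($root->childNodes as $child) {
    $html .= rtrim($doc->saveHTML($child));
}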

echo $html;

This will give you:

<p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p>
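
The expression above only spares divs that carry an `id`. Since the question asks to keep any div that has attributes, one possible variation is the XPath test `not(@*)`, which is true only for elements with no attributes at all. The sketch below wraps the same approach in a function; the name `divsWithoutAttributesToParagraphs` is hypothetical, and the libxml error suppression is an added assumption, not part of the original answer.

```php
<?php
// Hypothetical helper built on the answer's approach, with a broader XPath test.
function divsWithoutAttributesToParagraphs(string $input): string
{
    // Assumption: keep libxml parser warnings out of the output.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The wrapper div with id 'body' is the same trick used in the answer.
    $doc->loadHTML("<div id='body'>{$input}</div>");
    libxml_clear_errors();

    $root = $doc->getElementById('body');
    $xp = new DOMXPath($doc);

    // not(@*) selects div elements that have no attributes of any kind.
    $expression = './/div[not(@*)]';

    while (($r = $xp->query($expression, $root)) && $r->length) {
        foreach ($r as $div) {
            $new = $doc->createElement('p');
            foreach ($div->childNodes as $child) {
                $new->appendChild($child->cloneNode(true));
            }
            $div->parentNode->replaceChild($new, $div);
        }
    }

    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= rtrim($doc->saveHTML($child));
    }
    return $html;
}

echo divsWithoutAttributesToParagraphs('<div class="x">a</div><div>b</div>');
```

With this test, a div with a `class`, `style`, or any other attribute is left alone, so the ids do not need to be known in advance.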

3 Comments

I have run the code you suggested, and it works great for the code I have displayed. My only issue is that there could be multiple divs, each with a different id. Furthermore, there is no way of predicting how many divs there are or what ids they may have. I have tried editing the code you suggested to meet my needs, but without success. Still, thank you very much for answering my question.
@siberiantiger: That can be controlled with the XPath expression, which is even easier. I'll update the answer.
@siberiantiger: Please select the answer as it helped you, see meta.stackexchange.com/questions/5234/… - This is how this site works. Thanks!

I took a different approach with multiple regular expressions:

$text = "<div>Hello world!</div><div><div id=\"1\">How <div>are you</div> today?</div></div><div>I am fine.</div><div>an other <div id=\"2\">small</div>test</div><div>nested<div>divs</div>...</div>";
echo "before: " . $text . "\n";

do
{
    $count1 = 0;
    $text = preg_replace("/<div>((?![^<]*?<div).*?)<\/div>/", "<p>$1</p>", $text, -1, $count1);
    $count2 = 0;
    $text = preg_replace("/<div ([^>]+)>((?![^<]*?<div).*?)<\/div>/", "<temporarytag $1>$2</temporarytag>", $text, -1, $count2);
} while ($count1 + $count2 > 0);

$text = preg_replace("/(<[\/]?)temporarytag/", "$1div", $text);

echo "after: " . $text;

This will get you:

    before: <div>Hello world!</div><div><div id="1">How <div>are you</div> today?</div></div><div>I am fine.</div><div>an other <div id="2">small</div>test</div><div>nested<div>divs</div>...</div>
    after: <p>Hello world!</p><p><div id="1">How <p>are you</p> today?</div></p><p>I am fine.</p><p>an other <div id="2">small</div>test</p><p>nested<p>divs</p>...</p>

Even if you don't need the snippet, at least I have learned something about regexps myself :P

1 Comment

Thanks very much. I too have learned something about regexp's.
