I need to find a way to replace all the <p> within all the <blockquote> before the <hr />.

Here's a sample html:

<p>2012/01/03</p>
<blockquote>
    <h4>File name</h4>
    <p>Good Game</p>
</blockquote>
<blockquote><p>Laurie Ipsumam</p></blockquote>
<h4>Some title</h4>
<hr />
<p>Lorem Ipsum</p>
<blockquote><p>Laurel Ipsucandescent</p></blockquote>

Here's what I got:

    // Split once at the first <hr; only $pieces[0] needs to change.
    $pieces = explode("<hr", $theHTML, 2);
    // $blocks is the number of <blockquote> matches; $blockmatch[1] holds their inner HTML.
    $blocks = preg_match_all('/<blockquote>(.*?)<\/blockquote>/s', $pieces[0], $blockmatch);

    if ($blocks) {
        $t1 = $blockmatch[1];
        for ($j = 0; $j < $blocks; $j++) {
            // str_replace() already replaces every <p> in the block, so no inner loop is needed.
            // Inside a single-quoted string the double quotes must not be escaped.
            $t1[$j] = str_replace('<p>', '<p class="whatever">', $t1[$j]);
        }
    }

I think I'm really close, but I don't know how to put back together the html that I just pieced out and modified.
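
One way to avoid stitching the pieces back together by hand is to let preg_replace_callback() rewrite each blockquote in place and then implode() the two halves again. This is only a minimal sketch: the function name addClassBeforeHr and the $class parameter are made up for illustration, and it assumes the blockquotes are written exactly as `<blockquote>` with no attributes and are not nested.

```php
<?php
// Hypothetical helper (not from the original post): adds a class to every
// plain <p> inside each <blockquote> that appears before the first "<hr".
function addClassBeforeHr(string $html, string $class = 'whatever'): string
{
    // Split once at the first "<hr"; everything after it stays untouched.
    $pieces = explode('<hr', $html, 2);

    // The callback's return value replaces the whole match, so the modified
    // blockquote lands back in its original position automatically.
    $pieces[0] = preg_replace_callback(
        '/<blockquote>(.*?)<\/blockquote>/s',
        function (array $m) use ($class): string {
            $inner = str_replace('<p>', '<p class="' . $class . '">', $m[1]);
            return '<blockquote>' . $inner . '</blockquote>';
        },
        $pieces[0]
    );

    // implode() restores the delimiter that explode() consumed.
    return implode('<hr', $pieces);
}
```

If the input contains no `<hr` at all, explode() returns a single piece, so every blockquote is processed and the string is returned without an added delimiter.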

2 Answers

1

You could try SimpleXML or, better, DOMDocument (http://www.php.net/manual/en/class.domdocument.php). Once the markup is valid HTML, you can use it to find the nodes you are looking for and replace them. XPath (http://w3schools.com/xpath/xpath_syntax.asp) is a convenient way to select those nodes.
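
As a rough illustration of that suggestion, the sketch below loads the fragment into DOMDocument and uses DOMXPath to select every `<p>` inside a `<blockquote>` that has no `<hr>` before it. The function name addClassWithDom, the wrapper id fragment-root, and the class name are assumptions made for the example. The wrapper `<div>` exists only so the fragment can be serialized back without the `<html>` and `<body>` elements that loadHTML() adds.

```php
<?php
// Hypothetical helper: DOM-based variant of "add a class to each <p> in a
// <blockquote> before the first <hr>".
function addClassWithDom(string $html, string $class = 'whatever'): string
{
    $doc = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint asks libxml to treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Blockquotes with no <hr> earlier in document order, then any <p> below them.
    foreach ($xpath->query('//blockquote[not(preceding::hr)]//p') as $p) {
        $p->setAttribute('class', $class);
    }

    // Serialize only the children of the wrapper, dropping the wrapper itself.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Unlike the regex approach, this handles several `<p>` elements per blockquote and blockquotes with attributes. The serializer may normalize the markup, for example by writing `<hr />` back as `<hr>`, so the result is not guaranteed to be byte-identical to the input outside the modified tags.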

Edit 1:

Take a look at the answer to this question:

RegEx match open tags except XHTML self-contained tags

1 Comment

Well, what I'm trying to do is correct thousands of entries in a MySQL database/Drupal that all start with this same pattern. My plan was to use PHP to fetch all the entries and rewrite the tags: first strip all the <h4> tags and inline styling, then add a class to every <p> in the blockquotes, and finally remove the blockquotes. I made it work with the code below by adding a while(preg_match), but there is still the case of a <blockquote> with no <p> in it. That only happens in a couple hundred entries, but it still happens. I'll take a look at your suggestions and hopefully find something.
0
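// Split only at the first <hr so the text after it stays in one piece.
$string = explode('<hr', $string, 2);
// Add the class to the first <p> of each <blockquote> before the <hr.
$string[0] = preg_replace('/<blockquote>(.*)<p>(.*)<\/p>(.*)<\/blockquote>/sU', '<blockquote>\1<p class="whatever">\2</p>\3</blockquote>', $string[0]);
$string = $string[0] . '<hr' . $string[1];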

output:

<p>2012/01/03</p>
<blockquote>
    <h4>File name</h4>
    <p class="whatever">Good Game</p>
</blockquote>
<blockquote><p class="whatever">Laurie Ipsumam</p></blockquote>
<h4>Some title</h4>
<hr />
<p>Lorem Ipsum</p>
<blockquote><p>Laurel Ipsucandescent</p></blockquote>

6 Comments

blast, just noticed that didn't get the first <p> tag.
You do have an ugly regex. Maybe it is not such a good idea to teach people that regex can be used to parse HTML, even if it might work in this particular case.
Yeah. It won't work if there's more than one <p> tag in a <blockquote>. The complexity grows too quickly.
I had made it work with a while(preg_match) that just repeats the code you gave, but now I need to find a way to add a <p class="whatever"> to any <blockquote> that has no <p> inside. I'm thinking regex wasn't made for all this trouble, but I'm not exactly sure what type of solution I'm looking for.
I'm not sure what you mean by "add a <p class="whatever"> to any <blockquote> that has no <p> inside". Do you mean that <blockquote>something</blockquote> should become <blockquote><p class="whatever">something</p></blockquote>?
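
If that reading is the intended one (the thread does not confirm it), a sketch along these lines would wrap the content of any blockquote that contains no `<p>` at all. The function name wrapBareBlockquotes and the $class parameter are hypothetical, and the same assumptions apply as for the regex answer above: blockquotes are written as plain `<blockquote>` and are not nested.

```php
<?php
// Hypothetical helper: wraps the content of <blockquote> elements that have
// no <p> inside, but only in the part of the string before the first "<hr".
function wrapBareBlockquotes(string $html, string $class = 'whatever'): string
{
    $pieces = explode('<hr', $html, 2);

    $pieces[0] = preg_replace_callback(
        '/<blockquote>(.*?)<\/blockquote>/s',
        function (array $m) use ($class): string {
            // "<p" followed by whitespace or ">" means a paragraph already
            // exists (this avoids mistaking <pre> for <p>); leave it alone.
            if (preg_match('/<p[\s>]/i', $m[1])) {
                return $m[0];
            }
            return '<blockquote><p class="' . $class . '">'
                . trim($m[1]) . '</p></blockquote>';
        },
        $pieces[0]
    );

    return implode('<hr', $pieces);
}
```

Blockquotes that already contain a `<p>` are returned unchanged here, so this step would run alongside whatever code adds the class to existing paragraphs.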