I have this:

$text = 'text text text s html tagove
<div id="content">ss adsda sdsa </div>
oshte text s html tagove';
$content = preg_replace('/(<div\sid=\"content\">)[^<]+(<\/div>)/i', '', $text);
var_dump($content); 

But if the <div id="content"></div> contains other tags, such as <b>, <i>, etc., it does not work, because [^<]+ cannot match the < that opens a nested tag.

For example:

$text = 'text text text s html tagove
<div id="content"><b> stfu </b> ss adsda sdsa </div>
oshte text s html tagove';
  • What text do you want to remove if there are multiple tags? Commented Mar 9, 2012 at 22:00
  • Don't parse HTML with regex. Use one of the parsers you have in PHP. Commented Mar 10, 2012 at 1:52
  • I would not use STFU as an illustration of your need. It is a bad word. Commented Mar 19, 2013 at 14:35
  • @MarcelloGrechiLins - I'm sure the Southern Tenant Farmers' Union might think differently! ;-) Commented Mar 21, 2013 at 11:16

2 Answers

You can use lazy quantifiers instead.

$s="foo<div>Some content is <b>bold</b>.</div>bar\n";

print preg_replace("/<div>.+?<\/div>/i", "", $s);

output:

foobar

UPDATE per comments:

[ghoti@pc ~]$ cat doit.php 
<?php

$text = 'text text text s html tagove
<div id="content"><b> stfu </b> ss adsda sdsa </div>
oshte text s html tagove';

print preg_replace('/<div id="content">.+?<\/div>/im', '', $text) .  "\n";

[ghoti@pc ~]$ php doit.php 
text text text s html tagove

oshte text s html tagove
[ghoti@pc ~]$ 
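
One caveat worth illustrating: in the example above the whole div sits on a single line, so "." never has to cross a newline. The m modifier only changes how ^ and $ behave; it is the s modifier that lets "." match newlines. The sketch below uses a made-up input in which the div content spans two lines. It is an illustration under that assumption, not a general HTML solution, and nested divs would still defeat it.

```php
<?php
// Hypothetical input: the div's content spans two lines.
$text = "before\n<div id=\"content\"><b>bold</b>\nsecond line</div>\nafter";

// Without the s modifier, "." stops at the newline, so nothing matches
// and the string comes back unchanged.
$withoutS = preg_replace('/<div id="content">.+?<\/div>/i', '', $text);

// With the s modifier, "." also matches newlines, so the lazy match
// runs from the opening tag to the first closing </div>.
$withS = preg_replace('/<div id="content">.+?<\/div>/is', '', $text);

var_dump($withoutS === $text);
var_dump($withS);
```

If the div content is guaranteed to stay on one line, the original pattern is enough.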

6 Comments

This only matches the div tag if it has no attributes, unlike the one in the question's example.
And it won't work for nested divs, e.g. <div id="content">ss <div>adsda</div> sdsa </div>. -1. Don't parse HTML with regex.
@Qtax - There's nothing wrong with parsing HTML with regex if you've got predictable input and the problem is within the realm of what a regex can handle. The OP was worried about embedded <b>, not embedded <div>s.
@JonathanKuhn - this example was intended as a simple demonstration of a lazy quantifier. But okay, I'll add a correction to the OP's original preg_replace as an update. <sigh>
I agree. This works, and it addresses the OP's concerns. If handling HTML in RE is a bad idea, perhaps it's a downvote for this question, but not for the answer.
It is better to use DOM to parse HTML. Here is DOM-based code that removes every div with id="content", including any nested tags:

$html = <<< EOF
text text text s html tagove
<div id="content">ss <div>abcd</div>adsda sdsa </div>
oshte text s html tagove
<div id="content">foo <div>bar</div>baz foo</div>
some more text here
EOF;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nlist = $xpath->query("//div[@id='content']");
for($i=0; $i < $nlist->length; $i++) {
   $node = $nlist->item($i);
   $node->parentNode->removeChild($node);
}
$newHTML =  $doc->saveHTML();
echo $newHTML;

Thanks to @Qtax for pointing out that the original question changed after I wrote my previous regex-based answer.

OUTPUT:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>text text text s html tagove
</p>
oshte text s html tagove

some more text here</body></html>
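
As the output shows, saveHTML() on the whole document adds a doctype and html/body wrapper around the fragment. If only the fragment is wanted, one possible approach is to serialize the children of body one by one. This is a minimal sketch with a shortened, made-up input; it assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument, and the variable name $fragment is purely illustrative.

```php
<?php
// Shortened, hypothetical input based on the example above.
$html = 'text before
<div id="content">ss <div>abcd</div>adsda sdsa </div>
text after';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);

// Remove every div with id="content", nested tags included.
// The list is copied to an array first so removal cannot disturb iteration.
$xpath = new DOMXPath($doc);
foreach (iterator_to_array($xpath->query("//div[@id='content']")) as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize only the children of <body>, skipping the doctype/html/body wrapper.
$body = $doc->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $doc->saveHTML($child);
}
echo $fragment;
```

The parser may still wrap leading text in a p element, as in the output above, so the result is not guaranteed to be byte-identical to the input minus the div.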

4 Comments

@Qtax: Glad that at least you left a comment explaining the downvote. If you can tell me a bit more about why it is worse, I would really appreciate it.
The code in your answer doesn't work or even attempt to solve the issue in question, read the question again. (Hint: He's having problems with nested tags.)
Ah crap, you're right. However, the nested-tag requirement wasn't there originally, when I posted this answer. I myself keep writing on SO, on various questions, NOT to use regex for HTML parsing (and you can see my warning at the top of my answer), and it has now come back to bite me :)
@Qtax: I have edited and posted a DOM based code to remove the div tag.
