
As hard as I try, PREG and I don't get along, so I am hoping one of you PHP gurus can help out.

I have some HTML source code coming in to a PHP script, and I need specific items stripped out/removed from the source code.

First, if this comes in as part of HTML (could be multiple instances):

<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>

I want it converted into simply [[[SOMETEXT]]]

Note that the prefix will always be (I think):

<SPAN class=placeholder

.. and suffix will always be

</SPAN>

(yes, capital SPAN), but the title="" and jQuery###="#" pieces may be different. [[[SOMETEXT]]] could be anything. I essentially want the SPAN tag removed.

Next, if this comes as part of HTML (also could be multiple instances):

<span style="" class="placeholder" title="">[[[SOMETEXT]]]</span>

.. same thing - just want the [[[SOMETEXT]]] part to remain. I think the opening <span ...> piece will always be the prefix, and </span> (in this case, lowercase span tags) will be the suffix.

I understand this will probably take two PREG calls, but I would like to be able to pass the HTML text into a function and get back a cleaned/stripped version, something like this:

$dirty_text = $_POST['html_text'];
$clean_text = strip_placeholder_spans($dirty_text);
function strip_placeholder_spans( $in_text ) {
 // all the preg magic happens here, and returns result
}

ADDED/UPDATED FOR CLARITY

Ok, getting some good feedback, and getting close. However, to make it clearer, here is an example. I want to send this text into the function strip_placeholder_spans():

<blockquote>
<h2 align="center">Firefox: <span class="placeholder" title="">[[[ITEM1]]]</span></h2>
<h2 align="center">IE1:<SPAN class=placeholder title="" jQuery1262031390171="46">[[[ITEM2]]]</SPAN>
</h2>
<h2 align="center">IE2:<SPAN class=placeholder title="" jQuery1262031390412="52">[[[ITEM3]]]</SPAN> 
</h2>
<h2 align="center"><br><font face="Arial, Helvetica, sans-serif">COMPLETE</font></h2>
<p align="center">Your Text Can Go Here</p>
<p align="center"><a href="javascript:self.close()">Close this Window</a></p>
<p align="center"><br></p>
<p align="center"><a href="javascript:self.close()"><br></a></p></blockquote>
<p align="center"></p>

and when it comes back, it should be this:

<blockquote>
<h2 align="center">Firefox: [[[ITEM1]]]</h2>
<h2 align="center">IE1:[[[ITEM2]]]</h2>
<h2 align="center">IE2:[[[ITEM3]]]</h2>
<h2 align="center"><br><font face="Arial, Helvetica, sans-serif">COMPLETE</font></h2>
<p align="center">Your Text Can Go Here</p>
<p align="center"><a href="javascript:self.close()">Close this Window</a></p>
<p align="center"><br></p>
<p align="center"><a href="javascript:self.close()"><br></a></p></blockquote>
<p align="center"></p>
  • Here we go again about parsing HTML tags with regular expressions... Please see this answer - stackoverflow.com/questions/1732348/… Commented Dec 28, 2009 at 20:58
  • Parsing Html The Cthulhu Way: codinghorror.com/blog/archives/001311.html Commented Dec 28, 2009 at 20:58
  • Here we go again with the "html + regex is evil" stance. I'm not trying to PARSE HTML here, LiraNuna; I just want to search & replace some text. I don't want to use a power saw to cut a toothpick. If it helps, pretend there are no < and > symbols in the text. Commented Dec 28, 2009 at 21:16

3 Answers


Using an HTML parser is the most robust solution. That said, the following regex will work for the two code examples you posted:

$s= <<<STR
<span style="" class="placeholder" title="">[[[SOMETEXT]]</span>
Some Other text &amp; <b>Html</b>
<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>
STR;

preg_match_all('/\<span[^>]+?class="*placeholder"*[^>]+?>([^<]+)?<\/span>/isU', $s, $m);
var_dump($m);

A regular expression like this is narrowly targeted: it only handles this specific markup, and only when the HTML is well-formed. For instance, it won't parse <span class="placeholder">some text < more text</span>. If you have control over the source HTML this may be good enough.
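For the replace-in-place case from the question, a minimal sketch might look like the following. It fills in the strip_placeholder_spans() stub from the question; the pattern is my own assumption rather than the one above, and it assumes the class attribute is exactly placeholder (quoted or unquoted) and that placeholder spans contain no nested spans.

```php
<?php
// Sketch only: fills in the strip_placeholder_spans() stub from the question.
// Assumes class is exactly "placeholder" (quoted or not) and no nested <span> inside.
function strip_placeholder_spans(string $in_text): string
{
    $pattern = '~<span\b[^>]*\bclass=(["\']?)placeholder\1(?=[\s>])[^>]*>(.*?)</span>~is';

    // $2 is whatever sat between the opening and closing span tags.
    // preg_replace() returns null on error, so fall back to the input unchanged.
    return preg_replace($pattern, '$2', $in_text) ?? $in_text;
}

$dirty_text = '<h2>IE1:<SPAN class=placeholder title="" jQuery1262031390171="46">[[[ITEM2]]]</SPAN></h2>';
echo strip_placeholder_spans($dirty_text), "\n";
```

The i flag covers both SPAN and span, and the s flag lets the inner text span line breaks. Whitespace outside the span (for example a newline between </SPAN> and </h2>) is left untouched, so it would need a separate clean-up step if that matters.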


1 Comment

I converted your preg_match_all to a preg_replace, and it appears to do what I need. Thanks -

I think this should solve your problem:

function strip_placeholder_spans( $in_text ) {
    preg_match("/>(.*?)<\//", $in_text, $result);
    return $result[1];
}

2 Comments

hmm - not an expert, but wouldn't that strip out all tags?
Oh yes, sorry, I misunderstood the question. If you only want to strip the span, you can use: function strip_placeholder_spans( $in_text ) { preg_match("/<span(.*?)>(.*?)<\/span>/", $in_text, $result); return $result[2]; } I'm not sure I understood it right this time either; I'm a bit confused about what you wanted.

Step one: Remove regular expressions from your toolbox when dealing with HTML. You need a parser.

Step two: Download simple_html_dom for php.

Step three: Parse

$html = str_get_html('<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>');
// find() indexes from 0, and simple_html_dom exposes the property as lowercase innertext
$spanText = $html->find('span', 0)->innertext;

Step four: Profit!

Edit

$html->find('span.placeholder', 0)->innertext will return what you want. It looks for class=placeholder.
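If pulling in simple_html_dom isn't an option, roughly the same lookup can be sketched with PHP's bundled DOM extension instead. This is a swapped-in technique, not what the answer above uses; placeholder_texts() is a hypothetical helper name, and it only collects the inner text of the placeholder spans rather than rewriting the document.

```php
<?php
// Sketch only: placeholder_texts() is a hypothetical helper, not part of any library.
// Returns the text of every <span> whose class list contains "placeholder".
function placeholder_texts(string $html): array
{
    $doc = new DOMDocument();

    // Browser-generated markup is rarely valid, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The HTML parser lowercases tag names, so SPAN and span both match "span".
    $xpath = new DOMXPath($doc);
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' placeholder ')]";

    $texts = [];
    foreach ($xpath->query($query) as $span) {
        $texts[] = $span->textContent;
    }
    return $texts;
}

print_r(placeholder_texts(
    '<h2>Firefox: <span class="placeholder" title="">[[[ITEM1]]]</span></h2>'
    . '<h2>IE1:<SPAN class=placeholder title="" jQuery1262031390171="46">[[[ITEM2]]]</SPAN></h2>'
));
```

Because the parser normalises attribute quoting, class=placeholder and class="placeholder" are treated the same here, which is the case the regex approach has to handle by hand.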

5 Comments

Byron - I don't know ahead of time the title or the jQuery###="#" piece - any way to issue wildcards on those?
You said you want to strip the span, not keep the attributes?
just want the piece [[[SOMETEXT]]] to remain, everything else can go.
I'm also guessing there will be other non/placeholder spans in the source. So you'll need to select only the spans with the placeholder class and get their inner text.
yes, although sometimes the class is set like this: class=placeholder (no quotes), and sometimes with quotes.
