PHP PREG Question

Question

As hard as I try, PREG and I don't get along, so I am hoping one of you PHP gurus can help out.

I have some HTML source code coming in to a PHP script, and I need specific items stripped out/removed from the source code.

First, if this comes in as part of HTML (could be multiple instances):

<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>

I want it converted into simply [[[SOMETEXT]]]

Note that the prefix will always be (I think):

<SPAN class=placeholder

.. and suffix will always be

</SPAN>

(yes, capital SPAN), but the title="" and jQuery###="#" pieces may be different. [[[SOMETEXT]]] could be anything. I essentially want the SPAN tag removed.

Next, if this comes as part of HTML (also could be multiple instances):

<span style="" class="placeholder" title="">[[[SOMETEXT]]</span>

.. same thing - just want the [[[SOMETEXT]]] part to remain. I think the opening span tag will always be the prefix, and the closing tag (in this case, lowercase span tags) will be the suffix.

I understand this may take 2 PREG commands, but I would like to be able to pass the HTML text into a function and get a cleaned/stripped version back, something like this:

$dirty_text = $_POST['html_text'];
$clean_text = strip_placeholder_spans($dirty_text);

function strip_placeholder_spans( $in_text ) {
    // all the preg magic happens here, and returns result
}

ADDED/UPDATED FOR CLARITY

OK, getting some good feedback, and getting close. However, to make it clearer, here is an example. I want to send this text into the function strip_placeholder_spans():

<blockquote>
<h2 align="center">Firefox: <span class="placeholder" title="">[[[ITEM1]]]</span></h2>
<h2 align="center">IE1:<SPAN class=placeholder title="" jQuery1262031390171="46">[[[ITEM2]]]</SPAN>
</h2>
<h2 align="center">IE2:<SPAN class=placeholder title="" jQuery1262031390412="52">[[[ITEM3]]]</SPAN> 
</h2>
<h2 align="center"><br><font face="Arial, Helvetica, sans-serif">COMPLETE</font></h2>
<p align="center">Your Text Can Go Here</p>
<p align="center"><a href="javascript:self.close()">Close this Window</a></p>
<p align="center"><br></p>
<p align="center"><a href="javascript:self.close()"><br></a></p></blockquote>
<p align="center"></p>

and when it comes back, it should be this:

<blockquote>
<h2 align="center">Firefox: [[[ITEM1]]]</h2>
<h2 align="center">IE1:[[[ITEM2]]]</h2>
<h2 align="center">IE2:[[[ITEM3]]]</h2>
<h2 align="center"><br><font face="Arial, Helvetica, sans-serif">COMPLETE</font></h2>
<p align="center">Your Text Can Go Here</p>
<p align="center"><a href="javascript:self.close()">Close this Window</a></p>
<p align="center"><br></p>
<p align="center"><a href="javascript:self.close()"><br></a></p></blockquote>
<p align="center"></p>

Here we go again about parsing HTML tags with regular expressions... Please see this answer - stackoverflow.com/questions/1732348/…
– LiraNuna, Commented Dec 28, 2009 at 20:58
Parsing Html The Cthulhu Way: codinghorror.com/blog/archives/001311.html
– Rubens Farias, Commented Dec 28, 2009 at 20:58
Here we go again with the "html + regex is evil" stance. Not trying to PARSE HTML here, LiraNuna -- just want to search & replace some text. Don't want to use a power-saw to cut a toothpick. If it helps, pretend there are no < and > symbols in the text.
– OneNerd, Commented Dec 28, 2009 at 21:16

leepowers · Accepted Answer · 2009-12-28 21:20:17Z

1

Use an HTML parser. This is the most robust solution. That said, the following regex-based code will work for the two code examples you posted:

$s= <<<STR
<span style="" class="placeholder" title="">[[[SOMETEXT]]</span>
Some Other text &amp; <b>Html</b>
<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>
STR;

preg_match_all('/\<span[^>]+?class="*placeholder"*[^>]+?>([^<]+)?<\/span>/isU', $s, $m);
var_dump($m);

Using regular expressions results in very focused code. This example will only handle very specific, well-formed HTML. For instance, it won't parse <span class="placeholder">some text < more text</span>, because the capture group stops at the first < character. If you have control over the source HTML this may be good enough.
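The replace variant asked for in the question could look roughly like the sketch below. It is not the exact code from this answer: the pattern is rewritten for illustration and assumes that the class attribute holds only the single value placeholder (quoted or unquoted) and that placeholder spans are never nested inside each other.

```php
<?php
// A minimal sketch, assuming class is exactly "placeholder" (quoted or not)
// and that placeholder spans contain no nested <span> elements.
function strip_placeholder_spans(string $in_text): string
{
    // (["\']?) captures the optional opening quote, \1 requires the same
    // closing quote, and (.*?) lazily captures the span's inner text.
    // Flags: i = case-insensitive (SPAN or span), s = dot matches newlines.
    $pattern = '~<span\b[^>]*\bclass\s*=\s*(["\']?)placeholder\1(?=[\s>])[^>]*>(.*?)</span>~is';

    // Keep only the inner text; fall back to the input if PCRE reports an error.
    return preg_replace($pattern, '$2', $in_text) ?? $in_text;
}

// Usage with the two span styles from the question.
$dirty_text = '<h2>Firefox: <span class="placeholder" title="">[[[ITEM1]]]</span></h2>'
            . '<h2>IE1:<SPAN class=placeholder title="" jQuery1262031390171="46">[[[ITEM2]]]</SPAN></h2>';
echo strip_placeholder_spans($dirty_text), "\n";
```

Spans with any other class are left untouched because [^>]* cannot cross the end of the opening tag. Whitespace that follows a removed closing tag is kept as-is.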

answered Dec 28, 2009 at 21:20

leepowers


1 Comment

OneNerd Over a year ago

I converted your preg_match_all to a preg_replace, and it appears to do what I need. Thanks -

ahmetunal · Accepted Answer · 2009-12-28 21:00:47Z

1

I think this should solve your problem:

function strip_placeholder_spans( $in_text ) {
    preg_match("/>(.*?)<\//", $in_text, $result);
    return $result[1];
}

answered Dec 28, 2009 at 21:00

ahmetunal


2 Comments

OneNerd Over a year ago

hmm - not an expert, but wouldn't that strip out all tags?

ahmetunal Over a year ago

Oh yes, sorry, I misunderstood the question. If you only want to strip the span, you can use: function strip_placeholder_spans( $in_text ) { preg_match("/<span(.*?)>(.*?)<\/span>/", $in_text, $result); return $result[2]; } I'm not sure I understood it right this time either; I'm a bit confused about what you wanted.

Byron Whitlock · Accepted Answer · 2009-12-28 21:28:44Z

1

Step one: Remove regular expressions from your toolbox when dealing with HTML. You need a parser.

Step two: Download simple_html_dom for PHP.

Step three: Parse

$html = str_get_html('<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>');
// find() takes a zero-based index, and the property name is lowercase: innertext
$spanText = $html->find('span', 0)->innertext;

Step four: Profit!

Edit

$html->find('span.placeholder', 0)->innertext will return what you want. It looks for class=placeholder.
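For comparison, here is a sketch of the same parser-based idea that needs no download. It swaps simple_html_dom for PHP's bundled DOM extension (DOMDocument and DOMXPath), so it is not the code from this answer. The function name strip_placeholder_spans_dom and the wrapper id fragment-root are made up for illustration. Note that the markup is re-serialized on the way out, so tag case and attribute quoting may be normalized.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension, not simple_html_dom.
function strip_placeholder_spans_dom(string $in_text): string
{
    $doc = new DOMDocument();

    // Real-world markup (unquoted attributes, jQuery expando attributes)
    // can trigger libxml warnings; collect them quietly instead of printing.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints UTF-8; the wrapper div (hypothetical id)
    // lets us pull just the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="fragment-root">' . $in_text . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match spans whose class list contains the token "placeholder".
    $spans = $xpath->query(
        '//span[contains(concat(" ", normalize-space(@class), " "), " placeholder ")]'
    );

    // Copy the node list first so that editing the tree cannot disturb iteration.
    foreach (iterator_to_array($spans) as $span) {
        // Move every child in front of the span, then drop the empty span.
        while ($span->firstChild !== null) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // Serialize only the children of the wrapper div.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the parser lowercases element and attribute names, the same XPath query covers both the uppercase SPAN written by IE and the lowercase span written by Firefox, with or without quotes around the class value.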

edited Dec 28, 2009 at 21:28

answered Dec 28, 2009 at 21:00

Byron Whitlock


5 Comments

OneNerd Over a year ago

Byron - I don't know ahead of time the title or the jQuery###="#" piece - any way to issue wildcards on those?

LiraNuna Over a year ago

You said you want to strip the span, not keep the attributes?

OneNerd Over a year ago

just want the piece [[[SOMETEXT]]] to remain, everything else can go.

leepowers Over a year ago

I'm also guessing there will be other non-placeholder spans in the source. So you'll need to select only the spans with the placeholder class and get their inner text.

OneNerd Over a year ago

yes, although sometimes the class is set like this: class=placeholder (no quotes), and sometimes with quotes.
