
Here is my string:

$str='<p>Some <a href="#">link</a> with <a href="http://whatever.html?bla">LINK2</a> and <a href="http://whatever.html?bla" target="_blank">LINK3</a> and</p> more html';

I would like to remove the links LINK1 and LINK2 using PHP to get:

"<p>Some <a href="#">link</a> with and and</p> more html"

Here is what I think is close to what I need:

$find = array("<a(.*)LINK1(.*)</a>", "<a(.*)LINK2(.*)</a>");
$replace = array("", "");
$result=preg_replace("$find","$replace",$str);

This isn't working. I have searched for days and tried many other options, but never managed to get this to work as expected. Also, I don't really mind if the texts LINK1 and LINK2 still appear, as long as the a tags are removed.


2 Answers

1

You are very close to a working solution. The problem you are facing is that regular expressions by default try to match as much as possible. The pattern <a(.*)LINK1(.*)</a> will in fact match from the first <a to the last </a> if there is a LINK1 in between. What you want is to match only the nearest <a> tag.

There are a few ways to do this, but I usually make the matching ungreedy (lazy), so that it tries to find the smallest possible match instead. Two ways of doing this are appending a ? after the quantifier or using the ungreedy modifier U. I prefer the first one.
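A small sketch of the difference, using a made-up two-link string (an assumption for illustration, not the question's input): the greedy .* runs to the last </a>, while the lazy .*? stops at the first one.

```php
<?php
// Made-up input with two links (illustrative only).
$html = '<a href="1">one</a> and <a href="2">two</a>';

// Greedy: .* consumes as much as it can, so the match ends at the LAST </a>.
preg_match('/<a(.*)<\/a>/', $html, $greedy);

// Lazy: .*? consumes as little as it can, so the match ends at the FIRST </a>.
preg_match('/<a(.*?)<\/a>/', $html, $lazy);

echo $greedy[0], "\n"; // <a href="1">one</a> and <a href="2">two</a>
echo $lazy[0], "\n";   // <a href="1">one</a>
```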

Using ?:

/<a(.*?)LINK1(.*?)<\/a>/

Using modifier:

/<a(.*)LINK1(.*)<\/a>/U
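The U modifier inverts the greediness of every quantifier in the pattern, so the two spellings describe the same regex. A quick check on a made-up string (illustrative only, not taken from the question):

```php
<?php
// Made-up input (illustrative only).
$html = 'x <a href="a">LINK2</a> y <a href="b">LINK3</a> z';

// Lazy quantifiers written explicitly with ?.
$viaQuestionMark = preg_replace('/<a(.*?)LINK2(.*?)<\/a>/', '', $html);

// Plain quantifiers made lazy by the U (ungreedy) modifier.
$viaModifier = preg_replace('/<a(.*)LINK2(.*)<\/a>/U', '', $html);

echo $viaQuestionMark, "\n"; // x  y <a href="b">LINK3</a> z
var_dump($viaQuestionMark === $viaModifier); // bool(true)
```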

Both should work equally well here. The entire source code will thus be as follows (using ?):

$find = array("/<a(.*?)LINK1(.*?)<\/a>/", "/<a(.*?)LINK2(.*?)<\/a>/");
$replace = array("", "");
$result = preg_replace($find, $replace, $str);
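One thing worth checking against the sample string from the question: a lazy quantifier only shortens the match at its end; the match still starts at the leftmost <a the engine can use. Because <a href="#">link</a> comes before LINK2, the pattern /<a(.*?)LINK2(.*?)<\/a>/ swallows that first link too. Below is a sketch of a tighter variant that swaps .*? for the negated character classes [^>]* and [^<]*, so a match cannot cross from one tag into the next. It targets LINK2 and LINK3 because those are the labels in the sample string and the expected output; the doubled spaces left where the tags were are not collapsed.

```php
<?php
$str = '<p>Some <a href="#">link</a> with <a href="http://whatever.html?bla">LINK2</a> and <a href="http://whatever.html?bla" target="_blank">LINK3</a> and</p> more html';

// Lazy quantifiers alone: the match still begins at the first "<a" in the
// string, so the unrelated <a href="#">link</a> is removed as well.
$lazy = preg_replace('/<a(.*?)LINK2(.*?)<\/a>/', '', $str);

// [^>]* cannot run past the end of the opening tag and [^<]* cannot run
// into another tag, so each match stays inside one <a>...</a> element.
$find = array('/<a\b[^>]*>[^<]*LINK2[^<]*<\/a>/', '/<a\b[^>]*>[^<]*LINK3[^<]*<\/a>/');
$tight = preg_replace($find, '', $str);

echo $lazy, "\n";  // <p>Some  and <a href="http://whatever.html?bla" target="_blank">LINK3</a> and</p> more html
echo $tight, "\n"; // <p>Some <a href="#">link</a> with  and  and</p> more html
```

This is still a regex over HTML, so it assumes simple markup: an attribute value containing > or link text containing nested tags would defeat it.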

And yeah, as noted in other comments, you shouldn't rely on regular expressions for manipulating HTML code (because it is really easy to construct valid HTML that will slip past the expression unnoticed). However, I believe it is perfectly OK if you trust the HTML code you are parsing, or if the result of this matching is not crucial for other important functions.
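For comparison, a sketch of the DOM-parser route using PHP's built-in DOM extension (DOMDocument). The helper name removeLinksByText is hypothetical, and the sketch assumes the link text matches a label exactly. loadHTML wraps a fragment in <html><body>, so only the children of <body> are serialized back.

```php
<?php
// Hypothetical helper: drops every <a> whose trimmed text is one of $labels.
function removeLinksByText(string $html, array $labels): string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first; removing nodes while iterating it would skip some.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        if (in_array(trim($a->textContent), $labels, true)) {
            $a->parentNode->removeChild($a);
        }
    }

    // loadHTML() adds <html><body> around a fragment; serialize only what is inside <body>.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = '<p>Some <a href="#">link</a> with <a href="http://whatever.html?bla">LINK2</a> and <a href="http://whatever.html?bla" target="_blank">LINK3</a> and</p> more html';
$result = removeLinksByText($str, array('LINK2', 'LINK3'));
echo $result, "\n";
```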


5 Comments

Thank you so much for your help and detailed explanation! This seems to work well, but you and Lix are saying that I shouldn't use regular expressions, so I'm going to look into DOM parsers... hopefully it won't be much harder :)
It all depends on how you use it. Bad usage: using it to remove unwanted content from text coming from web visitors (like a filtering system for blog comments). OK usage: using it to process HTML code that you have written earlier (or that comes from another source that cannot possibly intend to hack you). Another semi-OK usage: scanning through another web page for stuff.
OK Alaeus, my content comes from trusted sources only, so I should be able to use regexps then! Thank you for your comment. Do you also know how I could match links that contain "@" and numbers like "1"?
I'm not sure I understand. /<a(.*?)>[@\d]+<\/a>/ will match links whose text consists only of @ and digits. Is that what you were after?
Sorry Alaeus, I just wanted to remove an email address, so this does the trick: $find= '[email protected]'. My question was stupid: I wasn't using quotes, which is why I had an error...

0

Try this:

<?php
$str='<p>Some <a href="#">link</a> with <a href="http://whatever.html?bla">LINK2</a> and <a href="http://whatever.html?bla" target="_blank">LINK3</a> and</p> more html';
$find = array("/<a(.*)LINK1(.*)<\/a>/si", "/<a(.*)LINK2(.*)<\/a>/si");
$replace = array("", "");
$result=preg_replace($find, $replace, $str);
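Running that code against the sample string shows what the comment below reports: with greedy (.*), the LINK2 pattern stretches from the first <a in the string to the last </a>, taking <a href="#">link</a> and the LINK3 link with it. The s and i flags do not change that.

```php
<?php
$str = '<p>Some <a href="#">link</a> with <a href="http://whatever.html?bla">LINK2</a> and <a href="http://whatever.html?bla" target="_blank">LINK3</a> and</p> more html';
$find = array("/<a(.*)LINK1(.*)<\/a>/si", "/<a(.*)LINK2(.*)<\/a>/si");
$result = preg_replace($find, array("", ""), $str);

// Greedy (.*) runs from the first "<a" to the last "</a>", so every link in
// the paragraph is removed, not just the LINK2 one.
echo $result, "\n"; // <p>Some  and</p> more html
```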

2 Comments

Thanks for your reply; unfortunately, this seems to replace much more than just the link.
