0

I have an HTML string in PHP. It may contain several anchor tags, like this:

.....<p><span>qwerty</span></p>...qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....

An <a> tag may contain several other HTML tags, such as <p>, <span>, <br>, etc.

I want a regular expression that removes every <a> tag together with everything inside it, i.e. all anchor tags along with all the content they contain.

The output should be: <p><span>qwerty</span></p>....qwerty....qwerty....qwerty....

Please note that there is no xyz in the final output.

Thanks

P.S.: The string may contain other HTML tags that are not inside anchor tags, and I want to keep them. For example, it may contain p, span, div, strong, etc. Only the a tags (and their contents) should be removed. I need a regex.

2 Answers

2

You don't need a regex for this; just use the strip_tags function to strip HTML tags from the input:

$s = '.....qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....';

echo strip_tags($s);

//=> .....qwerty....xyzqwerty...xyzqwerty.....

Based on the edited question: you can whitelist some tags so that they are kept in the output:

$s = '.....<p><span>qwerty</span></p>...qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....';

echo strip_tags($s, '<p><span>');
//=> .....<p><span>qwerty</span></p>...qwerty....xyzqwerty...<p><span>xyz</span></p>qwerty.....

With all the usual pitfalls of parsing HTML with a regex in mind, here is one that works with the OP's input:

echo preg_replace('~<a [^>]*>.*?</a>~', '', $s);
//=> .....<p><span>qwerty</span></p>...qwerty....qwerty...qwerty.....
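
The pattern above relies on a space after `<a`, on lowercase tag names, and on each anchor sitting on a single line. A slightly hardened variant might look like the sketch below. The function name `stripAnchorsWithRegex` is made up for illustration, and the sketch still assumes anchors are not nested and that no attribute value contains a `>` character.

```php
<?php
// Hypothetical helper; a hardened variant of the pattern shown above.
function stripAnchorsWithRegex(string $html): string
{
    // \b : match "<a" only as a complete tag name, so <abbr> is left alone
    // i  : case-insensitive, so <A HREF="..."> is matched as well
    // s  : let "." match newlines, so anchors spanning several lines are removed
    $result = preg_replace('~<a\b[^>]*>.*?</a\s*>~is', '', $html);

    // preg_replace() returns null on failure; fall back to the untouched input.
    return $result ?? $html;
}

$s = "....qwerty....<A\nhref=\"www.xyz.com\">\n<p><span>xyz</span></p>\n</a>qwerty <abbr>kept</abbr>";
echo stripAnchorsWithRegex($s);
//=> ....qwerty....qwerty <abbr>kept</abbr>
```

The lazy `.*?` stops at the first closing `</a>`, which is why nested or malformed anchors remain a case where a DOM parser is the safer choice.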

4 Comments

Actually, I didn't explain it well. The string may contain other HTML tags that are not inside anchor tags, and I want to keep them. Your solution would remove them too.
Yes, you can allow certain tags in strip_tags. Check the link to the strip_tags documentation in my answer.
No, it's not like that. We want to allow all other HTML tags (everything except anchor tags) and remove only the anchor tags.
Parsing HTML with regex can be error-prone. Prefer a DOM parser for better control.
0

You could use DOMDocument rather than a regex to achieve the desired result:

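function removeanchors( $strhtml ){
    $dom = new DOMDocument;
    /* loadHTML() wraps a fragment in <html><body> and may emit warnings on imperfect markup */
    $dom->loadHTML( $strhtml );
    $col = $dom->getElementsByTagName('a');

    /* the node list is live, so work backwards to avoid skipping nodes while removing */
    for ( $i = $col->length; --$i >= 0; ) {
      $a = $col->item( $i );
      $a->parentNode->removeChild( $a );
    }

    /* returns the whole serialized document, including the doctype and wrapper elements */
    return $dom->saveHTML();
}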

$strhtml='.....qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....womble<a href="www.xyz.com"><p><span>xyz</span></p></a> ..... badger <a href="www.xyz.com"><p><span>xyz</span></p></a>';

echo removeanchors( $strhtml );
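
Because `saveHTML()` without arguments serializes the whole document, the result of `removeanchors()` comes back wrapped in a doctype and `<html><body>` elements. If only the fragment is wanted, one possible approach is sketched below. The function name `removeAnchorsFromFragment` and the wrapper `<div>` are assumptions made for this example, and the sketch assumes reasonably well-formed, ASCII or entity-encoded input (character encoding handling is left out).

```php
<?php
// Hypothetical helper: drop every <a> element (with its contents) from an HTML fragment
// and return the fragment without the <html>/<body> wrappers that loadHTML() adds.
function removeAnchorsFromFragment(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so there is a known container to read back from.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array before removing anything.
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $a) {
        if ($a->parentNode !== null) {
            $a->parentNode->removeChild($a);
        }
    }

    // The wrapper is the first <div> in document order because it encloses the fragment.
    $root = $dom->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return $html; // nothing parsed; hand back the input unchanged
    }

    // Serialize only the wrapper's children, i.e. the cleaned fragment.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeAnchorsFromFragment('...qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty...<p><span>kept</span></p>');
```

Copying the node list first is an alternative to iterating backwards: both avoid skipping nodes while the live collection shrinks.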
