I would like to remove every attribute from the HTML tags in a string. I think this can be done with a regex, but I'm not good with regexes.

I tried str_replace, but it's not the right tool for this. I've also searched for similar questions but could not find any.

Example:

I have HTML like this in a variable:

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

Then a call to preg_match() with some pattern:

$new_str = preg_match('', $str)

Expected Output:

$new_str = '
<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>';

Please note that I don't intend to strip the HTML tags themselves; I just need to remove any attributes within the tags.

PHP's strip_tags() isn't an option.

I would be grateful for help with this.

3 Answers

While regex can do the task, it's generally better to use DOM functions for filtering or otherwise manipulating HTML. Here is a reusable class that uses the DOM to remove unwanted properties (attributes). You set which HTML tags and attributes you want to allow, and it filters out everything else.

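class allow_some_html_tags {
    public $doc = null;                   // DOMDocument holding the parsed HTML
    public $xpath = null;                 // DOMXPath bound to $doc
    public $allowed_tags = "";            // "<p><span>..." in the form strip_tags() expects
    public $allowed_properties = array(); // attribute name => 1

    // Parse $html after dropping disallowed tags. Call setAllowed() first.
    function loadHTML( $html ) {
        $this->doc = new DOMDocument();
        $html = strip_tags( $html, $this->allowed_tags );
        // @ silences parser warnings about malformed markup
        @$this->doc->loadHTML( $html );
        $this->xpath = new DOMXPath( $this->doc );
    }
    // Register the tag names and attribute names that should survive.
    function setAllowed( $tags = array(), $properties = array() ) {
        foreach( $tags as $allow ) $this->allowed_tags .= "<{$allow}>";
        foreach( $properties as $allow ) $this->allowed_properties[$allow] = 1;
    }
    // Copy the attribute names into a plain array so attributes can be
    // removed without modifying the live list while iterating over it.
    function getAttributes( $tag ) {
        $r = array();
        for( $i = 0; $i < $tag->attributes->length; $i++ )
            $r[] = $tag->attributes->item($i)->name;
        return( $r );
    }
    // Remove every attribute that is not allowed and return the HTML.
    function getCleanHTML() {
        $tags = $this->xpath->query("//*");
        foreach( $tags as $tag ) {
            $a = $this->getAttributes( $tag );
            foreach( $a as $attribute ) {
                if( !isset( $this->allowed_properties[$attribute] ) )
                    $tag->removeAttribute( $attribute );
            }
        }
        // The second strip_tags() drops the doctype/html/body added by DOMDocument
        return( strip_tags( $this->doc->saveHTML(), $this->allowed_tags ) );
    }
}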

The class uses strip_tags twice - once to quickly eliminate unwanted tags, and then after the properties have been removed from the remainder, it eliminates the additional tags inserted by DOM functions (doctype, html, body). To use, simply do this:

$comments = new allow_some_html_tags();
$comments->setAllowed( array( "p", "span", "ul", "li" ), array("tabindex") );
$comments->loadHTML( $str );
$clean = $comments->getCleanHTML();

The setAllowed function takes two arrays - a set of allowed tags, and a set of allowed attributes (in case you later decide you want to preserve some). I've altered your input string to contain an added tabindex="3" attribute on the ul tag to illustrate the filtering. Output of $clean is:

<p>content</p>
<span>content</span>
<ul tabindex="3"></ul><li>content</li>
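
If you want to drop every attribute and don't need an allow-list, the same DOM idea can be sketched more compactly. This is only a minimal sketch, not part of the class above: strip_all_attributes() is a hypothetical helper name, the wrapping div is my own assumption to give the fragment a single known root, and loadHTML() treats the input as ISO-8859-1 unless told otherwise, so non-ASCII text would need extra handling.

```php
<?php
// Hypothetical helper (an illustration, not the class from this answer):
// removes every attribute from every element of an HTML fragment.
function strip_all_attributes(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    // Assumption: wrap the fragment in a <div> so it has one known root.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select only elements that carry at least one attribute.
    foreach ($xpath->query('//*[@*]') as $element) {
        // Always remove the first remaining attribute until none are left.
        while ($element->attributes->length > 0) {
            $element->removeAttributeNode($element->attributes->item(0));
        }
    }

    // Serialize the children of the wrapper, which skips the
    // doctype/html/body that DOMDocument adds around the fragment.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling strip_all_attributes($str) on the string from the question should give the tags without their attributes; whitespace between tags may differ slightly from the input because the DOM serializer formats the output.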

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

$clean = preg_replace('/ .*".*"/', '', $str);

echo $clean;

Will return:

<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>

But please don't use regex for parsing HTML, use a DOM parser.
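
If you do stay with a regex, note that the pattern above matches from the first space up to the last double quote on each line, so it also eats text content that contains quotes (for example, <p class="a">he said "hi"</p> becomes <p</p>). A somewhat tighter sketch is to match only opening tags and rewrite each one to its bare name. strip_attributes_regex() is a hypothetical helper name, and this still assumes that no attribute value contains a > character and that tag names are plain letters and digits.

```php
<?php
// Hypothetical helper: rewrites every opening tag to just its tag name.
// Assumes attribute values never contain ">" and tag names are alphanumeric.
function strip_attributes_regex(string $html): string
{
    // <      start of an opening tag (closing tags begin with "</" and are skipped)
    // ([a-z][a-z0-9]*)  the tag name, captured as $1
    // \b[^>]*>          everything up to the end of the tag
    $result = preg_replace('/<([a-z][a-z0-9]*)\b[^>]*>/i', '<$1>', $html);

    // preg_replace() returns null on error; fall back to the input then.
    return $result ?? $html;
}
```

Closing tags and text content are left untouched. A self-closing tag such as <br /> comes out as <br>.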

The easiest way to remove HTML tags in PHP is strip_tags().

Or you can remove them with a regex:

preg_replace("/<.*?>/", "", $str);

1 Comment

OP is looking for a way to remove the attributes, not the tags themselves
