Please note
Using a regex is not the best way to modify HTML code!
In most situations it is better and much more reliable to use a DOMDocument or DOMDocumentFragment object to modify or extract data from HTML code.
However, there are valid scenarios where a regex is better, mainly when these factors apply:
- You know that the HTML code that you edit is going to be valid.
- The HTML structure that is modified will be identical in all cases.
- You're doing only very simple changes to the code.
- Performance is important (e.g. when it is executed inside a loop). DOMDocument is considerably slower than a simple regex!
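The performance point can be checked on your own setup with a rough timing sketch. The sample markup, the iteration count and the variable names below are assumptions made for illustration, and the DOMDocument variant is only one possible way to do the same job. No numbers are claimed here; results depend on the PHP version and on the input.

```php
<?php
// Hypothetical sample input; any valid fragment with a single root element works.
$markup = '<div class="my-text">some <b>text</b></div>';
$runs   = 1000; // arbitrary iteration count chosen for this sketch

// Variant 1: the regex approach.
$start = hrtime(true);
for ($i = 0; $i < $runs; $i++) {
    $viaRegex = preg_replace('/^<[^>]+>|<\/[^>]+>$/', '', $markup);
}
$regexNs = hrtime(true) - $start;

// Variant 2: parse with DOMDocument and serialize the children of the root element.
$start = hrtime(true);
for ($i = 0; $i < $runs; $i++) {
    $doc = new DOMDocument();
    // These flags keep libxml from adding <html>/<body> wrappers and a doctype.
    $doc->loadHTML($markup, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $viaDom = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $viaDom .= $doc->saveHTML($child);
    }
}
$domNs = hrtime(true) - $start;

// hrtime(true) returns nanoseconds; convert to milliseconds for display.
printf("regex: %.2f ms, DOMDocument: %.2f ms\n", $regexNs / 1e6, $domNs / 1e6);
```

hrtime() requires PHP 7.3 or newer; on older versions microtime(true) can be used instead.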
The code
To strip the outermost tag from some HTML code, use this regex:
/* Note:
* The code must start with an opening tag and end with a closing tag.
* No white space or other text must be present before the first
* tag/after the last tag, else you get some unexpected results.
*/
$contents = preg_replace( '/^<[^>]+>|<\/[^>]+>$/', '', $markup );
// ^<[^>]+> This removes the first tag
// <\/[^>]+>$ This removes the last closing tag
Examples
This regex works for most HTML markup, e.g.
In: '<div class="my-text" id="text" style="color:red">some text</div>'
Out: 'some text' (expected result)
When the first tag contains the ">" character inside an attribute value, the match ends at that character and part of the tag is left behind, e.g.
In: '<div title="Home > Archives">Archive overview</div>'
Out: ' Archives">Archive overview' (unexpected result)
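If ">" inside quoted attribute values is a realistic case for your markup, one possible workaround is to let the opening-tag part of the pattern skip over quoted strings. This is a sketch, not part of the original snippet: the $pattern variable name is an assumption, and the pattern still fails on unbalanced quotes in the first tag.

```php
<?php
// Opening tag: consume quoted attribute values as a whole, so a ">" inside
// "..." or '...' no longer ends the match. The closing-tag part is unchanged.
$pattern = '/^<(?:"[^"]*"|\'[^\']*\'|[^>"\'])+>|<\/[^>]+>$/';

$markup   = '<div title="Home > Archives">Archive overview</div>';
$contents = preg_replace($pattern, '', $markup);
// $contents is now 'Archive overview'
```

The three alternatives inside the group start with different characters, so the pattern does not add ambiguous backtracking.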
Whitespace or text before the first tag or after the last tag will also break the regex, e.g.
In: '<div>Your name</div>:'
Out: 'Your name</div>:' (unexpected result)
And of course, the first and last tags are stripped without any sanity check, even when they do not belong to the same element, e.g.
In: '<h2>Settings</h2><label>Page Title</label>'
Out: 'Settings</h2><label>Page Title' (unexpected result)
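Where a cheap sanity check is worth the extra cost, the last two failure cases can be guarded against by trimming the input and requiring that the final closing tag has the same name as the first tag. The function name strip_outer_tag() is hypothetical, and this is only a sketch: it still accepts sibling elements of the same name (e.g. '<div>a</div><div>b</div>') and does not handle ">" inside attribute values.

```php
<?php
/**
 * Hypothetical helper: strips the outer tag only when the trimmed markup
 * starts with an opening tag and ends with a closing tag of the same name.
 * In every other case the input is returned unchanged.
 */
function strip_outer_tag(string $markup): string
{
    $trimmed = trim($markup);

    // Group 1 captures the tag name; \1 requires the same name in the last
    // closing tag. The "s" flag lets "." match newlines, "i" ignores case.
    $pattern = '/^<([a-z][a-z0-9-]*)(?=[\s\/>])[^>]*>(.*)<\/\1\s*>$/is';

    if (!preg_match($pattern, $trimmed, $matches)) {
        return $markup; // no matching outer tag pair, leave the input alone
    }

    return $matches[2]; // everything between the outer tags
}

echo strip_outer_tag('<h2>Settings</h2><label>Page Title</label>');
```

With this check, the mismatched markup from the example above is left as it is instead of being cut apart.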
<div>bla <br> bla</div>
And now I just need to strip the actual outer tag (div in this case) and keep the content with tags.
$html = $domElement->ownerDocument->saveHTML($domElement);
This should return the content of the DOM node in $html without stripping the tags within it.
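Note that saveHTML() called with a node serializes that node together with its own opening and closing tag. To get only the content, one option is to serialize each child node and join the results. A minimal sketch, assuming $domElement is a DOMElement attached to a DOMDocument; the helper name inner_html() is hypothetical:

```php
<?php
/**
 * Hypothetical helper: returns the markup of all child nodes of $element,
 * i.e. the element's content without its own outer tag.
 */
function inner_html(DOMElement $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes a single node, including nested tags.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Usage with the markup from the comment above.
$doc = new DOMDocument();
// These flags keep libxml from adding <html>/<body> wrappers and a doctype.
$doc->loadHTML('<div>bla <br> bla</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$domElement = $doc->documentElement;

$html = inner_html($domElement);
echo $html;
```

Unlike the regex, this does not depend on where the first tag ends, so a ">" inside an attribute value of the outer tag is not a problem.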