
I'm trying to remove all HTML tags except p, a and img tags. Right now I have:

content.replace(/(<([^>]+)>)/ig,""); 

But this removes all HTML tags.

These are examples of the content returned by the API:

    <table id="content_LETTER.BLOCK9" border="0" width="100%" cellspacing="0" cellpadding="0" bgcolor="#F7EBF5">
    <tbody><tr><td class="ArticlePadding" colspan="1" rowspan="1" align="left" valign="top"><div>what is the opposite of... rest of text
  • You can go two ways: either fix your pattern so it doesn't match those tags, or change your replacement from an empty string to a function that checks the tag and returns the match if you want to keep it, or an empty string if not. Commented May 16, 2017 at 18:36
  • add sample input - desired output - current output ... Commented May 16, 2017 at 18:36
  • It's for an app that loads WordPress data from a post with a lot of styling, which is undesirable for the app. Commented May 16, 2017 at 18:40
  • You can still make a simple example input/output without having tons of formatting. Even something as simple as <b>remove these</b><p>keep these</p> would be fine. Commented May 16, 2017 at 18:41
  • for example: <table id="content_LETTER.BLOCK9" border="0" width="100%" cellspacing="0" cellpadding="0" bgcolor="#F7EBF5">↵<tbody>↵<tr>↵<td class="ArticlePadding" colspan="1" rowspan="1" align="left" valign="top">↵<div>what is the opposite of... rest of text Commented May 16, 2017 at 18:45
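The first comment's second option, passing a function to replace, could look roughly like this. It is only a sketch: KEEP and stripTagsExcept are hypothetical names, the pattern assumes tag names made of ASCII letters and digits, and comments, doctypes or a > inside an attribute value are not handled.

```javascript
// Hypothetical allow-list of tag names to keep (compared in lower case).
const KEEP = new Set(['p', 'a', 'img']);

// Hypothetical helper: capture each tag's name (after an optional "/")
// and let the callback decide whether the whole tag survives.
function stripTagsExcept(html) {
  return html.replace(/<\/?([a-z][a-z0-9]*)\b[^>]*>/gi, (match, name) =>
    // Return the tag unchanged if its name is allowed, otherwise drop it.
    KEEP.has(name.toLowerCase()) ? match : ''
  );
}

const content =
  '<table><tbody><tr><td><div><b>remove these</b><p>keep these</p></div></td></tr></tbody></table>';
console.log(stripTagsExcept(content));
// → remove these<p>keep these</p>
```

Because the name is compared as a whole, <pre> or <abbr> are removed even though they start with p or a.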

3 Answers


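You can match the tags to keep in a capture group and, using alternation, match all other tags outside it. Then replace with $1: a kept tag is written back unchanged, while any other tag is replaced with an empty string because group 1 did not take part in that match: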

(<\/?(?:a|p|img)[^>]*>)|<[^>]+>

Demo: https://regex101.com/r/Sm4Azv/2

And the JavaScript demo:

var input = 'b<body>b a<a>a h1<h1>h1 p<p>p p</p>p img<img />img';
var output = input.replace(/(<\/?(?:a|p|img)[^>]*>)|<[^>]+>/ig, '$1');
console.log(output);


1 Comment

This would also leave <pre></pre>, or any other tag whose name starts with a, p or img, in place.
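Adding \b right after the tag-name alternation addresses this: a kept tag's name must then end at a word boundary, so <pre> and <abbr> fall through to the second alternative and are removed. A quick comparison on a made-up input (keepPattern and tagSample are illustrative names):

```javascript
// Same capture-group idea as the answer, with \b after (?:a|p|img) so that
// only the exact tag names a, p and img land in group 1.
const keepPattern = /(<\/?(?:a|p|img)\b[^>]*>)|<[^>]+>/gi;

// Hypothetical input with look-alike tags (<pre>, <abbr>).
const tagSample = 'p<pre>p a<abbr>a p<p>p</p> img<img src="x.png" />img';

// Original pattern, without \b: <pre> and <abbr> are kept too.
console.log(tagSample.replace(/(<\/?(?:a|p|img)[^>]*>)|<[^>]+>/gi, '$1'));
// → p<pre>p a<abbr>a p<p>p</p> img<img src="x.png" />img

// With \b: only <p>, </p> and <img ... /> survive.
console.log(tagSample.replace(keepPattern, '$1'));
// → pp aa p<p>p</p> img<img src="x.png" />img
```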

You can use the regex below to remove all HTML tags except a, p and img:

<\/?(?!a)(?!p)(?!img)\w*\b[^>]*>

Replace with an empty string.

var text = '<tr><p><img src="url" /> some text <img another></img><div><a>blablabla</a></div></p></tr>';
var output = text.replace(/<\/?(?!a)(?!p)(?!img)\w*\b[^>]*>/ig, '');
console.log(output);

Regex 101 Demo
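This pattern has a similar gap: (?!a)(?!p) only inspect the first character of the tag name, so <pre>, <abbr>, <param> or <article> are never matched and are therefore kept. A possible adjustment, sketched here with a hypothetical input, groups the names inside one lookahead followed by \b; \w+ also replaces \w*\b so at least one name character is required.

```javascript
// Variant of the answer's pattern (removePattern is an illustrative name).
// (?!(?:a|p|img)\b) rejects only the exact tag names a, p and img, so names
// that merely start with a or p (pre, abbr, param, ...) are removed as well.
const removePattern = /<\/?(?!(?:a|p|img)\b)\w+[^>]*>/gi;

// Hypothetical input mixing kept, removed and look-alike tags.
const nestedSample =
  '<tr><p><pre>code</pre><img src="url" /><abbr>x</abbr><div><a>link</a></div></p></tr>';

console.log(nestedSample.replace(removePattern, ''));
// → <p>code<img src="url" />x<a>link</a></p>
```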

var input = 'b<p on>b <p>good p</p> a<a>a h1<h1>h1 p<pre>p p</p onl>p img<img src/>img';
var output = input.replace(/(<(?!\/?((a|img)(\s+[^>]+)*|p)\s*>)([^>]+)>)/ig, '');
console.log(output);
output: bb <p>good p</p> a<a>a h1h1 pp pp img<img src/>img
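One thing to keep in mind with this pattern: p is only kept when it has no attributes at all, so a paragraph carrying, say, a class or style attribute (likely in heavily styled posts) loses its opening tag while the closing </p> survives. A check with a made-up input (plainPOnly and styledParagraph are illustrative names):

```javascript
// The answer's first pattern, unchanged.
const plainPOnly = /(<(?!\/?((a|img)(\s+[^>]+)*|p)\s*>)([^>]+)>)/ig;

// Hypothetical input: a paragraph with an attribute.
const styledParagraph = '<p class="intro">Hello</p>';

console.log(styledParagraph.replace(plainPOnly, ''));
// → Hello</p>
```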

And if you'd also like to drop any a or img tag carrying a JS event handler attribute (such as onerror):

var input = 'b<p on>b <p>good p</p> a<a>a h1<h1>h1 p<pre>p p</p onl>p img<img src="y.gif" /> see <img src="x.png" onerror alt="cat" /> there';
var output = input.replace(/(<(?!\/?((a|img)(\s+((?!on)[^>])+)*|p)\s*>)([^>]+)>)/ig, '');
console.log(output);
output: bb <p>good p</p> a<a>a h1h1 pp pp img<img src="y.gif" /> see  there
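Note that this version drops the whole tag rather than just the handler attribute, and (?!on) is checked at every character after the tag name, so any a or img tag whose attributes contain the letters "on" anywhere is removed as well, while its closing </a> is kept. A check with a made-up input that contains no event handlers at all (onFilter and attrSample are illustrative names):

```javascript
// The answer's second pattern, unchanged.
const onFilter = /(<(?!\/?((a|img)(\s+((?!on)[^>])+)*|p)\s*>)([^>]+)>)/ig;

// Hypothetical input: "icon" and "contact" both contain "on".
const attrSample =
  '<img src="icon.png" /><img src="cat.png" /><a href="/contact">Contact</a>';

console.log(attrSample.replace(onFilter, ''));
// → <img src="cat.png" />Contact</a>
```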
