I have some text that contains HTML (to be rendered in the browser), as well as arbitrary strings enclosed in < and >. Is there a way to escape those arbitrary tags while preserving the real HTML? If it helps, the HTML being parsed is very strictly governed, and only a subset of tags is allowed (b, i, strong, br).

For example, given this text:

<strong>Foobar</strong> <some other whatever>

I need this output:

<strong>Foobar</strong> &lt;some other whatever&gt;
  • As custom elements can be defined nowadays (or will be in the near future), it will be quite hard for a script to decide what is an HTML tag and what is not. In your example, <some> could just as well be a valid HTML tag with two (empty) attributes, other and whatever. Commented Aug 29, 2016 at 15:32
  • In order to do that, you'd have to be able to distinguish valid HTML tags from other things enclosed in angle brackets. As the spec has become very flexible in that regard, this may be impossible when working with HTML5. Commented Aug 29, 2016 at 15:33
  • I don't know what you are trying to accomplish, but it would be better not to mix the two. Is it possible to change the delimiter of the arbitrary strings to something different, like {some other whatever}? Commented Aug 29, 2016 at 15:34
  • We have very strict controls over the HTML we're parsing; only a subset of tags are allowed. Commented Aug 29, 2016 at 15:36
  • "We have very strict controls over the HTML we're parsing; only a subset of tags are allowed." Is the list of allowed tags stored in an array or an object? Commented Aug 29, 2016 at 15:38

1 Answer


A cheap option is to replace every < and > with placeholder characters, restore them only in "good" contexts (around allowed tags), and escape whatever placeholders remain:

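const allowedTags = ['strong', 'em', 'p'];

let text = '<strong>Foobar</strong> <some other whatever> <b>??</b> <em>hey</em>';

text = text
  // Swap every angle bracket for a control-character placeholder
  // (assumes the input never contains \x01 or \x02 itself).
  .replace(/</g, '\x01')
  .replace(/>/g, '\x02')
  // Restore real brackets around bare allowed tags (no attributes).
  .replace(new RegExp('\x01(/?)(' + allowedTags.join('|') + ')\x02', 'g'), '<$1$2>')
  // Any placeholder still left belongs to unwanted content, so escape it.
  .replace(/\x01/g, '&lt;')
  .replace(/\x02/g, '&gt;');

console.log(text);
// <strong>Foobar</strong> &lt;some other whatever&gt; &lt;b&gt;??&lt;/b&gt; <em>hey</em>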
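A possible single-pass variation on the placeholder idea, sketched under a few assumptions: the allow-list is the one from the question (b, i, strong, br) held in a Set, allowed tags must carry no attributes, and <br> may also be written <br/> or <br />. Because it uses a replace callback instead of placeholder characters, it does not rely on the input being free of \x01/\x02, and it also escapes stray < or > that are not part of any tag. The names ALLOWED_TAGS, TOKEN and escapeUnknownTags are hypothetical.

```javascript
// Hypothetical allow-list, taken from the question.
const ALLOWED_TAGS = new Set(['b', 'i', 'strong', 'br']);

// Either a bare tag such as <b>, </b>, <br/> or <br /> (no attributes),
// or any single angle bracket that is not part of such a tag.
const TOKEN = /<(\/?)([a-z][a-z0-9]*)\s*(\/?)>|[<>]/gi;

function escapeUnknownTags(input, allowed = ALLOWED_TAGS) {
  return input.replace(TOKEN, (match, closing, name, selfClosing) => {
    // `name` is undefined when the lone-bracket alternative matched.
    const isAllowed = name !== undefined && allowed.has(name.toLowerCase());
    // Keep allowed tags, except malformed ones like </br/>.
    if (isAllowed && !(closing && selfClosing)) {
      return match;
    }
    return match.replace(/</g, '&lt;').replace(/>/g, '&gt;');
  });
}

console.log(escapeUnknownTags('<strong>Foobar</strong> <some other whatever>'));
// <strong>Foobar</strong> &lt;some other whatever&gt;
```

Since the pattern only accepts attribute-free tags, something like <b class="x"> is escaped rather than kept; loosen it only if the allowed tags really need attributes.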

A less cheap but more correct solution is to use an (event-driven) HTML parser and escape unwanted content as you go.
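To give a rough idea of the parser-based route, here is a deliberately tiny SAX-style tokenizer written by hand for illustration. It is an assumption, not a real library and nowhere near a spec-compliant HTML parser (no comments, no > inside quoted attribute values, no nesting checks). It scans the string once and calls back with text and tag events; the handler re-emits allow-listed tags with their attributes stripped (so <br/> becomes <br>) and escapes everything else. The names tokenize, escapeText and sanitize are hypothetical.

```javascript
// Hypothetical, simplified SAX-style tokenizer. NOT a spec-compliant HTML parser:
// it ignores comments, CDATA, and '>' inside quoted attribute values.
function tokenize(input, handler) {
  // Sticky (y) regex: matches a tag only if it starts exactly at lastIndex.
  const tagPattern = /<(\/?)([A-Za-z][A-Za-z0-9-]*)([^<>]*)>/y;
  let i = 0;
  let textStart = 0;
  while (i < input.length) {
    if (input[i] === '<') {
      tagPattern.lastIndex = i;
      const m = tagPattern.exec(input);
      if (m) {
        // Flush any text seen before this tag, then emit the tag event.
        if (textStart < i) handler.text(input.slice(textStart, i));
        handler.tag(m[0], m[2].toLowerCase(), m[1] === '/');
        i += m[0].length;
        textStart = i;
        continue;
      }
    }
    i++;
  }
  if (textStart < input.length) handler.text(input.slice(textStart));
}

const escapeText = (s) => s.replace(/</g, '&lt;').replace(/>/g, '&gt;');

// Keep allow-listed tags (attributes dropped); escape all other tags and text.
function sanitize(input, allowed = new Set(['b', 'i', 'strong', 'br'])) {
  const out = [];
  tokenize(input, {
    text: (s) => out.push(escapeText(s)),
    tag: (raw, name, isClosing) =>
      out.push(allowed.has(name) ? `<${isClosing ? '/' : ''}${name}>` : escapeText(raw)),
  });
  return out.join('');
}

console.log(sanitize('<strong>Foobar</strong> <some other whatever>'));
// <strong>Foobar</strong> &lt;some other whatever&gt;
```

In practice you would plug the same allow-list logic into the callbacks of an established event-driven parser rather than maintain a tokenizer yourself.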
