For example, I would like to split this HTML string:

Lorem <b>ipsum</b> dolor <span class="abc">sit</span> amet,<br/>consectetur <input value="ok"/> adipiscing elit.

into this array:

[ 
  'Lorem ',
  '<b>ipsum</b>',
  ' dolor ', 
  '<span class="abc">sit</span>', 
  ' amet,', 
  '<br/>', 
  'consectetur ', 
  '<input value="ok"/>', 
  ' adipiscing elit.'
]

Here is my attempt so far, which matches only the HTML elements:

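// First alternative: paired tags such as <b>...</b>; second: self-closing tags such as <br/>.
const pattern = /<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)<\/\1>|<([A-Z][A-Z0-9]*).*?\/>/gi;
const html = 'Lorem <b>ipsum</b> dolor <span class="abc">sit</span> amet,<br/>consectetur <input value="ok"/> adipiscing elit.';
const nodes = html.match(pattern);

// Logs only the four elements; the text between them is skipped.
console.log(nodes);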

How can I include the text nodes as well?

  • Is <b>ipsum<b/> a typo? Did you mean <b>ipsum</b>? Commented Apr 5, 2020 at 21:16
  • Yes, <b>ipsum<b/> is a typo. Thanks. Corrected. Commented Apr 6, 2020 at 5:59

1 Answer

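If the HTML is well formed, consider using DOMParser instead: parse the string, iterate over the body's child nodes, and take each child's .outerHTML (for element nodes) or .textContent (for text nodes):

const str = `Lorem <b>ipsum</b> dolor <span class="abc">sit</span> amet,<br/>consectetur <input value="ok"/> adipiscing elit.`;

// Parse into a separate, inert document; nothing is added to the current page.
const doc = new DOMParser().parseFromString(str, 'text/html');
// childNodes includes text nodes. Text nodes have no outerHTML, so they fall back to textContent.
const arr = [...doc.body.childNodes]
  .map(child => child.outerHTML || child.textContent);
// Note: outerHTML re-serializes the markup, so <br/> comes back as <br>
// and <input value="ok"/> as <input value="ok">.
console.log(arr);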

You don't have to use DOMParser. You could also assign the string to the innerHTML of an ordinary element on the page and then read that element's childNodes, but markup inserted that way can execute arbitrary code (for example through inline event handlers such as onerror), so avoid it for untrusted input.
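If no DOM is available (for example, in Node.js without a DOM library), a regex-only fallback is possible. The sketch below is not part of the original answer: it takes the pattern from the question and adds a third alternative, `[^<]+`, which picks up runs of text between tags. The helper name `splitHtml` is hypothetical. It assumes simple input like the example: no nested tags of the same name, no stray `<` in text, and void elements written in self-closing form (`<br/>`, not `<br>`).

```javascript
// Hypothetical helper (not from the original post): splits an HTML string
// into top-level element strings and the text runs between them.
function splitHtml(html) {
  const pattern = new RegExp(
    [
      '<([A-Z][A-Z0-9]*)\\b[^>]*>(.*?)<\\/\\1>', // paired tag, e.g. <b>...</b>
      '<([A-Z][A-Z0-9]*).*?\\/>',                // self-closing tag, e.g. <br/>
      '[^<]+',                                   // text between tags
    ].join('|'),
    'gi'
  );
  // match() returns null when nothing matches, so fall back to an empty array.
  return html.match(pattern) || [];
}

const html = 'Lorem <b>ipsum</b> dolor <span class="abc">sit</span> amet,<br/>consectetur <input value="ok"/> adipiscing elit.';
console.log(splitHtml(html));
// [ 'Lorem ', '<b>ipsum</b>', ' dolor ', '<span class="abc">sit</span>',
//   ' amet,', '<br/>', 'consectetur ', '<input value="ok"/>', ' adipiscing elit.' ]
```

Unlike the DOMParser approach, this returns the original substrings unchanged, but it will misbehave on markup that breaks the assumptions above, so a real parser remains the safer choice for arbitrary HTML.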

2 Comments

Thanks @CertainPerformance! I'm looping through an array of mixed strings and applying the code above. Do you know if this will cause performance issues? Any suggestions if it does?
A snippet of code like this one is extremely unlikely to have any performance implications at all unless you're doing something odd like parsing 100 pages a second - in which case downloading would be the bottleneck instead. A decent rule of thumb is to not think much about performance unless you're in a tight loop, or know that something needs to be optimized.
