Say I have

var string = 
"<h1>Header</h1>
<p>this is a small paragraph</p>
<ul>
    <li>list element 1.</li>
    <li>list element 2.</li>
    <li>list element 3. With a small update.</li>
</ul>"
//newlines for clarity only

How can I split this string using JavaScript so that I get

var array = string.split(/*...something here*/)

array = [
"<h1>Header</h1>",
"<p>this is a small paragraph</p>",
"<ul><li>list element 1.</li><li>list element 2.</li><li>list element 3. With a small update.</li></ul>"
]

I only want to split at the top-level HTML elements, not at their children.

3 Answers

You could do something like this:

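var string = '<div><p></p></div><h1></h1>';
// $(string) parses the markup; each top-level node becomes one entry
var elements = $(string).map(function() {
    return $('<div>').append(this).html();  // Basically the `outerHTML` property
}).get();  // .get() converts the jQuery object into a plain array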

And the result, for the string from the question:

["<h1>Header</h1>", "<p>this is a small paragraph</p>", "<ul>    <li>list element 1.</li>    <li>list element 2.</li>    <li>list element 3. With a small update.</li></ul>"]
5 Comments

I know I added the jQuery tag, but is there a way of doing it without jQuery, and even in non-browser JavaScript?
@EoinMurray: Non-browser JavaScript?
Like in Node.js. I vaguely remember solutions where people would attach the string to the DOM and then read the children; I mean without using the DOM.
@EoinMurray: You can run jQuery in Node without any problems.
What I don't like about this answer is that if the string contains an img.src then just by parsing it with jquery it makes a request.

A performant solution (http://jsperf.com/spliting-html):

var splitter = document.createElement('div'),
  text = splitter.innerHTML = "<h1>Header</h1>\
<p>this is a small paragraph</p>\
<ul>\
    <li>list element 1.</li>\
    <li>list element 2.</li>\
    <li>list element 3. With a small update.</li>\
</ul>",
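  parts = splitter.children,  // live HTMLCollection of the top-level elements only
  // outerHTML keeps each element's own tags; innerHTML would drop them
  array = Array.prototype.map.call(parts, function (el) { return el.outerHTML; });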

You can't do this with regular expressions. Your regular expression will fail if you have several nested elements of the same type, e.g.

<div>
  <div>
    <div>
    </div>
  </div>
</div>

This is because regular expressions can only recognize regular languages, whereas HTML with arbitrarily nested tags is at least context-free, and context-free languages are strictly more expressive than regular ones.

See also: https://stackoverflow.com/a/1732454/2170192
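
To make the nesting argument concrete: matching balanced tags needs a counter of currently open elements, which a plain regular expression does not have. The following is a minimal sketch of that idea, not a real HTML parser. It uses no DOM, so it should also run in Node.js. The names `splitTopLevel` and `VOID_TAGS` are hypothetical, and the code assumes well-formed markup in which every non-void element is explicitly closed, with no comments, no `>` inside attribute values, and no script or style content that contains tags.

```javascript
// Elements that never have a closing tag, so they must not increase the depth.
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical helper: returns the outer HTML of each top-level element.
function splitTopLevel(html) {
  // Group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for <x/>.
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*?(\/?)>/g;
  const pieces = [];
  let depth = 0; // number of elements currently open
  let start = 0; // index where the current top-level element began
  let match;

  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === "/";
    const isVoid = match[3] === "/" || VOID_TAGS.has(match[2].toLowerCase());

    if (isClosing) {
      // Ignore stray closing tags that have no matching opener.
      if (depth > 0) {
        depth -= 1;
        if (depth === 0) {
          // Back at the top level: the element that began at `start` is complete.
          pieces.push(html.slice(start, tagPattern.lastIndex));
        }
      }
    } else if (isVoid) {
      // A void element at the top level is a complete piece on its own.
      if (depth === 0) pieces.push(match[0]);
    } else {
      if (depth === 0) start = match.index;
      depth += 1;
    }
  }
  return pieces;
}

console.log(splitTopLevel("<div><div><div></div></div></div><p>x</p>"));
```

Because the depth counter only returns to zero when the outermost element closes, the three nested `div`s come back as a single piece, followed by the `p`. Text and whitespace between top-level elements are dropped.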

But if you don't have nested elements of the same type, you may split your HTML string by taking all matches returned by the following regular expression (which uses a backreference):

/<(\w+).*?<\/\1\s*>/igsm
  • <(\w+) matches a less-than sign and one or more word characters (letters, digits, underscores), capturing the word characters via parentheses (first capturing group).
  • .*? matches the contents of the element lazily, i.e. up to the first end tag with the captured name. A greedy .* would merge repeated top-level siblings such as <p>a</p><p>b</p> into one match.
  • <\/ matches the opening of the end tag.
  • \1 is the backreference, which matches exactly the sequence of characters captured by the first capturing group.
  • \s*> matches optional whitespace and the greater-than sign.
  • igsm are flags: case-insensitive, global, dot-matches-all (s, which needs an ES2018 engine) and multi-line (which has no effect here, because the pattern contains no ^ or $).
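
As a rough sketch of how such a pattern might be applied, the snippet below runs it with the lazy quantifier `.*?` against the markup from the question, using `String.prototype.match`. The names `html`, `topLevelPattern` and `pieces` are illustrative only, and the `m` flag is left out because it has no effect on this pattern.

```javascript
// The question's markup as one string (no whitespace between top-level tags).
const html =
  "<h1>Header</h1>" +
  "<p>this is a small paragraph</p>" +
  "<ul><li>list element 1.</li><li>list element 2.</li>" +
  "<li>list element 3. With a small update.</li></ul>";

// Lazy `.*?` stops at the first closing tag whose name equals the captured one.
const topLevelPattern = /<(\w+).*?<\/\1\s*>/gis;

// With the g flag, match() returns every full match, or null if there is none.
const pieces = html.match(topLevelPattern) || [];

console.log(pieces.length); // 3
console.log(pieces);
```

The limitation described above still applies: for `"<div><div></div></div>"` the same pattern returns the single, truncated match `"<div><div></div>"`, because the first `</div>` it meets closes the inner element, not the outer one.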
