I made this regex:

/\<+[a-zA-Z0-9\=\"\s]+\>+.+\<\/+[a-zA-Z0-9]+\>/gi

It matches a complete HTML element such as:

<p>this is a paragraph</p>

The problem is that it matches nested elements as a single match:

<div><p>this is a paragraph</p></div>

I would like to get each HTML element as a separate match.

Note: The HTML tags are in a string not in the DOM.

Before trying a regex, I created a new div element and set the string as its innerHTML, but that didn't work properly and I don't really know why.

So I'm looking for a regex solution to this single-match problem.

Thanks

  • Show your innerHTML attempt. Commented Apr 3, 2011 at 18:49
  • I thought somebody would ask me for it :D But I'm really curious how my current question can be solved :) Commented Apr 3, 2011 at 18:51
  • You can't parse HTML with regex. Using the browser's existing parser through innerHTML (or some similar mechanism) is actually the right solution. Commented Apr 4, 2011 at 14:15

2 Answers

Replacing the inner +.+ with +[^<]+ would prevent it from matching the whole string, but regular expressions are not the correct choice for processing strings that contain nested components. For that you should be using a parser.

Regular expressions are simply the wrong tool for the job here.
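
To illustrate the substitution described above, here is a minimal sketch that runs the asker's pattern before and after swapping the inner .+ for [^<]+. The variable names and the second sample string are made up for the illustration. It suggests why the change only goes so far: sibling elements separate cleanly, but for nested markup only the innermost element is reported and the wrapping div is never matched.

```javascript
// The asker's original pattern (redundant escapes dropped).
const original = /<+[a-zA-Z0-9="\s]+>+.+<\/+[a-zA-Z0-9]+>/gi;

// Same pattern with the inner `.+` replaced by `[^<]+`, so the text
// between the tags can no longer run across another tag.
const restricted = /<+[a-zA-Z0-9="\s]+>+[^<]+<\/+[a-zA-Z0-9]+>/gi;

// Hypothetical sample inputs for this illustration.
const siblings = '<p>one</p><p>two</p>';
const nested = '<div><p>this is a paragraph</p></div>';

const greedyMatches = siblings.match(original);
const siblingMatches = siblings.match(restricted);
const nestedMatches = nested.match(restricted);

console.log(greedyMatches);  // [ '<p>one</p><p>two</p>' ]
console.log(siblingMatches); // [ '<p>one</p>', '<p>two</p>' ]
console.log(nestedMatches);  // [ '<p>this is a paragraph</p>' ]
```

The last result is the nesting problem in miniature: the outer div contains a tag rather than plain text, so the restricted pattern skips it entirely.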

Regular expressions are not appropriate for handling HTML. As you mention, the HTML is not part of the DOM:

Note: The HTML tags are in a string not in the DOM.

You can use jQuery to build an object from the HTML string and use DOM selectors and traversal to work with it:

$(myHTMLString).find('p')...
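
If jQuery is not available, the same idea can be sketched with the browser's built-in DOMParser, which is a swapped-in alternative and not what this answer originally used. The helper name collectElements is hypothetical; it relies only on the standard children and outerHTML properties and returns every element, outer ones included, as its own string.

```javascript
// Walk an element tree depth-first and collect each element's markup.
// `collectElements` is a made-up helper name for this sketch.
function collectElements(root) {
  const found = [];
  for (const child of Array.from(root.children)) {
    found.push(child.outerHTML);            // the element itself
    found.push(...collectElements(child));  // then its descendants
  }
  return found;
}

// Browser-only usage (DOMParser is not defined in plain Node.js):
// const doc = new DOMParser().parseFromString(
//   '<div><p>this is a paragraph</p></div>', 'text/html');
// collectElements(doc.body);
// → ['<div><p>this is a paragraph</p></div>', '<p>this is a paragraph</p>']
```

Because the browser does the parsing, nesting depth and attributes stop being a concern, which is the part the regex could not handle.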
