1

I have some JavaScript that takes a string of text based on a user's selection and wraps the string in a <span> tag. What I'm looking for is a regular expression that would check the string for existing HTML tags and, if any exist, break up the span so that it doesn't invalidate the HTML.

For example, let's say I have the following text:

<p>Lorem ipsum dolor sit amet, <i>consectetur adipiscing elit</i>. 
Curabitur tortor risus, facilisis vitae bibendum sit amet, mattis non dui.</p>

And the user selects "amet, <i>consectetur". The string should end up as "<span>amet, </span><i><span>consectetur</span>" as opposed to "<span>amet, <i>consectetur</span>".

1
  • 2
    It will be difficult to use regex for this job; regexes are usually a bad idea for HTML parsing. For example, I suspect that if the closing tag also falls within the selection, you want to create only one span block? In that case semantics come into play and regex will not fit. Look at HTML parsers. Commented Mar 23, 2011 at 15:12

1 Answer

3

HTML shouldn't be parsed with RegEx. See: RegEx match open tags except XHTML self-contained tags


4 Comments

Please elaborate a bit more (without quoting Bobince's now famous rant in its entirety).
I would love to avoid regex to solve this issue. What would be a better approach?
+1 for alerting the user to the problems of using RegEx with HTML. However, note that the question you reference also has an answer indicating that in certain limited cases it is reasonable to parse HTML with regular expressions. It is categorically true that you cannot use a regular expression to parse the structure of HTML. But you can use a regex to determine what is a tag and what is text. That is a very different problem. I think the poster's question in this case can, in fact, be handled by a regular expression. (Though there may be better ways to solve the problem.)
Well, the pros and cons are already in the thread mentioned, so I won't repeat them. And yes, there are cases where parsing HTML with RegEx can make sense (but it is still not recommended). I would recommend having a look at HTML DOM parsers, which seem more natural to me for this task.
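
To make the "tag versus text" distinction from the comments concrete, here is a minimal sketch of the limited regex approach: split the selected fragment into tags and text runs, then wrap only the text runs. The function name `wrapTextRuns` and its `tagName` parameter are hypothetical and not from the original post. It assumes no `>` inside attribute values and no comments or CDATA in the selection, and it does no structural parsing, so a DOM-based approach is likely still the more robust choice.

```javascript
// Hypothetical helper (not from the original post).
// Splits a selected HTML fragment into tags and text runs, then wraps
// only the text runs so that existing tags are never enclosed by the wrapper.
function wrapTextRuns(fragment, tagName = "span") {
  // The capturing group makes split() keep the matched tags in its output.
  // Assumption: no ">" inside attribute values, no comments or CDATA.
  const tokens = fragment.split(/(<[^>]+>)/);
  return tokens
    .map((token) => {
      if (token === "") return "";                 // empty pieces produced by split
      if (/^<[^>]+>$/.test(token)) return token;   // a tag: pass through unchanged
      return `<${tagName}>${token}</${tagName}>`;  // a text run: wrap it
    })
    .join("");
}

// The selection from the question:
console.log(wrapTextRuns("amet, <i>consectetur"));
// prints: <span>amet, </span><i><span>consectetur</span>
```

Because the sketch only tokenizes, it never decides whether a tag opened inside the selection is also closed inside it; every text run gets its own wrapper regardless.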
