
I want to find all empty HTML tags in a string, e.g.:

<div></div>
<span>test</span>
<a></a>

and add a space or a character to all of the empty tags in that string:

<div>something</div>
<span>test</span>
<a>something</a>

I've got a regex that matches all empty tags, but I'm not sure what the best way to replace them is.

Regex:

<(\w+)(?:\s+\w+="[^"]+(?:"\$[^"]+"[^"]+)?")*>\s*</\1>

  • Tip: use HtmlAgilityPack (regex is overkill). Commented Jul 26, 2013 at 8:55
  • Regex is not overkill, it is "underkill" :) Use an HTML/XML parser and iterate over the DOM tree; this'll save you a lot of pain. Commented Jul 26, 2013 at 8:58
  • From what I've read, the Agility Pack can mess up the rest of the HTML by making "fixes" to certain tags like <img />. Commented Jul 26, 2013 at 8:59

3 Answers


Use HtmlAgilityPack

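using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // html is your input string; a fragment is fine
// SelectNodes returns null rather than an empty collection when nothing matches
var nodes = doc.DocumentNode.SelectNodes("//*") ?? Enumerable.Empty<HtmlNode>();
// elements with no child nodes at all, skipping void elements such as <br> and <img>
foreach (HtmlNode node in nodes.Where(x => !x.HasChildNodes && !HtmlNode.IsEmptyElement(x.Name)))
{
    node.AppendChild(doc.CreateTextNode("something")); // fill the tag rather than replace it
}
string result = doc.DocumentNode.OuterHtml; // or doc.Save(yourFile) to write to disk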

3 Comments

What if I don't have a complete HTML document? I only have a small part of a larger document and just want to replace some tags. Is this still the way to go? I feel like the crowd has decided that if HTML and regex are mentioned in a question, an HTML parser is the only option...
@f01 No matter what form your HTML is in, be it incomplete, missing end tags, or not even HTML at all, this parser will still work without any issues.
@f01 You could use a regex like (?<=<.*?[^/]>)(?=</.*?>) and replace the match with something, BUT I can give you 1000 cases where it would break your application.
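A minimal C# sketch of the lookaround regex from the comment above, assuming .NET's System.Text.RegularExpressions, which supports the variable-length lookbehind the pattern needs. The class and method names (LookaroundFill, Fill) are hypothetical. The last call shows one of the breaking cases the commenter warns about: the pattern matches any position between a tag ending in `>` (but not `/>`) and a following `</`, so text also lands between two consecutive closing tags.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundFill
{
    // The comment's pattern: a zero-width position preceded by something that looks like
    // an opening tag (not ending in "/>") and followed by something that looks like a closing tag.
    private static readonly Regex Gap = new Regex(@"(?<=<.*?[^/]>)(?=</.*?>)");

    // A MatchEvaluator inserts the filler verbatim, so '$' is never treated as a group reference.
    public static string Fill(string html, string filler) =>
        Gap.Replace(html, m => filler);

    public static void Main()
    {
        Console.WriteLine(Fill("<div></div>", "x"));       // <div>x</div>
        Console.WriteLine(Fill("<span>test</span>", "x")); // <span>test</span> (unchanged)
        // A failure case: the gap between </b> and </p> is matched as well.
        Console.WriteLine(Fill("<p><b></b></p>", "x"));    // <p><b>x</b>x</p>
    }
}
```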

Description

Handling this via regex is probably not the best way to go. However, there may be reasons for using a regular expression, such as "I'm not allowed to install HtmlAgilityPack", so here is an expression that will:

  • find all tags that are simply an open tag immediately followed by the matching close tag
  • avoid many of the edge cases that make pattern matching HTML with regex difficult

Regex: (<(\w+)(?=\s|>)(?:[^'">=]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>)(<\/\2>)

Replace with: $1~~~NewValue~~~$3


Example


Sample Text

Note that the first line has some really difficult edge cases.

<a onmouseover=' str=" <a></a> " ; if ( 6 > 4 ) { funDoSomething(str); } '></a>
<div></div>
<span>test</span>
<a></a>

Text After Replacement

<a onmouseover=' str=" <a></a> " ; if ( 6 > 4 ) { funDoSomething(str); } '>~~~NewValue~~~</a>
<div>~~~NewValue~~~</div>
<span>test</span>
<a>~~~NewValue~~~</a>
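In C#, the pattern and replacement above can be applied with Regex.Replace. This is a minimal sketch assuming .NET's System.Text.RegularExpressions; EmptyTagFiller and Fill are hypothetical names, and Main runs the sample text from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyTagFiller
{
    // The answer's pattern as a C# verbatim string; "" is an escaped double quote.
    // Group 1 = opening tag (with attributes), group 2 = tag name, group 3 = closing tag.
    private static readonly Regex EmptyTag = new Regex(
        @"(<(\w+)(?=\s|>)(?:[^'"">=]*|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>)(<\/\2>)");

    // Same effect as the replacement string $1~~~NewValue~~~$3, with the filler passed in.
    public static string Fill(string html, string filler) =>
        EmptyTag.Replace(html, m => m.Groups[1].Value + filler + m.Groups[3].Value);

    public static void Main()
    {
        string sample = string.Join("\n",
            "<a onmouseover=' str=\" <a></a> \" ; if ( 6 > 4 ) { funDoSomething(str); } '></a>",
            "<div></div>",
            "<span>test</span>",
            "<a></a>");
        Console.WriteLine(Fill(sample, "~~~NewValue~~~"));
    }
}
```

Using a MatchEvaluator instead of a replacement string keeps a `$` in the filler from being read as a group reference. Because the pattern nests `[^'">=]*` inside `(...)*`, a tag that almost matches can backtrack heavily, so passing a match timeout through the `Regex(string, RegexOptions, TimeSpan)` constructor may be worthwhile on untrusted input.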


Use Html Agility Pack for HTML parsing, never regex.

1 Comment

-1 because your proposed solution doesn't really provide an answer to the question. At best this is some vague direction to a rather specific request.
