0

I have a large string that may contain any of the following:

<div id="Specs" class="plinks">
<div id="Specs" class="plinks2">
<div id="Specs" class="sdfsf">
<div id="Specs" class="ANY-OTHER_NAME">

How can I replace any of the above with:

<div id="Specs" class="">

This is what I came up with, but it does not work:

        string source = "bunch of text";
        string regex = "<div id=\"Specs\" class=[\"']([^\"']*)[\"']>";
        string regexReplaceTo = "<div id=\"Specs\" class=\"\">";
        string output = Regex.Replace(source, regex, regexReplaceTo); 
1
  • The funny thing is that it does work! I was testing it against the wrong source string. Thanks, everyone, for your help! Commented Mar 20, 2012 at 14:25

3 Answers

4

What about...

  • Regex to match: class=\"[A-Za-z0-9_\-]+\"
  • Replace with: class=\"\"

This way, we ignore the first part (id="Specs", etc) and just replace the class name... with nothing.
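
As a rough sketch of how that suggestion might look in C#, assuming the pattern is used exactly as written above (the class name ClassAttributeCleaner and the method ClearClassValues are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassAttributeCleaner
{
    // Pattern from the answer: a double-quoted class value made of letters,
    // digits, underscores, or hyphens (at least one character).
    private static readonly Regex ClassValue =
        new Regex("class=\"[A-Za-z0-9_\\-]+\"");

    // Hypothetical helper: blanks the value of every class attribute the pattern matches.
    public static string ClearClassValues(string html)
    {
        return ClassValue.Replace(html, "class=\"\"");
    }

    public static void Main()
    {
        string source = "<div id=\"Specs\" class=\"plinks2\"><p class=\"note\">text</p></div>";
        Console.WriteLine(ClearClassValues(source));
    }
}
```

Note that the pattern is not tied to the Specs div, so in this sample the class on the nested p element is blanked as well. It also skips single-quoted values and values containing spaces.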


2 Comments

But what if he only wants to clear the class attributes of #Specs divs? I could be wrong, but I presume he does.
Well, if that is the case, you could add the leading part as well, I suppose, like: id=\"Specs\" class=\"[A-Za-z0-9_\-]+\"
4

Looks like another case of http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html. What happens to the following valid tags with a Regex?

<div class="reversed" id="Specs">            
<div  id="Specs"  class="additionalSpaces" >     
<div id="Specs" class="additionalAttributes" style="" >
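
A quick way to check is to run the pattern from the question against each tag. The following is only a sketch; the class name StrictPatternCheck and the method Matches are invented for the example, while the pattern itself is copied from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictPatternCheck
{
    // The pattern from the question, unchanged.
    private const string Pattern = "<div id=\"Specs\" class=[\"']([^\"']*)[\"']>";

    // Hypothetical helper: true if the question's pattern finds a match in the tag.
    public static bool Matches(string tag)
    {
        return Regex.IsMatch(tag, Pattern);
    }

    public static void Main()
    {
        string[] tags =
        {
            "<div id=\"Specs\" class=\"plinks\">",
            "<div class=\"reversed\" id=\"Specs\">",
            "<div  id=\"Specs\"  class=\"additionalSpaces\" >",
            "<div id=\"Specs\" class=\"additionalAttributes\" style=\"\" >"
        };
        foreach (string tag in tags)
        {
            Console.WriteLine("{0} => {1}", Matches(tag), tag);
        }
    }
}
```

Only the first tag, which has exactly the layout the pattern expects, is matched. The three variants above are skipped, so Regex.Replace would leave them unchanged.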

I don't see how using LINQ to XML wouldn't work with any combination:

// Requires: using System.Linq; using System.Xml.Linq;
XElement root = XElement.Parse(xml); // or XDocument.Load(xmlFile).Root
// Select every div whose id is "Specs" and that has a class attribute.
var specsDivs = root.Descendants("div")
                    .Where(e => (string)e.Attribute("id") == "Specs"
                                && e.Attribute("class") != null);
foreach (var div in specsDivs)
{
    // Keep the attribute, but blank its value.
    div.Attribute("class").Value = string.Empty;
}
string newXml = root.ToString();


2

If your input isn't XML compliant, which most HTML isn't, then you can use the HTML Agility Pack to parse the HTML and manipulate the contents. With the HTML Agility Pack, combined with LINQ or XPath, the order of your attributes no longer matters (which it does when you use a regex), and the solution becomes much more robust.

Using the HTML Agility Pack (project page, NuGet), this does the trick:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("your html here"); 
// or doc.Load(stream);

var nodes = doc.DocumentNode.Descendants("div").Where(div => div.Id == "Specs");

foreach (var node in nodes)
{
    var classAttribute = node.Attributes["class"];
    if (classAttribute != null)
    {
        classAttribute.Value = string.Empty;
    }
}

var fixedText = doc.DocumentNode.OuterHtml;
//doc.Save(/* stream */);

