Disclaimer:
As others have pointed out, using regex to parse non-regular languages is fraught with peril! It is best to use a dedicated parser specifically designed for the job, especially when parsing the tag soup that is HTML.
That said...
If you insist on using a regular expression, here is a C# (.NET) regex solution that will do a pretty good job:
text = Regex.Replace(text, @"
    # Change HTML element class attribute value: 'abc' to: 'xyz'.
    (                     # $1: Everything up to 'abc'.
      <\w+                # Begin (X)HTML element open tag.
      (?:                 # Match any attribute(s) preceding 'class'.
        \s+               # Whitespace required before each attribute.
        (?!class\b)       # Assert this attribute name is not 'class'.
        [\w\-.:]+         # Required attribute name.
        (?:               # Begin optional attribute value.
          \s*=\s*         # Attribute value separated by =.
          (?:             # Group for attrib value alternatives.
            ""[^""]*""    # Either a double quoted value,
          | '[^']*'       # or a single quoted value,
          | [\w\-.:]+     # or an unquoted value.
          )               # End group for attrib value alternatives.
        )?                # End optional attribute value.
      )*                  # Zero or more attributes may precede class.
      \s+                 # Whitespace required before class attribute.
      class               # Literal class attribute name.
      \s*=\s*             # Attribute value separated by =.
      (?:                 # Group for attrib value alternatives.
        ""                # Either a double quoted value.
        [^""]*?           # Zero or more classes may precede 'abc'.
      | '                 # Or a single quoted value.
        [^']*?            # Zero or more classes may precede 'abc'.
      )?                  # Or 'abc' class attrib value is unquoted.
    )                     # End $1: Everything up to 'abc'.
    (?<=['""\s=])         # Assert 'abc' not part of '123-abc'.
    abc                   # Match the 'abc' in class attribute value.
    (?=['""\s>])          # Assert 'abc' not part of 'abc-123'.",
    "$1xyz", RegexOptions.IgnorePatternWhitespace);
Example input (only the class attribute of each element open tag is shown):
class=abc ... class="abc" ... class='abc'
class = abc ... class = "abc" ... class = 'abc'
class="123 abc 456" ... class='123 abc 456'
class="123-abc abc 456-abc" ... class='123-abc abc 456-abc'
class="abc-123 abc abc-456" ... class='abc-123 abc abc-456'
Example output:
class=xyz ... class="xyz" ... class='xyz'
class = xyz ... class = "xyz" ... class = 'xyz'
class="123 xyz 456" ... class='123 xyz 456'
class="123-abc xyz 456-abc" ... class='123-abc xyz 456-abc'
class="abc-123 xyz abc-456" ... class='abc-123 xyz abc-456'
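The rows above show only the attribute, but the pattern requires <\w+ before class, so it only matches inside an element open tag. The following sketch (C# 9 or later, top-level statements) runs the same pattern, written on one line so IgnorePatternWhitespace is not needed, over the example rows wrapped in illustrative <div> open tags:

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the verbose version above, on one line. The verbose
// comments and whitespace are gone, so IgnorePatternWhitespace is not needed.
var regex = new Regex(
    @"(<\w+(?:\s+(?!class\b)[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*" +
    @"\s+class\s*=\s*(?:""[^""]*?|'[^']*?)?)(?<=['""\s=])abc(?=['""\s>])");

// The example rows, each attribute wrapped in an (illustrative) <div> open tag.
string[] inputs =
{
    "<div class=abc> <div class=\"abc\"> <div class='abc'>",
    "<div class = abc> <div class = \"abc\"> <div class = 'abc'>",
    "<div class=\"123 abc 456\"> <div class='123 abc 456'>",
    "<div class=\"123-abc abc 456-abc\"> <div class='123-abc abc 456-abc'>",
    "<div class=\"abc-123 abc abc-456\"> <div class='abc-123 abc abc-456'>",
};

foreach (var input in inputs)
{
    // Replace handles every open tag in the string; within one tag only the
    // first standalone 'abc' in the class value is rewritten.
    Console.WriteLine(regex.Replace(input, "$1xyz"));
}
```

Each printed line matches the corresponding row of the example output, with the <div ...> wrappers kept. A bare class="abc" with no open tag in front of it is left untouched.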
Note that there will always be edge cases where this solution fails: for example, evil strings within CDATA sections, comments, scripts, styles and tag attribute values can trip it up (see disclaimer above). That said, this solution does a pretty good job in many cases, but it will never be 100% reliable!
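To make that concrete, here is a small C# sketch (C# 9 or later, top-level statements) that feeds the same pattern some markup a browser would not treat as a real open tag: a tag inside a comment, a tag inside a script string literal, and a tag inside another tag's attribute value. The sample strings are made up for illustration; in each case the regex still rewrites the class.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above, written on one line.
var regex = new Regex(
    @"(<\w+(?:\s+(?!class\b)[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*" +
    @"\s+class\s*=\s*(?:""[^""]*?|'[^']*?)?)(?<=['""\s=])abc(?=['""\s>])");

// None of these contain a real element with class 'abc', but each contains
// text that looks like one, so the regex rewrites it anyway (false positives).
string inComment = "<!-- <div class=\"abc\"> -->";
string inScript = "<script>var s = '<p class=\"abc\">';</script>";
string inAttribute = "<img alt='<b class=\"abc\">'>";

Console.WriteLine(regex.Replace(inComment, "$1xyz"));   // <!-- <div class="xyz"> -->
Console.WriteLine(regex.Replace(inScript, "$1xyz"));    // <script>var s = '<p class="xyz">';</script>
Console.WriteLine(regex.Replace(inAttribute, "$1xyz")); // <img alt='<b class="xyz">'>
```

A real HTML parser would see only a comment, a script body and an alt value here, and would leave all three alone.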
Edit: 2011-10-10 14:00 MDT: Streamlined overall answer. Removed first regex solution. Modified to correctly ignore classes having similar names like abc-123 and 123-abc.
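The "similar names" fix comes from the lookbehind and lookahead around abc. One way to see what they buy is to compare the pattern with and without them. This sketch (C# 9 or later, top-level statements; the two input strings are illustrative) reuses group $1 of the pattern unchanged as a prefix:

```csharp
using System;
using System.Text.RegularExpressions;

// Group $1 of the pattern: everything up to the 'abc' in the class value.
const string Prefix =
    @"(<\w+(?:\s+(?!class\b)[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*" +
    @"\s+class\s*=\s*(?:""[^""]*?|'[^']*?)?)";

// Without lookarounds, the lazy [^"]*? stops at the first 'abc' substring,
// even when it is part of a longer class name.
var naive = new Regex(Prefix + "abc");
// With lookarounds, 'abc' must be preceded by a quote, whitespace or '='
// and followed by a quote, whitespace or '>'.
var bounded = new Regex(Prefix + @"(?<=['""\s=])abc(?=['""\s>])");

string dashBefore = "<div class=\"123-abc abc\">";
string dashAfter = "<div class=\"abc-123 abc\">";

Console.WriteLine(naive.Replace(dashBefore, "$1xyz"));   // <div class="123-xyz abc">
Console.WriteLine(bounded.Replace(dashBefore, "$1xyz")); // <div class="123-abc xyz">
Console.WriteLine(naive.Replace(dashAfter, "$1xyz"));    // <div class="xyz-123 abc">
Console.WriteLine(bounded.Replace(dashAfter, "$1xyz"));  // <div class="abc-123 xyz">
```

Because the class value is matched lazily, the lookarounds do not just reject a bad match; they force the engine to keep extending [^"]*? until it reaches a standalone abc.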
(?(abc)) do?
class attribute may be quoted using " or ' quotes and may or may not have spacing including tabs to mess with regular parsers) and the fact that the string class='abc' can appear in all sorts of contexts (plain text, etc.) - I think your particular problem can be solved purely with regexes, but will either have false positives or negatives depending upon your exact requirements or take a LOT more work than you think.
(?(abc)). I don't think that's the problem in this case, I am just curious if it is an expression new to me.