The sad truth about this post is that I have poor regex skills. I recently came across some code in an old project that I seriously want to do something about. Here it is:
strDocument = strDocument.Replace("font size=""1""", "font size=0.2")
strDocument = strDocument.Replace("font size='1'", "font size=0.2")
strDocument = strDocument.Replace("font size=1", "font size=0.2")
strDocument = strDocument.Replace("font size=""2""", "font size=1.5")
strDocument = strDocument.Replace("font size='2'", "font size=1.5")
strDocument = strDocument.Replace("font size=2", "font size=1.5")
strDocument = strDocument.Replace("font size=3", "font size=2")
strDocument = strDocument.Replace("font size=""3""", "font size=2")
strDocument = strDocument.Replace("font size='3'", "font size=2")
I'm guessing there is some easy regex pattern out there that I could use to find the different ways of quoting attribute values and replace them with valid syntax. For example, if somebody wrote some HTML that looks like:
<tag attribute1=value attribute2='value' />
I'd like to be able to easily clean that tag so that it ends up looking like:
<tag attribute1="value" attribute2="value" />
The web application I'm working with is 10 years old and there are several thousand validation errors because of missing quotes and tons of other garbage, so if anybody could help me out that would be great!
EDIT:
I gave it a whirl (found some examples), and have something that will work, but would like it to be a little smarter:
Dim input As String = "<tag attribute=value attribute='value' attribute=""value"" />"
Dim test As String = "attribute=(?:(['""])(?<attribute>(?:(?!\1).)*)\1|(?<attribute>\S+))"
Dim result As String = Regex.Replace(input, test, "attribute=""$2""")
This outputs result correctly as:
<tag attribute="value" attribute="value" attribute="value" />
Is there a way I could change (and simplify!) this a bit so that it looks for any attribute name instead of one hard-coded name?
UPDATE:
Here's what I have so far based on the comments. Perhaps it could be improved even more:
Dim input As String = "<tag border=2 style='display: none' width=""100%"" />"
Dim test As String = "\s*=\s*(?:(['""])(?<g1>(?:(?!\1).)*)\1|(?<g1>\S+))"
Dim result As String = Regex.Replace(input, test, "=""$2""")
which produces:
<tag border="2" style="display: none" width="100%" />
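One caveat with the pattern above: it matches any `=` in the document, including ones in plain text outside of tags. A rough sketch of one way to tighten it is below. It assumes tags never contain a literal `<` or `>` inside an attribute value, and the names `QuoteAttributes`, `TagPattern`, `AttrPattern`, `QuoteAttribute` and `Normalize` are made up for illustration. It first finds each tag, then rewrites `name=value` pairs only inside that tag, and escapes any double quotes that were sitting inside a single-quoted value.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuoteAttributes
    ' Assumption: a tag starts with "<" plus a letter and contains no other "<" or ">".
    Private ReadOnly TagPattern As New Regex("<[A-Za-z][^<>]*>")

    ' name=value where value is double-quoted, single-quoted, or bare.
    ' A bare value stops before whitespace or before the closing ">" or "/>".
    Private ReadOnly AttrPattern As New Regex(
        "(?<name>[-\w:]+)\s*=\s*(?:""(?<val>[^""]*)""|'(?<val>[^']*)'|(?<val>[^\s""'>]+?)(?=\s|/?>))")

    ' Rebuilds one attribute with double quotes, escaping embedded double quotes.
    Private Function QuoteAttribute(m As Match) As String
        Dim value As String = m.Groups("val").Value.Replace("""", "&quot;")
        Return m.Groups("name").Value & "=""" & value & """"
    End Function

    ' Applies the attribute rewrite only to text that looks like a tag.
    Function Normalize(html As String) As String
        Return TagPattern.Replace(html,
            Function(t As Match) AttrPattern.Replace(t.Value, AddressOf QuoteAttribute))
    End Function

    Sub Main()
        Console.WriteLine(Normalize("<tag border=2 style='display: none' width=""100%"" />"))
        ' prints: <tag border="2" style="display: none" width="100%" />
    End Sub
End Module
```

Using the named groups (`name`, `val`) rather than `$2` avoids having to remember how .NET numbers named and unnamed groups. This is still a regex over HTML, so it is only a stopgap for cleaning up legacy markup, not a parser.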
Any further suggestions? Otherwise I think I answered my own question, with your help of course.
('|"") is much less efficient than the character class ['""]. That was bad advice.