I need to figure out a good way, using C#, to find (NULL) in the tag names of an XML file and replace it with the word BAD.
For example:
<GC5_(NULL) DIRTY="False"></GC5_(NULL)>
should be replaced with
<GC5_BAD DIRTY="False"></GC5_BAD>
Part of the problem is that I have no control over the original XML; I just need to fix it once I receive it. The second problem is that (NULL) can appear in zero, one, or many tags. It appears to depend on whether or not users fill in additional fields. So I might get
<GC5_(NULL) DIRTY="False"></GC5_(NULL)>
or
<MH_OTHSECTION_TXT_(NULL) DIRTY="False"></MH_OTHSECTION_TXT_(NULL)>
or
<LCDATA_(NULL) DIRTY="False"></LCDATA_(NULL)>
I am a newbie to C# and programming.
EDIT: I have come up with the following function, which, while not pretty, works so far.
// Requires: System, System.Collections.Generic, System.Text, System.Text.RegularExpressions
public static string CleanInvalidXmlChars(string fileText)
{
    // Control characters that are not allowed in XML 1.0.
    List<char> charsToSubstitute = new List<char>();
    charsToSubstitute.Add((char)0x19);
    charsToSubstitute.Add((char)0x1C);
    charsToSubstitute.Add((char)0x1D);
    foreach (char c in charsToSubstitute)
        fileText = fileText.Replace(Convert.ToString(c), string.Empty);

    StringBuilder b = new StringBuilder(fileText);
    b.Replace("�", string.Empty);
    b.Replace("", string.Empty);
    b.Replace("<(null)", "<BAD");
    b.Replace("(null)>", "BAD>");

    // Replace (NULL) inside tags first, then catch any remaining occurrences.
    Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.+?)>");
    string result = nullMatch.Replace(b.ToString(), "<$1_BAD$2>");
    result = result.Replace("(NULL)", "BAD");
    return result;
}
I have only been able to find 6 or 7 bad XML files to test this code on, but it has worked on each of them and not removed good data. I appreciate the feedback and your time.
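If removing good data is the main worry, one option is to restrict the substitution to tag names, so that element text or attribute values that happen to contain (NULL) are left alone. The following is only a minimal sketch: TagNameCleaner and ReplaceNullInTagNames are hypothetical names, and it assumes that only element names need fixing (attribute names are not handled).

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the function above.
public static class TagNameCleaner
{
    // Matches "<" or "</" followed by a tag name, stopping at whitespace, "/", "<" or ">".
    // The first name character may not be "!" or "?", which skips comments,
    // CDATA sections and processing instructions.
    private static readonly Regex TagName = new Regex(@"</?[^\s<>/!?][^\s<>/]*");

    public static string ReplaceNullInTagNames(string xml)
    {
        // Replace every (NULL), in any letter case, but only inside the matched tag name.
        return TagName.Replace(
            xml,
            m => Regex.Replace(m.Value, @"\(NULL\)", "BAD", RegexOptions.IgnoreCase));
    }
}
```

Calling TagNameCleaner.ReplaceNullInTagNames(fileText) rewrites the opening and closing tags but does not touch the text between them.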
contentOfXml.Replace("(NULL)", "BAD")
The ( character can't be included in an XML name, so an XML parser will reject the file as it stands. You'll have to do this with string manipulation.
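The string-manipulation approach can be sketched roughly as follows: fix the raw text first, then hand it to a parser, which also confirms that the result is well-formed. NullTagFixer and LoadRepaired are hypothetical names used for illustration, and the sketch assumes the whole file fits comfortably in memory as a string.

```csharp
using System.Xml.Linq;

// Hypothetical helper illustrating "repair the text, then parse".
public static class NullTagFixer
{
    public static XDocument LoadRepaired(string rawXml)
    {
        // Plain string replacement; this touches every "(NULL)", not only tag names.
        string repaired = rawXml.Replace("(NULL)", "BAD");

        // XDocument.Parse throws XmlException if the text is still not well-formed.
        return XDocument.Parse(repaired);
    }
}
```

For a file on disk, something like NullTagFixer.LoadRepaired(File.ReadAllText(path)) would do the read and repair in one step, where path is whatever location the file arrives at.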