0

I need a good way, in C#, to find (NULL) inside the tag names of an XML file and replace it with the word BAD.

For example:

<GC5_(NULL) DIRTY="False"></GC5_(NULL)>

should be replaced with

<GC5_BAD DIRTY="False"></GC5_BAD>

Part of the problem is that I have no control over the original XML; I just need to fix it once I receive it. The second problem is that (NULL) can appear in zero, one, or many tags. It seems to depend on whether users fill in additional fields. So I might get

<GC5_(NULL) DIRTY="False"></GC5_(NULL)>

or

<MH_OTHSECTION_TXT_(NULL) DIRTY="False"></MH_OTHSECTION_TXT_(NULL)>

or

<LCDATA_(NULL) DIRTY="False"></LCDATA_(NULL)>

I am a newbie to C# and programming.

EDIT: I have come up with the following function that, while not pretty, works so far.

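// Requires: using System; using System.Collections.Generic;
//           using System.Text; using System.Text.RegularExpressions;
public static string CleanInvalidXmlChars(string fileText)
{
    // Strip control characters that are not allowed in XML 1.0.
    List<char> charsToSubstitute = new List<char>();
    charsToSubstitute.Add((char)0x19);
    charsToSubstitute.Add((char)0x1C);
    charsToSubstitute.Add((char)0x1D);
    foreach (char c in charsToSubstitute)
        fileText = fileText.Replace(Convert.ToString(c), string.Empty);

    // Strip references to disallowed characters and rename bare (null) tags.
    StringBuilder b = new StringBuilder(fileText);
    b.Replace("&#x0;", string.Empty);
    b.Replace("&#x1C;", string.Empty);
    b.Replace("<(null)", "<BAD");
    b.Replace("(null)>", "BAD>");

    // Replace _(NULL) inside tags.
    Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.+?)>");
    String result = nullMatch.Replace(b.ToString(), "<$1_BAD$2>");

    // Catch any remaining (NULL), including ones the regex above missed.
    result = result.Replace("(NULL)", "BAD");

    return result;
}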

I have only been able to find 6 or 7 bad XML files to test this code on, but it has worked on each of them and not removed good data. I appreciate the feedback and your time.

3
  • 3
    Why aren't you reading the file as text, replacing all (NULL) with BAD, and writing the text back to the file? Commented Jun 1, 2018 at 14:43
  • 1
    Is there a specific reason why a simple string.Replace wouldn't work, as in contentOfXml.Replace("(NULL)", "BAD")? Commented Jun 1, 2018 at 14:44
  • You can't parse that with a conventional XML parser, because it's not valid XML: a ( can't be included in a name. You'll have to do this with string manipulation. Commented Jun 1, 2018 at 14:53

2 Answers

2

In general, regular expressions are not the right way of handling XML files. There's a range of solutions to handle XML files correctly - you can read up on System.Xml.Linq for a good start. If you're a newbie, it's certainly something you should learn at some point. As Ed Plunkett pointed out in the comments, though, your XML is not actually XML: ( and ) characters are not allowed in XML element names.

Since you will have to do it as an operation on a string, Corak's comment to use

contentOfXml.Replace("(NULL)", "BAD");

may be a good idea, but it will also change (NULL) wherever it appears in attribute values or element text, not only in element names.
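A minimal sketch of that plain-text approach, assuming the file is small enough to read into memory in one go. The class name XmlNullFixer and the methods FixText and FixFile are hypothetical names made up for illustration, not part of any library.

```csharp
using System.IO;

public static class XmlNullFixer
{
    // Replaces every literal "(NULL)" with "BAD", wherever it occurs
    // (tag names, attribute values and element text alike).
    public static string FixText(string xmlText)
    {
        return xmlText.Replace("(NULL)", "BAD");
    }

    // Reads the file as plain text, fixes it, and writes the result to outputPath.
    public static void FixFile(string inputPath, string outputPath)
    {
        string text = File.ReadAllText(inputPath);
        File.WriteAllText(outputPath, FixText(text));
    }
}
```

Note that FixText would also turn an attribute such as NOTE="(NULL)" into NOTE="BAD", which is exactly the caveat described above.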

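If you want a regex approach, this might work reasonably well, though I'm not sure it covers every edge case:

// [^<>\s]* (rather than [^_]*) lets names with several underscores,
// such as MH_OTHSECTION_TXT_(NULL), match as well.
var regex = new Regex(@"(<\/?[^<>\s]*_)\(NULL\)([^>]*>)");
var result = regex.Replace(contentOfXml, "$1BAD$2");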
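As a hedged sketch of how the two ideas could be combined: the variant below only touches (NULL) when it sits inside a tag name, then hands the result to System.Xml.Linq to confirm it is now well-formed. The class TagNameFixer, its methods, and the exact pattern are assumptions for illustration, and it has only been reasoned through against the sample tags in the question.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class TagNameFixer
{
    // Matches "(NULL)" only when it directly follows "<" or "</" plus name
    // characters, so attribute values and element text are left alone.
    private static readonly Regex NullInName =
        new Regex(@"(</?[^<>\s]*)\(NULL\)");

    public static string Fix(string xmlText)
    {
        string previous;
        do
        {
            // Repeat until nothing changes, in case one name contains
            // "(NULL)" more than once.
            previous = xmlText;
            xmlText = NullInName.Replace(xmlText, "${1}BAD");
        } while (xmlText != previous);
        return xmlText;
    }

    // Throws System.Xml.XmlException if the fixed text is still not well-formed.
    public static XDocument FixAndParse(string xmlText)
    {
        return XDocument.Parse(Fix(xmlText));
    }
}
```

Calling TagNameFixer.Fix on the MH_OTHSECTION_TXT_(NULL) example from the question gives <MH_OTHSECTION_TXT_BAD DIRTY="False"></MH_OTHSECTION_TXT_BAD>, while a (NULL) inside an attribute value or element text is left as it was.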

0

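Would it work for you to read the XML as a string and perform a regex replacement? For example:

// [^<>] keeps each match inside a single tag; ([^<>]*) may be empty,
// so closing tags such as </GC5_(NULL)> are matched too.
Regex nullMatch = new Regex("<([^<>]+?)_\\(NULL\\)([^<>]*)>");
String processedXmlString = nullMatch.Replace(originalXmlString, "<$1_BAD$2>");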

1 Comment

Thank you both, that is very helpful information. We have let the developer of the software who creates the XML know of the issue with the (null) but it does not look like they are going to fix it any time soon. This should help me create a filter to fix these bad XML files once we receive them. (Mainly because I am a tech and at the moment have to do it by hand)
