I have broken XML like this:

<root>
   <Abc Dfg Xyz>data data data</Abc Dfg Xyz>
   <Kmn fsd>data data</Kmn fsd>
   <Aa bb/>
</root>    

How can I use Regex.Replace to replace whitespace with underscores in the node names, so that the XML becomes well-formed, while leaving the whitespace in the data untouched?

I need a document like this:

<root>
   <Abc_Dfg_Xyz>data data data</Abc_Dfg_Xyz>
   <Kmn_fsd>data data</Kmn_fsd>
   <Aa_bb/>
</root>

Thanks in advance.

  • Ever read this answer about parsing XML with regex? It is actually about HTML, but I guess the same applies to your case. Commented Jul 29, 2013 at 22:06
  • Bottom line is: don't do it. Commented Jul 29, 2013 at 22:08
  • How did you end up with broken XML? Fixing the source that generates the XML is probably easier than fixing the broken XML. Commented Jul 29, 2013 at 22:54

1 Answer

It isn't a good idea to parse XML with regexes unless you understand your data. I would argue that in some limited cases it can be very helpful. @HighCore, see this answer to the same question.

We're not trying to understand all possible input in the world—we're trying to make something that works in a specific case. So, if you know that your input doesn't have < or > in the data, only in the node names, you can use a regex.

In C#, use a MatchEvaluator like so:

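using System.Text.RegularExpressions;

class MyReplacer {
   // Called once per matched tag; swaps the spaces inside it for underscores.
   public string ReplaceSpaces(Match m)
   {
        return m.Value.Replace(" ", "_");
   }
}

void replacingMethod() {

   ...

   // Match one tag at a time. A greedy "<.*>" would run from the first "<"
   // to the last ">" on a line and change the spaces in the data as well.
   Regex re = new Regex("<[^<>]*>");

   MyReplacer r = new MyReplacer();
   // Assign the replace method to the MatchEvaluator delegate.
   MatchEvaluator myEvaluator = new MatchEvaluator(r.ReplaceSpaces);

   // Replace matched characters using the delegate method.
   sInput = re.Replace(sInput, myEvaluator);
}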
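
For anyone who wants something to paste and try, here is a minimal self-contained sketch of the same idea. The names TagNameFixer, FixTagNames and IsWellFormed are made up for illustration and are not part of any library. It assumes, as stated above, that the data never contains < or >, and also that the tags carry no attributes (a space before an attribute would be turned into an underscore too). The pattern <[^<>]*> matches a single tag per match; a greedy <.*> can run from the first < to the last > on a line and would touch the data in between.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class TagNameFixer
{
    // One tag per match: "<", then any characters except "<" or ">", then ">".
    // Assumption: the data never contains "<" or ">" and tags have no attributes.
    private static readonly Regex Tag = new Regex("<[^<>]*>");

    // Replaces the spaces inside every tag with underscores.
    // Text between tags is never matched, so it is left unchanged.
    public static string FixTagNames(string input)
    {
        return Tag.Replace(input, m => m.Value.Replace(" ", "_"));
    }

    // Sanity check: returns true if the string parses as XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Calling TagNameFixer.FixTagNames on the document from the question should give the desired output, and IsWellFormed is a cheap way to confirm the result actually parses before passing it on.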

1 Comment

+1 - In most scenarios XML and HTML shouldn't be parsed with Regex. However I agree that this case is specific enough to warrant using regex (assuming OP has given all the information). The string in OP's case is no longer XML, it just looks like XML.
