I want to get rid of ':' within the XML elements tags only, using regex in C#.

I am aware that parsing the document is the way to go instead of regex, but this is a legacy project that uses Regex to replace the XML document content. It is not the ideal way to process an XML document, but there is nothing I can do about it.

I am not good with regular expressions and can't figure out how to replace ':' only in the element tags and not in the values.

For example <tag:name> the value with the tag http://www.example.com </tag:name>

I want to replace ':' with '_' only within the element name and not in the value. So the outcome should be:

<tag_name> the value with the tag http://www.example.com </tag_name>

Any idea?

Thanks!

2 Answers

This needle should do what you want:

<[^>]*(:)[^>]*>

The first capture group will contain the colon matched inside the tag. To do a replacement, replace (<[^>]*)(:)([^>]*>) with $1_$3, where $1 and $3 are the sub-patterns before and after the colon.
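As a rough illustration of the replacement described above, here is a minimal C# sketch. The class and method names (`LastColonDemo`, `ReplaceLastColonPerTag`) are hypothetical and only there to make the snippet self-contained. Because `[^>]*` is greedy, each match rewrites only the last colon found between `<` and `>`, which may sit in an attribute value rather than in the element name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastColonDemo
{
    // Hypothetical helper: applies the answer's pattern once per tag.
    // Group 1 = text before the colon, group 3 = text after it.
    public static string ReplaceLastColonPerTag(string content)
    {
        return Regex.Replace(content, @"(<[^>]*)(:)([^>]*>)", "$1_$3");
    }
}
```

For the question's sample, `LastColonDemo.ReplaceLastColonPerTag("<tag:name> the value with the tag http://www.example.com </tag:name>")` gives `<tag_name> the value with the tag http://www.example.com </tag_name>`. A tag such as `<a:b href="http://x">` becomes `<a:b href="http_//x">` instead, so this only suits input where tags carry no other colons.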


6 Comments

Thanks for the reply. I am afraid it doesn't work as expected. I tried to replace using Regex.Replace(content, "(<[^>]*)(:)([^>]*>)", "_"), but it also replaces all the empty tags with '_'
I believe you want, Regex.Replace(content, @"(<[^>]*)(:)([^>]*>)", "$1_$3").
Ah, my bad. Still not perfect though :| My XML file has an element, for example, <Assets:POS1_filedetails></Assets:POS1_filedetails>, and that doesn't convert to <Assets_POS1_filedetails></Assets_POS1_filedetails>
frb, that works for me, in .net: [regex]::replace('<Assets:POS1_filedetails></Assets:POS1_filedetails>','(<[^>]*)(:)([^>]*>)','$1_$3')
@Sharon, I think it should work there. If there were two or more colons in one tag, it would just replace the last one. (I don't think that's normal in XML, but I'm not sure.)

Does this work for you?

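// Match every span between '<' and '>' (tags, but also comments and similar markup).
Regex tagRegex = new Regex("<[^>]+>");
yourXML = tagRegex.Replace(yourXML, delegate(Match thisMatch)
{
   // Replace every colon in the matched span, including any in attributes.
   return thisMatch.Value.Replace(":", "_");
});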

2 Comments

It will actually replace colons appearing anywhere in a start tag, end tag, comment, or processing instruction, in text content if delimited by a CDATA section, and in various places in a DTD - including for example colons in an attribute name or attribute value, colons in a comment, or colons in the URI referring to an external DTD. Yes, you are right - regexes are not the way to do this.
@mikel, maybe if you change your needle to something like <\s*[^\s=>]+, it would be more exclusive. That way it won't match attributes. You could go a bit further and say it can't start with !, <\s*[^!][^\s=>]+
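The narrower needle suggested in the comment above can be combined with the match-evaluator approach. The following is only a sketch under stated assumptions: the class and method names (`TagNameColonFixer`, `ReplaceColons`) are hypothetical, and the pattern is a variation on the commenter's idea rather than the exact one proposed. It matches `<` or `</` followed by a name that starts with a letter or underscore, and stops at the first whitespace, `/`, or `>`, so attributes, comments (`<!--`), and processing instructions (`<?`) are not matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagNameColonFixer
{
    // Assumption: element names start with a letter or underscore.
    // The match covers only "<name" or "</name", never attributes or text.
    private static readonly Regex ElementName =
        new Regex(@"</?[\p{L}_][^\s/>]*");

    // Hypothetical helper: replaces colons in element names only.
    public static string ReplaceColons(string xml)
    {
        return ElementName.Replace(xml, m => m.Value.Replace(":", "_"));
    }
}
```

With this, `TagNameColonFixer.ReplaceColons("<tag:name attr=\"a:b\"> http://www.example.com </tag:name>")` yields `<tag_name attr="a:b"> http://www.example.com </tag_name>`. It is still a text-level heuristic: tag-like text inside a CDATA section or a comment would be rewritten too, and attribute names such as `xmlns:tag` are left untouched.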
