
I have some legacy XML documents stored in a database as blobs, and they are not well-formed XML. I'm reading them from a SQL database and, as I am using C#/.NET, would ultimately like to load them into an XmlDocument.

When I try to do this, I get an XmlException. Having looked at the XML documents, they all fail because specific XML nodes use namespace prefixes that are never declared.

I am not concerned with any of the XML nodes which have these prefixes, so I can ignore them or throw them away. So, before I load the string into an XmlDocument, I would like to remove the prefixes from the string, so that

<tem:GetRouteID>
    <tem:PostCode>postcode</tem:PostCode>
    <tem:Type>ItemType</tem:Type>
</tem:GetRouteID>

becomes

<GetRouteID>
    <PostCode>postcode</PostCode>
    <Type>ItemType</Type>
</GetRouteID>

and this

<wsse:Security soapenv:actor="">
    <wsse:BinarySecurityToken>token</wsse:BinarySecurityToken>
</wsse:Security>

becomes this:

<Security soapenv:actor="">
    <BinarySecurityToken>token</BinarySecurityToken>
</Security>

I have one solution which does this, driven by a config setting:

<appSettings>
  <add key="STRIP_NAMESPACES" value="wsse;tem" />
</appSettings>

if (STRIP_NAMESPACES != null)
{
    string[] namespaces = Regex.Split(STRIP_NAMESPACES, ";");

    foreach (string ns in namespaces)
    {
        str2 = str2.Replace("<" + ns + ":", "<");   // Replace opening tag
        str2 = str2.Replace("</" + ns + ":", "</"); // Replace closing tag
    }
}

Ideally, though, I would like a generic approach, so that I don't have to endlessly configure the namespace prefixes I want to remove.

How can I achieve this in C#/.NET? I am assuming that a regex is the way to go here.

UPDATE 1

Ria's Regex below works well for the requirement above. However, how would I need to change the Regex to also change this

<wsse:Security soapenv:actor="">
    <BinarySecurityToken>authtoken</BinarySecurityToken>
</Security>

to this?

<Security>
    <BinarySecurityToken>authtoken</BinarySecurityToken>
</Security>

UPDATE 2

I think I've worked out the updated version myself, based on Ria's answer:

<(/?)\w+:(\w+/?) ?(\w+:\w+.*)?>
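
A quick check of that pattern, as a sketch only. The update gives just the pattern, so the "<$1$2>" replacement string and the class name here are assumptions. One thing to watch: the greedy .* runs to the last > on the line, so when several tags share a line, everything between them is swallowed. The second pattern is a hypothetical variant that swaps .* for [^>]* so the match stops at the end of the current tag (it still assumes no attribute value contains a > character).

```csharp
using System;
using System.Text.RegularExpressions;

public static class UpdateTwoCheck
{
    // Pattern from UPDATE 2; the "<$1$2>" replacement is an assumption.
    public const string Original = @"<(/?)\w+:(\w+/?) ?(\w+:\w+.*)?>";

    // Hypothetical variant: [^>]* cannot run past the end of the current tag.
    public const string Bounded = @"<(/?)\w+:(\w+/?) ?(\w+:\w+[^>]*)?>";

    public static string Strip(string xml, string pattern)
    {
        return Regex.Replace(xml, pattern, "<$1$2>");
    }

    public static void Main()
    {
        string multiLine =
            "<wsse:Security soapenv:actor=\"\">\n" +
            "    <BinarySecurityToken>authtoken</BinarySecurityToken>\n" +
            "</Security>";
        // One tag per line: the first line collapses to <Security>.
        Console.WriteLine(Strip(multiLine, Original));

        string oneLine =
            "<wsse:Security soapenv:actor=\"\"><wsse:Token>t</wsse:Token></wsse:Security>";
        // Greedy .* consumes the whole line: prints <Security>
        Console.WriteLine(Strip(oneLine, Original));
        // Bounded variant keeps the children: prints <Security><Token>t</Token></Security>
        Console.WriteLine(Strip(oneLine, Bounded));
    }
}
```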
  • I don't think it's a good idea to parse xml with regex. You can use XDocument, XElement, XmlDocument (if you use .NET 2.0). Commented Jul 31, 2012 at 9:58
  • plb - I don't think the OP is talking about parsing the XML per se using regex, more about making it compliant by editing some XML node prefixes, so that it can be read into an XmlDocument. Commented Jul 31, 2012 at 10:02
  • @jimtollan Yes, you're right. I misunderstood question. Commented Jul 31, 2012 at 10:06
  • Can the XML contain comments or CDATA? Commented Jul 31, 2012 at 10:50
  • In my specific example, the XML cannot contain comments or CDATA. However, what is the impact if it can? Commented Jul 31, 2012 at 12:17

1 Answer

UPDATE

For the new issue (namespace prefixes on attributes), try this more general solution. It has no effect on node values:

// In a verbatim string the quote is escaped by doubling it ("") rather than with a backslash.
Regex.Replace(originalXml, 
              @"((?<=</?)\w+:(?<elem>\w+)|\w+:(?<elem>\w+)(?==""))", 
              "${elem}");

Try this regex on my sample XML:

<wsse:Security soapenv:actor="dont match soapenv:actor attrib">
    <BinarySecurityToken>authtoken</BinarySecurityToken>
</Security> 
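
To see what the pattern does to that sample, here is a minimal sketch. The class and method names are made up for illustration; the pattern is the one above with the quote written as "" so that it compiles inside a verbatim string. The first alternative strips the prefix from an element name that directly follows < or </, and the second strips it from an attribute name that directly precedes =".

```csharp
using System;
using System.Text.RegularExpressions;

public static class QualifiedNameStripper
{
    // Both alternatives capture the local name into the same named group.
    private static readonly Regex PrefixedName = new Regex(
        @"(?<=</?)\w+:(?<elem>\w+)|\w+:(?<elem>\w+)(?=="")");

    // Hypothetical helper: replaces each prefixed name with its local name.
    public static string StripPrefixes(string xml)
    {
        return PrefixedName.Replace(xml, "${elem}");
    }

    public static void Main()
    {
        string sample =
            "<wsse:Security soapenv:actor=\"dont match soapenv:actor attrib\">\n" +
            "    <BinarySecurityToken>authtoken</BinarySecurityToken>\n" +
            "</Security>";

        // First line becomes:
        // <Security actor="dont match soapenv:actor attrib">
        Console.WriteLine(StripPrefixes(sample));
    }
}
```

The text inside the attribute value is left alone because it is not followed by =". Note that the attribute itself is kept (as actor="..."), not removed, and that a declaration such as xmlns:tem="..." would be rewritten to tem="..." by the second alternative.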

Try using XSLT. You can apply the stylesheet directly or with the XslTransform class in .NET (superseded by XslCompiledTransform since .NET 2.0):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no"/>

<xsl:template match="/|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
</xsl:template>

<xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>
</xsl:stylesheet>
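
Applying the stylesheet from C# might look like the sketch below, which uses XslCompiledTransform. The class name, method name and the sample input are assumptions for illustration. It also assumes the input already parses: the sample declares xmlns:tem, whereas a document with an undeclared prefix would make the XmlReader throw an XmlException before the stylesheet does anything.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class NamespaceRemover
{
    // The stylesheet from above, embedded as a verbatim string.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
<xsl:output method=""xml"" indent=""no""/>
<xsl:template match=""/|comment()|processing-instruction()"">
  <xsl:copy><xsl:apply-templates/></xsl:copy>
</xsl:template>
<xsl:template match=""*"">
  <xsl:element name=""{local-name()}"">
    <xsl:apply-templates select=""@*|node()""/>
  </xsl:element>
</xsl:template>
<xsl:template match=""@*"">
  <xsl:attribute name=""{local-name()}"">
    <xsl:value-of select="".""/>
  </xsl:attribute>
</xsl:template>
</xsl:stylesheet>";

    // Hypothetical helper: runs the stylesheet over a namespace-well-formed document.
    public static string RemoveNamespaces(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            // OutputSettings carries the xsl:output choices (method, indent).
            using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
            {
                xslt.Transform(inputReader, writer);
            }
            return output.ToString();
        }
    }

    public static void Main()
    {
        // Assumed sample: the prefix is declared, so the reader accepts it.
        string input =
            "<tem:GetRouteID xmlns:tem=\"http://tempuri.org/\">" +
            "<tem:PostCode>postcode</tem:PostCode>" +
            "</tem:GetRouteID>";
        Console.WriteLine(RemoveNamespaces(input));
    }
}
```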

or try this Regex:

var finalXml = Regex.Replace(originalXml, @"<(/?)\w+:(\w+/?)>", "<$1$2>");
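
As a rough end-to-end check of that regex against the first example in the question, something like the following could be used. The class and method names are illustrative, not from the question. It only handles tags without attributes, which is why a tag like <wsse:Security soapenv:actor=""> is left untouched.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class PrefixStripper
{
    // Removes "prefix:" from element tags that carry no attributes,
    // e.g. <tem:PostCode> -> <PostCode> and </tem:PostCode> -> </PostCode>.
    public static string StripElementPrefixes(string xml)
    {
        return Regex.Replace(xml, @"<(/?)\w+:(\w+/?)>", "<$1$2>");
    }

    public static void Main()
    {
        string originalXml =
            "<tem:GetRouteID>" +
            "<tem:PostCode>postcode</tem:PostCode>" +
            "<tem:Type>ItemType</tem:Type>" +
            "</tem:GetRouteID>";

        string finalXml = StripElementPrefixes(originalXml);

        var doc = new XmlDocument();
        doc.LoadXml(finalXml); // throws XmlException if an undeclared prefix remains
        Console.WriteLine(doc.DocumentElement.Name); // prints "GetRouteID"
    }
}
```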

2 Comments

Why do you think this would work, when loading the XML into a document doesn't?
I have made a slight update to the question. The regex provided works, but there are some scenarios it doesn't match. See example under Update 1.
