I wonder if there is a straightforward way to 'normalize' XML namespace definitions in an XML document represented as a DOM document in Java?

The reason I need this is to be able to compare two documents that both use XML namespaces.

Since XML namespaces can be declared anywhere in the document (on the root element or on any other element), two documents that are in effect the same may differ significantly when looked at from a DOM tree perspective. For example, one can have all namespace declarations on the root element, while the other declares each namespace on the 'highest' element in the DOM tree hierarchy where the namespace is applicable. In essence these can be the same document, but comparing them, say with XMLUnit, will report differences.

To provide two examples:

<root xmlns:foo="http://foo/">
    <e1>
        <foo:e2>bar</foo:e2>
    </e1>
</root>

vs:

<root>
    <e1 xmlns:foo="http://foo/">
        <foo:e2>bar</foo:e2>
    </e1>
</root>

These documents are in effect the same, but an XML comparison will find them different.
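To illustrate what "in effect the same" means here, below is a minimal sketch, assuming the JDK's built-in JAXP parser with namespace awareness turned on. The class name `NamespaceResolutionDemo` and its helper methods are made up for illustration. It resolves `foo:e2` in each of the two documents above to its expanded name (namespace URI plus local name), which does not depend on where the `xmlns:foo` declaration sits.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
final class NamespaceResolutionDemo {

    static final String DECLARED_ON_ROOT =
        "<root xmlns:foo=\"http://foo/\"><e1><foo:e2>bar</foo:e2></e1></root>";
    static final String DECLARED_ON_E1 =
        "<root><e1 xmlns:foo=\"http://foo/\"><foo:e2>bar</foo:e2></e1></root>";

    // Parse with namespace awareness so elements carry a namespace URI and local name.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Return "{namespaceURI}localName" for the first element with the given qualified name.
    static String expandedNameOfFirst(String xml, String qualifiedName) {
        Element el = (Element) parse(xml).getElementsByTagName(qualifiedName).item(0);
        String ns = el.getNamespaceURI();
        return ns == null ? el.getLocalName() : "{" + ns + "}" + el.getLocalName();
    }

    public static void main(String[] args) {
        String a = expandedNameOfFirst(DECLARED_ON_ROOT, "foo:e2");
        String b = expandedNameOfFirst(DECLARED_ON_E1, "foo:e2");
        System.out.println(a + " vs. " + b + " -> same expanded name: " + a.equals(b));
    }
}
```

The DOM trees still differ in their `xmlns:foo` attribute nodes, which is what a node-by-node comparison trips over.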

I wonder if there is a straightforward / easy way to normalize namespace declarations, say, by moving them all to the root element?

Of course one can write such code oneself, but if this were available already, that would be way better :)
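For reference, a rough sketch of what such hand-written code might look like with the W3C DOM API. This is an assumption-laden illustration, not an existing library feature: the class `NamespaceHoister` and its method names are hypothetical, it only moves prefixed declarations (`xmlns:prefix`), it leaves default namespace declarations (`xmlns="..."`) where they are, and it gives up with an exception if the same prefix is bound to two different URIs anywhere in the document.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class NamespaceHoister {

    // Parse with namespace awareness (assumes the JDK's built-in JAXP parser).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Move every xmlns:prefix declaration found below the root onto the root element.
    static void hoistPrefixedDeclarations(Document doc) {
        Element root = doc.getDocumentElement();
        hoist(root, root);
    }

    private static void hoist(Element root, Node node) {
        if (node instanceof Element && node != root) {
            Element el = (Element) node;
            NamedNodeMap attrs = el.getAttributes();
            // Iterate backwards because the attribute map is live and we remove entries.
            for (int i = attrs.getLength() - 1; i >= 0; i--) {
                Attr attr = (Attr) attrs.item(i);
                String name = attr.getName();
                if (!name.startsWith("xmlns:")) {
                    continue; // ordinary attribute or default namespace declaration
                }
                if (root.hasAttribute(name) && !root.getAttribute(name).equals(attr.getValue())) {
                    throw new IllegalStateException(
                        "prefix bound to different URIs, cannot hoist: " + name);
                }
                root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, name, attr.getValue());
                el.removeAttributeNode(attr);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            hoist(root, child);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><e1 xmlns:foo=\"http://foo/\"><foo:e2>bar</foo:e2></e1></root>");
        hoistPrefixedDeclarations(doc);
        System.out.println("root declares foo: " + doc.getDocumentElement().hasAttribute("xmlns:foo"));
    }
}
```

Running both documents through something like this before handing them to the comparison would put the `xmlns:foo` declaration in the same place in each. It does nothing about unused declarations or differing prefixes for the same URI, which would need separate handling.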

1 Answer

The XOM API has a Canonicalizer for exactly this purpose. It's not the standard W3C DOM API, but perhaps it does what you need.

5 Comments

The Apache xmlsec project also has one (santuario.apache.org/Java/api/org/apache/xml/security/c14n/…), which does work with the W3C DOM API.
jtalhborn, thanks for the tip. Unfortunately the Apache XML Security canonicalizer does not produce a canonical output in my case. My generic issue is with namespace declarations, for example: <root xmlns:foo="http://foo/"><e1><foo:e2>bar</foo:e2></e1></root> vs. <root><e1 xmlns:foo="http://foo/"><foo:e2>bar</foo:e2></e1></root>. In this case, the canonical forms of the XML are not the same, even though the documents in effect are.
skaffman, I checked the XOM API as well, and I get the same results. Basically the location of the XML namespace declarations is not 'canonicalized', and thus these will be considered different in a 'simple' XML comparison. I've put an example into the main article to demonstrate the issue.
@ÁkosMaróy: Did you find a solution to this?
@HaseebKhan nope :(