I have XML that looks something like this:

<Root>
  <Fields>
    <Field name="abc" displayName="aaa" />
    <Field name="pqr" displayName="ppp" />
    <Field name="abc" displayName="aaa" />
    <Field name="xyz" displayName="zzz" />
  </Fields>
</Root>

I want the output to contain only those elements that have a repeating name-displayName combination, if there are any:

<Root>
  <Fields>
    <Field name="abc" displayName="aaa" />
    <Field name="abc" displayName="aaa" />
  </Fields>
</Root>

How can I do this using XSLT?

  • Good question, +1. See my answer for a short, easy and efficient XSLT 1.0 solution. Commented May 9, 2011 at 13:19
  • Also added an XSLT 2.0 solution. Commented May 9, 2011 at 13:30

2 Answers

I. XSLT 1.0 solution:

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kFieldByName" match="Field"
  use="concat(@name, '+', @displayName)"/>

 <xsl:template match=
  "Field[generate-id()
        =
         generate-id(key('kFieldByName',
                     concat(@name, '+', @displayName)
                     )[2])
        ]
  ">
     <xsl:copy-of select=
     "key('kFieldByName',concat(@name, '+', @displayName))"/>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<Root>
    <Fields>
        <Field name="abc" displayName="aaa" />
        <Field name="pqr" displayName="ppp" />
        <Field name="abc" displayName="aaa" />
        <Field name="xyz" displayName="zzz" />
    </Fields>
</Root>

produces the wanted Field elements (the Root and Fields wrapper elements are not reproduced):

<Field name="abc" displayName="aaa"/>
<Field name="abc" displayName="aaa"/>

Explanation:

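  1. Muenchian grouping with a composite key: each Field is indexed by the concatenation of its name and displayName attributes.

  2. The only template in the code matches any Field element that is the second in its group, so it fires exactly once for each group with two or more members. Inside the body of the template, the whole group is output. Field elements that match no template fall through to the built-in rules, which output nothing for them.

  3. Muenchian grouping is the efficient way to do grouping in XSLT 1.0. The key lets the processor look up a group directly instead of rescanning the document for every Field.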

  4. See also my answer to this question.
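
If the Root and Fields wrapper from the question is also wanted in the output, one possible variant (a sketch, not part of the original answer) combines the same key with an identity rule and simply drops every Field whose group has no second member. It assumes the same element and attribute names as the sample document, and that the '+' separator never occurs inside the attribute values:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Same composite key as in the stylesheet above -->
 <xsl:key name="kFieldByName" match="Field"
  use="concat(@name, '+', @displayName)"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Empty template: suppress any Field whose group has no second member -->
 <xsl:template match=
  "Field[not(key('kFieldByName',
                 concat(@name, '+', @displayName))[2])]"/>
</xsl:stylesheet>
```

Because the surviving Field elements are copied where they stand, this variant keeps them in document order rather than emitting each group as a block.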

II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
     <xsl:for-each-group select="/*/*/Field"
          group-by="concat(@name, '+', @displayName)">
       <xsl:sequence select="current-group()[current-group()[2]]"/>
   </xsl:for-each-group>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (shown above), the same wanted result is produced:

<Field name="abc" displayName="aaa"/>
<Field name="abc" displayName="aaa"/>

Explanation:

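  1. <xsl:for-each-group> groups the Field elements by the same composite key (the concatenation of name and displayName).

  2. The current-group() function returns the members of the group being processed. The predicate [current-group()[2]] is true only when the group has a second member, so the whole group is output for duplicates and nothing is output otherwise.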
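
As with the XSLT 1.0 version, the result above has no Root or Fields wrapper. A minimal sketch of an XSLT 2.0 variant that rebuilds the wrapper might look like the following. It is not from the original answer; it assumes a single Fields element under Root, as in the sample, and tests the group size with count() instead of the positional predicate:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

 <xsl:template match="/Root">
     <!-- Rebuild the wrapper elements from the question -->
     <Root>
       <Fields>
         <xsl:for-each-group select="Fields/Field"
              group-by="concat(@name, '+', @displayName)">
           <!-- Emit a group only when it has more than one member -->
           <xsl:if test="count(current-group()) gt 1">
             <xsl:sequence select="current-group()"/>
           </xsl:if>
         </xsl:for-each-group>
       </Fields>
     </Root>
 </xsl:template>
</xsl:stylesheet>
```

Note that each group is emitted as a block, so with several duplicated combinations the Field elements come out grouped rather than in their original document order.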

To find duplicates, you need to iterate over the Field elements and, for each one, look for the set of Field elements in the whole document that have matching name and displayName attribute values. If the set has more than one element, you copy the current element to the output.

Here is an example of a template that achieves this:

<xsl:template match="Field">
    <xsl:variable name="fieldName" select="@name" />
    <xsl:variable name="fieldDisplayName" select="@displayName" />
    <xsl:if test="count(//Field[@name=$fieldName and @displayName=$fieldDisplayName]) > 1">
        <xsl:copy-of select="."/>
    </xsl:if>
</xsl:template>

Executing this template (wrapped in an appropriate XSLT file) on your sample data gives the following output:

<?xml version="1.0" encoding="utf-8"?>
<Root>
  <Fields>
    <Field name="abc" displayName="aaa" />
    <Field name="abc" displayName="aaa" />
  </Fields>
</Root>
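
The answer does not show the surrounding stylesheet. One plausible way to wrap the template, sketched here as an assumption and not taken from the answer, is to pair it with an identity rule so that Root and Fields are copied through while the Field template decides which children survive:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity rule: copies Root, Fields and anything else unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Template from the answer: keep a Field only if its
         name/displayName pair occurs more than once in the document -->
    <xsl:template match="Field">
        <xsl:variable name="fieldName" select="@name" />
        <xsl:variable name="fieldDisplayName" select="@displayName" />
        <xsl:if test="count(//Field[@name=$fieldName and @displayName=$fieldDisplayName]) > 1">
            <xsl:copy-of select="."/>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

The more specific match="Field" pattern takes priority over the identity rule, so only Field elements go through the duplicate test.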

6 Comments

@Jeff Yates: This is one possible solution; however, its time complexity is O(N^2), which is too slow for XML documents with a large number of Field elements. See my answer for an efficient solution.
@Dimitre: Seems silly to do more effort than necessary. There is no reason to believe the real XML would be huge and there is no profiling information. I'd go for quick to write over quick to run any day until the profiling is in.
@Jeff Yates: One can and should use the most efficient known solutions. Because people think otherwise, we encounter problems every day such as a transformation that runs for 40 minutes and, once refactored with Muenchian grouping, takes only 2 seconds. We should not propagate bad and naive algorithms.
@Dimitre: You are right, although one should also consider the cost of implementation and maintenance when optimizing up front.
While the efficiency might be O(N^2) on many XSLT processors, it might be much better on an optimizing processor - try it on Saxon-EE. However, I agree it's best not to place too heavy a reliance on the optimizer - use xsl:for-each-group.