I want to convert XHTML into XML as shown below, but I cannot figure out how. For each input div with class cmp-text, I want to take its content (the HTML markup) and put it, as a string, into an attribute of an XML element.

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<result>
    <div class="cmp-text">
        <strong xmlns="http://www.w3.org/1999/xhtml">Content</strong>
        <span xmlns="http://www.w3.org/1999/xhtml"
            class="data-class">May 19, 2020
        </span>
        <h2 xmlns="http://www.w3.org/1999/xhtml">Description</h2>
        <p xmlns="http://www.w3.org/1999/xhtml">
            Lorem ipsum dolor sit amet, consectetur adipisicing.
        </p>
    </div>
    
    <div class="cmp-horizontal-line">
        <hr xmlns="http://www.w3.org/1999/xhtml"/>
    </div>
    
    <div class="cmp-text">
        <ul xmlns="http://www.w3.org/1999/xhtml">
            <li>
                Lorem ipsum.
            </li>
        </ul>
        <table xmlns="http://www.w3.org/1999/xhtml"
            style="border-collapse: collapse;"
            border="1">
            <tbody>
                <tr>
                    <td style="width: 33.3333%;">111</td>
                    <td style="width: 33.3333%;">212</td>
                </tr>
            </tbody>
        </table>
    </div>
    
    <div class="cmp-horizontal-line">
        <hr xmlns="http://www.w3.org/1999/xhtml"/>
    </div>
</result>

Expected output:

<?xml version="1.0" encoding="UTF-8"?>
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <result>
        <text
            type="/text"
            text="&lt;strong xmlns='http://www.w3.org/1999/xhtml'&gt;Content&lt;/strong&gt;&lt;span xmlns='http://www.w3.org/1999/xhtml' class='data-class'&gt;May 19, 2020&lt;/span&gt;&lt;h2 xmlns='http://www.w3.org/1999/xhtml'&gt;Description&lt;/h2&gt;&lt;p xmlns='http://www.w3.org/1999/xhtml'&gt;Lorem ipsum dolor sit amet, consectetur adipisicing.&lt;/p&gt;"
            textIsRich="true"/>
        <horizontal_line type="/horizontal-line"/>
        <text type="/text"
            text="&lt;ul xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;li&gt;Lorem ipsum.&lt;/li&gt;&lt;/ul&gt;&lt;table xmlns='http://www.w3.org/1999/xhtml' style='border-collapse: collapse;' border='1'&gt;&lt;tbody>&lt;tr>&lt;td style='width: 33.3333%;'>111&lt;/td>&lt;td style='width: 33.3333%;'>212&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table>"
            textIsRich="true"/>
        <horizontal_line type="/horizontal-line"/>
    </result>
</result>

XSL:

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0">

    <xsl:output version="1.0"
        encoding="UTF-8"
        indent="yes"
        method="xml"
        omit-xml-declaration="no"/>
    <xsl:strip-space elements="*"/>

    <!--root element-->
    <xsl:template match="/">
        <result>
            <xsl:apply-templates/>
        </result>
    </xsl:template>

    <!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
    <xsl:template match="/result/div[@class='cmp-text']">
        <text>
            <xsl:attribute name="type">/text</xsl:attribute>
            <xsl:attribute name="text">value</xsl:attribute>
            <xsl:attribute name="text2">
                <xsl:value-of select="node()"/>
            </xsl:attribute>
            <xsl:attribute name="text3">
                <xsl:value-of select=".//*"/>
            </xsl:attribute>
        </text>
    </xsl:template>

    <!--horizontal line-->
    <xsl:template match="/result/div[@class='cmp-horizontal-line']">
        <horizontal_line type="/horizontal-line"/>
    </xsl:template>

    <!--horizontal line-->
    <xsl:template match="/result/xhtml:div[@class='cmp-horizontal-line']">
        <horizontal_line type="/horizontal-line"/>
    </xsl:template>

    <!--identity template copies everything forward by default-->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output XML using above XSL:

<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <result>
        <text type="/text"
            text="value"
            text2="Last Reviewed:"
            text3="Last Reviewed:"/>        
        <horizontal_line type="/horizontal-line"/>
        <text type="/text"
            text="value"
            text2="Criteria"
            text3="Criteria"/>
        <horizontal_line type="/horizontal-line"/>
    </result>
</result>

In the text element, the attributes text, text2 and text3 are my unsuccessful attempts to get the div's content (the HTML markup) into an attribute as is.

How can I get the desired output?

Update: I changed the desired output so that it is well-formed XML.

The solution needs to be in XSLT 1.0, so I can't use serialize().

After Martin's comment, I used the lenzconsulting.com/xml-to-string library and got the desired result by making the following changes to the XSL script:

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

    <xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>

    <xsl:template match="/result/div[@class='cmp-text']">
        <text>
            <xsl:attribute name="type">/text</xsl:attribute>
            <xsl:attribute name="text">
                <xsl:apply-templates select="./*" mode="xml-to-string"/>
            </xsl:attribute>
            <xsl:attribute name="textIsRich">true</xsl:attribute>
        </text>
    </xsl:template>
</xsl:stylesheet>

which produced the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<result xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <result>
        <text
            type="/text"
            text="&lt;strong xmlns='http://www.w3.org/1999/xhtml'&gt;Content&lt;/strong&gt;&lt;span xmlns='http://www.w3.org/1999/xhtml' class='data-class'&gt;May 19, 2020&lt;/span&gt;&lt;h2 xmlns='http://www.w3.org/1999/xhtml'&gt;Description&lt;/h2&gt;&lt;p xmlns='http://www.w3.org/1999/xhtml'&gt;Lorem ipsum dolor sit amet, consectetur adipisicing.&lt;/p&gt;"
            textIsRich="true"/>
        <horizontal_line type="/horizontal-line"/>
        <text type="/text"
            text="&lt;ul xmlns='http://www.w3.org/1999/xhtml'&gt;&lt;li&gt;Lorem ipsum.&lt;/li&gt;&lt;/ul&gt;&lt;table xmlns='http://www.w3.org/1999/xhtml' style='border-collapse: collapse;' border='1'&gt;&lt;tbody>&lt;tr>&lt;td style='width: 33.3333%;'>111&lt;/td>&lt;td style='width: 33.3333%;'>212&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table>"
            textIsRich="true"/>
        <horizontal_line type="/horizontal-line"/>
    </result>
</result>

  • You would need to serialize the nodes. With standard XSLT/XPath, that requires an XSLT 3 processor and the XPath 3.1 serialize function. For earlier versions, check whether your processor supports an extension function for this, or whether you can use a library like lenzconsulting.com/xml-to-string. Commented Sep 28, 2022 at 15:20
  • The desired output you show is not well-formed XML and cannot be produced using XSLT at all. An attribute cannot contain an unescaped < character. Commented Sep 28, 2022 at 15:39
  • @michael.hor257k It is fine if the output contains escaped characters like &lt; instead of <. I should have made that clear; I will update the question. Commented Sep 28, 2022 at 16:13
  • @Martin This needs to be done in XSLT 1.0. I am trying the lenzconsulting.com/xml-to-string solution. Commented Sep 28, 2022 at 16:14

1 Answer

In XSLT 3.0, your template could use the serialize function, e.g.

<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
    <text>
        <xsl:attribute name="type">/text</xsl:attribute>
        <xsl:attribute name="text" select="serialize(*)"/>
    </text>
</xsl:template>

which could be simplified with an attribute value template to e.g.

<!--template I need help with: it should take the input cmp-text div's content(HTML tags) and add it to the text attribute of text element-->
<xsl:template match="/result/div[@class='cmp-text']">
    <text type="/text" text="{serialize(*)}"/>
</xsl:template>

The output would then look something like this:

  <text type="/text"
        text="&lt;strong xmlns=&#34;http://www.w3.org/1999/xhtml&#34;&gt;Content&lt;/strong&gt;&lt;span xmlns=&#34;http://www.w3.org/1999/xhtml&#34; class=&#34;data-class&#34;&gt;May 19, 2020&#xA;        &lt;/span&gt;&lt;h2 xmlns=&#34;http://www.w3.org/1999/xhtml&#34;&gt;Description&lt;/h2&gt;&lt;p xmlns=&#34;http://www.w3.org/1999/xhtml&#34;&gt;&#xA;            Lorem ipsum dolor sit amet, consectetur adipisicing.&#xA;        &lt;/p&gt;"/>
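
The &#xA; references and runs of spaces in that output are there because serialize keeps the whitespace inside text nodes; xsl:strip-space only removes text nodes that consist entirely of whitespace. If the attribute should be trimmed the way the question's expected output is, one possible approach is to copy the children through a helper mode that normalizes text first. This is only a sketch: the mode name trim and the variable trimmed are made up for illustration, and normalize-space also removes the spaces next to inline elements in mixed content, so it only suits input like the sample.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Hypothetical helper mode: copies every node unchanged unless a template below matches -->
    <xsl:mode name="trim" on-no-match="shallow-copy"/>

    <!-- Trim and collapse whitespace in text nodes before they are serialized -->
    <xsl:template match="text()" mode="trim">
        <xsl:value-of select="normalize-space()"/>
    </xsl:template>

    <xsl:template match="/result">
        <result>
            <xsl:apply-templates/>
        </result>
    </xsl:template>

    <xsl:template match="div[@class='cmp-text']">
        <!-- Build trimmed copies of the child elements, then serialize the copies -->
        <xsl:variable name="trimmed" as="element()*">
            <xsl:apply-templates select="*" mode="trim"/>
        </xsl:variable>
        <text type="/text" text="{serialize($trimmed)}" textIsRich="true"/>
    </xsl:template>

    <xsl:template match="div[@class='cmp-horizontal-line']">
        <horizontal_line type="/horizontal-line"/>
    </xsl:template>
</xsl:stylesheet>
```

The serialized string still uses double quotes around attribute values, so they show up escaped inside the text attribute.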

If you really need to go the route of producing non-well-formed results, then in XSLT 3 a character map can help, e.g.

   <xsl:output version="1.0"
        encoding="UTF-8"
        indent="yes"
        method="xml"
        omit-xml-declaration="no" use-character-maps="m1"/>
    
    <xsl:character-map name="m1">
      <xsl:output-character character="&lt;" string="&lt;"/>
      <xsl:output-character character="&gt;" string=">"/>
      <xsl:output-character character="&quot;" string="&quot;"/>
    </xsl:character-map>

Saxon then produces output like this:

  <text type="/text"
        text='<strong xmlns="http://www.w3.org/1999/xhtml">Content</strong><span xmlns="http://www.w3.org/1999/xhtml" class="data-class">May 19, 2020&#xA;        </span><h2 xmlns="http://www.w3.org/1999/xhtml">Description</h2><p xmlns="http://www.w3.org/1999/xhtml">&#xA;            Lorem ipsum dolor sit amet, consectetur adipisicing.&#xA;        </p>'/>
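
For XSLT 1.0, where serialize is not available, the same idea can be approximated by walking the nodes in a separate mode and writing the markup out as text. The following is a minimal sketch of that technique, not the code of the xml-to-string library: the mode name to-string, the named templates escape and replace, and the variable q are all made up for illustration. It assumes unprefixed element names (as in the sample input), attribute values without single quotes, and no comments or processing instructions that need to be kept; it also trims text with normalize-space, which would be wrong for mixed content.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- A single quote, used to delimit attribute values in the serialized string -->
    <xsl:variable name="q">'</xsl:variable>

    <xsl:template match="/result">
        <result><xsl:apply-templates/></result>
    </xsl:template>

    <xsl:template match="div[@class='cmp-text']">
        <text type="/text" textIsRich="true">
            <xsl:attribute name="text">
                <xsl:apply-templates select="*" mode="to-string"/>
            </xsl:attribute>
        </text>
    </xsl:template>

    <xsl:template match="div[@class='cmp-horizontal-line']">
        <horizontal_line type="/horizontal-line"/>
    </xsl:template>

    <!-- Element: start tag, xmlns when the namespace differs from the parent's, attributes, content, end tag -->
    <xsl:template match="*" mode="to-string">
        <xsl:value-of select="concat('&lt;', name())"/>
        <xsl:if test="namespace-uri() != namespace-uri(..)">
            <xsl:value-of select="concat(' xmlns=', $q, namespace-uri(), $q)"/>
        </xsl:if>
        <xsl:for-each select="@*">
            <xsl:value-of select="concat(' ', name(), '=', $q)"/>
            <xsl:call-template name="escape">
                <xsl:with-param name="s" select="string(.)"/>
            </xsl:call-template>
            <xsl:value-of select="$q"/>
        </xsl:for-each>
        <xsl:choose>
            <xsl:when test="node()">
                <xsl:text>&gt;</xsl:text>
                <xsl:apply-templates mode="to-string"/>
                <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
            </xsl:when>
            <xsl:otherwise>/&gt;</xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- Text: trimmed, with & and < escaped so the string is itself well-formed markup -->
    <xsl:template match="text()" mode="to-string">
        <xsl:call-template name="escape">
            <xsl:with-param name="s" select="normalize-space(.)"/>
        </xsl:call-template>
    </xsl:template>

    <!-- Escape & first, then <, so the ampersands added for < are not escaped twice -->
    <xsl:template name="escape">
        <xsl:param name="s"/>
        <xsl:variable name="amp">
            <xsl:call-template name="replace">
                <xsl:with-param name="s" select="$s"/>
                <xsl:with-param name="from" select="'&amp;'"/>
                <xsl:with-param name="to" select="'&amp;amp;'"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:call-template name="replace">
            <xsl:with-param name="s" select="string($amp)"/>
            <xsl:with-param name="from" select="'&lt;'"/>
            <xsl:with-param name="to" select="'&amp;lt;'"/>
        </xsl:call-template>
    </xsl:template>

    <!-- Recursive string replacement, since XPath 1.0 has no replace() function -->
    <xsl:template name="replace">
        <xsl:param name="s"/>
        <xsl:param name="from"/>
        <xsl:param name="to"/>
        <xsl:choose>
            <xsl:when test="contains($s, $from)">
                <xsl:value-of select="concat(substring-before($s, $from), $to)"/>
                <xsl:call-template name="replace">
                    <xsl:with-param name="s" select="substring-after($s, $from)"/>
                    <xsl:with-param name="from" select="$from"/>
                    <xsl:with-param name="to" select="$to"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise><xsl:value-of select="$s"/></xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Because everything written inside xsl:attribute is plain text, the XML output method escapes the < characters on output, which is what keeps the result well-formed. For real input, a maintained library such as xml-to-string is likely the safer choice, since it covers cases this sketch leaves out.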
1 Comment

The output doesn't need to be non-well-formed XML. I have updated the question's expected XML. Since the solution needs to be in XSLT 1.0, I used lenzconsulting.com/xml-to-string and it worked! Thank you!
