I have details of an object in XML format

MyObject is the name of a class that has a property, let's say objectInformation. Its value is stored in XML format in the DB. When I read it from the DB, I get the output below.

<MyObject objectInformation="&lt;node1>&lt;node2>some Information here&lt;/node2>&lt;node3>&lt;![CDATA[&lt;TEXTFORMAT LEADING=&quot;2&quot;>&lt;P ALIGN=&quot;LEFT&quot;>&lt;FONT FACE=&quot;Verdana&quot; SIZE=&quot;11&quot; COLOR=&quot;#403F3F&quot; LETTERSPACING=&quot;0&quot; KERNING=&quot;0&quot;>&lt;B>&lt;I>comment in for new object&lt;/I>&lt;/B>&lt;/FONT>&lt;/P>&lt;/TEXTFORMAT>]]>&lt;/node3>&lt;node4>07/18/2013&lt;/node4>&lt;/node1>"</MyObject>

I need to:

  1. Parse this through XSL.
  2. Read the content of each node.
  3. Render them in a PDF (NOTE: node3 above has rich-text tags in it, so it is HTML inside XML).

For this, I tried the options below:

  1. Tried using disable-output-escaping="yes". This does not work: I am NOT able to traverse the nodes. I can only put the XML onto the PDF in unescaped form, which is not what I want.

  2. Tried saxon:parse(). This throws an error:

    SXXP0003: Error reported by XML parser: Premature end of file

Has anybody come across such a challenge? If so, what is the solution?

  • I'm not sure if you made a copy-paste mistake, but the XML you gave as an example is not well-formed, which might explain the Saxon error: you're missing a > to close the opening tag. Commented Jul 22, 2013 at 11:14

1 Answer

The snippet you have posted is not well-formed XML: the MyObject start tag lacks a closing >. Instead of what you have posted, you need

<MyObject objectInformation="&lt;node1>&lt;node2>some Information here&lt;/node2>&lt;node3>&lt;![CDATA[&lt;TEXTFORMAT LEADING=&quot;2&quot;>&lt;P ALIGN=&quot;LEFT&quot;>&lt;FONT FACE=&quot;Verdana&quot; SIZE=&quot;11&quot; COLOR=&quot;#403F3F&quot; LETTERSPACING=&quot;0&quot; KERNING=&quot;0&quot;>&lt;B>&lt;I>comment in for new object&lt;/I>&lt;/B>&lt;/FONT>&lt;/P>&lt;/TEXTFORMAT>]]>&lt;/node3>&lt;node4>07/18/2013&lt;/node4>&lt;/node1>"></MyObject>

As for processing that: with a commercial edition of Saxon 9, where XSLT has access to the extension function saxon:parse (or the XSLT/XPath 3.0 function parse-xml), I think it should work, but you need to use it twice: once on the value of the objectInformation attribute of the MyObject element, then on the value of the node3 element. The code would look like this, for example:

<xsl:template match="MyObject">
  <xsl:apply-templates select="saxon:parse(@objectInformation)/node()"/>
</xsl:template>

<xsl:template match="node3">
  <xsl:apply-templates select="saxon:parse(.)/node()"/>
</xsl:template>

<xsl:template match="TEXTFORMAT">
  <!-- now create or transform the elements as needed -->
</xsl:template>
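
For processors without saxon:parse, the same two-pass idea can be sketched with the standard XPath 3.0 function parse-xml mentioned above. This is only a sketch and assumes a processor that accepts version="3.0" and implements parse-xml; whether your Saxon release and edition do is something to check. It also assumes the escaped content always has exactly one root element; if it may not, the XPath 3.0 function parse-xml-fragment might be the better fit.

```xslt
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0">

  <!-- First pass: the attribute value is escaped markup, so parse it
       into a temporary document and process that document's children. -->
  <xsl:template match="MyObject">
    <xsl:apply-templates select="parse-xml(@objectInformation)/node()"/>
  </xsl:template>

  <!-- Second pass: the CDATA section inside node3 holds markup as plain
       text, so the string value of node3 is parsed once more. -->
  <xsl:template match="node3">
    <xsl:apply-templates select="parse-xml(.)/node()"/>
  </xsl:template>

  <xsl:template match="TEXTFORMAT">
    <!-- now create or transform the elements as needed -->
  </xsl:template>

</xsl:stylesheet>
```

No extension namespace is declared here, because parse-xml belongs to the standard function library.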

To give you a more complete example, when I apply the stylesheet

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:saxon="http://saxon.sf.net/"
  exclude-result-prefixes="saxon"
  version="2.0">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="MyObject">
  <xsl:apply-templates select="saxon:parse(@objectInformation)/node()"/>
</xsl:template>

<xsl:template match="node3">
  <xsl:apply-templates select="saxon:parse(.)/node()"/>
</xsl:template>

<xsl:template match="TEXTFORMAT">
  <fo:block>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

<xsl:template match="P">
  <fo:block>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

</xsl:stylesheet>

to the input

<MyObject objectInformation="&lt;node1>&lt;node2>some Information here&lt;/node2>&lt;node3>&lt;![CDATA[&lt;TEXTFORMAT LEADING=&quot;2&quot;>&lt;P ALIGN=&quot;LEFT&quot;>&lt;FONT FACE=&quot;Verdana&quot; SIZE=&quot;11&quot; COLOR=&quot;#403F3F&quot; LETTERSPACING=&quot;0&quot; KERNING=&quot;0&quot;>&lt;B>&lt;I>comment in for new object&lt;/I>&lt;/B>&lt;/FONT>&lt;/P>&lt;/TEXTFORMAT>]]>&lt;/node3>&lt;node4>07/18/2013&lt;/node4>&lt;/node1>"></MyObject>

with Saxon 9.1.0.8 (the latest open source version of Saxon 9 to support saxon:parse), I get the result

<?xml version="1.0" encoding="UTF-8"?>some Information here<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
   <fo:block>comment in for new object</fo:block>
</fo:block>07/18/2013

I realize that this is not a complete and valid XSL-FO document, but it shows that the templates are called for the elements that are escaped in the input and then parsed via saxon:parse. So you simply need to add further templates to transform the other elements as needed and to create a valid XSL-FO document. If you need help with that, I suggest you ask a new question outlining which FO structure you want for the input elements once they have been parsed (i.e. how you want to transform the node elements and how you want to transform the HTML elements). Then hopefully someone who is more fluent with XSL-FO than I am can help out.
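
As a rough starting point for those further templates, here is one possible shape for a stylesheet that wraps the parsed content in an XSL-FO page structure. Everything beyond the saxon:parse calls is an assumption made for illustration: the master name "page", the A4 page size, the 2cm margin, one fo:block each for node2 and node4, and the mapping of B and I to bold and italic fo:inline elements are not taken from the question. FONT has no template here, so only its children are processed.

```xslt
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:saxon="http://saxon.sf.net/"
  exclude-result-prefixes="saxon"
  version="2.0">

  <xsl:output method="xml"/>

  <!-- Page scaffolding (hypothetical layout): a single A4 page master. -->
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
                               page-height="29.7cm" page-width="21cm">
          <fo:region-body margin="2cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- The two parsing passes, as in the stylesheet above. -->
  <xsl:template match="MyObject">
    <xsl:apply-templates select="saxon:parse(@objectInformation)/node()"/>
  </xsl:template>

  <xsl:template match="node3">
    <xsl:apply-templates select="saxon:parse(.)/node()"/>
  </xsl:template>

  <!-- Plain-text nodes get a block each so no bare text lands in fo:flow. -->
  <xsl:template match="node2 | node4">
    <fo:block><xsl:value-of select="."/></fo:block>
  </xsl:template>

  <xsl:template match="TEXTFORMAT | P">
    <fo:block><xsl:apply-templates/></fo:block>
  </xsl:template>

  <!-- Assumed mapping of the rich-text tags to FO inline formatting. -->
  <xsl:template match="B">
    <fo:inline font-weight="bold"><xsl:apply-templates/></fo:inline>
  </xsl:template>

  <xsl:template match="I">
    <fo:inline font-style="italic"><xsl:apply-templates/></fo:inline>
  </xsl:template>

</xsl:stylesheet>
```

The root template matches only the document node of the original input. The documents returned by saxon:parse are entered through their children, so the page scaffolding is emitted once.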

2 Comments

I tried all the options, but it's not working. I am not able to store the output of the call saxon:parse(.)/node() in a variable at all. Is there any system configuration to be done before running the Saxon parser? Does one need to download a jar file to make it run?
@ShrihariJ, I added a more complete example that I tried with Saxon 9.1. It should also work with later releases like Saxon 9.4 or 9.5, but only if you use the commercial editions (EE or PE), as unfortunately the extension functions are not available in the open source version.
