I have a text file that looks like this:

XXX^YYYY^AAAAA^XXXXXX^AAAAAA....

Fields are separated by a caret (^). My assumptions are:

the first field = first name
the second field = last name
the third field = address

etc..

I would like to turn it into valid XML using XSLT, such as:

<name>XXX</name>
<l_name>YYYY</l_name>

I know it can be done easily with Perl, but I need to do it with XSLT, if possible.

  • Good question, +1. See my answer for a complete XSLT 1.0 solution and for a description of the more powerful text processing capabilities of XSLT 2.0 and a pointer to a real world XSLT 2.0 text processing example. Commented Apr 15, 2011 at 13:29

2 Answers

Text (non-XML) files can be read with the standard XSLT 2.0 function unparsed-text().

Then one can use the standard XPath 2.0 function tokenize() and two other standard XPath 2.0 functions that accept a regular expression as one of their arguments -- matches() and replace().
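
As an illustration, here is a minimal sketch of how unparsed-text() and tokenize() could be combined for the caret-delimited file in the question. The parameter name pFileUri, its default value data.txt, the field names, and the record wrapper element are assumptions made for this example and do not come from the question. The stylesheet is meant to be started at the named template main, so no source XML document is required. Note that the caret is a regular expression metacharacter and has to be escaped as \^.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 exclude-result-prefixes="xs">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Hypothetical parameter: URI of the text file, resolved
      relative to the location of this stylesheet -->
 <xsl:param name="pFileUri" as="xs:string" select="'data.txt'"/>

 <!-- Assumed field names, in the order the fields appear -->
 <xsl:variable name="vFieldNames" as="xs:string+"
  select="('name', 'l_name', 'address')"/>

 <!-- Named entry point: the transformation starts here -->
 <xsl:template name="main">
  <results>
   <!-- Split the file into lines and skip blank ones -->
   <xsl:for-each select=
    "tokenize(unparsed-text($pFileUri), '\r?\n')[normalize-space()]">
    <record>
     <!-- Split each line on the (escaped) caret -->
     <xsl:for-each select="tokenize(., '\^')">
      <xsl:variable name="vPos" select="position()"/>
      <!-- Use the assumed name if there is one, else field4, field5, ... -->
      <xsl:element name=
       "{($vFieldNames[$vPos], concat('field', $vPos))[1]}">
       <xsl:value-of select="."/>
      </xsl:element>
     </xsl:for-each>
    </record>
   </xsl:for-each>
  </results>
 </xsl:template>
</xsl:stylesheet>
```

For a file whose only line is John^Smith^Bellevue, this sketch should produce a record element containing name, l_name and address elements with the values John, Smith and Bellevue.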

XSLT 2.0 has its own powerful instructions to handle text processing using regular expressions: the <xsl:analyze-string>, the <xsl:matching-substring> and the <xsl:non-matching-substring> instructions.
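
A small sketch of these instructions, applied to an XML-wrapped input such as <t>John^Smith^Bellevue^WA^98004</t>. The word element name is chosen for illustration only. The regular expression matches the caret separators, so the fields arrive as the non-matching substrings. Since <xsl:matching-substring> is omitted, the separators themselves are simply dropped.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/t">
  <results>
   <!-- The regex matches each caret; \^ escapes the metacharacter -->
   <xsl:analyze-string select="string(.)" regex="\^">
    <!-- Everything between two carets is a field -->
    <xsl:non-matching-substring>
     <word><xsl:value-of select="."/></word>
    </xsl:non-matching-substring>
   </xsl:analyze-string>
  </results>
 </xsl:template>
</xsl:stylesheet>
```

One limitation of this approach: two adjacent carets leave no non-matching substring between them, so an empty field is skipped rather than emitted as an empty word element.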

See some of the more powerful capabilities of XSLT text processing with these functions and instructions in this real-world example: an XSLT solution to the WideFinder problem.

Finally, here is an XSLT 1.0 solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 xmlns:my="my:my" exclude-result-prefixes="ext my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <my:fieldNames>
  <name>FirstName</name>
  <name>LastName</name>
  <name>City</name>
  <name>State</name>
  <name>Zip</name>
 </my:fieldNames>

 <xsl:variable name="vfieldNames" select=
  "document('')/*/my:fieldNames"/>

 <xsl:template match="/">
  <xsl:variable name="vrtfTokens">
   <xsl:apply-templates/>
  </xsl:variable>

  <xsl:variable name="vTokens" select=
       "ext:node-set($vrtfTokens)"/>

  <results>
   <xsl:apply-templates select="$vTokens/*"/>
  </results>
 </xsl:template>

 <xsl:template match="text()" name="tokenize">
  <xsl:param name="pText" select="."/>

     <xsl:if test="string-length($pText)">
       <xsl:variable name="vWord" select=
       "substring-before(concat($pText, '^'),'^')"/>

       <word>
        <xsl:value-of select="$vWord"/>
       </word>

       <xsl:call-template name="tokenize">
        <xsl:with-param name="pText" select=
         "substring-after($pText,'^')"/>
       </xsl:call-template>
     </xsl:if>
 </xsl:template>

 <xsl:template match="word">
  <xsl:variable name="vPos" select="position()"/>

  <field>
      <xsl:element name="{$vfieldNames/*[position()=$vPos]}">
      </xsl:element>
      <value><xsl:value-of select="."/></value>
  </field>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<t>John^Smith^Bellevue^WA^98004</t>

the wanted, correct result is produced:

<results>
   <field>
      <FirstName/>
      <value>John</value>
   </field>
   <field>
      <LastName/>
      <value>Smith</value>
   </field>
   <field>
      <City/>
      <value>Bellevue</value>
   </field>
   <field>
      <State/>
      <value>WA</value>
   </field>
   <field>
      <Zip/>
      <value>98004</value>
   </field>
</results>

Comments

+1. This "I have a text file" requirement needs XSLT 2.0 (unless you have an XML parser that supports the DTD internal subset).
@Alejandro: An entity is part of the XML document -- the OP wants to be able to read any file given its URL -- probably the file URL would be passed as a parameter to the stylesheet. BTW, I have added a complete XSLT 1.0 solution to my answer :)
@Dimitre: This XML wrapper <!DOCTYPE test [<!ENTITY text SYSTEM "test.txt">]><test>&text;</test>, with the test.txt file containing John^Smith^Bellevue^WA^98004, results in the same output.
@Alejandro: Yes. However this has nothing to do with XSLT -- only with XML. Also, let's not forget that due to security concerns many XML parsers disable entities by default.
@Dimitre: Yes. And I think that is a bad thing: security concerns about accessing external resources should be handled by the system. There are so many uses for full DTD support... like getting the document URI with <!ENTITY uri SYSTEM "#" NDATA uri> and unparsed-entity-uri('uri')
Tokenizing and sorting with XSLT 1.0

If you use XSLT 2.0, it is much simpler: fn:tokenize(string, pattern)

Example: tokenize("XPath is fun", "\s+")
Result: ("XPath", "is", "fun")
