I have an XML document (it is not well formed) as follows:

<ads>
   <adv>
       <a>BURGER & BROWN ENGINEERING</a>
       <b>123*3491</b>
   <adv>
   <adv>
       <x>Roster Service</x>
       <y>BROWN & BURGER ENGINEERING</y>
       <z>905*3490</z>
   <adv>
<ads>

I would like an XSLT stylesheet that transforms the XML into the output below, applying two rules:

i) each ampersand (&) should be replaced with " and "

ii) each * should be replaced with " "

<ads>
   <adv>
       <a>BURGER and BROWN ENGINEERING</a>
       <b>123 3491</b>
   </adv>
   <adv>
       <x>Roster Service</x>
       <y>BROWN and BURGER ENGINEERING</y>
       <z>905 3490</z>
   </adv>
</ads>

I have the following XSLT, but it does not do what I need:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>

<xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="text()">
  <xsl:value-of select="translate(., '&', ' and ')" />
  <xsl:value-of select="translate(., '*', ' ')" />
</xsl:template>

</xsl:stylesheet>

  • What you show as your input is not an XML document; you cannot have an unescaped ampersand in XML. Commented Oct 5, 2015 at 10:52
  • @michael.hor257k - I receive the XML document in this form from some upstream processes, and it is my job to correct it. It is certainly not valid XML. How can I do it? Commented Oct 5, 2015 at 10:57
  • If that's really what your input looks like, there's practically nothing you will be able to do with it in XSLT. Commented Oct 5, 2015 at 11:03
  • When someone says "This xml is not well formed", that's the same as saying "This is not XML". If it's not XML, then XSLT can't handle it. Commented Oct 5, 2015 at 14:07

2 Answers

Given a well-formed XML input such as:

XML

<ads>
   <adv>
       <a>BURGER &amp; BROWN ENGINEERING</a>
       <b>123*3491</b>
   </adv>
   <adv>
       <x>Roster Service</x>
       <y>BROWN &amp; BURGER ENGINEERING</y>
       <z>905*3490</z>
   </adv>
</ads>

You can use the following stylesheet:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="@*|*">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
  <xsl:value-of select="replace(translate(., '*', ' '), '&amp;', 'and')" />
</xsl:template>

</xsl:stylesheet>

to return:

<?xml version="1.0" encoding="UTF-8"?>
<ads>
   <adv>
       <a>BURGER and BROWN ENGINEERING</a>
       <b>123 3491</b>
   </adv>
   <adv>
       <x>Roster Service</x>
       <y>BROWN and BURGER ENGINEERING</y>
       <z>905 3490</z>
   </adv>
</ads>
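
For readers without an XSLT 2.0 processor at hand, the same two substitutions can be sketched in Python with the standard library's xml.etree.ElementTree. This is an illustrative equivalent, not part of the stylesheet above; the function names clean_text and transform are made up for the example, and it assumes the input is already well formed.

```python
import xml.etree.ElementTree as ET


def clean_text(value):
    """Apply the same substitutions as the stylesheet: '*' -> ' ', '&' -> 'and'."""
    if value is None:
        return None
    return value.replace("*", " ").replace("&", "and")


def transform(xml_source: str) -> str:
    """Parse well-formed XML, rewrite every text node, and serialize it again."""
    root = ET.fromstring(xml_source)
    for element in root.iter():
        # Text inside the element and text following its end tag are stored separately.
        element.text = clean_text(element.text)
        element.tail = clean_text(element.tail)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    source = (
        "<ads><adv><a>BURGER &amp; BROWN ENGINEERING</a>"
        "<b>123*3491</b></adv></ads>"
    )
    print(transform(source))
```

The parser decodes &amp; to a plain & before clean_text sees it, which is why the replacement looks for "&" rather than "&amp;".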

1 Comment

I used the command perl -pi -e 's/&/&amp;/' $file to replace & with &amp;, then proceeded with the instructions above. Thanks.

Your input is not XML, so no tool designed for processing XML will be able to read it.

The best solution with bad XML is always to fix the software that's generating it. But if the software is written by some cowboy outfit that doesn't care about quality or support or users, then that may not be possible.

If you need to repair bad XML, then you will need non-XML tools to do it, typically some combination of Perl/awk/sed. It's not always possible, of course, because if the software is generating XML that's ill-formed, it may also be generating XML that's well-formed but contains the wrong information.

Failing to escape ampersands is quite a common problem, and the right repair depends on how good a fix you need. Sometimes you can fix 99% of the problems by replacing any & that isn't followed by a letter, '#', or a digit with &amp;.
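
The heuristic in the previous paragraph can be sketched as a small text-level filter run before any XML parsing. Python is used here purely for illustration (Perl, awk, or sed would do equally well), and the names BARE_AMP and escape_bare_ampersands are hypothetical.

```python
import re

# An ampersand that is NOT followed by a letter, '#', or a digit.
# Anything that looks like the start of an entity or character
# reference (&amp;, &#38;, ...) is deliberately left untouched.
BARE_AMP = re.compile(r"&(?![A-Za-z#0-9])")


def escape_bare_ampersands(text: str) -> str:
    """Escape ampersands that cannot be the start of a reference."""
    return BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    broken = "<a>BURGER & BROWN ENGINEERING</a>"
    print(escape_bare_ampersands(broken))
```

As with the rule it implements, this is only a partial repair: text such as AT&T is left unescaped because the ampersand is followed by a letter, so the result should still be checked with a real XML parser.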
