
I have an XML string from which I want to remove the empty elements, along with the lines that contain them.

So, for example:

XML:

<ct>
   <c>http://192.168.105.213</c>
   <l>http://192.168.105.213</l>
   <o></o>
   <l>http://192.168.105.213</l>
   <o>http://192.168.105.213</o>
</ct>

Here <o></o> is the empty element, so after removing it I want:

   <ct>
       <c>http://192.168.105.213</c>
       <l>http://192.168.105.213</l>
       <l>http://192.168.105.213</l>
       <o>http://192.168.105.213</o>
   </ct>

So the whole line must be removed, and the remaining lines must keep their indentation.

I tried: xml.replaceAll("<(\\w+)></\\1>", "");

This leaves an empty line in between:

<ct>
   <c>http://192.168.105.213</c>
   <l>http://192.168.105.213</l>

   <l>http://192.168.105.213</l>
   <o>http://192.168.105.213</o>
</ct>

How can I remove the spaces or \n, \t, \r correctly to keep the proper indentation?

3 Answers

This would work:

xml.replaceAll("<(\\w+)></\\1>\n\\s+", "");

It matches your pattern followed by a newline and one or more whitespace characters (including tabs), so the line break and the next line's indentation are removed along with the empty element.

EDIT: xml.replaceAll("\n\\s+<(\\w+)></\\1>", "") should work for deeper levels as well.

And if you expect that the root element may also be empty, or that some child elements may be unindented, you might need to make the newline and spaces optional:

xml.replaceAll("\n?\\s*<(\\w+)></\\1>", "")
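
As a rough check of the pattern from the EDIT above, here is a minimal Java sketch. The class name EmptyElementRegex and the sample input are assumptions made for illustration only. It handles just the exact form <x></x>. Self-closing tags such as <x/>, elements with attributes, and elements containing only whitespace are not matched.

```java
public class EmptyElementRegex {

    // Removes each <x></x> pair together with the newline and
    // indentation that precede it, so no blank line is left behind.
    static String removeEmptyElements(String xml) {
        return xml.replaceAll("\n\\s+<(\\w+)></\\1>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input with the empty element nested one level deeper.
        String xml = "<ct>\n"
                   + "   <a>\n"
                   + "      <o></o>\n"
                   + "      <l>http://192.168.105.213</l>\n"
                   + "   </a>\n"
                   + "</ct>";
        System.out.println(removeEmptyElements(xml));
    }
}
```

Because the newline and indentation in front of the element are consumed, the following lines keep their own indentation at any nesting depth.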

2 Comments

It works for one level of indentation, but for a deeply nested empty element, will this remove the right spaces to maintain the indentation?
@SiddharthTrikha Please put the newline-plus-spaces combination before the tags, as in the edit. It should work for deeper ones.

This should solve it for you:

xml.replaceAll("\n\t<(\\w+)></\\1>", "");


As advised in comments, reconsider using regex directly on HTML/XML documents as these are not regular languages. Instead, use regex on parsed text/value content but not to transform documents.

One great tool for manipulating XML is XSLT, the transformation language and sibling to XPath. Java ships with a built-in XSLT 1.0 processor and can also call external processors (Xalan, Saxon, etc.). Consider the following setup:

XSLT Script (save as .xsl file used below; script removes empty nodes)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform to Copy Document as is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty Template to Remove Such Nodes -->
  <xsl:template match="*[.='']"/>

</xsl:transform>

Java Code

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XMLTransform {
    public static void main(String[] args) throws IOException, URISyntaxException,
                                                  SAXException, ParserConfigurationException,
                                                  TransformerException {            
            // Load XML and XSL Document
            String inputXML = "path/to/Input.xml";
            String xslFile = "path/to/XSLT/Script.xsl";
            String outputXML = "path/to/Output.xml";

            Source xslt = new StreamSource(new File(xslFile));            
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();            
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
            Document doc = docBuilder.parse (new File(inputXML));

            // XSLT Transformation  with pretty print
            TransformerFactory prettyPrint = TransformerFactory.newInstance();
            Transformer transformer = prettyPrint.newTransformer(xslt);

            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");                        

            DOMSource source = new DOMSource(doc);
            StreamResult result = new StreamResult(new File(outputXML));        
            transformer.transform(source, result);
    }
}

Output

<ct>
    <c>http://192.168.105.213</c>
    <l>http://192.168.105.213</l>
    <l>http://192.168.105.213</l>
    <o>http://192.168.105.213</o>
</ct>
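
Since the question starts from an XML string rather than a file, the same transformation can probably be run entirely in memory. The sketch below is an assumption-laden variant of the code above: the class name XmlStringTransform, the method removeEmptyElements, and the inlined, condensed copy of the stylesheet are introduced here for illustration. The exact indentation width of the result depends on the XSLT processor in use.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class XmlStringTransform {

    // Condensed copy of the stylesheet above, inlined for this sketch.
    static final String XSLT =
          "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"xml\" indent=\"yes\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"*[.='']\"/>"
        + "</xsl:transform>";

    // Transforms an XML string and returns the result as a string.
    static String removeEmptyElements(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers do not need a checked exception.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ct>\n"
                   + "   <c>http://192.168.105.213</c>\n"
                   + "   <o></o>\n"
                   + "   <o>http://192.168.105.213</o>\n"
                   + "</ct>";
        System.out.println(removeEmptyElements(xml));
    }
}
```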

NAMESPACES

When working with namespaces, as in the XML below:

<prefix:ct xmlns:prefix="http://www.example.com">
   <c>http://192.168.105.213</c>
   <l>http://192.168.105.213</l>
   <o></o>
   <l>http://192.168.105.213</l>
   <o>http://192.168.105.213</o>
</prefix:ct>

Use the following XSLT with declaration in header and added template:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
               xmlns:prefix="http://www.example.com">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Retain Namespace Prefix -->
  <xsl:template match="prefix:ct">
    <xsl:element name='prefix:{local-name()}' namespace='http://www.example.com'>
      <xsl:copy-of select="namespace::*"/>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>

  <!-- Remove Empty Nodes -->
  <xsl:template match="*[.='']"/>

</xsl:transform>

Output

<prefix:ct xmlns:prefix="http://www.example.com">
    <c>http://192.168.105.213</c>
    <l>http://192.168.105.213</l>
    <l>http://192.168.105.213</l>
    <o>http://192.168.105.213</o>
</prefix:ct>

9 Comments

I initially tried XSLT with the same template as yours, but without this part: <!-- Empty Template to Remove Such Nodes --> <xsl:template match="*[.='']"/>. With that, the empty space was still showing. I will try again with this last part added. Will this remove the spaces?
Basically the white space of the indentation was also getting stripped.
Yes, as shown. In fact, that template match is the key part of the script. The Identity Transform copies the entire document as is, so nothing changes if you leave this empty template out. Also, using <xsl:strip-space elements="*"/> removes unneeded whitespace.
OK. What if we want to remove only empty elements with particular names? Say <one></one>, <two></two>, <three></three> are three empty elements and I want to remove only the one and three elements: would <xsl:template match="one|three|node()"> work?
See updated section for Namespaces where you declare namespace in XSLT's header and add the new template. Example included. Aside - always include namespaces when asking XML questions. Also, with namespaces, this would be insane to do with regex!
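
On the question in the comments about removing only empty elements with particular names, one possible approach (a sketch, not something confirmed in this thread) is to put the emptiness predicate on each named element, as in match="one[.=''] | three[.='']". The class name RemoveNamedEmptyElements and the sample input are hypothetical.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class RemoveNamedEmptyElements {

    // Identity transform plus an empty template that matches only
    // <one> and <three> elements whose string value is empty.
    static final String XSLT =
          "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"xml\" indent=\"yes\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"one[.=''] | three[.='']\"/>"
        + "</xsl:transform>";

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The empty <two> and the non-empty <one> should survive.
        String xml = "<r><one></one><two></two><three></three><one>x</one></r>";
        System.out.println(transform(xml));
    }
}
```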