Remove empty elements xml string java?

Question

I have an xml string from which I want to remove the empty elements and the line containing the element.

So for example:

XML:

<ct>
   <c>http://192.168.105.213</c>
   <l>http://192.168.105.213</l>
   <o></o>
   <l>http://192.168.105.213</l>
   <o>http://192.168.105.213</o>
</ct>

In this <o></o> is the empty element, so after removing this element I want :

   <ct>
       <c>http://192.168.105.213</c>
       <l>http://192.168.105.213</l>
       <l>http://192.168.105.213</l>
       <o>http://192.168.105.213</o>
   </ct>

So the whole line must be removed, leaving the remaining lines properly indented.

I tried: xml.replaceAll("<(\\w+)></\\1>", "");

This leaves an empty line in between:

<ct>
   <c>http://192.168.105.213</c>
   <l>http://192.168.105.213</l>

   <l>http://192.168.105.213</l>
   <o>http://192.168.105.213</o>
</ct>

How can I remove the surrounding whitespace (spaces, \n, \t, \r) correctly so that the indentation is preserved?

Please, do not use regular expressions to parse XML. Never. See stackoverflow.com/questions/6751105/…
– vanje, Commented Sep 30, 2016 at 10:37
@vanje I like this answer better: stackoverflow.com/questions/1732348/…
– online Thomas, Commented Sep 30, 2016 at 10:44

Naveed S · Accepted Answer · 2016-09-30 11:48:15Z

2

This would work:

xml.replaceAll("<(\\w+)></\\1>\n\\s+", "");

It matches your pattern followed by a newline and one or more whitespace characters (including tabs), so the line break and the next line's indentation are removed along with the empty element.

EDIT: xml.replaceAll("\n\\s+<(\\w+)></\\1>", "") should work for deeper levels as well.

If the root element may itself be empty, or some child elements may be unindented, you might need to make the newline and whitespace optional:

xml.replaceAll("\n?\\s*<(\\w+)></\\1>", "")
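
A minimal sketch of the pattern from the EDIT in runnable form, using a shortened version of the question's sample. The class and method names (EmptyElementRegex, stripEmpty) are made up for illustration. It only covers the literal `<tag></tag>` form with no attributes, no inner whitespace and no self-closing `<tag/>`, so treat it as a quick fix for known input rather than general XML handling.

```java
final class EmptyElementRegex {
    // Hypothetical helper: removes "<tag></tag>" together with the newline
    // and indentation that precede it, so no blank line is left behind.
    static String stripEmpty(String xml) {
        return xml.replaceAll("\n\\s+<(\\w+)></\\1>", "");
    }

    public static void main(String[] args) {
        String xml = "<ct>\n"
                   + "   <c>http://192.168.105.213</c>\n"
                   + "   <o></o>\n"
                   + "   <l>http://192.168.105.213</l>\n"
                   + "</ct>";
        System.out.println(stripEmpty(xml));
    }
}
```

Because the leading newline and whitespace are consumed with the element, the same call should behave the same way at any nesting depth, as long as each empty element sits on its own indented line.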

edited Sep 30, 2016 at 11:48

answered Sep 30, 2016 at 10:45

Naveed S


2 Comments

Siddharth Trikha Over a year ago

It works for one level of indentation, but for a deeply nested empty element, will it remove the right whitespace to preserve the indentation?

Naveed S Over a year ago

@SiddharthTrikha Please have the newline+spaces combination before the tags as in the edit. It should work for deeper ones.

noned · Accepted Answer · 2016-09-30 10:45:33Z

1

This should solve it for you:

xml.replaceAll("\n\t<(\\w+)></\\1>", "");

answered Sep 30, 2016 at 10:45

noned


Community · Accepted Answer · 2017-05-23 12:17:19Z

1

As advised in comments, reconsider using regex directly on HTML/XML documents as these are not regular languages. Instead, use regex on parsed text/value content but not to transform documents.

A good tool for manipulating XML is XSLT, the transformation language and sibling of XPath. Java ships with a built-in XSLT 1.0 processor and can also use external processors (Xalan, Saxon, etc.). Consider the following setup:

XSLT Script (save as .xsl file used below; script removes empty nodes)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform to Copy Document as is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty Template to Remove Such Nodes -->
  <xsl:template match="*[.='']"/>

</xsl:transform>

Java Code

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XMLTransform {
    public static void main(String[] args) throws IOException, SAXException,
                                                  ParserConfigurationException,
                                                  TransformerException {
        // Load XML and XSL Document
        String inputXML = "path/to/Input.xml";
        String xslFile = "path/to/XSLT/Script.xsl";
        String outputXML = "path/to/Output.xml";

        Source xslt = new StreamSource(new File(xslFile));
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(new File(inputXML));

        // XSLT Transformation with pretty print
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer(xslt);

        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // Processor-specific property (Xalan-style) controlling the indent width
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

        // Write the transformed document to the output file
        DOMSource source = new DOMSource(doc);
        StreamResult result = new StreamResult(new File(outputXML));
        transformer.transform(source, result);
    }
}

Output

<ct>
    <c>http://192.168.105.213</c>
    <l>http://192.168.105.213</l>
    <l>http://192.168.105.213</l>
    <o>http://192.168.105.213</o>
</ct>
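
Since the question starts from an XML string rather than a file, the same stylesheet can also be applied in memory. The following is a minimal sketch, assuming the default JDK XSLT processor; the class name XmlStringCleaner and the method removeEmptyElements are hypothetical, and wrapping the checked exception is just a convenience for this example. The exact indentation of the result depends on the processor, so no output is shown.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlStringCleaner {
    // Same identity transform plus empty-element template as the script above,
    // inlined as a string so the example is self-contained.
    private static final String XSLT =
          "<xsl:transform xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='xml' indent='yes' omit-xml-declaration='yes'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"*[.='']\"/>"
        + "</xsl:transform>";

    // Hypothetical helper: parses the XML string, applies the stylesheet,
    // and returns the transformed document as a string.
    static String removeEmptyElements(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Malformed input or stylesheet ends up here
            throw new IllegalArgumentException("Could not transform XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ct>\n   <c>http://192.168.105.213</c>\n   <o></o>\n</ct>";
        System.out.println(removeEmptyElements(xml));
    }
}
```

Note that `*[.='']` tests the element's string value, so an element that has attributes or only empty descendants but no text would also be removed.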

NAMESPACES

When working with namespaces such as the below XML:

<prefix:ct xmlns:prefix="http://www.example.com">
   <c>http://192.168.105.213</c>
   <l>http://192.168.105.213</l>
   <o></o>
   <l>http://192.168.105.213</l>
   <o>http://192.168.105.213</o>
</prefix:ct>

Use the following XSLT with declaration in header and added template:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
               xmlns:prefix="http://www.example.com">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Retain Namespace Prefix -->
  <xsl:template match="prefix:ct">
    <xsl:element name='prefix:{local-name()}' namespace='http://www.example.com'>
      <xsl:copy-of select="namespace::*"/>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>

  <!-- Remove Empty Nodes -->
  <xsl:template match="*[.='']"/>

</xsl:transform>

Output

<prefix:ct xmlns:prefix="http://www.example.com">
    <c>http://192.168.105.213</c>
    <l>http://192.168.105.213</l>
    <l>http://192.168.105.213</l>
    <o>http://192.168.105.213</o>
</prefix:ct>

edited May 23, 2017 at 12:17

CommunityBot


answered Sep 30, 2016 at 16:27

Parfait


9 Comments

Siddharth Trikha Over a year ago

I initially tried XSLT with the same template you gave but without the <xsl:template match="*[.='']"/> part, and the empty space still showed up. I will try again with that last part added. Will it remove the spaces?

Siddharth Trikha Over a year ago

Basically the white space of the indentation was also getting stripped.

Parfait Over a year ago

Yes, as shown. In fact, that template match is the key item of the script. The Identity Transform copies the entire document as is, so nothing changes if you leave this empty template out. Also, using <xsl:strip-space elements="*"/> removes unneeded whitespace.

Siddharth Trikha Over a year ago

OK. What if we want to remove only particular empty elements, by name? Say <one></one>, <two></two>, <three></three> are three empty elements and I want to remove only the one and three elements: would <xsl:template match="one|three|node()"> work?

Parfait Over a year ago

See updated section for Namespaces where you declare namespace in XSLT's header and add the new template. Example included. Aside - always include namespaces when asking XML questions. Also, with namespaces, this would be insane to do with regex!
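
On the earlier comment about removing only particular empty elements by name: the thread as shown does not include a reply, so the following is only a sketch of one way it could be done. Because node() matches every element and text node, an empty template matching "one|three|node()" would be expected to drop far more than intended. Restricting the empty template to named elements with the emptiness predicate, as in one[.='']|three[.=''], is one option. The class name NamedEmptyElementRemover and its method are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NamedEmptyElementRemover {
    // Identity transform plus an empty template that only matches
    // <one> and <three> elements whose string value is empty.
    private static final String XSLT =
          "<xsl:transform xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"one[.='']|three[.='']\"/>"
        + "</xsl:transform>";

    // Hypothetical helper: returns the XML with empty <one>/<three> removed;
    // other empty elements (such as <two>) and non-empty <one> are kept.
    static String removeNamedEmpty(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Could not transform XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<r><one></one><two></two><three></three><one>x</one></r>";
        System.out.println(removeNamedEmpty(xml));
    }
}
```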
