Java Remove empty XML tags

Question

I'm looking for a simple Java snippet that removes empty tags from any XML structure. For example, this:

<xml>
    <field1>bla</field1>
    <field2></field2>
    <field3/>
    <structure1>
       <field4>bla</field4>
       <field5></field5>
    </structure1>
</xml>

should turn into:

<xml>
    <field1>bla</field1>
    <structure1>
       <field4>bla</field4>
    </structure1>
</xml>

Are you currently parsing the XML into data structures in any particular way (JDOM, etc)? Or are you starting from scratch?
– Tom Elliott, Commented Nov 6, 2009 at 12:17

Chris R · Accepted Answer · 2009-11-06 12:36:02Z

8

This XSLT stylesheet should do what you're looking for:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:if test=". != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

It should also preserve elements which are empty but have attributes which aren't. If you don't want this behaviour then change:

<xsl:if test=". != '' or ./@* != ''">

To: <xsl:if test=". != ''">

If you want to know how to apply XSLT in Java, there should be plenty of tutorials out there on the Interwebs. Good luck!
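For reference, here is a minimal sketch of applying the stylesheet above from Java with the JDK's built-in javax.xml.transform API. The class name XsltEmptyTagRemover and the method removeEmptyTags are made up for this example, and the stylesheet is inlined as a string only to keep the snippet self-contained; in a real project you would more likely load it from a file or resource.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of any library.
final class XsltEmptyTagRemover {

    // The stylesheet from this answer, inlined so the example is self-contained.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:if test=\". != '' or ./@* != ''\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet over the given XML string and returns the transformed XML. */
    static String removeEmptyTags(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            // Leave out the <?xml ...?> declaration so only the document itself is returned.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers of this sketch do not have to handle a checked exception.
            throw new IllegalArgumentException("Could not transform XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xml><field1>bla</field1><field2></field2><field3/>"
                   + "<structure1><field4>bla</field4><field5></field5></structure1></xml>";
        System.out.println(removeEmptyTags(xml));
    }
}
```

Note that the input in main has no whitespace between elements. With an indented document, the whitespace text nodes count as non-empty content, so a parent that contains only empty children and whitespace is kept.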

edited Nov 6, 2009 at 12:36

answered Nov 6, 2009 at 12:30


Community · Accepted Answer · 2017-05-23 12:01:49Z

I was wondering whether it would be easy to do this with the XOM library and gave it a try.

It turned out to be quite easy:

import nu.xom.*;

import java.io.File;
import java.io.IOException;

public class RemoveEmptyTags {

    public static void main(String[] args) throws IOException, ParsingException {
        Document document = new Builder().build(new File("original.xml"));
        handleNode(document.getRootElement());
        System.out.println(document.toXML()); // empty elements now removed
    }

    private static void handleNode(Node node) {
        if (node.getChildCount() == 0 && "".equals(node.getValue())) {
            node.getParent().removeChild(node);
            return;
        }
        // recurse the children
        for (int i = 0; i < node.getChildCount(); i++) { 
            handleNode(node.getChild(i));
        }
    }
}

This probably won't handle all corner cases properly, like a completely empty document. And what to do about elements that are otherwise empty but have attributes?

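If you want to keep otherwise empty tags that have attributes, add the following check to the condition in handleNode:

... && ((Element) node).getAttributeCount() == 0)

Also, if the XML has two or more empty tags directly one after another, this recursive method doesn't remove all of them: removing a child shifts its following siblings down by one index, so the forward loop skips the node that comes right after each removed one.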
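One way around the skipped-sibling problem is to walk the children from last to first and to prune each child's subtree before deciding whether the child itself is empty. The sketch below shows that idea using the JDK's built-in org.w3c.dom API rather than XOM, purely so that it runs without extra dependencies. The class and method names (DomEmptyTagRemover, clean, prune, isEmpty) are invented for this example, and it assumes that "empty" means no attributes and nothing but whitespace text inside.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses the JDK DOM API instead of XOM.
final class DomEmptyTagRemover {

    /** Parses the XML, prunes empty elements below the root, and serializes the result. */
    static String clean(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            prune(doc.getDocumentElement());
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped so callers of this sketch do not have to handle checked exceptions.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    /** Removes empty descendants of parent; the parent itself is never removed. */
    static void prune(Element parent) {
        NodeList children = parent.getChildNodes();
        // Iterate backwards: removing child i does not change the indices still to be visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            // Clean the subtree first, so a parent that becomes empty is caught in the same pass.
            prune((Element) child);
            if (isEmpty((Element) child)) {
                parent.removeChild(child);
            }
        }
    }

    /** Assumption: empty means no attributes and only whitespace text nodes as children. */
    static boolean isEmpty(Element element) {
        if (element.hasAttributes()) {
            return false;
        }
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            boolean blankText = c.getNodeType() == Node.TEXT_NODE
                    && c.getNodeValue().trim().isEmpty();
            if (!blankText) {
                return false; // a child element, comment, CDATA section or real text
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(clean("<xml><a/><b/><c>x</c><d><e/><f> </f></d></xml>"));
    }
}
```

In the main example, the adjacent empty elements a and b are both removed, and d disappears as well once its children e and f are gone, leaving only c under the root.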

(This answer is part of my evaluation of XOM as a potential replacement to dom4j.)

mhaller · Accepted Answer · 2009-11-06 12:49:59Z

3

As a side note: The different states of a tag actually have meaning:

Open-Closed Tag: The element exists and its value is an empty string
Single-Tag: The element exists, but the value is null or nil
Missing Tag: The element does not exist

So, by removing empty Open-Closed tags and Single-Tags, you're merging them with the group of missing tags and thus lose information.

answered Nov 6, 2009 at 12:49


2 Comments

Chris R Over a year ago

Very good point - there are times when it is useful to remove tags whose value is empty or null, but there are also times when doing so could potentially be detrimental to the application.

Raymond Over a year ago

For my purpose, this is irrelevant

Sam · Accepted Answer · 2015-08-18 00:32:45Z

I tested Jonik's and Marco's sample code, but it wasn't exactly what I wanted, so I modified it. The code below works well for me, and I've already used it in my project. Please test it if you like.

public String removeEmptyNode(String xml){
    String cleanedXml = null;
    try{
        xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" + xml;
        InputStream input = new ByteArrayInputStream(xml.getBytes("UTF-8"));
        Document document = new Builder().build(input);
        removeEmptyNode(document.getRootElement());
        cleanedXml = document.toXML();
    }catch(Exception e){
        e.printStackTrace();
    }
    return cleanedXml;
}

private static void removeEmptyNode(Node node) {
    if(node.getChildCount()!=0){
        int count = node.getChildCount();
        for (int i = count-1; i >= 0 ; i--) { 
            removeEmptyNode(node.getChild(i));
        }
    }

    doCheck(node);
}

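private static void doCheck(Node node){
    // A node counts as empty if it has no children and its value is only whitespace.
    // This also matches whitespace-only text nodes, so emptied parents get removed too.
    if(node.getChildCount() == 0 && "".equals(node.getValue().trim())) {
        try {
            node.getParent().removeChild(node);
        } catch(Exception e) {
            // Ignored on purpose: some nodes, such as the root element, cannot be removed.
        }
    }
}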

Kennet · Accepted Answer · 2009-11-06 15:04:00Z

1

If the XML is fed in as a String, a regex could be used to filter out empty elements:

<(\\w+)></\\1>|<\\w+/>

This will find empty elements.

data.replaceAll(re, "")

Here, data is a variable holding your XML string and re holds the pattern above.
I'm not saying this is the best solution, but it is possible...
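A minimal sketch of how that could look as a complete class. A single replaceAll pass leaves behind parents that only become empty after their children are removed, so this version repeats the replacement until nothing changes. The class name RegexEmptyTagRemover and the method removeEmptyTags are hypothetical names for this example.

```java
// Hypothetical helper class built around the regex from this answer.
final class RegexEmptyTagRemover {

    // <name></name> with nothing in between, or a self-closing <name/>.
    static final String EMPTY_ELEMENT = "<(\\w+)></\\1>|<\\w+/>";

    /** Repeats the replacement until the string stops changing. */
    static String removeEmptyTags(String data) {
        String previous;
        do {
            previous = data;
            data = data.replaceAll(EMPTY_ELEMENT, "");
        } while (!data.equals(previous));
        return data;
    }

    public static void main(String[] args) {
        String xml = "<xml><field1>bla</field1><field2></field2><field3/>"
                   + "<structure1><field4>bla</field4><field5></field5></structure1></xml>";
        // prints <xml><field1>bla</field1><structure1><field4>bla</field4></structure1></xml>
        System.out.println(removeEmptyTags(xml));
    }
}
```

The pattern only matches tags without attributes, without whitespace between the opening and closing tag, and with names made of word characters (so no namespace prefixes or hyphens). Anything else is left untouched.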

answered Nov 6, 2009 at 15:04


TimP · Accepted Answer · 2011-12-22 12:40:51Z

1

I needed to add xsl:strip-space and xsl:output with indent="yes" to Chris R's answer; otherwise enclosing blocks that become empty are not removed:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes" />
  <xsl:template match="@*|node()">
    <xsl:if test=". != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

answered Dec 22, 2011 at 12:40


Alex · Accepted Answer · 2009-11-06 12:19:34Z

0

With XSLT you could transform your XML to ignore the empty tags and re-write the document.

answered Nov 6, 2009 at 12:19


Luigi · Accepted Answer · 2015-07-23 09:21:39Z

To remove all empty tags, even if they are one after another, one possible solution is:

private void removeEmptyTags(Document document) {
    List<Node> listNode = new ArrayList<Node>();
    findListEmptyTags(document.getRootElement(), listNode);
    if (listNode.size() == 0)
        return;

    for (Node node : listNode) {
        node.getParent().removeChild(node);
    }
    removeEmptyTags(document);
}

private void findListEmptyTags(Node node, List<Node> listNode) {

    if (node != null && node.getChildCount() == 0 && "".equals(node.getValue()) && ((Element) node).getAttributeCount() == 0) {
        listNode.add(node);
        return;
    }
    // recurse the children
    for (int i = 0; i < node.getChildCount(); i++) {
        findListEmptyTags(node.getChild(i), listNode);
    }
}

Stéphane GRILLON · Accepted Answer · 2017-09-19 12:23:51Z

public static void main(String[] args) {

    final String regex1 = "<([a-zA-Z0-9-\\_]*)[^>]*/>";
    final String regex2 = "<([a-zA-Z0-9-\\_]*)[^>]*>\\s*</\\1>";

    String xmlString = "<xml><field1>bla</field1><field2></field2><field3/><structure1><field4><field50><field60/></field50></field4><field5></field5></structure1></xml>";
    System.out.println(xmlString);

    final Pattern pattern1 = Pattern.compile(regex1);
    final Pattern pattern2 = Pattern.compile(regex2);

    Matcher matcher1;
    Matcher matcher2;
    do { 
        xmlString = xmlString.replaceAll(regex1, "").replaceAll(regex2, "");
        matcher1 = pattern1.matcher(xmlString);
        matcher2 = pattern2.matcher(xmlString);
    } while (matcher1.find() || matcher2.find());

    System.out.println(xmlString);
}

Console (indented here for readability; the program itself prints each string on a single line):

<xml>
    <field1>bla</field1>
    <field2></field2>
    <field3/>
    <structure1>
        <field4>
            <field50>
                <field60/>
            </field50>
        </field4>
        <field5></field5>
    </structure1>
</xml>

<xml>
    <field1>bla</field1>
</xml>
