0

I have a lot of empty XML tags that need to be removed from a string.

 String someData = dealDataWriter.toString();
 someData = someData.replaceAll("<somerandomField1/>", "");
 someData = someData.replaceAll("<somerandomField2/>", "");
 someData = someData.replaceAll("<somerandomField3/>", "");
 someData = someData.replaceAll("<somerandomField4/>", "");

This uses a lot of string operations, which is not efficient. What would be a better way to avoid them?

2
  • Is your string actually XML content? You should use a Java XML parser and delete your empty tags with it; it will be more efficient. Commented Jan 10, 2018 at 11:20
  • Also, you should specify exactly what you want to remove: are <somerandomField1 />, <somerandomField1></somerandomField1>, <somerandomField1 xsi:nil="true"/>, <somerandomField1 xsi:nil="true"></somerandomField1> empty tags? They are all empty elements. Commented Jan 10, 2018 at 11:41

4 Answers

1

I would not suggest using regex when operating on HTML/XML, but for a simple case like yours a rule like this one may be OK:

someData = someData.replaceAll("<\\w+?\\/>", "");


If you also want to allow optional whitespace before and after the tag name:

someData = someData.replaceAll("<\\s*\\w+?\\s*\\/>", "");
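Since the question is about avoiding repeated string operations, here is a minimal sketch of how the rule above could replace the four separate calls with a single pass. The class and method names are made up for illustration. It assumes the regex is compiled once and reused, and it drops the backslash before `/`, which `java.util.regex` does not treat as a metacharacter.

```java
import java.util.regex.Pattern;

final class EmptyTagRegexDemo {
    // Compiled once, so repeated calls do not recompile the expression.
    // Only matches self-closing tags whose name consists of \w characters.
    private static final Pattern SELF_CLOSING = Pattern.compile("<\\s*\\w+?\\s*/>");

    // Hypothetical helper: removes every matching tag in one pass over the input.
    static String stripSelfClosing(String xml) {
        return SELF_CLOSING.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String someData = "<deal><somerandomField1/><id>7</id><somerandomField2 /></deal>";
        // prints <deal><id>7</id></deal>
        System.out.println(stripSelfClosing(someData));
    }
}
```

Tags with attributes, hyphens, or namespace prefixes in the name are left untouched by this pattern.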



10 Comments

I would account for a probable space after the nodename with \s? or even \s*
What's the purpose of \\ preceding /?
You're totally right, thanks (OP: do you see how inadequate regex can be for XML?)
And there are tons of characters other than \w that might appear. The hyphen is just one that's popular.
@laune You put a backslash before a character to have the regex treat it literally. Since we're in Java, you have to add an extra backslash for the same reason.
0

Try the following code. It removes every self-closing tag that does not contain any whitespace.

someData = someData.replaceAll("<\\w+/>", "");


0

As an alternative to regex or string matching, you can use an XML parser to find empty tags and remove them.

See the answers given over here: Java Remove empty XML tags
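As a rough sketch of the parser approach, using only the JDK's built-in DOM and Transformer APIs: the class and method names are hypothetical, and "empty" is assumed here to mean an element with no attributes and no child nodes. Note that a parent which becomes empty after its children are removed is removed as well, while the root element is always kept.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EmptyElementRemover {
    // Assumption: "empty" means no attributes and no child nodes of any kind.
    static void removeEmptyElements(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards because the NodeList is live and shrinks on removal.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            removeEmptyElements(child); // clean descendants first
            if (!child.hasChildNodes() && !child.hasAttributes()) {
                node.removeChild(child);
            }
        }
    }

    // Parses the string, strips empty elements, and serializes it back.
    static String strip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            removeEmptyElements(doc.getDocumentElement());
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("<deal><somerandomField1/><id>7</id></deal>"));
    }
}
```

If the data is already being written through a writer, as in the question, it may be cheaper still to skip empty fields at write time instead of parsing the result again.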


0

If you want to remove <tagA></tagA> as well as <tagB/>, you can use the following regex. Note that \1 is a backreference to the first capturing group, so the closing tag name must match the opening one.

// Identifies an empty tag, i.e. <tag1></tag1> or <tag/>.
// It also allows whitespace around or within the tag; however, tags whose value is only whitespace will not match.
private static final String EMPTY_VALUED_TAG_REGEX = "\\s*<\\s*(\\w+)\\s*></\\s*\\1\\s*>|\\s*<\\s*\\w+\\s*/\\s*>";

Run the code on ideone
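For reference, a minimal sketch of how the constant above might be applied. The surrounding class, the precompiled `Pattern` field, and the method name are assumptions added for illustration; only the regex itself comes from the answer.

```java
import java.util.regex.Pattern;

final class EmptyValuedTagDemo {
    private static final String EMPTY_VALUED_TAG_REGEX =
            "\\s*<\\s*(\\w+)\\s*></\\s*\\1\\s*>|\\s*<\\s*\\w+\\s*/\\s*>";

    // Compiled once and reused for every call.
    private static final Pattern EMPTY_VALUED_TAG = Pattern.compile(EMPTY_VALUED_TAG_REGEX);

    // Hypothetical helper: one pass that removes both <tag></tag> and <tag/> forms.
    // The leading \s* in the regex also consumes whitespace in front of a removed tag.
    static String removeEmptyTags(String xml) {
        return EMPTY_VALUED_TAG.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String someData = "<root><tagA></tagA> <tagB/><keep>1</keep></root>";
        // prints <root><keep>1</keep></root>
        System.out.println(removeEmptyTags(someData));
    }
}
```

A single pass does not cascade: in `<a><b/></a>` only `<b/>` is removed, leaving `<a></a>` behind, so the replacement would have to be repeated if nested empties matter.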
