Remove XML Tag and Content in XML String using Java Regex

Question

I have an XML string of 400 lines, and it contains the tags below, repeated twice. I want to remove those tags along with their content.

<Address>
<Location>Beach</Location>
<Dangerous>
    <Flag>N</Flag>
</Dangerous>
</Address>

I am using the regex pattern below, but it is not replacing anything:

xmlRequest.replaceAll("<Address>.*?</Address>$","");

I am able to do this in Notepad++ by ticking the ". matches newline" checkbox next to the Regular Expression radio button in the Find/Replace dialog box.

Can anyone suggest what's wrong with my regular expression?

Once again: do not process XML/HTML with regexes. Use XML tools. XML/HTML is a context-free language, and a regular expression is not the right tool to process such languages. Only regular languages can be processed with regexes. – willeM_ Van Onsem, commented Mar 20, 2017 at 1:42
Indeed, please read stackoverflow.com/questions/6751105/… – James Fry, commented Mar 20, 2017 at 1:43
@efektive, I need to completely remove that block inside the 400 lines of the XML string. – Vikram, commented Mar 20, 2017 at 2:07

Vikram · Accepted Answer · 2017-03-20 03:13:39Z

8

xmlRequest.replaceAll("<Address>[\\s\\S]*?</Address>","");

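By default, . does not match line terminators such as \n and \r, so .*? cannot cross the line breaks inside the block. [\s\S] matches any character, including newlines, so it can. The trailing $ is dropped as well: without MULTILINE mode it only matches at the very end of the input.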
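The Notepad++ ". matches newline" option corresponds to Java's DOTALL mode, which can be enabled with the Pattern.DOTALL flag or inline with (?s). Below is a minimal sketch, with an illustrative class name and hypothetical sample input, that compares the question's pattern, the [\s\S] version, and a DOTALL version.

```java
import java.util.regex.Pattern;

public class RemoveAddressRegex {
    // Question's pattern: '.' does not match \n or \r, and '$' (without MULTILINE)
    // only matches at the very end of the input, so nothing is replaced here
    static String questionPattern(String xml) {
        return xml.replaceAll("<Address>.*?</Address>$", "");
    }

    // Accepted answer: [\s\S] matches any character, including line terminators
    static String charClass(String xml) {
        return xml.replaceAll("<Address>[\\s\\S]*?</Address>", "");
    }

    // Same effect with DOTALL, the Java counterpart of Notepad++'s ". matches newline";
    // the inline form would be "(?s)<Address>.*?</Address>"
    private static final Pattern ADDRESS =
            Pattern.compile("<Address>.*?</Address>", Pattern.DOTALL);

    static String dotAll(String xml) {
        return ADDRESS.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input with the block repeated twice
        String xml = "<Root>\n"
                + "<Address>\n<Location>Beach</Location>\n</Address>\n"
                + "<Name>A</Name>\n"
                + "<Address>\n<Location>Beach</Location>\n</Address>\n"
                + "</Root>";

        System.out.println(questionPattern(xml).equals(xml)); // true: nothing was removed
        System.out.println(charClass(xml).equals(dotAll(xml))); // true
        System.out.println(dotAll(xml));
    }
}
```

The DOTALL pattern only fixes the line-break problem; it shares every other limitation of the [\s\S] version raised in the comments.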

edited Mar 20, 2017 at 3:13

Vikram


answered Mar 20, 2017 at 2:27

Kerwin



4 Comments

Vikram Over a year ago

Works fine, Kerwin. Thank you.

Michael Kay Over a year ago

No, it doesn't work fine. It works on the one test case that you have applied it to. It will fail on other test cases, and whoever has to investigate the bug will curse the person who wrote the code. Do not use regular expressions to process XML, use an XML parser.

Michael Kay Over a year ago

To expand on this, here are some cases it won't handle correctly: An address element with attributes. An address element with whitespace in the start or end tag. An address element containing a nested Address element. Address tags appearing within comments or CDATA sections. An empty Address element using a self-closing tag.

Matt D. Over a year ago

Hopefully developers can think for themselves about whether this is valid to use. What about unit tests? What about removing passwords from SOAP requests before logging? Not everything is critical.
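To make Michael Kay's objections concrete, here is a small sketch (class name and inputs are hypothetical) that runs the accepted answer's pattern against some of the cases he lists. An attribute or extra whitespace in the start tag means the block is not removed at all, while a nested Address or an Address tag inside a comment leaves broken XML behind.

```java
public class RegexPitfalls {
    // The accepted answer's pattern, unchanged
    static String strip(String xml) {
        return xml.replaceAll("<Address>[\\s\\S]*?</Address>", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            // Attribute on the start tag: not matched, block survives
            "<Address type=\"home\"><Location>Beach</Location></Address>",
            // Whitespace inside the start tag: not matched, block survives
            "<Address ><Location>Beach</Location></Address>",
            // Nested Address: the lazy match stops at the inner end tag, leaving "y</Address>"
            "<Address><Address>x</Address>y</Address>",
            // Start tag inside a comment: the comment close and <Keep/> are removed too
            "<!-- <Address> --><Keep/><Address>x</Address>",
        };
        for (String in : inputs) {
            System.out.println(in + "  ->  " + strip(in));
        }
    }
}
```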

Community · Accepted Answer · 2017-05-23 12:25:29Z

0

As improper as it may be to do what you're suggesting (see https://stackoverflow.com/a/1732454/6552039 for hilarity and enlightenment), you should be able to parse your XML into an org.w3c.dom.Document, call getElementsByTagName("Address"), and remove the second match. Note that org.w3c.dom has no remove(Element) method; a node is removed through its parent, e.g. node.getParentNode().removeChild(node). (This assumes a particular interpretation of "below tags repeated twice".)
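A minimal sketch of the DOM approach using only the JDK's javax.xml.parsers and javax.xml.transform packages. Unlike the answer's assumption that only the second occurrence should go, it removes every Address element, since the asker's comment says the block must be removed completely; the class and method names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RemoveAddressDom {
    static String removeElements(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // The NodeList is in document order; walking it backwards removes
            // nested matches before their ancestors and never skips an entry
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = matches.getLength() - 1; i >= 0; i--) {
                Node node = matches.item(i);
                // DOM nodes are removed through their parent
                node.getParentNode().removeChild(node);
            }

            // Serialize the modified tree back to a string, without an XML declaration
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Parser configuration, SAX, I/O and transformer failures are all checked exceptions
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Root><Address><Location>Beach</Location>"
                + "<Dangerous><Flag>N</Flag></Dangerous></Address>"
                + "<Name>A</Name><Address><Location>Beach</Location></Address></Root>";
        System.out.println(removeElements(xml, "Address"));
    }
}
```

Because a real parser builds the tree, attributes, whitespace inside tags, nesting, comments, and CDATA are all handled without special cases.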

edited May 23, 2017 at 12:25

CommunityBot


answered Mar 20, 2017 at 2:07

b4n4n4p4nd4



Raju · Accepted Answer · 2018-01-21 12:05:15Z

0

A solution with jsoup:

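import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class RemoveTag {
    public static void main(String[] args) {
        String xmlContent = "<Address> <Location>Beach</Location><Dangerous>"
                + " <Flag>N</Flag> </Dangerous> </Address>";

        String tagToRemove = "Address";

        // Parse as XML, not HTML: tag case is preserved and no <html>/<body> wrapper is added
        Document doc = Jsoup.parse(xmlContent, "", Parser.xmlParser());

        // Remove every matching element together with everything inside it
        for (Element el : doc.getElementsByTag(tagToRemove)) {
            el.remove();
        }

        // An XML document has no <body>, so serialize the whole document
        xmlContent = doc.outerHtml();
        System.out.println(xmlContent);
    }
}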

answered Jan 21, 2018 at 12:05

Raju

