I have an XML string of about 400 lines that contains the block below, repeated twice. I want to remove those blocks.

<Address>
<Location>Beach</Location>
<Dangerous>
    <Flag>N</Flag>
</Dangerous>
</Address>

I am using the regex pattern below, but it is not replacing anything:

xmlRequest.replaceAll("<Address>.*?</Address>$","");

I am able to do this in Notepad++ by selecting the ". matches newline" checkbox next to the Regular Expression radio button in the Find/Replace dialog box.

Can anyone suggest what's wrong with my regular expression?

  • Once again: do not process XML/HTML with regexes. Use XML tools. XML/HTML is a context-free language, a regular expression is not the right tool to process such languages. Only regular languages can be processed with regexes. Commented Mar 20, 2017 at 1:42
  • Indeed - please read stackoverflow.com/questions/6751105/… Commented Mar 20, 2017 at 1:43
  • Jsoup seems like a good option Commented Mar 20, 2017 at 1:49
  • Could you post the expected output? Commented Mar 20, 2017 at 1:52
  • @efektive, I need to completely remove that block inside the 400 lines of xml string Commented Mar 20, 2017 at 2:07

3 Answers

8
xmlRequest.replaceAll("<Address>[\\s\\S]*?</Address>","");

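By default, . does not match line terminators such as \n and \r, so .* cannot span multiple lines. Use [\s\S] to match any character, including newlines. The trailing $ is also dropped, because without the MULTILINE flag it only matches at the end of the input.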
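As an alternative to [\s\S], the DOTALL flag makes . match line terminators as well, which is what the Notepad++ checkbox does. The following is a minimal sketch, not a definitive implementation: the class name RemoveAddressRegex, the method removeAddressBlocks, and the sample input are made up for illustration, and the limitations raised in the comments on this answer (attributes, nesting, CDATA, self-closing tags) still apply.

```java
import java.util.regex.Pattern;

public class RemoveAddressRegex {
    // DOTALL makes '.' match line terminators too, so ".*?" can span lines.
    private static final Pattern ADDRESS_BLOCK =
            Pattern.compile("<Address>.*?</Address>", Pattern.DOTALL);

    // Hypothetical helper: returns a copy of xml with every <Address> block removed.
    static String removeAddressBlocks(String xml) {
        return ADDRESS_BLOCK.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Illustrative input only; the real request is about 400 lines long.
        String xml = "<Request>\n<Address>\n<Location>Beach</Location>\n</Address>\n"
                + "<Id>1</Id>\n</Request>";
        System.out.println(removeAddressBlocks(xml));
    }
}
```

The same flag can be written inline as (?s) at the start of the pattern passed to replaceAll. Note also that Java strings are immutable, so the result of replaceAll has to be assigned to a variable; calling it without using the return value changes nothing.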


4 Comments

Works fine Kerwin. Thank you
No, it doesn't work fine. It works on the one test case that you have applied it to. It will fail on other test cases, and whoever has to investigate the bug will curse the person who wrote the code. Do not use regular expressions to process XML, use an XML parser.
To expand on this, here are some cases it won't handle correctly: An address element with attributes. An address element with whitespace in the start or end tag. An address element containing a nested Address element. Address tags appearing within comments or CDATA sections. An empty Address element using a self-closing tag.
Hopefully the developer can think for themselves to determine whether this is valid to use or not. How about unit tests? How about me wanting to remove passwords from SOAP requests before logging? Not everything is critical.
0

Processing XML with a regex, as you're suggesting, is improper. (See https://stackoverflow.com/a/1732454/6552039 for hilarity and enlightenment.)

You should be able to parse your XML into an org.w3c.dom.Document, call getElementsByTagName("Address") on it, and remove the second element from its parent with removeChild. (This assumes a particular interpretation of "below tags repeated twice".)
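The DOM approach described above could look roughly like the sketch below. It makes some assumptions that are not in the question: the class name RemoveAddressDom, the helper removeElements, and the sample input are hypothetical, and it removes every matching element rather than only the second one. To remove just the second occurrence, call removeChild on nodes.item(1) instead of looping.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RemoveAddressDom {
    // Hypothetical helper: parses xml, removes all elements named tagName,
    // and serializes the remaining document back to a string.
    static String removeElements(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList nodes = doc.getElementsByTagName(tagName);
            // The NodeList is live, so iterate backwards: removing an item
            // then never shifts the positions of items not yet visited.
            for (int i = nodes.getLength() - 1; i >= 0; i--) {
                Node node = nodes.item(i);
                node.getParentNode().removeChild(node);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked parser/transformer exceptions to keep the sketch short.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative input only.
        String xml = "<Request><Address><Location>Beach</Location></Address>"
                + "<Id>1</Id></Request>";
        System.out.println(removeElements(xml, "Address"));
    }
}
```

Unlike the regex, this handles attributes on the element, whitespace inside the tags, nested elements, and self-closing tags, because the parser resolves the element boundaries.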


0

A solution with JSoup

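import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public static void main(String[] args) {
    // A Java string literal cannot span lines, so the sample is concatenated.
    String xmlContent = "<Address> <Location>Beach</Location><Dangerous> "
            + "<Flag>N</Flag> </Dangerous> </Address>";

    String tagToRemove = "Address";

    // Parse as XML rather than HTML, so that tag names keep their case
    // and no <html>/<body> wrapper is added around the content.
    Document doc = Jsoup.parse(xmlContent, "", Parser.xmlParser());
    // Remove every matching element from the document tree.
    doc.getElementsByTag(tagToRemove).remove();
    xmlContent = doc.outerHtml();
}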
