Hi, I found this Apache Commons Lang method really useful:

StringUtils.substringBetween(fileContent, "<![CDATA[", "]]>") 

to extract the content of the CDATA section from a document like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<envelope>
    <xxxx>
        <yyyy>
            <![CDATA[

                    <?xml version="1.0" encoding="UTF-8" ?>
                    <Document >
                        <eee>
                            <tt>
                                <ss>zzzzzzz</ss>
                                <aa>2021-09-09T10:39:29.850Z</aa>
                                <aaaa>
                                    <Cd>cccc</Cd>
                                </aaaa>
                                <dd>ssss</dd>
                                <ff></ff>
                            </tt>
                        </eee>
                    </Document>
                ]]>
        </yyyy>
    </xxxx>
</envelope>

But now I'm looking for another method or a regex that lets me replace a dynamic XML payload

<![CDATA[old_xml]]>

with another XML payload

<![CDATA[new_xml]]>

Any idea how to accomplish this?

Regards.

  • This works great..... until you have an XML with two CDATA sections, one after the other. As has been discussed at great and passionate length on this site over the past decade, regex is categorically the WRONG tool for working with arbitrary XML, HTML, JSON, etc. You need a real parser for whatever flavor you're dealing with. "Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems." -- Jamie Zawinski Commented Oct 30, 2021 at 0:09
  • 100% agree, but I'm only extracting the "text" from the CDATA; once I've extracted the text, I use a DOM parser. Commented Oct 30, 2021 at 9:22
  • If you insist on using regex, be prepared for it to break when you least expect it. Also prepare to be cursed by whoever has to maintain it. Commented Oct 31, 2021 at 0:25
  • The idea has been finally rejected XD Commented Oct 31, 2021 at 0:26
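
For reference, the parser-based route argued for in these comments might look roughly like the sketch below. It relies only on the JDK's built-in DOM and Transformer APIs. The class name CdataDomReplace and the method replaceCdata are made up for illustration, and the sketch assumes that every CDATA section in the document should receive the same new content.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not taken from the thread.
class CdataDomReplace {

    // Parses the XML, overwrites the data of every CDATA section, and serializes it again.
    static String replaceCdata(String xml, String newContent) {
        try {
            // Coalescing is off by default, so CDATA sections stay separate nodes.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replaceIn(doc.getDocumentElement(), newContent);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite CDATA sections", e);
        }
    }

    // Walks the tree depth-first and replaces the data of each CDATA node it finds.
    private static void replaceIn(Node node, String newContent) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof CDATASection) {
                ((CDATASection) child).setData(newContent);
            } else {
                replaceIn(child, newContent);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<envelope><yyyy><![CDATA[<Document>old</Document>]]></yyyy></envelope>";
        System.out.println(replaceCdata(xml, "<Document>new</Document>"));
    }
}
```

Unlike a regex, this keeps working when the document contains several CDATA sections, because each one is a distinct node that can be selected individually (for example by its parent element).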

2 Answers

Instead of StringUtils, you can use the String#replaceAll method:

fileContent = fileContent
  .replaceAll("(?s)(<!\\[CDATA\\[).+?(]]>)", "$1foo$2");

Explanation:

  • (?s): Enable DOTALL mode so that . can match line breaks as well in .+?
  • (<!\\[CDATA\\[): Match opening <![CDATA[ substring and capture in group #1
  • .+?: Match 1 or more of any character, including line breaks, as few as possible (lazy)
  • (]]>): Match closing ]]> substring and capture in group #2
  • $1foo$2: Replace with foo surrounded with back-references of capture group 1 and 2 on both sides
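
One thing to watch when the replacement is real XML rather than foo: in a replacement string, $ and \ have special meaning, so dynamic content should go through Matcher.quoteReplacement. Here is a minimal sketch of that, using the same pattern as above with DOTALL passed as a flag. The class and method names (CdataRegexReplace, replaceFirstCdata) are just illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper; the names are not from any library.
class CdataRegexReplace {

    // Same regex as above; Pattern.DOTALL is equivalent to the inline (?s) flag.
    private static final Pattern CDATA =
            Pattern.compile("(<!\\[CDATA\\[).+?(]]>)", Pattern.DOTALL);

    // Replaces the content of the first CDATA section with newContent.
    static String replaceFirstCdata(String xml, String newContent) {
        // quoteReplacement escapes '$' and '\' so they are inserted literally.
        String replacement = "$1" + Matcher.quoteReplacement(newContent) + "$2";
        return CDATA.matcher(xml).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String xml = "<yyyy><![CDATA[\n<Document><dd>ssss</dd></Document>\n]]></yyyy>";
        // prints: <yyyy><![CDATA[<Document><price>$5</price></Document>]]></yyyy>
        System.out.println(replaceFirstCdata(xml, "<Document><price>$5</price></Document>"));
    }
}
```

Because .+? needs at least one character, an empty <![CDATA[]]> is not matched as expected by this pattern, and new content that itself contains ]]> would end the CDATA section early.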

3 Comments

@paul: Answer is updated. Had you provided this use-case data upfront we would have got to this answer in first go itself.
.replaceAll("(?s)(<!\\[CDATA\\[).+?(]]>)", "$1foo$2") -- what is (?s)? Did you mean (?:\\s)?
@JimGarrison: (?s) is for enabling DOTALL mode so that . can match line breaks as well in .+?

You can use the regex, (\<!\[CDATA\[).*?(\]\]>).

Demo:

public class Main {
    public static void main(String[] args) {
        String xml = """
                ...
                    <data><![CDATA[a < b]]></data>
                ...
                """;

        String replacement = "foo";

        xml = xml.replaceAll("(\\<!\\[CDATA\\[).*?(\\]\\]>)", "$1" + replacement + "$2");

        System.out.println(xml);
    }
}

Output:

...
    <data><![CDATA[foo]]></data>
...

Explanation of the regex:

  • ( : Start of group#1
    • \<!\[CDATA\[ : String <![CDATA[
  • ) : End of group#1
  • .*? : Any character except line terminators (no DOTALL flag is set), zero or more times, as few as possible (lazy)
  • ( : Start of group#2
    • \]\]>: String ]]>
  • ) : End of group#2
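
Since . does not match line breaks by default, this regex only matches a CDATA section that sits on a single line, as in the demo. For a multi-line CDATA like the one in the question, the pattern would need DOTALL. A small sketch to show the difference follows; the class DotallCheck and its two methods are hypothetical names used only for this comparison.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class, not part of the original demo.
class DotallCheck {

    // The regex from this answer, unchanged.
    static final String REGEX = "(\\<!\\[CDATA\\[).*?(\\]\\]>)";

    // Without DOTALL: '.' stops at line terminators.
    static String replaceSingleLine(String xml, String text) {
        return Pattern.compile(REGEX).matcher(xml).replaceAll("$1" + text + "$2");
    }

    // With DOTALL: '.' also matches line terminators.
    static String replaceMultiLine(String xml, String text) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(xml).replaceAll("$1" + text + "$2");
    }

    public static void main(String[] args) {
        String xml = "<data><![CDATA[a\nb]]></data>";
        // No match, so the input is returned unchanged.
        System.out.println(replaceSingleLine(xml, "foo").equals(xml)); // prints: true
        System.out.println(replaceMultiLine(xml, "foo")); // prints: <data><![CDATA[foo]]></data>
    }
}
```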
