StringUtils replace text in between two patterns

Question

Hi, I found this Apache Commons method really useful

StringUtils.substringBetween(fileContent, "<![CDATA[", "]]>")

for extracting the information inside the CDATA section of a document like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<envelope>
    <xxxx>
        <yyyy>
            <![CDATA[

                    <?xml version="1.0" encoding="UTF-8" ?>
                    <Document >
                        <eee>
                            <tt>
                                <ss>zzzzzzz</ss>
                                <aa>2021-09-09T10:39:29.850Z</aa>
                                <aaaa>
                                    <Cd>cccc</Cd>
                                </aaaa>
                                <dd>ssss</dd>
                                <ff></ff>
                            </tt>
                        </eee>
                    </Document>
                ]]>
        </yyyy>
    </xxxx>
</envelope>

But now I'm looking for another method or regex that lets me replace a dynamic XML payload

<![CDATA["old_xml"]]>

with another XML payload

<![CDATA["new_xml"]]>

Any idea how to accomplish this?

Regards.

This works great... until you have an XML with two CDATA sections, one after the other. As has been discussed at great and passionate length on this site over the past decade, regex is categorically the WRONG tool for working with arbitrary XML, HTML, JSON, etc. You need a real parser for whatever flavor you're dealing with. "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." -- Jamie Zawinski
– Jim Garrison, Commented Oct 30, 2021 at 0:09
100% agree, but I'm only extracting the "text" from the CDATA; once I have extracted the text, I use a DOM parser.
– paul, Commented Oct 30, 2021 at 9:22
If you insist on using regex, be prepared for it to break when you least expect it. Also prepare to be cursed by whoever has to maintain it.
– Jim Garrison, Commented Oct 31, 2021 at 0:25
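
For comparison, here is a minimal sketch of the parser-based route suggested in the comments above, using only the JDK's built-in DOM and Transformer APIs. The class name CdataDomReplace and the helper names are made up for illustration, and the sketch assumes that every CDATA section in the document should receive the same new payload and that the payload does not itself contain "]]>". Parser hardening for untrusted input is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answers.
class CdataDomReplace {

    // Parses xml, swaps the text of every CDATA section for newXml, and serializes the result.
    static String replaceCdata(String xml, String newXml) {
        try {
            // The default factory is non-coalescing, so CDATA sections stay separate nodes.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replaceIn(doc.getDocumentElement(), newXml);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite CDATA sections", e);
        }
    }

    // Walks the tree depth-first and overwrites each CDATASection it finds.
    private static void replaceIn(Node node, String newXml) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof CDATASection) {
                ((CDATASection) child).setData(newXml);
            } else {
                replaceIn(child, newXml);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<envelope><yyyy><![CDATA[<Document>old</Document>]]></yyyy></envelope>";
        System.out.println(replaceCdata(xml, "<Document>new</Document>"));
    }
}
```

Because the parser decides where each CDATA section starts and ends, two adjacent sections are handled as two separate nodes rather than relying on the regex quantifier to stop in the right place.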

anubhava · Accepted Answer · 2021-10-31 03:33:54Z

1

Instead of StringUtils, you can use the String#replaceAll method:

fileContent = fileContent
  .replaceAll("(?s)(<!\\[CDATA\\[).+?(]]>)", "$1foo$2");

Explanation:

(?s): Enable DOTALL mode so that . can match line breaks as well in .+?
(<!\\[CDATA\\[): Match opening <![CDATA[ substring and capture in group #1
.+?: Match 1 or more of any character, including line breaks, as few as possible (lazy)
(]]>): Match closing ]]> substring and capture in group #2
$1foo$2: Replace with foo surrounded with back-references of capture group 1 and 2 on both sides
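
One caveat when the replacement is real XML rather than the literal foo: replaceAll treats $ and \ in the replacement string as special characters. The sketch below wraps the same pattern in a small helper and quotes the replacement with Matcher.quoteReplacement. The class and method names (CdataReplace, replaceCdata) and the sample strings are assumptions made for illustration.

```java
import java.util.regex.Matcher;

// Hypothetical wrapper around the replaceAll call shown above.
class CdataReplace {

    // Replaces the body of every CDATA section in content with newXml.
    static String replaceCdata(String content, String newXml) {
        // quoteReplacement keeps any '$' or '\' in newXml from being read as a group reference.
        String safe = Matcher.quoteReplacement(newXml);
        return content.replaceAll("(?s)(<!\\[CDATA\\[).+?(]]>)", "$1" + safe + "$2");
    }

    public static void main(String[] args) {
        String fileContent = "<yyyy>\n<![CDATA[\n<Document><ss>old</ss></Document>\n]]>\n</yyyy>";
        System.out.println(replaceCdata(fileContent, "<Document><price>$5</price></Document>"));
    }
}
```

Without the quoting step, a payload containing a dollar sign followed by a digit would be interpreted as a back-reference instead of literal text.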

edited Oct 31, 2021 at 3:33

answered Oct 29, 2021 at 22:26

anubhava


3 Comments

anubhava Over a year ago

@paul: Answer is updated. Had you provided this use-case data upfront we would have got to this answer in first go itself.

Jim Garrison Over a year ago

.replaceAll("(?s)(<!\\[CDATA\\[).+?(]]>)", "$1foo$2") -- what is (?s)? Did you mean (?:\\s)?

anubhava Over a year ago

@JimGarrison: (?s) is for enabling DOTALL mode so that . can match line breaks as well in .+?

Arvind Kumar Avinash · Accepted Answer · 2021-10-29 23:17:46Z

1

You can use the regex, (\<!\[CDATA\[).*?(\]\]>).

Demo:

public class Main {
    public static void main(String[] args) {
        String xml = """
                ...
                    <data><![CDATA[a < b]]></data>
                ...
                """;

        String replacement = "foo";

        xml = xml.replaceAll("(\\<!\\[CDATA\\[).*?(\\]\\]>)", "$1" + replacement + "$2");

        System.out.println(xml);
    }
}

Output:

...
    <data><![CDATA[foo]]></data>
...

Explanation of the regex:

( : Start of group#1
- \<!\[CDATA\[ : String <![CDATA[
) : End of group#1
.*? : Any character except line terminators (DOTALL is not enabled here), zero or more times, as few as possible
( : Start of group#2
- \]\]>: String ]]>
) : End of group#2
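
Since . does not match line terminators by default, this pattern as written only matches a CDATA section that sits on a single line, as in the demo. The following sketch compares it with the same pattern prefixed by (?s) on a multi-line section like the one in the question. The class name DotallDemo and the sample input are assumptions for illustration.

```java
// Hypothetical demo class comparing the pattern with and without DOTALL.
class DotallDemo {

    // The pattern from the demo above, unchanged.
    static final String SINGLE_LINE = "(\\<!\\[CDATA\\[).*?(\\]\\]>)";

    // The same pattern with DOTALL enabled, so '.' also matches line breaks.
    static final String DOTALL = "(?s)" + SINGLE_LINE;

    static String replace(String xml, String regex) {
        return xml.replaceAll(regex, "$1foo$2");
    }

    public static void main(String[] args) {
        String xml = "<data><![CDATA[\na < b\n]]></data>";
        System.out.println(replace(xml, SINGLE_LINE)); // input is printed unchanged
        System.out.println(replace(xml, DOTALL));      // prints <data><![CDATA[foo]]></data>
    }
}
```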

edited Oct 29, 2021 at 23:17

answered Oct 29, 2021 at 23:00

Arvind Kumar Avinash
