
This is the regex I use to find block comments, and it works absolutely fine:

/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/
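Here is the same regex compiled with Pattern.COMMENTS so that each part can carry a comment. The class and field names are illustrative only. The pattern is meant to behave exactly like the one-line version above, because COMMENTS mode ignores the whitespace and the '#' comments.

```java
import java.util.regex.Pattern;

class BlockCommentPattern {
    // Same as /\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/ ; Pattern.COMMENTS ignores whitespace
    // in the pattern and treats '#' up to the end of the line as a comment.
    static final Pattern BLOCK_COMMENT = Pattern.compile(
          "/\\*            # opening delimiter\n"
        + "(?>             # atomic: never backtrack into the comment body\n"
        + "  (?:           # the body is zero or more of:\n"
        + "    (?>[^*]+)   #   a run of non-asterisk characters (line breaks included)\n"
        + "  | \\*(?!/)    #   or a single asterisk not followed by a slash\n"
        + "  )*\n"
        + ")\n"
        + "\\*/            # closing delimiter",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        System.out.println(BLOCK_COMMENT.matcher("/* this* is a ;*comment ; */").matches()); // true
        System.out.println(BLOCK_COMMENT.matcher("/* never closed").find());                  // false
    }
}
```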

I just need to modify it slightly: find any semicolon (;) that may exist inside a block comment and replace it with a space.

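Currently I am doing this:

while (m.find()) {
    if (m.group().contains(";")) {
        // Remove the semicolons from this comment; quoteReplacement keeps any
        // '$' or '\' in the comment from being treated as a group reference.
        replacement = Matcher.quoteReplacement(m.group().replaceAll(";", ""));
        m.appendReplacement(sb, replacement);
    }
}
m.appendTail(sb);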

But I would like to replace this with a str.replaceAll-style statement, or anything else that is more efficient, because I get an out-of-memory error. I fixed a few other regexes that used to throw the same error and they now work fine, so I hope this regex can be optimized as well.
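If you can use Java 9 or later, Matcher.replaceAll(Function<MatchResult, String>) gives something close to the single-statement form asked for here. The sketch below (class and method names are made up for illustration) replaces each ';' inside a comment with a space, as described above. It is shorter, not necessarily lighter on memory: it still runs the same regex over the whole input and builds the full result string in memory.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentSemicolons {
    // The question's regex as a Java string literal.
    private static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/");

    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    static String blankSemicolonsInComments(String input) {
        return BLOCK_COMMENT.matcher(input).replaceAll(
            // The returned text is still parsed as a replacement string, so quote it
            // to keep any '$' or '\' inside a comment literal.
            mr -> Matcher.quoteReplacement(mr.group().replace(';', ' ')));
    }

    public static void main(String[] args) {
        System.out.println(blankSemicolonsInComments("a; /* x;y */ b;")); // a; /* x y */ b;
    }
}
```

Semicolons outside comments are left alone because only the matched comment text is passed to the function.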

--- Edit ---

Here are some strings you can test this regex on:

/* this* is a ;*comment ; */

/* This ; is* 
another
;*block
comment;
;*/
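A small harness shows what the regex picks out of these strings. It assumes the two test strings sit in one input with a hypothetical line of ordinary code ("int x = 1;") between them. Because [^*] also matches line breaks, the multi-line comment is found without any DOTALL flag, and the ';' in the code line is not part of any match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SampleComments {
    private static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/");

    // The question's two test strings, with a hypothetical code line between them
    // so that there is a ';' outside any comment.
    static final String SAMPLE =
          "/* this* is a ;*comment ; */\n"
        + "int x = 1;\n"
        + "/* This ; is* \nanother\n;*block\ncomment;\n;*/";

    // Collects every block comment the regex finds, in order.
    static List<String> comments(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = BLOCK_COMMENT.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String c : comments(SAMPLE)) {
            System.out.println("[" + c + "]");
        }
    }
}
```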

Thanks

  • I might be able to write the regex for you if you post an example phrase and highlight the part you want from it. Commented Nov 25, 2011 at 10:07
  • @Misha Please view the edited question. I have provided a sample String. Thanks Commented Nov 25, 2011 at 11:07

2 Answers

It'll be much simpler to use the (?s)/\*.+?\*/ regexp. Your expression uses a negative lookahead that "eats" your memory. Your code can also be simpler:

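while (m.find()) {
    // Strip every ';' from the matched comment; quoteReplacement keeps '$' and '\' literal.
    m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replace(";", "")));
}
m.appendTail(sb);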
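Here is a sketch comparing the two patterns; the class and method names are illustrative. On the question's sample they find the same comments. One difference worth knowing: .+? must consume at least one character, so an empty comment /**/ cannot end at its own */. The lazy pattern then runs on to the next */ and swallows the code in between, including its ';'. Using .*? instead of .+? avoids that particular case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComparePatterns {
    // Pattern from the question (atomic groups; [^*] already matches line breaks).
    static final Pattern ATOMIC =
        Pattern.compile("/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/");
    // Pattern suggested in this answer; (?s) lets '.' cross line breaks.
    static final Pattern LAZY = Pattern.compile("(?s)/\\*.+?\\*/");

    // Returns every match of p in input, in order.
    static List<String> all(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "/* this* is a ;*comment ; */\n/* This ; is* \nanother\n;*block\ncomment;\n;*/";
        System.out.println(all(ATOMIC, sample).equals(all(LAZY, sample))); // true

        // .+? needs at least one character, so /**/ is not closed by its own "*/".
        String edge = "/**/ a; /* b */";
        System.out.println(all(ATOMIC, edge)); // [/**/, /* b */]
        System.out.println(all(LAZY, edge));   // [/**/ a; /* b */]
    }
}
```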

5 Comments

+1 for clearing away the clutter (especially that m.group().contains(";") call), but his regex is optimized for efficiency, so I wouldn't change that. It's not how I would have written it, but it should perform significantly better than /\*.+?\*/.
I'm not sure about the better efficiency. A lookahead could cause memory problems, especially with large input text, and an atomic group can't solve that at all.
It's only looking ahead for one character, and it only does that after it sees an asterisk. Performance-wise, lookaheads don't cause nearly as many problems as do alternations and quantifiers with overlapping effects. That's what the atomic groups are there for.
@AlanMoore Can you explain "alternations and quantifiers with overlapping effects" in a bit more detail? Examples would be helpful. Thanks
@Ali: This question demonstrates how an alternation can bog down a match when two or more alternatives are capable of matching the same characters. As for the quantifiers, see this.
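A small illustration of the quantifier side of this, as a sketch with illustrative names. An atomic group (or a possessive quantifier) never gives characters back once it has matched them. In the question's body (?:[^*]+|\*(?!/))* one quantifier is nested inside another, so without the atomic groups a run of n non-asterisk characters could be divided among loop iterations in 2^(n-1) ways, and a failing match could end up retrying those splits. With the atomic groups the run is consumed once, and an unterminated comment simply fails.

```java
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // The question's regex: both quantified parts sit inside atomic groups.
    static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/");

    // Builds "/*" followed by n 'x' characters and no closing "*/".
    static String unterminated(int n) {
        StringBuilder sb = new StringBuilder("/*");
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A plain greedy a+ gives back one 'a' so the final 'a' can match.
        System.out.println(Pattern.matches("a+a", "aaa"));     // true
        // Atomic and possessive forms never give characters back, so these fail.
        System.out.println(Pattern.matches("(?>a+)a", "aaa")); // false
        System.out.println(Pattern.matches("a++a", "aaa"));    // false

        // The run of 'x' is consumed once by the atomic [^*]+; the missing "*/"
        // then makes the whole match fail without trying other splits of the run.
        System.out.println(BLOCK_COMMENT.matcher(unterminated(10_000)).find()); // false
    }
}
```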

There are two variants (try both):

1) Why are you using ?>? I don't know what it means, and I don't see a need for anything special like ?> here. Change it to ?:.

2) Your loop is infinite. You need this:

    int index = 0;
    while (m.find(index)) {
        if (m.group().contains(";")) {
            replacement = m.group().replaceAll(";", "");
            m.appendReplacement(sb, replacement);
        }
        index = m.end();
    }

1 Comment

(?>...) is an atomic group, and it's necessary for maximum efficiency (possessive quantifiers would work, too). And it's not an infinite loop; the find() method keeps track of the match-start position by itself. find(int) is for special cases.
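A sketch of both points in this comment, with illustrative names. The possessive form [^*]++ behaves like (?>[^*]+), and X*+ like (?>X*), so the two patterns below should give the same results. A plain find() loop ends on its own. By contrast, find(int) resets the matcher, and resetting also sets the append position back to zero, so a simplified version of the find(index) loop copies earlier text into the output more than once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoops {
    // The question's pattern with atomic groups.
    static final Pattern ATOMIC =
        Pattern.compile("/\\*(?>(?:(?>[^*]+)|\\*(?!/))*)\\*/");
    // The same pattern written with possessive quantifiers.
    static final Pattern POSSESSIVE =
        Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/");

    // Plain find() resumes after the previous match by itself, so this loop terminates.
    static String withFind(Pattern p, String input) {
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replace(";", "")));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Simplified version of the find(int) loop from the second answer. Each find(int)
    // resets the matcher (append position back to zero), so text is appended repeatedly.
    static String withFindIndex(Pattern p, String input) {
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        int index = 0;
        while (m.find(index)) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replace(";", "")));
            index = m.end();
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a; /* x;y */ b; /* z; */";
        System.out.println(withFind(ATOMIC, input));     // a; /* xy */ b; /* z */
        System.out.println(withFind(POSSESSIVE, input)); // a; /* xy */ b; /* z */
        System.out.println(withFindIndex(ATOMIC, input).length() > input.length()); // true
    }
}
```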
