This is probably a very simple question, and likely a duplicate (although I did check beforehand), but which is less expensive when used in a loop: String.replaceAll() or Matcher.replaceAll()?
While I was told

Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher;
String thisWord;
while (Scanner.hasNext()) {
   matcher = regexPattern.matcher(Scanner.next());
   thisWord = matcher.replaceAll("");
   ...
} 

is better, because you only have to compile the regex once, I would think that the benefits of

String thisWord;
while (Scanner.hasNext()) {
   thisWord = Scanner.next().replaceAll("[^a-zA-Z0-9]","");
   ...
}

far outweigh the matcher method, due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)

Can someone please explain how my reasoning is false? Am I misunderstanding what Pattern.matcher() does?

  • Although this one in particular doesn't depend on machine specifics, you can/should benchmark it before asking. Commented Sep 22, 2020 at 4:43
  • Pattern.compile does not cache the result. Commented Sep 22, 2020 at 4:44

2 Answers

In OpenJDK, String.replaceAll is defined as follows:

    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }

So at least with that implementation, it won't give better performance than compiling the pattern only once and using Matcher.replaceAll.

It's possible that there are other JDK implementations where String.replaceAll is implemented differently, but I'd be very surprised if there were any where it performed better than Matcher.replaceAll.
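
As a minimal sketch of the point above (the class and method names here are made up for illustration and are not from the question), the two approaches produce the same result; the only difference is where the regex gets compiled:

```java
import java.util.regex.Pattern;

class ReplaceAllEquivalence {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Goes through String.replaceAll, which (in the OpenJDK source quoted above)
    // calls Pattern.compile on every invocation.
    static String viaString(String word) {
        return word.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Reuses the precompiled pattern; only a new Matcher is created per call.
    static String viaPattern(String word) {
        return NON_ALNUM.matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        String word = "Hello, world!";
        System.out.println(viaString(word));   // Helloworld
        System.out.println(viaPattern(word));  // Helloworld
    }
}
```

Whether the saved compilation matters in practice depends on how many iterations the loop runs, so it is worth measuring on real input.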


[…] due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)

I think you have a misunderstanding here. You really do create a new Matcher instance on each loop iteration; but that is very cheap, and not something to be concerned about performance-wise.


Incidentally, you don't actually need a separate 'matcher' variable if you don't want one; you'll get exactly the same behavior and performance if you write:

   thisWord = regexPattern.matcher(Scanner.next()).replaceAll("");
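
To make the "new Matcher on each iteration" point concrete, here is a small sketch (the class and helper names are hypothetical) showing that each call to Pattern.matcher returns a distinct Matcher object, while every one of them refers back to the same compiled Pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherPerCall {
    // True if two calls to Pattern.matcher hand back two different objects.
    static boolean createsDistinctMatchers(String regex) {
        Pattern p = Pattern.compile(regex);
        Matcher first = p.matcher("foo");
        Matcher second = p.matcher("bar");
        return first != second;
    }

    // True if both matchers point at the one Pattern that created them.
    static boolean sharesPattern(String regex) {
        Pattern p = Pattern.compile(regex);
        Matcher first = p.matcher("foo");
        Matcher second = p.matcher("bar");
        return first.pattern() == p && second.pattern() == p;
    }

    public static void main(String[] args) {
        System.out.println(createsDistinctMatchers("[^a-zA-Z0-9]")); // true
        System.out.println(sharesPattern("[^a-zA-Z0-9]"));           // true
    }
}
```

So the expensive part (compiling the regex) happens once, and the per-iteration work is only the allocation of a small Matcher.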

6 Comments

Doesn't Matcher matcher; create the variable, and Pattern.matcher() initializes it, or is the variable naming just to sort of reserve the name, and not much else?
@AharonKatz You may want to read a book or something for that. It seems that you misunderstood the concept. In Java, a variable of an object type is actually a reference.
@AharonKatz: Terminology-wise, we say that Matcher matcher; declares the variable, and that matcher = ...; assigns a value to the variable. (The latter "initializes" the variable only the first time it's invoked.) Both are nearly free. Creating the instance of Matcher -- which happens inside the call to Pattern.matcher -- is a bit more expensive, though still quite cheap.
@AharonKatz: My pleasure! Shana tova, BTW. :-)
@ruakh I was curious about the name. You too.

It is more efficient to create one matcher and reset it on each iteration. That way a new Matcher is not created every time through the loop, which would otherwise set up much of the same state derived from the Pattern again and again.

Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher = regexPattern.matcher("");
String thisWord;
while (Scanner.hasNext()) {
   matcher.reset(Scanner.next());  // reset returns the same Matcher, so no reassignment is needed
   thisWord = matcher.replaceAll("");
   // ...
} 

There is a one-off cost to create the matcher outside the loop with regexPattern.matcher(""), but the calls to matcher.reset(xxx) will be quicker because they reuse that matcher rather than creating a new Matcher instance each time. This also reduces the amount of garbage the GC has to collect.
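
A self-contained version of the reset approach might look like the sketch below. It assumes a lowercase `scanner` instance reading from a string, and the class and method names are invented for the example. Note that a Matcher is not safe for use by multiple threads, so a reused matcher should stay local to one thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetReuse {
    // Strips non-alphanumeric characters from each whitespace-separated token.
    static List<String> cleanWords(String input) {
        Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
        // One matcher for the whole loop; the empty input is just a placeholder.
        Matcher matcher = regexPattern.matcher("");
        List<String> words = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                // reset(CharSequence) points the existing matcher at new input
                // and returns the same matcher, so the call can be chained.
                words.add(matcher.reset(scanner.next()).replaceAll(""));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(cleanWords("it's a dog-eat-dog world!"));
        // [its, a, dogeatdog, world]
    }
}
```

How much this saves over creating a matcher per iteration is likely to be small, so benchmark before relying on it.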
