9

I'm facing a dilemma. I'm parsing a string and can either do

s.matches(regex)

or I can do

s.startsWith(..) && s.endsWith(..)

As you can tell, it's not a complicated regexp, and both approaches will work. The string may be very long (hundreds of chars), so I want to maximize efficiency. How does each approach work under the hood, and which one better suits this problem?

7
  • 3
    Regex is slower as it needs regex compilation Commented Nov 6, 2013 at 8:38
  • 8
    When I see questions like "what is faster?" I always think "well just benchmark it and check it out yourself". Commented Nov 6, 2013 at 8:39
  • 1
    you could write the benchmarks in about 5 minutes flat Commented Nov 6, 2013 at 8:41
  • remember to cache your regex Commented Nov 6, 2013 at 8:41
  • 2
    Why don't you time the execution of both and tell us the timings? Commented Nov 6, 2013 at 8:43
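
The "cache your regex" advice from the comments above can be sketched roughly like this. `String.matches(regex)` compiles the pattern on every call, so compiling it once into a constant and reusing it avoids paying that cost each time. The class name `CachedMatcher` and the `start`/`end` literals are assumptions for illustration only, not the asker's real pattern.

```java
import java.util.regex.Pattern;

final class CachedMatcher {
    // Compiled once when the class is loaded; Pattern instances are immutable
    // and safe to share between threads.
    // The pattern text is a placeholder, not the asker's real expression.
    private static final Pattern START_END = Pattern.compile("start.*end");

    private CachedMatcher() {}

    // matches() already requires the whole input to match,
    // so the ^ and $ anchors are not needed here.
    static boolean matches(String s) {
        return START_END.matcher(s).matches();
    }
}
```

A call such as `CachedMatcher.matches(input)` then only creates a lightweight `Matcher` per call instead of recompiling the expression.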

3 Answers

9

Here's a rather crude benchmark to give you an idea. Adapt it to your own use case to get more relevant results.

TL;DR

  • startsWith() and endsWith() are much faster

Detailed results

Results after 1 000 000 runs:

- Uncompiled pattern:        1091 ms
- Compiled pattern:          745 ms
- startsWith() / endsWith(): 24 ms
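
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

import com.google.common.base.Stopwatch; // Guava; assumed from Stopwatch.createStarted()
import org.junit.Test;                   // JUnit 4; assumed from the @Test annotation

public class TestRegex {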

    String regex = "^start.*end$";
    Pattern p = Pattern.compile(regex);
    
    String start = "start";
    String end = "end";
    String search = start + "fewbjlhfgljghfadsjhfdsaglfdhjgahfgfjkhgfdkhjsagafdskghjafdkhjgfadskhjgfdsakhjgfdaskhjgafdskjhgafdsjhkgfads" + end;
    
    int runs = 1000000;

    @Test
    public final void test() {
        // Warm-up run: gives the JIT a chance to compile the loops before anything is timed
        for (int i = 0; i < runs; i++) {
            search.matches(regex);
        }
        for (int i = 0; i < runs; i++) {
            p.matcher(search).matches();
        }
        for (int i = 0; i < runs; i++) {
            search.startsWith(start);
            search.endsWith(end);
        }


        // Timed run
        Stopwatch s = Stopwatch.createStarted();
        for (int i = 0; i < runs; i++) {
            search.matches(regex);
        }
        System.out.println(s.elapsed(TimeUnit.MILLISECONDS));
        s.reset();
        
        s.start();
        for (int i = 0; i < runs; i++) {
            p.matcher(search).matches();
        }
        System.out.println(s.elapsed(TimeUnit.MILLISECONDS));
        s.reset();
        
        s.start();
        for (int i = 0; i < runs; i++) {
            search.startsWith(start);
            search.endsWith(end);
        }
        System.out.println(s.elapsed(TimeUnit.MILLISECONDS));
    }

}

1 Comment

+1 The JIT will replace certain String methods with highly optimized assembler code; sometimes, they can be replaced with a single CPU opcode.
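
Because the JIT optimizes these calls so aggressively, a loop that throws the results away may end up measuring less work than intended. Here is a minimal sketch, under my own assumptions, of a variant that counts the matches so each result is actually used. The class name `ConsumedResultBenchmark`, the helper names, and the 100-character filler are all hypothetical. It only reduces the risk; a harness such as JMH is the more robust choice for real measurements.

```java
import java.util.regex.Pattern;

final class ConsumedResultBenchmark {
    static final String START = "start";
    static final String END = "end";
    // Hypothetical test input: prefix + 100 filler characters + suffix.
    static final String SEARCH = START + filler(100) + END;
    static final Pattern COMPILED = Pattern.compile("^start.*end$");

    private ConsumedResultBenchmark() {}

    static String filler(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    // Counts successful regex matches so every result feeds into the return value.
    static int countRegex(String s, int runs) {
        int hits = 0;
        for (int i = 0; i < runs; i++) {
            if (COMPILED.matcher(s).matches()) {
                hits++;
            }
        }
        return hits;
    }

    // Same idea for the startsWith()/endsWith() check.
    static int countPrefixSuffix(String s, int runs) {
        int hits = 0;
        for (int i = 0; i < runs; i++) {
            if (s.startsWith(START) && s.endsWith(END)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int runs = 1_000_000;

        // Warm-up pass, not timed.
        countRegex(SEARCH, runs);
        countPrefixSuffix(SEARCH, runs);

        long t0 = System.nanoTime();
        int regexHits = countRegex(SEARCH, runs);
        long t1 = System.nanoTime();
        int plainHits = countPrefixSuffix(SEARCH, runs);
        long t2 = System.nanoTime();

        System.out.println("Compiled pattern:          " + (t1 - t0) / 1_000_000
                + " ms (" + regexHits + " hits)");
        System.out.println("startsWith() / endsWith(): " + (t2 - t1) / 1_000_000
                + " ms (" + plainHits + " hits)");
    }
}
```

The absolute numbers will vary by machine and JVM, so treat them as a rough comparison rather than a precise figure.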
5

Note that the two approaches can report different results when the string expected at the end overlaps the string expected at the start, for example when it is a suffix of the start string:

^start.*art$

will not match

"start"

while

"start".startsWith("start") && "start".endsWith("art")

will be true.
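
If the two checks need to agree in that overlapping case, one possible fix is to add a length guard to the `startsWith()`/`endsWith()` version. This is only a sketch: the class name `PrefixSuffix` and its `matches` helper are hypothetical names introduced here.

```java
final class PrefixSuffix {
    private PrefixSuffix() {}

    // True only if s starts with prefix, ends with suffix,
    // and is long enough that the two cannot overlap.
    static boolean matches(String s, String prefix, String suffix) {
        return s.length() >= prefix.length() + suffix.length()
                && s.startsWith(prefix)
                && s.endsWith(suffix);
    }

    public static void main(String[] args) {
        // Regex needs "start" followed by a separate "art": prints false
        System.out.println("start".matches("^start.*art$"));
        // Unguarded check lets prefix and suffix overlap: prints true
        System.out.println("start".startsWith("start") && "start".endsWith("art"));
        // Guarded check agrees with the regex: prints false
        System.out.println(matches("start", "start", "art"));
        // Long enough for both parts: prints true
        System.out.println(matches("start art", "start", "art"));
    }
}
```

One more difference to keep in mind: by default `.` does not match line terminators, so the regex version rejects input containing a newline between the two parts unless it is compiled with `Pattern.DOTALL`, while the guarded check above accepts it.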


2

Indeed, the difference is there and is noticeable for small strings as well. Precompiling the regex into a Pattern does help somewhat, but it's clearly still the worse option when the match is this simple.

Thanks everyone.
