Greetings,

Say you want to test whether a string is either an exact match for a given string, or that string followed by an _ and any number of characters.

Valid match examples:

MyTestString
MyTestString_
MyTestString_1234

If performance were a major concern, which approaches would you investigate? Currently I am doing the following:

if (String.equals(stringToMatch)) {
    // success
} else {
    if (stringToMatch.contains(stringToMatch + "_")) {
        // success
    }
    // fail
}

I tried replacing the String.contains check with a java.util.regex.Pattern match on _*, but that performed much worse. Is my solution ideal, or can you think of something more clever to improve performance a bit more?

Thanks for any thoughts

  • Just to mention: stringToMatch.contains(stringToMatch + "_") will always evaluate to false, because the argument is one character longer than the string being searched. Commented May 27, 2011 at 14:01
  • Post the exact pattern that you tried. Btw, I wouldn't use a greedy one. Commented May 27, 2011 at 14:02
  • What does your data look like? Should "abc MyTestString cde" have a match? Is "MyTestStringFooBar" a valid match? Do you have a large text block? Commented May 27, 2011 at 14:06

3 Answers

8

You can do something like

if (string.startsWith(testString)) {
    int len = testString.length();
    if (string.length() == len || string.charAt(len) == '_') {
        // success
    }
}

I assume you want the testString to appear even if you have a "_"?
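For reference, the check above can be wrapped in a small helper. This is only a minimal sketch: the class name PrefixMatch and the method name matchesWithSuffix are made up for illustration, and it assumes neither argument is null.

```java
final class PrefixMatch {
    /**
     * Returns true if candidate equals base, or if candidate is base
     * followed by '_' and any (possibly empty) run of characters.
     */
    static boolean matchesWithSuffix(String candidate, String base) {
        if (!candidate.startsWith(base)) {
            return false;
        }
        int len = base.length();
        // startsWith succeeded, so candidate has at least len characters;
        // charAt(len) is only evaluated when candidate is strictly longer.
        return candidate.length() == len || candidate.charAt(len) == '_';
    }

    public static void main(String[] args) {
        String base = "MyTestString";
        String[] samples = {
            "MyTestString", "MyTestString_", "MyTestString_1234", "MyTestStringFooBar"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matchesWithSuffix(s, base));
        }
    }
}
```

Because || short-circuits, charAt is never reached when the two lengths are equal, so the exact-match case cannot throw.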


EDIT: On whether to use one long condition or nested if statements: there is no difference in the compiled bytecode, and so no difference in performance.

public static void nestedIf(boolean a, boolean b) {
    if (a) {
        if (b) {
            System.out.println("a && b");
        }
    }
}

public static void logicalConditionIf(boolean a, boolean b) {
    if (a && b) {
        System.out.println("a && b");
    }
}

Both methods compile to the same bytecode, as javap -c shows:

public static void nestedIf(boolean, boolean);
  Code:
   0:   iload_0
   1:   ifeq    16
   4:   iload_1
   5:   ifeq    16
   8:   getstatic       #7; //Field java/lang/System.out:Ljava/io/PrintStream;
   11:  ldc     #8; //String a && b
   13:  invokevirtual   #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   16:  return

public static void logicalConditionIf(boolean, boolean);
  Code:
   0:   iload_0
   1:   ifeq    16
   4:   iload_1
   5:   ifeq    16
   8:   getstatic       #7; //Field java/lang/System.out:Ljava/io/PrintStream;
   11:  ldc     #8; //String a && b
   13:  invokevirtual   #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   16:  return

The compiled code is identical.


6 Comments

Putting all the checks in the same "if" will improve performance if this snippet is called a lot (as in a long loop).
Using a nested if, or a single if with a logical condition is the same thing. Using a nested if avoids needing to check the length() if there is not a match.
Further, that sounds like premature optimization. I suspect the JIT will end up doing the same thing in either case if the code really is hot.
Peter, thank you for the excellent response, this was most helpful!
@Peter: Thanks. I just had a case with a PHP script where replacing nested conditions by a single "if" improved performance significantly (several seconds faster), and I thought it might be worth checking as the OP seemed concerned about performance. Your test shows clearly that this is not the case in Java. About "length()", would it be called if the first part of the condition fails? @bkail: You are right, but it was worth a try.
2

You could use regular expressions to match patterns. You can use stringToMatch.matches(".*?_.*?"). This returns a boolean.
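Note that .*?_.*? accepts any string that contains an underscore, so by itself it neither checks for the test string nor accepts the exact match. A hedged sketch of a pattern anchored on the test string is shown here; the class name RegexPrefixMatch and its method names are hypothetical. It assumes the pattern is compiled once and reused, since String.matches compiles its regex on every call.

```java
import java.util.regex.Pattern;

final class RegexPrefixMatch {
    /** Builds a pattern for: base, optionally followed by '_' and anything. */
    static Pattern compileFor(String base) {
        // Pattern.quote escapes any regex metacharacters contained in base.
        // DOTALL lets '.' also match line terminators in the suffix.
        return Pattern.compile(Pattern.quote(base) + "(?:_.*)?", Pattern.DOTALL);
    }

    static boolean matches(Pattern compiled, String candidate) {
        // Matcher.matches() requires the entire candidate to match the pattern.
        return compiled.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        Pattern p = compileFor("MyTestString");
        String[] samples = {
            "MyTestString", "MyTestString_", "MyTestString_1234", "MyTestStringFooBar"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(p, s));
        }
    }
}
```

Even with a precompiled pattern, this is unlikely to beat plain startsWith and charAt checks; it is mainly useful when the rule is expected to grow more complex.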


1

I ran some benchmarks. This is the quickest I can get.

    String a = "Test123";
    String b = "Test123_321tseT_Test_rest";
    int len1 = a.length();
    int len2 = b.length();
    if ((len1 == len2 || (len2 > len1 && (b.charAt(len1)) == '_'))
        && b.startsWith(a)) {
        System.out.println("success");
    } else {
        System.out.println("Fail");
    }

This will at least work correctly at reasonable performance.

Edit: I switched the _ check and the startsWith check, since startsWith will perform worse than the _ check.

Edit2: Fixed StringIndexOutOfBoundsException.

Edit3: Peter Lawrey is correct that making only one call to a.length() saves time: 2.2% in my case. The latest benchmark shows my code is 88% faster than the OP's and 10% faster than Peter Lawrey's.

Edit4: I replaced all str.length() calls with a local variable and ran a dozen more benchmarks. The results are now so noisy that it's impossible to say which code is faster. My latest version seems to win by a notch.
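For anyone who wants to repeat the comparison, here is a crude timing sketch. It is not the harness used for the numbers above: the class CrudeBench, its method names, and the inputs are all hypothetical. It assumes a single warm-up pass is enough for the JIT, which is often not true, so a dedicated harness such as JMH is usually more trustworthy for differences this small.

```java
import java.util.function.BiPredicate;

final class CrudeBench {
    // Variant that calls startsWith first (as in the first answer).
    static boolean startsWithFirst(String s, String base) {
        if (!s.startsWith(base)) return false;
        int len = base.length();
        return s.length() == len || s.charAt(len) == '_';
    }

    // Variant that does the length and '_' checks first (as in this answer).
    static boolean charCheckFirst(String s, String base) {
        int len1 = base.length();
        int len2 = s.length();
        return (len1 == len2 || (len2 > len1 && s.charAt(len1) == '_'))
                && s.startsWith(base);
    }

    /** Runs the check over all inputs `rounds` times and returns the match count. */
    static long run(BiPredicate<String, String> check, String[] inputs, String base, int rounds) {
        long hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                if (check.test(s, base)) hits++;
            }
        }
        return hits;
    }

    /** Does one warm-up pass, then times one measured pass in nanoseconds. */
    static long timeNanos(BiPredicate<String, String> check, String[] inputs, String base, int rounds) {
        run(check, inputs, base, rounds); // warm-up pass
        long start = System.nanoTime();
        long hits = run(check, inputs, base, rounds);
        long elapsed = System.nanoTime() - start;
        // Use the result so the measured loop cannot be discarded as dead code.
        if (hits < 0) throw new IllegalStateException("unreachable");
        return elapsed;
    }

    public static void main(String[] args) {
        String[] inputs = {"Test123", "Test123_321tseT_Test_rest", "Test1234", "Other"};
        int rounds = 5_000_000;
        System.out.println("startsWith first: "
                + timeNanos(CrudeBench::startsWithFirst, inputs, "Test123", rounds) + " ns");
        System.out.println("char check first: "
                + timeNanos(CrudeBench::charCheckFirst, inputs, "Test123", rounds) + " ns");
    }
}
```

Run-to-run variation of several percent is normal with a loop like this, so single-digit differences between the two variants should be treated with caution.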

1 Comment

I ran a benchmark against Peter Lawrey's code; my code is 9% faster than his. It's also 85% faster than the code from the OP.
