
I have a program that reads and processes data from a raw text String using StringTokenizer.

Originally the StringTokenizer contained about 1,500 tokens and the program worked fine. However, the raw content has grown, it is now about 12,000 tokens, and CPU consumption has increased significantly.

I'm looking into the problem and trying to identify the root cause. The program uses a while loop to check whether any tokens are left and, depending on the token read, takes a different action. I'm reviewing those actions to see whether they can be improved.

Meanwhile, I would like to ask whether handling one long StringTokenizer costs more CPU than handling 10 short StringTokenizers.
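
To check this myself, I have a rough comparison like the following in mind; the content and separator are made up, and the chunk boundaries ignore token alignment, so it is only indicative:

import java.util.StringTokenizer;

// Rough sketch: one long StringTokenizer vs. ten shorter ones over the same content.
public class TokenizerComparison {
    public static void main(String[] args) {
        // Build a synthetic ~12,000-token message (made-up fields and separator).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            sb.append("field").append(i).append(';');
        }
        String longMsg = sb.toString();

        long start = System.nanoTime();
        consume(new StringTokenizer(longMsg, ";"));
        System.out.println("one long tokenizer: " + (System.nanoTime() - start) / 1_000_000 + " ms");

        start = System.nanoTime();
        int chunk = longMsg.length() / 10;
        for (int i = 0; i < 10; i++) {
            int from = i * chunk;
            int to = (i == 9) ? longMsg.length() : (i + 1) * chunk;
            // Note: chunk boundaries may split a token; good enough for a rough comparison.
            consume(new StringTokenizer(longMsg.substring(from, to), ";"));
        }
        System.out.println("ten short tokenizers: " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }

    private static void consume(StringTokenizer st) {
        while (st.hasMoreTokens()) {
            st.nextToken();
        }
    }
}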

  • Are you sure it's StringTokenizer and not what you're doing with it? Please show a short but complete program which demonstrates the problem. Commented Sep 14, 2011 at 9:57
  • I don't think so. Strings are random-access, that should not slow down for long Strings. Commented Sep 14, 2011 at 9:58
  • There isn't anything in StringTokenizer that would blow up for long inputs. It has to be something in the surrounding code. Commented Sep 14, 2011 at 10:00
  • This question is worthless without an SSCCE. Commented Sep 14, 2011 at 10:18

3 Answers


First of all, thanks for your opinions. Over the weekend I ran a stress test with real data using a revised program, and I'm happy to say my problem is solved (many thanks to A.J. ^_^). I would like to share my findings.

After studying the example mentioned by A.J., I ran a test program to read and process data using StringTokenizer and "indexOf" (regex is even worse than StringTokenizer in my situation). My test program counted how many milliseconds were needed to process 24 messages (~12,000 tokens each).

StringTokenizer needed ~2700ms to complete, while "indexOf" took only ~210ms!
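
For reference, a minimal sketch of that kind of timing comparison; it uses synthetic messages and a ';' separator rather than my real data and harness, so the absolute numbers will differ:

import java.util.StringTokenizer;

public class TokenTimingSketch {
    public static void main(String[] args) {
        // Build one synthetic ~12,000-token message ending with the separator.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            sb.append("field").append(i).append(';');
        }
        String msg = sb.toString();
        long chars = 0;   // accumulate token lengths so the work is not optimized away

        // 24 messages via StringTokenizer
        long start = System.nanoTime();
        for (int m = 0; m < 24; m++) {
            StringTokenizer st = new StringTokenizer(msg, ";");
            while (st.hasMoreTokens()) {
                chars += st.nextToken().length();
            }
        }
        System.out.println("StringTokenizer: " + (System.nanoTime() - start) / 1_000_000 + " ms");

        // 24 messages via indexOf/substring
        start = System.nanoTime();
        for (int m = 0; m < 24; m++) {
            int from = 0;
            int to;
            while ((to = msg.indexOf(';', from)) >= 0) {
                chars += msg.substring(from, to).length();
                from = to + 1;
            }
        }
        System.out.println("indexOf: " + (System.nanoTime() - start) / 1_000_000 + " ms (" + chars + " chars)");
    }
}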

I then revised my program like this (with minimal changes) and tested it with real volume over the weekend:

Original program:

import java.util.StringTokenizer;

public class MsgProcessor {
    //Some other definition and methods ...

    public void processMessage (String msg) 
    {
        //...

        StringTokenizer token = new StringTokenizer(msg, FieldSeparator);
        while (token.hasMoreTokens()) {
            my_data = token.nextToken();
            // perform a different action based on the token read
        }
    }
}

And here is the updated program using "indexOf":

public class MsgProcessor {
    //Some other definition and methods ...
    private int tokenStart=0;
    private int tokenEnd=0;

    public void processMessage (String msg) 
    {
        //...
        tokenStart=0;
        tokenEnd=0;

        while (isReadingData) {
            my_data = getToken(msg);
            if (my_data == null)
                break;
            // perform a different action based on the token read ...
        }
    }

    private String getToken (String msg)
    {
        String result = null;
        if ((tokenEnd = msg.indexOf(FieldSeparator, tokenStart)) >= 0) {
            result = msg.substring(tokenStart, tokenEnd);
            tokenStart = tokenEnd + 1;
        }
        return result;
    }
}
  • Please note that there is no "null" data in the original tokens. If no FieldSeparator is found, "getToken(msg)" will return null (as the signal for "no more tokens"); a variation that also returns a trailing token is sketched below.
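
Here is that hypothetical variation: if a message did not end with FieldSeparator, getToken above would skip the trailing piece, so this version (which I did not deploy, and which keeps the same single-character-separator assumption) also hands back the last token. It would slot into the same class, with exhausted reset to false at the start of processMessage, just like tokenStart:

// Hypothetical variation: also returns the trailing token when msg does not end
// with FieldSeparator; uses an end-of-message flag instead of null-only.
private boolean exhausted = false;   // reset to false at the start of processMessage

private String getTokenWithTail (String msg)
{
    if (exhausted)
        return null;
    int end = msg.indexOf(FieldSeparator, tokenStart);
    if (end >= 0) {
        String result = msg.substring(tokenStart, end);
        tokenStart = end + 1;   // same single-character separator assumption as above
        return result;
    }
    exhausted = true;
    // No further separator: hand back whatever remains (null if nothing is left).
    return tokenStart < msg.length() ? msg.substring(tokenStart) : null;
}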

Comments

1

StringTokenizer usage is discouraged according to the StringTokenizer Javadoc. It is not deprecated, though, so it is still possible to use; it is just not recommended. Here is what is written:

"StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."

Please check the following post. It has a very nice example of various ways of doing the same thing you are trying to do.

performance-of-stringtokenizer-class-vs-split-method-in-java

You can try the samples provided there and see what works best for you.
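
For example, a quick sketch of the split-based alternatives the Javadoc points to (the separator and input here are just illustrative):

import java.util.regex.Pattern;

public class SplitExample {
    // Precompiling the pattern avoids recompiling it on every call;
    // Pattern.quote treats the separator literally rather than as a regex.
    private static final Pattern FIELD_SEPARATOR = Pattern.compile(Pattern.quote(";"));

    public static void main(String[] args) {
        String msg = "a;b;c";

        // Simple one-off split
        for (String token : msg.split(";")) {
            System.out.println(token);
        }

        // Reusable precompiled pattern (usually better when called repeatedly)
        for (String token : FIELD_SEPARATOR.split(msg)) {
            System.out.println(token);
        }
    }
}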

1 Comment

Thanks A.J. Your recommended post is very helpful for solving my problem.
0

Why don't you try the newer Scanner class instead? Scanners can be constructed using streams and files. Not sure it is more efficient than the old StringTokenizer, though.
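
For example (the delimiter and input here are made up):

import java.util.Scanner;

// Minimal Scanner sketch with a custom delimiter.
public class ScannerExample {
    public static void main(String[] args) {
        String msg = "a;b;c";
        try (Scanner scanner = new Scanner(msg)) {
            scanner.useDelimiter(";");
            while (scanner.hasNext()) {
                System.out.println(scanner.next());
            }
        }
    }
}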

Comments
