
I have a program that reads and processes data from a raw text String using StringTokenizer.

Originally the StringTokenizer contained about 1,500 tokens and the program worked fine. However, the raw content has grown, it is now about 12,000 tokens, and CPU consumption has increased significantly.

I'm looking into the problem and trying to identify the root cause. The program uses a while loop to check whether any tokens are left and, depending on the token read, takes a different action. I'm reviewing those actions to see whether they can be improved.

Meanwhile, I would like to ask whether handling one long StringTokenizer costs more CPU than handling 10 short StringTokenizers.
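
To check this myself, I have a rough comparison like the following in mind; the content and separator are made up, and the chunk boundaries ignore token alignment, so it is only indicative:

import java.util.StringTokenizer;

// Rough sketch: one long StringTokenizer vs. ten shorter ones over the same content.
public class TokenizerComparison {
    public static void main(String[] args) {
        // Build a synthetic ~12,000-token message (made-up fields and separator).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            sb.append("field").append(i).append(';');
        }
        String longMsg = sb.toString();

        long start = System.nanoTime();
        consume(new StringTokenizer(longMsg, ";"));
        System.out.println("one long tokenizer: " + (System.nanoTime() - start) / 1_000_000 + " ms");

        start = System.nanoTime();
        int chunk = longMsg.length() / 10;
        for (int i = 0; i < 10; i++) {
            int from = i * chunk;
            int to = (i == 9) ? longMsg.length() : (i + 1) * chunk;
            // Note: chunk boundaries may split a token; good enough for a rough comparison.
            consume(new StringTokenizer(longMsg.substring(from, to), ";"));
        }
        System.out.println("ten short tokenizers: " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }

    private static void consume(StringTokenizer st) {
        while (st.hasMoreTokens()) {
            st.nextToken();
        }
    }
}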

  • Are you sure it's StringTokenizer and not what you're doing with it? Please show a short but complete program which demonstrates the problem. Commented Sep 14, 2011 at 9:57
  • I don't think so. Strings are random-access, that should not slow down for long Strings. Commented Sep 14, 2011 at 9:58
  • There isn't anything in StringTokenizer that would blow up for long inputs. It has to be something in the surrounding code. Commented Sep 14, 2011 at 10:00
  • This question is worthless without an SSCCE. Commented Sep 14, 2011 at 10:18

3 Answers


First of all, thanks for your opinions. Over the weekend I ran a stress test with real data using a revised program, and I'm happy to say my problem is solved (many thanks to A.J. ^_^). I would like to share my findings.

After studying the example mentioned by A.J., I ran a test program to read and process data using StringTokenizer and "indexOf" (regex is even worse than StringTokenizer in my situation). My test program counted how many milliseconds were needed to process 24 messages (~12,000 tokens each).

StringTokenizer needed ~2700ms to complete, while "indexOf" took only ~210ms!
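
For reference, a minimal sketch of that kind of timing comparison; it uses synthetic messages and a ';' separator rather than my real data and harness, so the absolute numbers will differ:

import java.util.StringTokenizer;

public class TokenTimingSketch {
    public static void main(String[] args) {
        // Build one synthetic ~12,000-token message ending with the separator.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            sb.append("field").append(i).append(';');
        }
        String msg = sb.toString();
        long chars = 0;   // accumulate token lengths so the work is not optimized away

        // 24 messages via StringTokenizer
        long start = System.nanoTime();
        for (int m = 0; m < 24; m++) {
            StringTokenizer st = new StringTokenizer(msg, ";");
            while (st.hasMoreTokens()) {
                chars += st.nextToken().length();
            }
        }
        System.out.println("StringTokenizer: " + (System.nanoTime() - start) / 1_000_000 + " ms");

        // 24 messages via indexOf/substring
        start = System.nanoTime();
        for (int m = 0; m < 24; m++) {
            int from = 0;
            int to;
            while ((to = msg.indexOf(';', from)) >= 0) {
                chars += msg.substring(from, to).length();
                from = to + 1;
            }
        }
        System.out.println("indexOf: " + (System.nanoTime() - start) / 1_000_000 + " ms (" + chars + " chars)");
    }
}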

I then revised my program like this (with minimal changes) and tested it with real volume over the weekend:

Original program:

import java.util.StringTokenizer;

public class MsgProcessor {
    //Some other definition and methods ...

    public void processMessage (String msg) 
    {
        //...

        StringTokenizer token = new StringTokenizer(msg, FieldSeparator);
        while (token.hasMoreTokens()) {
            my_data = token.nextToken();
            // perform a different action based on the token read
        }
    }
}

And here is the updated program using "indexOf":

public class MsgProcessor {
    //Some other definition and methods ...
    private int tokenStart=0;
    private int tokenEnd=0;

    public void processMessage (String msg) 
    {
        //...
        tokenStart=0;
        tokenEnd=0;

        while (isReadingData) {
            my_data = getToken(msg);
            if (my_data == null)
                break;
            // perform a different action based on the token read ...
        }
    }

    private String getToken (String msg)
    {
        String result = null;
        if ((tokenEnd = msg.indexOf(FieldSeparator, tokenStart)) >= 0) {
            result = msg.substring(tokenStart, tokenEnd);
            tokenStart = tokenEnd + 1;
        }
        return result;
    }
}
  • Please note that there is no "null" data in the original tokens. If no FieldSeparator is found, "getToken(msg)" will return null (as the signal for "no more tokens"); a variation that also returns a trailing token is sketched below.
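
Here is that hypothetical variation: if a message did not end with FieldSeparator, getToken above would skip the trailing piece, so this version (which I did not deploy, and which keeps the same single-character-separator assumption) also hands back the last token. It would slot into the same class, with exhausted reset to false at the start of processMessage, just like tokenStart:

// Hypothetical variation: also returns the trailing token when msg does not end
// with FieldSeparator; uses an end-of-message flag instead of null-only.
private boolean exhausted = false;   // reset to false at the start of processMessage

private String getTokenWithTail (String msg)
{
    if (exhausted)
        return null;
    int end = msg.indexOf(FieldSeparator, tokenStart);
    if (end >= 0) {
        String result = msg.substring(tokenStart, end);
        tokenStart = end + 1;   // same single-character separator assumption as above
        return result;
    }
    exhausted = true;
    // No further separator: hand back whatever remains (null if nothing is left).
    return tokenStart < msg.length() ? msg.substring(tokenStart) : null;
}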

Comments

1

StringTokenizer usage is discouraged according to the StringTokenizer Javadoc. It is not deprecated, though, so it is still possible to use; it is just not recommended. Here is what is written:

"StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead."

Please check the following post. It has a very nice example of various ways of doing the same thing you are trying to do.

performance-of-stringtokenizer-class-vs-split-method-in-java

You can try the samples provided there and see what works best for you.
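
For example, a quick sketch of the split-based alternatives the Javadoc points to (the separator and input here are just illustrative):

import java.util.regex.Pattern;

public class SplitExample {
    // Precompiling the pattern avoids recompiling it on every call;
    // Pattern.quote treats the separator literally rather than as a regex.
    private static final Pattern FIELD_SEPARATOR = Pattern.compile(Pattern.quote(";"));

    public static void main(String[] args) {
        String msg = "a;b;c";

        // Simple one-off split
        for (String token : msg.split(";")) {
            System.out.println(token);
        }

        // Reusable precompiled pattern (usually better when called repeatedly)
        for (String token : FIELD_SEPARATOR.split(msg)) {
            System.out.println(token);
        }
    }
}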

1 Comment

Thanks A.J. Your recommended post is very helpful for solving my problem.
0

Why don't you try the newer Scanner class instead? Scanners can be constructed using streams and files. Not sure it is more efficient than the old StringTokenizer, though.
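
For example (the delimiter and input here are made up):

import java.util.Scanner;

// Minimal Scanner sketch with a custom delimiter.
public class ScannerExample {
    public static void main(String[] args) {
        String msg = "a;b;c";
        try (Scanner scanner = new Scanner(msg)) {
            scanner.useDelimiter(";");
            while (scanner.hasNext()) {
                System.out.println(scanner.next());
            }
        }
    }
}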

Comments
