3

I have a string that holds some data. I need to remove some special characters from it and then tokenize the data.

Which of the following two methods should be preferred for better performance:

String data = "Random data (For performance) Waiting for reply?";
data = data.replaceAll("\\?", "");
data = data.replaceAll("\\.", "");
data = data.replaceAll(",", "");
data = data.replaceAll("\\(", "");
data = data.replaceAll("\\)", "");

String[] tokens = data.split("\\s+");  
for(int j = 0; j < tokens.length; j++){
  //Logic on tokens
}  

OR

String data = "Random data (For performance) Waiting for reply?";

String[] tokens = data.split("\\s+");  
for(int j = 0; j < tokens.length; j++){
    tokens[j]=tokens[j].replace("?", "");
    tokens[j]=tokens[j].replace(".", "");
    tokens[j]=tokens[j].replace(",", "");
    tokens[j]=tokens[j].replace("(", "");
    tokens[j]=tokens[j].replace(")", "");      

  //Logic on each token
}  

Or is there any other approach that can increase performance? (Some statistics on this would be greatly appreciated.)

The for loop shown above will also be used to perform other logic on each token.
Is calling replace once on the whole content faster, or is calling replace on each token inside the for loop (which runs regardless of the replacement) faster?

That is: replace once and then perform the other operations, or replace token by token and then perform the required operation on each token?

Thanks in Advance

5
  • 3
    take a look at this - cqse.eu/en/blog/string-replace-performance Commented Nov 19, 2014 at 12:39
  • 1
    The answer to every single "which is faster, this code or this code" ever is "profile them and see". Commented Nov 19, 2014 at 12:40
  • For your first example I would prefer data = data.replaceAll("[\\.,\\(\\)?]", ""); to replace everything in one step. This would be considerably faster. Commented Nov 19, 2014 at 12:47
  • Possible duplicate of stackoverflow.com/questions/17531362/… Commented Nov 19, 2014 at 13:01
  • Another dup: stackoverflow.com/questions/5373431/… Commented Nov 20, 2014 at 17:55

2 Answers

4

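Plain replace is enough here, and no loop is needed for it.

replaceAll treats its first argument as a regular expression and runs it through the regex engine, which carries much more performance overhead.

There seems to be a common misunderstanding about the "All" suffix: both replace and replaceAll replace every occurrence. They differ only in whether the first argument is taken literally (replace) or as a regex (replaceAll).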

See Difference between String replace() and replaceAll().
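To make the difference concrete, here is a minimal sketch. The class and method names (ReplaceVsReplaceAll, stripLiteral, stripRegex) are made up for illustration and the sample string is an assumption; only String.replace and String.replaceAll come from the JDK.

```java
class ReplaceVsReplaceAll {

    // replace() takes its target literally and still replaces every occurrence.
    static String stripLiteral(String s) {
        return s.replace("?", "").replace(".", "");
    }

    // replaceAll() takes a regex; inside a character class, '?' and '.' are literal.
    static String stripRegex(String s) {
        return s.replaceAll("[?.]", "");
    }

    public static void main(String[] args) {
        String data = "a.b.c? d?";
        System.out.println(stripLiteral(data)); // abc d
        System.out.println(stripRegex(data));   // abc d

        // Unescaped, "." is a regex metacharacter that matches any character,
        // so this call removes the whole (single-line) string.
        System.out.println(data.replaceAll(".", "").isEmpty()); // true
    }
}
```

Note that an unescaped "?" or "(" passed to replaceAll is not a valid pattern and throws a PatternSyntaxException at runtime.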

Update

Found a very similar question to this one:

Removing certain characters from a string
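As one possible alternative to chained replace calls, the unwanted characters can be dropped in a single pass over the string before splitting. This is only a sketch under the assumption that the set of characters to remove is small and fixed; CharFilter and removeChars are hypothetical names, not something from the linked question.

```java
class CharFilter {

    // Copies every character that is not in 'unwanted' into a new string.
    // One pass over the input and one result allocation, instead of one per replace call.
    static String removeChars(String input, String unwanted) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (unwanted.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "Random data (For performance) Waiting for reply?";
        String[] tokens = removeChars(data, "?.,()").split("\\s+");
        for (String token : tokens) {
            // Logic on each token would go here.
            System.out.println(token);
        }
    }
}
```

Whether this actually beats replace for a given input is something to measure rather than assume.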


4 Comments

Thanks for the reply and for the links, which helped me understand the replace function better. I have been working with replace and replaceAll for the past few months, and only today learned that both of them replace all occurrences. I had misunderstood the "All" suffix and assumed that replaceAll internally used a for loop to replace every occurrence, which was also wrong on my side. Thank you for sharing the info. One small doubt: is there any scenario where I should use replaceAll instead of replace?
@Abhishek replace is a special case, equivalent to quoting the input before calling replaceAll. What you need is replaceAll("[?.,()]", "").
What I actually meant was: is calling replace on the whole content faster, or is calling replace on each token in the for loop (which is used for other operations anyway) faster?
@Abhishek, a single replacement would be faster, of course, as it involves fewer copy operations and memory allocations.
1

I am not aware of statistics for this kind of problem, but if you are concerned about performance, I would first replace the several replaceAll() calls with a single one, like this:

data = data.replaceAll("\\?|\\.|\\)|\\(|,", "");

It might be faster.
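If the same cleanup runs many times, one further option is to compile the pattern once and reuse it, because String.replaceAll compiles its regex on every call. The sketch below assumes the character-class form of the regex suggested in the comments; PrecompiledStrip, strip, and timeNanos are hypothetical names, and the timing helper is deliberately crude.

```java
import java.util.regex.Pattern;

class PrecompiledStrip {

    // Compiled once and reused; equivalent to the alternation "\\?|\\.|\\)|\\(|,".
    private static final Pattern PUNCTUATION = Pattern.compile("[?.,()]");

    static String strip(String s) {
        return PUNCTUATION.matcher(s).replaceAll("");
    }

    // Rough wall-clock timing of a task repeated 'iterations' times.
    // No warm-up and no protection against JIT effects, so treat results as indicative only.
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String data = "Random data (For performance) Waiting for reply?";
        long precompiled = timeNanos(() -> strip(data), 100_000);
        long perCall = timeNanos(() -> data.replaceAll("[?.,()]", ""), 100_000);
        System.out.println("precompiled: " + precompiled + " ns");
        System.out.println("per-call compile: " + perCall + " ns");
    }
}
```

For numbers worth trusting, a benchmark harness such as JMH is a better fit than a hand-rolled loop like this one.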

2 Comments

In Java, you must double the backslashes, otherwise it won't compile, as e.g. \? is an illegal escape sequence. See my other comment for a simpler regex.
Agreed. I tested the expression in Regexplanet and forgot to double the backslashes. Thank you for the correction.
