I'd like to use something like StringBuilder to hold a string and then perform a large number of regex replaceAll operations on it efficiently. I'd like to leverage StringBuilder's variable-sized array and avoid temporary string allocations. That is, I'd like each regex replaceAll operation to mutate the array held by the StringBuilder as needed, without allocating temporary strings. How can I do this?

Unfortunately StringBuilder does not have a built-in method to do this. It only has a replace() method without regex, and I can't see a way to do this without effectively replacing the entire StringBuilder buffer with a newly allocated String using Matcher, which I'd like to avoid.

  • You could apply the regex either before building the string or after building the string. As a follow-up: why do you want to do this? Is this a bid to avoid temporary objects or something? Commented Dec 7, 2018 at 21:39
  • Can you put the stuff you want to replace in one regex? Commented Dec 7, 2018 at 21:39
  • If you do end up using replaceAll on a String instead of a StringBuilder, one way to be more efficient is to compile the Pattern beforehand and reuse it for the replacements, since String.replaceAll calls Pattern.compile every time. Commented Dec 7, 2018 at 21:40
  • If you want to replace your matches, you can use Matcher#appendReplacement(StringBuilder, String). Commented Dec 7, 2018 at 21:54
  • Then just allocate a char array and do it like you would in C. How complex are these regular expressions? If they are actual regular expressions, as in just concatenation, alternation and Kleene star, then rolling your own class to match on a char array isn't difficult, assuming plain ASCII chars. If you start throwing in all the Perl functionality and UTF encoding, it gets more complicated. Commented Dec 7, 2018 at 23:33
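
To make the two suggestions in the comments above concrete (precompiled Pattern objects and Matcher#appendReplacement(StringBuilder, String)), here is a minimal sketch that ping-pongs between two reusable StringBuilder buffers. It assumes Java 9 or later, where the StringBuilder overloads of appendReplacement and appendTail exist. The class name, the two patterns and the demo method are made up for illustration and are not from the question. This reduces intermediate String objects but does not eliminate every allocation, since the Matcher itself and its replacement expansion may still allocate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class BufferedReplace {
    // Compiled once and reused, instead of recompiling on every replaceAll call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Reads from src, writes the replaced text into dst, and returns dst.
    // No intermediate String holding the whole result is created.
    static StringBuilder replaceAll(StringBuilder src, StringBuilder dst,
                                    Pattern pattern, String replacement) {
        dst.setLength(0);                      // reuse dst's existing array
        Matcher m = pattern.matcher(src);      // Matcher accepts any CharSequence
        while (m.find()) {
            m.appendReplacement(dst, replacement); // Java 9+ overload
        }
        m.appendTail(dst);                     // copy the text after the last match
        return dst;
    }

    // Applies two replacements, swapping the roles of the two buffers.
    static String demo(String input) {
        StringBuilder a = new StringBuilder(input);
        StringBuilder b = new StringBuilder(input.length());
        replaceAll(a, b, DIGITS, "#");   // a -> b
        replaceAll(b, a, SPACES, " ");   // b -> a
        return a.toString();             // single String at the very end
    }
}
```

With many patterns, the same two buffers can be swapped after each pass, so the number of large arrays stays at two regardless of how many replacements run.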

1 Answer

Regex doesn't create extra Strings. It verifies that strings match (or don't match) a pattern.

Capture groups return Strings, but Strings in Java are immutable, so they can't be backed by a mutable storage area, or even by part of one.

Also, a regex operation is not a single step (even if it appears to be one in the code), but a run of a state machine with the string as input. Java is multi-threaded, and the state machine would not work correctly if the data were modified while the machine runs over it. Fixing this would require locking the buffer, which would incur additional overhead.

Weighing the overhead of a lock against the overhead of an extra String object, the lock loses: it would turn any savings from avoiding the extra immutable objects negative. In short, you'd expend far more CPU cycles obtaining the lock than you'd save by not having a dozen (or likely even a hundred) additional strings.

Finally, the entire JVM contains string-specific optimizations. A mutable string would defeat those optimizations and cause bizarre behavior in one of the most commonly used data types in the JVM.
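
As an illustration of the point about the state machine and concurrent modification, here is a hedged, single-threaded sketch that only mutates the buffer between matcher runs, never during one. It relies on Pattern.matcher accepting any CharSequence (including StringBuilder) and on Matcher.find(int), which resets the matcher before searching. The class and method names are hypothetical, the replacement is treated as a literal (no $1 group references), and patterns with lookbehind or anchors may behave differently from String.replaceAll because they would see already-replaced text. It assumes no other thread touches the StringBuilder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class InPlaceReplace {
    // Replaces every match of pattern in sb with a literal string,
    // editing sb's own array via StringBuilder.replace.
    static void replaceAllInPlace(StringBuilder sb, Pattern pattern, String literal) {
        Matcher m = pattern.matcher(sb);
        int from = 0;
        // find(from) resets the matcher, so it picks up the buffer's new
        // contents and length after each edit.
        while (from <= sb.length() && m.find(from)) {
            int start = m.start();
            int end = m.end();
            sb.replace(start, end, literal);   // mutate only between runs
            from = start + literal.length();   // resume after the inserted text
            if (end == start) {
                from++;                        // empty match: step forward to avoid looping
            }
        }
    }
}
```

Each StringBuilder.replace call shifts the tail of the array, so for many matches in a long buffer this can cost more copying than building the result once in a second buffer; whether it is faster is something to measure rather than assume.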
