3

How do I do on-the-fly search & replace in a Java Stream (input or output)?

I don't want to load the stream into memory or to a file.

I just see the bytes passing by and I need to do some replacements. The sequences being replaced are short (up to 20 bytes).

5
  • It depends on what kind of "stream" it is. Is it text? Is it some sort of format with known field widths? You'll have to be much more specific in your question. Commented Oct 26, 2009 at 13:42
  • it's binary random junk. Commented Oct 26, 2009 at 13:47
  • Are the replacements just as long as the original "strings" (byte sequences)? Commented Oct 26, 2009 at 13:48
  • @Lucero: no. Both the string and the replacements are much much shorter than the stream. But replacements can be long, equal length, or shorter than the original string. Commented Oct 26, 2009 at 13:51
  • This is not really an answer to your question, but you should look into the java.nio package. Take a look at the NIO Examples: the first example shows how to do a simple "grep" on a file. Using NIO you will not have to worry about a buffer size; just let the regular expression library methods do the heavy lifting. Commented Oct 26, 2009 at 20:15

3 Answers

4

You can use the class provided here if static replacement rules are enough for you.


4 Comments

Just a small "academic" note: I looked over the source, and as far as I can tell the runtime/CPU usage, especially in the worst case, seems to be pretty bad. Assuming for instance 10 patterns of 101 characters, for each byte read there could be up to about 1000 processing steps (compare operations) performed. The DFA solution would require only one operation (a table lookup) per byte. With increasing pattern size, pattern count, and input stream length this could become a problem.
(However, the source is well done in terms of structure and documentation and there is good test coverage, so please take my comment as suggestion for a better algorithm, I don't mean to criticize the answer.)
Thanks, one of the reasons for referencing the class here is the chance to get feedback :) Your comment is right, I will revise the algorithm.
This should probably point directly to issues.apache.org/jira/browse/IO-218 where it is proposed (more stable link source given it depends on apache commons-io anyway).
1

You could implement a deterministic finite automaton that looks at each byte only once (i.e. no lookbehind is required). You would basically stream the input through a buffer that holds at most as many bytes as your pattern is long, outputting the replacement on a match, or flushing the overflowing (non-matched) bytes as you advance through the pattern. Once the pattern has been preprocessed, the runtime is linear in the length of the input.

Wikipedia has some information on pattern matching and how that works in theory.
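
To make the idea above concrete, here is a minimal sketch for a single pattern. It is not taken from any library: the class name StreamingReplaceOutputStream and the helper replaceAll are made up for illustration. Instead of a full DFA transition table it uses the Knuth-Morris-Pratt failure table, a close relative that holds back at most pattern.length - 1 bytes and does amortized constant work per byte. Several patterns at once would need something like Aho-Corasick, which this sketch does not attempt.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch: replaces every occurrence of one byte pattern while streaming. */
class StreamingReplaceOutputStream extends OutputStream {
    private final OutputStream delegate;
    private final byte[] pattern;
    private final byte[] replacement;
    private final int[] failure; // KMP failure table for the pattern
    private int matched = 0;     // pattern bytes currently held back (always < pattern.length)

    StreamingReplaceOutputStream(OutputStream delegate, byte[] pattern, byte[] replacement) {
        if (pattern.length == 0) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        this.delegate = delegate;
        this.pattern = pattern.clone();
        this.replacement = replacement.clone();
        this.failure = buildFailureTable(this.pattern);
    }

    /** failure[i] = length of the longest proper prefix of p[0..i] that is also its suffix. */
    private static int[] buildFailureTable(byte[] p) {
        int[] f = new int[p.length];
        int k = 0;
        for (int i = 1; i < p.length; i++) {
            while (k > 0 && p[i] != p[k]) {
                k = f[k - 1];
            }
            if (p[i] == p[k]) {
                k++;
            }
            f[i] = k;
        }
        return f;
    }

    @Override
    public void write(int b) throws IOException {
        byte value = (byte) b;
        // On a mismatch, shrink the partial match. The held-back bytes equal
        // pattern[0..matched), so the ones that drop out can be emitted from the pattern.
        while (matched > 0 && pattern[matched] != value) {
            int fallback = failure[matched - 1];
            delegate.write(pattern, 0, matched - fallback);
            matched = fallback;
        }
        if (pattern[matched] == value) {
            matched++;
            if (matched == pattern.length) { // full match: emit the replacement instead
                delegate.write(replacement);
                matched = 0;
            }
        } else {
            delegate.write(b); // byte cannot start a match: pass it through
        }
    }

    @Override
    public void flush() throws IOException {
        delegate.flush(); // held-back bytes stay put; they may still become a match
    }

    @Override
    public void close() throws IOException {
        delegate.write(pattern, 0, matched); // emit a trailing partial match unchanged
        matched = 0;
        delegate.close();
    }

    /** Convenience helper for trying the stream on an in-memory array. */
    static byte[] replaceAll(byte[] input, byte[] pattern, byte[] replacement) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (StreamingReplaceOutputStream out =
                 new StreamingReplaceOutputStream(sink, pattern, replacement)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

Wrapping any OutputStream works the same way as in the helper. Replacements may be shorter, equal, or longer than the pattern, and matches are replaced left to right without overlapping. A partial match at the end of the stream is only written out on close(), so the stream has to be closed to get the last few bytes.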

1 Comment

Thank you, @Lucero. I was looking for a library solution.
0

I got some good ideas from the link provided and ended up writing a small class to handle replacement of $VAR$ variables in a stream. For posterity:

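import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Replaces $VAR$ placeholders on the fly. An escaped "\$" is passed through unchanged. */
public class ReplacingOutputStream extends OutputStream {
    private static final int DOLLAR_SIGN = '$';
    private static final int BACKSLASH = '\\';
    // Assumption: the stream carries UTF-8 (or plain ASCII) text, so '$' and '\'
    // never occur inside a multi-byte character.
    private static final Charset CHARSET = StandardCharsets.UTF_8;

    private final OutputStream delegate;
    private final Map<String, Object> replacementValues;

    private int previous = -1;
    private boolean replacing = false;
    // Bytes of the variable name seen since the opening '$'.
    private final ByteArrayOutputStream variableName = new ByteArrayOutputStream();

    public ReplacingOutputStream(OutputStream delegate, Map<String, Object> replacementValues) {
        this.delegate = delegate;
        this.replacementValues = replacementValues;
    }

    @Override
    public void write(int b) throws IOException {
        b &= 0xFF; // only the low 8 bits are significant for OutputStream.write(int)
        if (b == DOLLAR_SIGN && previous != BACKSLASH) {
            if (replacing) {
                // Closing '$': look up the collected name and emit its value.
                doReplacement();
                replacing = false;
            } else {
                // Opening '$': start collecting the variable name.
                replacing = true;
            }
        } else if (replacing) {
            variableName.write(b);
        } else {
            delegate.write(b);
        }

        previous = b;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    private void doReplacement() throws IOException {
        String oldValue = new String(variableName.toByteArray(), CHARSET);
        variableName.reset();

        Object newValue = replacementValues.get(oldValue);
        if (newValue == null) {
            throw new RuntimeException("Could not find replacement variable for value '" + oldValue + "'.");
        }

        // Write the value as encoded bytes, not as raw code points.
        delegate.write(newValue.toString().getBytes(CHARSET));
    }
}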
