
I know that Java's ByteBuffer.clear() does not actually erase the data in the ByteBuffer. As a result, every time I call StringBuilder.append(), the final result also contains the leftover characters in the ByteBuffer, which are old data from the previous read. How can I fix this issue?

int byteRead = -1;
int readCount = 0;
int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
ReadableByteChannel readableByteChannel = Channels.newChannel(is);
while ((byteRead = readableByteChannel.read(buffer)) > 0 && readCount < 68) {
    sb.append(new String(buffer.array(), "UTF-8"));
    buffer.clear();
    readCount++;
}
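
For context, clear() only resets the buffer's position and limit; it never erases the bytes in the backing array, which is why stale data can reappear. A minimal sketch demonstrating this:

```java
import java.nio.ByteBuffer;

public class ClearDemo {
    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.put((byte) 'A');                // write one byte; position is now 1
        b.clear();                        // resets position to 0 and limit to capacity...
        System.out.println(b.array()[0]); // ...but the old byte is still there: prints 65
    }
}
```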
  • Eh? Clearing all the data is exactly what it's for. Your problem is that you're ignoring the read length by just building the string directly out of the entire array. Commented Jun 14, 2016 at 0:07

3 Answers


As already pointed out by other answers, you have to consider the position of the buffer, which gets updated by the read method. So the correct code looks like:

while ((byteRead = readableByteChannel.read(buffer)) > 0 && readCount < 68) {
    sb.append(new String(buffer.array(),
        buffer.arrayOffset(), buffer.position(), "UTF-8"));
    buffer.clear();
    readCount++;
}

Note that in your special case, arrayOffset() will always be zero, but it is better to write the code in a way that does not break when you later change how the buffer is allocated.

But this code is still broken. When reading a multi-byte UTF-8 sequence, it may happen that the first bytes of the sequence arrive in one read operation and the remaining bytes in the next. Your attempts to create String instances from these incomplete sequences will produce invalid characters. Besides that, you are creating these String instances just to copy their contents into a StringBuilder, which is quite inefficient.
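
The effect described above is easy to reproduce: decoding the two bytes of a multi-byte sequence in separate String constructions yields replacement characters instead of the intended character. A minimal sketch:

```java
import java.nio.charset.StandardCharsets;

public class SplitSequenceDemo {
    public static void main(String[] args) {
        byte[] eAcute = "é".getBytes(StandardCharsets.UTF_8); // two bytes: 0xC3 0xA9

        // Simulate two reads that each deliver one half of the sequence:
        String first  = new String(eAcute, 0, 1, StandardCharsets.UTF_8);
        String second = new String(eAcute, 1, 1, StandardCharsets.UTF_8);

        // Each half is malformed on its own, so both decode to U+FFFD:
        System.out.println(first + second); // prints two replacement characters, not "é"
    }
}
```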

So, to do it correctly, you should do something like:

int readCount = 0;
int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
CharBuffer cBuffer = CharBuffer.allocate(BUFFER_SIZE);
ReadableByteChannel readableByteChannel = Channels.newChannel(is);
while (readableByteChannel.read(buffer) > 0 && readCount < 68) {
    buffer.flip();
    while (dec.decode(buffer, cBuffer, false).isOverflow()) {
        cBuffer.flip();
        sb.append(cBuffer);
        cBuffer.clear();
    }
    buffer.compact();
    readCount++;
}
buffer.flip();
for (boolean more = true; more; ) {
    more = dec.decode(buffer, cBuffer, true).isOverflow();
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
}

Note how both ReadableByteChannel and CharsetDecoder process the buffers using their positions and limits. All you have to do is use flip and compact correctly, as shown in the documentation of compact.
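
The flip/compact cycle works like this (a small illustration of how position and limit move, using an assumed 8-byte buffer):

```java
import java.nio.ByteBuffer;

public class FlipCompactDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1).put((byte) 2).put((byte) 3); // filling: position=3, limit=8
        buf.flip();                                    // drain mode: position=0, limit=3
        buf.get();                                     // one byte consumed: position=1
        buf.compact();                                 // unread bytes moved to the front:
                                                       // position=2, limit=8, ready to fill again
        System.out.println(buf.position() + " " + buf.limit()); // prints "2 8"
    }
}
```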

The only exception is the appending to the StringBuilder, as that is not an NIO function. There we have to use clear(), as we know that the StringBuilder.append operation consumes all characters from the buffer.

Note that this code still does not deal with certain (unavoidable) error conditions: since you stop after an arbitrary number of reads, it is always possible that you cut off in the middle of a multi-byte UTF-8 sequence.


But this quite complicated logic has already been implemented by the JRE, and if you give up the idea of cutting after a certain number of bytes, you can utilize that:

int readCount = 0;
int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharBuffer cBuffer = CharBuffer.allocate(BUFFER_SIZE);
ReadableByteChannel readableByteChannel = Channels.newChannel(is);
Reader reader = Channels.newReader(readableByteChannel, "UTF-8");
while (reader.read(cBuffer) > 0 && readCount < 68) {
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
    readCount++;
}

Now this code limits the reading to 256 × 68 characters rather than bytes, but for UTF-8 encoded data this makes a difference only when there are multi-byte sequences, which you apparently did not care about before.
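
The difference between counting characters and counting bytes is easy to see: for ASCII-only data the counts match, and each multi-byte sequence makes them diverge. For example:

```java
import java.nio.charset.StandardCharsets;

public class CharVsByteCount {
    public static void main(String[] args) {
        String s = "héllo";
        System.out.println(s.length());                                // 5 chars
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 6 bytes ("é" needs two)
    }
}
```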

Finally, since you apparently have an InputStream in the first place, you don’t need the ReadableByteChannel detour at all:

int readCount = 0;
int BUFFER_SIZE = 256;
StringBuilder sb = new StringBuilder();
CharBuffer cBuffer = CharBuffer.allocate(BUFFER_SIZE);
Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
while (reader.read(cBuffer) > 0 && readCount < 68) {
    cBuffer.flip();
    sb.append(cBuffer);
    cBuffer.clear();
    readCount++;
}

This might look like “not being NIO code”, but Readers are still the canonical way of reading character data, even with NIO; there is no replacement. The method Reader.read(CharBuffer) was missing from the first NIO release, but was added in Java 5.


4 Comments

Why not use BufferedReader? Which performs better: ReadableByteChannel, Reader, or BufferedReader?
Why use BufferedReader? A BufferedReader only makes sense if you repeatedly use its methods for reading a single char, or use a buffer smaller than the BufferedReader’s own buffer. In other words, it makes no sense here. If your buffer is too small, just use a bigger buffer. And if you are concerned about performance, you do not use the single-char read method anyway. The only remaining useful feature of BufferedReader is line-by-line processing, which is irrelevant here.
It seems not to be widely known, but the Buffered… variants of InputStream and Reader do nothing if you pass in a sufficiently large buffer. If your buffer is only slightly smaller than their own buffer, performance may even get worse due to the unnecessary copying. Consequently, the Channel API does not provide buffered channel variants: with channels, you always have to provide your own buffer.
Manually dealing with a ReadableByteChannel here only raises the chance of errors, while a plain Reader will do best, regardless of whether you initialize it via InputStream or ReadableByteChannel. You may consider using a CharBuffer larger than 256 here; just try different sizes and measure. Or use a CharBuffer large enough to hold the entire data and skip the copying to the StringBuilder, if possible. CharBuffer implements the CharSequence interface, so, depending on what you actually want to do, it might be a drop-in replacement for the StringBuilder.
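
The last suggestion might look like this (a sketch; the 64 KiB capacity is an assumed upper bound on the input size, not something from the original discussion):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class ReadAllDemo {
    // Read everything into one CharBuffer and use it as a CharSequence,
    // skipping the StringBuilder entirely.
    static CharSequence readAll(InputStream is) throws IOException {
        Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        CharBuffer cb = CharBuffer.allocate(64 * 1024); // assumed to fit the entire input
        while (reader.read(cb) > 0) { } // read returns -1 at end of stream (or 0 if cb is full)
        cb.flip();                      // switch to drain mode; the buffer itself is the result
        return cb;                      // CharBuffer implements CharSequence
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new java.io.ByteArrayInputStream(
            "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(is)); // prints "hello"
    }
}
```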

You can use the new String(byte[] bytes, int offset, int length, String charsetName) constructor.

new String(buffer.array(), 0, byteRead, "UTF-8"); 

This will prevent the previous data from being used when the new String is created.

1 Comment

Is the sample above the best, or does your sample perform best?

Use position() to get the current buffer position and copy that part of the array with Arrays.copyOfRange:

Arrays.copyOfRange(buffer.array(), 0, buffer.position());

Which in your case becomes:

sb.append(new String(Arrays.copyOfRange(buffer.array(), 0, buffer.position()), "UTF-8"));

Or even shorter, using the appropriate String constructor:

sb.append(new String(buffer.array(), 0, buffer.position(), "UTF-8"));

(Note that a buffer.slice()-based one-liner does not work here: array() always returns the entire backing array, and a slice taken right after writing covers the unfilled remainder of the buffer, so the offset/length constructor above is the reliable option.)

BTW, instead of "UTF-8" it is better to use StandardCharsets.UTF_8.
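
One practical benefit, beyond avoiding typos in the charset name: the String constructors that take a Charset do not declare the checked UnsupportedEncodingException, so no try/catch is needed. For example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.put("abc".getBytes(StandardCharsets.UTF_8));

        // The String-name overload throws the checked UnsupportedEncodingException:
        String a = new String(buffer.array(), 0, buffer.position(), "UTF-8");

        // The Charset overload does not, and skips the lookup by name:
        String b = new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);

        System.out.println(a.equals(b)); // prints "true"
    }
}
```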

1 Comment

@AndyChan The last one
