How to read a Java InputStream efficiently (using the Java library from Scala)?

Question

My Scala server gets an InputStream from a socket via socket.getInputStream (some bytes are sent by my socket client; the number of bytes is printed below).

The following code reads it into an Array:

  import java.io.InputStream

  var buffer: Array[Byte] = null
  def read(stream: InputStream, size: Int) = {
    val start = System.nanoTime()
    buffer = new Array[Byte](size)
    var value: Int = 0
    // One stream.read() call per byte (a -1 end-of-stream result is not checked here)
    (0 until size).foreach(i => {
      value = stream.read()
      buffer(i) = value.toByte
    })
    val end = System.nanoTime()
    println(s"Getting buffer from InputStream, size: $size, cost: ${(end - start)/1e6} ms")
    buffer
  }

Part of the output:

Getting buffer from InputStream, size: 4, cost: 174.923596 ms
Getting buffer from InputStream, size: 2408728, cost: 919.207885 ms

However, existing servers can be much faster for the same amount of data; Redis, for example, can send these bytes in about 10 ms.

Is it possible to improve the performance of this program?

Why not use a high-level library like fs2 or Akka Streams? Or, if you want to read everything, you can just use Source from the standard library. – Luis Miguel Mejía Suárez, commented Jan 24, 2022 at 13:34

rzwitserloot · Accepted Answer · 2022-01-24 12:38:05Z


stream.read() is the slowest way to do this.

Instead, use the read(byte[]) variant or the read(byte[], int offset, int length) variant (the former is just a very simple wrapper around the three-parameter method and costs essentially nothing extra).
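As a minimal sketch of the bulk variant (not code from the answer), the loop below drains a stream into memory with read(byte[], int, int). The class name ChunkedRead and the 8 KiB chunk size are illustrative assumptions. On Java 9+, InputStream.readAllBytes() does the same job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for this sketch.
final class ChunkedRead {
    // Reads until end of stream using the bulk read(byte[], int, int) call.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192]; // chunk size is an arbitrary choice
        int n;
        // Each call may return fewer bytes than requested; -1 means end of stream.
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

Each iteration moves up to 8192 bytes per method call instead of one, which is where the savings over the per-byte loop come from.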

The overhead of using read() ranges from slight (when buffers are involved) to a factor of 1000x (when they aren't). In the second case, you can get back to the slight overhead by wrapping your InputStream in a BufferedInputStream and reading from that.
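A sketch of the BufferedInputStream approach, using the hypothetical helper names CountingInputStream and BufferedSingleByte. The loop still calls read() once per byte, but the wrapper fetches data from the underlying stream in large chunks (8192 bytes by default), so the underlying stream is only hit a handful of times. The counting wrapper exists only to make that visible.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: counts how often the underlying stream is asked for data.
final class CountingInputStream extends InputStream {
    private final InputStream delegate;
    int calls = 0;

    CountingInputStream(InputStream delegate) { this.delegate = delegate; }

    @Override public int read() throws IOException {
        calls++;
        return delegate.read();
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return delegate.read(b, off, len);
    }
}

// Hypothetical helper class for this sketch.
final class BufferedSingleByte {
    // Reads `size` bytes one at a time through a BufferedInputStream, so most
    // read() calls are served from its in-memory buffer. The wrapper is not
    // closed here because closing it would also close the caller's stream.
    static byte[] readBuffered(InputStream raw, int size) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            int b = in.read();
            if (b == -1) throw new IOException("Expected more data");
            data[i] = (byte) b;
        }
        return data;
    }
}
```

For a socket, raw would be socket.getInputStream(); in the test below it is an in-memory stream.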

But no matter what happens, this:

int toRead = 1000;
byte[] data = new byte[toRead];
int readSoFar = 0;
// Bulk read: each call may return fewer bytes than requested, so loop until full
while (readSoFar < toRead) {
  int read = in.read(data, readSoFar, toRead - readSoFar);
  if (read == -1) throw new IOException("Expected more data");
  readSoFar += read;
}

is far faster than:

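int toRead = 1000;
byte[] data = new byte[toRead];
// One read() call per byte; read() returns an int, so it must be cast to byte
for (int i = 0; i < toRead; i++) {
  int b = in.read();
  if (b == -1) throw new IOException("Expected more data");
  data[i] = (byte) b;
}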

Using Scala makes no difference to performance in these examples.
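The standard library already ships the fill-the-whole-array loop from the first example. A minimal sketch, assuming Java 9+ for readNBytes; the class and method names are hypothetical. Both helpers can be called unchanged from Scala, for instance in place of the per-byte loop in the question.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for this sketch.
final class FixedSizeRead {
    // Java 9+: readNBytes loops internally until `size` bytes are read or the
    // stream ends, and returns how many bytes it actually got.
    static byte[] readExactlyNBytes(InputStream in, int size) throws IOException {
        byte[] data = new byte[size];
        int n = in.readNBytes(data, 0, size);
        if (n < size) throw new IOException("Expected more data");
        return data;
    }

    // Older alternative: readFully throws EOFException if the stream ends
    // before the array is full. The DataInputStream is not closed, because
    // that would also close the caller's stream.
    static byte[] readExactlyFully(InputStream in, int size) throws IOException {
        byte[] data = new byte[size];
        new DataInputStream(in).readFully(data);
        return data;
    }
}
```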

answered Jan 24, 2022 at 12:38

rzwitserloot



3 Comments

Litchy Over a year ago

In my code I read a single byte at a time, but in the source code (InputStream.java) I saw that reading a byte array is just a for-loop that reads single bytes. Still, when I searched, people say reading an array is much faster. Did I misunderstand the source code?

rzwitserloot Over a year ago

Yes, reading an entire byte array is much faster; that's what this answer says too.

Litchy Over a year ago

Oh, I see: only the InputStream base class uses a loop to read an array. Subclasses use memcpy in most cases.
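To see the behaviour described in this exchange, here is a hypothetical stream (SingleByteOnlyStream, a name made up for this sketch) that implements only read(). Because it does not override read(byte[], int, int), the inherited InputStream version falls back to calling read() once per byte, so the bulk call buys nothing. Classes such as ByteArrayInputStream override the bulk method (it copies with System.arraycopy), which is where the speedup comes from.

```java
import java.io.InputStream;

// Hypothetical stream that only implements the single-byte read().
// It serves `remaining` zero bytes and counts how often read() is called.
final class SingleByteOnlyStream extends InputStream {
    private int remaining;
    int singleByteCalls = 0;

    SingleByteOnlyStream(int size) { this.remaining = size; }

    @Override public int read() {
        singleByteCalls++;
        if (remaining == 0) return -1; // end of stream
        remaining--;
        return 0;
    }
}
```

Calling read(buf, 0, 10) on a 10-byte instance goes through the base-class loop and invokes read() ten times.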
