My Scala server gets an InputStream from a socket via socket.getInputStream. (My socket client sends some bytes; their size is printed below.)

The following code tries to read the stream into an Array:

  var buffer: Array[Byte] = null
  def read(stream: InputStream, size: Int) = {
    val start = System.nanoTime()
    buffer = new Array[Byte](size)
    var value: Int = 0
    (0 until size).foreach(i => {
      value = stream.read()
      buffer(i) = value.toByte
    })
    val end = System.nanoTime()
    println(s"Getting buffer from InputStream, size: $size, cost: ${(end - start)/1e6} ms")
    buffer
  }

Part of the output is:

Getting buffer from InputStream, size: 4, cost: 174.923596 ms
Getting buffer from InputStream, size: 2408728, cost: 919.207885 ms

However, for the same data size, some existing servers are much faster; e.g. Redis can send the bytes in ~10 ms. So:

Is it possible to improve the performance of this program?

  • Why not use a high-level library like Fs2 or AkkaStreams? Or even, if you want to read everything you can just use Source from the stdlib. Commented Jan 24, 2022 at 13:34

1 Answer

stream.read(), which fetches a single byte per call, is the slowest take on the concept.

Instead you want the read(byte[]) variant, or the read(byte[], int offset, int length) variant (the former is just a very simple, and performance-wise essentially free, wrapper around the 3-param method).

The overhead of using read() ranges from 'slight' (when the stream is buffered) to 'a factor of 1000' (when it isn't). In the second case, you can get back to the 'slight' overhead by wrapping your InputStream in a BufferedInputStream and reading from that.
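To make the wrapping concrete, here is a minimal sketch. The names BufferedReadSketch and readByteByByte are made up for illustration, and a ByteArrayInputStream stands in for the socket stream, so treat it as an outline rather than a drop-in fix:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class BufferedReadSketch {
    // Reads exactly `size` bytes one at a time. This is only cheap when `in`
    // is buffered, because each read() is then served from an in-memory buffer.
    static byte[] readByteByByte(InputStream in, int size) throws IOException {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            int b = in.read();
            if (b == -1) throw new EOFException("Expected " + size + " bytes, got " + i);
            data[i] = (byte) b;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3, 4};
        // Stand-in for socket.getInputStream() (assumption: no real socket here).
        InputStream raw = new ByteArrayInputStream(payload);
        // Wrap once, then keep reading from the wrapper, not from `raw`.
        InputStream in = new BufferedInputStream(raw, 64 * 1024);
        System.out.println(Arrays.toString(readByteByByte(in, payload.length)));
    }
}
```

The wrapper should be created once per connection and reused; creating a new BufferedInputStream per message could lose bytes that the previous wrapper had already buffered.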

But no matter what happens, this:

int toRead = 1000;
byte[] data = new byte[toRead];
int readSoFar = 0;
while (readSoFar < toRead) {
  // read(...) may return fewer bytes than requested, so keep looping
  int read = in.read(data, readSoFar, toRead - readSoFar);
  if (read == -1) throw new IOException("Expected more data");
  readSoFar += read; // advance the offset, not the target size
}

is far faster than:

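int toRead = 1000;
byte[] data = new byte[toRead];
for (int i = 0; i < toRead; i++) {
  int b = in.read(); // one call per byte
  if (b == -1) throw new IOException("Expected more data");
  data[i] = (byte) b;
}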

Using Scala makes no difference in performance for these examples.
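For completeness, the bulk-read loop can be packaged as a small helper. This is only a sketch: BulkReadSketch, readExactly and shortReads are hypothetical names, and shortReads is a test double that hands out at most 3 bytes per call to imitate the short reads a socket can produce:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class BulkReadSketch {
    // Reads exactly `size` bytes, looping because a single read(...) call
    // may return fewer bytes than requested.
    static byte[] readExactly(InputStream in, int size) throws IOException {
        byte[] data = new byte[size];
        int readSoFar = 0;
        while (readSoFar < size) {
            int n = in.read(data, readSoFar, size - readSoFar);
            if (n == -1) {
                throw new EOFException("Expected " + size + " bytes, got " + readSoFar);
            }
            readSoFar += n;
        }
        return data;
    }

    // Test double (assumption, not a real socket): at most 3 bytes per bulk read.
    static InputStream shortReads(byte[] bytes) {
        return new ByteArrayInputStream(bytes) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3, 4, 5, 6, 7, 8};
        byte[] out = readExactly(shortReads(payload), payload.length);
        System.out.println(Arrays.toString(out));
    }
}
```

On Java 11 and later, in.readNBytes(size) does similar looping for you, but it returns a shorter array at end of stream instead of throwing; DataInputStream.readFully throws EOFException in that case.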

3 Comments

In my code I read a single byte at a time, but in the source code (InputStream.java) I saw that reading a byte array is just a for-loop that reads single bytes. Still, I did some searching and people say reading into an array is much faster. Is my understanding of the source code wrong?
Yes, reading an entire byte array is much faster - that's what my (this) answer is also saying.
Oh yes, I found that only the InputStream base class uses a loop to read into an array; subclasses use memcpy in most cases.
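The point in these comments can be checked with a small experiment. This sketch uses a hypothetical CountingStream that overrides only read(), so a bulk read falls back to the InputStream base-class implementation and the single-byte calls can be counted:

```java
import java.io.IOException;
import java.io.InputStream;

class BaseClassReadSketch {
    // Overrides only read(); read(byte[]) is inherited from InputStream.
    static final class CountingStream extends InputStream {
        private final byte[] source;
        private int pos = 0;
        int singleByteCalls = 0;

        CountingStream(byte[] source) {
            this.source = source;
        }

        @Override
        public int read() {
            singleByteCalls++;
            return pos < source.length ? (source[pos++] & 0xFF) : -1;
        }
    }

    public static void main(String[] args) throws IOException {
        CountingStream in = new CountingStream(new byte[10]);
        int n = in.read(new byte[10]);
        System.out.println(n + " bytes via " + in.singleByteCalls + " read() calls");
    }
}
```

Streams that override the 3-param read, as socket, file and buffered streams do, avoid this per-byte loop, which is why the bulk variants are faster in practice.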
