
Being bored earlier today, I started thinking a bit about the relative performance of buffered and unbuffered byte streams in Java. As a simple test, I downloaded a reasonably large text file and wrote a short program to determine the effect that buffered streams have when copying the file. Four tests were performed:

  1. Copying the file using unbuffered input and output byte streams.
  2. Copying the file using a buffered input stream and an unbuffered output stream.
  3. Copying the file using an unbuffered input stream and a buffered output stream.
  4. Copying the file using buffered input and output streams.
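For reference, the four cases can be sketched as follows (a minimal version with placeholder file names; the actual benchmark is in the linked repository). The cases differ only in which of the two streams is wrapped in a buffer:

```java
import java.io.*;

public class CopyTest {
    // Copy byte-by-byte; whether I/O is buffered depends on the streams passed in.
    static void copy(InputStream in, OutputStream out) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        // Case 4: buffered input and buffered output. For cases 1-3,
        // drop the BufferedInputStream/BufferedOutputStream wrapper as needed.
        try (InputStream in = new BufferedInputStream(new FileInputStream("input.txt"));
             OutputStream out = new BufferedOutputStream(new FileOutputStream("output.txt"))) {
            copy(in, out);
        } // try-with-resources closes (and therefore flushes) both streams
        System.out.println("Time: " + (System.nanoTime() - start) / 1e9);
    }
}
```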

Unsurprisingly, using buffered input and output streams is orders of magnitude faster than using unbuffered streams. However, the really interesting thing (to me at least) was the difference in speed between cases 2 and 3. Some sample results are as follows:

Unbuffered input, unbuffered output
Time: 36.602513585

Buffered input, unbuffered output
Time: 26.449306847

Unbuffered input, buffered output
Time: 6.673194184

Buffered input, buffered output
Time: 0.069888689

For those interested, the code is available here at GitHub. Can anyone shed any light on why the times for cases 2 and 3 are so asymmetric?

5 Comments

  • I would say unbuffered reading is faster than unbuffered writing.... Commented Sep 6, 2012 at 20:03
  • @beny23 I think he picked up on that; he was asking why. Commented Sep 6, 2012 at 20:09
  • While taking readings you should trigger the JIT first by calling the same functions a few times (say, 10 times in your case). Then, after each method, call GC so it doesn't interfere with any particular result. With your current code, if you change the order, the results will be different. Commented Sep 6, 2012 at 20:12
  • When you re-read the file in each subsequent test, you were almost certainly reading it from cache rather than the disk file, so your test isn't actually valid. You would have to find a way to de-prime the caches between each test run. Commented Sep 6, 2012 at 23:57
  • @EJP Thanks for that. I've tried it and you're right: there's a significant decrease in performance from ensuring that the cache is flushed before subsequent tests. The general pattern is still the same, though. Commented Sep 7, 2012 at 4:15

3 Answers


When you read a file, the filesystem and devices below it do various levels of caching. They almost never read one byte at a time; they read a block. On a subsequent read of the next byte, the block will already be in cache, so the read will be much faster.

It stands to reason then that if your buffer size is the same size as your block size, buffering the input stream doesn't actually gain you all that much (it saves a few system calls, but in terms of actual physical I/O it doesn't save you too much).

When you write a file, the filesystem can't cache for you because you haven't given it a backlog of things to write. It could potentially buffer the output for you, but it has to make an educated guess at how often to flush the buffer. By buffering the output yourself, you let the device do much more work at once because you manually build up that backlog.
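The backlog effect can be seen even without a disk. In the sketch below, a hypothetical `CountingOutputStream` (not part of the original code) stands in for the device and counts how many writes actually reach the underlying stream when 100,000 single-byte writes pass through a `BufferedOutputStream`:

```java
import java.io.*;

// Counts how many times the underlying write is invoked, standing in
// for the "physical" writes the device would see.
class CountingOutputStream extends OutputStream {
    int calls = 0;
    @Override public void write(int b) { calls++; }
    @Override public void write(byte[] b, int off, int len) { calls++; }
}

public class BufferDemo {
    public static void main(String[] args) throws IOException {
        CountingOutputStream raw = new CountingOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(raw, 8192);
        for (int i = 0; i < 100_000; i++) {
            buffered.write(0); // 100,000 logical single-byte writes...
        }
        buffered.flush();
        // ...but only about a dozen calls reach the underlying stream:
        // one per full 8 KiB buffer, plus the final partial flush.
        System.out.println("Underlying writes: " + raw.calls);
    }
}
```

Each full buffer is handed to the underlying stream in one call, which is exactly the "much more work at once" described above.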

4 Comments

This jibes with the fact that most HDDs read and write at pretty much the same speed. So then it's a question of how efficiently you are giving the drive things to write.
"When you write a file, the filesystem can't cache" — not quite: Linux (and every other "real" operating system that I've used, including recent versions of Windows) does maintain a cache of dirty pages to be written to disk (that's why sync exists).
@parsifal: Thanks, that's what I was trying to get at by saying that it can buffer the output. But you're right, it's more than a buffer because subsequent reads also read from that cache. I was more trying to say that "it can't anticipate what you're going to write next" the same way it can with a read.
Cheers Mark, that's a great explanation.

To your title question: it is more effective to buffer the output. The reason is the way hard disk drives (HDDs) write data to their sectors, especially on fragmented disks. Reading is much faster because the disk already knows where the data is, whereas when writing it has to determine where the data will fit. With a buffer, the disk can find a larger contiguous blank space to save the data than it can in the unbuffered case. For giggles, run another test: create a new partition on your disk and run your tests reading and writing to that clean slate. To compare apples to apples, format the newly created partition between tests. Please post your numbers afterwards if you run the tests.



Generally, writing is more tedious for the computer because it cannot cache writes the way it can cache reads. It is much like real life: reading is faster and easier than writing!

