7

I'm writing a program that reads a file through a custom 8 KB buffer and then searches that buffer for a keyword. Since Java provides two types of streams, character and byte, I've implemented this with both byte[] and char[] buffers.

I wonder which would be faster. A char is 2 bytes, and when a Reader fills a char[] it has to decode the underlying bytes into chars, which I think could make it slower than working with byte[] alone.

3
  • Why don't you compare the implementation yourself? Commented Aug 15, 2011 at 4:02
  • Actually I did. byte[] seems to be a little faster, but not significantly. I just want to ask for more opinions. Commented Aug 15, 2011 at 4:04
  • Actually I asked a similar question several days ago. See stackoverflow.com/questions/7047569/char-to-byte-optimize-java Commented Aug 15, 2011 at 4:52

3 Answers

6

Using a byte array will be faster:

  • You don't have the byte-to-character decoding step, which is at least a copy loop, and possibly more depending on the Charset used to do the decoding.

  • The byte array will take less space, and hence save CPU cycles in GC / initialization.

However:

  • Unless you are searching huge files, the difference is unlikely to be significant.

  • The byte array approach could FAIL if the input file is not encoded in an 8-bit character set. Even when it works (as it does for UTF-8 and UTF-16), a keyword whose bytes straddle a buffer boundary can be missed unless the search carries over the tail of the previous buffer.

(The reason that byte-wise treatment works for UTF-8 and UTF-16 is that the encoding makes it easy to distinguish between the first unit (byte or short) and subsequent units of an encoded character.)
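The buffer-boundary issue can be handled by carrying the last `pattern.length - 1` bytes of each chunk over into the next one. The following is a minimal sketch, not the asker's implementation; the class name `ByteKeywordSearch` and its method names are hypothetical, the search is a plain nested loop, and `bufferSize` is assumed to be positive.

```java
import java.io.IOException;
import java.io.InputStream;

final class ByteKeywordSearch {

    /** Returns true if pattern occurs anywhere in the stream (bufferSize must be > 0). */
    static boolean contains(InputStream in, byte[] pattern, int bufferSize) throws IOException {
        if (pattern.length == 0) {
            return true;
        }
        int keep = pattern.length - 1;            // tail bytes carried into the next chunk
        byte[] buf = new byte[keep + bufferSize]; // room for the carried tail plus one chunk
        int carried = 0;                          // number of valid bytes at the start of buf
        int n;
        while ((n = in.read(buf, carried, bufferSize)) != -1) {
            int valid = carried + n;
            if (indexOf(buf, valid, pattern) >= 0) {
                return true;
            }
            // Keep the tail so a match straddling two chunks is still found.
            carried = Math.min(keep, valid);
            System.arraycopy(buf, valid - carried, buf, 0, carried);
        }
        return false;
    }

    /** Naive search of pattern within the first length bytes of data; -1 if absent. */
    static int indexOf(byte[] data, int length, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

With an 8 KB buffer this would be called as `ByteKeywordSearch.contains(new FileInputStream(file), keywordBytes, 8192)`, where `keywordBytes` is the keyword encoded in the file's charset.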


3 Comments

I've tested my byte implementation with a Unicode file and it works. The files range in size from 1 MB to 6 MB (dictionary index files).
@Stephen C Could you help explain the effect of the code refactoring on speed in my question? stackoverflow.com/questions/7047569/char-to-byte-optimize-java
Not without spending a lot of time doing research work that you could do yourself. Hint: use a profiler.
1

If it's a binary file you're reading, use a byte array.

If it's a text file and you're going to use the contents as strings later, then you should use a char array.
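For the text case, a rough sketch of the char[] route might look like this. It assumes the caller knows the file's charset and that the text fits in memory; `CharKeywordSearch` and `indexOfKeyword` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

final class CharKeywordSearch {

    /** Decodes the stream with the given charset and returns the char index of keyword, or -1. */
    static int indexOfKeyword(InputStream in, Charset charset, String keyword) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[8192]; // 8K chars, i.e. 16 KB of memory
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n); // the bytes were already decoded into chars by the Reader
            }
        }
        return text.indexOf(keyword);
    }
}
```

The result is a character index, not a byte offset; the two differ as soon as the text contains a character that encodes to more than one byte.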

1 Comment

It's a text file. However, I'm only looking for a keyword in the file, so the byte[] implementation works fine for me. I just wonder which of the two performs better. In this situation, I treat a text file or a String as just a byte array, and the problem becomes finding a smaller byte array inside a bigger one.
0

This Stack Overflow question, file-streaming-in-java, discusses streaming files efficiently in Java.

I particularly like this reference article

On large files, working with bytes only quickly pays off in speed, so if you can match the pattern directly against the bytes you could definitely gain a few precious cycles.

If your files are small, or you don't have many of them, it may not be worth the trouble.
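Matching directly against bytes means encoding the keyword once, in the file's charset, instead of decoding the whole file. A small sketch under that assumption follows; `EncodedKeyword` and `byteOffsetOf` are hypothetical names, and the charset is assumed to be known in advance.

```java
import java.nio.charset.Charset;

final class EncodedKeyword {

    /** Returns the byte offset of keyword (encoded with charset) within data, or -1. */
    static int byteOffsetOf(byte[] data, String keyword, Charset charset) {
        // Encode the keyword once. Note that StandardCharsets.UTF_16 prepends a
        // byte-order mark when encoding, so use UTF_16LE or UTF_16BE for UTF-16 files.
        byte[] pattern = keyword.getBytes(charset);
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

The offset returned is in bytes, so it has to be converted if character positions are needed later.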
