Java performance of byte[] vs. char[] for file stream

Question

I'm writing a program that reads a file through a custom 8 KB buffer and then searches that buffer for a keyword. Since Java provides two types of streams, character and byte, I've implemented this using both byte[] and char[] for buffering.

I wonder which would be faster. A char is 2 bytes, and a Reader filling a char[] has to decode the bytes into chars, which I think could make it slower than using only byte[].

Actually, I did. The byte[] version seems to be a little faster, but not significantly. I just want to ask for more opinions.
– Genzer, Commented Aug 15, 2011 at 4:04
Actually, I asked a similar question several days ago. See stackoverflow.com/questions/7047569/char-to-byte-optimize-java
– Clark Bao, Commented Aug 15, 2011 at 4:52

Stephen C · Accepted Answer · 2011-08-15 04:49:40Z

6

Using a byte array will be faster:

- You don't have the byte-to-character decoding step, which is at least a copy loop, and possibly more depending on the Charset used to do the decoding.
- The byte array will take less space, and hence save CPU cycles in GC / initialization.

However:

- Unless you are searching huge files, the difference is unlikely to be significant.
- The byte array approach could FAIL if the input file is not encoded in an 8-bit character set. And even if it works (as it does for UTF-8 & UTF-16), there are potential issues with matching characters that span buffer boundaries.

(The reason that byte-wise treatment works for UTF-8 and UTF-16 is that the encoding makes it easy to distinguish between the first unit (byte or short) and subsequent units of an encoded character.)
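The buffer-boundary issue mentioned above can be handled by carrying the tail of each chunk over into the next read. What follows is only a minimal sketch of that idea, not the asker's implementation: the class name `ByteKeywordSearch`, the method names, and the tiny buffer size in `main` are all assumptions chosen for illustration, and the keyword is assumed to be encoded with the same charset as the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: searches a stream for a byte pattern, chunk by chunk.
public class ByteKeywordSearch {

    /** Returns the byte offset of the first match of pattern in the stream, or -1 if absent. */
    public static long indexOf(InputStream in, byte[] pattern, int bufferSize) throws IOException {
        if (pattern.length == 0) {
            return 0;
        }
        int overlap = pattern.length - 1;   // longest partial match that can dangle at a chunk end
        byte[] buf = new byte[overlap + bufferSize];
        int carried = 0;                    // bytes kept from the previous chunk, at the start of buf
        long base = 0;                      // stream offset of buf[0]
        int n;
        while ((n = in.read(buf, carried, bufferSize)) != -1) {
            int len = carried + n;
            // Check every start position that has a full pattern's worth of bytes available.
            for (int i = 0; i + pattern.length <= len; i++) {
                if (matchesAt(buf, i, pattern)) {
                    return base + i;
                }
            }
            // Keep the unchecked tail so a match split across two reads is still found.
            int keep = Math.min(overlap, len);
            System.arraycopy(buf, len - keep, buf, 0, keep);
            base += len - keep;
            carried = keep;
        }
        return -1;
    }

    private static boolean matchesAt(byte[] buf, int from, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (buf[from + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world keyword here".getBytes(StandardCharsets.UTF_8);
        byte[] key = "keyword".getBytes(StandardCharsets.UTF_8);
        // A 4-byte buffer forces the keyword to straddle several reads.
        System.out.println(indexOf(new ByteArrayInputStream(data), key, 4)); // prints 12
    }
}
```

With a real file, the same method could be called with a `FileInputStream` and an 8192-byte buffer. The inner loop is a naive comparison; a faster matching algorithm could be swapped in without changing the carry-over logic.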

edited Aug 15, 2011 at 4:49

answered Aug 15, 2011 at 4:11

Stephen C


3 Comments

Genzer Over a year ago

I've tested my byte implementation with a Unicode file and it works. The files range from 1 MB to 6 MB (dictionary index files).

Clark Bao Over a year ago

@Stephen C Could you help to explain the code refactor to the velocity in my question. stackoverflow.com/questions/7047569/char-to-byte-optimize-java

Stephen C Over a year ago

Not without spending a lot of time doing research work that you could do yourself. Hint: use a profiler.

Paul · Accepted Answer · 2011-08-15 04:01:57Z

1

If it's a binary file you're reading, use a byte array.

If it's a text file and you're going to use the contents as strings later, then you should use a char array.
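For the text case, a rough sketch of the char[] route might look like the following. The class name `CharBufferRead` and the method `readAll` are made up for illustration, and the charset is passed explicitly on the assumption that the file's encoding is known; relying on the platform default could decode the same bytes differently on another machine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes a byte stream into a String through a char[] buffer.
public class CharBufferRead {

    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        // InputStreamReader performs the byte-to-char decoding discussed in the question.
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "na\u00efve caf\u00e9";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        // The two accented letters take two bytes each in UTF-8 but one char each after decoding.
        System.out.println(bytes.length + " bytes, " + decoded.length() + " chars");
    }
}
```

Once the content is a String, the usual `indexOf`, `split`, and regex methods apply directly, which is the convenience this answer is pointing at.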

answered Aug 15, 2011 at 4:01

Paul


1 Comment

Genzer Over a year ago

It's a text file. However, I only need to find a keyword in the file, so the byte[] implementation works fine for me. I just wonder which of the two performs better. In this situation, I treat a text file or a String as just a byte array, and the problem becomes finding a smaller byte array in a bigger one.
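Treating the search as "smaller byte array in a bigger one" only works when the keyword is encoded with the same charset as the file. The sketch below illustrates that under assumed names (`KeywordBytes` and `indexOf` are hypothetical, not from the thread), using UTF-8 and UTF-16LE as sample encodings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: naive search for one byte array inside another.
public class KeywordBytes {

    /** Returns the index of the first occurrence of needle in haystack, or -1. */
    public static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "find the keyword";
        String keyword = "keyword";

        byte[] utf8Text = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16Text = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] utf8Key = keyword.getBytes(StandardCharsets.UTF_8);
        byte[] utf16Key = keyword.getBytes(StandardCharsets.UTF_16LE);

        System.out.println(indexOf(utf8Text, utf8Key));   // prints 9
        System.out.println(indexOf(utf16Text, utf8Key));  // prints -1 (encodings differ)
        System.out.println(indexOf(utf16Text, utf16Key)); // prints 18 (a byte offset, not a char index)
    }
}
```

Note that the result is a byte offset; converting it back to a character position depends on the encoding.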

Community · Accepted Answer · 2017-05-23 11:51:29Z

0

This Stack Overflow question, file-streaming-in-java, talks about streaming files efficiently in Java.

I particularly like this reference article

On large files, working only with bytes quickly gives you a speed advantage, so if you can match the pattern at the byte level you could definitely gain a few precious cycles.

If your files are small, or you don't have many of them, it may not be worth the trouble.

edited May 23, 2017 at 11:51

CommunityBot


answered Aug 15, 2011 at 4:27

Nicolas Modrzyk

