
I'm trying to read a binary file from a URLConnection. When I test it with a text file it seems to work fine, but for binary files it doesn't. The server sends the file with the following MIME type:

application/octet-stream

But so far nothing seems to work. This is the code that I use to receive the file:

file = File.createTempFile( "tempfile", ".bin");
file.deleteOnExit();

URL url = new URL( "http://somedomain.com/image.gif" );

URLConnection connection = url.openConnection();

BufferedReader input = new BufferedReader( new InputStreamReader( connection.getInputStream() ) );

Writer writer = new OutputStreamWriter( new FileOutputStream( file ) );

int c;

while( ( c = input.read() ) != -1 ) {

   writer.write( (char)c );
}

writer.close();

input.close();

2 Answers

36

This is how I do it:

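// Read raw bytes; no Reader is involved, so no character decoding takes place.
InputStream input = connection.getInputStream();
OutputStream output = new FileOutputStream( file );

byte[] buffer = new byte[4096];
int n;

// read() returns the number of bytes placed in the buffer, or -1 at end of stream.
while ((n = input.read(buffer)) != -1)
{
    output.write(buffer, 0, n);
}
output.close();
input.close();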

2 Comments

The n > 0 test is unnecessary. According to the javadocs, the only case where zero can be returned is when buffer.length is zero.
... and in any case a zero length write is harmless.
15

If you are trying to read a binary stream, you should NOT wrap the InputStream in a Reader of any kind. Read the data into a byte array buffer using the InputStream.read(byte[], int, int) method. Then write from the buffer to a FileOutputStream.
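
As a rough sketch of that approach, the loop below copies bytes with InputStream.read(byte[], int, int) and never creates a Reader. The class name BinaryCopy and the helpers copy and roundTrip are made up for illustration, and in-memory streams stand in for the URLConnection and the FileOutputStream so the example runs without a network or a disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BinaryCopy {

    // Hypothetical helper: copies bytes from in to out with no character decoding.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        // read(byte[], int, int) returns the count of bytes read, or -1 at end of stream.
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Runs copy() over in-memory streams (an assumption made to keep the sketch self-contained).
    static byte[] roundTrip(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(data), sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // "GIF" followed by a zero byte and two bytes outside the ASCII range.
        byte[] data = {0x47, 0x49, 0x46, 0x00, (byte) 0x80, (byte) 0xFF};
        System.out.println("identical: " + Arrays.equals(data, roundTrip(data)));
    }
}
```

In real code the same copy method should work unchanged with connection.getInputStream() as the source and a FileOutputStream as the destination.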

The way you are currently reading/writing the file will convert it into "characters" and back to bytes using your platform's default character encoding. This is liable to mangle binary data.
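
To make the mangling concrete, here is a small sketch that decodes bytes to characters and encodes them back, which are the same two steps a Reader/Writer pair performs. It uses String conversions for brevity and fixes the charset to UTF-8 so the result is repeatable; both choices are assumptions for the demo, since the code in the question uses whatever the platform default happens to be. The names CharsetMangling and throughChars are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetMangling {

    // Decodes bytes to chars and encodes them back again.
    // UTF-8 is assumed here; the platform default may differ.
    static byte[] throughChars(byte[] data) {
        String decoded = new String(data, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Pure ASCII content, like a simple text file.
        byte[] text = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        // Bytes 0x80 and 0xFF are not valid UTF-8 on their own.
        byte[] binary = {0x47, 0x49, 0x46, (byte) 0x80, (byte) 0xFF};

        System.out.println("text survives:   " + Arrays.equals(text, throughChars(text)));
        System.out.println("binary survives: " + Arrays.equals(binary, throughChars(binary)));
    }
}
```

The ASCII input comes back unchanged while the binary input does not, because undecodable bytes are replaced during decoding. That would also explain why a test with a text file appears to work.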

(There is a charset, ISO-8859-1 (LATIN-1), that provides a 1-to-1 lossless mapping between bytes and a subset of the char value-space. However, this is a bad idea even when the mapping works. You will be translating / copying the binary data from byte[] to char[] and back again ... which achieves nothing in this context.)
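
For the curious, the lossless mapping can be checked with a short sketch like the one below, which pushes all 256 byte values through ISO-8859-1 and back. The names Latin1RoundTrip, allByteValues and throughLatin1 are invented for this illustration. It only shows that the mapping is reversible; it is not a recommendation to handle binary data this way.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Builds an array holding every possible byte value once.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    // Decodes with ISO-8859-1 and encodes back with the same charset.
    static byte[] throughLatin1(byte[] data) {
        String decoded = new String(data, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("lossless: " + Arrays.equals(all, throughLatin1(all)));
    }
}
```

Even though the bytes survive, every one of them is converted to a char and back for no benefit, which is the point made above.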

5 Comments

Or you can try wrapping your InputStream in a BufferedInputStream.
@bhups - that is true, but it will only help if you are going to do lots of small reads. If you exclusively do large block reads, a BufferedInputStream will actually reduce throughput a bit.
This is correct; InputStreamReader will transform byte data to UTF-16 character data (in this case, using the default platform encoding, which is a bad idea even for text/plain). A Java char is not an octet as it is in some other languages.
@StephenC, regarding your last (+1 useful) comment - What buffer-size would still be considered as causing "lots of small reads" (by your definition)? In other words, how small "should" the byte[] read-buffer be, to justify usage of BufferedInputStream?
I can't give you an exact number. It depends on the relative costs of a syscall, the sizes of the buffer and the byte[], and so on. But my real point is to not assume that using a buffered stream always makes things faster.
