How can I write/read a string from a binary file?

I've tried using writeUTF / readUTF (DataOutputStream/DataInputStream) but it was too much of a hassle.

Thanks.

  • If you are using Java 7 take a look at the new Files class. Commented Jul 20, 2012 at 18:02
  • Makes me envious, but Java 7 gives me incompatibilities with many older programs, I'd rather do it another way. Commented Jul 20, 2012 at 18:04
  • Show us what you've tried so far and where you are getting errors/problems. Commented Jul 20, 2012 at 18:05
  • java.io.UTFDataFormatException: malformed input around byte 17 when using readUTF Commented Jul 20, 2012 at 18:15

2 Answers

5

Forget about FileWriter, DataOutputStream for a moment.

  • For binary data one uses the OutputStream and InputStream classes. They handle byte[].
  • For text data one uses the Reader and Writer classes. They handle String, which can store all kinds of text because it uses Unicode internally.

The crossover from text to binary data is done by specifying an encoding; if none is given, the platform default encoding is used.

  • new OutputStreamWriter(outputStream, encoding)
  • string.getBytes(encoding)
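
As a rough illustration of the two routes listed above, the sketch below encodes the same text once through an OutputStreamWriter and once through getBytes, and checks that both give the same bytes. The class and method names (EncodingCrossover, encodeWithWriter) are made up for this example, and UTF-8 is only an assumed choice of encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class; the name is not from the original answer.
public class EncodingCrossover {

    // Route 1: wrap a byte stream in a Writer that encodes with the given charset.
    public static byte[] encodeWithWriter(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, charset);
        try {
            writer.write(text);
        } finally {
            writer.close(); // closing flushes the encoder into 'out'
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Charset utf8 = Charset.forName("UTF-8"); // assumed encoding for this sketch
        String text = "h\u00e9llo";              // "héllo"

        byte[] viaWriter = encodeWithWriter(text, utf8);
        byte[] viaGetBytes = text.getBytes(utf8); // Route 2: encode the String directly

        System.out.println(Arrays.equals(viaWriter, viaGetBytes)); // true
        System.out.println(viaGetBytes.length);                    // 6, since 'é' needs two bytes in UTF-8
        System.out.println(new String(viaGetBytes, utf8).equals(text)); // true
    }
}
```

Passing the encoding explicitly on both the writing and the reading side keeps the file readable on machines with a different default encoding.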

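So if you want to avoid byte[] and use String, you must abuse a single-byte encoding that maps every one of the 256 byte values to some character. "UTF-8" will not do, because many byte sequences are invalid in it. "ISO-8859-1" is the safe choice; "windows-1252" (also named "Cp1252") is close, but it leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so those bytes would not survive the conversion.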
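
To make the requirement concrete, here is a minimal sketch that pushes all 256 byte values through a String and back. It assumes ISO-8859-1 as the byte-preserving encoding, since that charset maps each byte value to exactly one char; the class and helper names (BytesAsString, roundTrips, allByteValues) are invented for the example.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
public class BytesAsString {

    public static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Decode bytes to a String and encode them back; true if nothing was lost.
    public static boolean roundTrips(byte[] data, Charset charset) {
        String asText = new String(data, charset);
        return Arrays.equals(data, asText.getBytes(charset));
    }

    // One byte for every possible value 0x00..0xFF.
    public static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1: " + roundTrips(all, LATIN1));                // true
        System.out.println("UTF-8:      " + roundTrips(all, Charset.forName("UTF-8"))); // false
    }
}
```

With UTF-8 the bytes above 0x7F do not form valid sequences on their own, so they decode to the replacement character and cannot be recovered.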

But internally there is a conversion, and in very rare cases problems might happen. For instance, é can be represented in Unicode either as a single code point (U+00E9) or as two: e followed by the combining acute accent (U+0301). There is a conversion function for that, java.text.Normalizer.
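
The two spellings of é can be compared with a short sketch like the following. It uses java.text.Normalizer with the NFC form; the class and method names (NormalizeDemo, sameAfterNfc) are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical demo class for comparing composed and decomposed text.
public class NormalizeDemo {

    // Compare two strings after bringing both into the composed form (NFC).
    public static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u00e9";    // é as one code point
        String decomposed = "e\u0301"; // e followed by the combining acute accent

        System.out.println(composed.equals(decomposed));        // false: different chars
        System.out.println(sameAfterNfc(composed, decomposed)); // true: same text once normalized
    }
}
```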

One case where this has already caused problems is file names across operating systems: MacOS uses a different Unicode normalisation than Windows, so version control systems need to treat such names with special care.

So in principle it is better to use the more cumbersome byte arrays, ByteArrayInputStream, or java.nio buffers. Mind also that a Java char is 16 bits wide, not 8.
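
One way to follow the byte-array advice for strings in a binary file is to store each string as a length prefix followed by its encoded bytes. The sketch below is only one possible layout, not something prescribed by the answer: the 4-byte length prefix, the UTF-8 encoding, and the names LengthPrefixedStrings, writeString, readString and roundTrip are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

// Hypothetical helper class; the record layout is an assumption of this sketch.
public class LengthPrefixedStrings {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Write the byte count as a 4-byte int, then the encoded bytes.
    public static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(UTF8);
        out.writeInt(bytes.length); // length in bytes, not in chars
        out.write(bytes);
    }

    // Read the byte count, then exactly that many bytes, and decode them.
    public static String readString(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative string length: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes); // throws EOFException if the data is truncated
        return new String(bytes, UTF8);
    }

    // In-memory round trip; a FileOutputStream/FileInputStream would work the same way.
    public static String roundTrip(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeString(out, s);
        out.flush();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        return readString(in);
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo";
        System.out.println(roundTrip(text).equals(text)); // true
    }
}
```

Because the encoding is fixed in the code, the file does not depend on the platform default, and other binary fields can be written before or after each string with the usual DataOutputStream methods.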

2

If you want to write text you can use Writers and Readers.

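You can use Data*Stream writeUTF/readUTF, but the encoded form of each string must be at most 65535 bytes, because the length prefix is only two bytes. Strings containing non-ASCII characters therefore reach the limit at fewer than 64K characters.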


import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class Words {
    public static void main(String... args) throws IOException {
        // Generate a million words from the hex form of the nanosecond clock.
        List<String> words = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++)
            words.add(Long.toHexString(System.nanoTime()));

        writeStrings("words", words);
        List<String> words2 = readWords("words");
        System.out.println("Words are the same is " + words.equals(words2));
    }

    public static List<String> readWords(String filename) throws IOException {
        DataInputStream dis = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
        try {
            // The file starts with the number of strings that follow.
            int count = dis.readInt();
            List<String> words = new ArrayList<String>(count);
            while (words.size() < count)
                words.add(dis.readUTF());
            return words;
        } finally {
            dis.close(); // release the file handle even if reading fails
        }
    }

    public static void writeStrings(String filename, List<String> words) throws IOException {
        DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(filename)));
        try {
            // Write the count first so the reader knows how many strings to expect.
            dos.writeInt(words.size());
            for (String word : words)
                dos.writeUTF(word);
        } finally {
            dos.close(); // flushes the buffer and closes the file
        }
    }
}

prints

Words are the same is true
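
The size limit of writeUTF mentioned above can be probed with a small sketch such as this one. It relies on DataOutputStream.writeUTF throwing UTFDataFormatException when the encoded string exceeds 65535 bytes; the class and helper names (WriteUtfLimit, fitsInWriteUtf, repeat) are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical demo class for checking whether a string is small enough for writeUTF.
public class WriteUtfLimit {

    // True if writeUTF accepts the string, false if it is rejected as too long.
    public static boolean fitsInWriteUtf(String s) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        }
    }

    // Build a string of n copies of c.
    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fitsInWriteUtf(repeat('a', 65535)));      // true: 65535 bytes
        System.out.println(fitsInWriteUtf(repeat('a', 65536)));      // false: one byte too many
        System.out.println(fitsInWriteUtf(repeat('\u00e9', 40000))); // false: 40000 chars but 80000 bytes
    }
}
```

For longer strings, a format with a wider length prefix followed by the raw encoded bytes avoids the limit.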

2 Comments

I am already using writeUTF/readUTF - it is too much of a hassle. Did I mention that I wanted to read/write from a binary file not plain text? Sorry... edited main post
I can't imagine anything simpler than using writeUTF/readUTF. Without seeing your code I can't imagine what is causing you hassle.
