
I'm trying to read a binary file in Java (Android) that was created by a C# program, but I've run into a problem. C# by default encodes strings in a binary file as UTF-7, while Java uses UTF-8. This of course means that the strings don't get loaded properly.

I was wondering how to read the strings as UTF-7 instead of UTF-8. I also noticed a similar problem with floats. Do C# and Java handle them differently, and if so, how do I read them correctly in Java?

Edit: I'm using the BinaryWriter class in the C# program and the DataInputStream class in Java.

    Erm, what are you asking? Some code or something to share with us in order to make your question a little more concrete? Commented Sep 9, 2012 at 20:36

1 Answer


BinaryWriter uses UTF-8 encoding unless you pass a different Encoding to its constructor.

EDIT: The documentation here is incorrect.
Looking at the source, BinaryWriter writes the string's length in bytes as a 7-bit encoded integer, followed by the encoded string bytes. The length prefix is written using the following code:

    protected void Write7BitEncodedInt(int value) {
        // Write out an int 7 bits at a time.  The high bit of the byte, 
        // when on, tells reader to continue reading more bytes. 
        uint v = (uint) value;   // support negative numbers
        while (v >= 0x80) { 
            Write((byte) (v | 0x80));
            v >>= 7;
        }
        Write((byte)v); 
    }

You will need to port the matching read logic to Java to find out how many bytes to read; those bytes can then be decoded as UTF-8.
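A minimal Java sketch of that port, assuming the file was written by BinaryWriter with its default settings. `DotNetBinaryReader` and its method names are hypothetical, not a library API. It decodes the length prefix (low 7 bits first, high bit set meaning another byte follows), reads exactly that many bytes, and decodes them as UTF-8. This is also why DataInputStream.readUTF fails on this data: it expects a 2-byte big-endian length followed by modified UTF-8. For the float part of the question, BinaryWriter always writes numbers little-endian, while DataInputStream reads them big-endian, so the bytes have to be reversed.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not a library class) for reading data written by .NET's
 * BinaryWriter with default settings: UTF-8 strings, little-endian numbers.
 */
public class DotNetBinaryReader {
    private final DataInputStream in;

    public DotNetBinaryReader(InputStream stream) {
        this.in = new DataInputStream(stream);
    }

    /** Inverse of Write7BitEncodedInt: 7 bits per byte, lowest bits first. */
    public int read7BitEncodedInt() throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte();      // EOFException if the stream ends early
            result |= (b & 0x7F) << shift;      // add the low 7 bits at this position
            if ((b & 0x80) == 0) {              // high bit clear: this was the last byte
                return result;
            }
        }
        // A 32-bit value never needs more than 5 bytes, so the data is corrupt.
        throw new IOException("Bad 7-bit encoded int");
    }

    /** Reads a length-prefixed string; the prefix is a byte count, not a char count. */
    public String readString() throws IOException {
        int byteCount = read7BitEncodedInt();
        if (byteCount < 0) {
            throw new IOException("Negative string length: " + byteCount);
        }
        byte[] bytes = new byte[byteCount];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** BinaryWriter.Write(int) is little-endian; DataInputStream.readInt is big-endian. */
    public int readInt32() throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    /** BinaryWriter.Write(float): 4 little-endian IEEE 754 bytes. */
    public float readSingle() throws IOException {
        return Float.intBitsToFloat(Integer.reverseBytes(in.readInt()));
    }

    /** BinaryWriter.Write(double): 8 little-endian IEEE 754 bytes. */
    public double readDouble() throws IOException {
        return Double.longBitsToDouble(Long.reverseBytes(in.readLong()));
    }
}
```

For example, the bytes `06 68 C3 A9 6C 6C 6F` decode to "héllo": the prefix is 6 rather than 5 because é takes two bytes in UTF-8. Likewise, the float 1.0 written by BinaryWriter appears in the file as `00 00 80 3F`.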


4 Comments

According to the documentation, it's UTF-7: msdn.microsoft.com/en-us/library/yzxa6408.aspx
@Frozendragon: Wrong. It writes the length as an integer encoded using UTF7, then writes the string using the writer's encoding (UTF8 by default).
Does that not affect Java's ability to read it as a UTF-8 encoded string?
@SLaks see the final paragraph of the Description section of Wikipedia's article on UTF-7. "... however, ... instead of UTF-7, a little-endian variable-length quantity identical to LEB128 is used; and that in fact the count is a byte count and not a character count."
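To make the byte-count point concrete, here is a hypothetical Java mirror of the writer side, assuming BinaryWriter's default UTF-8 encoding. `DotNetStringEncoder` is an illustrative name, not a library class. It can be handy for generating test data for a Java reader. Masking with `0xFFFFFFFFL` stands in for the C# `(uint)` cast, so negative values take 5 bytes just as in the original.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical Java mirror of BinaryWriter's string layout, for building test fixtures. */
public class DotNetStringEncoder {

    /** Port of Write7BitEncodedInt: 7 bits per byte, low bits first, high bit = "more follows". */
    public static void write7BitEncodedInt(ByteArrayOutputStream out, int value) {
        long v = value & 0xFFFFFFFFL;   // treat as unsigned, like the C# (uint) cast
        while (v >= 0x80) {
            out.write((int) (v | 0x80)); // write(int) keeps only the low 8 bits
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Encodes a string the way BinaryWriter.Write(string) does with UTF-8. */
    public static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write7BitEncodedInt(out, utf8.length); // prefix is the BYTE count, not s.length()
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }
}
```

For "héllo" (5 chars) the prefix is 6, and a length of 300 is written as the two bytes `AC 02`.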
