I am following this specification of this file format: https://github.com/rouault/dump_gdbtable/wiki/FGDB-Spec

utf16: string in little-endian UTF-16 encoding

How do I read this? I tried BinaryReader.ReadString(); however, it returns something along the lines of:

"\0e\0y\0w\0o\0r\0d\0\0 \0\0\0\0\rP\0a\0r\0a\0m\0e\0t\0e\0r\0N\0a\0m\0e\0\0 \0\0\0\0\fC\0o\0n\0f\0i\0g\0S\0t\0r\0"

That definitely isn't right.


From the specification:

ubyte: number of UTF-16 characters (not bytes) of the name of the field
utf16: name of the field
ubyte: number of UTF-16 characters (not bytes) of the alias of the field. Might be 0
utf16: alias of the field (omitted if previous field is 0)
ubyte: field type ( 0 = int16, 1 = int32, 2 = float32, 3 = float64, 4 = string, 5 = datetime, 6 = objectid, 7 = geometry, 8 = binary, 9=raster, 10/11 = UUID, 12 = XML )

Could I somehow use the number of UTF-16 characters to read the name of the field?

  • How do you construct the BinaryReader? Are you using an overload where you specify the encoding of the text? Commented Aug 1, 2014 at 14:20
  • Normally you specify the encoding, but this page doesn't list a little-endian UTF-16 encoding; perhaps you have to write your own encoding somehow (or one of the listed ones is what you need, not sure). Commented Aug 1, 2014 at 14:23
  • BinaryReader br = new BinaryReader(File.Open("C:\\florida.gdb\\a00000002.gdbtable", FileMode.Open, FileAccess.Read, FileShare.Read | FileShare.Delete)); Commented Aug 1, 2014 at 14:25
  • @Sinatr - there is such an encoding. It helps to know that in the Windows world, Unicode means UTF-16. Commented Aug 1, 2014 at 14:28
  • Do you have an example file somewhere? Commented Aug 1, 2014 at 15:00

2 Answers

BinaryReader's ReadString() method doesn't provide an overload where you can specify the string length. Instead, it expects its own length prefix (a byte count stored as a 7-bit encoded integer), which doesn't match the format of the spec you linked.

Therefore, you cannot use ReadString() directly, but you can

  1. use ReadByte() to get the string (character) length,
  2. multiply it by 2,
  3. use ReadBytes(count),
  4. use Encoding.Unicode.GetString(bytes).
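
The four steps above can be sketched roughly as follows. This is only an illustration: the class and method names (FgdbStrings, ReadUtf16Prefixed) are made up for this example and are not part of the spec or of .NET, and it assumes the reader is already positioned at the ubyte length prefix of the field name or alias.

```csharp
using System;
using System.IO;
using System.Text;

static class FgdbStrings
{
    // Hypothetical helper: reads one ubyte holding the number of UTF-16
    // characters, then that many little-endian UTF-16 code units.
    // Assumes the stream position is already at the length byte.
    public static string ReadUtf16Prefixed(BinaryReader reader)
    {
        int charCount = reader.ReadByte();   // step 1: length in characters
        int byteCount = charCount * 2;       // step 2: two bytes per UTF-16 code unit
        byte[] raw = reader.ReadBytes(byteCount); // step 3

        // ReadBytes returns fewer bytes than requested at end of stream.
        if (raw.Length != byteCount)
            throw new EndOfStreamException("Stream ended inside a UTF-16 string.");

        return Encoding.Unicode.GetString(raw);   // step 4: decode as UTF-16LE
    }
}
```

A length of 0 (allowed for the alias) simply yields an empty string, because ReadBytes(0) returns an empty array.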

3 Comments

Is multiplying by two necessary? When I do it, it returns something similar to the answer below, except with more Chinese/Japanese characters after it. Code sample: int count = (br.ReadByte() * 2); byte[] array = br.ReadBytes(count); field.nameOfField = Encoding.Unicode.GetString(array);
The spec says number of characters, not bytes. Since Encoding.Unicode uses 16 bits (2 bytes) per char, you want to multiply by 2. You might want to add code to your question showing how you try to read the string.
aha! I think that's it! It returns "Keyword" which I believe is the name of the field.

It should be:

BinaryReader br = new BinaryReader(File.Open("C:\\florida.gdb\\a00000002.gdbtable",
                                   FileMode.Open,
                                   FileAccess.Read,
                                   FileShare.Read | FileShare.Delete),
                      Encoding.Unicode);

Where Encoding is System.Text.Encoding.


For various historical reasons, Microsoft/Windows refer to UTF-16 (and, specifically, the little-endian variant) as "Unicode" rather than UTF-16.
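
As a quick way to check that naming for yourself, the sketch below compares the bytes produced by Encoding.Unicode and Encoding.BigEndianUnicode for a single character. The class and method names (EncodingCheck, LittleEndianHex, BigEndianHex) are invented for this illustration.

```csharp
using System;
using System.Text;

static class EncodingCheck
{
    // Hex dump of the UTF-16 little-endian bytes ("Unicode" in .NET terms).
    public static string LittleEndianHex(string text)
    {
        return BitConverter.ToString(Encoding.Unicode.GetBytes(text));
    }

    // Hex dump of the UTF-16 big-endian bytes, for comparison.
    public static string BigEndianHex(string text)
    {
        return BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes(text));
    }
}
```

For "K" (U+004B), LittleEndianHex gives "4B-00" (low byte first), while BigEndianHex gives "00-4B".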

3 Comments

It returns "攀礀眀漀爀搀\0 \0ЀഀParameterNameЀ \0䌌漀渀昀椀最匀琀爀" when I switch it to your coding. Would I have to strip out the other characters? I'd do that, but I'm afraid of losing them when I go to save it again.
If you get that in return something is almost certainly wrong.
The file format doesn't work like this! You have to read the bytes at the specific offset and then interpret them as Unicode.
