
I would like to know the best way to parse an untyped binary file, for example an EBML file (http://ebml.sourceforge.net/). EBML is essentially a binary XML format. It can store almost anything, but its predominant use right now is MKV video files (Matroska).

I want to read an EBML file at the byte level: read the header, make sure it really is an EBML file, and retrieve information about the file. MKV files can be huge, 1 to 30 GB in size.
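
A minimal sketch of that first check, assuming the file begins directly with the EBML header element, whose 4-byte ID is 1A 45 DF A3 (this is how MKV/WebM files start). Only those 4 bytes are read through a TStream, so checking a 30 GB file costs no more than checking a tiny one. StartsWithEbmlId and IsEbmlFile are hypothetical names.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// True if the stream starts with the EBML header element ID 1A 45 DF A3.
// Only 4 bytes are read, regardless of the size of the file.
function StartsWithEbmlId(Stream: TStream): Boolean;
const
  EBML_ID: array[0..3] of Byte = ($1A, $45, $DF, $A3);
var
  Buf: array[0..3] of Byte;
  I: Integer;
begin
  Result := False;
  Stream.Position := 0;
  if Stream.Read(Buf, SizeOf(Buf)) <> SizeOf(Buf) then
    Exit; // shorter than 4 bytes: cannot be EBML
  for I := 0 to High(Buf) do
    if Buf[I] <> EBML_ID[I] then
      Exit;
  Result := True;
end;

// Opens the file read-only and checks its first 4 bytes.
function IsEbmlFile(const FileName: string): Boolean;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Result := StartsWithEbmlId(FS);
  finally
    FS.Free;
  end;
end;
```

Taking a TStream rather than a file name keeps the check usable on memory streams or any other stream class.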

The binary file could be anything: JPEG, BMP, AVI, etc. I just want to learn how to read them.
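
Most binary formats can be recognized from their first few bytes (a "magic number"). A sketch, assuming the start of the file has already been read into a TBytes buffer; the signatures are the commonly documented ones for each format, and TFileKind, HasBytesAt and SniffFileKind are hypothetical names.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

type
  TFileKind = (fkUnknown, fkJpeg, fkBmp, fkPng, fkGif, fkAvi, fkEbml);

// True if Buf contains the bytes of Sig starting at Offset.
function HasBytesAt(const Buf: TBytes; Offset: Integer;
  const Sig: array of Byte): Boolean;
var
  I: Integer;
begin
  Result := Length(Buf) >= Offset + Length(Sig);
  if not Result then
    Exit;
  for I := 0 to High(Sig) do
    if Buf[Offset + I] <> Sig[I] then
    begin
      Result := False;
      Exit;
    end;
end;

// Guesses the format from the first bytes of a file.
function SniffFileKind(const Head: TBytes): TFileKind;
begin
  if HasBytesAt(Head, 0, [$FF, $D8, $FF]) then
    Result := fkJpeg  // SOI marker FF D8 followed by another marker
  else if HasBytesAt(Head, 0, [$89, $50, $4E, $47, $0D, $0A, $1A, $0A]) then
    Result := fkPng   // #137 'PNG' CR LF #26 LF
  else if HasBytesAt(Head, 0, [$47, $49, $46, $38]) then
    Result := fkGif   // 'GIF8' (GIF87a / GIF89a)
  else if HasBytesAt(Head, 0, [$52, $49, $46, $46]) and
          HasBytesAt(Head, 8, [$41, $56, $49, $20]) then
    Result := fkAvi   // 'RIFF', 4-byte size, then form type 'AVI '
  else if HasBytesAt(Head, 0, [$1A, $45, $DF, $A3]) then
    Result := fkEbml  // EBML header ID (MKV, WebM)
  else if HasBytesAt(Head, 0, [$42, $4D]) then
    Result := fkBmp   // 'BM' = BITMAPFILEHEADER.bfType
  else
    Result := fkUnknown;
end;
```

The short 'BM' signature is tested last so that longer, more specific signatures win first.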

  • Very vague question. But my answer might be of assistance. Commented Nov 30, 2010 at 1:33
  • Well, I wanted to set a foundation before getting more in-depth with the EBML format, since EBML differs from most other file types in that it is basically XML. I have looked at other components for reference, like the GIF and PNG support in Delphi. Commented Nov 30, 2010 at 2:24
  • The examples here show how to read a block, but not individual bytes. I am fairly new to this, though, and the EBML format uses variable-size integers, which might be well over my head at this time =) Commented Nov 30, 2010 at 2:37
  • To work with single bytes, @Logman, simply read blocks of size 1. Commented Nov 30, 2010 at 7:37
  • Reading 1 byte at a time without any buffering could be pretty slow. It is true that the OS will buffer something, but depending on the amount of data to read, that may not be enough. It is usually better to load n bytes into memory and then work from memory. Commented Nov 30, 2010 at 8:08
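
The last two comments can be combined: load a block into memory, then decode EBML's variable-size integers (VINTs) from that buffer. In a VINT, the number of leading zero bits in the first byte, plus one, is its length in bytes; element IDs are kept with the length-marker bit, while element data sizes have it cleared. A sketch under those assumptions; ReadVInt is a hypothetical name.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

// Decodes one EBML VINT from Buf at Pos. KeepMarker = True returns the raw
// bytes (element IDs); False clears the length-marker bit (data sizes).
// On success Pos is advanced past the VINT and True is returned.
function ReadVInt(const Buf: TBytes; var Pos: Integer; KeepMarker: Boolean;
  out Value: UInt64): Boolean;
var
  First, Mask: Byte;
  Width, I: Integer;
begin
  Result := False;
  Value := 0;
  if Pos >= Length(Buf) then
    Exit;
  First := Buf[Pos];
  if First = 0 then
    Exit; // would need more than 8 bytes: invalid
  // The first 1 bit from the left is the marker; its position gives the width.
  Width := 1;
  Mask := $80;
  while (First and Mask) = 0 do
  begin
    Inc(Width);
    Mask := Mask shr 1;
  end;
  if Pos + Width > Length(Buf) then
    Exit; // truncated: refill the buffer and try again
  if KeepMarker then
    Value := First
  else
    Value := First and (Mask - 1); // keep only the bits below the marker
  for I := 1 to Width - 1 do
    Value := (Value shl 8) or Buf[Pos + I];
  Inc(Pos, Width);
  Result := True;
end;
```

For example, the bytes 40 02 decode to the size 2 (2 bytes wide), and 1A 45 DF A3 read with KeepMarker = True give the EBML header ID $1A45DFA3.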

3 Answers


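Basically, you do something like this:

uses
  SysUtils; // Exception, fmOpenRead

const
  MAGIC_WORD = $535B;

type
  // Example header layout; multi-byte fields are read in the machine's
  // byte order (little-endian on x86).
  TMyFileTypeHeader = packed record
    MagicWord: word; // = MAGIC_WORD
    Size: cardinal;
    Version: cardinal;
    Width: cardinal;
    Height: cardinal;
    ColorDepth: cardinal;
    Title: array[0..31] of AnsiChar; // AnsiChar: Char is 2 bytes in Delphi 2009+
  end;

procedure ReadFile(const FileName: string);
var
  f: file;
  amt: integer;
  FileHeader: TMyFileTypeHeader;
begin
  FileMode := fmOpenRead;
  AssignFile(f, FileName);
  // Record size 1, so BlockRead counts bytes. Reset is outside the try block,
  // so CloseFile only runs on a file that was actually opened.
  Reset(f, 1);

  try
    BlockRead(f, FileHeader, sizeof(TMyFileTypeHeader), amt);

    // Reject files that are too short as well as files with the wrong magic word
    if (amt <> sizeof(TMyFileTypeHeader)) or (FileHeader.MagicWord <> MAGIC_WORD) then
      raise Exception.CreateFmt('File "%s" is not a valid XXX file.', [FileName]);

    // Read, parse, and do something

  finally
    CloseFile(f);
  end;
end;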

For instance, a bitmap file begins with a BITMAPFILEHEADER structure, followed (in version 3) by a BITMAPINFOHEADER, then by an optional array of palette entries, and finally by the pixel data. In the simplest case this is uncompressed 24-bit RGB data, stored in BGR byte order: BBGGRRBBGGRRBBGGRR...
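
Those two headers can be declared as packed records and read with one call each. A sketch assuming a little-endian machine and the 40-byte version 3 BITMAPINFOHEADER; the record and routine names are hypothetical, chosen so the code does not depend on the Windows unit. It also shows that each pixel row is padded to a multiple of 4 bytes.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

type
  // Same layout as BITMAPFILEHEADER (14 bytes).
  TBmpFileHeader = packed record
    bfType: Word;          // 'BM' reads as $4D42 (little-endian)
    bfSize: LongWord;      // total file size in bytes
    bfReserved1: Word;
    bfReserved2: Word;
    bfOffBits: LongWord;   // offset of the pixel data from the file start
  end;

  // Same layout as BITMAPINFOHEADER (40 bytes).
  TBmpInfoHeader = packed record
    biSize: LongWord;      // 40 for this header version
    biWidth: LongInt;
    biHeight: LongInt;     // positive = bottom-up rows, negative = top-down
    biPlanes: Word;
    biBitCount: Word;      // 24 for BBGGRR pixels
    biCompression: LongWord; // 0 = BI_RGB (uncompressed)
    biSizeImage: LongWord;
    biXPelsPerMeter: LongInt;
    biYPelsPerMeter: LongInt;
    biClrUsed: LongWord;
    biClrImportant: LongWord;
  end;

// Bytes per pixel row, including the padding to a multiple of 4.
function BmpRowStride(Width, BitCount: Integer): Integer;
begin
  Result := ((Width * BitCount + 31) div 32) * 4;
end;

procedure ReadBmpHeaders(Stream: TStream; out FileHdr: TBmpFileHeader;
  out InfoHdr: TBmpInfoHeader);
begin
  Stream.ReadBuffer(FileHdr, SizeOf(FileHdr));
  if FileHdr.bfType <> $4D42 then
    raise Exception.Create('Not a BMP file');
  Stream.ReadBuffer(InfoHdr, SizeOf(InfoHdr));
end;
```

The pixels then start at bfOffBits; a 3-pixel-wide 24-bit row, for example, occupies 12 bytes (9 bytes of BGR data plus 3 bytes of padding).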

Reading a JPG, on the other hand, is very complicated, because the JPG data is compressed in a way that requires a lot of advanced mathematics to even understand (I think -- I have actually never really dug into the JPG specs). At least, this is true for a lot of modern image file formats. BMP, on the other hand, is trivial -- the "worst" thing that can happen is that the image is RLE compressed.
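
The compressed image data is the hard part of a JPG; the container around it is simple. The file starts with the SOI marker FF D8 and continues as a sequence of marker segments: FF, a marker byte, and a big-endian 2-byte length that includes the length field itself, up to the start-of-scan marker (SOS, FF DA). A sketch that walks those segments to read the width and height from a SOF0 to SOF2 frame header without decoding any pixels; JpegDimensions is a hypothetical name, and rarer cases (such as standalone markers before SOS) are not handled.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

// Returns True and the image size if a SOF0..SOF2 segment is found before SOS.
function JpegDimensions(const Buf: TBytes; out Width, Height: Integer): Boolean;
var
  P, SegLen: Integer;
  Marker: Byte;
begin
  Result := False;
  Width := 0;
  Height := 0;
  if (Length(Buf) < 2) or (Buf[0] <> $FF) or (Buf[1] <> $D8) then
    Exit; // no SOI marker: not a JPEG
  P := 2;
  while P + 4 <= Length(Buf) do
  begin
    if Buf[P] <> $FF then
      Exit; // expected a marker here
    Marker := Buf[P + 1];
    if Marker = $FF then
    begin
      Inc(P); // fill byte before a marker
      Continue;
    end;
    if Marker = $DA then
      Exit; // SOS: compressed data follows, no frame header seen
    SegLen := (Buf[P + 2] shl 8) or Buf[P + 3]; // big-endian, includes itself
    if SegLen < 2 then
      Exit;
    if Marker in [$C0, $C1, $C2] then
    begin
      // P+4: sample precision, P+5..P+6: height, P+7..P+8: width
      if P + 9 > Length(Buf) then
        Exit;
      Height := (Buf[P + 5] shl 8) or Buf[P + 6];
      Width := (Buf[P + 7] shl 8) or Buf[P + 8];
      Result := True;
      Exit;
    end;
    Inc(P, 2 + SegLen); // skip marker bytes plus segment
  end;
end;
```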

The "details" of parsing a file depend entirely on the file format. The file format specification tells the developer how the data is stored in binary form (above, the two bitmap structures are part of the Windows bitmap specification). It is like a contract, signed (not literally) by all encoders/decoders of such files. In the case of EBML, the specification appears to be available here.
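
One detail every specification pins down is byte order. EBML and JPEG store multi-byte integers big-endian, while BMP and AVI (RIFF) use little-endian, which is also what a packed record field gets on x86. A sketch of two helpers (hypothetical names) that read an unsigned integer of 1 to 8 bytes in either order with a single ReadBuffer call:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Reads Count bytes (1..8) as an unsigned big-endian integer.
function ReadUIntBE(Stream: TStream; Count: Integer): UInt64;
var
  Bytes: array[0..7] of Byte;
  I: Integer;
begin
  if (Count < 1) or (Count > 8) then
    raise Exception.CreateFmt('Unsupported integer width: %d', [Count]);
  Stream.ReadBuffer(Bytes, Count);
  Result := 0;
  for I := 0 to Count - 1 do        // most significant byte first
    Result := (Result shl 8) or Bytes[I];
end;

// Reads Count bytes (1..8) as an unsigned little-endian integer.
function ReadUIntLE(Stream: TStream; Count: Integer): UInt64;
var
  Bytes: array[0..7] of Byte;
  I: Integer;
begin
  if (Count < 1) or (Count > 8) then
    raise Exception.CreateFmt('Unsupported integer width: %d', [Count]);
  Stream.ReadBuffer(Bytes, Count);
  Result := 0;
  for I := Count - 1 downto 0 do    // least significant byte first in the file
    Result := (Result shl 8) or Bytes[I];
end;
```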


2 Comments

BlockRead is the old Turbo Pascal way to read a file like this. IMHO it's an obsolete, deprecated technique. Use a stream: it is a more generic interface that can take advantage of different access methods (buffered streams, memory mapping, etc., as long as you have or have written a class that implements them) behind a coherent interface.
@ldsandon: Well, it works perfectly (and I have written a lot of encoders/decoders for a large variety of binary file types). Why abandon a working system?
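
The buffered-stream idea mentioned in these comments can be sketched as a small wrapper over any TStream (file, memory, ...) that serves single-byte reads from an in-memory block, which also addresses the earlier concern about reading one byte at a time. TBufferedByteReader is a hypothetical name, not an RTL class.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

type
  // Serves ReadByte calls from memory; the underlying stream is only read
  // once per BufSize bytes.
  TBufferedByteReader = class
  private
    FStream: TStream;
    FBuf: TBytes;
    FCount: Integer; // valid bytes currently in FBuf
    FPos: Integer;   // index of the next byte to return
  public
    constructor Create(AStream: TStream; BufSize: Integer); // BufSize > 0
    function ReadByte(out B: Byte): Boolean; // False at end of stream
  end;

constructor TBufferedByteReader.Create(AStream: TStream; BufSize: Integer);
begin
  inherited Create;
  FStream := AStream;
  SetLength(FBuf, BufSize);
end;

function TBufferedByteReader.ReadByte(out B: Byte): Boolean;
begin
  B := 0;
  if FPos >= FCount then
  begin
    // Buffer exhausted: refill it with one call to the underlying stream
    FCount := FStream.Read(FBuf[0], Length(FBuf));
    FPos := 0;
    if FCount <= 0 then
    begin
      Result := False;
      Exit;
    end;
  end;
  B := FBuf[FPos];
  Inc(FPos);
  Result := True;
end;
```

With a buffer of, say, 64 KB, reading a multi-gigabyte MKV byte by byte still results in relatively few calls to the file stream.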

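Just use a TFileStream, like so:

var
  MyFile: TStream;
begin
  // TFileStream.Create takes the file name first, then the mode flags
  MyFile := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read stuff; MyVariable is whatever variable or record you read into
    MyFile.ReadBuffer(MyVariable, SizeOf(MyVariable));
    // etc.
  finally
    MyFile.Free;
  end;
end;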

2 Comments

I would suggest calling Read() instead of ReadBuffer(), to handle the number of bytes read directly instead of having to handle an exception.
@ldsandon: But ReadBuffer has the advantage of raising an exception if not enough bytes could be read from the file, so you don't have to check it yourself. As you can see: One person's advantage can be the other's disadvantage.
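
Both styles side by side, as a sketch with hypothetical routine names: TStream.Read returns the number of bytes actually read and leaves the check to the caller, while TStream.ReadBuffer raises EReadError when it cannot read the requested count.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Style 1: Read reports how many bytes were copied; the caller checks it.
function TryReadHeader(Stream: TStream; var Header; Size: Integer): Boolean;
begin
  Result := Stream.Read(Header, Size) = Size;
end;

// Style 2: ReadBuffer raises EReadError if fewer than Size bytes are available.
procedure ReadHeaderOrFail(Stream: TStream; var Header; Size: Integer);
begin
  Stream.ReadBuffer(Header, Size);
end;
```

Read suits loops that expect a short read at the end of the data; ReadBuffer suits mandatory structures such as a file header, where a short read means the file is broken.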

You could memory map the file. Then you can access it as if you were accessing memory. See http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx

1 Comment

I would need Delphi code or a component to help me understand this technique.
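
A minimal, Windows-only sketch of memory mapping from Delphi, using the Win32 functions CreateFile, CreateFileMapping and MapViewOfFile as declared in the Windows unit. It maps only the first few bytes of the file read-only and compares them with a signature; MappedFileStartsWith is a hypothetical name. For a 1 to 30 GB file you would map one view (window) at a time rather than the whole file, since a 32-bit process cannot map that much address space at once; view offsets must be multiples of the system allocation granularity.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  Windows, SysUtils;

// Maps the first Length(Sig) bytes of FileName read-only and compares them
// with Sig. Sig must be non-empty (a 0-byte view would map the whole file).
function MappedFileStartsWith(const FileName: string;
  const Sig: array of Byte): Boolean;
var
  hFile, hMap: THandle;
  View: PByteArray;
  SizeLow, SizeHigh: DWORD;
  I: Integer;
begin
  Result := False;
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    SizeLow := GetFileSize(hFile, @SizeHigh);
    if (SizeHigh = 0) and (SizeLow < DWORD(Length(Sig))) then
      Exit; // too short; this also avoids mapping an empty file, which fails
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError;
    try
      // Map only the bytes we need, starting at offset 0
      View := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, Length(Sig));
      if View = nil then
        RaiseLastOSError;
      try
        for I := 0 to High(Sig) do
          if View^[I] <> Sig[I] then
            Exit;
        Result := True;
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;
```

Once a view is mapped, the file's bytes are accessed like an ordinary array (View^[I]); the OS pages them in on demand.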
