Best way to read/parse an untyped binary file in Delphi

Question

I would like to know the best way to parse an untyped binary file, for example an EBML file (http://ebml.sourceforge.net/). EBML is basically a binary XML file. It can store almost anything, but its predominant use right now is MKV video files (Matroska).

I want to read an EBML file at the byte level: read the header, make sure it really is an EBML file, and retrieve information about the file. MKV files can be huge, 1-30 GB in size.

The binary file could be anything, jpeg, bmp, avi etc ... I just want to learn how to read them.

Well, I wanted to set a foundation before I got more in depth with the EBML format, since EBML is different from most other file types in that it is basically XML. I have looked at other components for reference, like the GIF and PNG support in Delphi.
– Logman, Commented Nov 30, 2010 at 2:24
The examples here show how to read a block, but not individual bytes... I am fairly new to this, though, and the EBML format uses variable-size integers, so it might be well over my head at this time =)
– Logman, Commented Nov 30, 2010 at 2:37
To work with single bytes, @Logman, simply read blocks of size 1.
– Rob Kennedy, Commented Nov 30, 2010 at 7:37
Reading 1 byte at a time without buffering could be pretty slow. It is true that the OS will buffer something, but depending on the amount of data to read, that may not be enough. It is usually better to load n bytes into memory and then work from memory.
– user160694, Commented Nov 30, 2010 at 8:08

Andreas Rejbrand · Accepted Answer · 2010-11-30 01:56:29Z

3

Basically, you do

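uses
  SysUtils;

const
  MAGIC_WORD = $535B;

type
  TMyFileTypeHeader = packed record
    MagicWord: word; // = MAGIC_WORD
    Size: cardinal;
    Version: cardinal;
    Width: cardinal;
    Height: cardinal;
    ColorDepth: cardinal;
    Title: array[0..31] of char; // note: Char is 2 bytes in Delphi 2009+; use AnsiChar for 1-byte text
  end;

procedure ReadFile(const FileName: string);
var
  f: file;
  amt: integer;
  FileHeader: TMyFileTypeHeader;
begin
  FileMode := fmOpenRead;
  AssignFile(f, FileName);
  // Open with a record size of 1 byte. With the default {$I+} this raises
  // EInOutError on failure, so it stays outside the try block: CloseFile
  // must only run on a file that was actually opened.
  Reset(f, 1);
  try
    BlockRead(f, FileHeader, sizeof(TMyFileTypeHeader), amt);

    // A file shorter than the header cannot be valid either
    if (amt <> sizeof(TMyFileTypeHeader)) or (FileHeader.MagicWord <> MAGIC_WORD) then
      raise Exception.Create(Format('File "%s" is not a valid XXX file.', [FileName]));

    // Read, parse, and do something

  finally
    CloseFile(f);
  end;
end;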

For instance, a bitmap file begins with a BITMAPFILEHEADER structure, followed (in version 3) by a BITMAPINFOHEADER, then an optional array of palette items, and finally the pixel data. In the simplest case this is uncompressed 24-bit data stored in blue-green-red order: BBGGRRBBGGRRBBGGRR...
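
As a rough sketch of that layout, the two headers can be declared as packed records and read straight from a stream. The type and routine names below (TBmpFileHeader, TBmpInfoHeader, ReadBmpHeaders, BmpRowStride) are made up for this illustration; the Windows unit ships its own equivalents, and the records are redeclared here only so the snippet does not depend on it. It assumes a version 3 file and ignores palettes and compression.

```pascal
uses
  Classes, SysUtils;

type
  // Hypothetical local copy of BITMAPFILEHEADER (14 bytes)
  TBmpFileHeader = packed record
    bfType: Word;          // the characters 'BM', i.e. $4D42 as a little-endian Word
    bfSize: LongWord;      // size of the whole file in bytes
    bfReserved1: Word;
    bfReserved2: Word;
    bfOffBits: LongWord;   // offset of the pixel data from the start of the file
  end;

  // Hypothetical local copy of BITMAPINFOHEADER (40 bytes)
  TBmpInfoHeader = packed record
    biSize: LongWord;      // 40 for this header version
    biWidth: LongInt;
    biHeight: LongInt;     // negative means the rows are stored top-down
    biPlanes: Word;
    biBitCount: Word;      // e.g. 24 for BBGGRR pixels
    biCompression: LongWord;
    biSizeImage: LongWord;
    biXPelsPerMeter: LongInt;
    biYPelsPerMeter: LongInt;
    biClrUsed: LongWord;
    biClrImportant: LongWord;
  end;

// Reads both headers from the current stream position and checks the signature.
procedure ReadBmpHeaders(Stream: TStream; out FileHdr: TBmpFileHeader;
  out InfoHdr: TBmpInfoHeader);
begin
  Stream.ReadBuffer(FileHdr, SizeOf(FileHdr));
  if FileHdr.bfType <> $4D42 then
    raise Exception.Create('Not a BMP file: missing "BM" signature.');
  Stream.ReadBuffer(InfoHdr, SizeOf(InfoHdr));
  if InfoHdr.biSize < SizeOf(InfoHdr) then
    raise Exception.Create('Unsupported BMP header version.');
end;

// Each pixel row is padded to a multiple of 4 bytes.
function BmpRowStride(Width, BitCount: Integer): Integer;
begin
  Result := ((Width * BitCount + 31) div 32) * 4;
end;
```

After ReadBmpHeaders succeeds, seeking to FileHdr.bfOffBits and reading BmpRowStride(InfoHdr.biWidth, InfoHdr.biBitCount) bytes per row gives the raw pixel rows in the uncompressed case.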

Reading a JPG, on the other hand, is very complicated, because the JPG data is compressed in a way that requires a lot of advanced mathematics to even understand (I think -- I have actually never really dug into the JPG specs). At least, this is true for a lot of modern image file formats. BMP, on the other hand, is trivial -- the "worst" thing that can happen is that the image is RLE compressed.

The "details" of parsing a file depend entirely on the file format. The file format specification tells the developer how the data is stored in binary form (above, the two bitmap structures are part of the Windows bitmap specification). It is like a contract, signed (not literally) by all encoders/decoders of such files. In the case of EBML, the specification appears to be available here.
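
For EBML specifically, here is a minimal sketch of the first two things a parser has to do: check for the EBML header ID (the bytes 1A 45 DF A3) and decode a variable-size integer used as an element data size. The function names (HasEbmlMagic, ReadVIntSize, VIntFromBytes) are invented for this example. It reads single bytes through a TStream for clarity, which is fine on a memory or buffered stream but may be slow on a plain TFileStream, and it does not treat the reserved "all data bits set" value (unknown size) specially.

```pascal
uses
  Classes, SysUtils;

const
  EBML_MAGIC: array[0..3] of Byte = ($1A, $45, $DF, $A3);

// True if the next four bytes of the stream are the EBML header ID.
function HasEbmlMagic(Stream: TStream): Boolean;
var
  Buf: array[0..3] of Byte;
begin
  Result := (Stream.Read(Buf, SizeOf(Buf)) = SizeOf(Buf)) and
    CompareMem(@Buf, @EBML_MAGIC, SizeOf(Buf));
end;

// Decodes one variable-size integer used as a data size.
// The number of leading zero bits in the first byte gives the number of
// extra bytes; the first 1 bit is a length marker and is not part of the value.
function ReadVIntSize(Stream: TStream): UInt64;
var
  First, B, Mask: Byte;
  Len, I: Integer;
begin
  Stream.ReadBuffer(First, 1);
  if First = 0 then
    raise Exception.Create('Variable-size integers longer than 8 bytes are not supported.');
  Len := 1;
  Mask := $80;
  while (First and Mask) = 0 do
  begin
    Inc(Len);
    Mask := Mask shr 1;
  end;
  Result := First and (Mask - 1); // strip the length marker bit
  for I := 2 to Len do
  begin
    Stream.ReadBuffer(B, 1);
    Result := (Result shl 8) or B;
  end;
end;

// Convenience wrapper for experimenting with raw bytes.
function VIntFromBytes(const Bytes: array of Byte): UInt64;
var
  S: TMemoryStream;
begin
  S := TMemoryStream.Create;
  try
    S.WriteBuffer(Bytes[0], Length(Bytes));
    S.Position := 0;
    Result := ReadVIntSize(S);
  finally
    S.Free;
  end;
end;
```

For example, the single byte $81 decodes to 1, and the two bytes $40 $02 decode to 2. Element IDs use the same length rule but keep the marker bit, which is why the header ID above is compared as raw bytes.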

edited Nov 30, 2010 at 1:56

answered Nov 30, 2010 at 1:28

Andreas Rejbrand


2 Comments

user160694 Over a year ago

BlockRead is the old TP way to read a file that way. IMHO it's an obsolete, deprecated technique. Use a stream, it is a more generic interface that can take advantage of different access methods (buffered stream, memory mapping, etc, as long as you have or have written a class that implements them) with a coherent interface.

Andreas Rejbrand Over a year ago

@Idsandon: Well, but it works perfectly (and I have written a lot of encoders/decoders for a large variety of binary file types). Why abandon a working system?

Sean B. Durkin · Accepted Answer · 2010-11-30 02:16:30Z

3

Just use a TFileStream, like so ...

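uses
  Classes, SysUtils;

procedure ReadStuff(const FileName: string);
var
  MyFile: TStream;
  MyVariable: Cardinal; // placeholder: any fixed-size variable or record
begin
  // The file name comes first, then the open mode
  MyFile := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read stuff
    MyFile.ReadBuffer(MyVariable, SizeOf(MyVariable));
    // etc.
  finally
    MyFile.Free;
  end;
end;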

answered Nov 30, 2010 at 2:16

Sean B. Durkin

2 Comments

user160694 Over a year ago

I would suggest calling Read() instead of ReadBuffer(), so you can handle the number of bytes read directly instead of having to handle an exception.

dummzeuch Over a year ago

@ldsandon: But ReadBuffer has the advantage of raising an exception if not enough bytes could be read from the file, so you don't have to check it yourself. As you can see: One person's advantage can be the other's disadvantage.
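
To make the trade-off concrete, here is a small sketch that asks for 4 bytes from a stream holding only 2. TryReadExact and DemoShortRead are names made up for this illustration; the stream is an in-memory one so the snippet does not need a real file.

```pascal
uses
  Classes, SysUtils;

// Read() reports how many bytes it delivered; the caller checks the count.
function TryReadExact(Stream: TStream; var Buffer; Count: LongInt): Boolean;
begin
  Result := Stream.Read(Buffer, Count) = Count;
end;

// Requests 4 bytes from a 2-byte stream, once with each approach.
function DemoShortRead: string;
var
  S: TMemoryStream;
  TwoBytes: Word;
  Value: LongWord;
begin
  Result := '';
  S := TMemoryStream.Create;
  try
    TwoBytes := $1234;
    S.WriteBuffer(TwoBytes, SizeOf(TwoBytes));

    // Read: no exception, just a short count
    S.Position := 0;
    if not TryReadExact(S, Value, SizeOf(Value)) then
      Result := 'Read: short';

    // ReadBuffer: raises EReadError when it cannot deliver all bytes
    S.Position := 0;
    try
      S.ReadBuffer(Value, SizeOf(Value));
    except
      on E: EReadError do
        Result := Result + ', ReadBuffer: EReadError';
    end;
  finally
    S.Free;
  end;
end;
```

Which one fits better probably depends on whether a short read is an expected condition (end of file while scanning) or a sign of a corrupt file.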

user160694 · Accepted Answer · 2010-11-30 08:15:38Z

0

You could memory map the file. Then you can access it as if you were accessing memory. See http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx

answered Nov 30, 2010 at 8:15

user160694

1 Comment

Logman Over a year ago

I would need Delphi code or a component to help me understand this technique.
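
A minimal Windows-only sketch of the idea, using the CreateFile, CreateFileMapping and MapViewOfFile API calls from the Windows unit. The function name FilePrefixViaMapping is invented for this example, and it only copies the first Count bytes out of the view (Count must be at least 1 and no larger than the file). For files of several GB in a 32-bit process the whole file cannot be mapped at once, so a real parser would map one window at a time, with offsets aligned to the system allocation granularity.

```pascal
uses
  Windows, SysUtils;

// Maps the start of a file read-only and returns its first Count bytes.
function FilePrefixViaMapping(const FileName: string; Count: Cardinal): TBytes;
var
  hFile, hMap: THandle;
  View: Pointer;
begin
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    // Maximum size 0/0 means "as large as the file"
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError;
    try
      // Map a view of Count bytes starting at file offset 0
      View := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, Count);
      if View = nil then
        RaiseLastOSError;
      try
        // From here on the file content is ordinary memory behind View
        SetLength(Result, Count);
        if Count > 0 then
          Move(View^, Result[0], Count);
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;
```

Instead of copying, a parser could cast View to PByte (or to a pointer to a packed header record) and walk the data in place, which is the main attraction of the technique.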
