Best way to read and compare byte data from binary files in C++?

Question

I can't wrap my head around how to use char arrays (the first argument of std::ifstream::read()) to compare different types of data.

For example, to read the magic number of a Windows PE file I am doing the following, but I feel there must be a better way, since as far as I know this requires me to define every expected value in the file as a std::array:

std::array<char, 2> magic;
in.read(magic.data(), magic.size());
std::array<char, 2> shouldBe = { 0x4d, 0x5a }; // MZ for dos header

if(magic == shouldBe) {
    // magic correct
}

This gives me compiler warnings like invalid conversion from int to char. I also don't quite understand how I'd read in the magic number of other file formats whose hex values don't correspond to ASCII characters at all. For example, every Java class file starts with the magic number 0xCAFEBABE, yet when I read it in as 4 chars and then cast each one to an int, I get padding on the left, which I don't want.

char* magic = new char[4];
in.read(magic, 4);
// how can I compare this array to 0xCAFEBABE?

Output when I loop through the bytes, cast each one to int and print it using std::hex:

ffffffca fffffffe ffffffba ffffffbe

What's the best way to parse lots of different types of values used in binary file formats like PE files and Java classes?

First, don't use char, because it could be signed or unsigned. Use uint8_t, which is an unsigned 8-bit quantity. You can cast it to char * inside the read() call: in.read((char *) magic, 4);
– Thomas Matthews, Commented Dec 15, 2015 at 19:09
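A minimal sketch of Thomas Matthews' suggestion, assuming the platform provides std::uint8_t (all mainstream ones do). The helper name hasJavaClassMagic is hypothetical. Because the buffer holds unsigned bytes, the constants 0xCA, 0xFE, 0xBA and 0xBE fit without any narrowing, and the reinterpret_cast does the same job as the (char *) cast in the comment.

```cpp
#include <array>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: true if the file at `path` starts with the Java
// class-file magic 0xCAFEBABE.
bool hasJavaClassMagic(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary mode: no newline translation
    std::array<std::uint8_t, 4> magic{};

    // istream::read() takes char*, so view the unsigned buffer as char.
    in.read(reinterpret_cast<char*>(magic.data()),
            static_cast<std::streamsize>(magic.size()));
    if (in.gcount() != 4) {
        return false;  // could not open the file, or it is shorter than 4 bytes
    }

    // 202, 254, 186 and 190 all fit in uint8_t, so this is not narrowing.
    const std::array<std::uint8_t, 4> expected{0xCA, 0xFE, 0xBA, 0xBE};
    return magic == expected;
}
```

Opening the stream with std::ios::binary matters on Windows, where text mode would translate line endings and corrupt the bytes.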
@Barry I get a warning on this: std::array<char, 4> magic; inClass.read(magic.data(), 4); std::array<char, 4> classMagic = { 0xCA, 0xFE, 0xBA, 0xBE }; which gives: warning: narrowing conversion of '202' from 'int' to 'char' inside { } [-Wnarrowing]
– user3530525, Commented Dec 15, 2015 at 19:24
@user3530525: you don't get padding on the left. In this case char seems to be signed, so each value larger than 0x7F (127) wraps around to a negative value. When a negative char is converted to int it stays negative (it is sign-extended), so you see it as, for example, 0xffffffca, which has the same value as char 0xca: -54.
– stefaanv, Commented Dec 15, 2015 at 19:33
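To illustrate stefaanv's point, the sketch below converts each char through unsigned char before widening it, which avoids the sign extension that produced 0xffffffca. It then assembles four bytes into one integer, so the Java magic can be compared directly against 0xCAFEBABE. The function names are hypothetical; the byte orders come from the formats themselves: the Java class file format stores multi-byte values big-endian, while PE/COFF headers are little-endian.

```cpp
#include <cstdint>

// Converting a char straight to int sign-extends when char is signed
// (0xCA becomes 0xffffffca). Going through unsigned char first keeps the
// value in the range 0..255.
inline unsigned byteValue(char c) {
    return static_cast<unsigned char>(c);
}

// Big-endian: most significant byte first (Java class files).
std::uint32_t readBigEndian32(const char* bytes) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value = (value << 8) | byteValue(bytes[i]);
    }
    return value;
}

// Little-endian: least significant byte first (PE/COFF headers).
std::uint32_t readLittleEndian32(const char* bytes) {
    std::uint32_t value = 0;
    for (int i = 3; i >= 0; --i) {
        value = (value << 8) | byteValue(bytes[i]);
    }
    return value;
}
```

With magic filled by in.read(magic, 4) as in the question, readBigEndian32(magic) == 0xCAFEBABE is the comparison the question asks for.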

Barry · Accepted Answer · 2015-12-15 19:30:04Z

3

The approach is perfectly fine. The only issue is this line:

std::array<char, 2> shouldBe = { 0x4d, 0x5a }; // MZ for dos header

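Narrowing conversions are disallowed with list initialization (where char is signed, int constants above 0x7F, such as 0xCA, don't fit and are therefore narrowing), so you just have to do some explicit casting: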

std::array<char, 2> shouldBe = { (char)0x4d, (char)0x5a };
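A compilable sketch of the same fix applied to the Java magic from the comments, where the casts are actually required. It uses static_cast, which here does the same thing as the (char) casts above; isClassMagic and isDosMagic are hypothetical helper names, and the arrays they take would be filled with in.read(magic.data(), magic.size()).

```cpp
#include <array>

// 0xCA is the int 202, which does not fit in a signed char, so without the
// casts this list-initialization would be a narrowing conversion.
const std::array<char, 4> classMagic = {
    static_cast<char>(0xCA), static_cast<char>(0xFE),
    static_cast<char>(0xBA), static_cast<char>(0xBE)};

// 0x4d and 0x5a fit in char, so these casts are optional; kept for consistency.
const std::array<char, 2> dosMagic = {
    static_cast<char>(0x4d), static_cast<char>(0x5a)};  // "MZ"

bool isClassMagic(const std::array<char, 4>& magic) { return magic == classMagic; }
bool isDosMagic(const std::array<char, 2>& magic) { return magic == dosMagic; }
```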


4 Comments

user3530525 Over a year ago

Thanks. Is there any way to avoid having to set constant values for comparing? I kinda want to avoid lots of declarations for values that I need to compare. Also, could you have a look at my update?

Barry Over a year ago

@user3530525 Well, you want to compare against constant values. How else would you go about doing it? And no adding questions onto questions - especially totally unrelated ones.

user3530525 Over a year ago

Is there not a way to sort of 'inline' declare std::arrays, so I can put one in an if statement's condition? That is, declare it anonymously right there rather than having it somewhere else as a constant, since I'll only ever compare the magic once.

Jerry Coffin Over a year ago

You don't have to do explicit casting. Better to use character constants: std::array<char, 2> shouldBe { '\x4d', '\x5a' };
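Combining Jerry Coffin's character constants with the request for an 'inline' comparison: a temporary std::array can be built directly in the comparison, so no named constant is needed. A minimal sketch; the helper name looksLikeDosExecutable is hypothetical, and the same magic == std::array<char, 2>{...} expression works as the condition of an if.

```cpp
#include <array>
#include <fstream>
#include <string>

// Hypothetical helper: true if the file at `path` starts with "MZ".
bool looksLikeDosExecutable(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::array<char, 2> magic{};
    if (!in.read(magic.data(), magic.size())) {
        return false;  // could not open the file, or fewer than two bytes
    }
    // Character constants already have type char, so the temporary array
    // needs no casts and triggers no narrowing warning.
    return magic == std::array<char, 2>{'\x4d', '\x5a'};  // "MZ"
}
```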

Jerry Coffin · Accepted Answer · 2015-12-15 20:23:04Z

You basically have two choices: you can either hard-code the values into the program, or you can store them externally. If you're storing them internally, it's probably easiest to start by structuring the data a bit:

#include <string>
#include <vector>

struct magic { 
    std::string value;
    int result;
};

std::vector<magic> values { 
    { "\x7f" "ELF", 1 },        // 0x7F 'E' 'L' 'F'
    { "MZ", 2 },
    { "\xca\xfe\xba\xbe", 3 },  // 0xcafebabe
    { "etc", -1 }};

Then you can, for example, step through values in a loop and compare each entry against the start of the file; when you get a match, its result tells you (for example) how to process that kind of file.
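A minimal sketch of that loop, assuming the first bytes of the file are already in a std::string called header. identify is a hypothetical name, and returning 0 for "no match" is an arbitrary choice. The ELF entry is written as "\x7f" "ELF" because the real ELF magic starts with the byte 0x7F, and splitting the literal stops the hex escape from swallowing the following E and F.

```cpp
#include <string>
#include <vector>

struct magic {
    std::string value;  // expected leading bytes
    int result;         // code telling the caller how to handle the file
};

const std::vector<magic> values{
    {"\x7f" "ELF", 1},
    {"MZ", 2},
    {"\xca\xfe\xba\xbe", 3},  // 0xCAFEBABE, Java class file
};

// Returns the result of the first entry whose bytes match the start of
// `header`, or 0 if nothing matches.
int identify(const std::string& header) {
    for (const magic& m : values) {
        // compare() on a prefix; a header shorter than m.value never matches.
        if (header.compare(0, m.value.size(), m.value) == 0) {
            return m.result;
        }
    }
    return 0;
}
```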

If you store the values as strings as I've done here, it's probably easiest to do the comparisons as strings as well. One obvious way would be to read in a block (e.g., 2 kilobytes) from the beginning of the file, then create a string from as many of those bytes as the expected value is long, and compare the two.
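One way to do the block read, as a sketch: readHeader and startsWith are hypothetical names, and the 2 KB default mirrors the example size above. istream::gcount() reports how many bytes were actually read, so a short file yields a shorter string rather than trailing zero bytes.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads up to `maxBytes` from the start of the file at
// `path`. Returns an empty string if the file cannot be opened.
std::string readHeader(const std::string& path, std::size_t maxBytes = 2048) {
    std::ifstream in(path, std::ios::binary);
    std::string header(maxBytes, '\0');
    in.read(&header[0], static_cast<std::streamsize>(maxBytes));
    // Keep only the bytes that were actually read.
    header.resize(static_cast<std::size_t>(in.gcount()));
    return header;
}

// True if `header` begins with the bytes in `magic`.
bool startsWith(const std::string& header, const std::string& magic) {
    return header.compare(0, magic.size(), magic) == 0;
}
```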
