I am trying to read a binary file into memory, and then use it like so:

struct myStruct {
    std::string mystring; // is 40 bytes long
    uint myint1; // is 4 bytes long
};

typedef unsigned char byte;

byte *filedata = ReadFile(filename); // reads file into memory, closes the file
myStruct aStruct;
aStruct.mystring = filedata.????

I need a way to access the binary data at a given offset and read a certain length from that offset. This is easy if I store the file data in a std::string (filedata.substr(offset, len)), but I figured that a std::string is not a good way to store binary data.

Reasonably extensive (IMO) searching hasn't turned up anything relevant. Any ideas? I am willing to change the storage type (e.g. to std::vector) if you think it is necessary.

  • If you have a byte* to the head of the data in memory, why don't you just walk down the length, copying the data as you go? As long as you increment your pointer and know how far to go, it is all good and easy. Commented Jan 29, 2013 at 20:10
  • but how can i get a specific length from that, and not just the byte at the current position of the pointer? Commented Jan 29, 2013 at 20:11
  • @LordAro it sounded like you knew the length of the string is 40 bytes long, followed by a 4 byte integer. Also beware of endianness doing it this way. Commented Jan 29, 2013 at 20:13
  • You need to dereference the byte pointer and store the result in whatever data type you're using: struct1.string = std::string((char*)bytePtr, 40); struct1.int1 = *(int*)(bytePtr + sizeof(char)*40);. Again, beware of endianness; you're much better off using proper serialization. Commented Jan 29, 2013 at 20:25
  • Take a look at Boost.Serialization and also search the web for "c++ serialize". Commented Jan 29, 2013 at 20:29
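
The pointer-walking idea from the comments can be sketched as two small helpers that take an offset (and, for text, a length) into the loaded file data. This is only a sketch under assumptions: the names read_chars and read_u32_native are hypothetical, the data is assumed to live in a std::vector rather than a raw byte*, and the integer is read in the host's byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

typedef unsigned char byte;

// Hypothetical helper: copies `len` bytes starting at `offset` into a std::string.
// Throws if the requested range runs past the end of the data.
std::string read_chars(const std::vector<byte>& data, std::size_t offset, std::size_t len)
{
    if (offset > data.size() || len > data.size() - offset)
        throw std::out_of_range("read_chars: range exceeds buffer");
    if (len == 0)
        return std::string();
    return std::string(reinterpret_cast<const char*>(data.data()) + offset, len);
}

// Hypothetical helper: reads a 4-byte unsigned integer at `offset`.
// Assumes the file uses the same byte order as the machine reading it.
std::uint32_t read_u32_native(const std::vector<byte>& data, std::size_t offset)
{
    std::uint32_t value = 0;
    if (offset > data.size() || sizeof(value) > data.size() - offset)
        throw std::out_of_range("read_u32_native: range exceeds buffer");
    std::memcpy(&value, data.data() + offset, sizeof(value)); // safe for any alignment
    return value;
}
```

With the layout from the question, read_chars(filedata, 0, 40) would give the 40-byte string field (including any padding bytes) and read_u32_native(filedata, 40) the integer that follows it.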

2 Answers

If you're not going to use a serialization library, then I suggest adding serialization support to each class:

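#include <cstring> // std::memcpy
#include <string>

struct My_Struct
{
    std::string my_string;
    unsigned int my_int;
    void Load_From_Buffer(unsigned char const *& p_buffer)
    {
        // Assumes the string is stored nul-terminated in the buffer.
        my_string = std::string(reinterpret_cast<char const *>(p_buffer));
        p_buffer += my_string.length() + 1; // +1 to account for the terminating nul character.
        // memcpy avoids reading an unsigned int through a possibly misaligned pointer.
        std::memcpy(&my_int, p_buffer, sizeof(my_int));
        p_buffer += sizeof(my_int);
    }
};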

unsigned char * const buffer = ReadFile(filename);
unsigned char const * p_buffer = buffer; // second pointer, so the start of the buffer is not lost
My_Struct my_variable;
my_variable.Load_From_Buffer(p_buffer);

Some other useful interface methods:

unsigned int Size_On_Stream(void) const; // Returns the size the object would occupy in the stream.
void Store_To_Buffer(unsigned char *& p_buffer); // Stores object to buffer, increments pointer.
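
One possible way to fill in these two methods is sketched below. It assumes the same stream layout as Load_From_Buffer above (nul-terminated string followed by the integer in the host's byte order) and that the caller has allocated at least Size_On_Stream() bytes; it is an illustration, not the only reasonable implementation.

```cpp
#include <cstring>
#include <string>

struct My_Struct
{
    std::string my_string;
    unsigned int my_int;

    // Bytes the object occupies in the stream:
    // the characters, the terminating nul, then the integer.
    unsigned int Size_On_Stream() const
    {
        return static_cast<unsigned int>(my_string.length() + 1 + sizeof(my_int));
    }

    // Writes the object at p_buffer and advances the pointer past it.
    // Assumes my_string contains no embedded nul characters.
    void Store_To_Buffer(unsigned char *& p_buffer) const
    {
        std::memcpy(p_buffer, my_string.c_str(), my_string.length() + 1);
        p_buffer += my_string.length() + 1;
        std::memcpy(p_buffer, &my_int, sizeof(my_int));
        p_buffer += sizeof(my_int);
    }
};
```

Calling Size_On_Stream() first lets the caller allocate a buffer of exactly the right size before calling Store_To_Buffer.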

With templates you can extend the serialization functionality:

void Load_From_Buffer(std::string& s, unsigned char const *& p_buffer)
{
    s = std::string(reinterpret_cast<char const *>(p_buffer));
    p_buffer += s.length() + 1;
}

template <typename T>
void Load_From_Buffer(T& object, unsigned char const *& p_buffer)
{
    object.Load_From_Buffer(p_buffer);
}

Edit 1: Reason not to write structure directly

In C and C++, the size of a structure may not equal the sum of the sizes of its members.
Compilers are allowed to insert padding (unused space) between members so that each member is aligned on a suitable address boundary.

For example, a 32-bit processor likes to fetch things on 4-byte boundaries. Having one char in a structure followed by an int would put the int at relative address 1, which is not a multiple of 4. The compiler would pad the structure so that the int lines up on relative address 4.
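
The char-followed-by-int case can be checked directly with sizeof and offsetof. The struct Padded and the constant names below are made up for illustration, and the exact numbers are implementation-defined.

```cpp
#include <cstddef>

struct Padded
{
    char tag;   // 1 byte
    int  value; // usually has to start on a multiple of alignof(int)
};

// Sum of the member sizes, which is what a hand-written file format would use.
const std::size_t packed_size = sizeof(char) + sizeof(int);

// Size the compiler actually gives the struct, padding included.
const std::size_t in_memory_size = sizeof(Padded);

// Where `value` starts inside the struct; any gap after `tag` is padding.
const std::size_t value_offset = offsetof(Padded, value);
```

On many platforms with a 4-byte int these come out as 5, 8 and 4 respectively, but the standard only guarantees that in_memory_size is at least packed_size.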

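Structures may contain pointers or items that contain pointers.
For example, a std::string object has a fixed size (say 40 bytes) whether the string holds 3 characters or 300, because the object typically holds a pointer to the character data rather than the data itself. Writing the object's bytes to a file would save that pointer, not the text.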
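
A small illustration of that point, using two made-up strings: sizeof reports the size of the std::string object, which does not change with the length of the text it manages.

```cpp
#include <cstddef>
#include <string>

const std::string short_text(3, 'x');   // 3 characters
const std::string long_text(300, 'x');  // 300 characters

// sizeof measures the std::string object itself, not the characters it manages,
// so both values are the same implementation-defined constant.
const std::size_t short_object_size = sizeof(short_text);
const std::size_t long_object_size  = sizeof(long_text);
```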

Endianness.
Processors differ in how they order the bytes of a multibyte integer: Big Endian stores the Most Significant Byte (MSB) first (the way humans read numbers), while Little Endian stores the Least Significant Byte first. A file written in one byte order must be converted when it is read on a machine that uses the other.
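
One common way to sidestep the host's byte order is to fix the byte order of the file format and assemble integers with shifts. The helper names below are hypothetical; which of the two applies depends on how the file was written.

```cpp
#include <cstdint>

// Hypothetical helper: builds a 32-bit value from 4 bytes stored
// least significant byte first (little endian), on any host.
std::uint32_t read_u32_le(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: same, for bytes stored most significant byte first (big endian).
std::uint32_t read_u32_be(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}
```

Because the result depends only on the order of the bytes in the buffer, the same code gives the same value on little-endian and big-endian machines.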

Edit 2: Variant records

When outputting things like arrays and containers, you must decide whether you want to output the full container (including unused slots) or only the items in the container. Outputting only the items in the container would use a variant record technique.

There are two techniques for outputting variant records: quantity followed by items, or items followed by a sentinel. The latter is how C-style strings are written, with the sentinel being a nul character.

The other technique is to output the quantity of items, followed by the items. So if I had 6 numbers, 0, 1, 2, 3, 4, 5, the output would be:
6 // The number of items
0
1
2
3
4
5

In a Store_To_Buffer method for a container, I would create a temporary to hold the quantity, write that out, then follow with each item from the container. The matching Load_From_Buffer method reads the quantity first, then reads that many items.
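
The quantity-followed-by-items technique might look like this for a std::vector of 32-bit integers. This is a sketch under assumptions: the free-function form and the choice of a 4-byte count are mine, values are written in the host's byte order, and there is no bounds checking, so the quantity read from an untrusted file should be validated before use.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Writes the item count, then each item, advancing p_buffer past what was written.
// The caller must provide at least 4 + 4 * items.size() bytes.
void Store_To_Buffer(const std::vector<std::uint32_t>& items, unsigned char *& p_buffer)
{
    const std::uint32_t quantity = static_cast<std::uint32_t>(items.size());
    std::memcpy(p_buffer, &quantity, sizeof(quantity));
    p_buffer += sizeof(quantity);
    for (std::size_t i = 0; i < items.size(); ++i)
    {
        std::memcpy(p_buffer, &items[i], sizeof(items[i]));
        p_buffer += sizeof(items[i]);
    }
}

// Reads the item count into a temporary, then reads that many items.
void Load_From_Buffer(std::vector<std::uint32_t>& items, unsigned char const *& p_buffer)
{
    std::uint32_t quantity = 0;
    std::memcpy(&quantity, p_buffer, sizeof(quantity));
    p_buffer += sizeof(quantity);
    items.clear();
    items.reserve(quantity);
    for (std::uint32_t i = 0; i < quantity; ++i)
    {
        std::uint32_t item = 0;
        std::memcpy(&item, p_buffer, sizeof(item));
        p_buffer += sizeof(item);
        items.push_back(item);
    }
}
```

Since the count is stored in the stream, the reader never needs to know the size of the container in advance.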

7 Comments

this is looking good, 1 question: Why the need for buffer and p_buffer? (I'm not very good at C++ :L ) EDIT: Ignore this, it's the pointer to the array
If you pass buffer to the methods, the methods will increment it and you will lose the start of the original buffer. Always best to play with an additional pointer into a buffer.
@LordAro: Reminder: if you like the answer, click on the check mark.
Done, thanks :) Oh: "In C and C++, the size of a structure may not be equal to the sum of the size of its members." <-- Indeed, i have already come across this 'issue' :)
Out of interest (if you're still there), could this method also be applied to vectors? I'm having trouble finding the size of the array...

You could overload the std::ostream output operator and std::istream input operator for your structure, something like this:

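#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct Record {
    std::string name;
    int value;
};

std::istream& operator>>(std::istream& in, Record& record) {
    char name[40] = { 0 };
    std::int32_t value(0);
    in.read(name, sizeof(name));
    in.read(reinterpret_cast<char*>(&value), sizeof(value));
    if (in) { // only update the record when both reads succeeded
        record.name.assign(name, sizeof(name));
        // Drop the nul padding that follows the actual text.
        const std::string::size_type end = record.name.find('\0');
        if (end != std::string::npos) record.name.erase(end);
        record.value = value;
    }
    return in;
}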

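std::ostream& operator<<(std::ostream& out, const Record& record) {
    std::string name(record.name);
    name.resize(40, '\0'); // pad or truncate to the fixed 40-byte field
    const std::int32_t value(record.value); // fixed-width copy, so exactly 4 bytes are written
    out.write(name.data(), 40);
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    return out;
}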

int main(int argc, char **argv) {
    const char* filename("records");
    Record r[] = {{"zero", 0 }, {"one", 1 }, {"two", 2}};
    int n(sizeof(r)/sizeof(r[0]));

    std::ofstream out(filename, std::ios::binary);
    for (int i = 0; i < n; ++i) {
        out << r[i];
    }
    out.close();

    std::ifstream in(filename, std::ios::binary);
    std::vector<Record> rIn;
    Record record;
    while (in >> record) {
        rIn.push_back(record);
    }
    for (std::vector<Record>::iterator i = rIn.begin(); i != rIn.end(); ++i){
        std::cout << "name: " << i->name << ", value: " << i->value
                  << std::endl;
    }
    return 0;
}
