TL;DR
What would be a good way, in C++ and using STL idioms, to iterate through a binary file to read, transform, and then write the data back out? The files can be pretty large (several hundred MB), so I don't want to load the entire file into memory at one time.
More context
I am trying to improve a utility which performs various operations on binary files. These files contain a set of records, each consisting of a header followed by the data. The utility provides options to dump the file to text, filter out certain records, extract certain records, append records, etc. Unfortunately, the code to read and write the file has been copied and pasted into every one of these functions, so the single source file contains a lot of redundant code and is starting to get out of hand.
I'm only just getting up to speed with C++ and the STL, but this seems like something that should be doable with some sort of template/iterator magic; I just can't find a good example explaining this scenario. The other strategy I may pursue is to wrap the file access in a class which provides GetNextRecord and WriteNextRecord methods.
Below is a self-contained, (extremely) simplified version of what I'm working on. Is there a good way to write a function that reads the data in the file created by WriteMyDataFile and creates a new output file with all the records containing an 'i' character removed? I'm looking to abstract away the reading/writing of the file so that the function can mainly be about working with the data.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

const int c_version = 1;

struct RecordHeader
{
    int length;
    int version;
};

// Splits data on whitespace and writes each word as a header + payload record.
void WriteMyDataFile(const char* recordFile, const char* data)
{
    ofstream output(recordFile, ios::out | ios::binary);
    stringstream records(data);
    string r;
    while (records >> r)
    {
        RecordHeader header;
        header.length = static_cast<int>(r.length());
        header.version = c_version;
        output.write(reinterpret_cast<const char*>(&header), sizeof(header));
        output.write(r.data(), header.length);
    }
    output.close();
}

// Reads every record into memory at once (the part that does not scale).
vector<string> ReadDataFile(const char* recordFile)
{
    vector<string> records;
    ifstream input(recordFile, ios::in | ios::binary);
    RecordHeader header;
    // Test the read itself rather than eof(), so a failed read ends the loop.
    while (input.read(reinterpret_cast<char*>(&header), sizeof(header)))
    {
        if (header.length < 0)
        {
            break; // corrupt header
        }
        // Size the string up front so embedded '\0' bytes are preserved.
        string s(static_cast<size_t>(header.length), '\0');
        if (header.length > 0 && !input.read(&s[0], header.length))
        {
            break; // truncated record
        }
        records.push_back(s);
    }
    return records;
}

int main(int argc, char* argv[])
{
    if (argc < 3)
    {
        cerr << "usage: RecordUtility <file> <data>" << endl;
        return 1;
    }
    WriteMyDataFile(argv[1], argv[2]);
    vector<string> records = ReadDataFile(argv[1]);
    for (size_t i = 0; i < records.size(); i++)
    {
        cout << records[i] << endl;
    }
    return 0;
}
To run this:
C:\>RecordUtility.exe test.bin "alpha bravo charlie delta"
Output:
alpha
bravo
charlie
delta
istreambuf_iterators and ostreambuf_iterators?