std::ifstream buffer caching
- by ledokol
Hello everybody,
In my application I'm trying to merge sorted files (keeping them sorted of course), so I have to iterate through each element in both files to write the minimal to the third one. This works pretty much slow on big files, as far as I don't see any other choice (the iteration has to be done) I'm trying to optimize file loading. I can use some amount of RAM, which I can use for buffering. I mean instead of reading 4 bytes from both files every time I can read once something like 100Mb and work with that buffer after that, until there will be no element in buffer, then I'll refill the buffer again. But I guess ifstream is already doing that, will it give me more performance and is there any reason? If fstream does, maybe I can change size of that buffer?
added
My current code looks like that (pseudocode)
// this is done in loop
int i1 = input1.read_integer();
int i2 = input2.read_integer();
if (!input1.eof() && !input2.eof())
{
if (i1 < i2)
{
output.write(i1);
input2.seek_back(sizeof(int));
} else
input1.seek_back(sizeof(int));
output.write(i2);
}
} else {
if (input1.eof())
output.write(i2);
else if (input2.eof())
output.write(i1);
}
What I don't like here is
seek_back - I have to seek back to previous position as there is no way to peek 4 bytes
too much reading from file
if one of the streams is in EOF it still continues to check that stream instead of putting contents of another stream directly to output, but this is not a big issue, because chunk sizes are almost always equal.
Can you suggest improvement for that?
Thanks.