Reading text files line by line, with exact offset/position reporting
- by Benjamin Podszun
Hi.
My simple requirement: Reading a huge ( a million) line test file (For this example assume it's a CSV of some sorts) and keeping a reference to the beginning of that line for faster lookup in the future (read a line, starting at X).
I tried the naive and easy way first, using a StreamWriter and accessing the underlying BaseStream.Position. Unfortunately that doesn't work as I intended:
Given a file containing the following
Foo
Bar
Baz
Bla
Fasel
and this very simple code
using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
string line;
long pos = sr.BaseStream.Position;
while ((line = sr.ReadLine()) != null) {
Console.Write("{0:d3} ", pos);
Console.WriteLine(line);
pos = sr.BaseStream.Position;
}
}
the output is:
000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel
I can imagine that the stream is trying to be helpful/efficient and probably reads in (big) chunks whenever new data is necessary. For me this is bad..
The question, finally: Any way to get the (byte, char) offset while reading a file line by line without using a basic Stream and messing with \r \n \r\n and string encoding etc. manually? Not a big deal, really, I just don't like to build things that might exist already..