High-performance text file parsing in .NET
- by diamandiev
Here is the situation:
I am making a small program to parse server log files.
I tested it with a log file containing several thousand requests (between 10,000 and 20,000; I don't know exactly).
What I have to do is load the log text files into memory so that I can query them.
This is what takes the most resources.
The methods that take the most CPU time are these (worst culprits first):
string.Split - splits the line into an array of values
string.Contains - checks whether the user agent contains a specific agent string (to determine the browser ID)
string.ToLower - various purposes
StreamReader.ReadLine - reads the log file line by line
string.StartsWith - determines whether a line is a column-definition line or a line with values
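Several of these calls have cheaper variants. In particular, the ToLower-then-Contains pattern allocates a lowered copy of every string; an ordinal, case-insensitive IndexOf avoids both the allocation and culture-aware comparison. A minimal sketch (the method names and sample strings here are illustrative, not from the original program):

```csharp
using System;

class LineParsing
{
    // Detect a browser token without allocating a lowercased copy of the
    // user agent. OrdinalIgnoreCase skips culture-sensitive comparison rules.
    static bool ContainsAgent(string userAgent, string token)
    {
        return userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Ordinal StartsWith is cheaper than the default culture-sensitive overload.
    static bool IsDirectiveLine(string line)
    {
        return line.StartsWith("#", StringComparison.Ordinal);
    }

    static void Main()
    {
        Console.WriteLine(ContainsAgent("Mozilla/4.0 (compatible; MSIE 7.0)", "msie")); // True
        Console.WriteLine(IsDirectiveLine("#Fields: date time c-ip"));                  // True
    }
}
```

The same StringComparison.Ordinal trick applies to the StartsWith calls used to spot column-definition lines.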
There were some others that I was able to replace. For example, the dictionary getter was
taking a lot of resources too, which I had not expected, since it's a dictionary and should have its keys indexed. I replaced it with a multidimensional array and saved some CPU time.
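The dictionary cost makes sense: every lookup hashes the string key. One way to get the array-style speedup described above is to resolve column names to integer indices once, when the column-definition line is read, and use plain indexing in the per-line loop. A sketch, with hypothetical column names:

```csharp
using System;
using System.Collections.Generic;

class ColumnIndex
{
    // Resolve wanted column names to array positions once per file, so the
    // per-line loop does integer indexing instead of hashing a string key
    // for every field access.
    static int[] ResolveIndices(string[] schema, string[] wanted)
    {
        var byName = new Dictionary<string, int>();
        for (int i = 0; i < schema.Length; i++)
            byName[schema[i]] = i;

        var indices = new int[wanted.Length];
        for (int i = 0; i < wanted.Length; i++)
            indices[i] = byName[wanted[i]]; // one dictionary hit per column, not per line
        return indices;
    }

    static void Main()
    {
        string[] schema = { "date", "time", "c-ip", "cs-uri-stem" };
        int[] idx = ResolveIndices(schema, new[] { "c-ip", "cs-uri-stem" });

        string[] row = { "2024-01-01", "12:00:00", "10.0.0.1", "/index.html" };
        Console.WriteLine(row[idx[0]]); // 10.0.0.1 - no per-line hashing
    }
}
```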
I am running on a fast dual core, and the total time it takes to load the file I mentioned is about 1 second.
That is really bad.
Imagine a site that has tens of thousands of visits a day; it would take minutes to load the log file.
So what are my alternatives, if any? I suspect this is just a .NET limitation and I can't do much about it.
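Before concluding it's a platform limit, two things worth trying: give StreamReader a much larger buffer than its small default, and extract only the columns you actually query instead of splitting every field on every line. A hypothetical sketch (the file contents, buffer size, and helper name are assumptions for illustration):

```csharp
using System;
using System.IO;
using System.Text;

class FastLoad
{
    // Pull out only the third space-delimited field of a line, skipping the
    // full string.Split. Assumes at least three fields; real log lines may
    // need more validation.
    static string ThirdField(string line)
    {
        int a = line.IndexOf(' ');
        int b = line.IndexOf(' ', a + 1);
        return line.Substring(b + 1);
    }

    static void Main()
    {
        // Hypothetical sample log standing in for the real server log file.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "#Fields: date time c-ip\n2024-01-01 12:00:00 10.0.0.1\n");

        // A 64 KB buffer reduces the number of underlying reads compared
        // with StreamReader's small default buffer.
        using (var reader = new StreamReader(path, Encoding.ASCII, false, 64 * 1024))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith("#", StringComparison.Ordinal))
                    continue; // column-definition line

                Console.WriteLine(ThirdField(line)); // 10.0.0.1
            }
        }
        File.Delete(path);
    }
}
```

If the queries only ever touch a few columns, deferring the Substring calls for the rest can remove most of the per-line allocation that string.Split incurs.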