High-performance text file parsing in .NET

Posted by diamandiev on Stack Overflow
Published on 2010-03-20T06:11:36Z

Here is the situation:

I am making a small program to parse server log files.

I tested it with a log file containing several thousand requests (somewhere between 10,000 and 20,000; I don't know the exact number).

What I have to do is load the log text files into memory so that I can query them.

This is what takes the most resources.

The methods that take the most CPU time are these (worst culprits first):

String.Split - splits a line's values into an array of values

String.Contains - checks whether the user agent contains a specific agent string (to determine the browser ID)

String.ToLower - various purposes

StreamReader.ReadLine - reads the log file line by line

String.StartsWith - determines whether a line is a column-definition line or a line with values
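A parsing loop that exercises all of these methods might look roughly like this. This is only a hypothetical sketch of the setup described above; the space delimiter, the user-agent field index, and the `#` comment prefix are assumptions based on the W3C extended log format, not details from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class LogParser
{
    // Hypothetical sketch; the column layout is an assumption.
    const int UserAgentColumn = 9;

    static List<string[]> Load(string path)
    {
        var rows = new List<string[]>();
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)        // StreamReader.ReadLine
            {
                if (line.StartsWith("#"))                     // String.StartsWith: column-definition line
                    continue;

                string[] fields = line.Split(' ');            // String.Split into the field array
                string agent = fields[UserAgentColumn].ToLower(); // String.ToLower
                if (agent.Contains("msie"))                   // String.Contains: browser ID check
                {
                    // ... tag the row as Internet Explorer ...
                }
                rows.Add(fields);
            }
        }
        return rows;
    }
}
```

Each iteration allocates a new string array from `Split` plus a lowered copy of the user-agent field, which is why these calls dominate the profile on a file with tens of thousands of lines.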

There were some others that I was able to replace. For example, the dictionary getter was also taking a lot of resources, which I had not expected, since it's a dictionary and should have its keys indexed. I replaced it with a multidimensional array and saved some CPU time.
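The dictionary cost makes sense: every lookup hashes the string key and probes buckets, whereas an array access is a single index operation. A minimal sketch of the trade-off, with a made-up column name and index:

```csharp
using System;
using System.Collections.Generic;

class ColumnLookup
{
    static void Main()
    {
        // Dictionary lookup: every access hashes the string key and probes.
        var byName = new Dictionary<string, int> { { "cs-uri-stem", 4 } };
        int column = byName["cs-uri-stem"];

        // Array lookup: resolve the name to an index once, up front,
        // then use plain integer indexing in the hot loop - no hashing.
        const int UriStemColumn = 4;
        string[] fields = { "2010-03-20", "06:11:36", "GET", "80", "/index.html" };
        string uri = fields[UriStemColumn];

        Console.WriteLine(column + " " + uri);   // 4 /index.html
    }
}
```

Resolving column names to integer indices once per file, instead of once per row, removes the per-access hashing entirely, which matches the savings described above.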

Right now I am running on a fast dual core, and the total time it takes to load the file I mentioned is about 1 second.

Now this is really bad.

Imagine a site that gets tens of thousands of visits a day. It would take minutes to load the log file.

So what are my alternatives, if there are any? Because I think this is just a .NET limitation and there isn't much I can do about it.
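One alternative worth trying before blaming the runtime: the `ToLower` + `Contains` pair allocates a lowered copy of every user-agent string just to do a case-insensitive search. `String.IndexOf` with an ordinal case-insensitive comparison does the same check in place with no allocation. A sketch (the sample user-agent string is made up):

```csharp
using System;

class AgentCheck
{
    static void Main()
    {
        string userAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";

        // Allocating approach: lowered copy of the string on every call.
        bool slow = userAgent.ToLower().Contains("msie");

        // Allocation-free approach: case-insensitive ordinal search in place.
        bool fast = userAgent.IndexOf("MSIE", StringComparison.OrdinalIgnoreCase) >= 0;

        Console.WriteLine(slow + " " + fast);   // True True
    }
}
```

The same idea applies elsewhere in the loop: the `char`-array overload of `Split` is cheaper than the `string` overload, and since the machine is a dual core, the parsed lines could also be processed on a second thread while the reader thread keeps pulling lines from disk.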

© Stack Overflow or respective owner
