High-performance text file parsing in .NET

Posted by diamandiev on Stack Overflow
Published on 2010-03-20T06:11:36Z Indexed on 2010/03/20 6:21 UTC

Filed under: string-manipulation

Here is the situation:

I am writing a small program to parse server log files.

I tested it with a log file containing several thousand requests (somewhere between 10,000 and 20,000; I don't know exactly).

What I have to do is load the log text files into memory so that I can query them.

This loading step is what takes the most resources.

The methods that take the most CPU time are these (worst culprits first):

String.Split - splits each line into an array of values

String.Contains - checks whether the user agent contains a specific agent string (to determine the browser ID)

String.ToLower - various purposes

StreamReader.ReadLine - reads the log file line by line

String.StartsWith - determines whether a line is a column definition line or a line with values
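
For illustration, the loading loop described by the list above might look roughly like the following C# sketch. This is not the original code, which is not shown. It assumes a W3C-style log in which column definition lines start with "#Fields:" and values are separated by single spaces; the names LogLoader, Load and ContainsAgent are hypothetical. It also shows two changes that may reduce the cost of the listed calls: passing StringComparison.Ordinal to StartsWith, and replacing a ToLower-then-Contains pair with a single case-insensitive IndexOf.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LogLoader
{
    // Assumption: column definition lines start with "#Fields:".
    private const string FieldsPrefix = "#Fields:";

    // Reads the log line by line and returns the split values of every value line.
    public static List<string[]> Load(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // An ordinal comparison avoids the culture-sensitive rules that
            // StartsWith(string) applies by default.
            if (line.StartsWith(FieldsPrefix, StringComparison.Ordinal))
                continue; // column definition line; a real loader would record the names here

            if (line.Length == 0 || line[0] == '#')
                continue; // blank line or another directive line

            rows.Add(line.Split(' '));
        }
        return rows;
    }

    // Case-insensitive substring test that does not allocate a lowered copy
    // of the user agent string first.
    public static bool ContainsAgent(string userAgent, string agent)
    {
        return userAgent.IndexOf(agent, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

A call such as LogLoader.Load(new StreamReader(path)) would load a file, where path is whatever log file is being parsed. Whether these changes help noticeably depends on the actual data, so profiling before and after is the only reliable check.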

There were some others that I was able to replace. For example, the dictionary getter was taking a lot of resources too, which I had not expected, since it's a dictionary and should have its keys indexed. I replaced it with a multidimensional array and saved some CPU time.
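
The replacement code is not shown, so the following C# sketch is only one guess at what swapping the dictionary getter for a multidimensional array could look like. The idea is to resolve each column name to an integer position once, then read values from a rectangular [row, column] array by position. The names ColumnIndexDemo, BuildIndex and ToTable are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnIndexDemo
{
    // Builds a column name -> position map once per column definition line.
    public static Dictionary<string, int> BuildIndex(string[] columnNames)
    {
        var index = new Dictionary<string, int>(StringComparer.Ordinal);
        for (int i = 0; i < columnNames.Length; i++)
            index[columnNames[i]] = i;
        return index;
    }

    // Copies already-split rows into a rectangular [row, column] array.
    // Rows shorter than columnCount leave the remaining cells null.
    public static string[,] ToTable(IList<string[]> rows, int columnCount)
    {
        var table = new string[rows.Count, columnCount];
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < columnCount && c < rows[r].Length; c++)
                table[r, c] = rows[r][c];
        return table;
    }
}
```

With this layout, a query does one dictionary lookup per column (for example int col = index[name]) and then reads table[r, col] for every row, instead of hashing the column name on each access.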

I am running this on a fast dual-core machine, and the total time to load the file I mentioned is about 1 second.

That is really bad.

Imagine a site that has tens of thousands of visits a day. It's going to take minutes to load the log file.

So what are my alternatives, if any? I think this is just a .NET limitation and I can't do much about it.

Tags: .NET, Performance, optimization, string, string-manipulation
