Is there a faster way to parse a large file with regex?

Posted by Ray Eatmon on Stack Overflow
Published on 2012-12-10T22:52:09Z Indexed on 2012/12/10 23:04 UTC

Problem: I have a very large file that I need to parse line by line to get 3 values from each line. Everything works, but it takes a long time to parse the whole file, typically between 1 and 2 minutes. Is it possible to do this within seconds?

Example file size is 148,208 KB.

I am using a regex to parse every line. Here is my C# code:

    private static void ReadTheLines(int max, Responder rp, string inputFile)
    {
        List<int> rate = new List<int>();
        double counter = 1;

        try
        {
            using (var sr = new StreamReader(inputFile, Encoding.UTF8, true, 1024))
            {
                string line;

                Console.WriteLine("Reading....");
                while ((line = sr.ReadLine()) != null)
                {
                    if (counter <= max)
                    {
                        counter++;
                        rate = rp.GetRateLine(line);
                    }
                    else if (max == 0)
                    {
                        counter++;
                        rate = rp.GetRateLine(line);
                    }
                }

                rp.GetRate(rate);
                Console.ReadLine();
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
        }
    }

Here is my regex:

    public List<int> GetRateLine(string justALine)
    {
        const string reg = @"^\d{1,}.+\[(.*)\s[\-]\d{1,}].+GET.*HTTP.*\d{3}[\s](\d{1,})[\s](\d{1,})$";
        Match match = Regex.Match(justALine, reg,
                                  RegexOptions.IgnoreCase);

        // Here we check the Match instance.
        if (match.Success)
        {
            // Take the third captured group and store it as an int.
            string theRate = match.Groups[3].Value;
            Ratestorage.Add(Convert.ToInt32(theRate));
        }
        else
        {
            // Lines that do not match are recorded as 0.
            Ratestorage.Add(0);
        }
        return Ratestorage;
    }
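
One variation that may be worth measuring, offered as a sketch rather than a confirmed fix: build the Regex object once with RegexOptions.Compiled and reuse it for every line, instead of going through the static Regex.Match call each time. The pattern and the IgnoreCase option are the ones from the code above. The class name RateParser, the field name LineRegex and the method name ParseRate are hypothetical, and the sketch returns the value instead of appending it to Ratestorage.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of the original Responder type.
public static class RateParser
{
    // Same pattern as above, constructed once and reused for every line.
    // RegexOptions.Compiled costs more at start-up but usually matches faster afterwards.
    private static readonly Regex LineRegex = new Regex(
        @"^\d{1,}.+\[(.*)\s[\-]\d{1,}].+GET.*HTTP.*\d{3}[\s](\d{1,})[\s](\d{1,})$",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the third captured group as an int, or 0 when the line does not match.
    public static int ParseRate(string line)
    {
        Match match = LineRegex.Match(line);
        return match.Success
            ? int.Parse(match.Groups[3].Value, CultureInfo.InvariantCulture)
            : 0;
    }
}
```

Whether this helps depends on where the time actually goes, so it is worth timing both versions against the real file before relying on it.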

Here is an example line to parse; the file usually has around 200,000 lines:

10.10.10.10 - - [27/Nov/2002:16:46:20 -0500] "GET /solr/ HTTP/1.1" 200 4926 789
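
Since the code above only uses the last number on the line, a regex-free alternative can be sketched for comparison. This assumes every line ends with the value of interest, as the example line does, and it does not check the rest of the line (GET, HTTP, status code) the way the regex does. The names LastFieldParser and ParseLastField are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; assumes the wanted value is the last space-separated field.
public static class LastFieldParser
{
    // Returns the final field as an int, or 0 if it is not made up of digits only.
    public static int ParseLastField(string line)
    {
        // LastIndexOf returns -1 when there is no space, so Substring(0) yields the whole line.
        int lastSpace = line.LastIndexOf(' ');
        string tail = line.Substring(lastSpace + 1);

        int value;
        return int.TryParse(tail, NumberStyles.None, CultureInfo.InvariantCulture, out value)
            ? value
            : 0;
    }
}
```

For the example line, ParseLastField returns 789. Lines with trailing whitespace or a non-numeric last field yield 0, which differs from the regex version only in that no other part of the line is validated.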

© Stack Overflow or respective owner
