Increasing Regex Efficiency

Posted by cam on Stack Overflow
Published on 2010-03-30T13:23:05Z

I have about 100k Outlook mail items, each with about 500-600 characters in its body. I need to search every body for a list of 580 keywords and then append the keywords found to the bottom of that body.

I believe I've already made most of the function more efficient, but it is still slow: even 100 emails take about 4 seconds.
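A rough extrapolation from these numbers, assuming the time per email stays constant and all 580 keywords (both lists) are checked against every email:

$$\frac{4\ \text{s}}{100\ \text{emails}} = 40\ \text{ms per email}, \qquad \frac{40\ \text{ms}}{2 \times 290\ \text{patterns}} \approx 69\ \mu\text{s per } \texttt{Regex.IsMatch} \text{ call}$$

$$100{,}000\ \text{emails} \times 40\ \text{ms} = 4000\ \text{s} \approx 67\ \text{min}, \qquad 100{,}000 \times 580 = 5.8 \times 10^{7}\ \text{calls}$$

At that scale, reducing the number of passes over each body is likely to matter more than tuning the loop itself.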

I run two functions, one for each keyword list (290 keywords per list).

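    // Returns every keyword from _keywordList that appears as a whole word in the node's HTML.
    public List<string> Keyword_Search(HtmlNode nSearch)
    {
        var wordFound = new List<string>();
        foreach (string currWord in _keywordList)
        {
            // Escape the keyword so any regex metacharacters in it are matched literally.
            bool isMatch = Regex.IsMatch(nSearch.InnerHtml, @"\b" + Regex.Escape(currWord) + @"\b",
                                         RegexOptions.IgnoreCase);
            if (isMatch)
            {
                wordFound.Add(currWord);
            }
        }
        return wordFound;
    }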
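One likely cost is pattern parsing. As far as I know, the static `Regex.IsMatch` overload caches parsed patterns, but only `Regex.CacheSize` of them (15 by default), so cycling through 290 different patterns probably means each call re-parses its pattern. A minimal sketch, assuming the keyword list is fixed for the whole run: build one `Regex` per keyword once, and read the body once per email instead of once per keyword. `PrecompiledKeywordSearch` is a hypothetical name, `Search` takes a plain string (for example `nSearch.InnerHtml` read once) so the sketch does not depend on HTML Agility Pack, and it uses C# 7 tuple syntax.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: builds one Regex per keyword up front and reuses it for every email.
public sealed class PrecompiledKeywordSearch
{
    private readonly List<(string Keyword, Regex Pattern)> _patterns =
        new List<(string Keyword, Regex Pattern)>();

    public PrecompiledKeywordSearch(IEnumerable<string> keywords)
    {
        foreach (string keyword in keywords)
        {
            // Regex.Escape makes any regex metacharacters in the keyword match literally.
            var pattern = new Regex(@"\b" + Regex.Escape(keyword) + @"\b",
                                    RegexOptions.IgnoreCase | RegexOptions.Compiled);
            _patterns.Add((keyword, pattern));
        }
    }

    // body is the text of one email, read once rather than once per keyword.
    public List<string> Search(string body)
    {
        var found = new List<string>();
        foreach (var (keyword, pattern) in _patterns)
        {
            if (pattern.IsMatch(body))
            {
                found.Add(keyword);
            }
        }
        return found;
    }
}
```

`RegexOptions.Compiled` makes construction slower in exchange for faster matching; over 100k emails that trade should pay off, but dropping the flag is reasonable if startup time matters.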

Is there any way I can make this function more efficient?
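A different technique is to fold all keywords (both lists together) into a single alternation, `\b(?:kw1|kw2|...)\b`, so each body is scanned once instead of 580 times. How much this gains likely depends on the .NET version's regex engine, so it is worth measuring. This is a sketch, not a drop-in replacement: `CombinedKeywordSearch` is a hypothetical name, results come back in order of first appearance rather than list order, and because `Matches` returns non-overlapping matches, a keyword that only occurs inside a longer matched keyword (for example "york" inside "new york") is not reported, whereas the per-keyword loop would report both.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: one alternation pattern covering every keyword.
public sealed class CombinedKeywordSearch
{
    private readonly Regex _pattern;
    // Maps any casing of a matched keyword back to the spelling used in the list.
    private readonly Dictionary<string, string> _canonical;

    public CombinedKeywordSearch(IEnumerable<string> keywords)
    {
        List<string> distinct = keywords.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
        if (distinct.Count == 0)
        {
            throw new ArgumentException("At least one keyword is required.", nameof(keywords));
        }
        _canonical = distinct.ToDictionary(k => k, k => k, StringComparer.OrdinalIgnoreCase);

        // Longer keywords first, so "new york" is tried before "new" at the same position.
        string alternation = string.Join("|",
            distinct.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k)));
        _pattern = new Regex(@"\b(?:" + alternation + @")\b",
                             RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    // Scans the body once; each keyword is reported once, in order of first appearance.
    public List<string> Search(string body)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (Match m in _pattern.Matches(body))
        {
            string keyword = _canonical.TryGetValue(m.Value, out string listed) ? listed : m.Value;
            if (seen.Add(keyword))
            {
                result.Add(keyword);
            }
        }
        return result;
    }
}
```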

Another thing that might be slowing it down is that I use HTML Agility Pack to navigate through some nodes and pull out the body (nSearch.InnerHtml). Also, _keywordList is a List rather than an array.
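On those two points: `InnerHtml` includes tag names and attribute values, so a keyword such as "font" or "table" could match the markup rather than the message; `HtmlNode.InnerText` drops the tags and may be the better input. Whether `_keywordList` is a List or an array should matter little next to the regex work. If every keyword is a single word, another option is to swap per-keyword regexes for a hash lookup: split the text into words once and look each word up in a case-insensitive `HashSet<string>`. `TokenKeywordSearch` is a hypothetical name, and this sketch cannot find keywords containing spaces or other non-word characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: tokenizes the text once and looks each word up in a hash set.
public static class TokenKeywordSearch
{
    // A "word" is a maximal run of \w characters, roughly the same notion \b uses.
    private static readonly Regex WordToken = new Regex(@"\w+", RegexOptions.Compiled);

    // keywordSet should be built once with a case-insensitive comparer, e.g.
    // new HashSet<string>(keywords, StringComparer.OrdinalIgnoreCase).
    public static List<string> Search(string text, HashSet<string> keywordSet)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var found = new List<string>();
        foreach (Match token in WordToken.Matches(text))
        {
            // HashSet<T>.TryGetValue (.NET Framework 4.7.2+ / .NET Core 2.0+)
            // returns the keyword as stored in the set, not as written in the text.
            if (keywordSet.TryGetValue(token.Value, out string keyword) && seen.Add(keyword))
            {
                found.Add(keyword);
            }
        }
        return found;
    }
}
```

Each word costs one average-case O(1) lookup, so the work grows with the length of the body rather than with the number of keywords.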

© Stack Overflow or respective owner
