Increasing Regex Efficiency
- by cam
I have about 100k Outlook mail items that have about 500-600 chars per Body. I have a list of 580 keywords that must search through each body, then append the words at the bottom.
I believe I've increased the efficiency of the majority of the function, but it still takes a lot of time. Even for 100 emails it takes about 4 seconds.
I run two functions for each keyword list (290 keywords each list).
public List<string> Keyword_Search(HtmlNode nSearch)
{
var wordFound = new List<string>();
foreach (string currWord in _keywordList)
{
bool isMatch = Regex.IsMatch(nSearch.InnerHtml, "\\b" + @currWord + "\\b",
RegexOptions.IgnoreCase);
if (isMatch)
{
wordFound.Add(currWord);
}
}
return wordFound;
}
Is there anyway I can increase the efficiency of this function?
The other thing that might be slowing it down is that I use HTML Agility Pack to navigate through some nodes and pull out the body (nSearch.InnerHtml). The _keywordList is a List item, and not an array.