C#: Efficiently search a large string for occurrences of other strings

Posted by Jon on Stack Overflow
Published on 2009-06-23T08:19:08Z Indexed on 2010/04/13 15:03 UTC

Hi,

I'm using C# to continuously search large strings (each 4 KB or more) for multiple string "keywords". The code loops constantly, and adding sleeps doesn't cut CPU usage enough while maintaining a reasonable speed. The bottleneck is the keyword-matching method.

I've found a few possibilities, and all of them give similar efficiency.

1) Aho-Corasick (http://tomasp.net/articles/ahocorasick.aspx) - I do not have enough keywords for this to be the most efficient algorithm.

2) Regex, using an instance-level, compiled regex - Provides more functionality than I require, and not quite enough efficiency.

3) String.IndexOf - I would need to write a "smart" version of this for it to provide enough efficiency. Looping through each keyword and calling IndexOf doesn't cut it.
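
For reference, options 2 and 3 might look roughly like the sketch below. It is only an illustration under assumptions: matching is taken to be ordinal and case-sensitive, and the names `KeywordSearch`, `FindWithIndexOf`, `BuildRegex`, and `FindWithRegex` are hypothetical, not taken from the actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; all names here are illustrative.
public static class KeywordSearch
{
    // Option 3, naive form: one full scan of the text per keyword.
    public static List<string> FindWithIndexOf(string text, IEnumerable<string> keywords)
    {
        var found = new List<string>();
        foreach (var keyword in keywords)
        {
            // StringComparison.Ordinal skips culture-sensitive comparison rules.
            if (text.IndexOf(keyword, StringComparison.Ordinal) >= 0)
                found.Add(keyword);
        }
        return found;
    }

    // Option 2: build one compiled regex that alternates over all keywords.
    // Build it once and reuse it; construction is the expensive part.
    public static Regex BuildRegex(IEnumerable<string> keywords)
    {
        // Escape each keyword so regex metacharacters are matched literally.
        var escaped = keywords.Select(k => Regex.Escape(k)).ToArray();
        return new Regex(string.Join("|", escaped),
                         RegexOptions.Compiled | RegexOptions.CultureInvariant);
    }

    // Single pass over the text. Matches do not overlap, so a keyword that
    // occurs only inside or overlapping another match can be missed.
    public static HashSet<string> FindWithRegex(string text, Regex regex)
    {
        var found = new HashSet<string>();
        foreach (Match m in regex.Matches(text))
            found.Add(m.Value);
        return found;
    }
}
```

One detail that may matter for option 3: `text.IndexOf(keyword)` with no comparison argument performs a culture-sensitive search, so passing `StringComparison.Ordinal` is usually noticeably cheaper when plain character-by-character matching is acceptable.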

Does anyone know of any algorithms or methods that I can use to attain my goal?

© Stack Overflow or respective owner
