Fast string suffix checking in C# (.NET 4.0)?
- by ilitirit
What is the fastest method of checking string suffixes in C#?
I need to check each string in a large list (anywhere from 5000 to 100000 items) for a particular term. The term is guaranteed never to be embedded within the string; in other words, if a string contains the term, it will be at the end of the string. Each string is also guaranteed to be longer than the suffix. Culture-specific comparison rules are not important.
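Because culture does not matter here, an ordinal comparison is worth including in any comparison. The `String.EndsWith(string)` overload performs a culture-sensitive comparison using the current culture, while `String.EndsWith(string, StringComparison.Ordinal)` compares UTF-16 code units directly. The sketch below assumes ordinal equality is acceptable for the data; `SuffixFilter` and `FilterBySuffix` are hypothetical names, not from the original benchmark.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class SuffixFilter
{
    // Returns the items that end with `suffix`, using an ordinal
    // (culture-insensitive) comparison.
    public static List<string> FilterBySuffix(IEnumerable<string> items, string suffix)
    {
        var matches = new List<string>();
        foreach (string s in items)
        {
            // StringComparison.Ordinal skips the culture-aware rules
            // that the single-argument EndsWith(string) overload applies.
            if (s.EndsWith(suffix, StringComparison.Ordinal))
            {
                matches.Add(s);
            }
        }
        return matches;
    }
}
```

For example, filtering `{ "alpha.log", "beta.txt", "gamma.log" }` with `".log"` keeps the first and third items.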
This is how the different methods performed against 100000 strings (half of which have the suffix):
1. Substring Comparison - 13.60ms
2. String.Contains - 22.33ms
3. CompareInfo.IsSuffix - 24.60ms
4. String.EndsWith - 29.08ms
5. String.LastIndexOf - 30.68ms
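For reference, here is one plausible reading of what each of the five timed checks looks like. The linked benchmark may use different overloads or comparison options, so the exact calls below are assumptions; `SuffixChecks` and its method names are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical wrappers around the five methods listed above.
public static class SuffixChecks
{
    // 1. Substring comparison: copies the tail of s into a new string and
    //    compares it with == (ordinal). Assumes s is longer than suffix.
    public static bool BySubstring(string s, string suffix)
    {
        return s.Substring(s.Length - suffix.Length) == suffix;
    }

    // 2. String.Contains (ordinal). Only valid as a suffix test because the
    //    term is guaranteed never to appear anywhere but at the end.
    public static bool ByContains(string s, string suffix)
    {
        return s.Contains(suffix);
    }

    // 3. CompareInfo.IsSuffix with the current culture's rules (assumed).
    public static bool ByIsSuffix(string s, string suffix)
    {
        return CultureInfo.CurrentCulture.CompareInfo.IsSuffix(s, suffix);
    }

    // 4. String.EndsWith(string): culture-sensitive by default.
    public static bool ByEndsWith(string s, string suffix)
    {
        return s.EndsWith(suffix);
    }

    // 5. String.LastIndexOf(string): culture-sensitive by default. The term is
    //    a suffix when its last occurrence starts at Length - suffix.Length.
    public static bool ByLastIndexOf(string s, string suffix)
    {
        return s.LastIndexOf(suffix) == s.Length - suffix.Length;
    }
}
```

Only methods 1 and 2 are ordinal here; methods 3 to 5 apply culture rules, which is extra work the question says it does not need.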
These are average times. [Edit] I forgot to mention that the strings also get added to separate lists. This is not important to the question, but it does add to the running time.
On my system, substring comparison (extracting the end of the string with the String.Substring method and comparing it to the suffix term) is consistently the fastest when tested against 100000 strings. The problem with substring comparison, though, is that garbage collection can slow it down considerably (more than it slows the other methods), because String.Substring creates a new string on every call. The effect is not as bad in .NET 4.0 as it was in 3.5 and below, but it is still noticeable: in my tests, String.Substring was consistently slower on sets of 12000-13000 strings. This will obviously differ between systems and implementations.
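One way to keep a direct tail comparison without the per-call allocation of `String.Substring` is to compare the tail in place. This is a swapped-in technique, not one of the five methods measured above, and it has not been benchmarked here. It assumes ordinal comparison is acceptable; `InPlaceSuffix` and its method names are hypothetical. `String.CompareOrdinal(string, int, string, int, int)` is an existing overload available since early .NET Framework versions, so it should also work on .NET 4.0.

```csharp
using System;

// Hypothetical helper: allocation-free ordinal suffix checks.
public static class InPlaceSuffix
{
    // Compares the last suffix.Length chars of s with suffix in place,
    // without creating a substring.
    public static bool EndsWithOrdinal(string s, string suffix)
    {
        int start = s.Length - suffix.Length;
        if (start < 0) return false; // s shorter than suffix
        return string.CompareOrdinal(s, start, suffix, 0, suffix.Length) == 0;
    }

    // Equivalent hand-written loop. Scanning from the end means a string that
    // differs in its final character is rejected after a single comparison.
    public static bool EndsWithLoop(string s, string suffix)
    {
        int offset = s.Length - suffix.Length;
        if (offset < 0) return false; // s shorter than suffix
        for (int i = suffix.Length - 1; i >= 0; i--)
        {
            if (s[offset + i] != suffix[i]) return false;
        }
        return true;
    }
}
```

Neither method allocates, so neither adds work for the garbage collector; whether either beats the measured methods on a given machine would need to be checked with the benchmark.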
[EDIT]
Benchmark code:
http://pastebin.com/smEtYNYN
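The linked benchmark is not reproduced here. As a rough sketch of how such a comparison can be timed, the harness below runs one check over a list and reports elapsed milliseconds using `System.Diagnostics.Stopwatch`. The `SuffixBenchmark` name, the warm-up pass, the forced collection before timing, and the generated test data are all assumptions, not taken from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical timing harness, for illustration only.
public static class SuffixBenchmark
{
    // Times one suffix check over every item and returns elapsed milliseconds.
    // `matched` receives how many items the check accepted, so the result
    // can be sanity-checked and the loop body is not trivially dead code.
    public static double Time(List<string> items, string suffix,
                              Func<string, string, bool> check, out int matched)
    {
        // Warm-up pass so JIT compilation is not included in the timing.
        foreach (string s in items) check(s, suffix);

        // Collect earlier garbage before measuring, so a collection triggered
        // by previous work is less likely to land inside the timed loop.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int count = 0;
        Stopwatch sw = Stopwatch.StartNew();
        foreach (string s in items)
        {
            if (check(s, suffix)) count++;
        }
        sw.Stop();

        matched = count;
        return sw.Elapsed.TotalMilliseconds;
    }

    // Builds n test strings; every second one ends with the suffix,
    // mirroring "half of them have the suffix" from the question.
    public static List<string> MakeItems(int n, string suffix)
    {
        var items = new List<string>(n);
        for (int i = 0; i < n; i++)
        {
            string body = "item" + i.ToString();
            items.Add(i % 2 == 0 ? body + suffix : body + "_other");
        }
        return items;
    }
}
```

Calling the check through a delegate adds a small per-call cost, so absolute numbers from this harness will not match a hand-inlined loop; it is mainly useful for comparing methods against each other under the same conditions.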