C#: String Concatenation vs Format vs StringBuilder
- by James Michael Hare
I was looking through my group’s C# coding standards the other day and a couple of legacy items in there caught my eye. They had been passed down from committee to committee so many times that, for a long time, no one even thought to second-guess them and try them out. It’s yet another example of how micro-optimizations can get the best of us and lead us to write code that is less maintainable than it could be, all for the sake of squeezing an extra ounce of performance out of our software.
So the two standards in question were these, in paraphrase:
Prefer StringBuilder or string.Format() to string concatenation.
Prefer string.Equals() with case-insensitive option to string.ToUpper().Equals().
Now some of you may already know what my results are going to show, as these items have been compared before on many blogs, but I think it’s always worth repeating and trying these yourself. So let’s dig in.
The first test was a pretty standard one. When concatenating strings, what is the best choice: StringBuilder, string concatenation, or string.Format()?
So before we begin, I read in a number of iterations and a length for each string from the console. Then I generate that many random strings of the given length, plus an array to hold the results. Why am I so keen to keep the results? Because I want to be able to snapshot the memory and don’t want garbage collection to collect the strings, hence the array to keep hold of them. I also didn’t want the random strings to be part of the allocation, so I pre-allocate them and the array up front before the snapshot. So in the code snippets below:
num – Number of iterations.
strings – Array of randomly generated strings.
results – Array to hold the results of the concatenation tests.
timer – A System.Diagnostics.Stopwatch instance to time code execution.
start – Beginning memory size.
stop – Ending memory size.
after – Memory size after final GC.
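The setup described above could be sketched roughly like this. It is not the original test program: the BenchmarkSetup class, its method names, the fixed seed, and the choice of upper-case letters are all assumptions made for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper class, not taken from the original test program.
static class BenchmarkSetup
{
    // Builds one random string of upper-case letters of the given length.
    public static string RandomString(Random random, int length)
    {
        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            builder.Append((char)('A' + random.Next(26)));
        }
        return builder.ToString();
    }

    // Pre-allocates all input strings up front so that creating them
    // is not counted in the later time and memory measurements.
    public static string[] RandomStrings(int num, int length, int seed)
    {
        var random = new Random(seed);
        var strings = new string[num];
        for (int i = 0; i < num; i++)
        {
            strings[i] = RandomString(random, length);
        }
        return strings;
    }
}
```

With something like that in place, strings would come from BenchmarkSetup.RandomStrings(num, length, 42) and results would simply be new string[num], both created before the first memory snapshot.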
So first, let’s look at the concatenation loop:
// build num strings using concatenation.
for (int i = 0; i < num; i++)
{
    results[i] = "This is test #" + i + " with a result of " + strings[i];
}
Pretty standard, right? Next for string.Format():
// build strings using string.Format()
for (int i = 0; i < num; i++)
{
    results[i] = string.Format("This is test #{0} with a result of {1}", i, strings[i]);
}
Finally, StringBuilder:
// build strings using StringBuilder
for (int i = 0; i < num; i++)
{
    var builder = new StringBuilder();
    builder.Append("This is test #");
    builder.Append(i);
    builder.Append(" with a result of ");
    builder.Append(strings[i]);
    results[i] = builder.ToString();
}
So I take each of these loops, and time them by using a block like this:
// get the total amount of memory used, true tells it to run GC first.
start = System.GC.GetTotalMemory(true);

// restart the timer
timer.Reset();
timer.Start();

// *** code to time and measure goes here. ***

// get the current amount of memory, stop the timer, then get memory after GC.
stop = System.GC.GetTotalMemory(false);
timer.Stop();
after = System.GC.GetTotalMemory(true);
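If you want to reuse that block without copying and pasting it around each loop, one option is to wrap it in a small helper that takes the code to measure as a delegate. This is only a sketch: the Measurement class and the Measure method are hypothetical names, and the tuple return type assumes C# 7 or later.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original test program.
static class Measurement
{
    // Runs body once and returns the elapsed milliseconds along with the
    // memory before, the memory right after, and the memory after a final GC.
    public static (long ElapsedMs, long Start, long Stop, long After) Measure(Action body)
    {
        // true forces a garbage collection before the reading is taken.
        long start = GC.GetTotalMemory(true);

        var timer = Stopwatch.StartNew();
        body();

        // Same order as the block above: memory first, then stop the timer.
        long stop = GC.GetTotalMemory(false);
        timer.Stop();
        long after = GC.GetTotalMemory(true);

        return (timer.ElapsedMilliseconds, start, stop, after);
    }
}
```

Each of the three loops could then be passed in as a lambda, for example Measurement.Measure(() => { /* concatenation loop */ }). Keep in mind the delegate call adds a tiny, constant overhead of its own.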
So let’s look at what happens when I run each of these blocks through the timer and memory check at 500,000 iterations:
Operator + - Time: 547, Memory: 56104540/55595960 - 500000
string.Format() - Time: 749, Memory: 57295812/55595960 - 500000
StringBuilder - Time: 608, Memory: 55312888/55595960 - 500000
Egad! string.Format brings up the rear and + triumphs, well, at least in terms of speed. The concat burns more memory than StringBuilder but less than string.Format().
This shows two main things:
StringBuilder is not always the panacea many think it is.
The difference between any of the three is minuscule!
The second point is extremely important! You will often hear people grasp at results like these and say, “Look, operator + is 10% faster than StringBuilder, so always use operator +.” Statements like this are a disservice and often misleading. For example, if I had a good guess at what the size of the string would be, I could have preallocated my StringBuilder like so:
for (int i = 0; i < num; i++)
{
    // pre-declare StringBuilder to have 100 char buffer.
    var builder = new StringBuilder(100);
    builder.Append("This is test #");
    builder.Append(i);
    builder.Append(" with a result of ");
    builder.Append(strings[i]);
    results[i] = builder.ToString();
}
Now let’s look at the times:
Operator + - Time: 551, Memory: 56104412/55595960 - 500000
string.Format() - Time: 753, Memory: 57296484/55595960 - 500000
StringBuilder - Time: 525, Memory: 59779156/55595960 - 500000
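Whoa! All of a sudden StringBuilder is back on top again! But notice, it takes more memory now. This makes perfect sense if you examine the IL behind the scenes. Whenever you do a string concat (+) in your code, the compiler turns it into a call to string.Concat(), which adds up the lengths of the arguments and allocates a result string of exactly the right size for you. The pre-sized StringBuilder avoids having to grow its buffer, which would explain the speed, but a 100-character buffer that is larger than the strings actually need would also explain the extra memory.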
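To make the point about the + operator concrete: as far as I know, the C# compiler lowers a chain of + on strings into a call to one of the string.Concat() overloads, though which overload it picks varies by compiler version. The sketch below writes both forms out by hand. The ConcatDemo class and its method names are made up for illustration, and the explicit i.ToString() is my approximation of what the compiler arranges for the integer.

```csharp
using System;

// Hypothetical demo class, written only to compare the two forms.
static class ConcatDemo
{
    // The form used in the concatenation test.
    public static string WithOperator(int i, string s)
    {
        return "This is test #" + i + " with a result of " + s;
    }

    // Roughly what the operator form boils down to: a single Concat call
    // that can size the result from the lengths of its arguments.
    public static string WithConcat(int i, string s)
    {
        return string.Concat("This is test #", i.ToString(), " with a result of ", s);
    }
}
```

Both methods return the same string for the same inputs, so the choice between them is purely about how the code reads.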
But even IF we know the approximate size of our StringBuilder, look how much less readable it is! That’s why I feel you should always take into account both readability and performance. After all, these timings are totals over 500,000 iterations. The gap between the fastest and the slowest works out to roughly 0.0004 ms per call, which is negligible.
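The per-call figure is just the gap between two totals divided by the iteration count. Here is that arithmetic as a tiny sketch, assuming the Time values reported above are milliseconds. The PerCall class is a hypothetical name.

```csharp
// Hypothetical helper for the back-of-the-envelope arithmetic.
static class PerCall
{
    // Difference between two total timings, spread over the iterations.
    public static double DifferenceMs(long slowestMs, long fastestMs, int iterations)
    {
        return (slowestMs - fastestMs) / (double)iterations;
    }
}
```

For the first run, PerCall.DifferenceMs(749, 547, 500000) comes to 0.000404 ms. For the second run, PerCall.DifferenceMs(753, 525, 500000) comes to 0.000456 ms.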
The key is to pick the best tool for the job. What do I mean? Consider these awesome words of wisdom:
Concatenate (+) is best at concatenating.
StringBuilder is best at building.
Format is best at formatting.
Totally Earth-shattering, right? But if you consider it carefully, it actually has a lot of beauty in its simplicity. Remember, there is no magic bullet. If one of these always beat the others, we’d have only one choice and not three.
The fact is, the concatenation operator (+) has been optimized for speed and looks the cleanest for joining together a known set of strings in the simplest manner possible.
StringBuilder, on the other hand, excels when you need to build a string of indeterminate length. Use it in those times when you are looping until you hit a stop condition and building up a result, and it won’t steer you wrong.
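Here is a minimal sketch of that kind of loop. The BuilderDemo class, the JoinUntilStop method, and the "STOP" sentinel are all invented for the example. The point is only that the number of appends isn't known when the loop starts.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical example of building a string of unknown final length.
static class BuilderDemo
{
    // Joins words with single spaces until the sentinel word "STOP" is seen.
    public static string JoinUntilStop(IEnumerable<string> words)
    {
        var builder = new StringBuilder();
        foreach (var word in words)
        {
            // Stop condition: the sentinel ends the loop and is not appended.
            if (word == "STOP") break;

            // Add a separator before every word except the first.
            if (builder.Length > 0) builder.Append(' ');
            builder.Append(word);
        }
        return builder.ToString();
    }
}
```

Doing the same thing with += inside the loop would create a brand new string on every pass, which is exactly the case StringBuilder was designed to avoid.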
String.Format seems to be the loser from the stats, but consider which of these is more readable. (Yes, ignore the fact that you could do this with ToString() on a DateTime.)
// build a date via concatenation
var date1 = (month < 10 ? "0" : string.Empty) + month + '/'
    + (day < 10 ? "0" : string.Empty) + day + '/' + year;

// build a date via string builder
var builder = new StringBuilder(10);
if (month < 10) builder.Append('0');
builder.Append(month);
builder.Append('/');
if (day < 10) builder.Append('0');
builder.Append(day);
builder.Append('/');
builder.Append(year);
var date2 = builder.ToString();

// build a date via string.Format
var date3 = string.Format("{0:00}/{1:00}/{2:0000}", month, day, year);
So the strength in string.Format is that it makes constructing a formatted string easy to read. Yes, it’s slower, but look at how much more elegant it is to do zero-padding and anything else string.Format does.
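For completeness, here is the string.Format version next to the DateTime.ToString() route mentioned earlier, wrapped in methods so they can be compared directly. The DateDemo class and its method names are hypothetical. I pass CultureInfo.InvariantCulture to ToString() because the "/" in a custom date format is otherwise replaced by the current culture's date separator.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class comparing two ways to produce MM/dd/yyyy.
static class DateDemo
{
    // Zero-pads each part using numeric format specifiers.
    public static string ViaFormat(int month, int day, int year)
    {
        return string.Format("{0:00}/{1:00}/{2:0000}", month, day, year);
    }

    // Lets DateTime do the formatting; invariant culture keeps "/" literal.
    public static string ViaDateTime(int month, int day, int year)
    {
        return new DateTime(year, month, day)
            .ToString("MM/dd/yyyy", CultureInfo.InvariantCulture);
    }
}
```

Both DateDemo.ViaFormat(3, 7, 2010) and DateDemo.ViaDateTime(3, 7, 2010) give "03/07/2010". The DateTime version also rejects impossible dates, which may or may not be what you want.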
So my lesson is, don’t look for the silver bullet! Choose the best tool. Micro-optimization almost always bites you in the end because you’re sacrificing readability for performance, which is the wrong trade-off about 90% of the time.
I love the rules of optimization. They’ve been stated before in many forms, but here’s how I always remember them:
For Beginners: Do not optimize.
For Experts: Do not optimize yet.
It’s so true. Most of the time on today’s modern hardware, a microsecond optimization made at the expense of readability will net you nothing because it won’t be your bottleneck. Code for readability, and choose the best tool for the job, which will usually be the most readable and maintainable as well. Then, and only then, if you need that extra performance boost after profiling your code and exhausting all other options, you can start to think about optimizing.