Do Not Optimize Without Measuring
- by Alois Kraus
Recently I had to do some performance work that included reading a lot of code. It is fascinating what ideas people come up with to solve a problem, especially when there is no problem. When you look at other people's code, you cannot tell whether it performs well just by reading it. You need to execute it with some sort of tracing or, even better, under a profiler. The first rule of the performance club is not to think and then optimize, but to measure, think, and then optimize. The second rule is to do this in a loop, so that bad things do not stay in your code base for too long.
If for some reason you skip the measuring step and optimize directly, it is like changing the wave function in quantum mechanics. This has no observable effect in our world, since the wave function represents only a probability distribution of all possible values. In quantum mechanics you need to let the wave function collapse to a single value. A collapsed wave function therefore has not many values but one distinct value. This is what we physicists call a measurement.
If you optimize your application without measuring it, you are just changing the probability distribution of your potential performance values. Which performance your application actually has is still unknown. You only know that it will be within a specific range with a certain probability. As usual, there are unlikely values within your distribution, like a startup time of 20 minutes that should happen only once in 100 000 years. 100 000 years is a very short time when the first customer tries to run your heavily distributed networking application over a slow WIFI network…
What is the point of this? Every programmer or architect has a mental performance model in their head. A model always has a set of explicit preconditions and many more implicit assumptions baked into it. When the model is good, it helps you think of good designs, but it can also be the source of problems.
In real-world systems, not all assumptions of your performance model (implicit or explicit) still hold. The only way to connect your performance model to the real world is to measure. In the WIFI example the model assumed a low-latency, high-bandwidth LAN connection. Once this assumption no longer held, the startup time of the system changed drastically.
Let's look at an example. Assume we want to cache some expensive UI resources such as font objects. For this we create a cache class containing the UI themes we want to support. Since fonts are expensive objects, we create them on demand the first time a theme is requested. A simple example of a theme cache might look like this:
using System;
using System.Collections.Generic;
using System.Drawing;
struct Theme
{
    public Color Color;
    public Font Font;
}
static class ThemeCache
{
    static Dictionary<string, Theme> _Cache = new Dictionary<string, Theme>
    {
        {"Default", new Theme { Color = Color.AliceBlue }},
        {"Theme12", new Theme { Color = Color.Aqua }},
    };
    public static Theme Get(string theme)
    {
        Theme cached = _Cache[theme];
        if (cached.Font == null)
        {
            Console.WriteLine("Creating new font");
            cached.Font = new Font("Arial", 8);
        }
        return cached;
    }
}
class Program
{
    static void Main(string[] args)
    {
        Theme item = ThemeCache.Get("Theme12");
        item = ThemeCache.Get("Theme12");
    }
}
This cache creates each font object only once, since the font is added to the Theme object the first time the theme is retrieved. When we run the application, it should print “Creating new font” only once. Right?
Wrong!
Vigilant readers have spotted the issue already. The creator of this cache class wanted to get maximum performance, so he decided that Theme should be a value type (struct) in order not to put too much pressure on the garbage collector.
The code
    Theme cached = _Cache[theme];
    if (cached.Font == null)
    {
        Console.WriteLine("Creating new font");
        cached.Font = new Font("Arial", 8);
    }
works on a copy of the value stored in the dictionary. This means we mutate a copy of the Theme object and return it to our caller, but the original Theme object in the dictionary will always have null in its Font field! The solution is to change the declaration from struct Theme to class Theme, or to write the updated Theme object back into the dictionary. As it stands, our cache is actually a non-caching cache. The funny thing is that I found this out with a profiler, by looking at which objects were finalized. I found far too many Font objects being finalized. After a bit of debugging I found that the allocation source of these Font objects was this cache. Since this cache had been there for years, it means that
the cache was never needed, since I found no performance issue caused by the creation of font objects.
the cache was never profiled to check whether it brought any performance gain.
to make the cache beneficial, it would need to be accessed much more often.
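The second of the two fixes mentioned above, writing the modified copy back into the dictionary, can be sketched as follows. This is only a minimal illustration and not the original production code. FakeFont is a hypothetical stand-in for System.Drawing.Font so that the sketch runs without a UI framework, and its Created counter is an assumed, very cheap form of measurement. The Color field is simplified to a string for the same reason.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for System.Drawing.Font (not a real framework type).
sealed class FakeFont
{
    // Counts how many instances were constructed: a cheap "measurement".
    public static int Created;
    public string Name { get; }

    public FakeFont(string name)
    {
        Name = name;
        Created++;
    }
}

// Still a value type, as in the original example.
struct Theme
{
    public string Color;   // simplified to a string for this sketch
    public FakeFont Font;
}

static class ThemeCache
{
    static readonly Dictionary<string, Theme> _Cache = new Dictionary<string, Theme>
    {
        { "Default", new Theme { Color = "AliceBlue" } },
        { "Theme12", new Theme { Color = "Aqua" } },
    };

    public static Theme Get(string theme)
    {
        // The indexer returns a copy of the struct stored in the dictionary.
        Theme cached = _Cache[theme];
        if (cached.Font == null)
        {
            cached.Font = new FakeFont("Arial");
            // Write the modified copy back, otherwise the stored value
            // keeps its null Font and every call creates a new font.
            _Cache[theme] = cached;
        }
        return cached;
    }
}

class Program
{
    static void Main()
    {
        for (int i = 0; i < 1000; i++)
        {
            ThemeCache.Get("Theme12");
        }
        // Measure instead of assuming: how many fonts were really created?
        Console.WriteLine("Fonts created: " + FakeFont.Created);
    }
}
```

With the write-back in place the counter stays at 1 after 1000 lookups. If you remove the line `_Cache[theme] = cached;` it climbs to 1000, which is exactly the kind of number that reading the code alone did not reveal.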
That was the story of the non-caching cache. Next time I will write something about measuring.