Understanding VS2010 C# parallel profiling results

Posted by Haggai on Stack Overflow See other posts from Stack Overflow or by Haggai
Published on 2010-05-25T08:10:09Z Indexed on 2010/05/25 8:11 UTC
Read the original article Hit count: 347

Filed under:
|
|
|

I have a program with many independent computations so I decided to parallelize it.

I use Parallel.For/Each.

The results were okay for a dual-core machine - CPU utilization of about 80%-90% most of the time. However, with a dual Xeon machine (i.e. 8 cores) I get only about 30%-40% CPU utilization, although the program spends quite a lot of time (sometimes more than 10 seconds) on the parallel sections, and I see it employs about 20-30 more threads in those sections compared to serial sections. Each thread takes more than 1 second to complete, so I see no reason for them to work in parallel - unless there is a synchronization problem.

I used the built-in profiler of VS2010, and the results are strange. Even though I use locks only in one place, the profiler reports that about 85% of the program's time is spent on synchronization (also 5-7% sleep, 5-7% execution, under 1% IO).

The locked code is only a cache (a dictionary) get/add:

bool esn_found;
lock (lock_load_esn)
    esn_found = cache.TryGetValue(st, out esn);
if(!esn_found)
{
    esn = pData.esa_inv_idx.esa[term_idx];
    esn.populate(pData.esa_inv_idx.datafile);
    lock (lock_load_esn)
    {
        if (!cache.ContainsKey(st))
            cache.Add(st, esn);
    }
}

lock_load_esn is a static member of the class of type Object.
esn.populate reads from a file using a separate StreamReader for each thread.

However, when I press the Synchronization button to see what causes the most delay, I see that the profiler reports lines which are function entrance lines, and doesn't report the locked sections themselves.
It doesn't even report the function that contains the above code (reminder - the only lock in the program) as part of the blocking profile with noise level 2%. With noise level at 0% it reports all the functions of the program, which I don't understand why they count as blocking synchronizations.

So my question is - what is going on here?
How can it be that 85% of the time is spent on synchronization?
How do I find out what really is the problem with the parallel sections of my program?

Thanks.

© Stack Overflow or respective owner

Related posts about c#

Related posts about profiling