Why is thread local storage so slow?
Posted
by dsimcha
on Stack Overflow
See other posts from Stack Overflow
or by dsimcha
Published on 2009-02-03T05:28:37Z
Indexed on
2010/03/20
10:51 UTC
Read the original article
Hit count: 480
I'm working on a custom mark-release style memory allocator for the D programming language that works by allocating from thread-local regions. It seems that the thread local storage bottleneck is causing a huge (~50%) slowdown in allocating memory from these regions compared to an otherwise identical single threaded version of the code, even after designing my code to have only one TLS lookup per allocation/deallocation. This is based on allocating/freeing memory a large number of times in a loop, and I'm trying to figure out if it's an artifact of my benchmarking method. My understanding is that thread local storage should basically just involve accessing something through an extra layer of indirection, similar to accessing a variable via a pointer. Is this incorrect? How much overhead does thread-local storage typically have?
Note: Although I mention D, I'm also interested in general answers that aren't specific to D, since D's implementation of thread-local storage will likely improve if it is slower than the best implementations.
© Stack Overflow or respective owner