I still feel a bit unsafe about the topic and hope you folks can help me - 
For passing data (configuration or results) between a worker thread polling something and a controlling thread interested in the most recent data, I've ended up using more or less the following pattern repeatedly: 
Mutex m;
tData * stage;       // temporary, accessed concurrently
// send data, gives up ownership, receives old stage if any
tData * Send(tData * newData)
{
   ScopedLock lock(m);
   swap(newData, stage);              
   return newData;
}
// receiving thread fetches latest data here
tData * Fetch(tData * prev)
{
  ScopedLock lock(m);
  if (stage != 0)
  {
     // ... release prev
     prev = stage;
     stage = 0;
  }
  return prev;  // now current
}
Note: This is not supposed to be a full producer-consumer queue, only the msot recent data is relevant. Also, I've skimmed ressource management somewhat here.
When necessary I'm using two such stages: one to send config changes to the worker, and for sending back results. 
Now, my questions
assuming that ScopedLock implements a full memory barrier:
do stage and/or workerData need to be volatile?
is volatile necessary for tData members?
can I use smart pointers instead of the raw pointers - say boost::shared_ptr? 
Anything else that can go wrong?
I am basically trying to avoid "volatile infection" spreading into tData, and minimize lock contention (a lock free implementation seems possible, too). However, I'm not sure if this is the easiest solution. 
ScopedLock acts as a full memory barrier. Since all this is more or less platform dependent, let's say Visual C++  x86 or x64, though differences/notes for other platforms are welcome, too.
(a prelimenary "thanks but" for recommending libraries such as Intel TBB - I am trying to understand the platform issues here)