Why does one loop take longer to detect a shared memory update than another loop?

Posted by Joseph Garvin on Stack Overflow
Published on 2010-03-26T15:26:20Z

I've written a 'server' program that writes to shared memory, and a client program that reads from that memory. The server has different 'channels' that it can write to, which are just different linked lists that it appends items to. The client is interested in some of the linked lists, and wants to read every node that's added to those lists as it comes in, with the minimum latency possible.

I have 2 approaches for the client:

1. For each linked list, the client keeps a 'bookmark' pointer to keep its place within that list. It round-robins the linked lists, iterating through all of them over and over (it loops forever), moving each bookmark one node forward each time if it can. Whether it can is determined by the value of the node's 'next' member: if it's non-null, then jumping to the next node is safe (the server switches it from null to non-null atomically). This approach works OK, but if there are a lot of lists to iterate over and only a few of them are receiving updates, the latency gets bad.
2. The server gives each list a unique ID. Each time the server appends an item to a list, it also appends the ID number of that list to a master 'update list'. The client keeps only one bookmark, into the update list. It spins until the bookmark's next pointer is non-null ( while(node->next_ == NULL) {} ), then moves ahead, reads the ID given, and processes the new node on the linked list that has that ID. This, in theory, should handle large numbers of lists much better, because the client doesn't have to iterate over all of them each time.
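To make the two approaches concrete, here is a minimal single-process sketch of the polling logic. It is not the code from my programs: the Node layout, the function names (append, try_advance, poll_all_lists, poll_update_list) and the use of std::atomic are assumptions for illustration, list IDs are assumed to be indices 0..n-1, and real shared memory would typically need offsets rather than raw pointers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node layout; the real nodes live in shared memory.
struct Node {
    int payload = 0;                    // item data, or a list ID on the update list
    std::atomic<Node*> next_{nullptr};  // null until the server publishes a successor
};

// Server side: link a new node after the tail with a single atomic store.
inline Node* append(Node* tail, Node* fresh) {
    tail->next_.store(fresh, std::memory_order_release);
    return fresh;
}

// Client side: move a bookmark forward by one node if one has been published.
inline bool try_advance(Node*& bookmark) {
    Node* next = bookmark->next_.load(std::memory_order_acquire);
    if (next == nullptr) return false;
    bookmark = next;
    return true;
}

// Approach 1: one round-robin pass over every list's bookmark.
// Returns how many bookmarks moved during this pass.
inline std::size_t poll_all_lists(std::vector<Node*>& bookmarks) {
    std::size_t moved = 0;
    for (Node*& b : bookmarks) {
        if (try_advance(b)) ++moved;
    }
    return moved;
}

// Approach 2: one non-blocking step on the master update list.
// Returns the ID of the list that was advanced, or -1 if nothing was published.
inline int poll_update_list(Node*& update_bookmark, std::vector<Node*>& bookmarks) {
    if (!try_advance(update_bookmark)) return -1;
    int id = update_bookmark->payload;  // assumed to be a valid index into bookmarks
    try_advance(bookmarks[static_cast<std::size_t>(id)]);
    return id;
}
```

In this sketch the client for approach 1 would call poll_all_lists forever, while the client for approach 2 would call poll_update_list forever, touching only the lists whose IDs show up.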

When I benchmarked the latency of both approaches (using gettimeofday), to my surprise #2 was terrible. The first approach, for a small number of linked lists, would often be under 20us of latency. The second approach would have brief stretches of low latency but would often be between 4,000 and 7,000us!
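For reference, a latency measurement of this kind can be sketched as follows. This is an assumed reconstruction, not my actual benchmark code: the helper names now and elapsed_us are hypothetical, and the idea is that the server stores a gettimeofday timestamp in each node when it publishes it, and the client takes a second timestamp when it sees the node.

```cpp
#include <sys/time.h>
#include <cassert>
#include <cstdint>

// Current wall-clock time as reported by gettimeofday (POSIX).
inline timeval now() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return tv;
}

// Microseconds from start to end; handles the tv_usec field wrapping
// around a second boundary because the two fields are combined first.
inline std::int64_t elapsed_us(const timeval& start, const timeval& end) {
    const std::int64_t sec  = static_cast<std::int64_t>(end.tv_sec) - start.tv_sec;
    const std::int64_t usec = static_cast<std::int64_t>(end.tv_usec) - start.tv_usec;
    return sec * 1000000 + usec;
}
```

Note that gettimeofday reads the wall clock, which can be adjusted while the program runs, so individual samples are only as trustworthy as the clock is stable.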

By inserting gettimeofday calls here and there, I've determined that all of the added latency in approach #2 is spent in the loop that repeatedly checks whether the next pointer is non-null. This is puzzling to me; it's as if the change in one process takes longer to 'publish' to the second process with the second approach. I assume there's some sort of cache interaction going on that I don't understand. What's going on?
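The loop in question can be sketched with instrumentation like this. It is an illustrative assumption rather than my real code: the Node layout, the spin_for_next name, the max_spins cap (added only so the sketch cannot hang) and the std::atomic acquire load are all hypothetical, and the sketch does not by itself explain the latency.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical node layout; the real nodes live in shared memory.
struct Node {
    int payload = 0;
    std::atomic<Node*> next_{nullptr};  // set from null to non-null by the server
};

// Poll node->next_ until it is non-null or max_spins polls have seen null.
// On return, spins holds the number of polls that saw null.
// Returns the published successor, or nullptr if the cap was reached.
inline Node* spin_for_next(const Node* node, std::uint64_t max_spins,
                           std::uint64_t& spins) {
    spins = 0;
    while (spins < max_spins) {
        Node* next = node->next_.load(std::memory_order_acquire);
        if (next != nullptr) return next;
        ++spins;
    }
    return nullptr;
}
```

Recording spins alongside the elapsed time for each wait shows whether a slow sample corresponds to many fast polls or to a few polls that were themselves slow.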


Tags: shared-memory, latency, multithreading, c++, spinlock
