I've been thinking a lot lately about low-latency, high concurrency servers. Specifically, http servers. http servers (fast ones, anyway) can serve thousands of users simultaneously, with very little latency. So how do they do it?
As near as I can tell, they all use events. Cherokee and Lighttpd use libevent. Nginx uses it's own event library performing much the same function of libevent, that is, picking a platform optimal strategy for polling events (like kqueue on *bsd, epoll on linux, /dev/poll on Solaris, etc).
They all also seem to employ a strategy of multiprocess or multithread once the connection is made - using worker threads to handle the more cpu intensive tasks while another thread continues to listen and handle connections (via events). This is the extent of my understanding and ability to grok the thousand line sources of these applications. What I really want are finer details about how this all works.
In examples of using events I've seen (and written) the events are handling both input and output. To this end, do the workers employ some sort of input/output queue to the event handling thread? Or are these worker threads handling their own input and output? I imagine a fixed amount of worker threads are spawned, and connections are lined up and served on demand, but how does the event thread feed these connections to the workers? I've read about FIFO queues and circular buffers, but I've yet to see any implementations to work from. Are there any? Do any use compare-and-swap instructions to avoid locking or is locking less detrimental to event polling than I think? Or have I misread the design entirely?
Ultimately, I'd like to take enough away to improve some of my own event-driven network services. Bonus points to anyone providing solid implementation details (especially for stuff like low-latency queues) in C, as that's the language my network services are written in.