Put on your reading glasses - this will be a long-ish one.
First, what I'm doing. I'm building a web-app interface for some particularly slow TCP devices. Opening a socket to them takes 200ms and an fwrite/fread cycle takes another 300ms. To avoid paying for both on every request, I'm opening a persistent TCP socket, which cuts the response time by the aforementioned 200ms. I was hoping PHP-FPM would share the persistent connections between requests from different clients (and indeed it does!), but there are some issues I haven't been able to resolve after 2 days of interneting, reading logs and modifying settings. I have somewhat narrowed it down, though.
Setup:
Ubuntu 13.04 x64 Server (fully updated) on Linode
PHP 5.5.0-6~raring+1 (fpm-fcgi)
nginx/1.5.2
Relevant config:
nginx:
worker_processes 4;
php-fpm (pool.d):
pm = dynamic
pm.max_children = 2
pm.start_servers = 2
pm.min_spare_servers = 2
Let's go from coarse to fine detail on what happens. After a fresh start I have 4x nginx processes and 2x php5-fpm processes waiting to handle requests. Then I send a request to the script every couple of seconds. The first takes a while to open the socket connection and returns the data in about 500ms, the second returns data in 300ms (yay, it's re-using the socket), the third also succeeds in about 300ms, the fourth gets a 502 Bad Gateway, and so does the fifth. The sixth request returns data once again, but now it takes 500ms again. This repeats for several cycles, after which every 4 requests produce 2x 502 Bad Gateways and 2x 500ms data responses.
If I double all the fpm pool values and have 4x php-fpm processes running, the cycle settles into 4x successful 500ms responses followed by 4x Bad Gateway errors. If I don't use persistent sockets, the issue goes away, but then every request takes 500ms. What I suspect is happening: the persistent socket keeps each php-fpm process from going idle and ties it up, so the next one gets chosen until none are left; as they error out, maybe they get restarted and become available again on the next round-robin pass, but the socket dies with the process. I haven't checked the slowlog yet, but the nginx error log shows lots of this:
*188 recv() failed (104: Connection reset by peer) while reading response header from upstream, client:...
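On the slowlog: it only records requests that run longer than request_slowlog_timeout, which takes seconds, minutes, hours or days as units, so with 300-500ms requests it would most likely stay empty. The FPM master error log is probably more telling: if a worker is dying mid-request (one possible explanation for a reset while nginx reads the response header), the master logs a warning about the child exiting. A sketch of the relevant settings, assuming Ubuntu's default php5 paths and a pool named www (both assumptions):

```ini
; /etc/php5/fpm/php-fpm.conf (assumed Ubuntu default path)
[global]
; child exits and crashes are reported here as WARNING lines
error_log = /var/log/php5-fpm.log

; /etc/php5/fpm/pool.d/www.conf (pool name "www" assumed)
[www]
; send workers' stdout/stderr to the main error log as well
catch_workers_output = yes
; only requests running longer than the timeout are recorded
slowlog = /var/log/php5-fpm.slow.log
request_slowlog_timeout = 1s
```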
All the suggestions on the internet for fixing nginx/php-fpm 502 Bad Gateway errors relate to high load or fastcgi_pass misconfiguration. That is not what's happening here, so the usual fixes (increasing buffers/sizes, changing timeouts, switching fastcgi_pass from a Unix socket to a TCP socket, raising the system's connection limits) don't apply.
I've had some partial success setting pm = ondemand instead of dynamic, but as soon as the initial FPM process gets killed off after idling, the persistent socket is gone for every php-fpm process spawned afterwards. In the PHP script I'm using stream_socket_client() with the STREAM_CLIENT_PERSISTENT flag, a while/stream_select() loop to detect socket data, and fread($sock, 4096) to grab the data. Obviously, I don't call fclose().
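The client side described above can be sketched roughly as follows. This is a minimal sketch under assumptions, not the actual script: open_device(), read_reply(), DEVICE_TIMEOUT and the newline-terminated reply are hypothetical, and a loopback server stands in for the device so the sketch runs on its own. The getmypid()/connect-time log line is an added diagnostic for seeing which FPM child served a request and whether it had to reconnect.

```php
<?php
// Sketch of the pattern described above: one persistent TCP socket per PHP
// process (STREAM_CLIENT_PERSISTENT), a stream_select() loop to wait for data,
// fread($sock, 4096) to collect it, and no fclose().
// Hypothetical: open_device(), read_reply(), DEVICE_TIMEOUT and the
// newline-terminated reply framing are assumptions, not the real protocol.

const DEVICE_TIMEOUT = 2; // seconds; assumed value

function open_device($address)
{
    $start = microtime(true);
    $sock = stream_socket_client($address, $errno, $errstr, DEVICE_TIMEOUT,
        STREAM_CLIENT_CONNECT | STREAM_CLIENT_PERSISTENT);
    if ($sock === false) {
        throw new RuntimeException("connect to $address failed: [$errno] $errstr");
    }
    // Log the worker PID and connect time: a near-zero time suggests the
    // persistent socket was reused. On loopback the connect is always fast,
    // so the number only means something against the real device.
    error_log(sprintf('pid %d: connect took %.1f ms', getmypid(),
        (microtime(true) - $start) * 1000));
    return $sock;
}

function read_reply($sock)
{
    $reply = '';
    while (true) {
        $read = [$sock];
        $write = null;
        $except = null;
        $ready = stream_select($read, $write, $except, DEVICE_TIMEOUT);
        if ($ready === false || $ready === 0) {
            break; // select error or timeout
        }
        $chunk = fread($sock, 4096);
        if ($chunk === false || $chunk === '') {
            break; // peer closed the connection
        }
        $reply .= $chunk;
        if (substr($reply, -1) === "\n") {
            break; // assumed framing: the reply ends with a newline
        }
    }
    return $reply;
}

// Loopback stand-in for the slow device so the sketch runs on its own.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
$address = 'tcp://' . stream_socket_get_name($server, false);

$sock = open_device($address);
$device = stream_socket_accept($server, DEVICE_TIMEOUT);
fwrite($sock, "PING\n");    // request sent by the web-app
fgets($device);             // fake device reads the request...
fwrite($device, "PONG\n");  // ...and answers
$reply = read_reply($sock); // no fclose($sock): the socket stays open
```

Under PHP-FPM the persistent socket lives in the worker process, not in the pool, so each child holds its own connection and a child that exits takes its socket with it.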
If anyone has further questions, advice on how to keep a persistent socket without tying up the php-fpm processes after the request completes, or other things to try, I'd appreciate it.
Some useful links:
Nginx + php-fpm - recv() error
Nginx + php-fpm "504 Gateway Time-out" error with almost zero load (on a test-server)
Nginx + PHP-FPM "error 104 Connection reset by peer" causes occasional duplicate posts
http://www.linuxquestions.org/questions/programming-9/php-pfsockopen-552084/
http://stackoverflow.com/questions/14268018/concurrent-use-of-a-persistent-php-socket
http://devzone.zend.com/303/extension-writing-part-i-introduction-to-php-and-zend/#Heading3
http://stackoverflow.com/questions/242316/how-to-keep-a-php-stream-socket-alive
http://php.net/manual/en/install.fpm.configuration.php
https://www.google.com/search?q=recv%28%29+failed+%28104:+Connection+reset+by+peer%29+while+reading+response+header+from+upstream+%22502%22&ei=mC1XUrm7F4WQyAHbv4H4AQ&start=10&sa=N&biw=1920&bih=953&dpr=1