Monitoring slow nginx/unicorn requests

Posted by injekt on Server Fault See other posts from Server Fault or by injekt
Published on 2011-04-24T20:00:41Z Indexed on 2012/05/31 4:42 UTC
Read the original article Hit count: 632

Filed under:
|
|
|
|

I'm currently using Nginx to proxy requests to a Unicorn server running a Sinatra application. The application only has a couple of routes defined, those of which make fairly simple (non costly) queries to a PostgreSQL database, and finally return data in JSON format, these services are being monitored by God.

I'm currently experiencing extremely slow response times from this application server. I have another two Unicorn servers being proxied via Nginx, and these are responding perfectly fine, so I think I can rule out any wrong doing from Nginx.

Here is my God configuration:

# God configuration

APP_ROOT = File.expand_path '../', File.dirname(__FILE__)

God.watch do |w|
  w.name = "app_name"
  w.interval = 30.seconds # default

  w.start = "cd #{APP_ROOT} && unicorn -c #{APP_ROOT}/config/unicorn.rb -D"

  # -QUIT = graceful shutdown, waits for workers to finish their current request before finishing
  w.stop = "kill -QUIT `cat #{APP_ROOT}/tmp/unicorn.pid`"

  w.restart = "kill -USR2 `cat #{APP_ROOT}/tmp/unicorn.pid`"

  w.start_grace = 10.seconds
  w.restart_grace = 10.seconds
  w.pid_file = "#{APP_ROOT}/tmp/unicorn.pid"

  # User under which to run the process
  w.uid = 'web'
  w.gid = 'web'

  # Cleanup the pid file (this is needed for processes running as a daemon)
  w.behavior(:clean_pid_file)

  # Conditions under which to start the process
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.interval = 5.seconds
      c.running = false
    end
  end

  # Conditions under which to restart the process
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 150.megabytes
      c.times = [3, 5] # 3 out of 5 intervals
    end

    restart.condition(:cpu_usage) do |c|
      c.above = 50.percent
      c.times = 5
    end
  end

  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minute
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 2.hours
    end
  end
end

Here is my Unicorn configuration:

# Unicorn configuration file

APP_ROOT = File.expand_path '../', File.dirname(__FILE__)

worker_processes 8

preload_app true

pid "#{APP_ROOT}/tmp/unicorn.pid"

listen 8001

stderr_path "#{APP_ROOT}/log/unicorn.stderr.log"
stdout_path "#{APP_ROOT}/log/unicorn.stdout.log"

before_fork do |server, worker|
  old_pid = "#{APP_ROOT}/tmp/unicorn.pid.oldbin"

  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

I have checked God status logs but it appears CPU and Memory Usage are never out of bounds. I also have something to kill high memory workers, which can be found on the GitHub blog page here.

When running a tail -f on the Unicorn logs I see some requests, but they're far and few between, when I was at around 60-100 a second before this trouble seemed to have arrived. This log also shows workers being reaped and started as expected.

So my question is, how would I go about debugging this? What are the next steps I should be taking? I'm extremely baffled that the server will sometimes respond quickly, but at others time it's very slow, for long periods of time (which may or may not be peak traffic times).

Any advice is much appreciated.

© Server Fault or respective owner

Related posts about nginx

Related posts about postgresql