Varnish 3.0.2 to Apache2 sometimes return error 503

Posted by Ronnie Jespersen on Server Fault See other posts from Server Fault or by Ronnie Jespersen
Published on 2012-06-06T13:04:27Z Indexed on 2012/06/17 21:18 UTC
Read the original article Hit count: 464

Filed under:
|
|
|

Hey guys I hope you can help me out here.

I have an Ngingx parsing http and https to a varnish cache(3.0.2). From the varnish it is sent to apache2. Now I have for some time been tracking some strange 503 errors. But I cant seem to find the silver bullet.

Currently I am logging the 503 errors through varnish this way:

sudo varnishlog -c -m TxStatus:503 >> /home/rj/varnishlog503.log

and then referring to the apache access log to see if any 503 requests have been handled.

Today I had a health check from the firewall that failed:

   20 SessionOpen  c 127.0.0.1 34319 :8081
   20 ReqStart     c 127.0.0.1 34319 607335635
   20 RxRequest    c HEAD
   20 RxURL        c /health-check
   20 RxProtocol   c HTTP/1.0
   20 RxHeader     c X-Real-IP: 192.168.3.254
   20 RxHeader     c Host: 192.168.3.189
   20 RxHeader     c X-Forwarded-For: 192.168.3.254
   20 RxHeader     c Connection: close
   20 RxHeader     c User-Agent: Astaro Service Monitor 0.9
   20 RxHeader     c Accept: */*
   20 VCL_call     c recv lookup
   20 VCL_call     c hash
   20 Hash         c /health-check
   20 VCL_return   c hash
   20 VCL_call     c miss fetch
   20 Backend      c 33 aurum aurum
   20 FetchError   c http first read error: -1 11 (No error recorded)
   20 VCL_call     c error deliver
   20 VCL_call     c deliver deliver
   20 TxProtocol   c HTTP/1.1
   20 TxStatus     c 503
   20 TxResponse   c Service Unavailable
   20 TxHeader     c Server: Varnish
   20 TxHeader     c Content-Type: text/html; charset=utf-8
   20 TxHeader     c Retry-After: 5
   20 TxHeader     c Content-Length: 879
   20 TxHeader     c Accept-Ranges: bytes
   20 TxHeader     c Date: Wed, 06 Jun 2012 12:35:12 GMT
   20 TxHeader     c X-Varnish: 607335635
   20 TxHeader     c Age: 60
   20 TxHeader     c Via: 1.1 varnish
   20 TxHeader     c Connection: close
   20 Length       c 879
   20 ReqEnd       c 607335635 1338986052.649786949 1338986112.648169994 0.000160217 59.997980356 0.000402689

Now the backend server (apache) does not have any 503 error in the access log at this point. So I am confused. Is this varnish throwing a 503 because it thinks apache is to slow? There is a lot traffic coming through at this point so I know the server is up and running.

I do have other 503 error codes with posts and gets so there is really no pattern. It seems to be at random times and random requests. Even in the morning when the server dosen't seem to be doing anything.

I do see another pattern in the log:

4 VCL_call     c recv pass
4 VCL_call     c hash
4 Hash         c /?id=412
4 VCL_return   c hash
4 VCL_call     c pass pass
4 FetchError   c no backend connection
4 VCL_call     c error deliver
4 VCL_call     c deliver deliver

Here fetcherror says "no backend connection". A summery of the FetchErrors in todays log:

   16 FetchError   c http first read error: -1 11 (No error recorded)
    5 FetchError   c http first read error: -1 11 (No error recorded)
    4 FetchError   c http first read error: -1 11 (No error recorded)
   19 FetchError   c http first read error: -1 11 (No error recorded)
    5 FetchError   c http first read error: -1 11 (No error recorded)
   23 FetchError   c http first read error: -1 11 (No error recorded)
   24 FetchError   c http first read error: -1 11 (No error recorded)
   16 FetchError   c http first read error: -1 11 (No error recorded)
    6 FetchError   c http first read error: -1 11 (No error recorded)
    4 FetchError   c http first read error: -1 11 (No error recorded)
    5 FetchError   c http first read error: -1 11 (No error recorded)
    4 FetchError   c http first read error: -1 11 (No error recorded)
    4 FetchError   c http first read error: -1 11 (No error recorded)
   22 FetchError   c http first read error: -1 11 (No error recorded)
    6 FetchError   c http first read error: -1 11 (No error recorded)
   21 FetchError   c http first read error: -1 11 (No error recorded)
   26 FetchError   c no backend connection
    4 FetchError   c no backend connection
   20 FetchError   c http first read error: -1 11 (No error recorded)
   39 FetchError   c http first read error: -1 11 (No error recorded)

I haven't changed the default timeout values for varnish. This is my configuration for one of the backend servers.

backend xenon {
        .host = "192.168.3.187";
        .port = "80";
        .probe = {
                .url = "/health-check/";
                .interval = 3s;
                .window = 5;
                .threshold = 2;
        }
}

I'm running prefork module on apache2 with this configuration

<IfModule mpm_prefork_module>
    StartServers          1
    MinSpareServers       2
    MaxSpareServers       5
    MaxClients           200
    MaxRequestsPerChild  75
</IfModule>

and only PHP files is sent to the server. Every other static file is handled by Nginx.

Any ideas?

------- EDIT --------------

Some more debuging information

I have run a varnishadm debug.health

Backend radon is Healthy
Current states  good:  5 threshold:  2 window:  5
Average responsetime of good probes: 0.002560
Oldest                                                    Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy
Backend xenon is Healthy
Current states  good:  5 threshold:  2 window:  5
Average responsetime of good probes: 0.002760
Oldest                                                    Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy
Backend iridium is Healthy
Current states  good:  5 threshold:  2 window:  5
Average responsetime of good probes: 0.000849
Oldest                                                    Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy
Backend aurum is Healthy
Current states  good:  5 threshold:  2 window:  5
Average responsetime of good probes: 0.002100
Oldest                                                    Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy

And I have been monitoring varnishstat from the two load balancers

 3224774         3.99         2.61 backend_conn - Backend conn. success
      27         0.00         0.00 backend_unhealthy - Backend conn. not attempted
      63         0.00         0.00 backend_fail - Backend conn. failures
  358798         0.00         0.29 backend_reuse - Backend conn. reuses
   21035         0.00         0.02 backend_toolate - Backend conn. was closed
  379834         0.00         0.31 backend_recycle - Backend conn. recycles
      26         0.00         0.00 backend_retry - Backend conn. retry

3217751         5.99         2.61 backend_conn - Backend conn. success
      32         0.00         0.00 backend_fail - Backend conn. failures
  364185         0.00         0.30 backend_reuse - Backend conn. reuses
   27077         0.00         0.02 backend_toolate - Backend conn. was closed
  391263         0.00         0.32 backend_recycle - Backend conn. recycles
      36         0.00         0.00 backend_retry - Backend conn. retry

Notice that none of them have reported backend_fail.

/Ronnie

© Server Fault or respective owner

Related posts about apache2

Related posts about nginx