Is there a bug with Apache 2.2 and content filters (and maybe mod_proxy)?
- by asciiphil
I'm running Apache 2.2.15-29 on RHEL 6 (actually Scientific Linux 6.4) and I'm trying to set up a reverse proxy with content rewriting so all of the links on the proxied web pages are rewritten to reference the proxy host. I'm running into a problem with some of the content rewriting and I'd like to know if this is a bug or if I'm doing something wrong (and how to do it right, if applicable).
I'm proxying a subdirectory on an internal host (internal.example.com/foo) onto the root of an external host (external.example.com). I need to rewrite HTML, CSS, and JavaScript content to fix all of the URLs. I'm also hosting some content locally on the external host; I don't think that's a problem, but I mention it for completeness.
My httpd.conf looks roughly like this:
<VirtualHost *:80>
    ServerName external.example.com
    ServerAlias example.com

    # Serve all local content directly, reverse-proxy all unknown URIs.
    RewriteEngine On
    RewriteRule ^(/(index.html?)?)?$ http://internal.example.com/foo/ [P]
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f [OR]
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -d
    RewriteRule ^.*$ - [L]
    RewriteRule ^/~ - [L]
    RewriteRule ^(.*)$ http://internal.example.com$1 [P]

    # Standard header rewriting.
    ProxyPassReverse / http://internal.example.com/foo/
    ProxyPassReverseCookieDomain internal.example.com external.example.com
    ProxyPassReverseCookiePath /foo/ /

    # Strip any Accept-Encoding: headers from the client so we can process the pages
    # as plain text.
    RequestHeader unset Accept-Encoding

    # Use mod_proxy_html to fix URLs in text/html content.
    ProxyHTMLEnable On
    ProxyHTMLURLMap http://internal.example.com/foo/ /
    ProxyHTMLURLMap http://internal.example.com/foo /
    ProxyHTMLURLMap /foo/ /

    ## Use mod_substitute to fix URLs in CSS and Javascript
    #<Location />
    #    AddOutputFilterByType SUBSTITUTE text/css
    #    AddOutputFilterByType SUBSTITUTE text/javascript
    #    Substitute "s|http://internal.example.com/foo/|/|nq"
    #</Location>

    # Use mod_ext_filter to fix URLs in CSS and Javascript
    ExtFilterDefine fixurlcss mode=output intype=text/css cmd="/bin/sed -rf /etc/httpd/fixurls"
    ExtFilterDefine fixurljs mode=output intype=text/javascript cmd="/bin/sed -rf /etc/httpd/fixurls"
    <Location />
        SetOutputFilter fixurlcss;fixurljs
    </Location>
</VirtualHost>
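To make the intent of the CSS and JavaScript filters concrete, here is a minimal Python sketch of the substitution they are meant to perform. It mirrors the three ProxyHTMLURLMap rules above; it is not the contents of /etc/httpd/fixurls (that sed script is not reproduced here), and the function name `rewrite_urls` is made up for illustration. Unlike mod_proxy_html, a plain text substitution like this cannot tell where a URL starts, so it would also rewrite a `/foo/` that appears in the middle of a longer path.

```python
import re

# Assumption: these patterns mirror the ProxyHTMLURLMap rules in the config.
# The absolute forms come first so the bare "/foo/" alternative does not
# match inside an absolute URL that has not been rewritten yet.
_INTERNAL_PREFIX = re.compile(
    r"http://internal\.example\.com/foo/?"  # absolute URL, with or without trailing slash
    r"|/foo/"                               # host-relative path
)


def rewrite_urls(text: str) -> str:
    """Replace references to the internal /foo/ tree with the proxy root."""
    return _INTERNAL_PREFIX.sub("/", text)
```

Running the function over a CSS or JavaScript file and diffing the result against what the proxy serves is one way to separate "the substitution is wrong" from "the response is being truncated".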
The text/html rewriting works just fine. The CSS and JavaScript rewriting does not: when I use either mod_substitute or mod_ext_filter, the external server sends the response as Transfer-Encoding: chunked, sends all of the data, and then closes the connection without sending the final, zero-length chunk. Some HTTP clients treat such a response as incomplete. (Chrome, for example, won't process any content sent this way, so the pages don't get CSS applied to them.)
Here's a sample wget session:
$ wget -O /dev/null -S http://external.example.com/include/jquery.js
--2013-11-01 11:36:36-- http://external.example.com/include/jquery.js
Resolving external.example.com (external.example.com)... 192.168.0.1
Connecting to external.example.com (external.example.com)|192.168.0.1|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 01 Nov 2013 15:36:36 GMT
  Server: Apache
  Last-Modified: Tue, 29 Oct 2013 13:09:10 GMT
  ETag: "1d60026-187b8-4e9e0ec273e35"
  Accept-Ranges: bytes
  Vary: Accept-Encoding
  X-UA-Compatible: IE=edge,chrome=1
  Content-Type: text/javascript;charset=utf-8
  Connection: close
  Transfer-Encoding: chunked
Length: unspecified [text/javascript]
Saving to: `/dev/null'
[ <=> ] 100,280 --.-K/s in 0.005s
2013-11-01 11:36:37 (19.8 MB/s) - Read error at byte 100280 (Success).Retrying.
--2013-11-01 11:36:38-- (try: 2) http://external.example.com/include/jquery.js
Connecting to external.example.com (external.example.com)|192.168.0.1|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 416 Requested Range Not Satisfiable
  Date: Fri, 01 Nov 2013 15:36:38 GMT
  Server: Apache
  Vary: Accept-Encoding
  Content-Type: text/html;charset=utf-8
  Content-Length: 260
  Connection: close
The file is already fully retrieved; nothing to do.
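The missing terminator can also be checked without relying on wget's error message. The following is a minimal Python sketch, not a full HTTP client: `fetch_raw_body` and `parse_chunked` are names made up for this example, and it assumes the response really is chunked (check the returned headers for Transfer-Encoding: chunked first). `parse_chunked` walks the chunk framing and reports whether the zero-length chunk was seen before the data ran out; it ignores trailers.

```python
import socket


def fetch_raw_body(host, path, port=80):
    """Send a bare HTTP/1.1 GET and return (raw_headers, raw_body) undecoded."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        data = b""
        while True:
            block = sock.recv(65536)
            if not block:  # server closed the connection
                break
            data += block
    headers, _, body = data.partition(b"\r\n\r\n")
    return headers, body


def parse_chunked(raw):
    """Decode a chunked body; return (payload, terminated).

    terminated is True only if the zero-length final chunk was present.
    """
    payload = bytearray()
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            return bytes(payload), False  # data ended where a chunk-size line should be
        size_field = raw[pos:eol].split(b";", 1)[0].strip()  # drop chunk extensions
        try:
            size = int(size_field, 16)
        except ValueError:
            return bytes(payload), False  # not a valid hexadecimal chunk size
        pos = eol + 2
        if size == 0:
            return bytes(payload), True  # final zero-length chunk seen
        chunk = raw[pos:pos + size]
        payload += chunk
        if len(chunk) < size:
            return bytes(payload), False  # chunk cut short
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

Calling `parse_chunked(fetch_raw_body("external.example.com", "/include/jquery.js")[1])` against the proxy would be expected to return the complete payload together with `False` if the behavior described above is what is happening on the wire.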
Am I doing something wrong? Am I hitting some sort of Apache bug? What do I need to do to get it working? (Note that I'd prefer solutions that work within RHEL-6-packaged RPMs and upgrading to Apache 2.4 would be a last resort, as we have a lot of infrastructure built around 2.2 on this system at the moment.)