Why can't we reach some (but not all) external web service via VPN connection?
- by Paul Haldane
At work (UK university) we use a set of Windows servers running WS2008R2 and RRAS which offer VPN service to students in our accommodation. We do this to associate the network connections with individuals. Before they've connected to the VPN all they can talk to is the stuff thats needed to setup the VPN and a local web site with documentation on how to connect. Medium term we'll probably replace this but it's what we're using at the moment.
VPN on the 2008 servers allocates client a private (10.x) address. Access to external sites is through NAT on the campus routers (same as any other directly connected client on a private address). Non-VPN connections aren't seeing this problem.
Older servers run WS 2003 and ISA2004. That setup works but has become unreliable under load. Big difference there was that we were allocating non-RFC1918 addresses to the clients (so no NAT required).
Behaviour we're seeing is that once connected to the VPN, clients can reach local web sites (that is sites on the campus network) but only some external sites. It seems (but this may be chance) that the sites we can reach are Google ones (including YouTube). We certainly have trouble reaching Microsoft's Office 365 service (which is a pain because that's where mail for most of our students is).
One odd bit of behaviour is that clients can fetch (using wget on a Windows 7 client) http://www.oracle.com/ (which gets a 301 redirect) but hangs when asked to fetch http://www.oracle.com/index.html (which is what the first URL redirects to). Access works reliably if we configure clients to use our local web proxies (Squid).
My gut tells me that this is likely to be something in the chain dropping replies either based on HTTP inspection or the IP address in the reply. However I'm puzzled about why we're seeing this with the VPN clients.
Plan for tomorrow (when I'm back in the office) is to setup a web server on external connection so that we can monitor behaviour at both ends of the conversation (hoping that the problem manifests itself with our test server).
Any suggestions for things we should be looking at?