Hiding a Website from Search Engine Bots and Viewers by Disabling Default VirtualHost
- by Basel Shishani
When staging a website on a remote VPS, we would like it to be accessible to team members only, and we would also like to keep search engine bots off until the site is finalized.
Access control by host address, whether in iptables or Apache, is not desirable, as the addresses team members connect from can vary.
After some reading of the Apache configuration docs and other SF postings, I settled on the following design, which restricts access to specific domain names only.
The default virtual host is disabled in the Apache config as follows, relying on Apache's behavior of using the first listed virtual host as the default:
<VirtualHost *:80>
# Catch-all: deny any request that does not match a named vhost below.
# (Without an explicit deny, an empty vhost would serve the global DocumentRoot.)
<Location "/">
Require all denied
</Location>
</VirtualHost>
<VirtualHost *:80>
ServerName secretsiteone.com
DocumentRoot /var/www/secretsiteone.com
</VirtualHost>
<VirtualHost *:80>
ServerName secretsitetwo.com
...
</VirtualHost>
Then each team member adds the domain names to their local /etc/hosts:
xx.xx.xx.xx secretsiteone.com
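As a quick sanity check (a sketch only; xx.xx.xx.xx stands for the placeholder VPS address above), curl can show which vhost answers a given request:

```shell
# Request by bare IP: no ServerName matches, so the catch-all default
# vhost answers rather than the staging site.
curl -i http://xx.xx.xx.xx/

# Supplying the staging name as the Host header selects the named vhost,
# without needing any /etc/hosts entry on the client:
curl -i -H "Host: secretsiteone.com" http://xx.xx.xx.xx/
```

The second command also hints at the limit of the technique: any client that learns the staging name can send it in the Host header.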
My question is: is the above technique good enough to achieve the stated goals, especially keeping search engine bots off, or could bots work around it?
Note: I understand that mod_rewrite rules can be used to achieve a similar effect, as discussed in How to disable default VirtualHost in apache2?, so the same question applies to that technique too.
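For reference, that mod_rewrite variant would look roughly like this (a sketch only, assuming mod_rewrite is enabled; it sits in the catch-all default vhost and returns 403 Forbidden for any Host header that is not one of the staging names):

```apache
<VirtualHost *:80>
# Default vhost: refuse any Host header that is not a staging name.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^secretsiteone\.com$ [NC]
RewriteCond %{HTTP_HOST} !^secretsitetwo\.com$ [NC]
RewriteRule ^ - [F]
</VirtualHost>
```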
Also please note: the content is not highly secret - the idea is not to devise something hack-proof, so we are not concerned about traffic interception or the like. The goal is to keep competitors and casual surfers from viewing the content before it's released, and to prevent search engine bots from indexing it.