Open-sourcing a web site with active users?
- by Lars Yencken
I currently run several research-related web-sites with active users, and these sites use some personally identifying information about these users (their email address, IP address, and query history). Ideally I'd release the code to these sites as open source, so that other people could easily run similar sites, and more importantly scrutinise and replicate my work, but I haven't been comfortable doing so, since I'm unsure of the security implications. For example, I wouldn't want my users' details to be accessed or distributed by a third party who found some flaw in my site, something which might be easy to do with full source access.
I've tried going half-way by refactoring the (Django) site into more independent modules, and releasing those, but this is very time consuming, and in practice I've never gotten around to releasing enough that a third party can replicate the site(s) easily. I also feel that maybe I'm kidding myself, and that this process is really no different to releasing the full source.
What would you recommend in cases like this? Would you open-source the site and take the risk? As an alternative, would you advertise the source as "available upon request" to other researchers, so that you at least know who has the code? Or would you just apologise to them and keep it closed in order to protect users?