I got an e-mail from someone that has an interesting situation. He has 15,000 customers, and he asks if he should have a database for their data per customer. Without a LOT more data it’s impossible to say, of course, but there are some general concepts to keep in mind. Whenever you’re segmenting data, it’s all about boundary choices. You have not only boundaries around how big the data will get, but things like how many objects (tables, stored procedures and so on) that will be involved, if there are any cross-sections of data (do they share location or product information) and – very important – what are the security requirements? From the answer to these types of questions, you now have the choice of making multiple tables in a single database, or using multiple databases. A database carries some overhead – it needs a certain amount of memory for locking and so on. But it has a very clean boundary – everything from objects to security can be kept apart. Having multiple users in the same database is possible as well, using things like a Schema. But keeping 15,000 schemas can be challenging as well. My recommendation in complex situations like this is similar to a post on decisions that I did earlier – I lay out the choices on a spreadsheet in rows, and then my requirements at the top in the columns. I give each choice a number based on how well it meets each requirement. At the end, the highest number wins. And many times it’s a mix – perhaps this person could segment customers into larger regions or districts or products, in a database. Within that database might be multiple schemas for the customers. Of course, he needs to query across all customers, that becomes another requirement.
Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!