Commercial web application--scalable database design
- by Rob Campbell
I'm designing a set of web apps to track scientific laboratory data. Each laboratory has several members, each of whom will access both their own data and that of their laboratory as a whole. Many typical queries will thus be expected to return records of multiple members (e.g. my mouse, joe's mouse and sally's mouse).
I think I have the database fairly well normalized. I'm now wondering how to ensure that users can efficiently access both their own data and their lab's data set when it is mixed among (hopefully) a whole ton of records from other labs.
What I've come up with so far is that most tables will end with two fields: user_id and labgroup_id. The WHERE clause of any SELECT statement will include the appropriate reference to one of the id fields ("...WHERE 'labroup_id=n..." or "...WHERE user_id=n...").
My questions are:
Is this an approach that will scale to 10^6 or more records?
If so, what's the best way to use these fields in a query so that it most efficiently searches the relevant subset of the database? e.g. Should the first step in querying be to create a temporary table containing just the labgroup's data? Or will indexing using some combination of the id, user_id, and labroup_id fields be sufficient at that scale?
I thank any responders very much in advance.