Database - Designing an "Events" Table
- by Alix Axel
After reading the tips from this great Nettuts+ article I've come up with a table schema that would separate highly volatile data from other tables subjected to heavy reads and at the same time lower the number of tables needed in the whole database schema, however I'm not sure if this is a good idea since it doesn't follow the rules of normalization and I would like to hear your advice, here is the general idea:
I've four types of users modeled in a Class Table Inheritance structure, in the main "user" table I store data common to all the users (id, username, password, several flags, ...) along with some TIMESTAMP fields (date_created, date_updated, date_activated, date_lastLogin, ...).
To quote the tip #16 from the Nettuts+ article mentioned above:
Example 2: You have a “last_login”
field in your table. It updates every
time a user logs in to the website.
But every update on a table causes the
query cache for that table to be
flushed. You can put that field into
another table to keep updates to your
users table to a minimum.
Now it gets even trickier, I need to keep track of some user statistics like
how many unique times a user profile was seen
how many unique times a ad from a specific type of user was clicked
how many unique times a post from a specific type of user was seen
and so on...
In my fully normalized database this adds up to about 8 to 10 additional tables, it's not a lot but I would like to keep things simple if I could, so I've come up with the following "events" table:
|------|----------------|----------------|--------------|-----------|
| ID | TABLE | EVENT | DATE | IP |
|------|----------------|----------------|--------------|-----------|
| 1 | user | login | 201004190030 | 127.0.0.1 |
|------|----------------|----------------|--------------|-----------|
| 1 | user | login | 201004190230 | 127.0.0.1 |
|------|----------------|----------------|--------------|-----------|
| 2 | user | created | 201004190031 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 2 | user | activated | 201004190234 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 2 | user | approved | 201004190930 | 217.0.0.1 |
|------|----------------|----------------|--------------|-----------|
| 2 | user | login | 201004191200 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 15 | user_ads | created | 201004191230 | 127.0.0.1 |
|------|----------------|----------------|--------------|-----------|
| 15 | user_ads | impressed | 201004191231 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 15 | user_ads | clicked | 201004191231 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 15 | user_ads | clicked | 201004191231 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 15 | user_ads | clicked | 201004191231 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 15 | user_ads | clicked | 201004191231 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 15 | user_ads | clicked | 201004191231 | 127.0.0.2 |
|------|----------------|----------------|--------------|-----------|
| 2 | user | blocked | 201004200319 | 217.0.0.1 |
|------|----------------|----------------|--------------|-----------|
| 2 | user | deleted | 201004200320 | 217.0.0.1 |
|------|----------------|----------------|--------------|-----------|
Basically the ID refers to the primary key (id) field in the TABLE table, I believe the rest should be pretty straightforward. One thing that I've come to like in this design is that I can keep track of all the user logins instead of just the last one, and thus generate some interesting metrics with that data.
Due to the nature of the events table I also thought of making some optimizations, such as:
#9: Since there is only a finite number of tables and a finite (and predetermined) number of events, the TABLE and EVENTS columns could be setup as ENUMs instead of VARCHARs to save some space.
#14: Store IPs as UNSIGNED INT with INET_ATON() instead of VARCHARs.
Store DATEs as TIMESTAMPs instead of DATETIMEs.
Use the ARCHIVE (or the CSV?) engine instead of InnoDB / MyISAM.
Overall, each event would only consume 14 bytes which is okay for my traffic I guess.
Pros:
Ability to store more detailed data (such as logins).
No need to design (and code for) almost a dozen additional tables (dates and statistics).
Reduces a few columns per table and keeps volatile data separated.
Cons:
Non-relational (still not as bad as EAV):
SELECT * FROM events WHERE id = 2 AND table = 'user' ORDER BY date DESC();
6 bytes overhead per event (ID, TABLE and EVENT).
I'm more inclined to go with this approach since the pros seem to far outweigh the cons, but I'm still a little bit reluctant.. Am I missing something? What are your thoughts on this?
Thanks!