Database model for keeping track of likes/shares/comments on blog posts over time
- by gage
My goal is to keep track of the popular posts on different blog sites based on social network activity at any given time. The goal is not simply to find the most popular posts right now, but to find posts that are popular compared to other posts on the same blog. For example, I follow a tech blog, a sports blog, and a gossip blog. The tech blog has far more readers than the other two, so in raw numbers every post on the tech blog will always outnumber posts on the other two. Let's say the average tech blog post gets 500 Facebook likes and posts on the other two blogs average 50 likes each. If a sports blog post has 200 fb likes and a gossip blog post has 300, while today's tech blog posts have 500 likes, I want to highlight the sports and gossip posts: they have more likes than average for their blog, whereas the tech posts have more likes in absolute terms but are only average for that blog.
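To make the comparison concrete, here is a minimal sketch in Python using the numbers from the example above. The function name `relative_popularity` and the idea of ranking by a simple likes-to-average ratio are assumptions for illustration; other measures would work too.

```python
# Numbers from the example above: average likes per post on each blog,
# and the like count of one current post on each blog.
blog_average = {"tech": 500, "sports": 50, "gossip": 50}
current_post = {"tech": 500, "sports": 200, "gossip": 300}


def relative_popularity(likes, average):
    """Ratio of a post's likes to its blog's average (1.0 means exactly average)."""
    return likes / average


scores = {
    blog: relative_popularity(current_post[blog], blog_average[blog])
    for blog in blog_average
}

# Rank blogs by how far their current post is above its own blog's average.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['gossip', 'sports', 'tech']
```

The tech post scores 1.0 (average for its blog), while the sports and gossip posts score 4.0 and 6.0, so they are the ones to highlight.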
The approach I am thinking of taking is to make an entry in a database for each blog post. Every x minutes (say, every 15 minutes) I will check how many likes/shares/comments an entry has received on all the social networks (Facebook, Twitter, Google+, LinkedIn). Over time there will be a history of likes for each blog post, e.g.
post 1234
after 15 min: 10 fb likes, 4 tweets, 6 g+
after 30 min: 15 fb likes, 15 tweets, 10 g+
...
...
after 48 hours: 200 fb likes, 25 tweets, 15 g+
By keeping a history like this for each blog post, I can know the average number of likes/shares/tweets at any given time interval. For example, if the average number of fb likes for all blog posts 48 hours after posting is 50 and a particular post has 200, I can mark it as a popular post and feature/highlight it. A consideration in the design is being able to easily query the values (likes/shares) for a specific time frame, e.g. fb likes after 30 min or tweets after 24 hours, in order to compute averages to compare against (or should the averages be stored in their own table?).
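A minimal sketch of that kind of query, using Python's built-in `sqlite3` module. The flat `activity` table, the `minutes_after` column (minutes elapsed since the post was published), the sample rows, and the "more than 2x the average" threshold are all assumptions made up for illustration, not part of any schema proposed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table: one row per post, metric, and elapsed time.
conn.execute(
    "CREATE TABLE activity ("
    "post_id INTEGER, type TEXT, minutes_after INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?)",
    [
        (1, "fb_likes", 2880, 200),  # 2880 minutes = 48 hours
        (2, "fb_likes", 2880, 10),
        (3, "fb_likes", 2880, 15),
        (4, "fb_likes", 2880, 15),
        (5, "fb_likes", 2880, 10),
        (1, "fb_likes", 30, 15),
    ],
)


def average_at(conn, activity_type, minutes_after):
    """Average value of one metric across all posts at a given elapsed time."""
    row = conn.execute(
        "SELECT AVG(value) FROM activity WHERE type = ? AND minutes_after = ?",
        (activity_type, minutes_after),
    ).fetchone()
    return row[0]  # None if there are no matching rows


def popular_posts(conn, activity_type, minutes_after, factor=2.0):
    """Posts whose value exceeds `factor` times the average (assumed threshold)."""
    average = average_at(conn, activity_type, minutes_after)
    if average is None:
        return []
    return conn.execute(
        "SELECT post_id, value FROM activity "
        "WHERE type = ? AND minutes_after = ? AND value > ? "
        "ORDER BY value DESC",
        (activity_type, minutes_after, factor * average),
    ).fetchall()


print(average_at(conn, "fb_likes", 2880))     # 50.0
print(popular_posts(conn, "fb_likes", 2880))  # [(1, 200)]
```

Here the averages are computed on demand; whether to cache them in a separate table is a separate decision that this sketch does not try to settle.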
If this approach is flawed or could use improvement, please let me know, but that is not my main question. My main question is: what should a database schema for storing this info look like?
Assuming the above approach is taken, I am trying to figure out what a database schema for storing the likes over time would look like. I am brand new to databases; from some basic reading I see that it is advisable to design the database in third normal form (3NF). I have come up with the following possible schemas.
Schema 1
DB Popular Posts
Table: Post
post_id ( primary key(pk) )
url
title
Table: Social Activity
activity_id (pk)
url (fk)
type (e.g. facebook, twitter, g+)
value
timestamp
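For concreteness, Schema 1 could be written out roughly as follows, sketched with Python's built-in `sqlite3` module. The column types, the snake_case table names, the example URL, and the timestamps are assumptions; the outline above only names the columns. Note that using `url` as the foreign key requires `url` to be unique in the Post table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE post (
        post_id INTEGER PRIMARY KEY,
        url     TEXT NOT NULL UNIQUE,  -- must be unique to be referenced as a key
        title   TEXT
    );
    CREATE TABLE social_activity (
        activity_id INTEGER PRIMARY KEY,
        url         TEXT NOT NULL REFERENCES post(url),
        type        TEXT NOT NULL,     -- e.g. 'facebook', 'twitter', 'g+'
        value       INTEGER NOT NULL,
        timestamp   TEXT NOT NULL      -- ISO 8601 text (assumed format)
    );
    """
)

URL = "http://example.com/post-1234"  # hypothetical URL
conn.execute(
    "INSERT INTO post (post_id, url, title) VALUES (?, ?, ?)",
    (1234, URL, "Example post"),
)
conn.executemany(
    "INSERT INTO social_activity (url, type, value, timestamp) VALUES (?, ?, ?, ?)",
    [
        (URL, "facebook", 10, "2013-01-01T00:15:00"),
        (URL, "twitter", 4, "2013-01-01T00:15:00"),
        (URL, "g+", 6, "2013-01-01T00:15:00"),
        (URL, "facebook", 15, "2013-01-01T00:30:00"),
        (URL, "twitter", 15, "2013-01-01T00:30:00"),
        (URL, "g+", 10, "2013-01-01T00:30:00"),
    ],
)

# History of one metric for one post, oldest measurement first.
history = conn.execute(
    "SELECT timestamp, value FROM social_activity "
    "WHERE url = ? AND type = ? ORDER BY timestamp",
    (URL, "facebook"),
).fetchall()
print(history)
```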
This was my initial instinct (based on my very limited db knowledge). As far as I understand, this schema would be in 3NF? I searched for designs of similar database models and found this question on Stack Overflow: http://stackoverflow.com/questions/11216080/data-structure-for-storing-height-and-weight-etc-over-time-for-multiple-users . The scenario in that question is similar (recording the weight/height of users over time). Taking the accepted answer for that question and applying it to my model results in something like:
Schema 2 (same as above, but break down the social activity into 2 tables)
DB Popular Posts
Table: Post
post_id (pk)
url
title
Table: Social Measurement
measurement_id (pk)
post_id (fk)
timestamp
Table: Social stat
stat_id (pk)
measurement_id (fk)
type (e.g. facebook, twitter, g+)
value
The advantage I see in schema 2 is that I will likely want to access all the values for a given time, i.e. when making a measurement 30 min after a post is published I will simultaneously check the number of fb likes, fb shares, fb comments, tweets, g+, and LinkedIn shares. So with this schema it may be easier to get all stats for a measurement_id corresponding to a certain time, i.e. all social stats for post 1234 at time x.
Another thought I had: since it doesn't make sense to compare the number of fb likes with the number of tweets or g+ shares, maybe it makes sense to separate each social measurement into its own table?
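The "all social stats for post 1234 at time x" lookup under Schema 2 might look roughly like this, sketched with Python's built-in `sqlite3` module. The column types, snake_case table names, helper name `stats_at`, and the timestamp value are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (
        post_id INTEGER PRIMARY KEY,
        url     TEXT,
        title   TEXT
    );
    CREATE TABLE social_measurement (
        measurement_id INTEGER PRIMARY KEY,
        post_id        INTEGER NOT NULL REFERENCES post(post_id),
        timestamp      TEXT NOT NULL
    );
    CREATE TABLE social_stat (
        stat_id        INTEGER PRIMARY KEY,
        measurement_id INTEGER NOT NULL REFERENCES social_measurement(measurement_id),
        type           TEXT NOT NULL,
        value          INTEGER NOT NULL
    );
    """
)

conn.execute("INSERT INTO post (post_id, title) VALUES (?, ?)", (1234, "Example post"))

# One measurement row per polling run, then one stat row per metric.
cursor = conn.execute(
    "INSERT INTO social_measurement (post_id, timestamp) VALUES (?, ?)",
    (1234, "2013-01-01T00:30:00"),  # hypothetical timestamp
)
measurement_id = cursor.lastrowid
conn.executemany(
    "INSERT INTO social_stat (measurement_id, type, value) VALUES (?, ?, ?)",
    [
        (measurement_id, "facebook", 15),
        (measurement_id, "twitter", 15),
        (measurement_id, "g+", 10),
    ],
)


def stats_at(conn, post_id, timestamp):
    """All stats recorded for one post in the measurement taken at `timestamp`."""
    rows = conn.execute(
        "SELECT s.type, s.value "
        "FROM social_measurement AS m "
        "JOIN social_stat AS s ON s.measurement_id = m.measurement_id "
        "WHERE m.post_id = ? AND m.timestamp = ?",
        (post_id, timestamp),
    ).fetchall()
    return dict(rows)


print(stats_at(conn, 1234, "2013-01-01T00:30:00"))
```

A single join returns every metric for the measurement, which is the convenience described above.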
Schema 3
DB Popular Posts
Table: Post
post_id (pk)
url
title
Table: fb_likes
fb_like_id (pk)
post_id (fk)
timestamp
value
Table: fb_shares
fb_shares_id (pk)
post_id (fk)
timestamp
value
Table: tweets
tweets_id (pk)
post_id (fk)
timestamp
value
Table: google_plus
google_plus_id (pk)
post_id (fk)
timestamp
value
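For comparison, here is a rough sketch of Schema 3 using Python's built-in `sqlite3` module. Because the four metric tables share the same shape, they are created in a loop. The column types, the helper name `stats_at`, and the sample values are assumptions; the table names come from a fixed dictionary, so building the SQL with f-strings is safe here.

```python
import sqlite3

# Metric table name -> name of its primary key column.
TABLES = {
    "fb_likes": "fb_like_id",
    "fb_shares": "fb_shares_id",
    "tweets": "tweets_id",
    "google_plus": "google_plus_id",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (post_id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
for table, id_column in TABLES.items():
    conn.execute(
        f"CREATE TABLE {table} ("
        f"{id_column} INTEGER PRIMARY KEY, "
        "post_id INTEGER NOT NULL REFERENCES post(post_id), "
        "timestamp TEXT NOT NULL, "
        "value INTEGER NOT NULL)"
    )

conn.execute("INSERT INTO post (post_id, title) VALUES (?, ?)", (1234, "Example post"))
when = "2013-01-01T00:30:00"  # hypothetical timestamp
conn.execute("INSERT INTO fb_likes (post_id, timestamp, value) VALUES (?, ?, ?)", (1234, when, 15))
conn.execute("INSERT INTO tweets (post_id, timestamp, value) VALUES (?, ?, ?)", (1234, when, 15))
conn.execute("INSERT INTO google_plus (post_id, timestamp, value) VALUES (?, ?, ?)", (1234, when, 10))


def stats_at(conn, post_id, timestamp):
    """Collect every metric for one post at one time: one query per table."""
    result = {}
    for table in TABLES:
        row = conn.execute(
            f"SELECT value FROM {table} WHERE post_id = ? AND timestamp = ?",
            (post_id, timestamp),
        ).fetchone()
        if row is not None:
            result[table] = row[0]
    return result


print(stats_at(conn, 1234, when))
```

With this layout, reading all stats for a post at one time takes one query per metric table, and tracking a new metric means adding a new table.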
As you can see I am generally lost/unsure of what approach to take.
I'm sure this is a typical type of database problem (storing measurements over time, e.g. temperature statistics) that must have a common solution. Is there a design pattern/model for this, and does it have a name? I tried searching for "database periodic data collection" and "database measurements over time" but didn't find anything specific.
What would be an appropriate model to solve the needs of this problem?