Keeping video viewing statistics breakdown by video time in a database
- by Septagram
I need to keep a number of statistics about the videos being watched, and one of them is what parts of the video are being watched most. The design I came up with is to split the video into 256 intervals and keep the floating-point number of views for each of them. I receive the data as a number of intervals the user watched continuously. The problem is how to store them. There are two solutions I see.
Row per every video segment
Let's have a database table like this:
CREATE TABLE `video_heatmap` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`video_id` int(11) NOT NULL,
`position` tinyint(3) unsigned NOT NULL,
`views` float NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_lookup` (`video_id`,`position`)
) ENGINE=MyISAM
Then, whenever we have to process a number of views, make sure there are the respective database rows and add appropriate values to the views column. I found out it's a lot faster if the existence of rows is taken care of first (SELECT COUNT(*) of rows for a given video and INSERT IGNORE if they are lacking), and then a number of update queries is used like this:
UPDATE video_heatmap
SET views = views + ?
WHERE video_id = ? AND position >= ? AND position < ?
This seems, however, a little bloated. The other solution I came up with is
Row per video, update in transactions
A table will look (sort of) like this:
CREATE TABLE video (
id INT NOT NULL AUTO_INCREMENT,
heatmap BINARY (4 * 256) NOT NULL,
...
) ENGINE=InnoDB
Then, upon every time a view needs to be stored, it will be done in a transaction with consistent snapshot, in a sequence like this:
If the video doesn't exist in the database, it is created.
A row is retrieved, heatmap, an array of floats stored in the binary form, is converted into a form more friendly for processing (in PHP).
Values in the array are increased appropriately and the array is converted back.
Row is changed via UPDATE query.
So far the advantages can be summed up like this:
First approach
Stores data as floats, not as some magical binary array.
Doesn't require transaction support, so doesn't require InnoDB, and we're using MyISAM for everything at the moment, so there won't be any need to mix storage engines. (only applies in my specific situation)
Doesn't require a transaction WITH CONSISTENT SNAPSHOT. I don't know what are the performance penalties of those.
I already implemented it and it works. (only applies in my specific situation)
Second approach
Is using a lot less storage space (the first approach is storing video ID 256 times and stores position for every segment of the video, not to mention primary key).
Should scale better, because of InnoDB's per-row locking as opposed to MyISAM's table locking.
Might generally work faster because there are a lot less requests being made.
Easier to implement in code (although the other one is already implemented).
So, what should I do? If it wasn't for the rest of our system using MyISAM consistently, I'd go with the second approach, but currently I'm leaning to the first one. But maybe there are some reasons to favour one approach or another?