sql: Group by x,y,z; return grouped by x,y with lowest f(z)
Posted
by Sai Emrys
on Stack Overflow
See other posts from Stack Overflow
or by Sai Emrys
Published on 2010-03-18T12:07:36Z
Indexed on
2010/03/18
12:11 UTC
Read the original article
Hit count: 501
This is for http://cssfingerprint.com
I collect timing stats about how fast the different methods I use perform on different browsers, etc., so that I can optimize the scraping speed. Separately, I have a report about what each method returns for a handful of URLs with known-correct values, so that I can tell which methods are bogus on which browsers. (Each is different, alas.)
The related tables look like this:
CREATE TABLE `browser_tests` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`bogus` tinyint(1) DEFAULT NULL,
`result` tinyint(1) DEFAULT NULL,
`method` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`browser` varchar(255) DEFAULT NULL,
`version` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=33784 DEFAULT CHARSET=latin1
CREATE TABLE `method_timings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`method` varchar(255) DEFAULT NULL,
`batch_size` int(11) DEFAULT NULL,
`timing` int(11) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`browser` varchar(255) DEFAULT NULL,
`version` varchar(255) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=28849 DEFAULT CHARSET=latin1
(user_agent is broken down pre-insert into browser, version, and os from a small list of recognized values using regex; I keep the original user-agent string just in case.)
I have a query like this that tells me the average timing for every non-bogus browser / version / method tuple:
select c, avg(bogus) as bog, timing, method, browser, version
from browser_tests as b inner join (
select count(*) as c, round(avg(timing)) as timing, method,
browser, version from method_timings
group by browser, version, method
having c > 10 order by browser, version, timing
) as t using (browser, version, method)
group by browser, version, method
having bog < 1
order by browser, version, timing;
Which returns something like:
c bog tim method browser version
88 0.8333 184 reuse_insert Chrome 4.0.249.89
18 0.0000 238 mass_insert_width Chrome 4.0.249.89
70 0.0400 246 mass_insert Chrome 4.0.249.89
70 0.0400 327 mass_noinsert Chrome 4.0.249.89
88 0.0556 367 reuse_reinsert Chrome 4.0.249.89
88 0.0556 383 jquery Chrome 4.0.249.89
88 0.0556 863 full_reinsert Chrome 4.0.249.89
187 0.0000 105 jquery Chrome 5.0.307.11
187 0.8806 109 reuse_insert Chrome 5.0.307.11
123 0.0000 110 mass_insert_width Chrome 5.0.307.11
176 0.0000 231 mass_noinsert Chrome 5.0.307.11
176 0.0000 237 mass_insert Chrome 5.0.307.11
187 0.0000 314 reuse_reinsert Chrome 5.0.307.11
187 0.0000 372 full_reinsert Chrome 5.0.307.11
12 0.7500 82 reuse_insert Chrome 5.0.335.0
12 0.2500 102 jquery Chrome 5.0.335.0
[...]
I want to modify this query to return only the browser/version/method with the lowest timing - i.e. something like:
88 0.8333 184 reuse_insert Chrome 4.0.249.89
187 0.0000 105 jquery Chrome 5.0.307.11
12 0.7500 82 reuse_insert Chrome 5.0.335.0
[...]
How can I do this, while still returning the method that goes with that lowest timing?
I could filter it app-side, but I'd rather do this in mysql since it'd work better with my caching.
© Stack Overflow or respective owner