Basic site analytics doesn't tally with Google data
- by Jenkz
After being stumped by an earlier quesiton: SO google-analytics-domain-data-without-filtering
I've been experimenting with a very basic analytics system of my own.
MySQL table:
hit_id, subsite_id, timestamp, ip, url
The subsite_id let's me drill down to a folder (as explained in the previous question).
I can now get the following metrics:
Page Views - Grouped by subsite_id and date
Unique Page Views - Grouped by subsite_id, date, url, IP (not nesecarily how Google does it!)
The usual "most visited page", "likely time to visit" etc etc.
I've now compared my data to that in Google Analytics and found that Google has lower values each metric. Ie, my own setup is counting more hits than Google.
So I've started discounting IP's from various web crawlers, Google, Yahoo & Dotbot so far.
Short Questions:
Is it worth me collating a list of
all major crawlers to discount, is
any list likely to change regularly?
Are there any other obvious filters
that Google will be applying to GA
data?
What other data would you
collect that might be of use further
down the line?
What variables does
Google use to work out entrance
search keywords to a site?
The data is only going to used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages etc) for their reference.