Storing millions of URLs in a database for fast pattern matching

Posted by Paras Chopra on Stack Overflow
Published on 2010-06-05T18:23:17Z; indexed on 2010/06/06 2:32 UTC

I am developing a web-analytics-style system that needs to log the referring URL, landing-page URL, and search keywords for every visitor to a website. With this collected data, I want to allow end-users to run queries such as "Show me all visitors who came from Bing.com searching for a phrase that contains 'red shoes'" or "Show me all visitors who landed on a URL that contained 'campaign=twitter_ad'", etc.
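To make such queries cheap, one idea I'm considering (a hedged sketch, not a settled design; all field names are my own) is to parse each URL into structured components at log time, so a filter like "campaign=twitter_ad" becomes a lookup on an indexed field rather than a substring scan over raw URL strings:

```python
# Sketch: decompose URLs into queryable parts at log time.
# The dict keys here are hypothetical column names, not an existing schema.
from urllib.parse import urlparse, parse_qs

def parse_log_entry(referrer_url, landing_url):
    ref = urlparse(referrer_url)
    land = urlparse(landing_url)
    return {
        "ref_host": ref.netloc.lower(),          # e.g. "www.bing.com"
        "ref_query": parse_qs(ref.query),        # search keywords usually live here
        "landing_path": land.path,
        "landing_params": parse_qs(land.query),  # e.g. {"campaign": ["twitter_ad"]}
    }

entry = parse_log_entry(
    "https://www.bing.com/search?q=red+shoes",
    "https://example.com/landing?campaign=twitter_ad",
)
# entry["ref_query"]["q"] -> ["red shoes"]
# entry["landing_params"]["campaign"] -> ["twitter_ad"]
```

Each of those parts could then be stored in its own indexed column, so both example queries above reduce to equality or prefix matches.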

Because this system will be used on many large websites, the amount of data that needs to be logged will grow really, really fast. So, my questions: a) what would be the best logging strategy so that scaling the system doesn't become a pain; and b) how can that architecture support rapid querying of arbitrary requests? Is there a special method of storing URLs that makes querying them faster?
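On the "special method of storing URLs" point, one trick I've seen suggested (my assumption that it applies here) is storing the host name with its labels reversed, so that "Bing and all its subdomains" becomes an index-friendly prefix match instead of a leading-wildcard LIKE that forces a full scan. A minimal sketch using sqlite3 just to illustrate the idea:

```python
# Sketch: reversed-host storage for index-friendly domain queries.
# Table and column names are hypothetical.
import sqlite3

def reverse_host(host):
    # "www.bing.com" -> "com.bing.www"
    return ".".join(reversed(host.split(".")))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, ref_host_rev TEXT)")
db.execute("CREATE INDEX idx_host ON visits (ref_host_rev)")
for host in ["www.bing.com", "bing.com", "google.com"]:
    db.execute("INSERT INTO visits (ref_host_rev) VALUES (?)",
               (reverse_host(host),))

# A prefix LIKE on the reversed host can use the index;
# 'com.bing%' matches bing.com and its subdomains here.
count = db.execute(
    "SELECT COUNT(*) FROM visits WHERE ref_host_rev LIKE 'com.bing%'"
).fetchone()[0]
```

The same reversal idea should carry over to MySQL with a B-tree index on the reversed-host column; a stricter pattern (e.g. matching `com.bing` exactly or `com.bing.` as a prefix) would avoid accidental matches on hosts that merely start with "bing".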

In addition to the MySQL database I currently use, I am exploring (and open to) other alternatives better suited to this task.

© Stack Overflow or respective owner
