mysql index optimization for a table with multiple indexes that index some of the same columns

Posted by Sean on Stack Overflow
Published on 2010-04-06T23:04:32Z

Filed under: mysql, index, indexing

I have a table that stores some basic data about visitor sessions on third party web sites. This is its structure:

id, site_id, unixtime, unixtime_last, ip_address, uid

There are four indexes: id, site_id/unixtime, site_id/ip_address, and site_id/uid.
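For concreteness, the structure described above might look roughly like the following. This is only a sketch: the post gives the column names and the index column lists, but the column types, the index names, and the choice of id as primary key are assumptions made for illustration.

```sql
-- Sketch only: column types and index names are assumptions, not from the post.
CREATE TABLE sessions (
  id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
  site_id       INT UNSIGNED NOT NULL,
  unixtime      INT UNSIGNED NOT NULL,   -- assumed: session start time
  unixtime_last INT UNSIGNED NOT NULL,   -- assumed: time of last activity
  ip_address    VARCHAR(15)  NOT NULL,   -- assumed: dotted-quad string
  uid           VARCHAR(32)  NOT NULL,   -- assumed: cookie value length
  PRIMARY KEY (id),                      -- assumed: the "id" index is the primary key
  KEY site_time (site_id, unixtime),
  KEY site_ip   (site_id, ip_address),
  KEY site_uid  (site_id, uid)
) ENGINE=MyISAM;
```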

We query this table in many different ways, all of them specific to a site_id. The index with unixtime is used to display the list of visitors for a given date or time range. The other two are used to find all visits from an IP address or a "uid" (a unique cookie value created for each visitor), and to determine whether this is a new or a returning visitor.
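The three access patterns might translate into queries along these lines. The selected columns and all literal values are made up for illustration, and ip_address is assumed to be stored as a string.

```sql
-- Visitors for one site in a time range (served by site_id/unixtime).
-- The timestamps are example values only.
SELECT id, ip_address, uid, unixtime
FROM sessions
WHERE site_id = 123
  AND unixtime BETWEEN 1270512000 AND 1270598399
ORDER BY unixtime;

-- All visits from one IP address on one site (served by site_id/ip_address).
SELECT id, unixtime
FROM sessions
WHERE site_id = 123 AND ip_address = '192.0.2.10';

-- All visits by one cookie value on one site (served by site_id/uid).
SELECT id, unixtime
FROM sessions
WHERE site_id = 123 AND uid = 'value';
```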

Obviously, storing site_id in three indexes is inefficient for both write speed and storage, but I see no way around it, since I need to be able to query this data quickly for a specific site_id.

Any ideas on making this more efficient?

I don't really understand B-trees beyond the basics, but is it more efficient for the left-most column of an index to be the one with the least variance? I considered making site_id the second column in the ip_address and uid indexes, but I think that would make those indexes less efficient, since IP and UID vary far more than site ID does: we have only about 8,000 unique sites per database server, but millions of unique visitors across those ~8,000 sites every day.
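One way to test the column-order question empirically, rather than reasoning about B-trees, is to add the swapped index to a copy of the table and compare EXPLAIN output. The sketch below assumes a hypothetical index name, uid_site. What it relies on is MySQL's leftmost-prefix behavior: a multi-column index can be used for lookups on any leftmost prefix of its columns. So when both columns are compared with equality, either order can resolve the lookup, and the order mainly decides which single-column filters the index can serve.

```sql
-- Hypothetical index: same columns as site_id/uid, order swapped.
-- On a large MyISAM table this rebuilds the table, so try it on a copy.
ALTER TABLE sessions ADD INDEX uid_site (uid, site_id);

-- Equality on both columns: the whole index can be used for the lookup.
EXPLAIN SELECT id FROM sessions FORCE INDEX (uid_site)
WHERE uid = 'value' AND site_id = 123 LIMIT 1;

-- Leftmost prefix only: a filter on uid alone can still use uid_site.
EXPLAIN SELECT id FROM sessions FORCE INDEX (uid_site)
WHERE uid = 'value' LIMIT 1;

-- A filter on site_id alone cannot seek into uid_site; expect a scan here.
EXPLAIN SELECT COUNT(*) FROM sessions FORCE INDEX (uid_site)
WHERE site_id = 123;
```

Comparing the type, key, key_len, and rows columns across these plans should show whether the swapped order changes anything for the queries that actually run.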

I've also considered removing site_id from the IP and UID indexes completely, since the chances of the same visitor going to multiple sites that share the same database server are quite small, but in cases where this does happen, I fear it could be quite slow to determine if this is a new visitor to this site_id or not. The query would be something like:

select id from sessions where uid = 'value' and site_id = 123 limit 1

... so if this visitor had visited this site before, the query would only need to find one row with this site_id before it stopped. That wouldn't necessarily be super fast, but it would be acceptably fast. But say we have a site that gets 500,000 visitors a day, and a particular visitor loves this site and goes there 10 times a day. Now they hit another site on the same database server for the first time. The above query could take quite a long time, because it would have to search through all of the potentially thousands of rows for this UID, scattered all over the disk, without ever finding one for this site ID.
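Whether this worst case matters in practice depends on how many rows a single UID accumulates and how often a UID appears on more than one site. A rough way to measure that on existing data is sketched below. The aliases are illustrative, and the query scans the whole table, so it is better run against a copy or a replica than against the live server.

```sql
-- UIDs seen on more than one site, with their total row counts.
-- "visits" approximates the rows a uid-only index lookup would have to
-- walk through when no row matches the requested site_id.
SELECT uid,
       COUNT(*)                AS visits,
       COUNT(DISTINCT site_id) AS sites
FROM sessions
GROUP BY uid
HAVING sites > 1
ORDER BY visits DESC
LIMIT 20;
```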

Any insight on making this as efficient as possible would be appreciated :)

Update: this is a MyISAM table on MySQL 5.0. My concerns are both performance and storage space. The table is both read-heavy and write-heavy. If I had to choose between the two, performance is my biggest concern, but both are important.
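To put numbers on the storage side before and after any index change, the standard status statements are probably enough. This is a minimal sketch, assuming the table is named sessions as in the query above.

```sql
-- Data_length and Index_length (in bytes) show how much space the rows
-- and all indexes together occupy. The index figure is a total for the
-- table, not a per-index breakdown.
SHOW TABLE STATUS LIKE 'sessions';

-- One row per index column, including an estimated Cardinality.
-- Running ANALYZE TABLE sessions first refreshes those estimates.
SHOW INDEX FROM sessions;
```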

We use memcached heavily in all areas of our service, but that's not an excuse to not care about the database design. I want the database to be as efficient as possible.
