Mysql InnoDB performance optimization and indexing

Posted by Davide C on Stack Overflow See other posts from Stack Overflow or by Davide C
Published on 2010-06-09T20:48:38Z Indexed on 2010/06/09 20:52 UTC
Read the original article Hit count: 318

Hello everybody,

I have 2 databases and I need to link information between two big tables (more than 3M entries each, continuously growing). The 1st database has a table 'pages' that stores various information about web pages, and includes the URL of each one. The column 'URL' is a varchar(512) and has no index.

The 2nd database has a table 'urlHops' defined as:

CREATE TABLE urlHops ( dest varchar(512) NOT NULL, src varchar(512) DEFAULT NULL, timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, KEY dest_key (dest), KEY src_key (src) ) ENGINE=InnoDB DEFAULT CHARSET=latin1

Now, I need basically to issue (efficiently) queries like this: select p.id,p.URL from db1.pages p, db2.urlHops u where u.src=p.URL and u.dest=?

At first, I thought to add an index on pages(URL). But it's a very long column, and I already issue a lot of INSERTs and UPDATEs on the same table (way more than the number of SELECTs I would do using this index).

Other possible solutions I thought are: -adding a column to pages, storing the md5 hash of the URL and indexing it; this way I could do queries using the md5 of the URL, with the advantage of an index on a smaller column. -adding another table that contains only page id and page URL, indexing both columns. But this is maybe a waste of space, having only the advantage of not slowing down the inserts and updates I execute on 'pages'.

I don't want to slow down the inserts and updates, but at the same time I would be able to do the queries on the URL efficiently. Any advice? My primary concern is performance; if needed, wasting some disk space is not a problem.

Thank you, regards

Davide

© Stack Overflow or respective owner

Related posts about mysql

Related posts about Performance