Algorithm for finding similar users through a join table

Posted by Gdeglin on Stack Overflow See other posts from Stack Overflow or by Gdeglin
Published on 2010-05-20T22:50:36Z Indexed on 2010/05/20 23:20 UTC
Read the original article Hit count: 302

Filed under:

I have an application where users can select a variety of interests from around 300 possible interests. Each selected interest is stored in a join table containing the columns user_id and interest_id.

Typical users select around 50 interests out of the 300.

I would like to build a system where users can find the top 20 users that have the most interests in common with them.

Right now I am able to accomplish this using the following query:

SELECT i2.user_id, count(i2.interest_id) AS count 
  FROM interests_users as i1, interests_users as i2
    WHERE i1.interest_id = i2.interest_id AND i1.user_id = 35
  GROUP BY i2.user_id
  ORDER BY count DESC LIMIT 20;

However, this query takes approximately 500 milliseconds to execute with 10,000 users and 500,000 rows in the join table. All indexes and database configuration settings have been tuned to the best of my ability.

I have also tried avoiding the use of joins altogether using the following query:

select user_id,count(interest_id) count
  from interests_users
    where interest_id in (13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,508)
  group by user_id 
  order by count desc 
  limit 20;

But this one is even slower (~800 milliseconds).

How could I best lower the time that I can gather this kind of data to below 100 milliseconds?

I have considered putting this data into a graph database like Neo4j, but I am not sure if that is the easiest solution or if it would even be faster than what I am currently doing.

Developer IT

Algorithm for finding similar users through a join table - Developer IT

Algorithm for finding similar users through a join table

sql

algorithm

Related posts about sql

SQL SERVER – Concat Strings in SQL Server using T-SQL – SQL in Sixty Seconds #035 – Video

SQL SERVER – Concat Function in SQL Server – SQL Concatenation

Error with SQL Server Setup 2012 on Windows 2012

How can I detect which version of SQL (eg SQL 2008 or SQL Azure)

Nested SQL Select statement fails on SQL Server 2000, ok on SQL Server 2005

Related posts about algorithm

Jpeg Algorithm vs BMP Algorithm?

word disambiguation algorithm (Lesk algorithm)

Search algorithm (with a sort algorithm already implemented)

Is there any algorithm for finding LINES by PIXEL COLORS on picture?

collsion issues with quadtree [on hold]

Categories cloud