Search Results

Search found 1 results on 1 pages for 'hiffy'.

Page 1/1 | 1 

  • Speeding up inner-joins and subqueries while restricting row size and table membership

    - by hiffy
    I'm developing an rss feed reader that uses a bayesian filter to filter out boring blog posts. The Stream table is meant to act as a FIFO buffer from which the webapp will consume 'entries'. I use it to store the temporary relationship between entries, users and bayesian filter classifications. After a user marks an entry as read, it will be added to the metadata table (so that a user isn't presented with material they have already read), and deleted from the stream table. Every three minutes, a background process will repopulate the Stream table with new entries (i.e. whenever the daemon adds new entries after the checks the rss feeds for updates). Problem: The query I came up with is hella slow. More importantly, the Stream table only needs to hold one hundred unread entries at a time; it'll reduce duplication, make processing faster and give me some flexibility with how I display the entries. The query (takes about 9 seconds on 3600 items with no indexes): insert into stream(entry_id, user_id) select entries.id, subscriptions_users.user_id from entries inner join subscriptions_users on subscriptions_users.subscription_id = entries.subscription_id where subscriptions_users.user_id = 1 and entries.id not in (select entry_id from metadata where metadata.user_id = 1) and entries.id not in (select entry_id from stream where user_id = 1); The query explained: insert into stream all of the entries from a user's subscription list (subscriptions_users) that the user has not read (i.e. do not exist in metadata) and which do not already exist in the stream. Attempted solution: adding limit 100 to the end speeds up the query considerably, but upon repeated executions will keep on adding a different set of 100 entries that do not already exist in the table (with each successful query taking longer and longer). This is close but not quite what I wanted to do. Does anyone have any advice (nosql?) or know a more efficient way of composing the query?

    Read the article

1