Delivering activity feed items in a moderately scalable way
- by sotangochips
The application I'm working on has an activity feed where each user can see their friends' activity (much like Facebook). I'm looking for a moderately scalable way to show a given user's activity stream on the fly. I say 'moderately' because I'm looking to do this with just a database (PostgreSQL) and maybe memcached. For instance, I want this solution to scale to 200k users, each with 100 friends.
Currently, there is a master activity table that stores the rendered HTML for the given activity (Jim added a friend, George installed an application, etc.). This master activity table keeps the source user, the HTML, and a timestamp.
Then, there's a separate ('join') table that simply keeps a pointer to the person who should see this activity in their friend feed, and a pointer to the object in the main activity table.
So, if I have 100 friends and I do 3 activities, the join table grows by 300 rows.
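To put a rough number on that growth, the arithmetic above can be sketched as follows. The function names are made up for illustration, and the rate of 3 activities per user per day is purely an assumption, not a measurement from the application.

```python
def join_rows_added(activities: int, friends_per_user: int) -> int:
    """Rows written to the join table when one user performs `activities`
    actions and each action is fanned out to every friend."""
    return activities * friends_per_user


def daily_join_rows(users: int, friends_per_user: int,
                    activities_per_user_per_day: int) -> int:
    """Join-table rows added per day across all users (assumed uniform activity)."""
    return users * join_rows_added(activities_per_user_per_day, friends_per_user)


# The example from the question: 100 friends, 3 activities.
print(join_rows_added(3, 100))

# Target scale from the question (200k users, 100 friends each),
# with an ASSUMED 3 activities per user per day.
print(daily_join_rows(200_000, 100, 3))
```

Under that assumed rate the join table would gain 60 million rows a day, which is why pruning or capping old feed rows would probably be needed with this design.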
Clearly this table will grow very quickly. It has the nice property, though, that fetching activity to show to a user takes a single (relatively) inexpensive query.
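A minimal sketch of this join-table approach, to make the trade-off concrete. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs as-is. The table and column names (activity, feed_item, viewer_id, and so on) and the helper functions are assumptions for illustration, not the actual schema.

```python
import sqlite3


def make_feed_db() -> sqlite3.Connection:
    """Create an in-memory database with the two tables described above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Master activity table: source user, rendered HTML, timestamp.
        CREATE TABLE activity (
            id          INTEGER PRIMARY KEY,
            source_user INTEGER NOT NULL,
            html        TEXT    NOT NULL,
            created_at  INTEGER NOT NULL
        );
        -- Join table: one row per (viewer, activity) pair.
        CREATE TABLE feed_item (
            viewer_id   INTEGER NOT NULL,
            activity_id INTEGER NOT NULL REFERENCES activity(id),
            created_at  INTEGER NOT NULL
        );
        -- Lets the feed query read one viewer's rows in time order.
        CREATE INDEX feed_item_viewer_idx
            ON feed_item (viewer_id, created_at DESC);
    """)
    return conn


def record_activity(conn, source_user, html, created_at, friend_ids):
    """Store one activity, then write one join row per friend (fan-out on write)."""
    cur = conn.execute(
        "INSERT INTO activity (source_user, html, created_at) VALUES (?, ?, ?)",
        (source_user, html, created_at),
    )
    activity_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO feed_item (viewer_id, activity_id, created_at) VALUES (?, ?, ?)",
        [(friend_id, activity_id, created_at) for friend_id in friend_ids],
    )
    return activity_id


def fetch_feed(conn, viewer_id, limit=20):
    """Fetch the newest feed entries for one viewer with a single indexed query."""
    return conn.execute(
        """
        SELECT a.source_user, a.html, a.created_at
        FROM feed_item AS f
        JOIN activity AS a ON a.id = f.activity_id
        WHERE f.viewer_id = ?
        ORDER BY f.created_at DESC, a.id DESC
        LIMIT ?
        """,
        (viewer_id, limit),
    ).fetchall()
```

The write path does the expensive work (one insert per friend), while the read path touches only the rows belonging to the viewer.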
The other option is to just keep the main activity table and query it by saying something like:
select * from activity where source_user in (1, 2, 44, 2423, ... my friend list)
This has the disadvantage that you're querying for users who may never be active, and as your friend list grows, this query can get slower and slower.
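For comparison, a rough sketch of this second option, again using sqlite3 in place of PostgreSQL. The column names, the index, and the function names are assumptions. The ORDER BY and LIMIT are additions that a feed page would presumably need; they are not part of the query shown above.

```python
import sqlite3


def make_activity_db() -> sqlite3.Connection:
    """Create an in-memory database holding only the master activity table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE activity (
            id          INTEGER PRIMARY KEY,
            source_user INTEGER NOT NULL,
            html        TEXT    NOT NULL,
            created_at  INTEGER NOT NULL
        );
        -- Composite index so each friend's recent rows can be found
        -- without scanning the whole table.
        CREATE INDEX activity_source_time_idx
            ON activity (source_user, created_at DESC);
    """)
    return conn


def fetch_feed_by_friends(conn, friend_ids, limit=20):
    """Build the feed at read time from the viewer's friend list (fan-out on read)."""
    if not friend_ids:
        return []  # "IN ()" is not valid SQL, so short-circuit
    # One bound parameter per friend; never interpolate the ids themselves.
    placeholders = ", ".join("?" for _ in friend_ids)
    sql = f"""
        SELECT source_user, html, created_at
        FROM activity
        WHERE source_user IN ({placeholders})
        ORDER BY created_at DESC, id DESC
        LIMIT ?
    """
    return conn.execute(sql, (*friend_ids, limit)).fetchall()
```

Here nothing extra is written per activity, but every read has to consider each friend in the list, including friends who have never posted.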
I see the pros and cons of both approaches, but I'm wondering if some SO folks might help me weigh the options and suggest one way or the other. I'm also open to other solutions, though I'd like to keep it simple and not install something like CouchDB, etc.
Many thanks!