Archiving SQLHelp tweets
- by jamiet
#SQLHelp is a Twitter hashtag that can be used by any Twitter user to get help from the SQL Server community. I think its fair to say that in its first year of being it has proved to be a very useful resource however Kendra Little (@kendra_little) made a very salient point yesterday when she tweeted: Is there a way to search the archives of #sqlhelp Trying to remember answer to a question I know I saw a couple months ago http://twitter.com/#!/Kendra_Little/status/15538234184441856 This highlights an inherent problem with Twitter’s search capability – it simply does not reach far enough back in time. I have made steps to remedy that situation by putting into place two initiatives to archive Tweets that contain the #sqlhelp hashtag. The Archivist http://archivist.visitmix.com/ is a free service that, quite simply, archives a history of tweets that contain a given search term by periodically polling Twitter’s search service with that search term and subsequently displaying a dashboard providing an aggregate view of those tweets for things like tweet volume over time, top users and top words (Archivist FAQ). I have set up an archive on The Archivist for “sqlhelp” which you can view at http://archivist.visitmix.com/jamiet/7. Here is a screenshot of the SQLHelp dashboard 36 minutes after I set it up: There is lots of good information in there, including the fact that Jonathan Kehayias (@SQLSarg) is the most active SQLHelp tweeter (I suspect as an answerer rather than a questioner ) and that SSIS has proven to be a rather (ahem) popular subject!! Datasift The Archivist has its uses though for our purposes it has a couple of downsides. For starters you cannot search through an archive (which is what Kendra was after) and nor can you export the contents of the archive for offline analysis. For those functions we need something a bit more heavyweight and for that I present to you Datasift. Datasift is a tool (currently an alpha release) that allows you to search for tweets and provide them through an object called a Datasift stream. That sounds very similar to normal Twitter search though it has one distinct advantage that other Twitter search tools do not – Datasift has access to Twitter’s Streaming API (aka the Twitter Firehose). In addition it has access to a lot of other rather nice features: It provides the Datasift API that allows you to consume the output of a Datasift stream in your tool of choice (bring on my favourite ultimate mashup tool J ) It has a query language (called Filtered Stream Definition Language – FSDL for short) A Datasift stream can consume (and filter) other Datasift streams Datasift can (and does) consume services other than Twitter If I refer to Datasift as “ETL for tweets” then you may get some sort of idea what it is all about. Just as I did with The Archivist I have set up a publicly available Datasift stream for “sqlhelp” at http://datasift.net/stream/1581/sqlhelp. Here is the FSDL query that provides the data: twitter.text contains "sqlhelp" Pretty simple eh? At the current time it provides little more than a rudimentary dashboard but as Datasift is currently an alpha release I think this may be worth keeping an eye on. The real value though is the ability to consume the output of a stream via Datasift’s RESTful API, observe: http://api.datasift.net/stream.xml?stream_identifier=c7015255f07e982afdeebdf1ae6e3c0d&username=jamiet&api_key=XXXXXXX (Note that an api_key is required during the alpha period so, given that I’m not supplying my api_key, this URI will not work for you) Just to prove that a Datasift stream can indeed consume data from another stream I have set up a second stream that further filters the first one for tweets containing “SSIS”. That one is at http://datasift.net/stream/1586/ssis-sqlhelp and here is the FSDL query: rule "414c9845685ff8d2548999cf3162e897" and (interaction.content contains "ssis") When Datasift moves beyond alpha I’ll re-assess how useful this is going to be and post a follow-up blog. @Jamiet