Looking for efficient scaling patterns for Silverlight application with distributed text-file data s
- by Edward Tanguay
I'm designing a Silverlight software solution for students and teachers to record flashcards, e.g. words and phrases that students find while reading and errors that teachers notice while teaching.
Requirements are:
each person publishes his own flashcards in a file on a web server, e.g. http://:www.mywebserver.com/flashcards.txt
other people subscribe to that person's flashcards by using a Silverlight flashcard reader that I have developed and entering the URLs of flashcard files they want to subscribe to, URLs and imported flashcards being saved in IsolatedStorage
the flashcards.txt file has the following simple format: title, then blocks of question/answers:
Jim Smith's flashcards from English class 53-222, winter semester 2009
==fla
Das kann nicht sein.
That can't be.
==fla
Es sei denn, er kommt nicht.
Unless he doesn't come.
The user then makes public the URL to his flashcard file and other readers begin reading in his flashcards.
In order to lower the bar for non-technical users to contribute, it will even be possible for them to save this text in a Google Document, which they publish and distribute the URL. The flashcard readers will then recognize it is a google document and perform the necessary screen scraping to get at the raw text.
I have two technical questions about this approach:
What is a best way to plan now for scalability issues: e.g. if your reader is subscribed to 10 flashcard files that are each 200K, it will have to download 2MB of text just to find out if any new flashcards are available. Or can I somehow accurately and consistently get at the last update date/time of text files on servers and published google docs?
Each reader will have the ability to allow the person to test himself on imported flashcards and add meta information to them, e.g. categorize them, edit them, etc. This information will be stored in IsolatedStorage along with the important flashcards themselves. What is a good pattern to allow these readers to share and synchronize this meta data, e.g. so when you are looking at a flashcard you can see that 5 other people have made corrections to it. The best solution I can think of now is that the Silverlight readers will have to republish their data to a central database, but then there is the problem of uniquely identifying each flashcard, the best approach seems to be URL + position-in-file, or even better URL + original text of both question and answer fields, but both of these have their obvious drawbacks.
The main requirement is that the bar for participation is kept as low as possible, i.e. type text in a google document, publish it, distribute the URL, and you're publishing within the flashcard community. So I want to come up with the most efficient technical solutions in order to compensate for the lack of database, lack of unique ids, etc.
For those who have designed or developed similar non-traditional, distributed database projects like this, what advice, experience or best-practice tips you can share on the above two points?