I'm building an interactive language learning application to be used in a classroom environment. The idea is that a teacher should be able to talk to the students (=audio stream to all students), let students talk to each other (= audio P2P) in groups of two or more, let students watch a video coming from a the DVD player or coming from a media server. It should be possible to save the audio/video streams. The teacher should also be able to monitor, take-over or block the desktop of the students. The platform is Windows and it's a desktop application, no web application. The audio delay should be as minimal as poosible. Optionally a student sitting at home should be supported, but it's not a high priority.
I am now finished with the classroom control part of the application (login, monitor, block, ...) and want to start the audio and video part. I've been evaluating several options like DirectX, GStreamer and SIP but now I have to make a decision.
DirectX seems an obvious choice for the Windows platform, but it only lets me capture and playback audio and video. The encoding/decoding/network part I should do myself.
GStreamer contains all kinds of options to capture/encode/stream/save audio and video streams. I've experimented a bit with it (ossbuild) and it does seem to involve a lot of trial and error to make something work:
- microphone capture (via directsoundsrc) produces cracking noises on some computers
- rtpL16 payloader didn't work well
- streaming raw audio over the network only working at a sampling rate of 8000, no higher
- there are a lot of errors when receiving mpeg4 video (bad I-frame), on some computers worse than others
It is my impression that gstreamer is primary targetted at linux platforms. Development and support for the Windows platform seems to be a little behind. Nevertheless it's a powerful framework that could save me months and years of work.
SIP seems to be able to do everything I want, but it is targeted towards telephony and IM. I don't know how flexible SIP is. It seems to me that the SIP layer would just be overhead as I already have a central (teacher) application that can control and setup all the streams. The interesting parts of frameworks like opalvoip and freeswitch are the actual audio/video capture, the encoding and transmission. Does anyone know how these interesting parts relate a framework like gstreamer? Are they easy to integrate into a custom application? Are they flexible enough?
Does anyone have experience with all or one of these technologies? Maybe there are even other options I can look at?
Many thanks for your advice