Hopping/Tumbling Windows Could Introduce Latency.
This is a pre-article to one I am going to be writing on adjusting an event’s time and duration to satisfy business process requirements but it is one that I think is really useful when understanding the way that Hopping/Tumbling windows work within StreamInsight. A Tumbling window is just a special shortcut version of a Hopping window where the width of the window is equal to the size of the hop Here is the simplest and often used definition for a Hopping Window. You can find them all here public static CepWindowStream<CepWindow<TPayload>> HoppingWindow<TPayload>( this CepStream<TPayload> source, TimeSpan windowSize, TimeSpan hopSize, WindowInputPolicy inputPolicy, HoppingWindowOutputPolicy outputPolicy ) And here is the definition for a Tumbling Window public static CepWindowStream<CepWindow<TPayload>> TumblingWindow<TPayload>( this CepStream<TPayload> source, TimeSpan windowSize, WindowInputPolicy inputPolicy, HoppingWindowOutputPolicy outputPolicy ) These methods allow you to group events into windows of a temporal size. It is a really useful and simple feature in StreamInsight. One of the downsides though is that the windows cannot be flushed until an event in a following window occurs. This means that you will potentially never see some events or see them with a delay. Let me explain. Remember that a stream is a potentially unbounded sequence of events. Events in StreamInsight are given a StartTime. It is this StartTime that is used to calculate into which temporal window an event falls. It is best practice to assign a timestamp from the source system and not one from the system clock on the processing server. StreamInsight cannot know when a window is over. It cannot tell whether you have received all events in the window or whether some events have been delayed which means that StreamInsight cannot flush the stream for you. Imagine you have events with the following Timestamps 12:10:10 PM 12:10:20 PM 12:10:35 PM 12:10:45 PM 11:59:59 PM And imagine that you have defined a 1 minute Tumbling Window over this stream using the following syntax var HoppingStream = from shift in inputStream.TumblingWindow(TimeSpan.FromMinutes(1),HoppingWindowOutputPolicy.ClipToWindowEnd)
select new WindowCountPayload { CountInWindow = (Int32)shift.Count() };
The events between 12:10:10 PM and 12:10:45 PM will not be seen until the event at 11:59:59 PM arrives. This could be a real problem if you need to react to windows promptly
This can always be worked around by using a different design pattern but a lot of the examples I see assume there is a constant, very frequent stream of events resulting in windows always being flushed.
Further examples of using windowing in StreamInsight can be found here