Whilst getting some courseware ready, I was writing some code to show, very simply, when a window starts and ends when you ask StreamInsight for a TumblingWindow of n time units. I thought this was going to be a two-second job, but I found something that I have not seen documented anywhere.

All of this code is written in C# and will slot straight into my favourite quick-win dev tool, LINQPad.

Let's first create a sample dataset:
var EnumerableCollection = new []
{
new {id = 1, StartTime = DateTime.Parse("2010-10-01 12:00:00 PM").ToLocalTime()},
new {id = 2, StartTime = DateTime.Parse("2010-10-01 12:20:00 PM").ToLocalTime()},
new {id = 3, StartTime = DateTime.Parse("2010-10-01 12:30:00 PM").ToLocalTime()},
new {id = 4, StartTime = DateTime.Parse("2010-10-01 12:40:00 PM").ToLocalTime()},
new {id = 5, StartTime = DateTime.Parse("2010-10-01 12:50:00 PM").ToLocalTime()},
new {id = 6, StartTime = DateTime.Parse("2010-10-01 01:00:00 PM").ToLocalTime()},
new {id = 7, StartTime = DateTime.Parse("2010-10-01 01:10:00 PM").ToLocalTime()},
new {id = 8, StartTime = DateTime.Parse("2010-10-01 02:00:00 PM").ToLocalTime()},
new {id = 9, StartTime = DateTime.Parse("2010-10-01 03:20:00 PM").ToLocalTime()},
new {id = 10, StartTime = DateTime.Parse("2010-10-01 03:30:00 PM").ToLocalTime()},
new {id = 11, StartTime = DateTime.Parse("2010-10-01 04:40:00 PM").ToLocalTime()},
new {id = 12, StartTime = DateTime.Parse("2010-10-01 04:50:00 PM").ToLocalTime()},
new {id = 13, StartTime = DateTime.Parse("2010-10-01 05:00:00 PM").ToLocalTime()},
new {id = 14, StartTime = DateTime.Parse("2010-10-01 05:10:00 PM").ToLocalTime()}
};
Now let's create a stream of point events:
var inputStream = EnumerableCollection
    .ToPointStream(
        Application,
        // Each item becomes a point event stamped with its StartTime
        evt => PointEvent.CreateInsert(evt.StartTime, evt),
        AdvanceTimeSettings.StrictlyIncreasingStartTime);
Now we can create windows over the stream, starting with a one-hour tumbling window. We'll count the events in each window, but what we compute here is not the point; the point is where the window edges fall.
var windowedStream = from win in inputStream.TumblingWindow(TimeSpan.FromHours(1),HoppingWindowOutputPolicy.ClipToWindowEnd)
select new {CountOfEntries = win.Count()};
Now let's look at what we get. I am only going to show the first non-CTI (Insert) event, as that is enough to demonstrate what is going on:
windowedStream.ToIntervalEnumerable().First(e=> e.EventKind == EventKind.Insert).Dump("First Row from Windowed Stream");
The result is below (dates are day/month/year):

EventKind   StartTime          EndTime            Payload
Insert      01/10/2010 12:00   01/10/2010 13:00   { CountOfEntries = 5 }
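To see the bucketing on its own, here is a minimal plain C# sketch that groups the sample timestamps into one-hour tumbling windows with LINQ to Objects rather than StreamInsight. It is only an illustration, not how the engine is implemented. `At`, `WindowStart` and `Fmt` are hypothetical helpers, the event times are assumed to be UTC instants (which matches the window edges in the dump), and the code is written as a C# 9+ top-level program.

```csharp
using System;
using System.Globalization;
using System.Linq;

// The sample event times from the dataset above, treated as UTC instants
// (an assumption that matches the window edges in the dump).
DateTimeOffset[] eventTimes =
{
    At(12, 0), At(12, 20), At(12, 30), At(12, 40), At(12, 50),
    At(13, 0), At(13, 10), At(14, 0), At(15, 20), At(15, 30),
    At(16, 40), At(16, 50), At(17, 0), At(17, 10),
};

var windowSize = TimeSpan.FromHours(1);

// Bucket each point event into the tumbling window that contains it, then
// count the events per window. GroupBy only yields non-empty groups.
var hourly = eventTimes
    .GroupBy(t => WindowStart(t, windowSize))
    .OrderBy(g => g.Key)
    .Select(g => (Start: g.Key, End: g.Key + windowSize, Count: g.Count()))
    .ToList();

foreach (var w in hourly)
{
    Console.WriteLine($"{Fmt(w.Start)} - {Fmt(w.End)}  CountOfEntries = {w.Count}");
}

// Hypothetical helper: 1 October 2010 at hh:mm, UTC.
static DateTimeOffset At(int hour, int minute) =>
    new DateTimeOffset(2010, 10, 1, hour, minute, 0, TimeSpan.Zero);

// Hypothetical helper: start of the fixed-size window containing t, with
// window edges counted in whole windows from tick 0 (DateTimeOffset.MinValue).
static DateTimeOffset WindowStart(DateTimeOffset t, TimeSpan size) =>
    new DateTimeOffset(t.UtcTicks - t.UtcTicks % size.Ticks, TimeSpan.Zero);

// Day/month/year, as in the LINQPad dump.
static string Fmt(DateTimeOffset t) =>
    t.ToString("dd/MM/yyyy HH:mm", CultureInfo.InvariantCulture);
```

The first window it prints is 01/10/2010 12:00 - 01/10/2010 13:00 with CountOfEntries = 5, the same as the Insert event above. The event at exactly 13:00 lands in the next window, because each window includes its start and excludes its end.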
This makes sense, and one hour is the window size you will most often see in examples. So what happens if I change the windowing code to use a five-hour window?
var windowedStream = from win in inputStream.TumblingWindow(TimeSpan.FromHours(5),HoppingWindowOutputPolicy.ClipToWindowEnd)
select new {CountOfEntries = win.Count()};
Where does your first window start now? And what about a 13-minute window?
var windowedStream = from win in inputStream.TumblingWindow(TimeSpan.FromMinutes(13),HoppingWindowOutputPolicy.ClipToWindowEnd)
select new {CountOfEntries = win.Count()};
Well, for the five-hour window the first window starts at 01/10/2010 10:00:00, and for the 13-minute window it starts at 01/10/2010 11:55:00. Both are earlier than the first event, at 12:00.
Surprised?
Here is the reason why, with thanks to the StreamInsight team for listening.
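Windows do not start at your first event. They are aligned to DateTimeOffset.MinValue (midnight UTC on 1 January 0001), and from that point onwards the engine lays windows of the size you specified end to end. Windows that contain no events are not produced in the output. This is why a window's start time can be earlier than the first event it contains. Sizes that divide evenly into a day, such as one hour, therefore line up with the clock; sizes such as five hours or 13 minutes do not.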
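The arithmetic is easy to check. 12:00 UTC on 1 October 2010 is 17,617,092 whole hours after DateTimeOffset.MinValue. Dividing by 5 leaves a remainder of 2, so the five-hour window containing the first event began two hours earlier, at 10:00. In minutes the same instant is 1,057,025,520, which leaves a remainder of 5 when divided by 13, hence 11:55. The sketch below does the same calculation in ticks, assuming windows are aligned to DateTimeOffset.MinValue (tick 0). `WindowStart` is a hypothetical helper, not a StreamInsight API, and the code is a C# 9+ top-level program.

```csharp
using System;
using System.Globalization;

// First event in the sample data, taken as 12:00 UTC on 1 October 2010.
var firstEvent = new DateTimeOffset(2010, 10, 1, 12, 0, 0, TimeSpan.Zero);

TimeSpan[] sizes =
{
    TimeSpan.FromHours(1),
    TimeSpan.FromHours(5),
    TimeSpan.FromMinutes(13),
};

foreach (var size in sizes)
{
    var start = WindowStart(firstEvent, size);
    var text = start.ToString("dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture);
    Console.WriteLine($"{size} window: first window starts {text}");
}

// Hypothetical helper (assumption: windows are laid end to end from
// DateTimeOffset.MinValue, i.e. UtcTicks == 0). The remainder of the
// division is how far t sits inside its window, so subtracting it
// gives that window's start.
static DateTimeOffset WindowStart(DateTimeOffset t, TimeSpan size) =>
    new DateTimeOffset(t.UtcTicks - t.UtcTicks % size.Ticks, TimeSpan.Zero);
```

For the one-hour, five-hour and 13-minute sizes this gives first-window starts of 12:00:00, 10:00:00 and 11:55:00 on 01/10/2010, matching the times StreamInsight produced.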