logically - Page 13 - Developer IT

Changes to the LINQ-to-StreamInsight Dialect

- by Roman Schindlauer

In previous versions of StreamInsight (1.0 through 2.0), CepStream<> represents temporal streams of many varieties: Streams with ‘open’ inputs (e.g., those defined and composed over CepStream<T>.Create(string streamName) Streams with ‘partially bound’ inputs (e.g., those defined and composed over CepStream<T>.Create(Type adapterFactory, …)) Streams with fully bound inputs (e.g., those defined and composed over To*Stream – sequences or DQC) The stream may be embedded (where Server.Create is used) The stream may be remote (where Server.Connect is used) When adding support for new programming primitives in StreamInsight 2.1, we faced a choice: Add a fourth variety (use CepStream<> to represent streams that are bound the new programming model constructs), or introduce a separate type that represents temporal streams in the new user model. We opted for the latter. Introducing a new type has the effect of reducing the number of (confusing) runtime failures due to inappropriate uses of CepStream<> instances in the incorrect context. The new types are: IStreamable<>, which logically represents a temporal stream. IQStreamable<> : IStreamable<>, which represents a queryable temporal stream. Its relationship to IStreamable<> is analogous to the relationship of IQueryable<> to IEnumerable<>. The developer can compose temporal queries over remote stream sources using this type. The syntax of temporal queries composed over IQStreamable<> is mostly consistent with the syntax of our existing CepStream<>-based LINQ provider. However, we have taken the opportunity to refine certain aspects of the language surface. Differences are outlined below. Because 2.1 introduces new types to represent temporal queries, the changes outlined in this post do no impact existing StreamInsight applications using the existing types! SelectMany StreamInsight does not support the SelectMany operator in its usual form (which is analogous to SQL’s “CROSS APPLY” operator): static IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> collectionSelector) It instead uses SelectMany as a convenient syntactic representation of an inner join. The parameter to the selector function is thus unavailable. Because the parameter isn’t supported, its type in StreamInsight 1.0 – 2.0 wasn’t carefully scrutinized. Unfortunately, the type chosen for the parameter is nonsensical to LINQ programmers: static CepStream<R> SelectMany<T, R>(this CepStream<T> source, Expression<Func<CepStream<T>, CepStream<R>>> streamSelector) Using Unit as the type for the parameter accurately reflects the StreamInsight’s capabilities: static IQStreamable<R> SelectMany<T, R>(this IQStreamable<T> source, Expression<Func<Unit, IQStreamable<R>>> streamSelector) For queries that succeed – that is, queries that do not reference the stream selector parameter – there is no difference between the code written for the two overloads: from x in xs from y in ys select f(x, y) Top-K The Take operator used in StreamInsight causes confusion for LINQ programmers because it is applied to the (unbounded) stream rather than the (bounded) window, suggesting that the query as a whole will return k rows: (from win in xs.SnapshotWindow() from x in win orderby x.A select x.B).Take(k) The use of SelectMany is also unfortunate in this context because it implies the availability of the window parameter within the remainder of the comprehension. The following compiles but fails at runtime: (from win in xs.SnapshotWindow() from x in win orderby x.A select win).Take(k) The Take operator in 2.1 is applied to the window rather than the stream: Before After (from win in xs.SnapshotWindow() from x in win orderby x.A select x.B).Take(k) from win in xs.SnapshotWindow() from b in (from x in win orderby x.A select x.B).Take(k) select b Multicast We are introducing an explicit multicast operator in order to preserve expression identity, which is important given the semantics about moving code to and from StreamInsight. This also better matches existing LINQ dialects, such as Reactive. This pattern enables expressing multicasting in two ways: Implicit Explicit var ys = from x in xs where x.A > 1 select x; var zs = from y1 in ys from y2 in ys.ShiftEventTime(_ => TimeSpan.FromSeconds(1)) select y1 + y2; var ys = from x in xs where x.A > 1 select x; var zs = ys.Multicast(ys1 => from y1 in ys1 from y2 in ys1.ShiftEventTime(_ => TimeSpan.FromSeconds(1)) select y1 + y2; Notice the product translates an expression using implicit multicast into an expression using the explicit multicast operator. The user does not see this translation. Default window policies Only default window policies are supported in the new surface. Other policies can be simulated by using AlterEventLifetime. Before After xs.SnapshotWindow( WindowInputPolicy.ClipToWindow, SnapshotWindowInputPolicy.Clip) xs.SnapshotWindow() xs.TumblingWindow( TimeSpan.FromSeconds(1), HoppingWindowOutputPolicy.PointAlignToWindowEnd) xs.TumblingWindow( TimeSpan.FromSeconds(1)) xs.TumblingWindow( TimeSpan.FromSeconds(1), HoppingWindowOutputPolicy.ClipToWindowEnd) Not supported … LeftAntiJoin Representation of LASJ as a correlated sub-query in the LINQ surface is problematic as the StreamInsight engine does not support correlated sub-queries (see discussion of SelectMany). The current syntax requires the introduction of an otherwise unsupported ‘IsEmpty()’ operator. As a result, the pattern is not discoverable and implies capabilities not present in the server. The direct representation of LASJ is used instead: Before After from x in xs where (from y in ys where x.A > y.B select y).IsEmpty() select x xs.LeftAntiJoin(ys, (x, y) => x.A > y.B) from x in xs where (from y in ys where x.A == y.B select y).IsEmpty() select x xs.LeftAntiJoin(ys, x => x.A, y => y.B) ApplyWithUnion The ApplyWithUnion methods have been deprecated since their signatures are redundant given the standard SelectMany overloads: Before After xs.GroupBy(x => x.A).ApplyWithUnion(gs => from win in gs.SnapshotWindow() select win.Count()) xs.GroupBy(x => x.A).SelectMany( gs => from win in gs.SnapshotWindow() select win.Count()) xs.GroupBy(x => x.A).ApplyWithUnion(gs => from win in gs.SnapshotWindow() select win.Count(), r => new { r.Key, Count = r.Payload }) from x in xs group x by x.A into gs from win in gs.SnapshotWindow() select new { gs.Key, Count = win.Count() } Alternate UDO syntax The representation of UDOs in the StreamInsight LINQ dialect confuses cardinalities. Based on the semantics of user-defined operators in StreamInsight, one would expect to construct queries in the following form: from win in xs.SnapshotWindow() from y in MyUdo(win) select y Instead, the UDO proxy method is referenced within a projection, and the (many) results returned by the user code are automatically flattened into a stream: from win in xs.SnapshotWindow() select MyUdo(win) The “many-or-one” confusion is exemplified by the following example that compiles but fails at runtime: from win in xs.SnapshotWindow() select MyUdo(win) + win.Count() The above query must fail because the UDO is in fact returning many values per window while the count aggregate is returning one. Original syntax New alternate syntax from win in xs.SnapshotWindow() select win.UdoProxy(1) from win in xs.SnapshotWindow() from y in win.UserDefinedOperator(() => new Udo(1)) select y -or- from win in xs.SnapshotWindow() from y in win.UdoMacro(1) select y Notice that this formulation also sidesteps the dynamic type pitfalls of the existing “proxy method” approach to UDOs, in which the type of the UDO implementation (TInput, TOuput) and the type of its constructor arguments (TConfig) need to align in a precise and non-obvious way with the argument and return types for the corresponding proxy method. UDSO syntax UDSO currently leverages the DataContractSerializer to clone initial state for logical instances of the user operator. Initial state will instead be described by an expression in the new LINQ surface. Before After xs.Scan(new Udso()) xs.Scan(() => new Udso()) Name changes ShiftEventTime => AlterEventStartTime: The alter event lifetime overload taking a new start time value has been renamed. CountByStartTimeWindow => CountWindow

Read the article

Working with Timelines with LINQ to Twitter

- by Joe Mayo

When first working with the Twitter API, I thought that using SinceID would be an effective way to page through timelines. In practice it doesn’t work well for various reasons. To explain why, Twitter published an excellent document that is a must-read for anyone working with timelines: Twitter Documentation: Working with Timelines This post shows how to implement the recommended strategies in that document by using LINQ to Twitter. You should read the document in it’s entirety before moving on because my explanation will start at the bottom and work back up to the top in relation to the Twitter document. What follows is an explanation of SinceID, MaxID, and how they come together to help you efficiently work with Twitter timelines. The Role of SinceID Specifying SinceID says to Twitter, “Don’t return tweets earlier than this”. What you want to do is store this value after every timeline query set so that it can be reused on the next set of queries. The next section will explain what I mean by query set, but a quick explanation is that it’s a loop that gets all new tweets. The SinceID is a backstop to avoid retrieving tweets that you already have. Here’s some initialization code that includes a variable named sinceID that will be used to populate the SinceID property in subsequent queries: // last tweet processed on previous query set ulong sinceID = 210024053698867204; ulong maxID; const int Count = 10; var statusList = new List<status>(); Here, I’ve hard-coded the sinceID variable, but this is where you would initialize sinceID from whatever storage you choose (i.e. a database). The first time you ever run this code, you won’t have a value from a previous query set. Initially setting it to 0 might sound like a good idea, but what if you’re querying a timeline with lots of tweets? Because of the number of tweets and rate limits, your query set might take a very long time to run. A caveat might be that Twitter won’t return an entire timeline back to Tweet #0, but rather only go back a certain period of time, the limits of which are documented for individual Twitter timeline API resources. So, to initialize SinceID at too low of a number can result in a lot of initial tweets, yet there is a limit to how far you can go back. What you’re trying to accomplish in your application should guide you in how to initially set SinceID. I have more to say about SinceID later in this post. The other variables initialized above include the declaration for MaxID, Count, and statusList. The statusList variable is a holder for all the timeline tweets collected during this query set. You can set Count to any value you want as the largest number of tweets to retrieve, as defined by individual Twitter timeline API resources. To effectively page results, you’ll use the maxID variable to set the MaxID property in queries, which I’ll discuss next. Initializing MaxID On your first query of a query set, MaxID will be whatever the most recent tweet is that you get back. Further, you don’t know what MaxID is until after the initial query. The technique used in this post is to do an initial query and then use the results to figure out what the next MaxID will be. Here’s the code for the initial query: var userStatusResponse = (from tweet in twitterCtx.Status where tweet.Type == StatusType.User && tweet.ScreenName == "JoeMayo" && tweet.SinceID == sinceID && tweet.Count == Count select tweet) .ToList(); statusList.AddRange(userStatusResponse); // first tweet processed on current query maxID = userStatusResponse.Min( status => ulong.Parse(status.StatusID)) - 1; The query above sets both SinceID and Count properties. As explained earlier, Count is the largest number of tweets to return, but the number can be less. A couple reasons why the number of tweets that are returned could be less than Count include the fact that the user, specified by ScreenName, might not have tweeted Count times yet or might not have tweeted at least Count times within the maximum number of tweets that can be returned by the Twitter timeline API resource. Another reason could be because there aren’t Count tweets between now and the tweet ID specified by sinceID. Setting SinceID constrains the results to only those tweets that occurred after the specified Tweet ID, assigned via the sinceID variable in the query above. The statusList is an accumulator of all tweets receive during this query set. To simplify the code, I left out some logic to check whether there were no tweets returned. If the query above doesn’t return any tweets, you’ll receive an exception when trying to perform operations on an empty list. Yeah, I cheated again. Besides querying initial tweets, what’s important about this code is the final line that sets maxID. It retrieves the lowest numbered status ID in the results. Since the lowest numbered status ID is for a tweet we already have, the code decrements the result by one to keep from asking for that tweet again. Remember, SinceID is not inclusive, but MaxID is. The maxID variable is now set to the highest possible tweet ID that can be returned in the next query. The next section explains how to use MaxID to help get the remaining tweets in the query set. Retrieving Remaining Tweets Earlier in this post, I defined a term that I called a query set. Essentially, this is a group of requests to Twitter that you perform to get all new tweets. A single query might not be enough to get all new tweets, so you’ll have to start at the top of the list that Twitter returns and keep making requests until you have all new tweets. The previous section showed the first query of the query set. The code below is a loop that completes the query set: do { // now add sinceID and maxID userStatusResponse = (from tweet in twitterCtx.Status where tweet.Type == StatusType.User && tweet.ScreenName == "JoeMayo" && tweet.Count == Count && tweet.SinceID == sinceID && tweet.MaxID == maxID select tweet) .ToList(); if (userStatusResponse.Count > 0) { // first tweet processed on current query maxID = userStatusResponse.Min( status => ulong.Parse(status.StatusID)) - 1; statusList.AddRange(userStatusResponse); } } while (userStatusResponse.Count != 0 && statusList.Count < 30); Here we have another query, but this time it includes the MaxID property. The SinceID property prevents reading tweets that we’ve already read and Count specifies the largest number of tweets to return. Earlier, I mentioned how it was important to check how many tweets were returned because failing to do so will result in an exception when subsequent code runs on an empty list. The code above protects against this problem by only working with the results if Twitter actually returns tweets. Reasons why there wouldn’t be results include: if the first query got all the new tweets there wouldn’t be more to get and there might not have been any new tweets between the SinceID and MaxID settings of the most recent query. The code for loading the returned tweets into statusList and getting the maxID are the same as previously explained. The important point here is that MaxID is being reset, not SinceID. As explained in the Twitter documentation, paging occurs from the newest tweets to oldest, so setting MaxID lets us move from the most recent tweets down to the oldest as specified by SinceID. The two loop conditions cause the loop to continue as long as tweets are being read or a max number of tweets have been read. Logically, you want to stop reading when you’ve read all the tweets and that’s indicated by the fact that the most recent query did not return results. I put the check to stop after 30 tweets are reached to keep the demo from running too long – in the console the response scrolls past available buffer and I wanted you to be able to see the complete output. Yet, there’s another point to be made about constraining the number of items you return at one time. The Twitter API has rate limits and making too many queries per minute will result in an error from twitter that LINQ to Twitter raises as an exception. To use the API properly, you’ll have to ensure you don’t exceed this threshold. Looking at the statusList.Count as done above is rather primitive, but you can implement your own logic to properly manage your rate limit. Yeah, I cheated again. Summary Now you know how to use LINQ to Twitter to work with Twitter timelines. After reading this post, you have a better idea of the role of SinceID - the oldest tweet already received. You also know that MaxID is the largest tweet ID to retrieve in a query. Together, these settings allow you to page through results via one or more queries. You also understand what factors affect the number of tweets returned and considerations for potential error handling logic. The full example of the code for this post is included in the downloadable source code for LINQ to Twitter. @JoeMayo

Read the article

Changes to the LINQ-to-StreamInsight Dialect

- by Roman Schindlauer

In previous versions of StreamInsight (1.0 through 2.0), CepStream<> represents temporal streams of many varieties: Streams with ‘open’ inputs (e.g., those defined and composed over CepStream<T>.Create(string streamName) Streams with ‘partially bound’ inputs (e.g., those defined and composed over CepStream<T>.Create(Type adapterFactory, …)) Streams with fully bound inputs (e.g., those defined and composed over To*Stream – sequences or DQC) The stream may be embedded (where Server.Create is used) The stream may be remote (where Server.Connect is used) When adding support for new programming primitives in StreamInsight 2.1, we faced a choice: Add a fourth variety (use CepStream<> to represent streams that are bound the new programming model constructs), or introduce a separate type that represents temporal streams in the new user model. We opted for the latter. Introducing a new type has the effect of reducing the number of (confusing) runtime failures due to inappropriate uses of CepStream<> instances in the incorrect context. The new types are: IStreamable<>, which logically represents a temporal stream. IQStreamable<> : IStreamable<>, which represents a queryable temporal stream. Its relationship to IStreamable<> is analogous to the relationship of IQueryable<> to IEnumerable<>. The developer can compose temporal queries over remote stream sources using this type. The syntax of temporal queries composed over IQStreamable<> is mostly consistent with the syntax of our existing CepStream<>-based LINQ provider. However, we have taken the opportunity to refine certain aspects of the language surface. Differences are outlined below. Because 2.1 introduces new types to represent temporal queries, the changes outlined in this post do no impact existing StreamInsight applications using the existing types! SelectMany StreamInsight does not support the SelectMany operator in its usual form (which is analogous to SQL’s “CROSS APPLY” operator): static IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> collectionSelector) It instead uses SelectMany as a convenient syntactic representation of an inner join. The parameter to the selector function is thus unavailable. Because the parameter isn’t supported, its type in StreamInsight 1.0 – 2.0 wasn’t carefully scrutinized. Unfortunately, the type chosen for the parameter is nonsensical to LINQ programmers: static CepStream<R> SelectMany<T, R>(this CepStream<T> source, Expression<Func<CepStream<T>, CepStream<R>>> streamSelector) Using Unit as the type for the parameter accurately reflects the StreamInsight’s capabilities: static IQStreamable<R> SelectMany<T, R>(this IQStreamable<T> source, Expression<Func<Unit, IQStreamable<R>>> streamSelector) For queries that succeed – that is, queries that do not reference the stream selector parameter – there is no difference between the code written for the two overloads: from x in xs from y in ys select f(x, y) Top-K The Take operator used in StreamInsight causes confusion for LINQ programmers because it is applied to the (unbounded) stream rather than the (bounded) window, suggesting that the query as a whole will return k rows: (from win in xs.SnapshotWindow() from x in win orderby x.A select x.B).Take(k) The use of SelectMany is also unfortunate in this context because it implies the availability of the window parameter within the remainder of the comprehension. The following compiles but fails at runtime: (from win in xs.SnapshotWindow() from x in win orderby x.A select win).Take(k) The Take operator in 2.1 is applied to the window rather than the stream: Before After (from win in xs.SnapshotWindow() from x in win orderby x.A select x.B).Take(k) from win in xs.SnapshotWindow() from b in (from x in win orderby x.A select x.B).Take(k) select b Multicast We are introducing an explicit multicast operator in order to preserve expression identity, which is important given the semantics about moving code to and from StreamInsight. This also better matches existing LINQ dialects, such as Reactive. This pattern enables expressing multicasting in two ways: Implicit Explicit var ys = from x in xs where x.A > 1 select x; var zs = from y1 in ys from y2 in ys.ShiftEventTime(_ => TimeSpan.FromSeconds(1)) select y1 + y2; var ys = from x in xs where x.A > 1 select x; var zs = ys.Multicast(ys1 => from y1 in ys1 from y2 in ys1.ShiftEventTime(_ => TimeSpan.FromSeconds(1)) select y1 + y2; Notice the product translates an expression using implicit multicast into an expression using the explicit multicast operator. The user does not see this translation. Default window policies Only default window policies are supported in the new surface. Other policies can be simulated by using AlterEventLifetime. Before After xs.SnapshotWindow( WindowInputPolicy.ClipToWindow, SnapshotWindowInputPolicy.Clip) xs.SnapshotWindow() xs.TumblingWindow( TimeSpan.FromSeconds(1), HoppingWindowOutputPolicy.PointAlignToWindowEnd) xs.TumblingWindow( TimeSpan.FromSeconds(1)) xs.TumblingWindow( TimeSpan.FromSeconds(1), HoppingWindowOutputPolicy.ClipToWindowEnd) Not supported … LeftAntiJoin Representation of LASJ as a correlated sub-query in the LINQ surface is problematic as the StreamInsight engine does not support correlated sub-queries (see discussion of SelectMany). The current syntax requires the introduction of an otherwise unsupported ‘IsEmpty()’ operator. As a result, the pattern is not discoverable and implies capabilities not present in the server. The direct representation of LASJ is used instead: Before After from x in xs where (from y in ys where x.A > y.B select y).IsEmpty() select x xs.LeftAntiJoin(ys, (x, y) => x.A > y.B) from x in xs where (from y in ys where x.A == y.B select y).IsEmpty() select x xs.LeftAntiJoin(ys, x => x.A, y => y.B) ApplyWithUnion The ApplyWithUnion methods have been deprecated since their signatures are redundant given the standard SelectMany overloads: Before After xs.GroupBy(x => x.A).ApplyWithUnion(gs => from win in gs.SnapshotWindow() select win.Count()) xs.GroupBy(x => x.A).SelectMany( gs => from win in gs.SnapshotWindow() select win.Count()) xs.GroupBy(x => x.A).ApplyWithUnion(gs => from win in gs.SnapshotWindow() select win.Count(), r => new { r.Key, Count = r.Payload }) from x in xs group x by x.A into gs from win in gs.SnapshotWindow() select new { gs.Key, Count = win.Count() } Alternate UDO syntax The representation of UDOs in the StreamInsight LINQ dialect confuses cardinalities. Based on the semantics of user-defined operators in StreamInsight, one would expect to construct queries in the following form: from win in xs.SnapshotWindow() from y in MyUdo(win) select y Instead, the UDO proxy method is referenced within a projection, and the (many) results returned by the user code are automatically flattened into a stream: from win in xs.SnapshotWindow() select MyUdo(win) The “many-or-one” confusion is exemplified by the following example that compiles but fails at runtime: from win in xs.SnapshotWindow() select MyUdo(win) + win.Count() The above query must fail because the UDO is in fact returning many values per window while the count aggregate is returning one. Original syntax New alternate syntax from win in xs.SnapshotWindow() select win.UdoProxy(1) from win in xs.SnapshotWindow() from y in win.UserDefinedOperator(() => new Udo(1)) select y -or- from win in xs.SnapshotWindow() from y in win.UdoMacro(1) select y Notice that this formulation also sidesteps the dynamic type pitfalls of the existing “proxy method” approach to UDOs, in which the type of the UDO implementation (TInput, TOuput) and the type of its constructor arguments (TConfig) need to align in a precise and non-obvious way with the argument and return types for the corresponding proxy method. UDSO syntax UDSO currently leverages the DataContractSerializer to clone initial state for logical instances of the user operator. Initial state will instead be described by an expression in the new LINQ surface. Before After xs.Scan(new Udso()) xs.Scan(() => new Udso()) Name changes ShiftEventTime => AlterEventStartTime: The alter event lifetime overload taking a new start time value has been renamed. CountByStartTimeWindow => CountWindow

Read the article

Taming Hopping Windows

- by Roman Schindlauer

At first glance, hopping windows seem fairly innocuous and obvious. They organize events into windows with a simple periodic definition: the windows have some duration d (e.g. a window covers 5 second time intervals), an interval or period p (e.g. a new window starts every 2 seconds) and an alignment a (e.g. one of those windows starts at 12:00 PM on March 15, 2012 UTC). var wins = xs .HoppingWindow(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(2), new DateTime(2012, 3, 15, 12, 0, 0, DateTimeKind.Utc)); Logically, there is a window with start time a + np and end time a + np + d for every integer n. That’s a lot of windows. So why doesn’t the following query (always) blow up? var query = wins.Select(win => win.Count()); A few users have asked why StreamInsight doesn’t produce output for empty windows. Primarily it’s because there is an infinite number of empty windows! (Actually, StreamInsight uses DateTimeOffset.MaxValue to approximate “the end of time” and DateTimeOffset.MinValue to approximate “the beginning of time”, so the number of windows is lower in practice.) That was the good news. Now the bad news. Events also have duration. Consider the following simple input: var xs = this.Application .DefineEnumerable(() => new[] { EdgeEvent.CreateStart(DateTimeOffset.UtcNow, 0) }) .ToStreamable(AdvanceTimeSettings.IncreasingStartTime); Because the event has no explicit end edge, it lasts until the end of time. So there are lots of non-empty windows if we apply a hopping window to that single event! For this reason, we need to be careful with hopping window queries in StreamInsight. Or we can switch to a custom implementation of hopping windows that doesn’t suffer from this shortcoming. The alternate window implementation produces output only when the input changes. We start by breaking up the timeline into non-overlapping intervals assigned to each window. In figure 1, six hopping windows (“Windows”) are assigned to six intervals (“Assignments”) in the timeline. Next we take input events (“Events”) and alter their lifetimes (“Altered Events”) so that they cover the intervals of the windows they intersect. In figure 1, you can see that the first event e1 intersects windows w1 and w2 so it is adjusted to cover assignments a1 and a2. Finally, we can use snapshot windows (“Snapshots”) to produce output for the hopping windows. Notice however that instead of having six windows generating output, we have only four. The first and second snapshots correspond to the first and second hopping windows. The remaining snapshots however cover two hopping windows each! While in this example we saved only two events, the savings can be more significant when the ratio of event duration to window duration is higher. Figure 1: Timeline The implementation of this strategy is straightforward. We need to set the start times of events to the start time of the interval assigned to the earliest window including the start time. Similarly, we need to modify the end times of events to the end time of the interval assigned to the latest window including the end time. The following snap-to-boundary function that rounds a timestamp value t down to the nearest value t' <= t such that t' is a + np for some integer n will be useful. For convenience, we will represent both DateTime and TimeSpan values using long ticks: static long SnapToBoundary(long t, long a, long p) { return t - ((t - a) % p) - (t > a ? 0L : p); } How do we find the earliest window including the start time for an event? It’s the window following the last window that does not include the start time assuming that there are no gaps in the windows (i.e. duration < interval), and limitation of this solution. To find the end time of that antecedent window, we need to know the alignment of window ends: long e = a + (d % p); Using the window end alignment, we are finally ready to describe the start time selector: static long AdjustStartTime(long t, long e, long p) { return SnapToBoundary(t, e, p) + p; } To find the latest window including the end time for an event, we look for the last window start time (non-inclusive): public static long AdjustEndTime(long t, long a, long d, long p) { return SnapToBoundary(t - 1, a, p) + p + d; } Bringing it together, we can define the translation from events to ‘altered events’ as in Figure 1: public static IQStreamable<T> SnapToWindowIntervals<T>(IQStreamable<T> source, TimeSpan duration, TimeSpan interval, DateTime alignment) { if (source == null) throw new ArgumentNullException("source"); // reason about DateTime and TimeSpan in ticks long d = Math.Min(DateTime.MaxValue.Ticks, duration.Ticks); long p = Math.Min(DateTime.MaxValue.Ticks, Math.Abs(interval.Ticks)); // set alignment to earliest possible window var a = alignment.ToUniversalTime().Ticks % p; // verify constraints of this solution if (d <= 0L) { throw new ArgumentOutOfRangeException("duration"); } if (p == 0L || p > d) { throw new ArgumentOutOfRangeException("interval"); } // find the alignment of window ends long e = a + (d % p); return source.AlterEventLifetime( evt => ToDateTime(AdjustStartTime(evt.StartTime.ToUniversalTime().Ticks, e, p)), evt => ToDateTime(AdjustEndTime(evt.EndTime.ToUniversalTime().Ticks, a, d, p)) - ToDateTime(AdjustStartTime(evt.StartTime.ToUniversalTime().Ticks, e, p))); } public static DateTime ToDateTime(long ticks) { // just snap to min or max value rather than under/overflowing return ticks < DateTime.MinValue.Ticks ? new DateTime(DateTime.MinValue.Ticks, DateTimeKind.Utc) : ticks > DateTime.MaxValue.Ticks ? new DateTime(DateTime.MaxValue.Ticks, DateTimeKind.Utc) : new DateTime(ticks, DateTimeKind.Utc); } Finally, we can describe our custom hopping window operator: public static IQWindowedStreamable<T> HoppingWindow2<T>( IQStreamable<T> source, TimeSpan duration, TimeSpan interval, DateTime alignment) { if (source == null) { throw new ArgumentNullException("source"); } return SnapToWindowIntervals(source, duration, interval, alignment).SnapshotWindow(); } By switching from HoppingWindow to HoppingWindow2 in the following example, the query returns quickly rather than gobbling resources and ultimately failing! public void Main() { var start = new DateTimeOffset(new DateTime(2012, 6, 28), TimeSpan.Zero); var duration = TimeSpan.FromSeconds(5); var interval = TimeSpan.FromSeconds(2); var alignment = new DateTime(2012, 3, 15, 12, 0, 0, DateTimeKind.Utc); var events = this.Application.DefineEnumerable(() => new[] { EdgeEvent.CreateStart(start.AddSeconds(0), "e0"), EdgeEvent.CreateStart(start.AddSeconds(1), "e1"), EdgeEvent.CreateEnd(start.AddSeconds(1), start.AddSeconds(2), "e1"), EdgeEvent.CreateStart(start.AddSeconds(3), "e2"), EdgeEvent.CreateStart(start.AddSeconds(9), "e3"), EdgeEvent.CreateEnd(start.AddSeconds(3), start.AddSeconds(10), "e2"), EdgeEvent.CreateEnd(start.AddSeconds(9), start.AddSeconds(10), "e3"), }).ToStreamable(AdvanceTimeSettings.IncreasingStartTime); var adjustedEvents = SnapToWindowIntervals(events, duration, interval, alignment); var query = from win in HoppingWindow2(events, duration, interval, alignment) select win.Count(); DisplayResults(adjustedEvents, "Adjusted Events"); DisplayResults(query, "Query"); } As you can see, instead of producing a massive number of windows for the open start edge e0, a single window is emitted from 12:00:15 AM until the end of time: Adjusted Events StartTime EndTime Payload 6/28/2012 12:00:01 AM 12/31/9999 11:59:59 PM e0 6/28/2012 12:00:03 AM 6/28/2012 12:00:07 AM e1 6/28/2012 12:00:05 AM 6/28/2012 12:00:15 AM e2 6/28/2012 12:00:11 AM 6/28/2012 12:00:15 AM e3 Query StartTime EndTime Payload 6/28/2012 12:00:01 AM 6/28/2012 12:00:03 AM 1 6/28/2012 12:00:03 AM 6/28/2012 12:00:05 AM 2 6/28/2012 12:00:05 AM 6/28/2012 12:00:07 AM 3 6/28/2012 12:00:07 AM 6/28/2012 12:00:11 AM 2 6/28/2012 12:00:11 AM 6/28/2012 12:00:15 AM 3 6/28/2012 12:00:15 AM 12/31/9999 11:59:59 PM 1 Regards, The StreamInsight Team

Read the article

Guide to reduce TFS database growth using the Test Attachment Cleaner

- by terje

Recently there has been several reports on TFS databases growing too fast and growing too big. Notable this has been observed when one has started to use more features of the Testing system. Also, the TFS 2010 handles test results differently from TFS 2008, and this leads to more data stored in the TFS databases. As a consequence of this there has been released some tools to remove unneeded data in the database, and also some fixes to correct for bugs which has been found and corrected during this process. Further some preventive practices and maintenance rules should be adopted. A lot of people have blogged about this, among these are: Anu’s very important blog post here describes both the problem and solutions to handle it. She describes both the Test Attachment Cleaner tool, and also some QFE/CU releases to fix some underlying bugs which prevented the tool from being fully effective. Brian Harry’s blog post here describes the problem too This forum thread describes the problem with some solution hints. Ravi Shanker’s blog post here describes best practices on solving this (TBP) Grant Holidays blogpost here describes strategies to use the Test Attachment Cleaner both to detect space problems and how to rectify them. The problem can be divided into the following areas: Publishing of test results from builds Publishing of manual test results and their attachments in particular Publishing of deployment binaries for use during a test run Bugs in SQL server preventing total cleanup of data (All the published data above is published into the TFS database as attachments.) The test results will include all data being collected during the run. Some of this data can grow rather large, like IntelliTrace logs and video recordings. Also the pushing of binaries which happen for automated test runs, including tests run during a build using code coverage which will include all the files in the deployment folder, contributes a lot to the size of the attached data. In order to handle this systematically, I have set up a 3-stage process: Find out if you have a database space issue Set up your TFS server to minimize potential database issues If you have the “problem”, clean up the database and otherwise keep it clean Analyze the data Are your database( s) growing ? Are unused test results growing out of proportion ? To find out about this you need to query your TFS database for some of the information, and use the Test Attachment Cleaner (TAC) to obtain some more detailed information. If you don’t have too many databases you can use the SQL Server reports from within the Management Studio to analyze the database and table sizes. Or, you can use a set of queries . I find queries often faster to use because I can tweak them the way I want them. But be aware that these queries are non-documented and non-supported and may change when the product team wants to change them. If you have multiple Project Collections, find out which might have problems: (Disclaimer: The queries below work on TFS 2010. They will not work on Dev-11, since the table structure have been changed. I will try to update them for Dev-11 when it is released.) Open a SQL Management Studio session onto the SQL Server where you have your TFS Databases. Use the query below to find the Project Collection databases and their sizes, in descending size order. use master select DB_NAME(database_id) AS DBName, (size/128) SizeInMB FROM sys.master_files where type=0 and substring(db_name(database_id),1,4)='Tfs_' and DB_NAME(database_id)<>'Tfs_Configuration' order by size desc Doing this on one of our SQL servers gives the following results: It is pretty easy to see on which collection to start the work Find out which tables are possibly too large Keep a special watch out for the Tfs_Attachment table. Use the script at the bottom of Grant’s blog to find the table sizes in descending size order. In our case we got this result: From Grant’s blog we learnt that the tbl_Content is in the Version Control category, so the major only big issue we have here is the tbl_AttachmentContent. Find out which team projects have possibly too large attachments In order to use the TAC to find and eventually delete attachment data we need to find out which team projects have these attachments. The team project is a required parameter to the TAC. Use the following query to find this, replace the collection database name with whatever applies in your case: use Tfs_DefaultCollection select p.projectname, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by p.projectname order by sum(a.compressedlength) desc In our case we got this result (had to remove some names), out of more than 100 team projects accumulated over quite some years: As can be seen here it is pretty obvious the “Byggtjeneste – Projects” are the main team project to take care of, with the ones on lines 2-4 as the next ones. Check which attachment types takes up the most space It can be nice to know which attachment types takes up the space, so run the following query: use Tfs_DefaultCollection select a.attachmenttype, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by a.attachmenttype order by sum(a.compressedlength) desc We then got this result: From this it is pretty obvious that the problem here is the binary files, as also mentioned in Anu’s blog. Check which file types, by their extension, takes up the most space Run the following query use Tfs_DefaultCollection select SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999)as Extension, sum(compressedlength)/1024 as SizeInKB from tbl_Attachment group by SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999) order by sum(compressedlength) desc This gives a result like this: Now you should have collected enough information to tell you what to do – if you got to do something, and some of the information you need in order to set up your TAC settings file, both for a cleanup and for scheduled maintenance later. Get your TFS server and environment properly set up Even if you have got the problem or if have yet not got the problem, you should ensure the TFS server is set up so that the risk of getting into this problem is minimized. To ensure this you should install the following set of updates and components. The assumption is that your TFS Server is at SP1 level. Install the QFE for KB2608743 – which also contains detailed instructions on its use, download from here. The QFE changes the default settings to not upload deployed binaries, which are used in automated test runs. Binaries will still be uploaded if: Code coverage is enabled in the test settings. You change the UploadDeploymentItem to true in the testsettings file. Be aware that this might be reset back to false by another user which haven't installed this QFE. The hotfix should be installed to The build servers (the build agents) The machine hosting the Test Controller Local development computers (Visual Studio) Local test computers (MTM) It is not required to install it to the TFS Server, test agents or the build controller – it has no effect on these programs. If you use the SQL Server 2008 R2 you should also install the CU 10 (or later). This CU fixes a potential problem of hanging “ghost” files. This seems to happen only in certain trigger situations, but to ensure it doesn’t bite you, it is better to make sure this CU is installed. There is no such CU for SQL Server 2008 pre-R2 Work around: If you suspect hanging ghost files, they can be – with some mental effort, deduced from the ghost counters using the following SQL query: use master SELECT DB_NAME(database_id) as 'database',OBJECT_NAME(object_id) as 'objectname', index_type_desc,ghost_record_count,version_ghost_record_count,record_count,avg_record_size_in_bytes FROM sys.dm_db_index_physical_stats (DB_ID(N'<DatabaseName>'), OBJECT_ID(N'<TableName>'), NULL, NULL , 'DETAILED') The problem is a stalled ghost cleanup process. Restarting the SQL server after having stopped all components that depends on it, like the TFS Server and SPS services – that is all applications that connect to the SQL server. Then restart the SQL server, and finally start up all dependent processes again. (I would guess a complete server reboot would do the trick too.) After this the ghost cleanup process will run properly again. The fix will come in the next CU cycle for SQL Server R2 SP1. The R2 pre-SP1 and R2 SP1 have separate maintenance cycles, and are maintained individually. Each have its own set of CU’s. When it comes I will add the link here to that CU. The "hanging ghost file” issue came up after one have run the TAC, and deleted enourmes amount of data. The SQL Server can get into this hanging state (without the QFE) in certain cases due to this. And of course, install and set up the Test Attachment Cleaner command line power tool. This should be done following some guidelines from Ravi Shanker: “When you run TAC, ensure that you are deleting small chunks of data at regular intervals (say run TAC every night at 3AM to delete data that is between age 730 to 731 days) – this will ensure that small amounts of data are being deleted and SQL ghosted record cleanup can catch up with the number of deletes performed. “ This rule minimizes the risk of the ghosted hang problem to occur, and further makes it easier for the SQL server ghosting process to work smoothly. “Run DBCC SHRINKDB post the ghosted records are cleaned up to physically reclaim the space on the file system” This is the last step in a 3 step process of removing SQL server data. First they are logically deleted. Then they are cleaned out by the ghosting process, and finally removed using the shrinkdb command. Cleaning out the attachments The TAC is run from the command line using a set of parameters and controlled by a settingsfile. The parameters point out a server uri including the team project collection and also point at a specific team project. So in order to run this for multiple team projects regularly one has to set up a script to run the TAC multiple times, once for each team project. When you install the TAC there is a very useful readme file in the same directory. When the deployment binaries are published to the TFS server, ALL items are published up from the deployment folder. That often means much more files than you would assume are necessary. This is a brute force technique. It works, but you need to take care when cleaning up. Grant has shown how their settings file looks in his blog post, removing all attachments older than 180 days , as long as there are no active workitems connected to them. This setting can be useful to clean out all items, both in a clean-up once operation, and in a general There are two scenarios we need to consider: Cleaning up an existing overgrown database Maintaining a server to avoid an overgrown database using scheduled TAC 1. Cleaning up a database which has grown too big due to these attachments. This job is a “Once” job. We do this once and then move on to make sure it won’t happen again, by taking the actions in 2) below. In this scenario you should only consider the large files. Your goal should be to simply reduce the size, and don’t bother about the smaller stuff. That can be left a scheduled TAC cleanup ( 2 below). Here you can use a very general settings file, and just remove the large attachments, or you can choose to remove any old items. Grant’s settings file is an example of the last one. A settings file to remove only large attachments could look like this:  <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> </Attachment> </DeletionCriteria> Or like this: If you want only to remove dll’s and pdb’s about that size, add an Extensions-section. Without that section, all extensions will be deleted.  <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="dll" /> <Include value="pdb" /> </Extensions> </Attachment> </DeletionCriteria> Before you start up your scheduled maintenance, you should clear out all older items. 2. Scheduled maintenance using the TAC If you run a schedule every night, and remove old items, and also remove them in small batches. It is important to run this often, like every night, in order to keep the number of deleted items low. That way the SQL ghost process works better. One approach could be to delete all items older than some number of days, let’s say 180 days. This could be combined with restricting it to keep attachments with active or resolved bugs. Doing this every night ensures that only small amounts of data is deleted.  <DeletionCriteria> <TestRun> <AgeInDays OlderThan="180" /> </TestRun> <Attachment /> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved"/> </LinkedBugs> </DeletionCriteria> In my experience there are projects which are left with active or resolved workitems, akthough no further work is done. It can be wise to have a cleanup process with no restrictions on linked bugs at all. Note that you then have to remove the whole LinkedBugs section. A approach which could work better here is to do a two step approach, use the schedule above to with no LinkedBugs as a sweeper cleaning task taking away all data older than you could care about. Then have another scheduled TAC task to take out more specifically attachments that you are not likely to use. This task could be much more specific, and based on your analysis clean out what you know is troublesome data.  <DeletionCriteria> <TestRun > <AgeInDays OlderThan="30" /> </TestRun> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="iTrace"/> <Include value="dll"/> <Include value="pdb"/> <Include value="wmv"/> </Extensions> </Attachment> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved" /> </LinkedBugs> </DeletionCriteria> The readme document for the TAC says that it recognizes “internal” extensions, but it does recognize any extension. To run the tool do the following command: tcmpt attachmentcleanup /collection:your_tfs_collection_url /teamproject:your_team_project /settingsfile:path_to_settingsfile /outputfile:%temp%/teamproject.tcmpt.log /mode:delete Shrinking the database You could run a shrink database command after the TAC has run in cases where there are a lot of data being deleted. In this case you SHOULD do it, to free up all that space. But, after the shrink operation you should do a rebuild indexes, since the shrink operation will leave the database in a very fragmented state, which will reduce performance. Note that you need to rebuild indexes, reorganizing is not enough. For smaller amounts of data you should NOT shrink the database, since the data will be reused by the SQL server when it need to add more records. In fact, it is regarded as a bad practice to shrink the database regularly. So on a daily maintenance schedule you should NOT shrink the database. To shrink the database you do a DBCC SHRINKDATABASE command, and then follow up with a DBCC INDEXDEFRAG afterwards. I find the easiest way to do this is to create a SQL Maintenance plan including the Shrink Database Task and the Rebuild Index Task and just execute it when you need to do this.

Read the article

Simple MSBuild Configuration: Updating Assemblies With A Version Number

- by srkirkland

When distributing a library you often run up against versioning problems, once facet of which is simply determining which version of that library your client is running. Of course, each project in your solution has an AssemblyInfo.cs file which provides, among other things, the ability to set the Assembly name and version number. Unfortunately, setting the assembly version here would require not only changing the version manually for each build (depending on your schedule), but keeping it in sync across all projects. There are many ways to solve this versioning problem, and in this blog post I’m going to try to explain what I think is the easiest and most flexible solution. I will walk you through using MSBuild to create a simple build script, and I’ll even show how to (optionally) integrate with a Team City build server. All of the code from this post can be found at https://github.com/srkirkland/BuildVersion. Create CommonAssemblyInfo.cs The first step is to create a common location for the repeated assembly info that is spread across all of your projects. Create a new solution-level file (I usually create a Build/ folder in the solution root, but anywhere reachable by all your projects will do) called CommonAssemblyInfo.cs. In here you can put any information common to all your assemblies, including the version number. An example CommonAssemblyInfo.cs is as follows: using System.Reflection; using System.Resources; using System.Runtime.InteropServices; [assembly: AssemblyCompany("University of California, Davis")] [assembly: AssemblyProduct("BuildVersionTest")] [assembly: AssemblyCopyright("Scott Kirkland & UC Regents")] [assembly: AssemblyConfiguration("")] [assembly: AssemblyTrademark("")] [assembly: ComVisible(false)] [assembly: AssemblyVersion("1.2.3.4")] //Will be replaced [assembly: NeutralResourcesLanguage("en-US")] .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Cleanup AssemblyInfo.cs & Link CommonAssemblyInfo.cs For each of your projects, you’ll want to clean up your assembly info to contain only information that is unique to that assembly – everything else will go in the CommonAssemblyInfo.cs file. For most of my projects, that just means setting the AssemblyTitle, though you may feel AssemblyDescription is warranted. An example AssemblyInfo.cs file is as follows: using System.Reflection; [assembly: AssemblyTitle("BuildVersionTest")] .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Next, you need to “link” the CommonAssemblyinfo.cs file into your projects right beside your newly lean AssemblyInfo.cs file. To do this, right click on your project and choose Add | Existing Item from the context menu. Navigate to your CommonAssemblyinfo.cs file but instead of clicking Add, click the little down-arrow next to add and choose “Add as Link.” You should see a little link graphic similar to this: We’ve actually reduced complexity a lot already, because if you build all of your assemblies will have the same common info, including the product name and our static (fake) assembly version. Let’s take this one step further and introduce a build script. Create an MSBuild file What we want from the build script (for now) is basically just to have the common assembly version number changed via a parameter (eventually to be passed in by the build server) and then for the project to build. Also we’d like to have a flexibility to define what build configuration to use (debug, release, etc). In order to find/replace the version number, we are going to use a Regular Expression to find and replace the text within your CommonAssemblyInfo.cs file. There are many other ways to do this using community build task add-ins, but since we want to keep it simple let’s just define the Regular Expression task manually in a new file, Build.tasks (this example taken from the NuGet build.tasks file). <?xml version="1.0" encoding="utf-8"?> <Project ToolsVersion="4.0" DefaultTargets="Go" xmlns="http://schemas.microsoft.com/developer/msbuild/2003"> <UsingTask TaskName="RegexTransform" TaskFactory="CodeTaskFactory" AssemblyFile="$(MSBuildToolsPath)\Microsoft.Build.Tasks.v4.0.dll"> <ParameterGroup> <Items ParameterType="Microsoft.Build.Framework.ITaskItem[]" /> </ParameterGroup> <Task> <Using Namespace="System.IO" /> <Using Namespace="System.Text.RegularExpressions" /> <Using Namespace="Microsoft.Build.Framework" /> <Code Type="Fragment" Language="cs"> <![CDATA[ foreach(ITaskItem item in Items) { string fileName = item.GetMetadata("FullPath"); string find = item.GetMetadata("Find"); string replaceWith = item.GetMetadata("ReplaceWith"); if(!File.Exists(fileName)) { Log.LogError(null, null, null, null, 0, 0, 0, 0, String.Format("Could not find version file: {0}", fileName), new object[0]); } string content = File.ReadAllText(fileName); File.WriteAllText( fileName, Regex.Replace( content, find, replaceWith ) ); } ]]> </Code> </Task> </UsingTask> </Project> .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } If you glance at the code, you’ll see it’s really just going a Regex.Replace() on a given file, which is exactly what we need. Now we are ready to write our build file, called (by convention) Build.proj. <?xml version="1.0" encoding="utf-8"?> <Project ToolsVersion="4.0" DefaultTargets="Go" xmlns="http://schemas.microsoft.com/developer/msbuild/2003"> <Import Project="$(MSBuildProjectDirectory)\Build.tasks" /> <PropertyGroup> <Configuration Condition="'$(Configuration)' == ''">Debug</Configuration> <SolutionRoot>$(MSBuildProjectDirectory)</SolutionRoot> </PropertyGroup> <ItemGroup> <RegexTransform Include="$(SolutionRoot)\CommonAssemblyInfo.cs"> <Find>(?<major>\d+)\.(?<minor>\d+)\.\d+\.(?<revision>\d+)</Find> <ReplaceWith>$(BUILD_NUMBER)</ReplaceWith> </RegexTransform> </ItemGroup> <Target Name="Go" DependsOnTargets="UpdateAssemblyVersion; Build"> </Target> <Target Name="UpdateAssemblyVersion" Condition="'$(BUILD_NUMBER)' != ''"> <RegexTransform Items="@(RegexTransform)" /> </Target> <Target Name="Build"> <MSBuild Projects="$(SolutionRoot)\BuildVersionTest.sln" Targets="Build" /> </Target> </Project> Reviewing this MSBuild file, we see that by default the “Go” target will be called, which in turn depends on “UpdateAssemblyVersion” and then “Build.” We go ahead and import the Bulid.tasks file and then setup some handy properties for setting the build configuration and solution root (in this case, my build files are in the solution root, but we might want to create a Build/ directory later). The rest of the file flows logically, we setup the RegexTransform to match version numbers such as <major>.<minor>.1.<revision> (1.2.3.4 in our example) and replace it with a $(BUILD_NUMBER) parameter which will be supplied externally. The first target, “UpdateAssemblyVersion” just runs the RegexTransform, and the second target, “Build” just runs the default MSBuild on our solution. Testing the MSBuild file locally Now we have a build file which can replace assembly version numbers and build, so let’s setup a quick batch file to be able to build locally. To do this you simply create a file called Build.cmd and have it call MSBuild on your Build.proj file. I’ve added a bit more flexibility so you can specify build configuration and version number, which makes your Build.cmd look as follows: set config=%1 if "%config%" == "" ( set config=debug ) set version=%2 if "%version%" == "" ( set version=2.3.4.5 ) %WINDIR%\Microsoft.NET\Framework\v4.0.30319\msbuild Build.proj /p:Configuration="%config%" /p:build_number="%version%" .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now if you click on the Build.cmd file, you will get a default debug build using the version 2.3.4.5. Let’s run it in a command window with the parameters set for a release build version 2.0.1.453. Excellent! We can now run one simple command and govern the build configuration and version number of our entire solution. Each DLL produced will have the same version number, making determining which version of a library you are running very simple and accurate. Configure the build server (TeamCity) Of course you are not really going to want to run a build command manually every time, and typing in incrementing version numbers will also not be ideal. A good solution is to have a computer (or set of computers) act as a build server and build your code for you, providing you a consistent environment, excellent reporting, and much more. One of the most popular Build Servers is JetBrains’ TeamCity, and this last section will show you the few configuration parameters to use when setting up a build using your MSBuild file created earlier. If you are using a different build server, the same principals should apply. First, when setting up the project you want to specify the “Build Number Format,” often given in the form <major>.<minor>.<revision>.<build>. In this case you will set major/minor manually, and optionally revision (or you can use your VCS revision number with %build.vcs.number%), and then build using the {0} wildcard. Thus your build number format might look like this: 2.0.1.{0}. During each build, this value will be created and passed into the $BUILD_NUMBER variable of our Build.proj file, which then uses it to decorate your assemblies with the proper version. After setting up the build number, you must choose MSBuild as the Build Runner, then provide a path to your build file (Build.proj). After specifying your MSBuild Version (equivalent to your .NET Framework Version), you have the option to specify targets (the default being “Go”) and additional MSBuild parameters. The one parameter that is often useful is manually setting the configuration property (/p:Configuration="Release") if you want something other than the default (which is Debug in our example). Your resulting configuration will look something like this: [Under General Settings] [Build Runner Settings] Now every time your build is run, a newly incremented build version number will be generated and passed to MSBuild, which will then version your assemblies and build your solution. A Quick Review Our goal was to version our output assemblies in an automated way, and we accomplished it by performing a few quick steps: Move the common assembly information, including version, into a linked CommonAssemblyInfo.cs file Create a simple MSBuild script to replace the common assembly version number and build your solution Direct your build server to use the created MSBuild script That’s really all there is to it. You can find all of the code from this post at https://github.com/srkirkland/BuildVersion. Enjoy!

Read the article

C#/.NET Fundamentals: Choosing the Right Collection Class

- by James Michael Hare

The .NET Base Class Library (BCL) has a wide array of collection classes at your disposal which make it easy to manage collections of objects. While it's great to have so many classes available, it can be daunting to choose the right collection to use for any given situation. As hard as it may be, choosing the right collection can be absolutely key to the performance and maintainability of your application! This post will look at breaking down any confusion between each collection and the situations in which they excel. We will be spending most of our time looking at the System.Collections.Generic namespace, which is the recommended set of collections. The Generic Collections: System.Collections.Generic namespace The generic collections were introduced in .NET 2.0 in the System.Collections.Generic namespace. This is the main body of collections you should tend to focus on first, as they will tend to suit 99% of your needs right up front. It is important to note that the generic collections are unsynchronized. This decision was made for performance reasons because depending on how you are using the collections its completely possible that synchronization may not be required or may be needed on a higher level than simple method-level synchronization. Furthermore, concurrent read access (all writes done at beginning and never again) is always safe, but for concurrent mixed access you should either synchronize the collection or use one of the concurrent collections. So let's look at each of the collections in turn and its various pros and cons, at the end we'll summarize with a table to help make it easier to compare and contrast the different collections. The Associative Collection Classes Associative collections store a value in the collection by providing a key that is used to add/remove/lookup the item. Hence, the container associates the value with the key. These collections are most useful when you need to lookup/manipulate a collection using a key value. For example, if you wanted to look up an order in a collection of orders by an order id, you might have an associative collection where they key is the order id and the value is the order. The Dictionary<TKey,TVale> is probably the most used associative container class. The Dictionary<TKey,TValue> is the fastest class for associative lookups/inserts/deletes because it uses a hash table under the covers. Because the keys are hashed, the key type should correctly implement GetHashCode() and Equals() appropriately or you should provide an external IEqualityComparer to the dictionary on construction. The insert/delete/lookup time of items in the dictionary is amortized constant time - O(1) - which means no matter how big the dictionary gets, the time it takes to find something remains relatively constant. This is highly desirable for high-speed lookups. The only downside is that the dictionary, by nature of using a hash table, is unordered, so you cannot easily traverse the items in a Dictionary in order. The SortedDictionary<TKey,TValue> is similar to the Dictionary<TKey,TValue> in usage but very different in implementation. The SortedDictionary<TKey,TValye> uses a binary tree under the covers to maintain the items in order by the key. As a consequence of sorting, the type used for the key must correctly implement IComparable<TKey> so that the keys can be correctly sorted. The sorted dictionary trades a little bit of lookup time for the ability to maintain the items in order, thus insert/delete/lookup times in a sorted dictionary are logarithmic - O(log n). Generally speaking, with logarithmic time, you can double the size of the collection and it only has to perform one extra comparison to find the item. Use the SortedDictionary<TKey,TValue> when you want fast lookups but also want to be able to maintain the collection in order by the key. The SortedList<TKey,TValue> is the other ordered associative container class in the generic containers. Once again SortedList<TKey,TValue>, like SortedDictionary<TKey,TValue>, uses a key to sort key-value pairs. Unlike SortedDictionary, however, items in a SortedList are stored as an ordered array of items. This means that insertions and deletions are linear - O(n) - because deleting or adding an item may involve shifting all items up or down in the list. Lookup time, however is O(log n) because the SortedList can use a binary search to find any item in the list by its key. So why would you ever want to do this? Well, the answer is that if you are going to load the SortedList up-front, the insertions will be slower, but because array indexing is faster than following object links, lookups are marginally faster than a SortedDictionary. Once again I'd use this in situations where you want fast lookups and want to maintain the collection in order by the key, and where insertions and deletions are rare. The Non-Associative Containers The other container classes are non-associative. They don't use keys to manipulate the collection but rely on the object itself being stored or some other means (such as index) to manipulate the collection. The List<T> is a basic contiguous storage container. Some people may call this a vector or dynamic array. Essentially it is an array of items that grow once its current capacity is exceeded. Because the items are stored contiguously as an array, you can access items in the List<T> by index very quickly. However inserting and removing in the beginning or middle of the List<T> are very costly because you must shift all the items up or down as you delete or insert respectively. However, adding and removing at the end of a List<T> is an amortized constant operation - O(1). Typically List<T> is the standard go-to collection when you don't have any other constraints, and typically we favor a List<T> even over arrays unless we are sure the size will remain absolutely fixed. The LinkedList<T> is a basic implementation of a doubly-linked list. This means that you can add or remove items in the middle of a linked list very quickly (because there's no items to move up or down in contiguous memory), but you also lose the ability to index items by position quickly. Most of the time we tend to favor List<T> over LinkedList<T> unless you are doing a lot of adding and removing from the collection, in which case a LinkedList<T> may make more sense. The HashSet<T> is an unordered collection of unique items. This means that the collection cannot have duplicates and no order is maintained. Logically, this is very similar to having a Dictionary<TKey,TValue> where the TKey and TValue both refer to the same object. This collection is very useful for maintaining a collection of items you wish to check membership against. For example, if you receive an order for a given vendor code, you may want to check to make sure the vendor code belongs to the set of vendor codes you handle. In these cases a HashSet<T> is useful for super-quick lookups where order is not important. Once again, like in Dictionary, the type T should have a valid implementation of GetHashCode() and Equals(), or you should provide an appropriate IEqualityComparer<T> to the HashSet<T> on construction. The SortedSet<T> is to HashSet<T> what the SortedDictionary<TKey,TValue> is to Dictionary<TKey,TValue>. That is, the SortedSet<T> is a binary tree where the key and value are the same object. This once again means that adding/removing/lookups are logarithmic - O(log n) - but you gain the ability to iterate over the items in order. For this collection to be effective, type T must implement IComparable<T> or you need to supply an external IComparer<T>. Finally, the Stack<T> and Queue<T> are two very specific collections that allow you to handle a sequential collection of objects in very specific ways. The Stack<T> is a last-in-first-out (LIFO) container where items are added and removed from the top of the stack. Typically this is useful in situations where you want to stack actions and then be able to undo those actions in reverse order as needed. The Queue<T> on the other hand is a first-in-first-out container which adds items at the end of the queue and removes items from the front. This is useful for situations where you need to process items in the order in which they came, such as a print spooler or waiting lines. So that's the basic collections. Let's summarize what we've learned in a quick reference table. Collection Ordered? Contiguous Storage? Direct Access? Lookup Efficiency Manipulate Efficiency Notes Dictionary No Yes Via Key Key: O(1) O(1) Best for high performance lookups. SortedDictionary Yes No Via Key Key: O(log n) O(log n) Compromise of Dictionary speed and ordering, uses binary search tree. SortedList Yes Yes Via Key Key: O(log n) O(n) Very similar to SortedDictionary, except tree is implemented in an array, so has faster lookup on preloaded data, but slower loads. List No Yes Via Index Index: O(1) Value: O(n) O(n) Best for smaller lists where direct access required and no ordering. LinkedList No No No Value: O(n) O(1) Best for lists where inserting/deleting in middle is common and no direct access required. HashSet No Yes Via Key Key: O(1) O(1) Unique unordered collection, like a Dictionary except key and value are same object. SortedSet Yes No Via Key Key: O(log n) O(log n) Unique ordered collection, like SortedDictionary except key and value are same object. Stack No Yes Only Top Top: O(1) O(1)* Essentially same as List<T> except only process as LIFO Queue No Yes Only Front Front: O(1) O(1) Essentially same as List<T> except only process as FIFO The Original Collections: System.Collections namespace The original collection classes are largely considered deprecated by developers and by Microsoft itself. In fact they indicate that for the most part you should always favor the generic or concurrent collections, and only use the original collections when you are dealing with legacy .NET code. Because these collections are out of vogue, let's just briefly mention the original collection and their generic equivalents: ArrayList A dynamic, contiguous collection of objects. Favor the generic collection List<T> instead. Hashtable Associative, unordered collection of key-value pairs of objects. Favor the generic collection Dictionary<TKey,TValue> instead. Queue First-in-first-out (FIFO) collection of objects. Favor the generic collection Queue<T> instead. SortedList Associative, ordered collection of key-value pairs of objects. Favor the generic collection SortedList<T> instead. Stack Last-in-first-out (LIFO) collection of objects. Favor the generic collection Stack<T> instead. In general, the older collections are non-type-safe and in some cases less performant than their generic counterparts. Once again, the only reason you should fall back on these older collections is for backward compatibility with legacy code and libraries only. The Concurrent Collections: System.Collections.Concurrent namespace The concurrent collections are new as of .NET 4.0 and are included in the System.Collections.Concurrent namespace. These collections are optimized for use in situations where multi-threaded read and write access of a collection is desired. The concurrent queue, stack, and dictionary work much as you'd expect. The bag and blocking collection are more unique. Below is the summary of each with a link to a blog post I did on each of them. ConcurrentQueue Thread-safe version of a queue (FIFO). For more information see: C#/.NET Little Wonders: The ConcurrentStack and ConcurrentQueue ConcurrentStack Thread-safe version of a stack (LIFO). For more information see: C#/.NET Little Wonders: The ConcurrentStack and ConcurrentQueue ConcurrentBag Thread-safe unordered collection of objects. Optimized for situations where a thread may be bother reader and writer. For more information see: C#/.NET Little Wonders: The ConcurrentBag and BlockingCollection ConcurrentDictionary Thread-safe version of a dictionary. Optimized for multiple readers (allows multiple readers under same lock). For more information see C#/.NET Little Wonders: The ConcurrentDictionary BlockingCollection Wrapper collection that implement producers & consumers paradigm. Readers can block until items are available to read. Writers can block until space is available to write (if bounded). For more information see C#/.NET Little Wonders: The ConcurrentBag and BlockingCollection Summary The .NET BCL has lots of collections built in to help you store and manipulate collections of data. Understanding how these collections work and knowing in which situations each container is best is one of the key skills necessary to build more performant code. Choosing the wrong collection for the job can make your code much slower or even harder to maintain if you choose one that doesn’t perform as well or otherwise doesn’t exactly fit the situation. Remember to avoid the original collections and stick with the generic collections. If you need concurrent access, you can use the generic collections if the data is read-only, or consider the concurrent collections for mixed-access if you are running on .NET 4.0 or higher. Tweet Technorati Tags: C#,.NET,Collecitons,Generic,Concurrent,Dictionary,List,Stack,Queue,SortedList,SortedDictionary,HashSet,SortedSet

Read the article

Partitioned Repository for WebCenter Content using Oracle Database 11g

- by Adao Junior

One of the biggest challenges for content management solutions is related to the storage management due the high volumes of the unstoppable growing of information. Even if you have storage appliances and a lot of terabytes, thinks like backup, compression, deduplication, storage relocation, encryption, availability could be a nightmare. One standard option that you have with the Oracle WebCenter Content is to store data to the database. And the Oracle Database allows you leverage features like compression, deduplication, encryption and seamless backup. But with a huge volume, the challenge is passed to the DBA to keep the WebCenter Content Database up and running. One solution is the use of DB partitions for your content repository, but what are the implications of this? Can I fit this with my business requirements? Well, yes. It’s up to you how you will manage that, you just need a good plan. During you “storage brainstorm plan” take in your mind what you need, such as storage petabytes of documents? You need everything on-line? There’s a way to logically separate the “good content” from the “legacy content”? The first thing that comes to my mind is to use the creation date of the document, but you need to remember that this document could receive a lot of revisions and maybe you can consider the revision creation date. Your plan can have also complex rules like per Document Type or per a custom metadata like department or an hybrid per date, per DocType and an specific virtual folder. Extrapolation the use, you can have your repository distributed in different servers, different disks, different disk types (Such as ssds, sas, sata, tape,…), separated accordingly your business requirements, separating the “hot” content from the legacy and easily matching your compliance requirements. If you think to use by revision, the simple way is to consider the dId, that is the sequential unique id for every content created using the WebCenter Content or the dLastModified that is the date field of the FileStorage table that contains the date of inclusion of the content to the DB Table using SecureFiles. Using the scenario of partitioned repository using an hierarchical separation by date, we will transform the FileStorage table in an partitioned table using “Partition by Range” of the dLastModified column (You can use the dId or a join with other tables for other metadata such as dDocType, Security, etc…). The test scenario bellow covers: Previous existent data on the JDBC Storage to be migrated to the new partitioned JDBC Storage Partition by Date Automatically generation of new partitions based on a pre-defined interval (Available only with Oracle Database 11g+) Deduplication and Compression for legacy data Oracle WebCenter Content 11g PS5 (Could present some customizations that do not affect the test scenario) For the test case you need some data stored using JDBC Storage to be the “legacy” data. If you do not have done before, just create an Storage rule pointed to the JDBC Storage: Enable the metadata StorageRule in the UI and upload some documents using this rule. For this test case you can run using the schema owner or an dba user. We will use the schema owner TESTS_OCS. I can’t forgot to tell that this is just a test and you should do a proper backup of your environment. When you use the schema owner, you need some privileges, using the dba user grant the privileges needed: REM Grant privileges required for online redefinition. GRANT EXECUTE ON DBMS_REDEFINITION TO TESTS_OCS; GRANT ALTER ANY TABLE TO TESTS_OCS; GRANT DROP ANY TABLE TO TESTS_OCS; GRANT LOCK ANY TABLE TO TESTS_OCS; GRANT CREATE ANY TABLE TO TESTS_OCS; GRANT SELECT ANY TABLE TO TESTS_OCS; REM Privileges required to perform cloning of dependent objects. GRANT CREATE ANY TRIGGER TO TESTS_OCS; GRANT CREATE ANY INDEX TO TESTS_OCS; In our test scenario we will separate the content as Legacy, Day1, Day2, Day3 and Future. This last one will partitioned automatically using 3 tablespaces in a round robin mode. In a real scenario the partition rule could be per month, per year or any rule that you choose. Table spaces for the test scenario: CREATE TABLESPACE TESTS_OCS_PART_LEGACY DATAFILE 'tests_ocs_part_legacy.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED; CREATE TABLESPACE TESTS_OCS_PART_DAY1 DATAFILE 'tests_ocs_part_day1.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED; CREATE TABLESPACE TESTS_OCS_PART_DAY2 DATAFILE 'tests_ocs_part_day2.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED; CREATE TABLESPACE TESTS_OCS_PART_DAY3 DATAFILE 'tests_ocs_part_day3.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED; CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_A 'tests_ocs_part_round_robin_a.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED; CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_B 'tests_ocs_part_round_robin_b.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED; CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_C 'tests_ocs_part_round_robin_c.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED; Before start, gather optimizer statistics on the actual FileStorage table: EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'FileStorage', cascade => TRUE); Now check if is possible execute the redefinition process: EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('TESTS_OCS', 'FileStorage',DBMS_REDEFINITION.CONS_USE_PK); If no errors messages, you are good to go. Create a Partitioned Interim FileStorage table. You need to create a new table with the partition information to act as an interim table: CREATE TABLE FILESTORAGE_Part ( DID NUMBER(*,0) NOT NULL ENABLE, DRENDITIONID VARCHAR2(30 CHAR) NOT NULL ENABLE, DLASTMODIFIED TIMESTAMP (6), DFILESIZE NUMBER(*,0), DISDELETED VARCHAR2(1 CHAR), BFILEDATA BLOB ) LOB (BFILEDATA) STORE AS SECUREFILE ( ENABLE STORAGE IN ROW NOCACHE LOGGING KEEP_DUPLICATES NOCOMPRESS ) PARTITION BY RANGE (DLASTMODIFIED) INTERVAL (NUMTODSINTERVAL(1,'DAY')) STORE IN (TESTS_OCS_PART_ROUND_ROBIN_A, TESTS_OCS_PART_ROUND_ROBIN_B, TESTS_OCS_PART_ROUND_ROBIN_C) ( PARTITION FILESTORAGE_PART_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) TABLESPACE TESTS_OCS_PART_LEGACY LOB (BFILEDATA) STORE AS SECUREFILE ( TABLESPACE TESTS_OCS_PART_LEGACY RETENTION NONE DEDUPLICATE COMPRESS HIGH ), PARTITION FILESTORAGE_PART_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) TABLESPACE TESTS_OCS_PART_DAY1 LOB (BFILEDATA) STORE AS SECUREFILE ( TABLESPACE TESTS_OCS_PART_DAY1 RETENTION AUTO KEEP_DUPLICATES COMPRESS ), PARTITION FILESTORAGE_PART_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) TABLESPACE TESTS_OCS_PART_DAY2 LOB (BFILEDATA) STORE AS SECUREFILE ( TABLESPACE TESTS_OCS_PART_DAY2 RETENTION AUTO KEEP_DUPLICATES NOCOMPRESS ), PARTITION FILESTORAGE_PART_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) TABLESPACE TESTS_OCS_PART_DAY3 LOB (BFILEDATA) STORE AS SECUREFILE ( TABLESPACE TESTS_OCS_PART_DAY3 RETENTION AUTO KEEP_DUPLICATES NOCOMPRESS ) ); After the creation you should see your partitions defined. Note that only the fixed range partitions have been created, none of the interval partition have been created. Start the redefinition process: BEGIN DBMS_REDEFINITION.START_REDEF_TABLE( uname => 'TESTS_OCS' ,orig_table => 'FileStorage' ,int_table => 'FileStorage_PART' ,col_mapping => NULL ,options_flag => DBMS_REDEFINITION.CONS_USE_PK ); END; This operation can take some time to complete, depending how many contents that you have and on the size of the table. Using the DBA user you can check the progress with this command: SELECT * FROM v$sesstat WHERE sid = 1; Copy dependent objects: DECLARE redefinition_errors PLS_INTEGER := 0; BEGIN DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS( uname => 'TESTS_OCS' ,orig_table => 'FileStorage' ,int_table => 'FileStorage_PART' ,copy_indexes => DBMS_REDEFINITION.CONS_ORIG_PARAMS ,copy_triggers => TRUE ,copy_constraints => TRUE ,copy_privileges => TRUE ,ignore_errors => TRUE ,num_errors => redefinition_errors ,copy_statistics => FALSE ,copy_mvlog => FALSE ); IF (redefinition_errors > 0) THEN DBMS_OUTPUT.PUT_LINE('>>> FileStorage to FileStorage_PART temp copy Errors: ' || TO_CHAR(redefinition_errors)); END IF; END; With the DBA user, verify that there's no errors: SELECT object_name, base_table_name, ddl_txt FROM DBA_REDEFINITION_ERRORS; *Note that will show 2 lines related to the constrains, this is expected. Synchronize the interim table FileStorage_PART: BEGIN DBMS_REDEFINITION.SYNC_INTERIM_TABLE( uname => 'TESTS_OCS', orig_table => 'FileStorage', int_table => 'FileStorage_PART'); END; Gather statistics on the new table: EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'FileStorage_PART', cascade => TRUE); Complete the redefinition: BEGIN DBMS_REDEFINITION.FINISH_REDEF_TABLE( uname => 'TESTS_OCS', orig_table => 'FileStorage', int_table => 'FileStorage_PART'); END; During the execution the FileStorage table is locked in exclusive mode until finish the operation. After the last command the FileStorage table is partitioned. If you have contents out of the range partition, you should see the new partitions created automatically, not generating an error if you “forgot” to create all the future ranges. You will see something like: You now can drop the FileStorage_PART table: border-bottom-width: 1px; border-bottom-style: solid; text-align: left; border-left-color: silver; border-left-width: 1px; border-left-style: solid; padding-bottom: 4px; line-height: 12pt; background-color: #f4f4f4; margin-top: 20px; margin-right: 0px; margin-bottom: 10px; margin-left: 0px; padding-left: 4px; width: 97.5%; padding-right: 4px; font-family: 'Courier New', Courier, monospace; direction: ltr; max-height: 200px; font-size: 8pt; overflow-x: auto; overflow-y: auto; border-top-color: silver; border-top-width: 1px; border-top-style: solid; cursor: text; border-right-color: silver; border-right-width: 1px; border-right-style: solid; padding-top: 4px; " id="codeSnippetWrapper"> DROP TABLE FileStorage_PART PURGE; To check the FileStorage table is valid and is partitioned, use the command: SELECT num_rows,partitioned FROM user_tables WHERE table_name = 'FILESTORAGE'; You can list the contents of the FileStorage table in a specific partition, per example: SELECT * FROM FileStorage PARTITION (FILESTORAGE_PART_LEGACY) Some useful commands that you can use to check the partitions, note that you need to run using a DBA user: SELECT * FROM DBA_TAB_PARTITIONS WHERE table_name = 'FILESTORAGE'; SELECT * FROM DBA_TABLESPACES WHERE tablespace_name like 'TESTS_OCS%'; After the redefinition process complete you have a new FileStorage table storing all content that has the Storage rule pointed to the JDBC Storage and partitioned using the rule set during the creation of the temporary interim FileStorage_PART table. At this point you can test the WebCenter Content downloading the documents (Original and Renditions). Note that the content could be already in the cache area, take a look in the weblayout directory to see if a file with the same id is there, then click on the web rendition of your test file and see if have created the file and you can open, this means that is all working. The redefinition process can be repeated many times, this allow you test what the better layout, over and over again. Now some interesting maintenance actions related to the partitions: Make an tablespace read only. No issues viewing, the WebCenter Content do not alter the revisions When try to delete an content that is part of an read only tablespace, an error will occurs and the document will not be deleted The only way to prevent errors today is creating an custom component that checks the partitions and if you have an document in an “Read Only” repository, execute the deletion process of the metadata and mark the document to be deleted on the next db maintenance, like a new redefinition. Take an tablespace off-line for archiving purposes or any other reason. When you try open an document that is included in this tablespace will receive an error that was unable to retrieve the content, but the others online tablespaces are not affected. Same behavior when deleting documents. Again, an custom component is the solution. If you have an document “out of range”, the component can show an message that the repository for that document is offline. This can be extended to a option to the user to request to put online again. Moving some legacy content to an offline repository (table) using the Exchange option to move the content from one partition to a empty nonpartitioned table like FileStorage_LEGACY. Note that this option will remove the registers from the FileStorage and will not be able to open the stored content. You always need to keep in mind the indexes and constrains. An redefinition separating the original content (vault) from the renditions and separate by date ate the same time. This could be an option for DAM environments that want to have an special place for the renditions and put the original files in a storage with less performance. The process will be the same, you just need to change the script of the interim table to use composite partitioning. Will be something like: CREATE TABLE FILESTORAGE_RenditionPart ( DID NUMBER(*,0) NOT NULL ENABLE, DRENDITIONID VARCHAR2(30 CHAR) NOT NULL ENABLE, DLASTMODIFIED TIMESTAMP (6), DFILESIZE NUMBER(*,0), DISDELETED VARCHAR2(1 CHAR), BFILEDATA BLOB ) LOB (BFILEDATA) STORE AS SECUREFILE ( ENABLE STORAGE IN ROW NOCACHE LOGGING KEEP_DUPLICATES NOCOMPRESS ) PARTITION BY LIST (DRENDITIONID) SUBPARTITION BY RANGE (DLASTMODIFIED) ( PARTITION Vault VALUES ('primaryFile') ( SUBPARTITION FILESTORAGE_VAULT_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_VAULT_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_VAULT_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_VAULT_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_VAULT_FUTURE VALUES LESS THAN (MAXVALUE) ) ,PARTITION WebLayout VALUES ('webViewableFile') ( SUBPARTITION FILESTORAGE_WEBLAYOUT_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_WEBLAYOUT_FUTURE VALUES LESS THAN (MAXVALUE) ) ,PARTITION Special VALUES ('Special') ( SUBPARTITION FILESTORAGE_SPECIAL_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_SPECIAL_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_SPECIAL_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_SPECIAL_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE , SUBPARTITION FILESTORAGE_SPECIAL_FUTURE VALUES LESS THAN (MAXVALUE) ) )ENABLE ROW MOVEMENT; The next post related to partitioned repository will come with an sample component to handle the possible exceptions when you need to take off line an tablespace/partition or move to another place. Also, we can include some integration to the Retention Management and Records Management. Another subject related to partitioning is the ability to create an FileStore Provider pointed to a different database, raising the level of the distributed storage vs. performance. Let us know if this is important to you or you have an use case not listed, leave a comment. Cross-posted on the blog.ContentrA.com

Read the article

How to maintain encapsulation with composition in C++?

- by iFreilicht

I am designing a class Master that is composed from multiple other classes, A, Base, C and D. These four classes have absolutely no use outside of Master and are meant to split up its functionality into manageable and logically divided packages. They also provide extensible functionality as in the case of Base, which can be inherited from by clients. But, how do I maintain encapsulation of Master with this design? So far, I've got two approaches, which are both far from perfect: 1. Replicate all accessors: Just write accessor-methods for all accessor-methods of all classes that Master is composed of. This leads to perfect encapsulation, because no implementation detail of Master is visible, but is extremely tedious and makes the class definition monstrous, which is exactly what the composition should prevent. Also, adding functionality to one of the composees (is that even a word?) would require to re-write all those methods in Master. An additional problem is that inheritors of Base could only alter, but not add functionality. 2. Use non-assignable, non-copyable member-accessors: Having a class accessor<T> that can not be copied, moved or assigned to, but overrides the operator-> to access an underlying shared_ptr, so that calls like Master->A()->niceFunction(); are made possible. My problem with this is that it kind of breaks encapsulation as I would now be unable to change my implementation of Master to use a different class for the functionality of niceFunction(). Still, it is the closest I've gotten without using the ugly first approach. It also fixes the inheritance issue quite nicely. A small side question would be if such a class already existed in std or boost. EDIT: Wall of code I will now post the code of the header files of the classes discussed. It may be a bit hard to understand, but I'll give my best in explaining all of it. 1. GameTree.h The foundation of it all. This basically is a doubly-linked tree, holding GameObject-instances, which we'll later get to. It also has it's own custom iterator GTIterator, but I left that out for brevity. WResult is an enum with the values SUCCESS and FAILED, but it's not really important. class GameTree { public: //Static methods for the root. Only one root is allowed to exist at a time! static void ConstructRoot(seed_type seed, unsigned int depth); inline static bool rootExists(){ return static_cast<bool>(rootObject_); } inline static weak_ptr<GameTree> root(){ return rootObject_; } //delta is in ms, this is used for velocity, collision and such void tick(unsigned int delta); //Interaction with the tree inline weak_ptr<GameTree> parent() const { return parent_; } inline unsigned int numChildren() const{ return static_cast<unsigned int>(children_.size()); } weak_ptr<GameTree> getChild(unsigned int index) const; template<typename GOType> weak_ptr<GameTree> addChild(seed_type seed, unsigned int depth = 9001){ GOType object{ new GOType(seed) }; return addChildObject(unique_ptr<GameTree>(new GameTree(std::move(object), depth))); } WResult moveTo(weak_ptr<GameTree> newParent); WResult erase(); //Iterators for for( : ) loop GTIterator& begin(){ return *(beginIter_ = std::move(make_unique<GTIterator>(children_.begin()))); } GTIterator& end(){ return *(endIter_ = std::move(make_unique<GTIterator>(children_.end()))); } //unloading should be used when objects are far away WResult unloadChildren(unsigned int newDepth = 0); WResult loadChildren(unsigned int newDepth = 1); inline const RenderObject& renderObject() const{ return gameObject_->renderObject(); } //Getter for the underlying GameObject (I have not tested the template version) weak_ptr<GameObject> gameObject(){ return gameObject_; } template<typename GOType> weak_ptr<GOType> gameObject(){ return dynamic_cast<weak_ptr<GOType>>(gameObject_); } weak_ptr<PhysicsObject> physicsObject() { return gameObject_->physicsObject(); } private: GameTree(const GameTree&); //copying is only allowed internally GameTree(shared_ptr<GameObject> object, unsigned int depth = 9001); //pointer to root static shared_ptr<GameTree> rootObject_; //internal management of a child weak_ptr<GameTree> addChildObject(shared_ptr<GameTree>); WResult removeChild(unsigned int index); //private members shared_ptr<GameObject> gameObject_; shared_ptr<GTIterator> beginIter_; shared_ptr<GTIterator> endIter_; //tree stuff vector<shared_ptr<GameTree>> children_; weak_ptr<GameTree> parent_; unsigned int selfIndex_; //used for deletion, this isn't necessary void initChildren(unsigned int depth); //constructs children }; 2. GameObject.h This is a bit hard to grasp, but GameObject basically works like this: When constructing a GameObject, you construct its basic attributes and a CResult-instance, which contains a vector<unique_ptr<Construction>>. The Construction-struct contains all information that is needed to construct a GameObject, which is a seed and a function-object that is applied at construction by a factory. This enables dynamic loading and unloading of GameObjects as done by GameTree. It also means that you have to define that factory if you inherit GameObject. This inheritance is also the reason why GameTree has a template-function gameObject<GOType>. GameObject can contain a RenderObject and a PhysicsObject, which we'll later get to. Anyway, here's the code. class GameObject; typedef unsigned long seed_type; //this declaration magic means that all GameObjectFactorys inherit from GameObjectFactory<GameObject> template<typename GOType> struct GameObjectFactory; template<> struct GameObjectFactory<GameObject>{ virtual unique_ptr<GameObject> construct(seed_type seed) const = 0; }; template<typename GOType> struct GameObjectFactory : GameObjectFactory<GameObject>{ GameObjectFactory() : GameObjectFactory<GameObject>(){} unique_ptr<GameObject> construct(seed_type seed) const{ return unique_ptr<GOType>(new GOType(seed)); } }; //same as with the factories. this is important for storing them in vectors template<typename GOType> struct Construction; template<> struct Construction<GameObject>{ virtual unique_ptr<GameObject> construct() const = 0; }; template<typename GOType> struct Construction : Construction<GameObject>{ Construction(seed_type seed, function<void(GOType*)> func = [](GOType* null){}) : Construction<GameObject>(), seed_(seed), func_(func) {} unique_ptr<GameObject> construct() const{ unique_ptr<GameObject> gameObject{ GOType::factory.construct(seed_) }; func_(dynamic_cast<GOType*>(gameObject.get())); return std::move(gameObject); } seed_type seed_; function<void(GOType*)> func_; }; typedef struct CResult { CResult() : constructions{} {} CResult(CResult && o) : constructions(std::move(o.constructions)) {} CResult& operator= (CResult& other){ if (this != &other){ for (unique_ptr<Construction<GameObject>>& child : other.constructions){ constructions.push_back(std::move(child)); } } return *this; } template<typename GOType> void push_back(seed_type seed, function<void(GOType*)> func = [](GOType* null){}){ constructions.push_back(make_unique<Construction<GOType>>(seed, func)); } vector<unique_ptr<Construction<GameObject>>> constructions; } CResult; //finally, the GameObject class GameObject { public: GameObject(seed_type seed); GameObject(const GameObject&); virtual void tick(unsigned int delta); inline Matrix4f trafoMatrix(){ return physicsObject_->transformationMatrix(); } //getter inline seed_type seed() const{ return seed_; } inline CResult& properties(){ return properties_; } inline const RenderObject& renderObject() const{ return *renderObject_; } inline weak_ptr<PhysicsObject> physicsObject() { return physicsObject_; } protected: virtual CResult construct_(seed_type seed) = 0; CResult properties_; shared_ptr<RenderObject> renderObject_; shared_ptr<PhysicsObject> physicsObject_; seed_type seed_; }; 3. PhysicsObject That's a bit easier. It is responsible for position, velocity and acceleration. It will also handle collisions in the future. It contains three Transformation objects, two of which are optional. I'm not going to include the accessors on the PhysicsObject class because I tried my first approach on it and it's just pure madness (way over 30 functions). Also missing: the named constructors that construct PhysicsObjects with different behaviour. class Transformation{ Vector3f translation_; Vector3f rotation_; Vector3f scaling_; public: Transformation() : translation_{ 0, 0, 0 }, rotation_{ 0, 0, 0 }, scaling_{ 1, 1, 1 } {}; Transformation(Vector3f translation, Vector3f rotation, Vector3f scaling); inline Vector3f translation(){ return translation_; } inline void translation(float x, float y, float z){ translation(Vector3f(x, y, z)); } inline void translation(Vector3f newTranslation){ translation_ = newTranslation; } inline void translate(float x, float y, float z){ translate(Vector3f(x, y, z)); } inline void translate(Vector3f summand){ translation_ += summand; } inline Vector3f rotation(){ return rotation_; } inline void rotation(float pitch, float yaw, float roll){ rotation(Vector3f(pitch, yaw, roll)); } inline void rotation(Vector3f newRotation){ rotation_ = newRotation; } inline void rotate(float pitch, float yaw, float roll){ rotate(Vector3f(pitch, yaw, roll)); } inline void rotate(Vector3f summand){ rotation_ += summand; } inline Vector3f scaling(){ return scaling_; } inline void scaling(float x, float y, float z){ scaling(Vector3f(x, y, z)); } inline void scaling(Vector3f newScaling){ scaling_ = newScaling; } inline void scale(float x, float y, float z){ scale(Vector3f(x, y, z)); } void scale(Vector3f factor){ scaling_(0) *= factor(0); scaling_(1) *= factor(1); scaling_(2) *= factor(2); } Matrix4f matrix(){ return WMatrix::Translation(translation_) * WMatrix::Rotation(rotation_) * WMatrix::Scale(scaling_); } }; class PhysicsObject; typedef void tickFunction(PhysicsObject& self, unsigned int delta); class PhysicsObject{ PhysicsObject(const Transformation& trafo) : transformation_(trafo), transformationVelocity_(nullptr), transformationAcceleration_(nullptr), tick_(nullptr) {} PhysicsObject(PhysicsObject&& other) : transformation_(other.transformation_), transformationVelocity_(std::move(other.transformationVelocity_)), transformationAcceleration_(std::move(other.transformationAcceleration_)), tick_(other.tick_) {} Transformation transformation_; unique_ptr<Transformation> transformationVelocity_; unique_ptr<Transformation> transformationAcceleration_; tickFunction* tick_; public: void tick(unsigned int delta){ tick_ ? tick_(*this, delta) : 0; } inline Matrix4f transformationMatrix(){ return transformation_.matrix(); } } 4. RenderObject RenderObject is a base class for different types of things that could be rendered, i.e. Meshes, Light Sources or Sprites. DISCLAIMER: I did not write this code, I'm working on this project with someone else. class RenderObject { public: RenderObject(float renderDistance); virtual ~RenderObject(); float renderDistance() const { return renderDistance_; } void setRenderDistance(float rD) { renderDistance_ = rD; } protected: float renderDistance_; }; struct NullRenderObject : public RenderObject{ NullRenderObject() : RenderObject(0.f){}; }; class Light : public RenderObject{ public: Light() : RenderObject(30.f){}; }; class Mesh : public RenderObject{ public: Mesh(unsigned int seed) : RenderObject(20.f) { meshID_ = 0; textureID_ = 0; if (seed == 1) meshID_ = Model::getMeshID("EM-208_heavy"); else meshID_ = Model::getMeshID("cube"); }; unsigned int getMeshID() const { return meshID_; } unsigned int getTextureID() const { return textureID_; } private: unsigned int meshID_; unsigned int textureID_; }; I guess this shows my issue quite nicely: You see a few accessors in GameObject which return weak_ptrs to access members of members, but that is not really what I want. Also please keep in mind that this is NOT, by any means, finished or production code! It is merely a prototype and there may be inconsistencies, unnecessary public parts of classes and such.

Read the article

How do I manage library symbols with linked classes in Flash CS4 to compile/debug in Flash Builder 4

- by wpjmurray

I'm building a video player using Flash CS4 (hereby referred to as "Flash") to create the graphic symbols and compiling and debugging with Flash Builder 4 ("FB4"). Here are the steps I take in my current workflow: --Create the graphic symbols in Flash. I've created a few different symbols for the player, but I'll focus on just the play/pause button ("ppbutton") here. --In the Library panel, I go to the ppbutton symbol's Linkage properties and link to a class named assets.PlayPauseButtonAsset that extends MovieClip. I do not actually have an assets package nor do I have a class file for PlayPauseButtonAsset as Flash will create them for me when I publish. --In Flash's Publish settings, I set the project to export a SWC that will be used in FB4, called VideoPlayerAssets.swc. --After the SWC is created, I create my FB4 project called "VideoPlayer" and add the SWC to my path. FB4 creates the class VideoPlayer in the default package automatically. --In VideoPlayer.as, I import assets.*, which imports all of the symbol classes I created in Flash and are available via VideoPlayerAssets.swc. I can now instantiate the ppbutton and add to the stage, like this: var ppbutton:PlayPauseButtonAsset = new PlayPauseButtonAsset(); addChild(ppbutton); At this point ppbutton doesn't have any functionality because I didn't create any code for it. So I create a new class called video.controls.PlayPauseButtonLogic which extends assets.PlayPauseButtonAsset. I add some logic, and now I can use that new class to put a working ppbutton on the stage: var ppbutton:PlayPauseButtonLogic = new PlayPauseButtonLogic(); addChild(ppbutton); This works fine, but you may be asking why I didn't just link the ppbutton symbol in Flash to the video.controls.PlayPauseButtonLogic class in the first place. The reason is that I have a designer creating the UI in Flash and I don't want to have to re-publish the SWC from Flash every time I make a change in the logic. Basically, I want my designer to be able to make a symbol in Flash, link that symbol to a logically named class in Linkage properties, and export the SWC. I do not want to have to touch that .fla file again unless the designer makes changes to the symbols or layout. I'm using a versioning system for the project as well and it's cleaner to make sure only the designer is touching the .fla file. So, finally, here's the issue I'm running into: --As the design gets more complex, the designer is nesting symbols to position the video controls on the control bar. He creates a controlbar symbol and links it to assets.ControlBarAsset. The controlbar symbol contains the ppbutton symbol. --The designer publishes the SWC and ControlBarAsset is now available in FB4. I create new class called video.controls.ControlBarLogic that extends assets.ControlBarAsset so I can add some logic to the controlbar, and I add the controlbar to the stage: var controlbar:ControlBarLogic = new ControlBarLogic(); addChild(controlbar); --This works, but the ppbutton doesn't do anything. That's because ppbutton, while inside controlbar, is still only linked to PlayPauseButtonAsset, which doesn't have any logic. I'm no longer instantiating a ppbutton object because it's part of controlbar. That's where I'm stuck today. I can't seem to simply re-cast controlbar's ppbutton as PlayPauseButtonLogic as I get a Type error. And I don't want to have to make a class that has to instantiate each of the video player controls, the place them at their x and y values on the stage according to how the designer placed them, as that would require me to open the .fla and check the various properties of a symbol, then add those values to the code. If the designer made a change, I'd have to go into the code each time just to update those properties each time. Not good. How do I re-cast nested symbols to use the logic classes that I create that extend the asset classes? Remember, the solution is not to link Flash symbols to actual classes so I don't have to keep recompiling the SWC, unless there's a way to do that without having to re-compile the SWC. I want the designer to do his thing, publish the SWC, and be done with it. Then I can take his SWC, apply my logic to his assets, and be able to debug and compile the final SWF.

Read the article

Handling file upload in a non-blocking manner

- by Kaliyug Antagonist

The background thread is here Just to make objective clear - the user will upload a large file and must be redirected immediately to another page for proceeding different operations. But the file being large, will take time to be read from the controller's InputStream. So I unwillingly decided to fork a new Thread to handle this I/O. The code is as follows : The controller servlet /** * @see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse * response) */ protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { // TODO Auto-generated method stub System.out.println("In Controller.doPost(...)"); TempModel tempModel = new TempModel(); tempModel.uploadSegYFile(request, response); System.out.println("Forwarding to Accepted.jsp"); /*try { Thread.sleep(1000 * 60); } catch (InterruptedException e) { // TODO Auto-generated catch block e.printStackTrace(); }*/ request.getRequestDispatcher("/jsp/Accepted.jsp").forward(request, response); } The model class package com.model; import java.io.IOException; import java.util.concurrent.ExecutionException; import java.util.concurrent.Future; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import com.utils.ProcessUtils; public class TempModel { public void uploadSegYFile(HttpServletRequest request, HttpServletResponse response) { // TODO Auto-generated method stub System.out.println("In TempModel.uploadSegYFile(...)"); /* * Trigger the upload/processing code in a thread, return immediately * and notify when the thread completes */ try { FileUploaderRunnable fileUploadRunnable = new FileUploaderRunnable( request.getInputStream()); /* * Future<FileUploaderRunnable> future = ProcessUtils.submitTask( * fileUploadRunnable, fileUploadRunnable); * * FileUploaderRunnable processed = future.get(); * * System.out.println("Is file uploaded : " + * processed.isFileUploaded()); */ Thread uploadThread = new Thread(fileUploadRunnable); uploadThread.start(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } /* * catch (InterruptedException e) { // TODO Auto-generated catch block * e.printStackTrace(); } catch (ExecutionException e) { // TODO * Auto-generated catch block e.printStackTrace(); } */ System.out.println("Returning from TempModel.uploadSegYFile(...)"); } } The Runnable package com.model; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.FileOutputStream; import java.io.IOException; import java.io.InputStream; import java.nio.ByteBuffer; import java.nio.channels.Channels; import java.nio.channels.ReadableByteChannel; public class FileUploaderRunnable implements Runnable { private boolean isFileUploaded = false; private InputStream inputStream = null; public FileUploaderRunnable(InputStream inputStream) { // TODO Auto-generated constructor stub this.inputStream = inputStream; } public void run() { // TODO Auto-generated method stub /* Read from InputStream. If success, set isFileUploaded = true */ System.out.println("Starting upload in a thread"); File outputFile = new File("D:/06c01_output.seg");/* * This will be changed * later */ FileOutputStream fos; ReadableByteChannel readable = Channels.newChannel(inputStream); ByteBuffer buffer = ByteBuffer.allocate(1000000); try { fos = new FileOutputStream(outputFile); while (readable.read(buffer) != -1) { fos.write(buffer.array()); buffer.clear(); } fos.flush(); fos.close(); readable.close(); } catch (FileNotFoundException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } System.out.println("File upload thread completed"); } public boolean isFileUploaded() { return isFileUploaded; } } My queries/doubts : Spawning threads manually from the Servlet makes sense to me logically but scares me coding wise - the container isn't aware of these threads after all(I think so!) The current code is giving an Exception which is quite obvious - the stream is inaccessible as the doPost(...) method returns before the run() method completes : In Controller.doPost(...) In TempModel.uploadSegYFile(...) Returning from TempModel.uploadSegYFile(...) Forwarding to Accepted.jsp Starting upload in a thread Exception in thread "Thread-4" java.lang.NullPointerException at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:512) at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:497) at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:559) at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:324) at org.apache.coyote.Request.doRead(Request.java:422) at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:287) at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:407) at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:310) at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:202) at java.nio.channels.Channels$ReadableByteChannelImpl.read(Unknown Source) at com.model.FileUploaderRunnable.run(FileUploaderRunnable.java:39) at java.lang.Thread.run(Unknown Source) Keeping in mind the point 1., does the use of Executor framework help me in anyway ? package com.utils; import java.util.concurrent.Future; import java.util.concurrent.ScheduledThreadPoolExecutor; public final class ProcessUtils { /* Ensure that no more than 2 uploads,processing req. are allowed */ private static final ScheduledThreadPoolExecutor threadPoolExec = new ScheduledThreadPoolExecutor( 2); public static <T> Future<T> submitTask(Runnable task, T result) { return threadPoolExec.submit(task, result); } } So how should I ensure that the user doesn't block and the stream remains accessible so that the (uploaded)file can be read from it?

Read the article

How to model a relationship that NHibernate (or Hibernate) doesn’t easily support

- by MylesRip

I have a situation in which the ideal relationship, I believe, would involve Value Object Inheritance. This is unfortunately not supported in NHibernate so any solution I come up with will be less than perfect. Let’s say that: “Item” entities have a “Location” that can be in one of multiple different formats. These formats are completely different with no overlapping fields. We will deal with each Location in the format that is provided in the data with no attempt to convert from one format to another. Each Item has exactly one Location. “SpecialItem” is a subtype of Item, however, that is unique in that it has exactly two Locations. “Group” entities aggregate Items. “LocationGroup” is as subtype of Group. LocationGroup also has a single Location that can be in any of the formats as described above. Although I’m interested in Items by Group, I’m also interested in being able to find all items with the same Location, regardless of which group they are in. I apologize for the number of stipulations listed above, but I’m afraid that simplifying it any further wouldn’t really reflect the difficulties of the situation. Here is how the above could be diagrammed: Mapping Dilemma Diagram: (http://www.freeimagehosting.net/uploads/592ad48b1a.jpg) (I tried placing the diagram inline, but Stack Overflow won't allow that until I have accumulated more points. I understand the reasoning behind it, but it is a bit inconvenient for now.) Hmmm... Apparently I can't have multiple links either. :-( Analyzing the above, I make the following observations: I treat Locations polymorphically, referring to the supertype rather than the subtype. Logically, Locations should be “Value Objects” rather than entities since it is meaningless to differentiate between two Location objects that have all the same values. Thus equality between Locations should be based on field comparisons, not identifiers. Also, value objects should be immutable and shared references should not be allowed. Using NHibernate (or Hibernate) one would typically map value objects using the “component” keyword which would cause the fields of the class to be mapped directly into the database table that represents the containing class. Put another way, there would not be a separate “Locations” table in the database (and Locations would therefore have no identifiers). NHibernate (or Hibernate) do not currently support inheritance for value objects. My choices as I see them are: Ignore the fact that Locations should be value objects and map them as entities. This would take care of the inheritance mapping issues since NHibernate supports entity inheritance. The downside is that I then have to deal with aliasing issues. (Meaning that if multiple objects share a reference to the same Location, then changing values for one object’s Location would cause the location to change for other objects that share the reference the same Location record.) I want to avoid this if possible. Another downside is that entities are typically compared by their IDs. This would mean that two Location objects would be considered not equal even if the values of all their fields are the same. This would be invalid and unacceptable from the business perspective. Flatten Locations into a single class so that there are no longer inheritance relationships for Locations. This would allow Locations to be treated as value objects which could easily be handled by using “component” mapping in NHibernate. The downside in this case would be that the domain model becomes weaker, more fragile and less maintainable. Do some “creative” mapping in the hbm files in order to force Location fields to be mapped into the containing entities’ tables without using the “component” keyword. This approach is described by Colin Jack here. My situation is more complicated than the one he describes due to the fact that SpecialItem has a second Location and the fact that a different entity, LocatedGroup, also has Locations. I could probably get it to work, but the mappings would be non-intuitive and therefore hard to understand and maintain by other developers in the future. Also, I suspect that these tricky mappings would likely not be possible using Fluent NHibernate so I would use the advantages of using that tool, at least in that situation. Surely others out there have run into similar situations. I’m hoping someone who has “been there, done that” can share some wisdom. :-) So here’s the question… Which approach should be preferred in this situation? Why?

Read the article

XNA Xbox 360 Content Manager Thread freezing Draw Thread

- by Alikar

I currently have a game that takes in large images, easily bigger than 1MB, to serve as backgrounds. I know exactly when this transition is supposed to take place, so I made a loader class to handle loading these large images in the background, but when I load the images it still freezes the main thread where the drawing takes place. Since this code runs on the 360 I move the thread to the 4th hardware thread, but that doesn't seem to help. Below is the class I am using. Any thoughts as to why my new content manager which should be in its own thread is interrupting the draw in my main thread would be appreciated. namespace FileSystem { /// <summary> /// This is used to reference how many objects reference this texture. /// Everytime someone references a texture we increase the iNumberOfReferences. /// When a class calls remove on a specific texture we check to see if anything /// else is referencing the class, if it is we don't remove it. If there isn't /// anything referencing the texture its safe to dispose of. /// </summary> class TextureContainer { public uint uiNumberOfReferences = 0; public Texture2D texture; } /// <summary> /// This class loads all the files from the Content. /// </summary> static class FileManager { static Microsoft.Xna.Framework.Content.ContentManager Content; static EventWaitHandle wh = new AutoResetEvent(false); static Dictionary<string, TextureContainer> Texture2DResourceDictionary; static List<Texture2D> TexturesToDispose; static List<String> TexturesToLoad; static int iProcessor = 4; private static object threadMutex = new object(); private static object Texture2DMutex = new object(); private static object loadingMutex = new object(); private static bool bLoadingTextures = false; /// <summary> /// Returns if we are loading textures or not. /// </summary> public static bool LoadingTexture { get { lock (loadingMutex) { return bLoadingTextures; } } } /// <summary> /// Since this is an static class. This is the constructor for the file loadeder. This is the version /// for the Xbox 360. /// </summary> /// <param name="_Content"></param> public static void Initalize(IServiceProvider serviceProvider, string rootDirectory, int _iProcessor ) { Content = new Microsoft.Xna.Framework.Content.ContentManager(serviceProvider, rootDirectory); Texture2DResourceDictionary = new Dictionary<string, TextureContainer>(); TexturesToDispose = new List<Texture2D>(); iProcessor = _iProcessor; CreateThread(); } /// <summary> /// Since this is an static class. This is the constructor for the file loadeder. /// </summary> /// <param name="_Content"></param> public static void Initalize(IServiceProvider serviceProvider, string rootDirectory) { Content = new Microsoft.Xna.Framework.Content.ContentManager(serviceProvider, rootDirectory); Texture2DResourceDictionary = new Dictionary<string, TextureContainer>(); TexturesToDispose = new List<Texture2D>(); CreateThread(); } /// <summary> /// Creates the thread incase we wanted to set up some parameters /// Outside of the constructor. /// </summary> static public void CreateThread() { Thread t = new Thread(new ThreadStart(StartThread)); t.Start(); } // This is the function that we thread. static public void StartThread() { //BBSThreadClass BBSTC = (BBSThreadClass)_oData; FileManager.Execute(); } /// <summary> /// This thread shouldn't be called by the outside world. /// It allows the File Manager to loop. /// </summary> static private void Execute() { // Make sure our thread is on the correct processor on the XBox 360. #if WINDOWS #else Thread.CurrentThread.SetProcessorAffinity(new int[] { iProcessor }); Thread.CurrentThread.IsBackground = true; #endif // This loop will load textures into ram for us away from the main thread. while (true) { wh.WaitOne(); // Locking down our data while we process it. lock (threadMutex) { lock (loadingMutex) { bLoadingTextures = true; } bool bContainsKey = false; for (int con = 0; con < TexturesToLoad.Count; con++) { // If we have already loaded the texture into memory reference // the one in the dictionary. lock (Texture2DMutex) { bContainsKey = Texture2DResourceDictionary.ContainsKey(TexturesToLoad[con]); } if (bContainsKey) { // Do nothing } // Otherwise load it into the dictionary and then reference the // copy in the dictionary else { TextureContainer TC = new TextureContainer(); TC.uiNumberOfReferences = 1; // We start out with 1 referece. // Loading the texture into memory. try { TC.texture = Content.Load<Texture2D>(TexturesToLoad[con]); // This is passed into the dictionary, thus there is only one copy of // the texture in memory. // There is an issue with Sprite Batch and disposing textures. // This will have to wait until its figured out. lock (Texture2DMutex) { bContainsKey = Texture2DResourceDictionary.ContainsKey(TexturesToLoad[con]); Texture2DResourceDictionary.Add(TexturesToLoad[con], TC); } // We don't have the find the reference to the container since we // already have it. } // Occasionally our texture will already by loaded by another thread while // this thread is operating. This mainly happens on the first level. catch (Exception e) { // If this happens we don't worry about it since this thread only loads // texture data and if its already there we don't need to load it. } } Thread.Sleep(100); } } lock (loadingMutex) { bLoadingTextures = false; } } } static public void LoadTextureList(List<string> _textureList) { // Ensuring that we can't creating threading problems. lock (threadMutex) { TexturesToLoad = _textureList; } wh.Set(); } /// <summary> /// This loads a 2D texture which represents a 2D grid of Texels. /// </summary> /// <param name="_textureName">The name of the picture you wish to load.</param> /// <returns>Holds the image data.</returns> public static Texture2D LoadTexture2D( string _textureName ) { TextureContainer temp; lock (Texture2DMutex) { bool bContainsKey = false; // If we have already loaded the texture into memory reference // the one in the dictionary. lock (Texture2DMutex) { bContainsKey = Texture2DResourceDictionary.ContainsKey(_textureName); if (bContainsKey) { temp = Texture2DResourceDictionary[_textureName]; temp.uiNumberOfReferences++; // Incrementing the number of references } // Otherwise load it into the dictionary and then reference the // copy in the dictionary else { TextureContainer TC = new TextureContainer(); TC.uiNumberOfReferences = 1; // We start out with 1 referece. // Loading the texture into memory. try { TC.texture = Content.Load<Texture2D>(_textureName); // This is passed into the dictionary, thus there is only one copy of // the texture in memory. } // Occasionally our texture will already by loaded by another thread while // this thread is operating. This mainly happens on the first level. catch(Exception e) { temp = Texture2DResourceDictionary[_textureName]; temp.uiNumberOfReferences++; // Incrementing the number of references } // There is an issue with Sprite Batch and disposing textures. // This will have to wait until its figured out. Texture2DResourceDictionary.Add(_textureName, TC); // We don't have the find the reference to the container since we // already have it. temp = TC; } } } // Return a reference to the texture return temp.texture; } /// <summary> /// Go through our dictionary and remove any references to the /// texture passed in. /// </summary> /// <param name="texture">Texture to remove from texture dictionary.</param> public static void RemoveTexture2D(Texture2D texture) { foreach (KeyValuePair<string, TextureContainer> pair in Texture2DResourceDictionary) { // Do our references match? if (pair.Value.texture == texture) { // Only one object or less holds a reference to the // texture. Logically it should be safe to remove. if (pair.Value.uiNumberOfReferences <= 1) { // Grabing referenc to texture TexturesToDispose.Add(pair.Value.texture); // We are about to release the memory of the texture, // thus we make sure no one else can call this member // in the dictionary. Texture2DResourceDictionary.Remove(pair.Key); // Once we have removed the texture we don't want to create an exception. // So we will stop looking in the list since it has changed. break; } // More than one Object has a reference to this texture. // So we will not be removing it from memory and instead // simply marking down the number of references by 1. else { pair.Value.uiNumberOfReferences--; } } } } /*public static void DisposeTextures() { int Count = TexturesToDispose.Count; // If there are any textures to dispose of. if (Count > 0) { for (int con = 0; con < TexturesToDispose.Count; con++) { // =!THIS REMOVES THE TEXTURE FROM MEMORY!= // This is not like a normal dispose. This will actually // remove the object from memory. Texture2D is inherited // from GraphicsResource which removes it self from // memory on dispose. Very nice for game efficency, // but "dangerous" in managed land. Texture2D Temp = TexturesToDispose[con]; Temp.Dispose(); } // Remove textures we've already disposed of. TexturesToDispose.Clear(); } }*/ /// <summary> /// This loads a 2D texture which represnets a font. /// </summary> /// <param name="_textureName">The name of the font you wish to load.</param> /// <returns>Holds the font data.</returns> public static SpriteFont LoadFont( string _fontName ) { SpriteFont temp = Content.Load<SpriteFont>( _fontName ); return temp; } /// <summary> /// This loads an XML document. /// </summary> /// <param name="_textureName">The name of the XML document you wish to load.</param> /// <returns>Holds the XML data.</returns> public static XmlDocument LoadXML( string _fileName ) { XmlDocument temp = Content.Load<XmlDocument>( _fileName ); return temp; } /// <summary> /// This loads a sound file. /// </summary> /// <param name="_fileName"></param> /// <returns></returns> public static SoundEffect LoadSound( string _fileName ) { SoundEffect temp = Content.Load<SoundEffect>(_fileName); return temp; } } }

Search Results

Search found 313 results on 13 pages for 'logically'.

Page 13/13 | < Previous Page | 9 10 11 12 13

- by Roman Schindlauer

- by Joe Mayo

- by Roman Schindlauer

- by Roman Schindlauer

- by terje

- by srkirkland

- by James Michael Hare

- by Adao Junior

- by iFreilicht

- by wpjmurray

- by Kaliyug Antagonist

- by MylesRip

- by Alikar

< Previous Page | 9 10 11 12 13