Search Results

Search found 22653 results on 907 pages for 'case insensitive'.

Page 339/907 | < Previous Page | 335 336 337 338 339 340 341 342 343 344 345 346  | Next Page >

  • Replication Services as ETL extraction tool

    - by jorg
    In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool: “Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards. In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!” Well I have tried it out and I must say it worked well. I was able to let replication services do work in a fraction of the time it would cost me to do the same in SSIS. What I did was the following: Configure snapshot replication for some Adventure Works tables, this was quite simple and straightforward. Create an SSIS package that executes the snapshot replication on demand and waits for its completion. This is something that you can’t do with out of the box functionality. While configuring the snapshot replication two SQL Agent Jobs are created, one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are  asynchronous which means that if you execute them they immediately report back if the job started successfully or not, they do not wait for completion and report its result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues. Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it. Configure snapshot replication   The first step is to create a publication on the database you want to replicate. Connect to SQL Server Management Studio and right-click Replication, choose for New.. Publication…   The New Publication Wizard appears, click Next Choose your “source” database and click Next Choose Snapshot publication and click Next   You can now select tables and other objects that you want to publish Expand Tables and select the tables that are needed in your ETL process In the next screen you can add filters on the selected tables which can be very useful. Think about selecting only the last x days of data for example. Its possible to filter out rows and/or columns. In this example I did not apply any filters. Schedule the Snapshot Agent to run at a desired time, by doing this a SQL Agent Job is created which we need to execute from a SSIS package later on. Next you need to set the Security Settings for the Snapshot Agent. Click on the Security Settings button.   In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines!   On the next screen choose to create the publication at the end of the wizard Give the publication a name (SnapshotTest) and complete the wizard   The publication is created and the articles (tables in this case) are added Now the publication is created successfully its time to create a new subscription for this publication.   Expand the Replication folder in SSMS and right click Local Subscriptions, choose New Subscriptions   The New Subscription Wizard appears   Select the publisher on which you just created your publication and select the database and publication (SnapshotTest)   You can now choose where the Distribution Agent should run. If it runs at the distributor (push subscriptions) it causes extra processing overhead. If you use a separate server for your ETL process and databases choose to run each agent at its subscriber (pull subscriptions) to reduce the processing overhead at the distributor. Of course we need a database for the subscription and fortunately the Wizard can create it for you. Choose for New database   Give the database the desired name, set the desired options and click OK You can now add multiple SQL Server Subscribers which is not necessary in this case but can be very useful.   You now need to set the security settings for the Distribution Agent. Click on the …. button Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here   Click Next   Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on. Initialize the subscription at first synchronization Select the first box to create the subscription when finishing this wizard Complete the wizard by clicking Finish The subscription will be created In SSMS you see a new database is created, the subscriber. There are no tables or other objects in the database available yet because the replication jobs did not ran yet. Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot:   Rename this job to “CreateSnapshot” Now search for the job that distributes the snapshot:   Rename this job to “DistributeSnapshot” Create an SSIS package that executes the snapshot replication We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs. After the DistributeSnapshot job has started the package needs to wait until its finished before the package execution finishes. The Execute SQL Server Agent Job Task is designed to execute SQL Agent Jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back if the job started succesfully or not, it does not report if the job actually completed with success or failure. This is because these jobs are asynchronous. The SSIS package I’ve created does the following: It runs the CreateSnapshot job It checks every 5 seconds if the job is completed with a for loop When the CreateSnapshot job is completed it starts the DistributeSnapshot job And again it waits until the snapshot is delivered before the package will finish successfully Quite simple and the package is ready to use as standalone extract mechanism. After executing the package the replicated tables are added to the subscriber database and are filled with data:   Download the SSIS package here (SSIS 2008) Conclusion In this example I only replicated 5 tables, I could create a SSIS package that does the same in approximately the same amount of time. But if I replicated all the 70+ AdventureWorks tables I would save a lot of time and boring work! With replication services you also benefit from the feature that schema changes are applied automatically which means your entire extract phase wont break. Because a snapshot is created using the bcp utility (bulk copy) it’s also quite fast, so the performance will be quite good. Disadvantages of using snapshot replication as extraction tool is the limitation on source systems. You can only choose SQL Server or Oracle databases to act as a publisher. So if you plan to build an extract phase for your ETL process that will invoke a lot of tables think about replication services, it would save you a lot of time and thanks to the Extract SSIS package I’ve created you can perfectly fit it in your usual SSIS ETL process.

    Read the article

  • Using SQL Execution Plans to discover the Swedish alphabet

    - by Rob Farley
    SQL Server is quite remarkable in a bunch of ways. In this post, I’m using the way that the Query Optimizer handles LIKE to keep it SARGable, the Execution Plans that result, Collations, and PowerShell to come up with the Swedish alphabet. SARGability is the ability to seek for items in an index according to a particular set of criteria. If you don’t have SARGability in play, you need to scan the whole index (or table if you don’t have an index). For example, I can find myself in the phonebook easily, because it’s sorted by LastName and I can find Farley in there by moving to the Fs, and so on. I can’t find everyone in my suburb easily, because the phonebook isn’t sorted that way. I can’t even find people who have six letters in their last name, because also the book is sorted by LastName, it’s not sorted by LEN(LastName). This is all stuff I’ve looked at before, including in the talk I gave at SQLBits in October 2010. If I try to find everyone who’s names start with F, I can do that using a query a bit like: SELECT LastName FROM dbo.PhoneBook WHERE LEFT(LastName,1) = 'F'; Unfortunately, the Query Optimizer doesn’t realise that all the entries that satisfy LEFT(LastName,1) = 'F' will be together, and it has to scan the whole table to find them. But if I write: SELECT LastName FROM dbo.PhoneBook WHERE LastName LIKE 'F%'; then SQL is smart enough to understand this, and performs an Index Seek instead. To see why, I look further into the plan, in particular, the properties of the Index Seek operator. The ToolTip shows me what I’m after: You’ll see that it does a Seek to find any entries that are at least F, but not yet G. There’s an extra Predicate in there (a Residual Predicate if you like), which checks that each LastName is really LIKE F% – I suppose it doesn’t consider that the Seek Predicate is quite enough – but most of the benefit is seen by its working out the Seek Predicate, filtering to just the “at least F but not yet G” section of the data. This got me curious though, particularly about where the G comes from, and whether I could leverage it to create the Swedish alphabet. I know that in the Swedish language, there are three extra letters that appear at the end of the alphabet. One of them is ä that appears in the word Västerås. It turns out that Västerås is quite hard to find in an index when you’re looking it up in a Swedish map. I talked about this briefly in my five-minute talk on Collation from SQLPASS (the one which was slightly less than serious). So by looking at the plan, I can work out what the next letter is in the alphabet of the collation used by the column. In other words, if my alphabet were Swedish, I’d be able to tell what the next letter after F is – just in case it’s not G. It turns out it is… Yes, the Swedish letter after F is G. But I worked this out by using a copy of my PhoneBook table that used the Finnish_Swedish_CI_AI collation. I couldn’t find how the Query Optimizer calculates the G, and my friend Paul White (@SQL_Kiwi) tells me that it’s frustratingly internal to the QO. He’s particularly smart, even if he is from New Zealand. To investigate further, I decided to do some PowerShell, leveraging the Get-SqlPlan function that I blogged about recently (make sure you also have the SqlServerCmdletSnapin100 snap-in added). I started by indicating that I was going to use Finnish_Swedish_CI_AI as my collation of choice, and that I’d start whichever letter cam straight after the number 9. I figure that this is a cheat’s way of guessing the first letter of the alphabet (but it doesn’t actually work in Unicode – luckily I’m using varchar not nvarchar. Actually, there are a few aspects of this code that only work using ASCII, so apologies if you were wanting to apply it to Greek, Japanese, etc). I also initialised my $alphabet variable. $collation = 'Finnish_Swedish_CI_AI'; $firstletter = '9'; $alphabet = ''; Now I created the table for my test. A single field would do, and putting a Clustered Index on it would suffice for the Seeks. Invoke-Sqlcmd -server . -data tempdb -query "create table dbo.collation_test (col varchar(10) collate $collation primary key);" Now I get into the looping. $c = $firstletter; $stillgoing = $true; while ($stillgoing) { I construct the query I want, seeking for entries which start with whatever $c has reached, and get the plan for it: $query = "select col from dbo.collation_test where col like '$($c)%';"; [xml] $pl = get-sqlplan $query "." "tempdb"; At this point, my $pl variable is a scary piece of XML, representing the execution plan. A bit of hunting through it showed me that the EndRange element contained what I was after, and that if it contained NULL, then I was done. $stillgoing = ($pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange -ne $null); Now I could grab the value out of it (which came with apostrophes that needed stripping), and append that to my $alphabet variable.   if ($stillgoing)   {  $c=$pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange.RangeExpressions.ScalarOperator.ScalarString.Replace("'","");     $alphabet += $c;   } Finally, finishing the loop, dropping the table, and showing my alphabet! } Invoke-Sqlcmd -server . -data tempdb -query "drop table dbo.collation_test;"; $alphabet; When I run all this, I see that the Swedish alphabet is ABCDEFGHIJKLMNOPQRSTUVXYZÅÄÖ, which matches what I see at Wikipedia. Interesting to see that the letters on the end are still there, even with Case Insensitivity. Turns out they’re not just “letters with accents”, they’re letters in their own right. I’m sure you gave up reading long ago, and really aren’t that fazed about the idea of doing this using PowerShell. I chose PowerShell because I’d already come up with an easy way of grabbing the estimated plan for a query, and PowerShell does allow for easy navigation of XML. I find the most interesting aspect of this as the fact that the Query Optimizer uses the next letter of the alphabet to maintain the SARGability of LIKE. I’m hoping they do something similar for a whole bunch of operations. Oh, and the fact that you know how to find stuff in the IKEA catalogue. Footnote: If you are interested in whether this works in other languages, you might want to consider the following screenshot, which shows that in principle, it should work with Japanese. It might be a bit harder to run this in PowerShell though, as I’m not sure how it translates. In Hiragana, the Japanese alphabet starts ?, ?, ?, ?, ?, ...

    Read the article

  • Using jQuery Live instead of jQuery Hover function

    - by hajan
    Let’s say we have a case where we need to create mouseover / mouseout functionality for a list which will be dynamically filled with data on client-side. We can use jQuery hover function, which handles the mouseover and mouseout events with two functions. See the following example: <!DOCTYPE html> <html lang="en"> <head id="Head1" runat="server">     <title>jQuery Mouseover / Mouseout Demo</title>     <script type="text/javascript" src="http://ajax.aspnetcdn.com/ajax/jquery/jquery-1.4.4.js"></script>     <style type="text/css">         .hover { color:Red; cursor:pointer;}     </style>     <script type="text/javascript">         $(function () {             $("li").hover(               function () {                   $(this).addClass("hover");               },               function () {                   $(this).removeClass("hover");               });         });     </script> </head> <body>     <form id="form2" runat="server">     <ul>         <li>Data 1</li>         <li>Data 2</li>         <li>Data 3</li>         <li>Data 4</li>         <li>Data 5</li>         <li>Data 6</li>     </ul>     </form> </body> </html> Now, if you have situation where you want to add new data dynamically... Lets say you have a button to add new item in the list. Add the following code right bellow the </ul> tag <input type="text" id="txtItem" /> <input type="button" id="addNewItem" value="Add New Item" /> And add the following button click functionality: //button add new item functionality $("#addNewItem").click(function (event) {     event.preventDefault();     $("<li>" + $("#txtItem").val() + "</li>").appendTo("ul"); }); The mouse over effect won't work for the newly added items. Therefore, we need to use live or delegate function. These both do the same job. The main difference is that for some cases delegate is considered a bit faster, and can be used in chaining. In our case, we can use both. I will use live function. $("li").live("mouseover mouseout",   function (event) {       if (event.type == "mouseover") $(this).addClass("hover");       else $(this).removeClass("hover");   }); The complete code is: <!DOCTYPE html> <html lang="en"> <head id="Head1" runat="server">     <title>jQuery Mouseover / Mouseout Demo</title>     <script type="text/javascript" src="http://ajax.aspnetcdn.com/ajax/jquery/jquery-1.4.4.js"></script>     <style type="text/css">         .hover { color:Red; cursor:pointer;}     </style>     <script type="text/javascript">         $(function () {             $("li").live("mouseover mouseout",               function (event) {                   if (event.type == "mouseover") $(this).addClass("hover");                   else $(this).removeClass("hover");               });             //button add new item functionality             $("#addNewItem").click(function (event) {                 event.preventDefault();                 $("<li>" + $("#txtItem").val() + "</li>").appendTo("ul");             });         });     </script> </head> <body>     <form id="form2" runat="server">     <ul>         <li>Data 1</li>         <li>Data 2</li>         <li>Data 3</li>         <li>Data 4</li>         <li>Data 5</li>         <li>Data 6</li>     </ul>          <input type="text" id="txtItem" />     <input type="button" id="addNewItem" value="Add New Item" />     </form> </body> </html> So, basically when replacing hover with live, you see we use the mouseover and mouseout names for both events. Check the working demo which is available HERE. Hope this was useful blog for you. Hope it’s helpful. HajanReference blog: http://codeasp.net/blogs/hajan/microsoft-net/1260/using-jquery-live-instead-of-jquery-hover-function

    Read the article

  • Using the BAM Interceptor with Continuation

    - by Charles Young
    Originally posted on: http://geekswithblogs.net/cyoung/archive/2014/06/02/using-the-bam-interceptor-with-continuation.aspxI’ve recently been resurrecting some code written several years ago that makes extensive use of the BAM Interceptor provided as part of BizTalk Server’s BAM event observation library.  In doing this, I noticed an issue with continuations.  Essentially, whenever I tried to configure one or more continuations for an activity, the BAM Interceptor failed to complete the activity correctly.   Careful inspection of my code confirmed that I was initializing and invoking the BAM interceptor correctly, so I was mystified.  However, I eventually found the problem.  It is a logical error in the BAM Interceptor code itself. The BAM Interceptor provides a useful mechanism for implementing dynamic tracking.  It supports configurable ‘track points’.  These are grouped into named ‘locations’.  BAM uses the term ‘step’ as a synonym for ‘location’.   Each track point defines a BAM action such as starting an activity, extracting a data item, enabling a continuation, etc.  Each step defines a collection of track points. Understanding Steps The BAM Interceptor provides an abstract model for handling configuration of steps.  It doesn’t, however, define any specific configuration mechanism (e.g., config files, SSO, etc.)  It is up to the developer to decide how to store, manage and retrieve configuration data.  At run time, this configuration is used to register track points which then drive the BAM Interceptor. The full semantics of a step are not immediately clear from Microsoft’s documentation.  They represent a point in a business activity where BAM tracking occurs.  They are named locations in the code.  What is less obvious is that they always represent either the full tracking work for a given activity or a discrete fragment of that work which commences with the start of a new activity or the continuation of an existing activity.  The BAM Interceptor enforces this by throwing an error if no ‘start new’ or ‘continue’ track point is registered for a named location. This constraint implies that each step must marked with an ‘end activity’ track point.  One of the peculiarities of BAM semantics is that when an activity is continued under a correlated ID, you must first mark the current activity as ‘ended’ in order to ensure the right housekeeping is done in the database.  If you re-start an ended activity under the same ID, you will leave the BAM import tables in an inconsistent state.  A step, therefore, always represents an entire unit of work for a given activity or continuation ID.  For activities with continuation, each unit of work is termed a ‘fragment’. Instance and Fragment State Internally, the BAM Interceptor maintains state data at two levels.  First, it represents the overall state of the activity using a ‘trace instance’ token.  This token contains the name and ID of the activity together with a couple of state flags.  The second level of state represents a ‘trace fragment’.   As we have seen, a fragment of an activity corresponds directly to the notion of a ‘step’.  It is the unit of work done at a named location, and it must be bounded by start and end, or continue and end, actions. When handling continuations, the BAM Interceptor differentiates between ‘root’ fragments and other fragments.  Very simply, a root fragment represents the start of an activity.  Other fragments represent continuations.  This is where the logic breaks down.  The BAM Interceptor loses state integrity for root fragments when continuations are defined. Initialization Microsoft’s BAM Interceptor code supports the initialization of BAM Interceptors from track point configuration data.  The process starts by populating an Activity Interceptor Configuration object with an array of track points.  These can belong to different steps (aka ‘locations’) and can be registered in any order.  Once it is populated with track points, the Activity Interceptor Configuration is used to initialise the BAM Interceptor.  The BAM Interceptor sets up a hash table of array lists.  Each step is represented by an array list, and each array list contains an ordered set of track points.  The BAM Interceptor represents track points as ‘executable’ components.  When the OnStep method of the BAM Interceptor is called for a given step, the corresponding list of track points is retrieved and each track point is executed in turn.  Each track point retrieves any required data using a call back mechanism and then serializes a BAM trace fragment object representing a specific action (e.g., start, update, enable continuation, stop, etc.).  The serialised trace fragment is then handed off to a BAM event stream (buffered or direct) which takes the appropriate action. The Root of the Problem The logic breaks down in the Activity Interceptor Configuration.  Each Activity Interceptor Configuration is initialised with an instance of a ‘trace instance’ token.  This provides the basic metadata for the activity as a whole.  It contains the activity name and ID together with state flags indicating if the activity ID is a root (i.e., not a continuation fragment) and if it is completed.  This single token is then shared by all trace actions for all steps registered with the Activity Interceptor Configuration. Each trace instance token is automatically initialised to represent a root fragment.  However, if you subsequently register a ‘continuation’ step with the Activity Interceptor Configuration, the ‘root’ flag is set to false at the point the ‘continue’ track point is registered for that step.   If you use a ‘reflector’ tool to inspect the code for the ActivityInterceptorConfiguration class, you can see the flag being set in one of the overloads of the RegisterContinue method.    This makes no sense.  The trace instance token is shared across all the track points registered with the Activity Interceptor Configuration.  The Activity Interceptor Configuration is designed to hold track points for multiple steps.  The ‘root’ flag is clearly meant to be initialised to ‘true’ for the preliminary root fragment and then subsequently set to false at the point that a continuation step is processed.  Instead, if the Activity Interceptor Configuration contains a continuation step, it is changed to ‘false’ before the root fragment is processed.  This is clearly an error in logic. The problem causes havoc when the BAM Interceptor is used with continuation.  Effectively the root step is no longer processed correctly, and the ultimate effect is that the continued activity never completes!   This has nothing to do with the root and the continuation being in the same process.  It is due to a fundamental mistake of setting the ‘root’ flag to false for a continuation before the root fragment is processed. The Workaround Fortunately, it is easy to work around the bug.  The trick is to ensure that you create a new Activity Interceptor Configuration object for each individual step.  This may mean filtering your configuration data to extract the track points for a single step or grouping the configured track points into individual steps and the creating a separate Activity Interceptor Configuration for each group.  In my case, the first approach was required.  Here is what the amended code looks like: // Because of a logic error in Microsoft's code, a separate ActivityInterceptorConfiguration must be used // for each location. The following code extracts only those track points for a given step name (location). var trackPointGroup = from ResolutionService.TrackPoint tp in bamActivity.TrackPoints                       where (string)tp.Location == bamStepName                       select tp; var bamActivityInterceptorConfig =     new Microsoft.BizTalk.Bam.EventObservation.ActivityInterceptorConfiguration(activityName); foreach (var trackPoint in trackPointGroup) {     switch (trackPoint.Type)     {         case TrackPointType.Start:             bamActivityInterceptorConfig.RegisterStartNew(trackPoint.Location, trackPoint.ExtractionInfo);             break; etc… I’m using LINQ to filter a list of track points for those entries that correspond to a given step and then registering only those track points on a new instance of the ActivityInterceptorConfiguration class.   As soon as I re-wrote the code to do this, activities with continuations started to complete correctly.

    Read the article

  • Fun with Aggregates

    - by Paul White
    There are interesting things to be learned from even the simplest queries.  For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join.  The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units.  The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too?  To answer that, let’s look at some other ways to formulate this query.  This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones.  The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above.  It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join!  The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join.  In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting.  The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set.  In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL).  To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709;   -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case.  The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate.  This is something I would like to see exposed in show plan so I suggested it on Connect.  Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all.  Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans.  Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :)  The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier.  What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too).  Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places.  It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates.  As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

    Read the article

  • Windows Azure Myths

    - by BuckWoody
    Windows Azure is part of the Microsoft "stack" - the suite of software and services we offer. Because we have so many products in almost every part of technology, it's hard to know everything about all parts of what we do - even for those of us who work here. So it's no surprise that some folks are not as familiar with Windows and SQL Azure as they are, say Windows Server or XBox. As I chat with folks about a solution for a business or organization need, I put Windows Azure into the mix. I always start off with "What do you already know about Windows Azure?" so that I don't bore folks with information they already have. I some cases they've checked out the product ahead of time and have specific questions, in others they aren't as familiar, and in still others there is a fair amount of mis-information. Sometimes that's because of a marketing failure, sometimes it's hearsay, and somtetimes it's active misinformation. I thought I might lay out a few of these misconceptions. As always - do your fact-checking! Never take anyone's word alone (including mine) as gospel. Make sure you educate yourself on your options. Your company or your clients depend on you to have the right information on IT, so make sure you live up to that. Myth 1: Nobody uses Windows Azure It's true that we don't give out numbers on the amount of clients on Windows and SQL Azure. But lots of folks are here - companies you may have heard of like Boeing, NASA, Fujitsu, The City of London, Nuedesic, and many others. I deal with firms small and large that use Windows Azure for mission-critical applications, sometimes totally on Windows and/or SQL Azure, sometimes in conjunction with an on-premises system, sometimes for only a specific component in Windows Azure like storage. The interesting thing is that many sites you visit have a Windows Azure component, or are running on Windows Azure. They just don't announce it. Just like the other cloud providers, the companies have asked to be completely branded themselves - they don't want you to be aware or care that they are on Windows Azure. Sometimes that's for security, other times it's for different reasons. It's just like the web sites you visit. For the most part, they don't advertise which OS or Web Server they use. It really just shouldn't matter. The point is that they just use what works to solve a given problem. Check out a few public case studies here: https://www.windowsazure.com/en-us/home/case-studies/ Myth 2: It's only for Microsoft stuff - can't use Open Source This is the one I face the most, and am the most dismayed by. We work just fine with many open source products, including Java, NodeJS, PHP, Ruby, Python, Hadoop, and many other languages and applications. You can quickly deploy a Wordpress, Umbraco and other "kits". We have software development kits (SDK's) for iPhones, iPads, Android, Windows phones and more. We have an SDK to work with FaceBook and other social networks. In short, we play well with others. More on the languages and runtimes we support here: https://www.windowsazure.com/en-us/develop/overview/ More on the SDK's here: http://www.wadewegner.com/2011/05/windows-azure-toolkit-for-ios/, http://www.wadewegner.com/2011/08/windows-azure-toolkits-for-devices-now-with-android/, http://azuretoolkit.codeplex.com/ Myth 3: Microsoft expects me to switch everything to "the cloud" No, we don't. That would be disasterous, unless the only things you run in your company uses works perfectly in Azure. Use Windows Azure  - or any cloud for that matter - where it works. Whenever I talk to companies, I focus on two things: Something that is broken and needs to be re-architected Something you want to do that is new If something is broken, and you need new tools to scale, extend, add capacity dynamically and so on, then you can consider using Windows or SQL Azure. It can help solve problems that you have, or it may include a component you don't want to write or architect yourself. Sometimes you want to do something new, like extend your company's offerings to mobile phones, to the web, or to a social network. More info on where it works here: http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx Myth 4: I have to write code to use Windows and SQL Azure If Windows Azure is a PaaS - a Platform as a Service - then don't you have to write code to use it? Nope. Windows and SQL Azure are made up of various components. Some of those components allow you to write and deploy code (like Compute) and others don't. We have lots of customers using Windows Azure storage as a backup, to securely share files instead of using DropBox, to distribute videos or code or firmware, and more. Others use our High Performance Computing (HPC) offering to rent a supercomputer when they need one. You can even throw workloads at that using Excel! In addition there are lots of other components in Windows Azure you can use, from the Windows Azure Media Services to others. More here: https://www.windowsazure.com/en-us/home/scenarios/saas/ Myth 5: Windows Azure is just another form of "vendor lock-in" Windows Azure uses .NET, OSS languages and standard interfaces for the code. Sure, you're not going to take the code line-for-line and run it on a mainframe, but it's standard code that you write, and can port to something else. And the data is yours - you can bring it back whever you want. It's either in text or binary form, that you have complete control over. There are no licenses - you can "pay as you go", and when you're done, you can leave the service and take all your code, data and IP with you.   So go out there, read up, try it. Use it where it works. And don't believe everything you hear - sometimes the Internet doesn't get it all correct. :)

    Read the article

  • Of C# Iterators and Performance

    - by James Michael Hare
    Some of you reading this will be wondering, "what is an iterator" and think I'm locked in the world of C++.  Nope, I'm talking C# iterators.  No, not enumerators, iterators.   So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well.   Iterators have been around for a bit now, and there are still a bunch of people who don't know what they are or what they do.  I don't know how many times at work I've had a code review on my code and have someone ask me, "what's that yield word do?"   Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> -- I'll post some of the fun ones in a later post.  Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration?   So, to begin, let's look at a couple of methods.  This is a new (albeit contrived) method called Every(...).  The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first).  So Every(2) would return items 0, 2, 4, 6, etc.   Now, if you wanted to write this in the traditional way, you may come up with something like this:       public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)     {         List<T> newList = new List<T>();         int count = 0;           foreach (var i in list)         {             if ((count++ % interval) == 0)             {                 newList.Add(i);             }         }           return newList;     }     So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item.  Pretty straight forward.   The problem?  Well, Every<T>(...) will construct a list containing every nth item whether or not you care.  What happens if you were searching this result for a certain item and find that item after five tries?  You would have generated the rest of the list for nothing.   Enter iterators.  This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for.  This can be very handy if the evaluation itself is expensive or if there's a fair chance you'll never want to fully evaluate a list.   We see this all the time in Linq, where many expressions are chained together to do complex processing on a list.  This would be very expensive if each of these expressions evaluated their entire possible result set on call.    Let's look at the same example function, this time using an iterator:       public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)     {         int count = 0;         foreach (var i in list)         {             if ((count++ % interval) == 0)             {                 yield return i;             }         }     }   Notice it does not create a new return value explicitly, the only evidence of a return is the "yield return" statement.  What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends.   Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time.   It doesn't seem like a big deal, does it?  But keep in mind the key point here: it only returns items as they are requested. Thus if there's a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit.   This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained.    For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10.  We could write that as:       List<int> someList = new List<int>();         // fill list here         someList.Every(10).TakeWhile(i => i <= 10);     Now is the difference more apparent?  If we use the first form of Every that makes a copy of the list.  It's going to copy the entire list whether we will need those items or not, that can be costly!    With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated.   So, sounds neat eh?  But what's the cost is what you're probably wondering.  So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things.    Now, iteration isn't free.  If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead:   Copy vs Iterator on 100% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 5 Copy 5 5 5 Iterator 5 50 50 Copy 28 50 50 Iterator 27 500 500 Copy 227 500 500 Iterator 247 5000 5000 Copy 2266 5000 5000 Iterator 2444 50,000 50,000 Copy 24,443 50,000 50,000 Iterator 24,719 500,000 500,000 Copy 250,024 500,000 500,000 Iterator 251,521   Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists.  In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price.  However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect.   However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead.  Let's look at what happens if you stop looking after 80% of the list:   Copy vs Iterator on 80% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 4 Copy 5 5 4 Iterator 5 50 40 Copy 27 50 40 Iterator 23 500 400 Copy 215 500 400 Iterator 200 5000 4000 Copy 2099 5000 4000 Iterator 1962 50,000 40,000 Copy 22,385 50,000 40,000 Iterator 19,599 500,000 400,000 Copy 236,427 500,000 400,000 Iterator 196,010       Notice that the iterator form is now operating quite a bit faster.  But the savings really add up if you stop on average at 50% (which most searches would typically do):     Copy vs Iterator on 50% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 2 Copy 5 5 2 Iterator 4 50 25 Copy 25 50 25 Iterator 16 500 250 Copy 188 500 250 Iterator 126 5000 2500 Copy 1854 5000 2500 Iterator 1226 50,000 25,000 Copy 19,839 50,000 25,000 Iterator 12,233 500,000 250,000 Copy 208,667 500,000 250,000 Iterator 122,336   Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time.  And this is only for one level deep.  If we are using this in a chain of query expressions it only adds to the savings.   So my recommendation?  If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator.  The cost if they iterate the whole thing does not add much at all -- and if they consume only partially, you reap some really good performance gains.   Next time I'll discuss some of my favorite extensions I've created to make development life a little easier and maintainability a little better.

    Read the article

  • Oracle Expands Sun Blade Portfolio for Cloud and Highly Virtualized Environments

    - by Ferhat Hatay
    Oracle announced the expansion of Sun Blade Portfolio for cloud and highly virtualized environments that deliver powerful performance and simplified management as tightly integrated systems.  Along with the SPARC T3-1B blade server, Oracle VM blade cluster reference configuration and Oracle's optimized solution for Oracle WebLogic Suite, Oracle introduced the dual-node Sun Blade X6275 M2 server module with some impressive benchmark results.   Benchmarks on the Sun Blade X6275 M2 server module demonstrate the outstanding performance characteristics critical for running varied commercial applications used in cloud and highly virtualized environments.  These include best-in-class SPEC CPU2006 results with the Intel Xeon processor 5600 series, six Fluent world records and 1.8 times the price-performance of the IBM Power 755 running NAMD, a prominent bio-informatics workload.   Benchmarks for Sun Blade X6275 M2 server module  SPEC CPU2006  The Sun Blade X6275 M2 server module demonstrated best in class SPECint_rate2006 results for all published results using the Intel Xeon processor 5600 series, with a result of 679.  This result is 97% better than the HP BL460c G7 blade, 80% better than the IBM HS22V blade, and 79% better than the Dell M710 blade.  This result demonstrates the density advantage of the new Oracle's server module for space-constrained data centers.     Sun Blade X6275M2 (2 Nodes, Intel Xeon X5670 2.93GHz) - 679 SPECint_rate2006; HP ProLiant BL460c G7 (2.93 GHz, Intel Xeon X5670) - 347 SPECint_rate2006; IBM BladeCenter HS22V (Intel Xeon X5680)  - 377 SPECint_rate2006; Dell PowerEdge M710 (Intel Xeon X5680, 3.33 GHz) - 380 SPECint_rate2006.  SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 11/24/2010 and this report.    For more specifics about these results, please go to see http://blogs.sun.com/BestPerf   Fluent The Sun Fire X6275 M2 server module produced world-record results on each of the six standard cases in the current "FLUENT 12" benchmark test suite at 8-, 12-, 24-, 32-, 64- and 96-core configurations. These results beat the most recent QLogic score with IBM DX 360 M series platforms and QLogic "Truescale" interconnects.  Results on sedan_4m test case on the Sun Blade X6275 M2 server module are 23% better than the HP C7000 system, and 20% better than the IBM DX 360 M2; Dell has not posted a result for this test case.  Results can be found at the FLUENT website.   ANSYS's FLUENT software solves fluid flow problems, and is based on a numerical technique called computational fluid dynamics (CFD), which is used in the automotive, aerospace, and consumer products industries. The FLUENT 12 benchmark test suite consists of seven models that are well suited for multi-node clustered environments and representative of modern engineering CFD clusters. Vendors benchmark their systems with the principal objective of providing comparative performance information for FLUENT software that, among other things, depends on compilers, optimization, interconnect, and the performance characteristics of the hardware.   FLUENT application performance is representative of other commercial applications that require memory and CPU resources to be available in a scalable cluster-ready format.  FLUENT benchmark has six conventional test cases (eddy_417k, turbo_500k, aircraft_2m, sedan_4m, truck_14m, truck_poly_14m) at various core counts.   All information on the FLUENT website (http://www.fluent.com) is Copyrighted1995-2010 by ANSYS Inc. Results as of November 24, 2010. For more specifics about these results, please go to see http://blogs.sun.com/BestPerf   NAMD Results on the Sun Blade X6275 M2 server module running NAMD (a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems) show up to a 1.8X better price/performance than IBM's Power 7-based system.  For space-constrained environments, the ultra-dense Sun Blade X6275 M2 server module provides a 1.7X better price/performance per rack unit than IBM's system.     IBM Power 755 4-way Cluster (16U). Total price for cluster: $324,212. See IBM United States Hardware Announcement 110-008, dated February 9, 2010, pp. 4, 21 and 39-46.  Sun Blade X6275 M2 8-Blade Cluster (10U). Total price for cluster:  $193,939. Price/performance and performance/RU comparisons based on f1ATPase molecule test results. Sun Blade X6275 M2 cluster: $3,568/step/sec, 5.435 step/sec/RU. IBM Power 755 cluster: $6,355/step/sec, 3.189 step/sec/U. See http://www-03.ibm.com/systems/power/hardware/reports/system_perf.html. See http://www.ks.uiuc.edu/Research/namd/performance.html for more information, results as of 11/24/10.   For more specifics about these results, please go to see http://blogs.sun.com/BestPerf   Reverse Time Migration The Reverse Time Migration is heavily used in geophysical imaging and modeling for Oil & Gas Exploration.  The Sun Blade X6275 M2 server module showed up to a 40% performance improvement over the previous generation server module with super-linear scalability to 16 nodes for the 9-Point Stencil used in this Reverse Time Migration computational kernel.  The balanced combination of Oracle's Sun Storage 7410 system with the Sun Blade X6275 M2 server module cluster showed linear scalability for the total application throughput, including the I/O and MPI communication, to produce a final 3-D seismic depth imaged cube for interpretation. The final image write time from the Sun Blade X6275 M2 server module nodes to Oracle's Sun Storage 7410 system achieved 10GbE line speed of 1.25 GBytes/second or better performance. Between subsequent runs, the effects of I/O buffer caching on the Sun Blade X6275 M2 server module nodes and write optimized caching on the Sun Storage 7410 system gave up to 1.8 GBytes/second effective write performance. The performance results and characterization of this Reverse Time Migration benchmark could serve as a useful measure for many other I/O intensive commercial applications. 3D VTI Reverse Time Migration Seismic Depth Imaging, see http://blogs.sun.com/BestPerf/entry/3d_vti_reverse_time_migration for more information, results as of 11/14/2010.                            

    Read the article

  • Parent Objects

    - by Ali Bahrami
    Support for Parent Objects was added in Solaris 11 Update 1. The following material is adapted from the PSARC arc case, and the Solaris Linker and Libraries Manual. A "plugin" is a shared object, usually loaded via dlopen(), that is used by a program in order to allow the end user to add functionality to the program. Examples of plugins include those used by web browsers (flash, acrobat, etc), as well as mdb and elfedit modules. The object that loads the plugin at runtime is called the "parent object". Unlike most object dependencies, the parent is not identified by name, but by its status as the object doing the load. Historically, building a good plugin is has been more complicated than it should be: A parent and its plugin usually share a 2-way dependency: The plugin provides one or more routines for the parent to call, and the parent supplies support routines for use by the plugin for things like memory allocation and error reporting. It is a best practice to build all objects, including plugins, with the -z defs option, in order to ensure that the object specifies all of its dependencies, and is self contained. However: The parent is usually an executable, which cannot be linked to via the usual library mechanisms provided by the link editor. Even if the parent is a shared object, which could be a normal library dependency to the plugin, it may be desirable to build plugins that can be used by more than one parent, in which case embedding a dependency NEEDED entry for one of the parents is undesirable. The usual way to build a high quality plugin with -z defs uses a special mapfile provided by the parent. This mapfile defines the parent routines, specifying the PARENT attribute (see example below). This works, but is inconvenient, and error prone. The symbol table in the parent already describes what it makes available to plugins — ideally the plugin would obtain that information directly rather than from a separate mapfile. The new -z parent option to ld allows a plugin to link to the parent and access the parent symbol table. This differs from a typical dependency: No NEEDED record is created. The relationship is recorded as a logical connection to the parent, rather than as an explicit object name However, it operates in the same manner as any other dependency in terms of making symbols available to the plugin. When the -z parent option is used, the link-editor records the basename of the parent object in the dynamic section, using the new tag DT_SUNW_PARENT. This is an informational tag, which is not used by the runtime linker to locate the parent, but which is available for diagnostic purposes. The ld(1) manpage documentation for the -z parent option is: -z parent=object Specifies a "parent object", which can be an executable or shared object, against which to link the output object. This option is typically used when creating "plugin" shared objects intended to be loaded by an executable at runtime via the dlopen() function. The symbol table from the parent object is used to satisfy references from the plugin object. The use of the -z parent option makes symbols from the object calling dlopen() available to the plugin. Example For this example, we use a main program, and a plugin. The parent provides a function named parent_callback() for the plugin to call. The plugin provides a function named plugin_func() to the parent: % cat main.c #include <stdio.h> #include <dlfcn.h> #include <link.h> void parent_callback(void) { printf("plugin_func() has called parent_callback()\n"); } int main(int argc, char **argv) { typedef void plugin_func_t(void); void *hdl; plugin_func_t *plugin_func; if (argc != 2) { fprintf(stderr, "usage: main plugin\n"); return (1); } if ((hdl = dlopen(argv[1], RTLD_LAZY)) == NULL) { fprintf(stderr, "unable to load plugin: %s\n", dlerror()); return (1); } plugin_func = (plugin_func_t *) dlsym(hdl, "plugin_func"); if (plugin_func == NULL) { fprintf(stderr, "unable to find plugin_func: %s\n", dlerror()); return (1); } (*plugin_func)(); return (0); } % cat plugin.c #include <stdio.h> extern void parent_callback(void); void plugin_func(void) { printf("parent has called plugin_func() from plugin.so\n"); parent_callback(); } Building this in the traditional manner, without -zdefs: % cc -o main main.c % cc -G -o plugin.so plugin.c % ./main ./plugin.so parent has called plugin_func() from plugin.so plugin_func() has called parent_callback() As noted above, when building any shared object, the -z defs option is recommended, in order to ensure that the object is self contained and specifies all of its dependencies. However, the use of -z defs prevents the plugin object from linking due to the unsatisfied symbol from the parent object: % cc -zdefs -G -o plugin.so plugin.c Undefined first referenced symbol in file parent_callback plugin.o ld: fatal: symbol referencing errors. No output written to plugin.so A mapfile can be used to specify to ld that the parent_callback symbol is supplied by the parent object. % cat plugin.mapfile $mapfile_version 2 SYMBOL_SCOPE { global: parent_callback { FLAGS = PARENT }; }; % cc -zdefs -Mplugin.mapfile -G -o plugin.so plugin.c However, the -z parent option to ld is the most direct solution to this problem, allowing the plugin to actually link against the parent object, and obtain the available symbols from it. An added benefit of using -z parent instead of a mapfile, is that the name of the parent object is recorded in the dynamic section of the plugin, and can be displayed by the file utility: % cc -zdefs -zparent=main -G -o plugin.so plugin.c % elfdump -d plugin.so | grep PARENT [0] SUNW_PARENT 0xcc main % file plugin.so plugin.so: ELF 32-bit LSB dynamic lib 80386 Version 1, parent main, dynamically linked, not stripped % ./main ./plugin.so parent has called plugin_func() from plugin.so plugin_func() has called parent_callback() We can also observe this in elfedit plugins on Solaris systems running Solaris 11 Update 1 or newer: % file /usr/lib/elfedit/dyn.so /usr/lib/elfedit/dyn.so: ELF 32-bit LSB dynamic lib 80386 Version 1, parent elfedit, dynamically linked, not stripped, no debugging information available Related Other Work The GNU ld has an option named --just-symbols that can be used in a similar manner: --just-symbols=filename Read symbol names and their addresses from filename, but do not relocate it or include it in the output. This allows your output file to refer symbolically to absolute locations of memory defined in other programs. You may use this option more than once. -z parent is a higher level operation aimed specifically at simplifying the construction of high quality plugins. Although it employs the same operation, it differs from --just symbols in 2 significant ways: There can only be one parent. The parent is recorded in the created object, and can be displayed by 'file', or other similar tools.

    Read the article

  • Try a sample: Using the counter predicate for event sampling

    - by extended_events
    Extended Events offers a rich filtering mechanism, called predicates, that allows you to reduce the number of events you collect by specifying criteria that will be applied during event collection. (You can find more information about predicates in Using SQL Server 2008 Extended Events (by Jonathan Kehayias)) By evaluating predicates early in the event firing sequence we can reduce the performance impact of collecting events by stopping event collection when the criteria are not met. You can specify predicates on both event fields and on a special object called a predicate source. Predicate sources are similar to action in that they typically are related to some type of global information available from the server. You will find that many of the actions available in Extended Events have equivalent predicate sources, but actions and predicates sources are not the same thing. Applying predicates, whether on a field or predicate source, is very similar to what you are used to in T-SQL in terms of how they work; you pick some field/source and compare it to a value, for example, session_id = 52. There is one predicate source that merits special attention though, not just for its special use, but for how the order of predicate evaluation impacts the behavior you see. I’m referring to the counter predicate source. The counter predicate source gives you a way to sample a subset of events that otherwise meet the criteria of the predicate; for example you could collect every other event, or only every tenth event. Simple CountingThe counter predicate source works by creating an in memory counter that increments every time the predicate statement is evaluated. Here is a simple example with my favorite event, sql_statement_completed, that only collects the second statement that is run. (OK, that’s not much of a sample, but this is for demonstration purposes. Here is the session definition: CREATE EVENT SESSION counter_test ON SERVERADD EVENT sqlserver.sql_statement_completed    (ACTION (sqlserver.sql_text)    WHERE package0.counter = 2)ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS) You can find general information about the session DDL syntax in BOL and from Pedro’s post Introduction to Extended Events. The important part here is the WHERE statement that defines that I only what the event where package0.count = 2; in other words, only the second instance of the event. Notice that I need to provide the package name along with the predicate source. You don’t need to provide the package name if you’re using event fields, only for predicate sources. Let’s say I run the following test queries: -- Run three statements to test the sessionSELECT 'This is the first statement'GOSELECT 'This is the second statement'GOSELECT 'This is the third statement';GO Once you return the event data from the ring buffer and parse the XML (see my earlier post on reading event data) you should see something like this: event_name sql_text sql_statement_completed SELECT ‘This is the second statement’ You can see that only the second statement from the test was actually collected. (Feel free to try this yourself. Check out what happens if you remove the WHERE statement from your session. Go ahead, I’ll wait.) Percentage Sampling OK, so that wasn’t particularly interesting, but you can probably see that this could be interesting, for example, lets say I need a 25% sample of the statements executed on my server for some type of QA analysis, that might be more interesting than just the second statement. All comparisons of predicates are handled using an object called a predicate comparator; the simple comparisons such as equals, greater than, etc. are mapped to the common mathematical symbols you know and love (eg. = and >), but to do the less common comparisons you will need to use the predicate comparators directly. You would probably look to the MOD operation to do this type sampling; we would too, but we don’t call it MOD, we call it divides_by_uint64. This comparator evaluates whether one number is divisible by another with no remainder. The general syntax for using a predicate comparator is pred_comp(field, value), field is always first and value is always second. So lets take a look at how the session changes to answer our new question of 25% sampling: CREATE EVENT SESSION counter_test_25 ON SERVERADD EVENT sqlserver.sql_statement_completed    (ACTION (sqlserver.sql_text)    WHERE package0.divides_by_uint64(package0.counter,4))ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS)GO Here I’ve replaced the simple equivalency check with the divides_by_uint64 comparator to check if the counter is evenly divisible by 4, which gives us back every fourth record. I’ll leave it as an exercise for the reader to test this session. Why order matters I indicated at the start of this post that order matters when it comes to the counter predicate – it does. Like most other predicate systems, Extended Events evaluates the predicate statement from left to right; as soon as the predicate statement is proven false we abandon evaluation of the remainder of the statement. The counter predicate source is only incremented when it is evaluated so whether or not the counter is incremented will depend on where it is in the predicate statement and whether a previous criteria made the predicate false or not. Here is a generic example: Pred1: (WHERE statement_1 AND package0.counter = 2)Pred2: (WHERE package0.counter = 2 AND statement_1) Let’s say I cause a number of events as follows and examine what happens to the counter predicate source. Iteration Statement Pred1 Counter Pred2 Counter A Not statement_1 0 1 B statement_1 1 2 C Not statement_1 1 3 D statement_1 2 4 As you can see, in the case of Pred1, statement_1 is evaluated first, when it fails (A & C) predicate evaluation is stopped and the counter is not incremented. With Pred2 the counter is evaluated first, so it is incremented on every iteration of the event and the remaining parts of the predicate are then evaluated. In this example, Pred1 would return an event for D while Pred2 would return an event for B. But wait, there is an interesting side-effect here; consider Pred2 if I had run my statements in the following order: Not statement_1 Not statement_1 statement_1 statement_1 In this case I would never get an event back from the system because the point at which counter=2, the rest of the predicate evaluates as false so the event is not returned. If you’re using the counter target for sampling and you’re not getting the expected events, or any events, check the order of the predicate criteria. As a general rule I’d suggest that the counter criteria should be the last element of your predicate statement since that will assure that your sampling rate will apply to the set of event records defined by the rest of your predicate. Aside: I’m interested in hearing about uses for putting the counter predicate criteria earlier in the predicate statement. If you have one, post it in a comment to share with the class. - Mike Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Writing the tests for FluentPath

    - by Bertrand Le Roy
    Writing the tests for FluentPath is a challenge. The library is a wrapper around a legacy API (System.IO) that wasn’t designed to be easily testable. If it were more testable, the sensible testing methodology would be to tell System.IO to act against a mock file system, which would enable me to verify that my code is doing the expected file system operations without having to manipulate the actual, physical file system: what we are testing here is FluentPath, not System.IO. Unfortunately, that is not an option as nothing in System.IO enables us to plug a mock file system in. As a consequence, we are left with few options. A few people have suggested me to abstract my calls to System.IO away so that I could tell FluentPath – not System.IO – to use a mock instead of the real thing. That in turn is getting a little silly: FluentPath already is a thin abstraction around System.IO, so layering another abstraction between them would double the test surface while bringing little or no value. I would have to test that new abstraction layer, and that would bring us back to square one. Unless I’m missing something, the only option I have here is to bite the bullet and test against the real file system. Of course, the tests that do that can hardly be called unit tests. They are more integration tests as they don’t only test bits of my code. They really test the successful integration of my code with the underlying System.IO. In order to write such tests, the techniques of BDD work particularly well as they enable you to express scenarios in natural language, from which test code is generated. Integration tests are being better expressed as scenarios orchestrating a few basic behaviors, so this is a nice fit. The Orchard team has been successfully using SpecFlow for integration tests for a while and I thought it was pretty cool so that’s what I decided to use. Consider for example the following scenario: Scenario: Change extension Given a clean test directory When I change the extension of bar\notes.txt to foo Then bar\notes.txt should not exist And bar\notes.foo should exist This is human readable and tells you everything you need to know about what you’re testing, but it is also executable code. What happens when SpecFlow compiles this scenario is that it executes a bunch of regular expressions that identify the known Given (set-up phases), When (actions) and Then (result assertions) to identify the code to run, which is then translated into calls into the appropriate methods. Nothing magical. Here is the code generated by SpecFlow: [NUnit.Framework.TestAttribute()] [NUnit.Framework.DescriptionAttribute("Change extension")] public virtual void ChangeExtension() { TechTalk.SpecFlow.ScenarioInfo scenarioInfo = new TechTalk.SpecFlow.ScenarioInfo("Change extension", ((string[])(null))); #line 6 this.ScenarioSetup(scenarioInfo); #line 7 testRunner.Given("a clean test directory"); #line 8 testRunner.When("I change the extension of " + "bar\\notes.txt to foo"); #line 9 testRunner.Then("bar\\notes.txt should not exist"); #line 10 testRunner.And("bar\\notes.foo should exist"); #line hidden testRunner.CollectScenarioErrors();} The #line directives are there to give clues to the debugger, because yes, you can put breakpoints into a scenario: The way you usually write tests with SpecFlow is that you write the scenario first, let it fail, then write the translation of your Given, When and Then into code if they don’t already exist, which results in running but failing tests, and then you write the code to make your tests pass (you implement the scenario). In the case of FluentPath, I built a simple Given method that builds a simple file hierarchy in a temporary directory that all scenarios are going to work with: [Given("a clean test directory")] public void GivenACleanDirectory() { _path = new Path(SystemIO.Path.GetTempPath()) .CreateSubDirectory("FluentPathSpecs") .MakeCurrent(); _path.GetFileSystemEntries() .Delete(true); _path.CreateFile("foo.txt", "This is a text file named foo."); var bar = _path.CreateSubDirectory("bar"); bar.CreateFile("baz.txt", "bar baz") .SetLastWriteTime(DateTime.Now.AddSeconds(-2)); bar.CreateFile("notes.txt", "This is a text file containing notes."); var barbar = bar.CreateSubDirectory("bar"); barbar.CreateFile("deep.txt", "Deep thoughts"); var sub = _path.CreateSubDirectory("sub"); sub.CreateSubDirectory("subsub"); sub.CreateFile("baz.txt", "sub baz") .SetLastWriteTime(DateTime.Now); sub.CreateFile("binary.bin", new byte[] {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0xFF}); } Then, to implement the scenario that you can read above, I had to write the following When: [When("I change the extension of (.*) to (.*)")] public void WhenIChangeTheExtension( string path, string newExtension) { var oldPath = Path.Current.Combine(path.Split('\\')); oldPath.Move(p => p.ChangeExtension(newExtension)); } As you can see, the When attribute is specifying the regular expression that will enable the SpecFlow engine to recognize what When method to call and also how to map its parameters. For our scenario, “bar\notes.txt” will get mapped to the path parameter, and “foo” to the newExtension parameter. And of course, the code that verifies the assumptions of the scenario: [Then("(.*) should exist")] public void ThenEntryShouldExist(string path) { Assert.IsTrue(_path.Combine(path.Split('\\')).Exists); } [Then("(.*) should not exist")] public void ThenEntryShouldNotExist(string path) { Assert.IsFalse(_path.Combine(path.Split('\\')).Exists); } These steps should be written with reusability in mind. They are building blocks for your scenarios, not implementation of a specific scenario. Think small and fine-grained. In the case of the above steps, I could reuse each of those steps in other scenarios. Those tests are easy to write and easier to read, which means that they also constitute a form of documentation. Oh, and SpecFlow is just one way to do this. Rob wrote a long time ago about this sort of thing (but using a different framework) and I highly recommend this post if I somehow managed to pique your interest: http://blog.wekeroad.com/blog/make-bdd-your-bff-2/ And this screencast (Rob always makes excellent screencasts): http://blog.wekeroad.com/mvc-storefront/kona-3/ (click the “Download it here” link)

    Read the article

  • Windows Azure Myths

    - by BuckWoody
    Windows Azure is part of the Microsoft "stack" - the suite of software and services we offer. Because we have so many products in almost every part of technology, it's hard to know everything about all parts of what we do - even for those of us who work here. So it's no surprise that some folks are not as familiar with Windows and SQL Azure as they are, say Windows Server or XBox. As I chat with folks about a solution for a business or organization need, I put Windows Azure into the mix. I always start off with "What do you already know about Windows Azure?" so that I don't bore folks with information they already have. I some cases they've checked out the product ahead of time and have specific questions, in others they aren't as familiar, and in still others there is a fair amount of mis-information. Sometimes that's because of a marketing failure, sometimes it's hearsay, and somtetimes it's active misinformation. I thought I might lay out a few of these misconceptions. As always - do your fact-checking! Never take anyone's word alone (including mine) as gospel. Make sure you educate yourself on your options. Your company or your clients depend on you to have the right information on IT, so make sure you live up to that. Myth 1: Nobody uses Windows Azure It's true that we don't give out numbers on the amount of clients on Windows and SQL Azure. But lots of folks are here - companies you may have heard of like Boeing, NASA, Fujitsu, The City of London, Nuedesic, and many others. I deal with firms small and large that use Windows Azure for mission-critical applications, sometimes totally on Windows and/or SQL Azure, sometimes in conjunction with an on-premises system, sometimes for only a specific component in Windows Azure like storage. The interesting thing is that many sites you visit have a Windows Azure component, or are running on Windows Azure. They just don't announce it. Just like the other cloud providers, the companies have asked to be completely branded themselves - they don't want you to be aware or care that they are on Windows Azure. Sometimes that's for security, other times it's for different reasons. It's just like the web sites you visit. For the most part, they don't advertise which OS or Web Server they use. It really just shouldn't matter. The point is that they just use what works to solve a given problem. Check out a few public case studies here: https://www.windowsazure.com/en-us/home/case-studies/ Myth 2: It's only for Microsoft stuff - can't use Open Source This is the one I face the most, and am the most dismayed by. We work just fine with many open source products, including Java, NodeJS, PHP, Ruby, Python, Hadoop, and many other languages and applications. You can quickly deploy a Wordpress, Umbraco and other "kits". We have software development kits (SDK's) for iPhones, iPads, Android, Windows phones and more. We have an SDK to work with FaceBook and other social networks. In short, we play well with others. More on the languages and runtimes we support here: https://www.windowsazure.com/en-us/develop/overview/ More on the SDK's here: http://www.wadewegner.com/2011/05/windows-azure-toolkit-for-ios/, http://www.wadewegner.com/2011/08/windows-azure-toolkits-for-devices-now-with-android/, http://azuretoolkit.codeplex.com/ Myth 3: Microsoft expects me to switch everything to "the cloud" No, we don't. That would be disasterous, unless the only things you run in your company uses works perfectly in Azure. Use Windows Azure  - or any cloud for that matter - where it works. Whenever I talk to companies, I focus on two things: Something that is broken and needs to be re-architected Something you want to do that is new If something is broken, and you need new tools to scale, extend, add capacity dynamically and so on, then you can consider using Windows or SQL Azure. It can help solve problems that you have, or it may include a component you don't want to write or architect yourself. Sometimes you want to do something new, like extend your company's offerings to mobile phones, to the web, or to a social network. More info on where it works here: http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx Myth 4: I have to write code to use Windows and SQL Azure If Windows Azure is a PaaS - a Platform as a Service - then don't you have to write code to use it? Nope. Windows and SQL Azure are made up of various components. Some of those components allow you to write and deploy code (like Compute) and others don't. We have lots of customers using Windows Azure storage as a backup, to securely share files instead of using DropBox, to distribute videos or code or firmware, and more. Others use our High Performance Computing (HPC) offering to rent a supercomputer when they need one. You can even throw workloads at that using Excel! In addition there are lots of other components in Windows Azure you can use, from the Windows Azure Media Services to others. More here: https://www.windowsazure.com/en-us/home/scenarios/saas/ Myth 5: Windows Azure is just another form of "vendor lock-in" Windows Azure uses .NET, OSS languages and standard interfaces for the code. Sure, you're not going to take the code line-for-line and run it on a mainframe, but it's standard code that you write, and can port to something else. And the data is yours - you can bring it back whever you want. It's either in text or binary form, that you have complete control over. There are no licenses - you can "pay as you go", and when you're done, you can leave the service and take all your code, data and IP with you.   So go out there, read up, try it. Use it where it works. And don't believe everything you hear - sometimes the Internet doesn't get it all correct. :)

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • MySQL Cluster 7.3 Labs Release – Foreign Keys Are In!

    - by Mat Keep
    0 0 1 1097 6254 Homework 52 14 7337 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} Summary (aka TL/DR): Support for Foreign Key constraints has been one of the most requested feature enhancements for MySQL Cluster. We are therefore extremely excited to announce that Foreign Keys are part of the first Labs Release of MySQL Cluster 7.3 – available for download, evaluation and feedback now! (Select the mysql-cluster-7.3-labs-June-2012 build) In this blog, I will attempt to discuss the design rationale, implementation, configuration and steps to get started in evaluating the first MySQL Cluster 7.3 Labs Release. Pace of Innovation It was only a couple of months ago that we announced the General Availability (GA) of MySQL Cluster 7.2, delivering 1 billion Queries per Minute, with 70x higher cross-shard JOIN performance, Memcached NoSQL key-value API and cross-data center replication.  This release has been a huge hit, with downloads and deployments quickly reaching record levels. The announcement of the first MySQL Cluster 7.3 Early Access lab release at today's MySQL Innovation Day event demonstrates the continued pace in Cluster development, and provides an opportunity for the community to evaluate and feedback on new features they want to see. What’s the Plan for MySQL Cluster 7.3? Well, Foreign Keys, as you may have gathered by now (!), and this is the focus of this first Labs Release. As with MySQL Cluster 7.2, we plan to publish a series of preview releases for 7.3 that will incrementally add new candidate features for a final GA release (subject to usual safe harbor statement below*), including: - New NoSQL APIs; - Features to automate the configuration and provisioning of multi-node clusters, on premise or in the cloud; - Performance and scalability enhancements; - Taking advantage of features in the latest MySQL 5.x Server GA. Design Rationale MySQL Cluster is designed as a “Not-Only-SQL” database. It combines attributes that enable users to blend the best of both relational and NoSQL technologies into solutions that deliver web scalability with 99.999% availability and real-time performance, including: Concurrent NoSQL and SQL access to the database; Auto-sharding with simple scale-out across commodity hardware; Multi-master replication with failover and recovery both within and across data centers; Shared-nothing architecture with no single point of failure; Online scaling and schema changes; ACID compliance and support for complex queries, across shards. Native support for Foreign Key constraints enables users to extend the benefits of MySQL Cluster into a broader range of use-cases, including: - Packaged applications in areas such as eCommerce and Web Content Management that prescribe databases with Foreign Key support. - In-house developments benefiting from Foreign Key constraints to simplify data models and eliminate the additional application logic needed to maintain data consistency and integrity between tables. Implementation The Foreign Key functionality is implemented directly within MySQL Cluster’s data nodes, allowing any client API accessing the cluster to benefit from them – whether using SQL or one of the NoSQL interfaces (Memcached, C++, Java, JPA or HTTP/REST.) The core referential actions defined in the SQL:2003 standard are implemented: CASCADE RESTRICT NO ACTION SET NULL In addition, the MySQL Cluster implementation supports the online adding and dropping of Foreign Keys, ensuring the Cluster continues to serve both read and write requests during the operation. An important difference to note with the Foreign Key implementation in InnoDB is that MySQL Cluster does not support the updating of Primary Keys from within the Data Nodes themselves - instead the UPDATE is emulated with a DELETE followed by an INSERT operation. Therefore an UPDATE operation will return an error if the parent reference is using a Primary Key, unless using CASCADE action, in which case the delete operation will result in the corresponding rows in the child table being deleted. The Engineering team plans to change this behavior in a subsequent preview release. Also note that when using InnoDB "NO ACTION" is identical to "RESTRICT". In the case of MySQL Cluster “NO ACTION” means “deferred check”, i.e. the constraint is checked before commit, allowing user-defined triggers to automatically make changes in order to satisfy the Foreign Key constraints. Configuration There is nothing special you have to do here – Foreign Key constraint checking is enabled by default. If you intend to migrate existing tables from another database or storage engine, for example from InnoDB, there are a couple of best practices to observe: 1. Analyze the structure of the Foreign Key graph and run the ALTER TABLE ENGINE=NDB in the correct sequence to ensure constraints are enforced 2. Alternatively drop the Foreign Key constraints prior to the import process and then recreate when complete. Getting Started Read this blog for a demonstration of using Foreign Keys with MySQL Cluster.  You can download MySQL Cluster 7.3 Labs Release with Foreign Keys today - (select the mysql-cluster-7.3-labs-June-2012 build) If you are new to MySQL Cluster, the Getting Started guide will walk you through installing an evaluation cluster on a singe host (these guides reflect MySQL Cluster 7.2, but apply equally well to 7.3) Post any questions to the MySQL Cluster forum where our Engineering team will attempt to assist you. Post any bugs you find to the MySQL bug tracking system (select MySQL Cluster from the Category drop-down menu) And if you have any feedback, please post them to the Comments section of this blog. Summary MySQL Cluster 7.2 is the GA, production-ready release of MySQL Cluster. This first Labs Release of MySQL Cluster 7.3 gives you the opportunity to preview and evaluate future developments in the MySQL Cluster database, and we are very excited to be able to share that with you. Let us know how you get along with MySQL Cluster 7.3, and other features that you want to see in future releases. * Safe Harbor Statement This information is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

    Read the article

  • CodePlex Daily Summary for Saturday, June 18, 2011

    CodePlex Daily Summary for Saturday, June 18, 2011Popular ReleasesEffectControls-Silverlight controls with animation effects: EffectControls: EffectControlsMedia Companion: MC 3.408b weekly: Some minor fixes..... Fixed messagebox coming up during batch scrape Added <originaltitle></originaltitle> to movie nfo's - when a new movie is scraped, the original title will be set as the returned title. The end user can of course change the title in MC, however the original title will not change. The original title can be seen by hovering over the movie title in the right pane of the main movie tab. To update all of your current nfo's to add the original title the same as your current ...NLog - Advanced .NET Logging: NLog 2.0 Release Candidate: Release notes for NLog 2.0 RC can be found at http://nlog-project.org/nlog-2-rc-release-notesPowerGUI Visual Studio Extension: PowerGUI VSX 1.3.5: Changes - VS SDK no longer required to be installed (a bug in v. 1.3.4).Gendering Add-In for Microsoft Office Word 2010: Gendering Add-In: This is the first stable Version of the Gendering Add-In. Unzip the package and start "setup.exe". The .reg file shows how to config an alternate path for suggestion table.TerrariViewer: TerrariViewer v3.1 [Terraria Inventory Editor]: This version adds tool tips. Almost every picture box you mouse over will tell you what item is in that box. I have also cleaned up the GUI a little more to make things easier on my end. There are various bug fixes including ones associated with opening different characters in the same instance of the program. As always, please bring any bugs you find to my attention.CommonLibrary.NET: CommonLibrary.NET - 0.9.7 Beta: A collection of very reusable code and components in C# 3.5 ranging from ActiveRecord, Csv, Command Line Parsing, Configuration, Holiday Calendars, Logging, Authentication, and much more. Samples in <root>\src\Lib\CommonLibrary.NET\Samples CommonLibrary.NET 0.9.7Documentation 6738 6503 New 6535 Enhancements 6583 6737DropBox Linker: DropBox Linker 1.2: Public sub-folders are now monitored for changes as well (thanks to mcm69) Automatic public sync folder detection (thanks to mcm69) Non-Latin and special characters encoded correctly in URLs Pop-ups are now slot-based (use first free slot and will never be overlapped — test it while previewing timeout) Public sync folder setting is hidden when auto-detected Timeout interval is displayed in popup previews A lot of major and minor code refactoring performed .NET Framework 4.0 Client...Terraria World Viewer: Version 1.3: Update June 15th Removed "Draw Markers" checkbox from main window because of redundancy/confusing. (Select all or no items from the Settings tab for the same effect.) Fixed Marker preferences not being saved. It is now possible to render more than one map without having to restart the application. World file will not be locked while the world is being rendered. Note: The World Viewer might render an inaccurate map or even crash if Terraria decides to modify the World file during the pro...MVC Controls Toolkit: Mvc Controls Toolkit 1.1.5 RC: Added Extended Dropdown allows a prompt item to be inserted as first element. RequiredAttribute, if present, trggers if no element is chosen Client side javascript function to set/get the values of DateTimeInput, TypedTextBox, TypedEditDisplay, and to bind/unbind a "change" handler The selected page in the pager is applied the attribute selected-page="selected" that can be used in the definition of CSS rules to style the selected page items controls now interpret a null value as an empr...Umbraco CMS: Umbraco CMS 5.0 CTP 1: Umbraco 5 Community Technology Preview Umbraco 5 will be the next version of everyone's favourite, friendly ASP.NET CMS that already powers over 100,000 websites worldwide. Try out our first CTP of version 5 today! If you're new to Umbraco and would like to get a quick low-down on our popular and easy-to-learn approach to content management, check out our intro video here. What's in the v5 CTP box? This is a preview version of version 5 and includes support for the following familiar Umbr...Coding4Fun Kinect Toolkit: Coding4Fun.Kinect Toolkit: Version 1.0Kinect Mouse Cursor: Kinect Mouse Cursor v1.0: The initial release of the Kinect Mouse Cursor project!patterns & practices: Project Silk: Project Silk Community Drop 11 - June 14, 2011: Changes from previous drop: Many code changes: please see the readme.mht for details. New "Client Data Management and Caching" chapter. Updated "Application Notifications" chapter. Updated "Architecture" chapter. Updated "jQuery UI Widget" chapter. Updated "Widget QuickStart" appendix and code. Guidance Chapters Ready for Review The Word documents for the chapters are included with the source code in addition to the CHM to help you provide feedback. The PDF is provided as a separat...Orchard Project: Orchard 1.2: Build: 1.2.41 Published: 6/14/2010 How to Install Orchard To install Orchard using Web PI, follow these instructions: http://www.orchardproject.net/docs/Installing-Orchard.ashx. Web PI will detect your hardware environment and install the application. Alternatively, to install the release manually, download the Orchard.Web.1.2.41.zip file. http://orchardproject.net/docs/Manually-installing-Orchard-zip-file.ashx The zip contents are pre-built and ready-to-run. Simply extract the contents o...Snippet Designer: Snippet Designer 1.4.0: Snippet Designer 1.4.0 for Visual Studio 2010 Change logSnippet Explorer ChangesReworked language filter UI to work better in the side bar. Added result count drop down which lets you choose how many results to see. Language filter and result count choices are persisted after Visual Studio is closed. Added file name to search criteria. Search is now case insensitive. Snippet Editor Changes Snippet Editor ChangesAdded menu option for the $end$ symbol which indicates where the c...Mobile Device Detection and Redirection: 1.0.4.1: Stable Release 51 Degrees.mobi Foundation is the best way to detect and redirect mobile devices and their capabilities on ASP.NET and is being used on thousands of websites worldwide. We’re highly confident in our software and we recommend all users update to this version. Changes to Version 1.0.4.1Changed the BlackberryHandler and BlackberryVersion6Handler to have equal CONFIDENCE values to ensure they both get a chance at detecting BlackBerry version 4&5 and version 6 devices. Prior to thi...Rawr: Rawr 4.1.06: This is the Downloadable WPF version of Rawr!For web-based version see http://elitistjerks.com/rawr.php You can find the version notes at: http://rawr.codeplex.com/wikipage?title=VersionNotes Rawr AddonWe now have a Rawr Official Addon for in-game exporting and importing of character data hosted on Curse. The Addon does not perform calculations like Rawr, it simply shows your exported Rawr data in wow tooltips and lets you export your character to Rawr (including bag and bank items) like Char...AcDown????? - Anime&Comic Downloader: AcDown????? v3.0 Beta6: ??AcDown?????????????,?????????????,????、????。?????Acfun????? ????32??64? Windows XP/Vista/7 ????????????? ??:????????Windows XP???,?????????.NET Framework 2.0???(x86)?.NET Framework 2.0???(x64),?????"?????????"??? ??v3.0 Beta6 ?????(imanhua.com)????? ???? ?? ??"????","?????","?????","????"?????? "????"?????"????????"?? ??????????? ?????????????? ?????????????/???? ?? ????Windows 7???????????? ????????? ?? ????????????? ???????/??????????? ???????????? ?? ?? ?????(imanh...Pulse: Pulse Beta 2: - Added new wallpapers provider http://wallbase.cc. Supports english search, multiple keywords* - Improved font rendering in Options window - Added "Set wallpaper as logon background" option* - Fixed crashes if there is no internet connection - Fixed: Rewalls downloads empty images sometimes - Added filters* Note 1: wallbase provider supports only english search. Rewalls provider supports only russian search but Pulse automatically translates your english keyword into russian using Google Tr...New Projects.NET Entities Framework Utils: Project for creating supporting code to work with .NET Entity Framework.A Simple Demo of Industrial Process Monitoring System based on .NET & Arduino: The demo show some key points of .NET 1 .NET WPF 2 WCF 3 Sinverlight 4 ADO.Net The demo is a good sample for learing .NET and developing process monitoring system. The demo get temperture from Arduinot It is also a good Arduin sample.AHtml Pad: AHtml Pad is a powerfull html and css editor. It's made for beginnners and experimented programmers. It's made in vb.net with visual studio 2010Allena la mente: Raccolta di minigames per migliorare Memoria, Riflessi, Logica e Matematica. Applicazione in silverlight per Windows Phone. Carousel TeamAnorexia World: Ich habe ein kleines Project geschrieben dass mit einen MDI Formular eine komplette Office Suite (noch in hartz4) in einen verheint!AutomaSolution - IT Automation made easy!: AutomaSolution is a console application written in C#. It takes a single .xml file as input, and processes each section to perform an automation task. AzureManagmentAPI.NET: .NET Wrapper for Microsoft Windows Azure Service Managment REST API. It's developed in C# and uses .NET 4.0 Framework. More Information about the Windows Azure Service Management REST API can be found here: http://msdn.microsoft.com/en-us/library/ee460799.aspxCirrus: Projet en C# qui sert au recueillement de données multi-sites.Cruise Control .NET TV: Cruise Control .NET TV puts your project's integration status on a TV and adds coverage graphs generated through NCover.DBML Updater: External tool for Visual Studio which automatically updating DBML files, according to configuration files (XML). You can also specify rules for editing and deleting elements in DBML file. And supports source code generation on the end.EPiServer CMS Page Type Extensions: EPiServer PageTypeExtensions provide additional features related to page types in EPiServer CMS 6. They allow the developer to set restrictions on the number of pages that can be created under a page and also provide page type image preview functionality.Excel add-in library: Create Excel xll add-ins.Gendering Add-In for Microsoft Office Word 2010: Word Add-In that assists user by giving hints to write gender-neutral documents. The current function is a post-processing function to verify a written text against the rules of gender-neutral definition in German Language. The definitions are implemented in form of words and phrases and their gender-neutral replacement as a suggestion. The documentation is written in German. Word Add-In, das eine Unterstützung bietet, einen bereits geschriebenen Text zu überprüfen, nach einem definierten ...Kinductor: Kinductor puts you on the podium and in control of a full symphonic orchestra using just your hands and Kinect.LPFM Last.fm Scrobbler: LPFM Last.fm Scrobbler is a simple .NET API library for scrobbling to the Last.fm web service. It is designed for desktop, web and mobile applications that target the .NET 4 Framework. The library supports the Scrobble and Now Playing functionality of the Last.fm API version 2.0MDB RIA Service Generator: Auto-generates RIA service from a given mdb to create a lightswitch extension.mediaplayer-isen: super projet qui envoie du fat !!! ^^Metodología General Ajustada - MGA: Herramienta tecnológica que apoya la Metodología para la formulación y evaluación de Proyectos de Inversión Pública mejorada en Colombia. Metodología General Ajustada - MGA. Desarrollado en Visual C# 2008 y Base de datos SQL Server 2008.M-i-c-r-o-S-o-f-t-W-M-S: M8i8c8r8o8S8o8f8t M8i8c8r8o8S8o8f8t M8i8c8r8o8S8o8f8tmim: TBAMinecraft data viewing tools: A little toolset for Minecraft server. Contains a basic NBT reader and ingame map viewer.PHPCSERP: PHPCS ERP ????????????????????,???????????PDF???。 ???????????,??,??。 PHPCS ERP ??????????????????。 PHPCS ERP ?????????????,??????????????????????IT?????????????。 ??????????????? PHPCS ERP。 ??????IT?????????,??????????。Rangers Build Customization Guide: Scenario based and hands-on guidance for the customization and deployment of TFS Builds activities such as versioning, code signing, branching. Rangers Lab Management Guide: Practical and scenario-based guidance, backed by custom VM Template automation for reference environments Snail-Blog: ??asp.net?????SSIS Extensions - SFTP Task, PGP Task, Zip Task: A set of custom tasks to extend SSIS. Includes a SFTP task, PGP encryption task and zip/unzip task.stage: asp.net opensource testprojectTimeBook: Project Description A simple asp.net mvc project to manage time. The main reason for the project is to learn asp.net mvc. The end product will have the following features. Multiple companies/individuals can sign up. Each company/individual can add/remove/update their clients. UMDH Tracer: Tool that generates & exploits UMDH Dump so that leaks detection is easier.

    Read the article

  • The Shift: how Orchard painlessly shifted to document storage, and how it’ll affect you

    - by Bertrand Le Roy
    We’ve known it all along. The storage for Orchard content items would be much more efficient using a document database than a relational one. Orchard content items are composed of parts that serialize naturally into infoset kinds of documents. Storing them as relational data like we’ve done so far was unnatural and requires the data for a single item to span multiple tables, related through 1-1 relationships. This means lots of joins in queries, and a great potential for Select N+1 problems. Document databases, unfortunately, are still a tough sell in many places that prefer the more familiar relational model. Being able to x-copy Orchard to hosters has also been a basic constraint in the design of Orchard. Combine those with the necessity at the time to run in medium trust, and with license compatibility issues, and you’ll find yourself with very few reasonable choices. So we went, a little reluctantly, for relational SQL stores, with the dream of one day transitioning to document storage. We have played for a while with the idea of building our own document storage on top of SQL databases, and Sébastien implemented something more than decent along those lines, but we had a better way all along that we didn’t notice until recently… In Orchard, there are fields, which are named properties that you can add dynamically to a content part. Because they are so dynamic, we have been storing them as XML into a column on the main content item table. This infoset storage and its associated API are fairly generic, but were only used for fields. The breakthrough was when Sébastien realized how this existing storage could give us the advantages of document storage with minimal changes, while continuing to use relational databases as the substrate. public bool CommercialPrices { get { return this.Retrieve(p => p.CommercialPrices); } set { this.Store(p => p.CommercialPrices, value); } } This code is very compact and efficient because the API can infer from the expression what the type and name of the property are. It is then able to do the proper conversions for you. For this code to work in a content part, there is no need for a record at all. This is particularly nice for site settings: one query on one table and you get everything you need. This shows how the existing infoset solves the data storage problem, but you still need to query. Well, for those properties that need to be filtered and sorted on, you can still use the current record-based relational system. This of course continues to work. We do however provide APIs that make it trivial to store into both record properties and the infoset storage in one operation: public double Price { get { return Retrieve(r => r.Price); } set { Store(r => r.Price, value); } } This code looks strikingly similar to the non-record case above. The difference is that it will manage both the infoset and the record-based storages. The call to the Store method will send the data in both places, keeping them in sync. The call to the Retrieve method does something even cooler: if the property you’re looking for exists in the infoset, it will return it, but if it doesn’t, it will automatically look into the record for it. And if that wasn’t cool enough, it will take that value from the record and store it into the infoset for the next time it’s required. This means that your data will start automagically migrating to infoset storage just by virtue of using the code above instead of the usual: public double Price { get { return Record.Price; } set { Record.Price = value; } } As your users browse the site, it will get faster and faster as Select N+1 issues will optimize themselves away. If you preferred, you could still have explicit migration code, but it really shouldn’t be necessary most of the time. If you do already have code using QueryHints to mitigate Select N+1 issues, you might want to reconsider those, as with the new system, you’ll want to avoid joins that you don’t need for filtering or sorting, further optimizing your queries. There are some rare cases where the storage of the property must be handled differently. Check out this string[] property on SearchSettingsPart for example: public string[] SearchedFields { get { return (Retrieve<string>("SearchedFields") ?? "") .Split(new[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries); } set { Store("SearchedFields", String.Join(", ", value)); } } The array of strings is transformed by the property accessors into and from a comma-separated list stored in a string. The Retrieve and Store overloads used in this case are lower-level versions that explicitly specify the type and name of the attribute to retrieve or store. You may be wondering what this means for code or operations that look directly at the database tables instead of going through the new infoset APIs. Even if there is a record, the infoset version of the property will win if it exists, so it is necessary to keep the infoset up-to-date. It’s not very complicated, but definitely something to keep in mind. Here is what a product record looks like in Nwazet.Commerce for example: And here is the same data in the infoset: The infoset is stored in Orchard_Framework_ContentItemRecord or Orchard_Framework_ContentItemVersionRecord, depending on whether the content type is versionable or not. A good way to find what you’re looking for is to inspect the record table first, as it’s usually easier to read, and then get the item record of the same id. Here is the detailed XML document for this product: <Data> <ProductPart Inventory="40" Price="18" Sku="pi-camera-box" OutOfStockMessage="" AllowBackOrder="false" Weight="0.2" Size="" ShippingCost="null" IsDigital="false" /> <ProductAttributesPart Attributes="" /> <AutoroutePart DisplayAlias="camera-box" /> <TitlePart Title="Nwazet Pi Camera Box" /> <BodyPart Text="[...]" /> <CommonPart CreatedUtc="2013-09-10T00:39:00Z" PublishedUtc="2013-09-14T01:07:47Z" /> </Data> The data is neatly organized under each part. It is easy to see how that document is all you need to know about that content item, all in one table. If you want to modify that data directly in the database, you should be careful to do it in both the record table and the infoset in the content item record. In this configuration, the record is now nothing more than an index, and will only be used for sorting and filtering. Of course, it’s perfectly fine to mix record-backed properties and record-less properties on the same part. It really depends what you think must be sorted and filtered on. In turn, this potentially simplifies migrations considerably. So here it is, the great shift of Orchard to document storage, something that Orchard has been designed for all along, and that we were able to implement with a satisfying and surprising economy of resources. Expect this code to make its way into the 1.8 version of Orchard when that’s available.

    Read the article

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • Packaging Swing apps with integrated JavaFX content

    - by igor
    JavaFX provides a lot of interesting capabilities for developing rich client applications in Java, but what if you are working on an existing Swing application and you want to take advantage of these new features?  Maybe you want to use one or two controls like the LineChart or a MediaView.  Maybe you want to embed a large Scene Graph as an initial step in porting your application to FX.  A hybrid Swing/FX application might just be the answer. Developing a hybrid Swing + JavaFX application is not terribly difficult, but until recently the deployment of hybrid applications has not simple as a "pure" JavaFX application.  The existing tools focused on packaging FX Applications, or Swing applications - they did not account for hybrid applications. But with JavaFX 2.2 the tools include support for this hybrid application use case.  Solution  In JavaFX 2.2 we extended the packaging ant tasks to greatly simplify deploying hybrid applications.  You now use the same deployment approach as you would for pure JavaFX applications.  Just bundle your main application jar with the fx:jar ant task and then generate html/jnlp files using fx:deploy.  The only difference is setting toolkit attribute for the fx:application tag as shown below: <fx:application id="swingFXApp" mainClass="${main.class}" toolkit="swing"/>  The value of ${main.class} in the example above is your application class which has a main method.  It does not need to extend JavaFX Application class. The resulting package provides support for the same set of execution modes as a package for a JavaFX application, although the packages which are created are not identical to the packages created for a pure FX application.  You will see two JNLP files generated in the case of a hybrid application - one for use from Swing applet and another for the webstart launch.  Note that these improvements do not alter the set of features available to Swing applications. The packaging tools just make it easier to use the advanced features of JavaFX in your Swing application. The same limits still apply, for example a Swing application can not use JavaFX Preloaders and code changes are necessary to support HTML splash screens. Why should I use the JavaFX ant tasks for packaging my Swing application?  While using FX packaging tool for a Swing application may seem like a mismatch at face value, there are some really good reasons to use this approach.  The primary justification for our packaging tools is to simplify the creation of your application artifacts, and to reduce manual errors.  Plus, no one should have to write JNLP by hand. Some specific benefits include: Your application jar will include a launcher program.  This improves your standalone launch by: checking for the JavaFX runtime guiding the user through any necessary installations setting the system proxy for Java The ant tasks will generate JNLP and HTML files for your swing app: avoids learning unnecessary details about JNLP, and eliminates the error-prone hand editing of JNLP files simplifies using advanced features like embedding JNLP and signing jars as BLOBs to improve launch performance.you can also embed the signing certificate details to improve the user's experience  allows the use of web page templates to inject the generated code directly into your actual web page instead of being forced to copy/paste the generated code snippets. What about native packing? Absolutely!  The very same ant task can generate a native bundle for a Swing application with JavaFX content.  Try running one of these sample native bundles for the "SwingInterop" FX example: exe and dmg.   I also used another feature on these examples: a click-through license agreement for .exe installers and OS X DMG drag installers. Small Caveat This packaging procedure is optimized around using the JavaFX packaging tools for your entire Swing application.  If you are trying to embed JavaFX content into existing project (with an existing build/packing process) then you may need to experiment in order to find the best way to integrate the JavaFX packaging steps into your existing build procedure. As long as you can use ant in your build process this should be a workable approach. It some cases solution could be less than ideal. For example, you need to use fx:jar to package your main jar file in order to produce a double-clickable jar or a native bundle.  The jar will be created from scratch, but you may already be creating the main jar file with a custom manifest.  This may lead to some redundant steps in your build process.  Hopefully the benefits will outweigh the problems. This is an area of ongoing development for the team, and we will continue to refine and improve both the tools and the process. Please share your experiences and suggestions with us.  You can comment here on the blog or file issues to JIRA. Sample code Here is the full ant code used to package SwingInterop.  You can grab latest JavaFX samples and try it yourself:  <target name="-post-jar"> <taskdef resource="com/sun/javafx/tools/ant/antlib.xml" uri="javafx:com.sun.javafx.tools.ant" classpath="${javafx.tools.ant.jar}"/> <!-- Mark application as Swing-based --> <fx:application id="swingFXApp" mainClass="${main.class}" toolkit="swing"/> <!-- Create doubleclickable jar file with embedded launcher --> <fx:jar destfile="${dist.jar}"> <fileset dir="${build.classes.dir}"/> <fx:application refid="swingFXApp" name="SwingInterop"/> <manifest> <attribute name="Implementation-Vendor" value="${application.vendor}"/> <attribute name="Implementation-Title" value="${application.title}"/> <attribute name="Implementation-Version" value="1.0"/> </manifest> </fx:jar> <!-- sign application jar. Use new self signed certificate --> <delete file="${build.dir}/test.keystore"/> <genkey alias="TestAlias" storepass="xyz123" keystore="${build.dir}/test.keystore" dname="CN=Samples, OU=JavaFX Dev, O=Oracle, C=US"/> <fx:signjar keystore="${build.dir}/test.keystore" alias="TestAlias" storepass="xyz123"> <fileset file="${dist.jar}"/> </fx:signjar> <!-- generate JNLPs, HTML and native bundles --> <fx:deploy width="960" height="720" includeDT="true" nativeBundles="all" outdir="${basedir}/${dist.dir}" embedJNLP="true" outfile="${application.title}"> <fx:application refId="swingFXApp"/> <fx:resources> <fx:fileset dir="${basedir}/${dist.dir}" includes="SwingInterop.jar"/> </fx:resources> <fx:permissions/> <info title="Sample app: ${application.title}" vendor="${application.vendor}"/> </fx:deploy> </target>

    Read the article

  • Extending Oracle CEP with Predictive Analytics

    - by vikram.shukla(at)oracle.com
    Introduction: OCEP is often used as a business rules engine to execute a set of business logic rules via CQL statements, and take decisions based on the outcome of those rules. There are times where configuring rules manually is sufficient because an application needs to deal with only a small and well-defined set of static rules. However, in many situations customers don't want to pre-define such rules for two reasons. First, they are dealing with events with lots of columns and manually crafting such rules for each column or a set of columns and combinations thereof is almost impossible. Second, they are content with probabilistic outcomes and do not care about 100% precision. The former is the case when a user is dealing with data with high dimensionality, the latter when an application can live with "false" positives as they can be discarded after further inspection, say by a Human Task component in a Business Process Management software. The primary goal of this blog post is to show how this can be achieved by combining OCEP with Oracle Data Mining® and leveraging the latter's rich set of algorithms and functionality to do predictive analytics in real time on streaming events. The secondary goal of this post is also to show how OCEP can be extended to invoke any arbitrary external computation in an RDBMS from within CEP. The extensible facility is known as the JDBC cartridge. The rest of the post describes the steps required to achieve this: We use the dataset available at http://blogs.oracle.com/datamining/2010/01/fraud_and_anomaly_detection_made_simple.html to showcase the capabilities. We use it to show how transaction anomalies or fraud can be detected. Building the model: Follow the self-explanatory steps described at the above URL to build the model.  It is very simple - it uses built-in Oracle Data Mining PL/SQL packages to cleanse, normalize and build the model out of the dataset.  You can also use graphical Oracle Data Miner®  to build the models. To summarize, it involves: Specifying which algorithms to use. In this case we use Support Vector Machines as we're trying to find anomalies in highly dimensional dataset.Build model on the data in the table for the algorithms specified. For this example, the table was populated in the scott/tiger schema with appropriate privileges. Configuring the Data Source: This is the first step in building CEP application using such an integration.  Our datasource looks as follows in the server config file.  It is advisable that you use the Visualizer to add it to the running server dynamically, rather than manually edit the file.    <data-source>         <name>DataMining</name>         <data-source-params>             <jndi-names>                 <element>DataMining</element>             </jndi-names>             <global-transactions-protocol>OnePhaseCommit</global-transactions-protocol>         </data-source-params>         <connection-pool-params>             <credential-mapping-enabled></credential-mapping-enabled>             <test-table-name>SQL SELECT 1 from DUAL</test-table-name>             <initial-capacity>1</initial-capacity>             <max-capacity>15</max-capacity>             <capacity-increment>1</capacity-increment>         </connection-pool-params>         <driver-params>             <use-xa-data-source-interface>true</use-xa-data-source-interface>             <driver-name>oracle.jdbc.OracleDriver</driver-name>             <url>jdbc:oracle:thin:@localhost:1522:orcl</url>             <properties>                 <element>                     <value>scott</value>                     <name>user</name>                 </element>                 <element>                     <value>{Salted-3DES}AzFE5dDbO2g=</value>                     <name>password</name>                 </element>                                 <element>                     <name>com.bea.core.datasource.serviceName</name>                     <value>oracle11.2g</value>                 </element>                 <element>                     <name>com.bea.core.datasource.serviceVersion</name>                     <value>11.2.0</value>                 </element>                 <element>                     <name>com.bea.core.datasource.serviceObjectClass</name>                     <value>java.sql.Driver</value>                 </element>             </properties>         </driver-params>     </data-source>   Designing the EPN: The EPN is very simple in this example. We briefly describe each of the components. The adapter ("DataMiningAdapter") reads data from a .csv file and sends it to the CQL processor downstream. The event payload here is same as that of the table in the database (refer to the attached project or do a "desc table-name" from a SQL*PLUS prompt). While this is for convenience in this example, it need not be the case. One can still omit fields in the streaming events, and need not match all columns in the table on which the model was built. Better yet, it does not even need to have the same name as columns in the table, as long as you alias them in the USING clause of the mining function. (Caveat: they still need to draw values from a similar universe or domain, otherwise it constitutes incorrect usage of the model). There are two things in the CQL processor ("DataMiningProc") that make scoring possible on streaming events. 1.      User defined cartridge function Please refer to the OCEP CQL reference manual to find more details about how to define such functions. We include the function below in its entirety for illustration. <?xml version="1.0" encoding="UTF-8"?> <jdbcctxconfig:config     xmlns:jdbcctxconfig="http://www.bea.com/ns/wlevs/config/application"     xmlns:jc="http://www.oracle.com/ns/ocep/config/jdbc">        <jc:jdbc-ctx>         <name>Oracle11gR2</name>         <data-source>DataMining</data-source>               <function name="prediction2">                                 <param name="CQLMONTH" type="char"/>                      <param name="WEEKOFMONTH" type="int"/>                      <param name="DAYOFWEEK" type="char" />                      <param name="MAKE" type="char" />                      <param name="ACCIDENTAREA"   type="char" />                      <param name="DAYOFWEEKCLAIMED"  type="char" />                      <param name="MONTHCLAIMED" type="char" />                      <param name="WEEKOFMONTHCLAIMED" type="int" />                      <param name="SEX" type="char" />                      <param name="MARITALSTATUS"   type="char" />                      <param name="AGE" type="int" />                      <param name="FAULT" type="char" />                      <param name="POLICYTYPE"   type="char" />                      <param name="VEHICLECATEGORY"  type="char" />                      <param name="VEHICLEPRICE" type="char" />                      <param name="FRAUDFOUND" type="int" />                      <param name="POLICYNUMBER" type="int" />                      <param name="REPNUMBER" type="int" />                      <param name="DEDUCTIBLE"   type="int" />                      <param name="DRIVERRATING"  type="int" />                      <param name="DAYSPOLICYACCIDENT"   type="char" />                      <param name="DAYSPOLICYCLAIM" type="char" />                      <param name="PASTNUMOFCLAIMS" type="char" />                      <param name="AGEOFVEHICLES" type="char" />                      <param name="AGEOFPOLICYHOLDER" type="char" />                      <param name="POLICEREPORTFILED" type="char" />                      <param name="WITNESSPRESNT" type="char" />                      <param name="AGENTTYPE" type="char" />                      <param name="NUMOFSUPP" type="char" />                      <param name="ADDRCHGCLAIM"   type="char" />                      <param name="NUMOFCARS" type="char" />                      <param name="CQLYEAR" type="int" />                      <param name="BASEPOLICY" type="char" />                                     <return-component-type>char</return-component-type>                                                      <sql><![CDATA[             SELECT to_char(PREDICTION_PROBABILITY(CLAIMSMODEL, '0' USING *))               AS probability             FROM (SELECT  :CQLMONTH AS MONTH,                                            :WEEKOFMONTH AS WEEKOFMONTH,                          :DAYOFWEEK AS DAYOFWEEK,                           :MAKE AS MAKE,                           :ACCIDENTAREA AS ACCIDENTAREA,                           :DAYOFWEEKCLAIMED AS DAYOFWEEKCLAIMED,                           :MONTHCLAIMED AS MONTHCLAIMED,                           :WEEKOFMONTHCLAIMED,                             :SEX AS SEX,                           :MARITALSTATUS AS MARITALSTATUS,                            :AGE AS AGE,                           :FAULT AS FAULT,                           :POLICYTYPE AS POLICYTYPE,                            :VEHICLECATEGORY AS VEHICLECATEGORY,                           :VEHICLEPRICE AS VEHICLEPRICE,                           :FRAUDFOUND AS FRAUDFOUND,                           :POLICYNUMBER AS POLICYNUMBER,                           :REPNUMBER AS REPNUMBER,                           :DEDUCTIBLE AS DEDUCTIBLE,                            :DRIVERRATING AS DRIVERRATING,                           :DAYSPOLICYACCIDENT AS DAYSPOLICYACCIDENT,                            :DAYSPOLICYCLAIM AS DAYSPOLICYCLAIM,                           :PASTNUMOFCLAIMS AS PASTNUMOFCLAIMS,                           :AGEOFVEHICLES AS AGEOFVEHICLES,                           :AGEOFPOLICYHOLDER AS AGEOFPOLICYHOLDER,                           :POLICEREPORTFILED AS POLICEREPORTFILED,                           :WITNESSPRESNT AS WITNESSPRESENT,                           :AGENTTYPE AS AGENTTYPE,                           :NUMOFSUPP AS NUMOFSUPP,                           :ADDRCHGCLAIM AS ADDRCHGCLAIM,                            :NUMOFCARS AS NUMOFCARS,                           :CQLYEAR AS YEAR,                           :BASEPOLICY AS BASEPOLICY                 FROM dual)                 ]]>         </sql>        </function>     </jc:jdbc-ctx> </jdbcctxconfig:config> 2.      Invoking the function for each event. Once this function is defined, you can invoke it from CQL as follows: <?xml version="1.0" encoding="UTF-8"?> <wlevs:config xmlns:wlevs="http://www.bea.com/ns/wlevs/config/application">   <processor>     <name>DataMiningProc</name>     <rules>        <query id="q1"><![CDATA[                     ISTREAM(SELECT S.CQLMONTH,                                   S.WEEKOFMONTH,                                   S.DAYOFWEEK, S.MAKE,                                   :                                         S.BASEPOLICY,                                    C.F AS probability                                                 FROM                                 StreamDataChannel [NOW] AS S,                                 TABLE(prediction2@Oracle11gR2(S.CQLMONTH,                                      S.WEEKOFMONTH,                                      S.DAYOFWEEK,                                       S.MAKE, ...,                                      S.BASEPOLICY) AS F of char) AS C)                       ]]></query>                 </rules>               </processor>           </wlevs:config>   Finally, the last stage in the EPN prints out the probability of the event being an anomaly. One can also define a threshold in CQL to filter out events that are normal, i.e., below a certain mark as defined by the analyst or designer. Sample Runs: Now let's see how this behaves when events are streamed through CEP. We use only two events for brevity, one normal and other one not. This is one of the "normal" looking events and the probability of it being anomalous is less than 60%. Event is: eventType=DataMiningOutEvent object=q1  time=2904821976256 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=300, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.58931702982118561 isTotalOrderGuarantee=true\nAnamoly probability: .58931702982118561 However, the following event is scored as an anomaly with a very high probability of  89%. So there is likely to be something wrong with it. A close look reveals that the value of "deductible" field (10000) is not "normal". What exactly constitutes normal here?. If you run the query on the database to find ALL distinct values for the "deductible" field, it returns the following set: {300, 400, 500, 700} Event is: eventType=DataMiningOutEvent object=q1  time=2598483773496 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=10000, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.89171554529576691 isTotalOrderGuarantee=true\nAnamoly probability: .89171554529576691 Conclusion: By way of this example, we show: real-time scoring of events as they flow through CEP leveraging Oracle Data Mining.how CEP applications can invoke complex arbitrary external computations (function shipping) in an RDBMS.

    Read the article

  • CodePlex Daily Summary for Thursday, June 16, 2011

    CodePlex Daily Summary for Thursday, June 16, 2011Popular ReleasesTibiaPingFixer: TibiaPingFixer v.1.0: TibiaPingFixer v.1.0TerrariViewer: TerrariViewer v3.1 [Terraria Inventory Editor]: This version adds tool tips. Almost every picture box you mouse over will tell you what item is in that box. I have also cleaned up the GUI a little more to make things easier on my end. There are various bug fixes including ones associated with opening different characters in the same instance of the program. As always, please bring any bugs you find to my attention.CommonLibrary.NET: CommonLibrary.NET - 0.9.7 Beta: A collection of very reusable code and components in C# 3.5 ranging from ActiveRecord, Csv, Command Line Parsing, Configuration, Holiday Calendars, Logging, Authentication, and much more. Samples in <root>\src\Lib\CommonLibrary.NET\Samples CommonLibrary.NET 0.9.7Documentation 6738 6503 New 6535 Enhancements 6583 6737DropBox Linker: DropBox Linker 1.2: Public sub-folders are now monitored for changes as well (thanks to mcm69) Automatic public sync folder detection (thanks to mcm69) Non-Latin and special characters encoded correctly in URLs Pop-ups are now slot-based (use first free slot and will never be overlapped — test it while previewing timeout) Public sync folder setting is hidden when auto-detected Timeout interval is displayed in popup previews A lot of major and minor code refactoring performed .NET Framework 4.0 Client...Terraria World Viewer: Version 1.3: Update June 15th Removed "Draw Markers" checkbox from main window because of redundancy/confusing. (Select all or no items from the Settings tab for the same effect.) Fixed Marker preferences not being saved. It is now possible to render more than one map without having to restart the application. World file will not be locked while the world is being rendered. Note: The World Viewer might render an inaccurate map or even crash if Terraria decides to modify the World file during the pro...MVC Controls Toolkit: Mvc Controls Toolkit 1.1.5 RC: Added Extended Dropdown allows a prompt item to be inserted as first element. RequiredAttribute, if present, trggers if no element is chosen Client side javascript function to set/get the values of DateTimeInput, TypedTextBox, TypedEditDisplay, and to bind/unbind a "change" handler The selected page in the pager is applied the attribute selected-page="selected" that can be used in the definition of CSS rules to style the selected page items controls now interpret a null value as an empr...Umbraco CMS: Umbraco CMS 5.0 CTP 1: Umbraco 5 Community Technology Preview Umbraco 5 will be the next version of everyone's favourite, friendly ASP.NET CMS that already powers over 100,000 websites worldwide. Try out our first CTP of version 5 today! If you're new to Umbraco and would like to get a quick low-down on our popular and easy-to-learn approach to content management, check out our intro video here. What's in the v5 CTP box? This is a preview version of version 5 and includes support for the following familiar Umbr...Ribbon Browser for Microsoft Dynamics CRM 2011: Ribbon Browser (1.0.514.30): Initial releaseCoding4Fun Kinect Toolkit: Coding4Fun.Kinect Toolkit: Version 1.0Kinect Mouse Cursor: Kinect Mouse Cursor v1.0: The initial release of the Kinect Mouse Cursor project!patterns & practices: Project Silk: Project Silk Community Drop 11 - June 14, 2011: Changes from previous drop: Many code changes: please see the readme.mht for details. New "Client Data Management and Caching" chapter. Updated "Application Notifications" chapter. Updated "Architecture" chapter. Updated "jQuery UI Widget" chapter. Updated "Widget QuickStart" appendix and code. Guidance Chapters Ready for Review The Word documents for the chapters are included with the source code in addition to the CHM to help you provide feedback. The PDF is provided as a separat...Orchard Project: Orchard 1.2: Build: 1.2.41 Published: 6/14/2010 How to Install Orchard To install Orchard using Web PI, follow these instructions: http://www.orchardproject.net/docs/Installing-Orchard.ashx. Web PI will detect your hardware environment and install the application. Alternatively, to install the release manually, download the Orchard.Web.1.2.41.zip file. http://orchardproject.net/docs/Manually-installing-Orchard-zip-file.ashx The zip contents are pre-built and ready-to-run. Simply extract the contents o...PowerGUI Visual Studio Extension: PowerGUI VSX 1.3.4: Changes - Got rid of suppressed exceptions on assemblies loading at project startup - Fixed Issue #28535 "No Print Support" - Enabled IntelliSence commands wich are supported by ActiPro Syntax Editor control: ToggleBookmark, NextBookmark, PreviousBookmark, ShowMemberList - Added missing Import directives in PS Script project template - Fixed exception occurring on debug start - Fixed an issue: after creating a new PS project, a debugging session hung being run for the second timeSnippet Designer: Snippet Designer 1.4.0: Snippet Designer 1.4.0 for Visual Studio 2010 Change logSnippet Explorer ChangesReworked language filter UI to work better in the side bar. Added result count drop down which lets you choose how many results to see. Language filter and result count choices are persisted after Visual Studio is closed. Added file name to search criteria. Search is now case insensitive. Snippet Editor Changes Snippet Editor ChangesAdded menu option for the $end$ symbol which indicates where the c...Mobile Device Detection and Redirection: 1.0.4.1: Stable Release 51 Degrees.mobi Foundation is the best way to detect and redirect mobile devices and their capabilities on ASP.NET and is being used on thousands of websites worldwide. We’re highly confident in our software and we recommend all users update to this version. Changes to Version 1.0.4.1Changed the BlackberryHandler and BlackberryVersion6Handler to have equal CONFIDENCE values to ensure they both get a chance at detecting BlackBerry version 4&5 and version 6 devices. Prior to thi...Rawr: Rawr 4.1.06: This is the Downloadable WPF version of Rawr!For web-based version see http://elitistjerks.com/rawr.php You can find the version notes at: http://rawr.codeplex.com/wikipage?title=VersionNotes Rawr AddonWe now have a Rawr Official Addon for in-game exporting and importing of character data hosted on Curse. The Addon does not perform calculations like Rawr, it simply shows your exported Rawr data in wow tooltips and lets you export your character to Rawr (including bag and bank items) like Char...AcDown????? - Anime&Comic Downloader: AcDown????? v3.0 Beta6: ??AcDown?????????????,?????????????,????、????。?????Acfun????? ????32??64? Windows XP/Vista/7 ????????????? ??:????????Windows XP???,?????????.NET Framework 2.0???(x86)?.NET Framework 2.0???(x64),?????"?????????"??? ??v3.0 Beta6 ?????(imanhua.com)????? ???? ?? ??"????","?????","?????","????"?????? "????"?????"????????"?? ??????????? ?????????????? ?????????????/???? ?? ????Windows 7???????????? ????????? ?? ????????????? ???????/??????????? ???????????? ?? ?? ?????(imanh...Pulse: Pulse Beta 2: - Added new wallpapers provider http://wallbase.cc. Supports english search, multiple keywords* - Improved font rendering in Options window - Added "Set wallpaper as logon background" option* - Fixed crashes if there is no internet connection - Fixed: Rewalls downloads empty images sometimes - Added filters* Note 1: wallbase provider supports only english search. Rewalls provider supports only russian search but Pulse automatically translates your english keyword into russian using Google Tr...WPF Application Framework (WAF): WPF Application Framework (WAF) 2.0.0.7: Version: 2.0.0.7 (Milestone 7): This release contains the source code of the WPF Application Framework (WAF) and the sample applications. Requirements .NET Framework 4.0 (The package contains a solution file for Visual Studio 2010) The unit test projects require Visual Studio 2010 Professional Remark The sample applications are using Microsoft’s IoC container MEF. However, the WPF Application Framework (WAF) doesn’t force you to use the same IoC container in your application. You can use ...Windows Azure VM Assistant: AzureVMAssist V1.0.0.5: AzureVMAssist V1.0.0.5 (Debug) - Test Release VersionNew ProjectsASP.NET REST Services Framework: This framework provides capability to work with backend server-side .NET code via REST services from client-side javascript or other types of client code. REST-service component is a server-side framework that allows easy creation and working with REST services within any ASP.NET application. Ones a REST-service is defined it can be consumed via regular URL, or using client-side javascript call that resembles the standard C# style function call that is expected to be used within server-sid...ASP.NET, MVC, Learning: This project is for MojtabaSahraei's blog ResourceAuto Downloads Service: ADSrv (Auto Downloads Service) is a windows services (based on BITS) to add, remove and track downloads from several text files.BizTalk BDD Sample: This project is to go alongside the videos I have recently done about BDD and acceptance testing in BizTalk development.Bluvee Boxer: Video conveter for the WD TV Live Hub.Clomibep: PL: Zaawansowany system zarzadzania trescia Clomibep. EN: Advenced content managment system ClomibepCVPAT: CVPAT is a Process Automation ToolDigital Life Assistant Framework: DLAEF SharePoint 2010 web parts: SharePoint 2010 visual web parts ( SharePoint 2010 only ) Please change "Deploy.cmd" with the correct SharePoint site url, then run it from the SharePoint 2010 server.Entity Framework Query Visualizer: This is a visual studio debug visualizer for retrieving the SQL query generated by the Entity Framework at run time. In order to install this visualizer, you need to copy the downloaded DLL file ( EntityFrameworkLinqQueryVisualizer.dll ) to "C:\Users\<User Name>\Documents\Visual Studio 2010\Visualizers" folderHighYouth: HighYouthHMM-CMS: CMS pour le site HMMICompas: Sample startup siteKontrolDJNET: KontrolDJ.NET is: * A midi translator for KontrolDJ KDJ500 controller: This software is designed to work with Traktor Pro 2.0.1, Traktor Pro 1.0.1 or Traktor 3.4. (4 Decks support, Led feedback, Soft Takeover, ...) * An HID to Midi translator for all your gamepads, joysticks, ... This software is designed to work with Windows XP SP3, Vista and Seven. OS: Windows XP SP3, Vista and Seven (32 or 64bits). LevelZap: LevelZap is a Windows Explorer add-on that adds an item to the contextual menu on all folders allowing the user to "zap" the folder by moving all files/folders within it up one level, then deleting the folder itself. Works on Windows XP or later, both 32-bit and 64-bit versions.Locadora de Veiculos: Locadora de Veiculos - Projeto teste da pós graduaçãoMediator Framework: LINQ DataSource Integration FrameworkMetin2 Patcher: This project is a patcher. First Release Under ConstructionMVC Obsidian: Obsidian aims at creating a solid Quickstart solution for MVC3 projects.Orchard Delete Content Type: This Orchard modules provides a feature to delete dynamic content types.Osbourne Shell (Forth-like scripting language for .NET): I wrote it under the influence of LSD. There are a lot of architectural & codding mistakes and I do not want to even try to correct them. So, enjoy, lol.PowerShell EventLogWatcher Module: A PowerShell module that provides some additional functions to enhance PowerShell Eventing in relation to Windows Event Log events. Subscriptions can be made and actions taken when new events are written to a log. In a sense, this can be used as "poor mans" auditing system.Present it now!: PresentItNow allows to present the desktop to others on the LAN. Since SharedView does not work with IE9 and Netmeeting is not working on Vista/Windows 7 there is a need for a tool to be able to share the desktop with others on the LAN. This is a simple tool written in C#.Quadruple 128-bit Floating Point Library: 128-bit floating point library with 64 effective bits of precision (vs. 53 for the built-in Double type) and a 64 bit exponent (vs. 11 for Doubles). Greater range avoids under/overflows and makes log arithmetic unnecessary.Ribbon Browser for Microsoft Dynamics CRM 2011: This tool helps developer to browse ribbons in Microsoft Dynamics CRM. It makes easier to identify ribbon controls properties.Rsp.Windows.Forms: This project includes several custom Button types, Windows Form types, a numeric textbox and a custom MessageBox class. * RoundedButton - A button with rounded corners. * ShadedButton - A button with customizable shine. * ColorizedButton - A button with customizable Tint color for specified background image. * NumericTextBox - Textbox allowing only numeric input. * MsgBoxUI - Alternative to Windows MessageBox with a nicer look. * ShadowedForm - Windows form with a shadow. ...SocialTFS: SocialTFS is an extension of the Team Foundation Server which provides members of a global software team with information collected from Enterprise 2.0 applications, such as professional social networks and corporate microblogging. SocialTFS makes it easier for members of large distributed software teams to get in touch with each other, using corporate microblogging services (first StatusNet, then Yammer) and professional SNS profiles (Ohloh and LinkedIn). SocialTFS is part of a researc...SQLite Code Generator: Contains a stand alone GUI application and a Visual Studio Custom Tool for automatically generating a .NET data access layer code for objects in a SQLite database.Taste : state machines made easy: Taste is a lightweight state machine implementation for .NET. Its main purpose is to simplify the implementation of complex ViewModels in WPF and Silverlight applications, where the code to execute, the commands to enable and their effects depend on the current state of the View.Tau: TauTelerik MVC Music Store: This project has Telerik OpenAccess ORM as its database access logic and is entirely based of http://mvcmusicstore.codeplex.com/ . TextFileToGrid: This is the library made specifically to render the text file data stored in tabular form into data grid view.TFS Scrumboard: TFS Scrumboard is an extension to TFS 2010 Web Access, providing easy planning and managing of workitem progress.Umbraco Advertising Management: This is the home page for the Umbraco Advertising Management Project. Umbraco CMS is an .NET opensource CMS. This project has just started, you can download the source code of the initial version. The objective of this project is to create a package that would provide a new toVAI: The goal of this project is to create a home entertainment solution focused on various forms of user interaction such as audio, video, and traditional.XBee DSS service for Robotics Studio: This is a Microsoft Robotics Studio DSS service used to communicate with XBee devices. It is able to send messages to remote end devices and receive data samples from them. It is built on top of the Grommet library.????: ??:???

    Read the article

  • WhatsApp &amp; Tasker for Android &ndash; Read &amp; Write messages

    - by Shaurya Anand
    So, I finally gave up on all my previous the Microsoft Mobile/Phone OS devices and made my switch to Android this year. I am using my Samsung Galaxy Note GT-N7000 with CyanogenMod 9.1.0 (http://get.cm/get/jenkins/7086/cm-9.1.0-n7000.zip) and ClockworkMod 6.0.1.2 (http://download2.clockworkmod.com/recoveries/recovery-clockwork-6.0.1.2-n7000.zip) since August this year and I am so happy with the performance and the flexibility it offers me. As a software developer by profession, I would expect most of my gadget to be highly customizable and programmable (one time or at intervals) to suit my needs as close as it can. I was introduced to Automation for Android – Tasker (https://play.google.com/store/apps/details?id=net.dinglisch.android.taskerm&hl=en) via reddit (http://www.reddit.com/r/tasker) and the word ‘automation’ was enough for me to dive right into this app. Only automation that I did earlier was switching profiles depending on location on there phones. And now, just imagine a complete set of possibilities that can be automate on the phone or via the phone. I did my research and found a couple of other tools that do the same/as close as what Tasker can do and few of them are even free. There’s one even by Microsoft called on{X} (https://play.google.com/store/apps/details?id=com.microsoft.onx.app&hl=en). Microsoft’s on{X} really caught my eye. You can write code for your phone on the web application by them, deploy it on your phone and even trace the flow all using your PC. Really brilliant, I love the fact that it’s all JavaScript. Here comes the but, it is still very very young and it’s policy of accessing my News Feed on Facebook is not something that I can not digest. On{X} is good, but as I said earlier, the API is not very mature and hence, I gave up on it. I bought Tasker, the best 5,00 € I spent in ages and I want to talk about it in this post. I am still a “noob” while operating this tool, but I tried my shot at automating WhatsApp (https://play.google.com/store/apps/details?id=com.whatsapp&hl=en), a popular messenger for various platform. The requirement for the automation is that, if I send a WhatsApp ‘wru’ message to the phone, it should respond back giving the location and battery level of my phone. It could be useful, if you like to locate your misplaced phone or automatically reply to your partner/friend, honestly, I don’t know what you will use it - through this post, I am just introducing automating WhatsApp using Tasker. Before we begin, the following script only works when your phone is rooted as we will be accessing the WhatsApp database and type some special characters like ‘:’. Let’s follow the code line by line: Profile:         Location request from XYZ. (12) // Name of your profile. Event:         Notification [ Owner Application:WhatsApp Title:* ] // When a new notification comes from WhatsApp, this event is fired. Read the end note, if you face problems with Chrome app after enabling Tasker accessibility. Enter:         A1: Run Shell [ Command:sqlite3 // We will access the WhatsApp database and check if the message comes from designated phone number or not. We mustn’t reply to every message.                 /data/data/com.whatsapp/databases/msgstore.db "SELECT _id, data FROM                  messages WHERE key_from_me='0' AND key_remote_jid LIKE '%XXXXXXXXXXX%' // Replace XXXXXXXXXXX with the phone number of your message sender.                 ORDER BY _id DESC LIMIT 1;" Timeout (Seconds):10 Use Root:On Store // I made a timeout for 10 seconds, if in case WhatsApp is busy accessing the database.                 Result In:%WHATSAPP_CURRREQ ] // Store the read Id and the last message on to the variable %WHATSAPP_CURRREQ         A2: If [ %WHATSAPP_CURRREQ ~R .*[wW][rR][uU].* ] // Check if the pattern of the message is correct and we are all set to send the location.                 A3: If [ %WHATSAPP_CURRREQ !~ %WHATSAPP_LASTREQ ] // Verify that the message is different from the last request. Remember every message has a unique Id.                         A4: Notify [ Title:WhatsApp location request... Text:Sending location // Just a notification that the location message is being prepared.                                 to Krati Gupta... Icon:<icon> Number:0 Permanent:On Priority:3 ] // Make a note it is a permanent notification, we will clear it later.                         A5: Secure Settings [ Configuration:Pattern Lock Disabled // I am disabling the pattern lock, that I use using the plugin Secure Settings.                                 Package:com.intangibleobject.securesettings.plugin Name:Secure // You can download the plugin from here: https://play.google.com/store/apps/details?id=com.intangibleobject.securesettings.plugin&hl=en                                 Settings ]                         A6: Secure Settings [ Configuration:Keyguard Disabled // Disable the keygaurd, it is useful, when your phone is on lock and you want to automate everything, even the typing.                                 Package:com.intangibleobject.securesettings.plugin Name:Secure                                 Settings ]                         A7: Secure Settings [ Configuration:GPS Enabled // Pretty clear, turn on the GPS and get location at A8                                 Package:com.intangibleobject.securesettings.plugin Name:Secure                                 Settings ]                         A8: AutoShortcut [ Configuration:WhatsApp: Some One // I am using AutoShortcut plugin (https://play.google.com/store/apps/details?id=com.joaomgcd.autoshortcut) to start WhatsApp with the indented recipient.                                 Package:com.joaomgcd.autoshortcut Name:AutoShortcut ] // Replace Some One, actually choose it from the plugin, the right recipient.                         A9: Get Location [ Source:Any Timeout (Seconds):30 Continue Task // I am getting the location, timeout is 30 seconds, adjust it accordingly.                                 Immediately:Off Keep Tracking:Off ]                         A10: Secure Settings [ Configuration:Screen Dim // Now, this extension of the plugin Secure Settings, wakes your device so that you can type out the string on the WhatsApp app.                                 5 Seconds Package:com.intangibleobject.securesettings.plugin                                 Name:Secure Settings ]                         A11: Run Shell [ Command:input text // Now, I am using the shell script to type the text to the window, because the ‘:’ while not be typed from the Type task in Tasker.                                 LOCATION:maps.google.com/maps?q=%LOC Timeout (Seconds):0 Use Root:On // And also, this is way faster, but remember you need root for this, not for the other way of typing.                                 Store Result In: ]                         A12: Dpad [ Button:Right Repeat Times:1 ] // Focus the Send button                         A13: Dpad [ Button:Press Repeat Times:1 ] // And press it.                         A14: Dpad [ Button:Left Repeat Times:1 ] // Get back to the typing box.                         A15: Run Shell [ Command:input text LOCATION_ACCURACY:%LOCACC Timeout                                 (Seconds):0 Use Root:On Store Result In: ]                         A16: Dpad [ Button:Right Repeat Times:1 ]                         A17: Dpad [ Button:Press Repeat Times:1 ]                         A18: Dpad [ Button:Left Repeat Times:1 ]                         A19: Run Shell [ Command:input text BATTERY_LEVEL:%BATT% Timeout // I am adding Battery level in my case as well.                                 (Seconds):0 Use Root:On Store Result In: ]                         A20: Dpad [ Button:Right Repeat Times:1 ]                         A21: Dpad [ Button:Press Repeat Times:1 ]                         A22: Variable Set [ Name:%WHATSAPP_LASTREQ To:%WHATSAPP_CURRREQ Do // And now, we say, request is done.                                 Maths:Off Append:Off ]                         A23: Button [ Button:Back ] // I am exiting the WhatsApp nicely and not killing it. If you are the murderer kind, kill it, just know, you don’t have any place in the heaven.                         A24: Button [ Button:Back ]                         A25: Notify Cancel [ Title: Warn Not Exist:Off ] // Remove the permanent notification.                         A26: Notify [ Title:WhatsApp location request Text:Location sent // Make a temporary notification, and say, location is sent.                                 successfully. Icon:<icon> Number:0 Permanent:Off Priority:3 ]                                                         A27: Secure Settings [ Configuration:GPS Disabled // Disable all the horrible things we turned on earlier.                                 Package:com.intangibleobject.securesettings.plugin Name:Secure                                 Settings ]                         A28: Secure Settings [ Configuration:Pattern Lock Enabled                                 Package:com.intangibleobject.securesettings.plugin Name:Secure                                 Settings ]                         A29: Secure Settings [ Configuration:Keyguard Enabled                                 Package:com.intangibleobject.securesettings.plugin Name:Secure                                 Settings ]                 A30: End If         A31: End If Download this Task from here: http://db.tt/9vRmbhyb That’s it in the above small example – you can read/write messages from/to WhatsApp app. I am using n7000-cm9.1-cwr6. Oh yea, and if you are having the Talkback auto enabled for Chrome browser, you need to turn Off the Web scripts to run. Tasker is amazing, I have automated a lot of tasks using this tool. I will share a few none generic ones with you in my coming post here.

    Read the article

  • Anatomy of a .NET Assembly - CLR metadata 1

    - by Simon Cooper
    Before we look at the bytes comprising the CLR-specific data inside an assembly, we first need to understand the logical format of the metadata (For this post I only be looking at simple pure-IL assemblies; mixed-mode assemblies & other things complicates things quite a bit). Metadata streams Most of the CLR-specific data inside an assembly is inside one of 5 streams, which are analogous to the sections in a PE file. The name of each section in a PE file starts with a ., and the name of each stream in the CLR metadata starts with a #. All but one of the streams are heaps, which store unstructured binary data. The predefined streams are: #~ Also called the metadata stream, this stream stores all the information on the types, methods, fields, properties and events in the assembly. Unlike the other streams, the metadata stream has predefined contents & structure. #Strings This heap is where all the namespace, type & member names are stored. It is referenced extensively from the #~ stream, as we'll be looking at later. #US Also known as the user string heap, this stream stores all the strings used in code directly. All the strings you embed in your source code end up in here. This stream is only referenced from method bodies. #GUID This heap exclusively stores GUIDs used throughout the assembly. #Blob This heap is for storing pure binary data - method signatures, generic instantiations, that sort of thing. Items inside the heaps (#Strings, #US, #GUID and #Blob) are indexed using a simple binary offset from the start of the heap. At that offset is a coded integer giving the length of that item, then the item's bytes immediately follow. The #GUID stream is slightly different, in that GUIDs are all 16 bytes long, so a length isn't required. Metadata tables The #~ stream contains all the assembly metadata. The metadata is organised into 45 tables, which are binary arrays of predefined structures containing information on various aspects of the metadata. Each entry in a table is called a row, and the rows are simply concatentated together in the file on disk. For example, each row in the TypeRef table contains: A reference to where the type is defined (most of the time, a row in the AssemblyRef table). An offset into the #Strings heap with the name of the type An offset into the #Strings heap with the namespace of the type. in that order. The important tables are (with their table number in hex): 0x2: TypeDef 0x4: FieldDef 0x6: MethodDef 0x14: EventDef 0x17: PropertyDef Contains basic information on all the types, fields, methods, events and properties defined in the assembly. 0x1: TypeRef The details of all the referenced types defined in other assemblies. 0xa: MemberRef The details of all the referenced members of types defined in other assemblies. 0x9: InterfaceImpl Links the types defined in the assembly with the interfaces that type implements. 0xc: CustomAttribute Contains information on all the attributes applied to elements in this assembly, from method parameters to the assembly itself. 0x18: MethodSemantics Links properties and events with the methods that comprise the get/set or add/remove methods of the property or method. 0x1b: TypeSpec 0x2b: MethodSpec These tables provide instantiations of generic types and methods for each usage within the assembly. There are several ways to reference a single row within a table. The simplest is to simply specify the 1-based row index (RID). The indexes are 1-based so a value of 0 can represent 'null'. In this case, which table the row index refers to is inferred from the context. If the table can't be determined from the context, then a particular row is specified using a token. This is a 4-byte value with the most significant byte specifying the table, and the other 3 specifying the 1-based RID within that table. This is generally how a metadata table row is referenced from the instruction stream in method bodies. The third way is to use a coded token, which we will look at in the next post. So, back to the bytes Now we've got a rough idea of how the metadata is logically arranged, we can now look at the bytes comprising the start of the CLR data within an assembly: The first 8 bytes of the .text section are used by the CLR loader stub. After that, the CLR-specific data starts with the CLI header. I've highlighted the important bytes in the diagram. In order, they are: The size of the header. As the header is a fixed size, this is always 0x48. The CLR major version. This is always 2, even for .NET 4 assemblies. The CLR minor version. This is always 5, even for .NET 4 assemblies, and seems to be ignored by the runtime. The RVA and size of the metadata header. In the diagram, the RVA 0x20e4 corresponds to the file offset 0x2e4 Various flags specifying if this assembly is pure-IL, whether it is strong name signed, and whether it should be run as 32-bit (this is how the CLR differentiates between x86 and AnyCPU assemblies). A token pointing to the entrypoint of the assembly. In this case, 06 (the last byte) refers to the MethodDef table, and 01 00 00 refers to to the first row in that table. (after a gap) RVA of the strong name signature hash, which comes straight after the CLI header. The RVA 0x2050 corresponds to file offset 0x250. The rest of the CLI header is mainly used in mixed-mode assemblies, and so is zeroed in this pure-IL assembly. After the CLI header comes the strong name hash, which is a SHA-1 hash of the assembly using the strong name key. After that comes the bodies of all the methods in the assembly concatentated together. Each method body starts off with a header, which I'll be looking at later. As you can see, this is a very small assembly with only 2 methods (an instance constructor and a Main method). After that, near the end of the .text section, comes the metadata, containing a metadata header and the 5 streams discussed above. We'll be looking at this in the next post. Conclusion The CLI header data doesn't have much to it, but we've covered some concepts that will be important in later posts - the logical structure of the CLR metadata and the overall layout of CLR data within the .text section. Next, I'll have a look at the contents of the #~ stream, and how the table data is arranged on disk.

    Read the article

  • Premature-Optimization and Performance Anxiety

    - by James Michael Hare
    While writing my post analyzing the new .NET 4 ConcurrentDictionary class (here), I fell into one of the classic blunders that I myself always love to warn about.  After analyzing the differences of time between a Dictionary with locking versus the new ConcurrentDictionary class, I noted that the ConcurrentDictionary was faster with read-heavy multi-threaded operations.  Then, I made the classic blunder of thinking that because the original Dictionary with locking was faster for those write-heavy uses, it was the best choice for those types of tasks.  In short, I fell into the premature-optimization anti-pattern. Basically, the premature-optimization anti-pattern is when a developer is coding very early for a perceived (whether rightly-or-wrongly) performance gain and sacrificing good design and maintainability in the process.  At best, the performance gains are usually negligible and at worst, can either negatively impact performance, or can degrade maintainability so much that time to market suffers or the code becomes very fragile due to the complexity. Keep in mind the distinction above.  I'm not talking about valid performance decisions.  There are decisions one should make when designing and writing an application that are valid performance decisions.  Examples of this are knowing the best data structures for a given situation (Dictionary versus List, for example) and choosing performance algorithms (linear search vs. binary search).  But these in my mind are macro optimizations.  The error is not in deciding to use a better data structure or algorithm, the anti-pattern as stated above is when you attempt to over-optimize early on in such a way that it sacrifices maintainability. In my case, I was actually considering trading the safety and maintainability gains of the ConcurrentDictionary (no locking required) for a slight performance gain by using the Dictionary with locking.  This would have been a mistake as I would be trading maintainability (ConcurrentDictionary requires no locking which helps readability) and safety (ConcurrentDictionary is safe for iteration even while being modified and you don't risk the developer locking incorrectly) -- and I fell for it even when I knew to watch out for it.  I think in my case, and it may be true for others as well, a large part of it was due to the time I was trained as a developer.  I began college in in the 90s when C and C++ was king and hardware speed and memory were still relatively priceless commodities and not to be squandered.  In those days, using a long instead of a short could waste precious resources, and as such, we were taught to try to minimize space and favor performance.  This is why in many cases such early code-bases were very hard to maintain.  I don't know how many times I heard back then to avoid too many function calls because of the overhead -- and in fact just last year I heard a new hire in the company where I work declare that she didn't want to refactor a long method because of function call overhead.  Now back then, that may have been a valid concern, but with today's modern hardware even if you're calling a trivial method in an extremely tight loop (which chances are the JIT compiler would optimize anyway) the results of removing method calls to speed up performance are negligible for the great majority of applications.  Now, obviously, there are those coding applications where speed is absolutely king (for example drivers, computer games, operating systems) where such sacrifices may be made.  But I would strongly advice against such optimization because of it's cost.  Many folks that are performing an optimization think it's always a win-win.  That they're simply adding speed to the application, what could possibly be wrong with that?  What they don't realize is the cost of their choice.  For every piece of straight-forward code that you obfuscate with performance enhancements, you risk the introduction of bugs in the long term technical debt of the application.  It will become so fragile over time that maintenance will become a nightmare.  I've seen such applications in places I have worked.  There are times I've seen applications where the designer was so obsessed with performance that they even designed their own memory management system for their application to try to squeeze out every ounce of performance.  Unfortunately, the application stability often suffers as a result and it is very difficult for anyone other than the original designer to maintain. I've even seen this recently where I heard a C++ developer bemoaning that in VS2010 the iterators are about twice as slow as they used to be because Microsoft added range checking (probably as part of the 0x standard implementation).  To me this was almost a joke.  Twice as slow sounds bad, but it almost never as bad as you think -- especially if you're gaining safety.  The only time twice is really that much slower is when once was too slow to begin with.  Think about it.  2 minutes is slow as a response time because 1 minute is slow.  But if an iterator takes 1 microsecond to move one position and a new, safer iterator takes 2 microseconds, this is trivial!  The only way you'd ever really notice this would be in iterating a collection just for the sake of iterating (i.e. no other operations).  To my mind, the added safety makes the extra time worth it. Always favor safety and maintainability when you can.  I know it can be a hard habit to break, especially if you started out your career early or in a language such as C where they are very performance conscious.  But in reality, these type of micro-optimizations only end up hurting you in the long run. Remember the two laws of optimization.  I'm not sure where I first heard these, but they are so true: For beginners: Do not optimize. For experts: Do not optimize yet. This is so true.  If you're a beginner, resist the urge to optimize at all costs.  And if you are an expert, delay that decision.  As long as you have chosen the right data structures and algorithms for your task, your performance will probably be more than sufficient.  Chances are it will be network, database, or disk hits that will be your slow-down, not your code.  As they say, 98% of your code's bottleneck is in 2% of your code so premature-optimization may add maintenance and safety debt that won't have any measurable impact.  Instead, code for maintainability and safety, and then, and only then, when you find a true bottleneck, then you should go back and optimize further.

    Read the article

  • ASP.NET MVC 3 Hosting :: MVC 2 Strongly Typed HTML Helper and Enhanced Validation Sample

    - by mbridge
    In lue of the off the official release of ASP.NET MVC 2 RTM, I decided I would put together a quick sample of the enhanced HTML.Helpers and validation controls. I am going to use my sample event site where I will have a form so a user can search for information about a certain events. So when the Search page loads the Search action is fired return my strongly typed model. to the view.    1: [HttpGet]    2: public ViewResult Search(): public ViewResult Search()    3: {    4:     IList<EventsModel> result = _eventsService.GetEventList();    5:     var viewModel = new EventSearchModel    6:                         {    7:                             EventList = new SelectList(result, "EventCode","EventName","Select Event")    8:                         };    9:     return View(viewModel);  10: } Nothing special here, although I did want to show how to load up a strongly typed drop down list because that hung me up for a little bit. So to that, I am going to pass back a SelectList to the view and my HTML helper should no how to load this. So lets take a look at the mark up for the view.    1: <%@ Page Title="" Language="C#" MasterPageFile="~/Views/Shared/Site.Master"    2: Inherits="System.Web.Mvc.ViewPage<EventsSample.Models.EventSearchModel>" %>    3:     4: <asp:Content ID="Content1" ContentPlaceHolderID="TitleContent" runat="server">    5:     Search    6: </asp:Content>    7:     8: <asp:Content ID="Content2" ContentPlaceHolderID="MainContent" runat="server">    9:   10:     <h2>Search for Events</h2>  11:   12:     <% using (Html.BeginForm("Search","Events")) {%>  13:         <%= Html.ValidationSummary(true) %>  14:          15:         <fieldset>  16:             <legend>Fields</legend>  17:              18:             <div class="editor-label">  19:                 <%= Html.LabelFor(model => model.EventNumber) %>  20:             </div>  21:             <div class="editor-field">  22:                 <%= Html.TextBoxFor(model => model.EventNumber) %>  23:                 <%= Html.ValidationMessageFor(model => model.EventNumber) %>  24:             </div>  25:              26:             <div class="editor-label">  27:                 <%= Html.LabelFor(model => model.GuestLastName) %>  28:             </div>  29:             <div class="editor-field">  30:                 <%= Html.TextBoxFor(model => model.GuestLastName) %>  31:                 <%= Html.ValidationMessageFor(model => model.GuestLastName) %>  32:             </div>  33:              34:             <div class="editor-label">  35:                 <%= Html.LabelFor(model => model.EventName) %>  36:             </div>  37:             <div class="editor-field">  38:                 <%= Html.DropDownListFor(model => model.EventName, Model.EventList,"Select Event") %>  39:                 <%= Html.ValidationMessageFor(model => model.EventName) %>  40:             </div>  41:              42:             <p>  43:                 <input type="submit" value="Save" />  44:             </p>  45:         </fieldset>  46:   47:     <% } %>  48:   49:     <div>  50:         <%= Html.ActionLink("Back to List", "Index") %>  51:     </div>  52:   53: </asp:Content> A nice feature is the scaffolding that MVC has to generate code. I simply right clicked inside my Search() action, inside the EventsController and selected “Add View” and then I selected my strongly typed object that I wanted to pass to the view and also selected that I wanted the content type be “Edit”. With that the aspx page was completely generated, although I did have to go back in and change the textbox for the Event Names to a drop down list of the names to select from. The new feature with MVC 2 are the strongly typed HTML helpers. So now, my textboxes, drop down list, and validation helpers are all strongly typed to my model.  This features gives you the benefits of intellisense and also makes it easier to debug. “The Gu” has a great post about the feature in case you want more details. The DropDownListFor function to generate the drop down list was a little tricky for me. You first need to use a Lanbda expression to pass in the property you want the selected value assigned to in your model, and then you need to pass in the list directly from the model. Validations To validate the form, you can use the strongly type validation HTML helpers which will inspect your model and return errors if the validation fails. The definitions of these rules are set directly on the Model itself so lets take a look.    1: using System.ComponentModel.DataAnnotations;    2: using System.Web.Mvc;    3:     4: namespace EventsSample.Models    5: {    6:     public class EventSearchModel    7:     {    8:         [Required(ErrorMessage = "Please enter the event number.")]    9:         [RegularExpression(@"\w{6}",  10:             ErrorMessage = "The Event Number must be 6 letters and/or numbers.")]  11:         public string EventNumber { get; set; }  12:   13:         [Required(ErrorMessage = "Please enter the guest's last name.")]  14:         [RegularExpression(@"^[A-Za-zÀ-ÖØ-öø-ÿ1-9 '\-\.]{1,22}$",  15:             ErrorMessage = "The gueest's last name must 1 to 20 characters.")]  16:         public string GuestLastName { get; set; }  17:   18:         public string EventName { get; set; }  19:         public SelectList EventList { get; set; }  20:     }  21: } Pretty cool! Okay, the only thing left to do is perform the validation in the POST action.    1: [HttpPost]    2: public ViewResult Search(EventSearchModel eventSearchModel)    3: {    4:     if (ModelState.IsValid) return View("SearchResults");    5:     else    6:     {    7:          IList<EventsModel> result = _eventsService.GetEventList();    8:         eventSearchModel.EventList = new SelectList(result, "EVentCode","EventName");   9:   10:         return View(eventSearchModel);  11:     }  12: }  13:     } If the form entries are valid, here I am simply displaying the SearchResult, but in a real world sample I would also go out get the results first. You get the idea though. In my case, when the form is not valid, I also had to reload my SelectList with the event names before I loaded the page again. Remember this is MVC, no _VieState here :) So that’s it. Now my form is validating the data and when it fails it looks like this.

    Read the article

< Previous Page | 335 336 337 338 339 340 341 342 343 344 345 346  | Next Page >