andrew case - Page 401 - Developer IT

Developer’s Life – Attitude and Communication – They Can Cause Problems – Notes from the Field #027

- by Pinal Dave

[Note from Pinal]: This is a 27th episode of Notes from the Field series. The biggest challenge for anyone is to understand human nature. We human have so many things on our mind at any moment of time. There are cases when what we say is not what we mean and there are cases where what we mean we do not say. We do say and things as per our mood and our agenda in mind. Sometimes there are incidents when our attitude creates confusion in the communication and we end up creating a situation which is absolutely not warranted. In this episode of the Notes from the Field series database expert Mike Walsh explains a very crucial issue we face in our career, which is not technical but more to relate to human nature. Read on this may be the best blog post you might read in recent times. In this week’s note from the field, I’m taking a slight departure from technical knowledge and concepts explained. We’ll be back to it next week, I’m sure. Pinal wanted us to explain some of the issues we bump into and how we see some of our customers arrive at problem situations and how we have helped get them back on the right track. Often it is a technical problem we are officially solving – but in a lot of cases as a consultant, we are really helping fix some communication difficulties. This is a technical blog post and not an “advice column” in a newspaper – but the longer I am a consultant, the more years I add to my experience in technology the more I learn that the vast majority of the problems we encounter have “soft skills” included in the chain of causes for the issue we are helping overcome. This is not going to be exhaustive but I hope that sharing four pieces of advice inspired by real issues starts a process of searching for places where we can be the cause of these challenges and look at fixing them in ourselves. Or perhaps we can begin looking at resolving them in teams that we manage. I’ll share three statements that I’ve either heard, read or said and talk about some of the communication or attitude challenges highlighted by the statement. 1 – “But that’s the SAN Administrator’s responsibility…” I heard that early on in my consulting career when talking with a customer who had serious corruption and no good recent backups – potentially no good backups at all. The statement doesn’t have to be this one exactly, but the attitude here is an attitude of “my job stops here, and I don’t care about the intent or principle of why I’m here.” It’s also a situation of having the attitude that as long as there is someone else to blame, I’m fine… You see in this case, the DBA had a suspicion that the backups were not being handled right. They were the DBA and they knew that they had responsibility to ensure SQL backups were good to go – it’s a basic requirement of a production DBA. In my “As A DBA Where Do I start?!” presentation, I argue that is job #1 of a DBA. But in this case, the thought was that there was someone else to blame. Rather than create extra work and take on responsibility it was decided to just let it be another team’s responsibility. This failed the company, the company’s customers and no one won. As technologists – we should strive to go the extra mile. If there is a lack of clarity around roles and responsibilities and we know it – we should push to get it resolved. Especially as the DBAs who should act as the advocates of the data contained in the databases we are responsible for. 2 – “We’ve always done it this way, it’s never caused a problem before!” Complacency. I have to say that many failures I’ve been paid good money to help recover from would have not happened had it been for an attitude of complacency. If any thoughts like this have entered your mind about your situation you may be suffering from it. If, while reading this, you get this sinking feeling in your stomach about that one thing you know should be fixed but haven’t done it.. Why don’t you stop and go fix it then come back.. “We should have better backups, but we’re on a SAN so we should be fine really.” “Technically speaking that could happen, but what are the chances?” “We’ll just clean that up as a fast follow” ..and so on. In the age of tightening IT budgets, increased expectations of up time, availability and performance there is no room for complacency. Our customers and business units expect – no demand – the best. Complacency says “we will give you second best or hopefully good enough and we accept the risk and know this may hurt us later. Sometimes an organization will opt for “good enough” and I agree with the concept that at times the perfect can be the enemy of the good. But when we make those decisions in a vacuum and are not reporting them up and discussing them as an organization that is different. That is us unilaterally choosing to do something less than the best and purposefully playing a game of chance. 3 – “This device must accept interference from other devices but not create any” I’ve paraphrased this one – but it’s something the Federal Communications Commission – a federal agency in the United States that regulates electronic communication – requires of all manufacturers of any device that could cause or receive interference electronically. I blogged in depth about this here (http://www.straightpathsql.com/archives/2011/07/relationship-advice-from-the-fcc/) so I won’t go into much detail other than to say this… If we all operated more on the premise that we should do our best to not be the cause of conflict, and to be less easily offended and less upset when we perceive offense life would be easier in many areas! This doesn’t always cause the issues we are called in to help out. Not directly. But where we see it is in unhealthy relationships between the various technology teams at a client. We’ll see teams hoarding knowledge, not sharing well with others and almost working against other teams instead of working with them. If you trace these problems back far enough it often stems from someone or some group of people violating this principle from the FCC. To Sum It Up Technology problems are easy to solve. At Linchpin People we help many customers get past the toughest technological challenge – and at the end of the day it is really just a repeatable process of pattern based troubleshooting, logical thinking and starting at the beginning and carefully stepping through to the end. It’s easy at the end of the day. The tough part of what we do as consultants is the people skills. Being able to help get teams working together, being able to help teams take responsibility, to improve team to team communication? That is the difficult part, and we get to use the soft skills on every engagement. Work on professional development (http://professionaldevelopment.sqlpass.org/) and see continuing improvement here, not just with technology. I can teach just about anyone how to be an excellent DBA and performance tuner, but some of these soft skills are much more difficult to teach. If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article

Build an Organization Chart In Visio 2010

- by Mysticgeek

With trying to manage a business these days, it’s very important to have an Organization Chart to keep everything manageable. Here we’ll show you how to build one in Visio 2010. This Guest Article was written by our friends over at Office 2010 Club. Need for Organization Charts The need of creating Organization Charts are becoming indispensable these days, as companies start focusing on extensive hiring for far reach availability, increase in productivity and targeting diverse markets. Considering this rigorous change, creating an organization chart can help stakeholders in comprehending the ever growing organization structure & hierarchy with an ease. It shows the basic structure of organization along with defining the relationships between employees working in different departments. Opportunely, Microsoft Visio 2010 offers an easy way to create Organization chart. As before now, orthodox ways of listing organization hierarchy have been used for defining the structure of departments along with communication possible including; horizontal and vertical communications. To transform these lists which defines organizational structure, into a detailed chart, Visio 2010 includes an add-in for importing Excel spreadsheet, which comes in handy for pulling out data from spreadsheet to create an organization chart. Importantly, you don’t need to indulge yourself in maze of defining organizational hierarchies and chalking-out structure, as you just need to specify the column & row headers, along with data you need to import and it will automatically create out chart defining; organizational hierarchies with specified credentials of each employee, categorized in their corresponding departments. Creating Organization Charts in Visio 2010 To start off with, we have created an Excel spreadsheet having fields, Name, Supervisor, Designation, Department and Phone. The Name field contains name of all the employees working in different departments, whereas Supervisor field contains name of supervisors or team leads. This field is vital for creating Organization Chart, as it defines the basic structure & hierarchy in chart. Now launch Visio 2010, head over to View tab, under Add-Ons menu, from Business options, click Organization Chart Wizard. This will start Organization Chart Wizard, in the first step, enable Information that’s already stored in a file or database option, and click Next. As we are importing Excel sheet, select the second option for importing Excel spreadsheet. Specify the Excel file path and click Next to continue. In this step, you need to specify the fields which actually defines the structure of an organization. In our case, these are Name & Supervisor fields. After specifying fields, click Next to Proceed further. As organization chart is primarily for showing the hierarchy of departments/employees working in organization along with how they are linked together, and who supervises whom. Considering this, in this step we will leave out Supervisor field, because it’s inclusion wouldn’t be necessary as Visio automatically chalks-out the basic structure defined in Excel sheet. Add the rest of the fields under Displayed fields category, and click Next. Now choose the fields which you want to include in Organization Chart’s shapes and click Next. This step is about breaking the chart into multiple pages, if you are dealing with 100+ employees, you may want to specify numbers of pages on which Organization Chart will be displayed. But in our case, we are dealing with much less amount of data, so we will enable I want the wizard to automatically break my organization chart across pages option. Specify the name you need to show on the top of the page. If you are having less than 20 hierarchies, enter the name of the highest ranked employee in organization and click Finish to end the wizard. It will instantly create an Organization chart out of specified Excel spreadsheet. Highest ranked employee will be shown on top of the organization chart, supervising various employees from different departments. As shown below, his immediate subordinates further manages other employees and so on. For advance customizations, head over to Org Chart tab, here you will find different groups for setting up the Org Chart’s hierarchy and manage other employees’ positions. Under Arrange group, shapes’ arrangements can be changed and it provides easy navigation through the chart. You can also change the type of the position and hide subordinates of selected employee. From Picture group, you can insert a picture of the employees, departments, etc. From synchronization group, you have the option of creating a synced copy and expanding subordinates of selected employee. Under Organization Data group, you can change whole layout of Organization chart from Display Options including; shape display, show divider, enable/disable imported fields, change block position, and fill colors, etc. If at any point of time, you need to insert new position or announce vacancy, Organization Chart stencil is always available on the left sidebar. Drag the desired Organization Chart shape into main diagram page, to maintain the structure integrity, i.e, for inserting subordinates for a specific employee, drag the position shape over the existing employee shape box. For instance, We have added a consultant in organization, who is directly under CEO, for maintaining this, we have dragged the Consultant box and just dropped it over the CEO box to make the immediate subordinate position. Adding details to new position is a cinch, just right-click new position box and click Properties. This will open up Shape Data dialog, start filling in all the relevant information and click OK. Here you can see the newly created position is easily populated with all the specified information. Now expanding an Organization Chart doesn’t require maintenance of long lists any more. Under Design tab, you can also try out different designs & layouts over organization chart to make it look more flamboyant and professional. Conclusion An Organization Chart is a great way of showing detailed organizational hierarchies; with defined credentials of employees, departments structure, new vacancies, newly hired employees, recently added departments, and importantly shows most convenient way of interaction between different departments & employees, etc. Similar Articles Productive Geek Tips Geek Reviews: Using Dia as a Free Replacement for Microsoft VisioMysticgeek Blog: Create Appealing Charts In Excel 2007Create Charts in Excel 2007 the Easy Way with Chart AdvisorCreate a Hyperlink in a Word 2007 Flow Chart and Hide Annoying ScreenTipsCreate A Flow Chart In Word 2007 TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips HippoRemote Pro 2.2 Xobni Plus for Outlook All My Movies 5.9 CloudBerry Online Backup 1.5 for Windows Home Server Know if Someone Accessed Your Facebook Account Shop for Music with Windows Media Player 12 Access Free Documentaries at BBC Documentaries Rent Cameras In Bulk At CameraRenter Download Songs From MySpace Steve Jobs’ iPhone 4 Keynote Video

Read the article

Complex type support in process flow – XMLTYPE

- by shawn

Before OWB 11.2 release, there are only 5 simple data types supported in process flow: DATE, BOOLEAN, INTEGER, FLOAT and STRING. A new complex data type – XMLTYPE is added in 11.2, in order to support complex data being passed between the process flow activities. In this article we will give a simple example to illustrate the usage of the new type and some related editors. Suppose there is a bookstore that uses XML format orders as shown below (we use the simplest form for the illustration purpose), then we can create a process flow to handle the order, take the order as the input, then extract necessary information, and generate a confirmation email to the customer automatically. <order id=’0001’> <customer> <name>Tom</name> <email>[email protected]</email> </customer> <book id=’Java_001’> <quantity>3</quantity> </book> </order> Considering a simple user case here: we use an input parameter/variable with XMLTYPE to hold the XML content of the order; then we can use an Assign activity to retrieve the email info from the order; after that, we can create an email activity to send the email (Other activities might be added in practical case, but will not be described here). 1) Set XML content value For testing purpose, we will create a variable to hold the sample order, and then this will be used among the process flow activities. When the variable is of XMLTYPE and the “Literal” value is set the true, the advance editor will be enabled. Click the “Advance Editor” shown as above, a simple xml editor will popup. The editor has basic features like syntax highlight and check as shown below: We can also do the basic validation or validation against schema with the editor by selecting the normalized schema. With this, it will be easier to provide the value for XMLTYPE variables. 2) Extract information from XML content After setting the value, we need to extract the email information with the Assign activity. In process flow, an enhanced expression builder is used to help users construct the XPath for extracting values from XML content. When the variable’s literal value is set the false, the advance editor is enabled. Click the button, the advance editor will popup, as shown below: The editor is based on the expression builder (which is often used in mapping etc), an XPath lib panel is appended which provides some help information on how to write the XPath. The expression used here is: “XMLTYPE.EXTRACT(XML_ORDER,'/order/customer/email/text()').getStringVal()”, which uses ‘/order/customer/email/text()’ as the XPath to extract the email info from the XML document. A variable called “EMAIL_ADDR” is created with String data type to hold the value extracted. Then we bind the “VARIABLE” parameter of Assign activity to “EMAIL_ADDR” variable, which means the value of the “EMAIL_ADDR” activity will be set to the result of the “VALUE” parameter of Assign activity. 3) Use the extracted information in Email activity We bind the “TO_ADDRESS” parameter of the email activity to the “EMAIL_ADDR” variable created in above step. We can also extract other information from the xml order directly through the expression, for example, we can set the “MESSAGE_BODY” with value “'Dear '||XMLTYPE.EXTRACT(XML_ORDER,'/order/customer/name/text()').getStringVal()||chr(13)||chr(10)||' You have ordered '||XMLTYPE.EXTRACT(XML_ORDER,'/order/book/quantity/text()').getStringVal()||' '||XMLTYPE.EXTRACT(XML_ORDER,'/order/book/@id').getStringVal()”. This expression will extract the customer name, the quantity and the book id from the order to compose the message body. To make the email activity work, we need provide some other necessary information, Such as “SMTP_SERVER” (which is the SMTP server used to send the emails, like “mail.bookstore.com”. The default PORT number is set to 25. You need to change the value accordingly), “FROM_ADDRESS” and “SUBJECT”. Then the process flow is ready to go. After deploying the process flow package, we can simply run the process flow to check if the result is as expected (An email will be sent to the specified email address with proper subject and message body). Note: In oracle 11g, there is an enhanced security feature - ACL (Access Control List), which restrict the network access within db, so we need to edit the list to allow UTL_SMTP work if you are using oracle 11g. Refer to chapter “Access Control Lists for UTL_TCP/HTTP/SMTP” and “Managing Fine-Grained Access to External Network Services” for more details. In previous releases, XMLTYPE already exists in other OWB objects, like mapping/transformation etc. When the mapping/transformation is dragged into a process flow, the parameters with XMLTYPE are mapped to STRING. Now with the XMLTYPE support in process flow, the XMLTYPE will map to XMLTYPE in a more natural way, and we can leverage the new data type for the design.

Read the article

VS 2010 Debugger Improvements (BreakPoints, DataTips, Import/Export)

- by ScottGu

This is the twenty-first in a series of blog posts I’m doing on the VS 2010 and .NET 4 release. Today’s blog post covers a few of the nice usability improvements coming with the VS 2010 debugger. The VS 2010 debugger has a ton of great new capabilities. Features like Intellitrace (aka historical debugging), the new parallel/multithreaded debugging capabilities, and dump debuging support typically get a ton of (well deserved) buzz and attention when people talk about the debugging improvements with this release. I’ll be doing blog posts in the future that demonstrate how to take advantage of them as well. With today’s post, though, I thought I’d start off by covering a few small, but nice, debugger usability improvements that were also included with the VS 2010 release, and which I think you’ll find useful. Breakpoint Labels VS 2010 includes new support for better managing debugger breakpoints. One particularly useful feature is called “Breakpoint Labels” – it enables much better grouping and filtering of breakpoints within a project or across a solution. With previous releases of Visual Studio you had to manage each debugger breakpoint as a separate item. Managing each breakpoint separately can be a pain with large projects and for cases when you want to maintain “logical groups” of breakpoints that you turn on/off depending on what you are debugging. Using the new VS 2010 “breakpoint labeling” feature you can now name these “groups” of breakpoints and manage them as a unit. Grouping Multiple Breakpoints Together using a Label Below is a screen-shot of the breakpoints window within Visual Studio 2010. This lists all of the breakpoints defined within my solution (which in this case is the ASP.NET MVC 2 code base): The first and last breakpoint in the list above breaks into the debugger when a Controller instance is created or released by the ASP.NET MVC Framework. Using VS 2010, I can now select these two breakpoints, right-click, and then select the new “Edit labels…” menu command to give them a common label/name (making them easier to find and manage): Below is the dialog that appears when I select the “Edit labels” command. We can use it to create a new string label for our breakpoints or select an existing one we have already defined. In this case we’ll create a new label called “Lifetime Management” to describe what these two breakpoints cover: When we press the OK button our two selected breakpoints will be grouped under the newly created “Lifetime Management” label: Filtering/Sorting Breakpoints by Label We can use the “Search” combobox to quickly filter/sort breakpoints by label. Below we are only showing those breakpoints with the “Lifetime Management” label: Toggling Breakpoints On/Off by Label We can also toggle sets of breakpoints on/off by label group. We can simply filter by the label group, do a Ctrl-A to select all the breakpoints, and then enable/disable all of them with a single click: Importing/Exporting Breakpoints VS 2010 now supports importing/exporting breakpoints to XML files – which you can then pass off to another developer, attach to a bug report, or simply re-load later. To export only a subset of breakpoints, you can filter by a particular label and then click the “Export breakpoint” button in the Breakpoints window: Above I’ve filtered my breakpoint list to only export two particular breakpoints (specific to a bug that I’m chasing down). I can export these breakpoints to an XML file and then attach it to a bug report or email – which will enable another developer to easily setup the debugger in the correct state to investigate it on a separate machine. Pinned DataTips Visual Studio 2010 also includes some nice new “DataTip pinning” features that enable you to better see and track variable and expression values when in the debugger. Simply hover over a variable or expression within the debugger to expose its DataTip (which is a tooltip that displays its value) – and then click the new “pin” button on it to make the DataTip always visible: You can “pin” any number of DataTips you want onto the screen. In addition to pinning top-level variables, you can also drill into the sub-properties on variables and pin them as well. Below I’ve “pinned” three variables: “category”, “Request.RawUrl” and “Request.LogonUserIdentity.Name”. Note that these last two variable are sub-properties of the “Request” object. Associating Comments with Pinned DataTips Hovering over a pinned DataTip exposes some additional UI within the debugger: Clicking the comment button at the bottom of this UI expands the DataTip - and allows you to optionally add a comment with it: This makes it really easy to attach and track debugging notes: Pinned DataTips are usable across both Debug Sessions and Visual Studio Sessions Pinned DataTips can be used across multiple debugger sessions. This means that if you stop the debugger, make a code change, and then recompile and start a new debug session - any pinned DataTips will still be there, along with any comments you associate with them. Pinned DataTips can also be used across multiple Visual Studio sessions. This means that if you close your project, shutdown Visual Studio, and then later open the project up again – any pinned DataTips will still be there, along with any comments you associate with them. See the Value from Last Debug Session (Great Code Editor Feature) How many times have you ever stopped the debugger only to go back to your code and say: $#@! – what was the value of that variable again??? One of the nice things about pinned DataTips is that they keep track of their “last value from debug session” – and you can look these values up within the VB/C# code editor even when the debugger is no longer running. DataTips are by default hidden when you are in the code editor and the debugger isn’t running. On the left-hand margin of the code editor, though, you’ll find a push-pin for each pinned DataTip that you’ve previously setup: Hovering your mouse over a pinned DataTip will cause it to display on the screen. Below you can see what happens when I hover over the first pin in the editor - it displays our debug session’s last values for the “Request” object DataTip along with the comment we associated with them: This makes it much easier to keep track of state and conditions as you toggle between code editing mode and debugging mode on your projects. Importing/Exporting Pinned DataTips As I mentioned earlier in this post, pinned DataTips are by default saved across Visual Studio sessions (you don’t need to do anything to enable this). VS 2010 also now supports importing/exporting pinned DataTips to XML files – which you can then pass off to other developers, attach to a bug report, or simply re-load later. Combined with the new support for importing/exporting breakpoints, this makes it much easier for multiple developers to share debugger configurations and collaborate across debug sessions. Summary Visual Studio 2010 includes a bunch of great new debugger features – both big and small. Today’s post shared some of the nice debugger usability improvements. All of the features above are supported with the Visual Studio 2010 Professional edition (the Pinned DataTip features are also supported in the free Visual Studio 2010 Express Editions) I’ll be covering some of the “big big” new debugging features like Intellitrace, parallel/multithreaded debugging, and dump file analysis in future blog posts. Hope this helps, Scott P.S. In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu

Read the article

Portal And Content - Content Integration - Best Practices

- by Stefan Krantz

Lately we have seen an increase in projects that have failed to either get user friendly content integration or non satisfactory performance. Our intention is to mitigate any knowledge gap that our previous post might have left you with, therefore this post will repeat some recommendation or reference back to old useful post. Moreover this post will help you understand ground up how to design, architect and implement business enabled, responsive and performing portals with complex requirements on business centric information publishing. Design the Information Model The key to successful portal deployments is Information modeling, it's a key task to understand the use case you designing for, therefore I have designed a set of question you need to ask yourself or your customer: Question: Who will own the content, IT or Business? Answer: BusinessQuestion: Who will publish the content, IT or Business? Answer: BusinessQuestion: Will there be multiple publishers? Answer: YesQuestion: Are the publishers computer scientist?Answer: NoQuestion: How often do the information changes, daily, weekly, monthly?Answer: Daily, weekly If your answers to the questions matches at least 2, we strongly recommend you design your content with following principles: Divide your pages in to logical sections, where each section is marked with its purpose Assign capabilities to each section, does it contain text, images, formatting and/or is it static and is populated through other contextual information Select editor/design element type WYSIWYG - Rich Text Plain Text - non-format text Image - Image object Static List - static list of formatted informationDynamic Data List - assembled information from multiple data files through CMIS query The result of such design map could look like following below examples: Based on the outcome of the required elements in the design column 3 from the left you will now simply design a data model in WebCenter Content - Site Studio by creating a Region Definition structure matching your design requirements.For more information on how to create a Region definition see following post: Region Definition Post - note see instruction 7 for details. Each region definition can now be used to instantiate data files, a data file will hold the actual data for each element in the region definition. Another way you can see this is to compare the region definition as an extension to the metadata model in WebCenter Content for each data file item. Design content templates With a solid dependable information model we can now proceed to template creation and page design, in this phase focuses on how to place the content sections from the region definition on the page via a Content Presenter template. Remember by creating content presenter templates you will leverage the latest and most integrated technology WebCenter has to offer. This phase is much easier since the you already have the information model and design wire-frames to base the logic on, however there is still few considerations to pay attention to: Base the template on ADF and make only necessary exceptions to markup when required Leverage ADF design components for Tabs, Accordions and other similar components, this way the design in the content published areas will comply with other design areas based on custom ADF taskflows There is no performance impact when using meta data or region definition based data All data access regardless of type, metadata or xml data it can be accessed via the Content Presenter - Node. See below for applied examples on how to access data Access metadata property from Document - #{node.propertyMap['myProp'].value}myProp in this example can be for instance (dDocName, dDocTitle, xComments or any other available metadata) Access element data from data file xml - #{node.propertyMap['[Region Definition Name]:[Element name]'].asTextHtml}Region Definition Name is the expect region definition that the current data file is instantiatingElement name is the element value you like to grab from the data file I recommend you read following useful post on content template topic:CMIS queries and template creation - note see instruction 9 for detailsStatic List template rendering For more information on templates:Single Item Content TemplateMulti Item Content TemplateExpression Language Internationalization Considerations When integrating content assets via content presenter you by now probably understand that the content item/data file is wired to the page, what is also pretty common at this stage is that the content item/data file only support one language since its not practical or business friendly to mix that into a complex structure. Therefore you will be left with a very common dilemma that you will have to either build a complete new portal for each locale, which is not an good option! However with little bit of information modeling and clear naming convention this can be addressed. Basically you can simply make sure that all content item/data file are named with a predictable naming convention like "Content1_EN" for the English rendition and "Content1_ES" for the Spanish rendition. This way through simple none complex customizations you will be able to dynamically switch the actual content item/data file just before rendering. By following proposed approach above you not only enable a simple mechanism for internationalized content you also preserve the functionality in the content presenter to support business accessible run-time publishing of information on existing and new pages. I recommend you read following useful post on Internationalization topics:Internationalize with Content Presenter Integrate with Review & Approval processes Today the Review and approval functionality and configuration is based out of WebCenter Content - Criteria Workflows. Criteria Workflows uses the metadata of the checked in document to evaluate if the document is under any review/approval process. So for instance if a Criteria Workflow is configured to force any documents with Version = "2" or "higher" and Content Type is "Instructions", any matching content item version on check in will now enter the workflow before getting released for general access. Few things to consider when configuring Criteria Workflows: Make sure to not trigger on version one for Content Items that are Data Files - if you trigger on version 1 you will not only approve an empty document you will also have a content presenter pointing to a none existing document - since the document will only be available after successful completion of the workflow Approval workflows sometimes requires more complex criteria, the recommendation if that is the case is that the meta data triggering such criteria is automatically populated, this can be achieved through many approaches including Content Profiles Criteria workflows are configured and managed in WebCenter Content Administration Applets where you can configure one or more workflows. When you configured Criteria workflows the Content Presenter will support the editors with the approval process directly inline in the "Contribution mode" of the portal. In addition to approve/reject and details of the task, the content presenter natively support the user to view the current and future version of the change he/she is approving. See below for example: Architectural recommendation To support review&approval processes - minimize the amount of data files per page Each CMIS query can consume significant time depending on the complexity of the query - minimize the amount of CMIS queries per page Use Content Presenter Templates based on ADF - this way you minimize the design considerations and optimize the usage of caching Implement the page in as few Data files as possible - simplifies publishing process, increases performance and simplifies release process Named data file (node) or list of named nodes when integrating to pages increases performance vs. querying for data Named data file (node) or list of named nodes when integrating to pages enables business centric page creation and publishing and reduces the need for IT department interaction Summary Just because one architectural decision solves a business problem it doesn't mean its the right one, when designing portals all architecture has to be in harmony and not impacting each other. For instance the most technical complex solution is not always the best since it will most likely defeat the business accessibility, performance or both, therefore the best approach is to first design for simplicity that even a non-technical user can operate, after that consider the performance impact and final look at the technology challenges these brings and workaround them first with out-of-the-box features, after that design and develop functions to complement the short comings.

Read the article

Replication Services as ETL extraction tool

- by jorg

In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool: “Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards. In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!” Well I have tried it out and I must say it worked well. I was able to let replication services do work in a fraction of the time it would cost me to do the same in SSIS. What I did was the following: Configure snapshot replication for some Adventure Works tables, this was quite simple and straightforward. Create an SSIS package that executes the snapshot replication on demand and waits for its completion. This is something that you can’t do with out of the box functionality. While configuring the snapshot replication two SQL Agent Jobs are created, one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are asynchronous which means that if you execute them they immediately report back if the job started successfully or not, they do not wait for completion and report its result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues. Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it. Configure snapshot replication The first step is to create a publication on the database you want to replicate. Connect to SQL Server Management Studio and right-click Replication, choose for New.. Publication… The New Publication Wizard appears, click Next Choose your “source” database and click Next Choose Snapshot publication and click Next You can now select tables and other objects that you want to publish Expand Tables and select the tables that are needed in your ETL process In the next screen you can add filters on the selected tables which can be very useful. Think about selecting only the last x days of data for example. Its possible to filter out rows and/or columns. In this example I did not apply any filters. Schedule the Snapshot Agent to run at a desired time, by doing this a SQL Agent Job is created which we need to execute from a SSIS package later on. Next you need to set the Security Settings for the Snapshot Agent. Click on the Security Settings button. In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines! On the next screen choose to create the publication at the end of the wizard Give the publication a name (SnapshotTest) and complete the wizard The publication is created and the articles (tables in this case) are added Now the publication is created successfully its time to create a new subscription for this publication. Expand the Replication folder in SSMS and right click Local Subscriptions, choose New Subscriptions The New Subscription Wizard appears Select the publisher on which you just created your publication and select the database and publication (SnapshotTest) You can now choose where the Distribution Agent should run. If it runs at the distributor (push subscriptions) it causes extra processing overhead. If you use a separate server for your ETL process and databases choose to run each agent at its subscriber (pull subscriptions) to reduce the processing overhead at the distributor. Of course we need a database for the subscription and fortunately the Wizard can create it for you. Choose for New database Give the database the desired name, set the desired options and click OK You can now add multiple SQL Server Subscribers which is not necessary in this case but can be very useful. You now need to set the security settings for the Distribution Agent. Click on the …. button Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here Click Next Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on. Initialize the subscription at first synchronization Select the first box to create the subscription when finishing this wizard Complete the wizard by clicking Finish The subscription will be created In SSMS you see a new database is created, the subscriber. There are no tables or other objects in the database available yet because the replication jobs did not ran yet. Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot: Rename this job to “CreateSnapshot” Now search for the job that distributes the snapshot: Rename this job to “DistributeSnapshot” Create an SSIS package that executes the snapshot replication We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs. After the DistributeSnapshot job has started the package needs to wait until its finished before the package execution finishes. The Execute SQL Server Agent Job Task is designed to execute SQL Agent Jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back if the job started succesfully or not, it does not report if the job actually completed with success or failure. This is because these jobs are asynchronous. The SSIS package I’ve created does the following: It runs the CreateSnapshot job It checks every 5 seconds if the job is completed with a for loop When the CreateSnapshot job is completed it starts the DistributeSnapshot job And again it waits until the snapshot is delivered before the package will finish successfully Quite simple and the package is ready to use as standalone extract mechanism. After executing the package the replicated tables are added to the subscriber database and are filled with data: Download the SSIS package here (SSIS 2008) Conclusion In this example I only replicated 5 tables, I could create a SSIS package that does the same in approximately the same amount of time. But if I replicated all the 70+ AdventureWorks tables I would save a lot of time and boring work! With replication services you also benefit from the feature that schema changes are applied automatically which means your entire extract phase wont break. Because a snapshot is created using the bcp utility (bulk copy) it’s also quite fast, so the performance will be quite good. Disadvantages of using snapshot replication as extraction tool is the limitation on source systems. You can only choose SQL Server or Oracle databases to act as a publisher. So if you plan to build an extract phase for your ETL process that will invoke a lot of tables think about replication services, it would save you a lot of time and thanks to the Extract SSIS package I’ve created you can perfectly fit it in your usual SSIS ETL process.

Read the article

Using SQL Execution Plans to discover the Swedish alphabet

- by Rob Farley

SQL Server is quite remarkable in a bunch of ways. In this post, I’m using the way that the Query Optimizer handles LIKE to keep it SARGable, the Execution Plans that result, Collations, and PowerShell to come up with the Swedish alphabet. SARGability is the ability to seek for items in an index according to a particular set of criteria. If you don’t have SARGability in play, you need to scan the whole index (or table if you don’t have an index). For example, I can find myself in the phonebook easily, because it’s sorted by LastName and I can find Farley in there by moving to the Fs, and so on. I can’t find everyone in my suburb easily, because the phonebook isn’t sorted that way. I can’t even find people who have six letters in their last name, because also the book is sorted by LastName, it’s not sorted by LEN(LastName). This is all stuff I’ve looked at before, including in the talk I gave at SQLBits in October 2010. If I try to find everyone who’s names start with F, I can do that using a query a bit like: SELECT LastName FROM dbo.PhoneBook WHERE LEFT(LastName,1) = 'F'; Unfortunately, the Query Optimizer doesn’t realise that all the entries that satisfy LEFT(LastName,1) = 'F' will be together, and it has to scan the whole table to find them. But if I write: SELECT LastName FROM dbo.PhoneBook WHERE LastName LIKE 'F%'; then SQL is smart enough to understand this, and performs an Index Seek instead. To see why, I look further into the plan, in particular, the properties of the Index Seek operator. The ToolTip shows me what I’m after: You’ll see that it does a Seek to find any entries that are at least F, but not yet G. There’s an extra Predicate in there (a Residual Predicate if you like), which checks that each LastName is really LIKE F% – I suppose it doesn’t consider that the Seek Predicate is quite enough – but most of the benefit is seen by its working out the Seek Predicate, filtering to just the “at least F but not yet G” section of the data. This got me curious though, particularly about where the G comes from, and whether I could leverage it to create the Swedish alphabet. I know that in the Swedish language, there are three extra letters that appear at the end of the alphabet. One of them is ä that appears in the word Västerås. It turns out that Västerås is quite hard to find in an index when you’re looking it up in a Swedish map. I talked about this briefly in my five-minute talk on Collation from SQLPASS (the one which was slightly less than serious). So by looking at the plan, I can work out what the next letter is in the alphabet of the collation used by the column. In other words, if my alphabet were Swedish, I’d be able to tell what the next letter after F is – just in case it’s not G. It turns out it is… Yes, the Swedish letter after F is G. But I worked this out by using a copy of my PhoneBook table that used the Finnish_Swedish_CI_AI collation. I couldn’t find how the Query Optimizer calculates the G, and my friend Paul White (@SQL_Kiwi) tells me that it’s frustratingly internal to the QO. He’s particularly smart, even if he is from New Zealand. To investigate further, I decided to do some PowerShell, leveraging the Get-SqlPlan function that I blogged about recently (make sure you also have the SqlServerCmdletSnapin100 snap-in added). I started by indicating that I was going to use Finnish_Swedish_CI_AI as my collation of choice, and that I’d start whichever letter cam straight after the number 9. I figure that this is a cheat’s way of guessing the first letter of the alphabet (but it doesn’t actually work in Unicode – luckily I’m using varchar not nvarchar. Actually, there are a few aspects of this code that only work using ASCII, so apologies if you were wanting to apply it to Greek, Japanese, etc). I also initialised my $alphabet variable. $collation = 'Finnish_Swedish_CI_AI'; $firstletter = '9'; $alphabet = ''; Now I created the table for my test. A single field would do, and putting a Clustered Index on it would suffice for the Seeks. Invoke-Sqlcmd -server . -data tempdb -query "create table dbo.collation_test (col varchar(10) collate $collation primary key);" Now I get into the looping. $c = $firstletter; $stillgoing = $true; while ($stillgoing) { I construct the query I want, seeking for entries which start with whatever $c has reached, and get the plan for it: $query = "select col from dbo.collation_test where col like '$($c)%';"; [xml] $pl = get-sqlplan $query "." "tempdb"; At this point, my $pl variable is a scary piece of XML, representing the execution plan. A bit of hunting through it showed me that the EndRange element contained what I was after, and that if it contained NULL, then I was done. $stillgoing = ($pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange -ne $null); Now I could grab the value out of it (which came with apostrophes that needed stripping), and append that to my $alphabet variable. if ($stillgoing) { $c=$pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange.RangeExpressions.ScalarOperator.ScalarString.Replace("'",""); $alphabet += $c; } Finally, finishing the loop, dropping the table, and showing my alphabet! } Invoke-Sqlcmd -server . -data tempdb -query "drop table dbo.collation_test;"; $alphabet; When I run all this, I see that the Swedish alphabet is ABCDEFGHIJKLMNOPQRSTUVXYZÅÄÖ, which matches what I see at Wikipedia. Interesting to see that the letters on the end are still there, even with Case Insensitivity. Turns out they’re not just “letters with accents”, they’re letters in their own right. I’m sure you gave up reading long ago, and really aren’t that fazed about the idea of doing this using PowerShell. I chose PowerShell because I’d already come up with an easy way of grabbing the estimated plan for a query, and PowerShell does allow for easy navigation of XML. I find the most interesting aspect of this as the fact that the Query Optimizer uses the next letter of the alphabet to maintain the SARGability of LIKE. I’m hoping they do something similar for a whole bunch of operations. Oh, and the fact that you know how to find stuff in the IKEA catalogue. Footnote: If you are interested in whether this works in other languages, you might want to consider the following screenshot, which shows that in principle, it should work with Japanese. It might be a bit harder to run this in PowerShell though, as I’m not sure how it translates. In Hiragana, the Japanese alphabet starts ?, ?, ?, ?, ?, ...

Read the article

Using jQuery Live instead of jQuery Hover function

- by hajan

Let’s say we have a case where we need to create mouseover / mouseout functionality for a list which will be dynamically filled with data on client-side. We can use jQuery hover function, which handles the mouseover and mouseout events with two functions. See the following example: <!DOCTYPE html> <html lang="en"> <head id="Head1" runat="server"> <title>jQuery Mouseover / Mouseout Demo</title> <script type="text/javascript" src="http://ajax.aspnetcdn.com/ajax/jquery/jquery-1.4.4.js"></script> <style type="text/css"> .hover { color:Red; cursor:pointer;} </style> <script type="text/javascript"> $(function () { $("li").hover( function () { $(this).addClass("hover"); }, function () { $(this).removeClass("hover"); }); }); </script> </head> <body> <form id="form2" runat="server"> <ul> <li>Data 1</li> <li>Data 2</li> <li>Data 3</li> <li>Data 4</li> <li>Data 5</li> <li>Data 6</li> </ul> </form> </body> </html> Now, if you have situation where you want to add new data dynamically... Lets say you have a button to add new item in the list. Add the following code right bellow the </ul> tag <input type="text" id="txtItem" /> <input type="button" id="addNewItem" value="Add New Item" /> And add the following button click functionality: //button add new item functionality $("#addNewItem").click(function (event) { event.preventDefault(); $("<li>" + $("#txtItem").val() + "</li>").appendTo("ul"); }); The mouse over effect won't work for the newly added items. Therefore, we need to use live or delegate function. These both do the same job. The main difference is that for some cases delegate is considered a bit faster, and can be used in chaining. In our case, we can use both. I will use live function. $("li").live("mouseover mouseout", function (event) { if (event.type == "mouseover") $(this).addClass("hover"); else $(this).removeClass("hover"); }); The complete code is: <!DOCTYPE html> <html lang="en"> <head id="Head1" runat="server"> <title>jQuery Mouseover / Mouseout Demo</title> <script type="text/javascript" src="http://ajax.aspnetcdn.com/ajax/jquery/jquery-1.4.4.js"></script> <style type="text/css"> .hover { color:Red; cursor:pointer;} </style> <script type="text/javascript"> $(function () { $("li").live("mouseover mouseout", function (event) { if (event.type == "mouseover") $(this).addClass("hover"); else $(this).removeClass("hover"); }); //button add new item functionality $("#addNewItem").click(function (event) { event.preventDefault(); $("<li>" + $("#txtItem").val() + "</li>").appendTo("ul"); }); }); </script> </head> <body> <form id="form2" runat="server"> <ul> <li>Data 1</li> <li>Data 2</li> <li>Data 3</li> <li>Data 4</li> <li>Data 5</li> <li>Data 6</li> </ul> <input type="text" id="txtItem" /> <input type="button" id="addNewItem" value="Add New Item" /> </form> </body> </html> So, basically when replacing hover with live, you see we use the mouseover and mouseout names for both events. Check the working demo which is available HERE. Hope this was useful blog for you. Hope it’s helpful. HajanReference blog: http://codeasp.net/blogs/hajan/microsoft-net/1260/using-jquery-live-instead-of-jquery-hover-function

Read the article

Using the BAM Interceptor with Continuation

- by Charles Young

Originally posted on: http://geekswithblogs.net/cyoung/archive/2014/06/02/using-the-bam-interceptor-with-continuation.aspxI’ve recently been resurrecting some code written several years ago that makes extensive use of the BAM Interceptor provided as part of BizTalk Server’s BAM event observation library. In doing this, I noticed an issue with continuations. Essentially, whenever I tried to configure one or more continuations for an activity, the BAM Interceptor failed to complete the activity correctly. Careful inspection of my code confirmed that I was initializing and invoking the BAM interceptor correctly, so I was mystified. However, I eventually found the problem. It is a logical error in the BAM Interceptor code itself. The BAM Interceptor provides a useful mechanism for implementing dynamic tracking. It supports configurable ‘track points’. These are grouped into named ‘locations’. BAM uses the term ‘step’ as a synonym for ‘location’. Each track point defines a BAM action such as starting an activity, extracting a data item, enabling a continuation, etc. Each step defines a collection of track points. Understanding Steps The BAM Interceptor provides an abstract model for handling configuration of steps. It doesn’t, however, define any specific configuration mechanism (e.g., config files, SSO, etc.) It is up to the developer to decide how to store, manage and retrieve configuration data. At run time, this configuration is used to register track points which then drive the BAM Interceptor. The full semantics of a step are not immediately clear from Microsoft’s documentation. They represent a point in a business activity where BAM tracking occurs. They are named locations in the code. What is less obvious is that they always represent either the full tracking work for a given activity or a discrete fragment of that work which commences with the start of a new activity or the continuation of an existing activity. The BAM Interceptor enforces this by throwing an error if no ‘start new’ or ‘continue’ track point is registered for a named location. This constraint implies that each step must marked with an ‘end activity’ track point. One of the peculiarities of BAM semantics is that when an activity is continued under a correlated ID, you must first mark the current activity as ‘ended’ in order to ensure the right housekeeping is done in the database. If you re-start an ended activity under the same ID, you will leave the BAM import tables in an inconsistent state. A step, therefore, always represents an entire unit of work for a given activity or continuation ID. For activities with continuation, each unit of work is termed a ‘fragment’. Instance and Fragment State Internally, the BAM Interceptor maintains state data at two levels. First, it represents the overall state of the activity using a ‘trace instance’ token. This token contains the name and ID of the activity together with a couple of state flags. The second level of state represents a ‘trace fragment’. As we have seen, a fragment of an activity corresponds directly to the notion of a ‘step’. It is the unit of work done at a named location, and it must be bounded by start and end, or continue and end, actions. When handling continuations, the BAM Interceptor differentiates between ‘root’ fragments and other fragments. Very simply, a root fragment represents the start of an activity. Other fragments represent continuations. This is where the logic breaks down. The BAM Interceptor loses state integrity for root fragments when continuations are defined. Initialization Microsoft’s BAM Interceptor code supports the initialization of BAM Interceptors from track point configuration data. The process starts by populating an Activity Interceptor Configuration object with an array of track points. These can belong to different steps (aka ‘locations’) and can be registered in any order. Once it is populated with track points, the Activity Interceptor Configuration is used to initialise the BAM Interceptor. The BAM Interceptor sets up a hash table of array lists. Each step is represented by an array list, and each array list contains an ordered set of track points. The BAM Interceptor represents track points as ‘executable’ components. When the OnStep method of the BAM Interceptor is called for a given step, the corresponding list of track points is retrieved and each track point is executed in turn. Each track point retrieves any required data using a call back mechanism and then serializes a BAM trace fragment object representing a specific action (e.g., start, update, enable continuation, stop, etc.). The serialised trace fragment is then handed off to a BAM event stream (buffered or direct) which takes the appropriate action. The Root of the Problem The logic breaks down in the Activity Interceptor Configuration. Each Activity Interceptor Configuration is initialised with an instance of a ‘trace instance’ token. This provides the basic metadata for the activity as a whole. It contains the activity name and ID together with state flags indicating if the activity ID is a root (i.e., not a continuation fragment) and if it is completed. This single token is then shared by all trace actions for all steps registered with the Activity Interceptor Configuration. Each trace instance token is automatically initialised to represent a root fragment. However, if you subsequently register a ‘continuation’ step with the Activity Interceptor Configuration, the ‘root’ flag is set to false at the point the ‘continue’ track point is registered for that step. If you use a ‘reflector’ tool to inspect the code for the ActivityInterceptorConfiguration class, you can see the flag being set in one of the overloads of the RegisterContinue method. This makes no sense. The trace instance token is shared across all the track points registered with the Activity Interceptor Configuration. The Activity Interceptor Configuration is designed to hold track points for multiple steps. The ‘root’ flag is clearly meant to be initialised to ‘true’ for the preliminary root fragment and then subsequently set to false at the point that a continuation step is processed. Instead, if the Activity Interceptor Configuration contains a continuation step, it is changed to ‘false’ before the root fragment is processed. This is clearly an error in logic. The problem causes havoc when the BAM Interceptor is used with continuation. Effectively the root step is no longer processed correctly, and the ultimate effect is that the continued activity never completes! This has nothing to do with the root and the continuation being in the same process. It is due to a fundamental mistake of setting the ‘root’ flag to false for a continuation before the root fragment is processed. The Workaround Fortunately, it is easy to work around the bug. The trick is to ensure that you create a new Activity Interceptor Configuration object for each individual step. This may mean filtering your configuration data to extract the track points for a single step or grouping the configured track points into individual steps and the creating a separate Activity Interceptor Configuration for each group. In my case, the first approach was required. Here is what the amended code looks like: // Because of a logic error in Microsoft's code, a separate ActivityInterceptorConfiguration must be used // for each location. The following code extracts only those track points for a given step name (location). var trackPointGroup = from ResolutionService.TrackPoint tp in bamActivity.TrackPoints where (string)tp.Location == bamStepName select tp; var bamActivityInterceptorConfig = new Microsoft.BizTalk.Bam.EventObservation.ActivityInterceptorConfiguration(activityName); foreach (var trackPoint in trackPointGroup) { switch (trackPoint.Type) { case TrackPointType.Start: bamActivityInterceptorConfig.RegisterStartNew(trackPoint.Location, trackPoint.ExtractionInfo); break; etc… I’m using LINQ to filter a list of track points for those entries that correspond to a given step and then registering only those track points on a new instance of the ActivityInterceptorConfiguration class. As soon as I re-wrote the code to do this, activities with continuations started to complete correctly.

Read the article

Fun with Aggregates

- by Paul White

There are interesting things to be learned from even the simplest queries. For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join. The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units. The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too? To answer that, let’s look at some other ways to formulate this query. This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones. The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above. It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join! The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join. In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting. The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set. In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL). To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709; -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case. The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate. This is something I would like to see exposed in show plan so I suggested it on Connect. Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all. Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans. Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :) The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier. What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too). Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places. It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates. As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

Read the article

Windows Azure Myths

- by BuckWoody

Windows Azure is part of the Microsoft "stack" - the suite of software and services we offer. Because we have so many products in almost every part of technology, it's hard to know everything about all parts of what we do - even for those of us who work here. So it's no surprise that some folks are not as familiar with Windows and SQL Azure as they are, say Windows Server or XBox. As I chat with folks about a solution for a business or organization need, I put Windows Azure into the mix. I always start off with "What do you already know about Windows Azure?" so that I don't bore folks with information they already have. I some cases they've checked out the product ahead of time and have specific questions, in others they aren't as familiar, and in still others there is a fair amount of mis-information. Sometimes that's because of a marketing failure, sometimes it's hearsay, and somtetimes it's active misinformation. I thought I might lay out a few of these misconceptions. As always - do your fact-checking! Never take anyone's word alone (including mine) as gospel. Make sure you educate yourself on your options. Your company or your clients depend on you to have the right information on IT, so make sure you live up to that. Myth 1: Nobody uses Windows Azure It's true that we don't give out numbers on the amount of clients on Windows and SQL Azure. But lots of folks are here - companies you may have heard of like Boeing, NASA, Fujitsu, The City of London, Nuedesic, and many others. I deal with firms small and large that use Windows Azure for mission-critical applications, sometimes totally on Windows and/or SQL Azure, sometimes in conjunction with an on-premises system, sometimes for only a specific component in Windows Azure like storage. The interesting thing is that many sites you visit have a Windows Azure component, or are running on Windows Azure. They just don't announce it. Just like the other cloud providers, the companies have asked to be completely branded themselves - they don't want you to be aware or care that they are on Windows Azure. Sometimes that's for security, other times it's for different reasons. It's just like the web sites you visit. For the most part, they don't advertise which OS or Web Server they use. It really just shouldn't matter. The point is that they just use what works to solve a given problem. Check out a few public case studies here: https://www.windowsazure.com/en-us/home/case-studies/ Myth 2: It's only for Microsoft stuff - can't use Open Source This is the one I face the most, and am the most dismayed by. We work just fine with many open source products, including Java, NodeJS, PHP, Ruby, Python, Hadoop, and many other languages and applications. You can quickly deploy a Wordpress, Umbraco and other "kits". We have software development kits (SDK's) for iPhones, iPads, Android, Windows phones and more. We have an SDK to work with FaceBook and other social networks. In short, we play well with others. More on the languages and runtimes we support here: https://www.windowsazure.com/en-us/develop/overview/ More on the SDK's here: http://www.wadewegner.com/2011/05/windows-azure-toolkit-for-ios/, http://www.wadewegner.com/2011/08/windows-azure-toolkits-for-devices-now-with-android/, http://azuretoolkit.codeplex.com/ Myth 3: Microsoft expects me to switch everything to "the cloud" No, we don't. That would be disasterous, unless the only things you run in your company uses works perfectly in Azure. Use Windows Azure - or any cloud for that matter - where it works. Whenever I talk to companies, I focus on two things: Something that is broken and needs to be re-architected Something you want to do that is new If something is broken, and you need new tools to scale, extend, add capacity dynamically and so on, then you can consider using Windows or SQL Azure. It can help solve problems that you have, or it may include a component you don't want to write or architect yourself. Sometimes you want to do something new, like extend your company's offerings to mobile phones, to the web, or to a social network. More info on where it works here: http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx Myth 4: I have to write code to use Windows and SQL Azure If Windows Azure is a PaaS - a Platform as a Service - then don't you have to write code to use it? Nope. Windows and SQL Azure are made up of various components. Some of those components allow you to write and deploy code (like Compute) and others don't. We have lots of customers using Windows Azure storage as a backup, to securely share files instead of using DropBox, to distribute videos or code or firmware, and more. Others use our High Performance Computing (HPC) offering to rent a supercomputer when they need one. You can even throw workloads at that using Excel! In addition there are lots of other components in Windows Azure you can use, from the Windows Azure Media Services to others. More here: https://www.windowsazure.com/en-us/home/scenarios/saas/ Myth 5: Windows Azure is just another form of "vendor lock-in" Windows Azure uses .NET, OSS languages and standard interfaces for the code. Sure, you're not going to take the code line-for-line and run it on a mainframe, but it's standard code that you write, and can port to something else. And the data is yours - you can bring it back whever you want. It's either in text or binary form, that you have complete control over. There are no licenses - you can "pay as you go", and when you're done, you can leave the service and take all your code, data and IP with you. So go out there, read up, try it. Use it where it works. And don't believe everything you hear - sometimes the Internet doesn't get it all correct. :)

Read the article

Of C# Iterators and Performance

- by James Michael Hare

Some of you reading this will be wondering, "what is an iterator" and think I'm locked in the world of C++. Nope, I'm talking C# iterators. No, not enumerators, iterators. So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well. Iterators have been around for a bit now, and there are still a bunch of people who don't know what they are or what they do. I don't know how many times at work I've had a code review on my code and have someone ask me, "what's that yield word do?" Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> -- I'll post some of the fun ones in a later post. Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration? So, to begin, let's look at a couple of methods. This is a new (albeit contrived) method called Every(...). The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first). So Every(2) would return items 0, 2, 4, 6, etc. Now, if you wanted to write this in the traditional way, you may come up with something like this: public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval) { List<T> newList = new List<T>(); int count = 0; foreach (var i in list) { if ((count++ % interval) == 0) { newList.Add(i); } } return newList; } So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item. Pretty straight forward. The problem? Well, Every<T>(...) will construct a list containing every nth item whether or not you care. What happens if you were searching this result for a certain item and find that item after five tries? You would have generated the rest of the list for nothing. Enter iterators. This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for. This can be very handy if the evaluation itself is expensive or if there's a fair chance you'll never want to fully evaluate a list. We see this all the time in Linq, where many expressions are chained together to do complex processing on a list. This would be very expensive if each of these expressions evaluated their entire possible result set on call. Let's look at the same example function, this time using an iterator: public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval) { int count = 0; foreach (var i in list) { if ((count++ % interval) == 0) { yield return i; } } } Notice it does not create a new return value explicitly, the only evidence of a return is the "yield return" statement. What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends. Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time. It doesn't seem like a big deal, does it? But keep in mind the key point here: it only returns items as they are requested. Thus if there's a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit. This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained. For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10. We could write that as: List<int> someList = new List<int>(); // fill list here someList.Every(10).TakeWhile(i => i <= 10); Now is the difference more apparent? If we use the first form of Every that makes a copy of the list. It's going to copy the entire list whether we will need those items or not, that can be costly! With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated. So, sounds neat eh? But what's the cost is what you're probably wondering. So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things. Now, iteration isn't free. If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead: Copy vs Iterator on 100% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 5 Copy 5 5 5 Iterator 5 50 50 Copy 28 50 50 Iterator 27 500 500 Copy 227 500 500 Iterator 247 5000 5000 Copy 2266 5000 5000 Iterator 2444 50,000 50,000 Copy 24,443 50,000 50,000 Iterator 24,719 500,000 500,000 Copy 250,024 500,000 500,000 Iterator 251,521 Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists. In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price. However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect. However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead. Let's look at what happens if you stop looking after 80% of the list: Copy vs Iterator on 80% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 4 Copy 5 5 4 Iterator 5 50 40 Copy 27 50 40 Iterator 23 500 400 Copy 215 500 400 Iterator 200 5000 4000 Copy 2099 5000 4000 Iterator 1962 50,000 40,000 Copy 22,385 50,000 40,000 Iterator 19,599 500,000 400,000 Copy 236,427 500,000 400,000 Iterator 196,010 Notice that the iterator form is now operating quite a bit faster. But the savings really add up if you stop on average at 50% (which most searches would typically do): Copy vs Iterator on 50% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 2 Copy 5 5 2 Iterator 4 50 25 Copy 25 50 25 Iterator 16 500 250 Copy 188 500 250 Iterator 126 5000 2500 Copy 1854 5000 2500 Iterator 1226 50,000 25,000 Copy 19,839 50,000 25,000 Iterator 12,233 500,000 250,000 Copy 208,667 500,000 250,000 Iterator 122,336 Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time. And this is only for one level deep. If we are using this in a chain of query expressions it only adds to the savings. So my recommendation? If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator. The cost if they iterate the whole thing does not add much at all -- and if they consume only partially, you reap some really good performance gains. Next time I'll discuss some of my favorite extensions I've created to make development life a little easier and maintainability a little better.

Read the article

Oracle Expands Sun Blade Portfolio for Cloud and Highly Virtualized Environments

- by Ferhat Hatay

Oracle announced the expansion of Sun Blade Portfolio for cloud and highly virtualized environments that deliver powerful performance and simplified management as tightly integrated systems. Along with the SPARC T3-1B blade server, Oracle VM blade cluster reference configuration and Oracle's optimized solution for Oracle WebLogic Suite, Oracle introduced the dual-node Sun Blade X6275 M2 server module with some impressive benchmark results. Benchmarks on the Sun Blade X6275 M2 server module demonstrate the outstanding performance characteristics critical for running varied commercial applications used in cloud and highly virtualized environments. These include best-in-class SPEC CPU2006 results with the Intel Xeon processor 5600 series, six Fluent world records and 1.8 times the price-performance of the IBM Power 755 running NAMD, a prominent bio-informatics workload. Benchmarks for Sun Blade X6275 M2 server module SPEC CPU2006 The Sun Blade X6275 M2 server module demonstrated best in class SPECint_rate2006 results for all published results using the Intel Xeon processor 5600 series, with a result of 679. This result is 97% better than the HP BL460c G7 blade, 80% better than the IBM HS22V blade, and 79% better than the Dell M710 blade. This result demonstrates the density advantage of the new Oracle's server module for space-constrained data centers. Sun Blade X6275M2 (2 Nodes, Intel Xeon X5670 2.93GHz) - 679 SPECint_rate2006; HP ProLiant BL460c G7 (2.93 GHz, Intel Xeon X5670) - 347 SPECint_rate2006; IBM BladeCenter HS22V (Intel Xeon X5680) - 377 SPECint_rate2006; Dell PowerEdge M710 (Intel Xeon X5680, 3.33 GHz) - 380 SPECint_rate2006. SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 11/24/2010 and this report. For more specifics about these results, please go to see http://blogs.sun.com/BestPerf Fluent The Sun Fire X6275 M2 server module produced world-record results on each of the six standard cases in the current "FLUENT 12" benchmark test suite at 8-, 12-, 24-, 32-, 64- and 96-core configurations. These results beat the most recent QLogic score with IBM DX 360 M series platforms and QLogic "Truescale" interconnects. Results on sedan_4m test case on the Sun Blade X6275 M2 server module are 23% better than the HP C7000 system, and 20% better than the IBM DX 360 M2; Dell has not posted a result for this test case. Results can be found at the FLUENT website. ANSYS's FLUENT software solves fluid flow problems, and is based on a numerical technique called computational fluid dynamics (CFD), which is used in the automotive, aerospace, and consumer products industries. The FLUENT 12 benchmark test suite consists of seven models that are well suited for multi-node clustered environments and representative of modern engineering CFD clusters. Vendors benchmark their systems with the principal objective of providing comparative performance information for FLUENT software that, among other things, depends on compilers, optimization, interconnect, and the performance characteristics of the hardware. FLUENT application performance is representative of other commercial applications that require memory and CPU resources to be available in a scalable cluster-ready format. FLUENT benchmark has six conventional test cases (eddy_417k, turbo_500k, aircraft_2m, sedan_4m, truck_14m, truck_poly_14m) at various core counts. All information on the FLUENT website (http://www.fluent.com) is Copyrighted1995-2010 by ANSYS Inc. Results as of November 24, 2010. For more specifics about these results, please go to see http://blogs.sun.com/BestPerf NAMD Results on the Sun Blade X6275 M2 server module running NAMD (a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems) show up to a 1.8X better price/performance than IBM's Power 7-based system. For space-constrained environments, the ultra-dense Sun Blade X6275 M2 server module provides a 1.7X better price/performance per rack unit than IBM's system. IBM Power 755 4-way Cluster (16U). Total price for cluster: $324,212. See IBM United States Hardware Announcement 110-008, dated February 9, 2010, pp. 4, 21 and 39-46. Sun Blade X6275 M2 8-Blade Cluster (10U). Total price for cluster: $193,939. Price/performance and performance/RU comparisons based on f1ATPase molecule test results. Sun Blade X6275 M2 cluster: $3,568/step/sec, 5.435 step/sec/RU. IBM Power 755 cluster: $6,355/step/sec, 3.189 step/sec/U. See http://www-03.ibm.com/systems/power/hardware/reports/system_perf.html. See http://www.ks.uiuc.edu/Research/namd/performance.html for more information, results as of 11/24/10. For more specifics about these results, please go to see http://blogs.sun.com/BestPerf Reverse Time Migration The Reverse Time Migration is heavily used in geophysical imaging and modeling for Oil & Gas Exploration. The Sun Blade X6275 M2 server module showed up to a 40% performance improvement over the previous generation server module with super-linear scalability to 16 nodes for the 9-Point Stencil used in this Reverse Time Migration computational kernel. The balanced combination of Oracle's Sun Storage 7410 system with the Sun Blade X6275 M2 server module cluster showed linear scalability for the total application throughput, including the I/O and MPI communication, to produce a final 3-D seismic depth imaged cube for interpretation. The final image write time from the Sun Blade X6275 M2 server module nodes to Oracle's Sun Storage 7410 system achieved 10GbE line speed of 1.25 GBytes/second or better performance. Between subsequent runs, the effects of I/O buffer caching on the Sun Blade X6275 M2 server module nodes and write optimized caching on the Sun Storage 7410 system gave up to 1.8 GBytes/second effective write performance. The performance results and characterization of this Reverse Time Migration benchmark could serve as a useful measure for many other I/O intensive commercial applications. 3D VTI Reverse Time Migration Seismic Depth Imaging, see http://blogs.sun.com/BestPerf/entry/3d_vti_reverse_time_migration for more information, results as of 11/14/2010.

Read the article

Parent Objects

- by Ali Bahrami

Support for Parent Objects was added in Solaris 11 Update 1. The following material is adapted from the PSARC arc case, and the Solaris Linker and Libraries Manual. A "plugin" is a shared object, usually loaded via dlopen(), that is used by a program in order to allow the end user to add functionality to the program. Examples of plugins include those used by web browsers (flash, acrobat, etc), as well as mdb and elfedit modules. The object that loads the plugin at runtime is called the "parent object". Unlike most object dependencies, the parent is not identified by name, but by its status as the object doing the load. Historically, building a good plugin is has been more complicated than it should be: A parent and its plugin usually share a 2-way dependency: The plugin provides one or more routines for the parent to call, and the parent supplies support routines for use by the plugin for things like memory allocation and error reporting. It is a best practice to build all objects, including plugins, with the -z defs option, in order to ensure that the object specifies all of its dependencies, and is self contained. However: The parent is usually an executable, which cannot be linked to via the usual library mechanisms provided by the link editor. Even if the parent is a shared object, which could be a normal library dependency to the plugin, it may be desirable to build plugins that can be used by more than one parent, in which case embedding a dependency NEEDED entry for one of the parents is undesirable. The usual way to build a high quality plugin with -z defs uses a special mapfile provided by the parent. This mapfile defines the parent routines, specifying the PARENT attribute (see example below). This works, but is inconvenient, and error prone. The symbol table in the parent already describes what it makes available to plugins — ideally the plugin would obtain that information directly rather than from a separate mapfile. The new -z parent option to ld allows a plugin to link to the parent and access the parent symbol table. This differs from a typical dependency: No NEEDED record is created. The relationship is recorded as a logical connection to the parent, rather than as an explicit object name However, it operates in the same manner as any other dependency in terms of making symbols available to the plugin. When the -z parent option is used, the link-editor records the basename of the parent object in the dynamic section, using the new tag DT_SUNW_PARENT. This is an informational tag, which is not used by the runtime linker to locate the parent, but which is available for diagnostic purposes. The ld(1) manpage documentation for the -z parent option is: -z parent=object Specifies a "parent object", which can be an executable or shared object, against which to link the output object. This option is typically used when creating "plugin" shared objects intended to be loaded by an executable at runtime via the dlopen() function. The symbol table from the parent object is used to satisfy references from the plugin object. The use of the -z parent option makes symbols from the object calling dlopen() available to the plugin. Example For this example, we use a main program, and a plugin. The parent provides a function named parent_callback() for the plugin to call. The plugin provides a function named plugin_func() to the parent: % cat main.c #include <stdio.h> #include <dlfcn.h> #include <link.h> void parent_callback(void) { printf("plugin_func() has called parent_callback()\n"); } int main(int argc, char **argv) { typedef void plugin_func_t(void); void *hdl; plugin_func_t *plugin_func; if (argc != 2) { fprintf(stderr, "usage: main plugin\n"); return (1); } if ((hdl = dlopen(argv[1], RTLD_LAZY)) == NULL) { fprintf(stderr, "unable to load plugin: %s\n", dlerror()); return (1); } plugin_func = (plugin_func_t *) dlsym(hdl, "plugin_func"); if (plugin_func == NULL) { fprintf(stderr, "unable to find plugin_func: %s\n", dlerror()); return (1); } (*plugin_func)(); return (0); } % cat plugin.c #include <stdio.h> extern void parent_callback(void); void plugin_func(void) { printf("parent has called plugin_func() from plugin.so\n"); parent_callback(); } Building this in the traditional manner, without -zdefs: % cc -o main main.c % cc -G -o plugin.so plugin.c % ./main ./plugin.so parent has called plugin_func() from plugin.so plugin_func() has called parent_callback() As noted above, when building any shared object, the -z defs option is recommended, in order to ensure that the object is self contained and specifies all of its dependencies. However, the use of -z defs prevents the plugin object from linking due to the unsatisfied symbol from the parent object: % cc -zdefs -G -o plugin.so plugin.c Undefined first referenced symbol in file parent_callback plugin.o ld: fatal: symbol referencing errors. No output written to plugin.so A mapfile can be used to specify to ld that the parent_callback symbol is supplied by the parent object. % cat plugin.mapfile $mapfile_version 2 SYMBOL_SCOPE { global: parent_callback { FLAGS = PARENT }; }; % cc -zdefs -Mplugin.mapfile -G -o plugin.so plugin.c However, the -z parent option to ld is the most direct solution to this problem, allowing the plugin to actually link against the parent object, and obtain the available symbols from it. An added benefit of using -z parent instead of a mapfile, is that the name of the parent object is recorded in the dynamic section of the plugin, and can be displayed by the file utility: % cc -zdefs -zparent=main -G -o plugin.so plugin.c % elfdump -d plugin.so | grep PARENT [0] SUNW_PARENT 0xcc main % file plugin.so plugin.so: ELF 32-bit LSB dynamic lib 80386 Version 1, parent main, dynamically linked, not stripped % ./main ./plugin.so parent has called plugin_func() from plugin.so plugin_func() has called parent_callback() We can also observe this in elfedit plugins on Solaris systems running Solaris 11 Update 1 or newer: % file /usr/lib/elfedit/dyn.so /usr/lib/elfedit/dyn.so: ELF 32-bit LSB dynamic lib 80386 Version 1, parent elfedit, dynamically linked, not stripped, no debugging information available Related Other Work The GNU ld has an option named --just-symbols that can be used in a similar manner: --just-symbols=filename Read symbol names and their addresses from filename, but do not relocate it or include it in the output. This allows your output file to refer symbolically to absolute locations of memory defined in other programs. You may use this option more than once. -z parent is a higher level operation aimed specifically at simplifying the construction of high quality plugins. Although it employs the same operation, it differs from --just symbols in 2 significant ways: There can only be one parent. The parent is recorded in the created object, and can be displayed by 'file', or other similar tools.

Read the article

Try a sample: Using the counter predicate for event sampling

- by extended_events

Extended Events offers a rich filtering mechanism, called predicates, that allows you to reduce the number of events you collect by specifying criteria that will be applied during event collection. (You can find more information about predicates in Using SQL Server 2008 Extended Events (by Jonathan Kehayias)) By evaluating predicates early in the event firing sequence we can reduce the performance impact of collecting events by stopping event collection when the criteria are not met. You can specify predicates on both event fields and on a special object called a predicate source. Predicate sources are similar to action in that they typically are related to some type of global information available from the server. You will find that many of the actions available in Extended Events have equivalent predicate sources, but actions and predicates sources are not the same thing. Applying predicates, whether on a field or predicate source, is very similar to what you are used to in T-SQL in terms of how they work; you pick some field/source and compare it to a value, for example, session_id = 52. There is one predicate source that merits special attention though, not just for its special use, but for how the order of predicate evaluation impacts the behavior you see. I’m referring to the counter predicate source. The counter predicate source gives you a way to sample a subset of events that otherwise meet the criteria of the predicate; for example you could collect every other event, or only every tenth event. Simple CountingThe counter predicate source works by creating an in memory counter that increments every time the predicate statement is evaluated. Here is a simple example with my favorite event, sql_statement_completed, that only collects the second statement that is run. (OK, that’s not much of a sample, but this is for demonstration purposes. Here is the session definition: CREATE EVENT SESSION counter_test ON SERVERADD EVENT sqlserver.sql_statement_completed (ACTION (sqlserver.sql_text) WHERE package0.counter = 2)ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS) You can find general information about the session DDL syntax in BOL and from Pedro’s post Introduction to Extended Events. The important part here is the WHERE statement that defines that I only what the event where package0.count = 2; in other words, only the second instance of the event. Notice that I need to provide the package name along with the predicate source. You don’t need to provide the package name if you’re using event fields, only for predicate sources. Let’s say I run the following test queries: -- Run three statements to test the sessionSELECT 'This is the first statement'GOSELECT 'This is the second statement'GOSELECT 'This is the third statement';GO Once you return the event data from the ring buffer and parse the XML (see my earlier post on reading event data) you should see something like this: event_name sql_text sql_statement_completed SELECT ‘This is the second statement’ You can see that only the second statement from the test was actually collected. (Feel free to try this yourself. Check out what happens if you remove the WHERE statement from your session. Go ahead, I’ll wait.) Percentage Sampling OK, so that wasn’t particularly interesting, but you can probably see that this could be interesting, for example, lets say I need a 25% sample of the statements executed on my server for some type of QA analysis, that might be more interesting than just the second statement. All comparisons of predicates are handled using an object called a predicate comparator; the simple comparisons such as equals, greater than, etc. are mapped to the common mathematical symbols you know and love (eg. = and >), but to do the less common comparisons you will need to use the predicate comparators directly. You would probably look to the MOD operation to do this type sampling; we would too, but we don’t call it MOD, we call it divides_by_uint64. This comparator evaluates whether one number is divisible by another with no remainder. The general syntax for using a predicate comparator is pred_comp(field, value), field is always first and value is always second. So lets take a look at how the session changes to answer our new question of 25% sampling: CREATE EVENT SESSION counter_test_25 ON SERVERADD EVENT sqlserver.sql_statement_completed (ACTION (sqlserver.sql_text) WHERE package0.divides_by_uint64(package0.counter,4))ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS)GO Here I’ve replaced the simple equivalency check with the divides_by_uint64 comparator to check if the counter is evenly divisible by 4, which gives us back every fourth record. I’ll leave it as an exercise for the reader to test this session. Why order matters I indicated at the start of this post that order matters when it comes to the counter predicate – it does. Like most other predicate systems, Extended Events evaluates the predicate statement from left to right; as soon as the predicate statement is proven false we abandon evaluation of the remainder of the statement. The counter predicate source is only incremented when it is evaluated so whether or not the counter is incremented will depend on where it is in the predicate statement and whether a previous criteria made the predicate false or not. Here is a generic example: Pred1: (WHERE statement_1 AND package0.counter = 2)Pred2: (WHERE package0.counter = 2 AND statement_1) Let’s say I cause a number of events as follows and examine what happens to the counter predicate source. Iteration Statement Pred1 Counter Pred2 Counter A Not statement_1 0 1 B statement_1 1 2 C Not statement_1 1 3 D statement_1 2 4 As you can see, in the case of Pred1, statement_1 is evaluated first, when it fails (A & C) predicate evaluation is stopped and the counter is not incremented. With Pred2 the counter is evaluated first, so it is incremented on every iteration of the event and the remaining parts of the predicate are then evaluated. In this example, Pred1 would return an event for D while Pred2 would return an event for B. But wait, there is an interesting side-effect here; consider Pred2 if I had run my statements in the following order: Not statement_1 Not statement_1 statement_1 statement_1 In this case I would never get an event back from the system because the point at which counter=2, the rest of the predicate evaluates as false so the event is not returned. If you’re using the counter target for sampling and you’re not getting the expected events, or any events, check the order of the predicate criteria. As a general rule I’d suggest that the counter criteria should be the last element of your predicate statement since that will assure that your sampling rate will apply to the set of event records defined by the rest of your predicate. Aside: I’m interested in hearing about uses for putting the counter predicate criteria earlier in the predicate statement. If you have one, post it in a comment to share with the class. - Mike Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article

Writing the tests for FluentPath

- by Bertrand Le Roy

Writing the tests for FluentPath is a challenge. The library is a wrapper around a legacy API (System.IO) that wasn’t designed to be easily testable. If it were more testable, the sensible testing methodology would be to tell System.IO to act against a mock file system, which would enable me to verify that my code is doing the expected file system operations without having to manipulate the actual, physical file system: what we are testing here is FluentPath, not System.IO. Unfortunately, that is not an option as nothing in System.IO enables us to plug a mock file system in. As a consequence, we are left with few options. A few people have suggested me to abstract my calls to System.IO away so that I could tell FluentPath – not System.IO – to use a mock instead of the real thing. That in turn is getting a little silly: FluentPath already is a thin abstraction around System.IO, so layering another abstraction between them would double the test surface while bringing little or no value. I would have to test that new abstraction layer, and that would bring us back to square one. Unless I’m missing something, the only option I have here is to bite the bullet and test against the real file system. Of course, the tests that do that can hardly be called unit tests. They are more integration tests as they don’t only test bits of my code. They really test the successful integration of my code with the underlying System.IO. In order to write such tests, the techniques of BDD work particularly well as they enable you to express scenarios in natural language, from which test code is generated. Integration tests are being better expressed as scenarios orchestrating a few basic behaviors, so this is a nice fit. The Orchard team has been successfully using SpecFlow for integration tests for a while and I thought it was pretty cool so that’s what I decided to use. Consider for example the following scenario: Scenario: Change extension Given a clean test directory When I change the extension of bar\notes.txt to foo Then bar\notes.txt should not exist And bar\notes.foo should exist This is human readable and tells you everything you need to know about what you’re testing, but it is also executable code. What happens when SpecFlow compiles this scenario is that it executes a bunch of regular expressions that identify the known Given (set-up phases), When (actions) and Then (result assertions) to identify the code to run, which is then translated into calls into the appropriate methods. Nothing magical. Here is the code generated by SpecFlow: [NUnit.Framework.TestAttribute()] [NUnit.Framework.DescriptionAttribute("Change extension")] public virtual void ChangeExtension() { TechTalk.SpecFlow.ScenarioInfo scenarioInfo = new TechTalk.SpecFlow.ScenarioInfo("Change extension", ((string[])(null))); #line 6 this.ScenarioSetup(scenarioInfo); #line 7 testRunner.Given("a clean test directory"); #line 8 testRunner.When("I change the extension of " + "bar\\notes.txt to foo"); #line 9 testRunner.Then("bar\\notes.txt should not exist"); #line 10 testRunner.And("bar\\notes.foo should exist"); #line hidden testRunner.CollectScenarioErrors();} The #line directives are there to give clues to the debugger, because yes, you can put breakpoints into a scenario: The way you usually write tests with SpecFlow is that you write the scenario first, let it fail, then write the translation of your Given, When and Then into code if they don’t already exist, which results in running but failing tests, and then you write the code to make your tests pass (you implement the scenario). In the case of FluentPath, I built a simple Given method that builds a simple file hierarchy in a temporary directory that all scenarios are going to work with: [Given("a clean test directory")] public void GivenACleanDirectory() { _path = new Path(SystemIO.Path.GetTempPath()) .CreateSubDirectory("FluentPathSpecs") .MakeCurrent(); _path.GetFileSystemEntries() .Delete(true); _path.CreateFile("foo.txt", "This is a text file named foo."); var bar = _path.CreateSubDirectory("bar"); bar.CreateFile("baz.txt", "bar baz") .SetLastWriteTime(DateTime.Now.AddSeconds(-2)); bar.CreateFile("notes.txt", "This is a text file containing notes."); var barbar = bar.CreateSubDirectory("bar"); barbar.CreateFile("deep.txt", "Deep thoughts"); var sub = _path.CreateSubDirectory("sub"); sub.CreateSubDirectory("subsub"); sub.CreateFile("baz.txt", "sub baz") .SetLastWriteTime(DateTime.Now); sub.CreateFile("binary.bin", new byte[] {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0xFF}); } Then, to implement the scenario that you can read above, I had to write the following When: [When("I change the extension of (.*) to (.*)")] public void WhenIChangeTheExtension( string path, string newExtension) { var oldPath = Path.Current.Combine(path.Split('\\')); oldPath.Move(p => p.ChangeExtension(newExtension)); } As you can see, the When attribute is specifying the regular expression that will enable the SpecFlow engine to recognize what When method to call and also how to map its parameters. For our scenario, “bar\notes.txt” will get mapped to the path parameter, and “foo” to the newExtension parameter. And of course, the code that verifies the assumptions of the scenario: [Then("(.*) should exist")] public void ThenEntryShouldExist(string path) { Assert.IsTrue(_path.Combine(path.Split('\\')).Exists); } [Then("(.*) should not exist")] public void ThenEntryShouldNotExist(string path) { Assert.IsFalse(_path.Combine(path.Split('\\')).Exists); } These steps should be written with reusability in mind. They are building blocks for your scenarios, not implementation of a specific scenario. Think small and fine-grained. In the case of the above steps, I could reuse each of those steps in other scenarios. Those tests are easy to write and easier to read, which means that they also constitute a form of documentation. Oh, and SpecFlow is just one way to do this. Rob wrote a long time ago about this sort of thing (but using a different framework) and I highly recommend this post if I somehow managed to pique your interest: http://blog.wekeroad.com/blog/make-bdd-your-bff-2/ And this screencast (Rob always makes excellent screencasts): http://blog.wekeroad.com/mvc-storefront/kona-3/ (click the “Download it here” link)

Read the article

Windows Azure Myths

- by BuckWoody

Windows Azure is part of the Microsoft "stack" - the suite of software and services we offer. Because we have so many products in almost every part of technology, it's hard to know everything about all parts of what we do - even for those of us who work here. So it's no surprise that some folks are not as familiar with Windows and SQL Azure as they are, say Windows Server or XBox. As I chat with folks about a solution for a business or organization need, I put Windows Azure into the mix. I always start off with "What do you already know about Windows Azure?" so that I don't bore folks with information they already have. I some cases they've checked out the product ahead of time and have specific questions, in others they aren't as familiar, and in still others there is a fair amount of mis-information. Sometimes that's because of a marketing failure, sometimes it's hearsay, and somtetimes it's active misinformation. I thought I might lay out a few of these misconceptions. As always - do your fact-checking! Never take anyone's word alone (including mine) as gospel. Make sure you educate yourself on your options. Your company or your clients depend on you to have the right information on IT, so make sure you live up to that. Myth 1: Nobody uses Windows Azure It's true that we don't give out numbers on the amount of clients on Windows and SQL Azure. But lots of folks are here - companies you may have heard of like Boeing, NASA, Fujitsu, The City of London, Nuedesic, and many others. I deal with firms small and large that use Windows Azure for mission-critical applications, sometimes totally on Windows and/or SQL Azure, sometimes in conjunction with an on-premises system, sometimes for only a specific component in Windows Azure like storage. The interesting thing is that many sites you visit have a Windows Azure component, or are running on Windows Azure. They just don't announce it. Just like the other cloud providers, the companies have asked to be completely branded themselves - they don't want you to be aware or care that they are on Windows Azure. Sometimes that's for security, other times it's for different reasons. It's just like the web sites you visit. For the most part, they don't advertise which OS or Web Server they use. It really just shouldn't matter. The point is that they just use what works to solve a given problem. Check out a few public case studies here: https://www.windowsazure.com/en-us/home/case-studies/ Myth 2: It's only for Microsoft stuff - can't use Open Source This is the one I face the most, and am the most dismayed by. We work just fine with many open source products, including Java, NodeJS, PHP, Ruby, Python, Hadoop, and many other languages and applications. You can quickly deploy a Wordpress, Umbraco and other "kits". We have software development kits (SDK's) for iPhones, iPads, Android, Windows phones and more. We have an SDK to work with FaceBook and other social networks. In short, we play well with others. More on the languages and runtimes we support here: https://www.windowsazure.com/en-us/develop/overview/ More on the SDK's here: http://www.wadewegner.com/2011/05/windows-azure-toolkit-for-ios/, http://www.wadewegner.com/2011/08/windows-azure-toolkits-for-devices-now-with-android/, http://azuretoolkit.codeplex.com/ Myth 3: Microsoft expects me to switch everything to "the cloud" No, we don't. That would be disasterous, unless the only things you run in your company uses works perfectly in Azure. Use Windows Azure - or any cloud for that matter - where it works. Whenever I talk to companies, I focus on two things: Something that is broken and needs to be re-architected Something you want to do that is new If something is broken, and you need new tools to scale, extend, add capacity dynamically and so on, then you can consider using Windows or SQL Azure. It can help solve problems that you have, or it may include a component you don't want to write or architect yourself. Sometimes you want to do something new, like extend your company's offerings to mobile phones, to the web, or to a social network. More info on where it works here: http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx Myth 4: I have to write code to use Windows and SQL Azure If Windows Azure is a PaaS - a Platform as a Service - then don't you have to write code to use it? Nope. Windows and SQL Azure are made up of various components. Some of those components allow you to write and deploy code (like Compute) and others don't. We have lots of customers using Windows Azure storage as a backup, to securely share files instead of using DropBox, to distribute videos or code or firmware, and more. Others use our High Performance Computing (HPC) offering to rent a supercomputer when they need one. You can even throw workloads at that using Excel! In addition there are lots of other components in Windows Azure you can use, from the Windows Azure Media Services to others. More here: https://www.windowsazure.com/en-us/home/scenarios/saas/ Myth 5: Windows Azure is just another form of "vendor lock-in" Windows Azure uses .NET, OSS languages and standard interfaces for the code. Sure, you're not going to take the code line-for-line and run it on a mainframe, but it's standard code that you write, and can port to something else. And the data is yours - you can bring it back whever you want. It's either in text or binary form, that you have complete control over. There are no licenses - you can "pay as you go", and when you're done, you can leave the service and take all your code, data and IP with you. So go out there, read up, try it. Use it where it works. And don't believe everything you hear - sometimes the Internet doesn't get it all correct. :)

Read the article

The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article

The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article

MySQL Cluster 7.3 Labs Release – Foreign Keys Are In!

- by Mat Keep

0 0 1 1097 6254 Homework 52 14 7337 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} Summary (aka TL/DR): Support for Foreign Key constraints has been one of the most requested feature enhancements for MySQL Cluster. We are therefore extremely excited to announce that Foreign Keys are part of the first Labs Release of MySQL Cluster 7.3 – available for download, evaluation and feedback now! (Select the mysql-cluster-7.3-labs-June-2012 build) In this blog, I will attempt to discuss the design rationale, implementation, configuration and steps to get started in evaluating the first MySQL Cluster 7.3 Labs Release. Pace of Innovation It was only a couple of months ago that we announced the General Availability (GA) of MySQL Cluster 7.2, delivering 1 billion Queries per Minute, with 70x higher cross-shard JOIN performance, Memcached NoSQL key-value API and cross-data center replication. This release has been a huge hit, with downloads and deployments quickly reaching record levels. The announcement of the first MySQL Cluster 7.3 Early Access lab release at today's MySQL Innovation Day event demonstrates the continued pace in Cluster development, and provides an opportunity for the community to evaluate and feedback on new features they want to see. What’s the Plan for MySQL Cluster 7.3? Well, Foreign Keys, as you may have gathered by now (!), and this is the focus of this first Labs Release. As with MySQL Cluster 7.2, we plan to publish a series of preview releases for 7.3 that will incrementally add new candidate features for a final GA release (subject to usual safe harbor statement below*), including: - New NoSQL APIs; - Features to automate the configuration and provisioning of multi-node clusters, on premise or in the cloud; - Performance and scalability enhancements; - Taking advantage of features in the latest MySQL 5.x Server GA. Design Rationale MySQL Cluster is designed as a “Not-Only-SQL” database. It combines attributes that enable users to blend the best of both relational and NoSQL technologies into solutions that deliver web scalability with 99.999% availability and real-time performance, including: Concurrent NoSQL and SQL access to the database; Auto-sharding with simple scale-out across commodity hardware; Multi-master replication with failover and recovery both within and across data centers; Shared-nothing architecture with no single point of failure; Online scaling and schema changes; ACID compliance and support for complex queries, across shards. Native support for Foreign Key constraints enables users to extend the benefits of MySQL Cluster into a broader range of use-cases, including: - Packaged applications in areas such as eCommerce and Web Content Management that prescribe databases with Foreign Key support. - In-house developments benefiting from Foreign Key constraints to simplify data models and eliminate the additional application logic needed to maintain data consistency and integrity between tables. Implementation The Foreign Key functionality is implemented directly within MySQL Cluster’s data nodes, allowing any client API accessing the cluster to benefit from them – whether using SQL or one of the NoSQL interfaces (Memcached, C++, Java, JPA or HTTP/REST.) The core referential actions defined in the SQL:2003 standard are implemented: CASCADE RESTRICT NO ACTION SET NULL In addition, the MySQL Cluster implementation supports the online adding and dropping of Foreign Keys, ensuring the Cluster continues to serve both read and write requests during the operation. An important difference to note with the Foreign Key implementation in InnoDB is that MySQL Cluster does not support the updating of Primary Keys from within the Data Nodes themselves - instead the UPDATE is emulated with a DELETE followed by an INSERT operation. Therefore an UPDATE operation will return an error if the parent reference is using a Primary Key, unless using CASCADE action, in which case the delete operation will result in the corresponding rows in the child table being deleted. The Engineering team plans to change this behavior in a subsequent preview release. Also note that when using InnoDB "NO ACTION" is identical to "RESTRICT". In the case of MySQL Cluster “NO ACTION” means “deferred check”, i.e. the constraint is checked before commit, allowing user-defined triggers to automatically make changes in order to satisfy the Foreign Key constraints. Configuration There is nothing special you have to do here – Foreign Key constraint checking is enabled by default. If you intend to migrate existing tables from another database or storage engine, for example from InnoDB, there are a couple of best practices to observe: 1. Analyze the structure of the Foreign Key graph and run the ALTER TABLE ENGINE=NDB in the correct sequence to ensure constraints are enforced 2. Alternatively drop the Foreign Key constraints prior to the import process and then recreate when complete. Getting Started Read this blog for a demonstration of using Foreign Keys with MySQL Cluster. You can download MySQL Cluster 7.3 Labs Release with Foreign Keys today - (select the mysql-cluster-7.3-labs-June-2012 build) If you are new to MySQL Cluster, the Getting Started guide will walk you through installing an evaluation cluster on a singe host (these guides reflect MySQL Cluster 7.2, but apply equally well to 7.3) Post any questions to the MySQL Cluster forum where our Engineering team will attempt to assist you. Post any bugs you find to the MySQL bug tracking system (select MySQL Cluster from the Category drop-down menu) And if you have any feedback, please post them to the Comments section of this blog. Summary MySQL Cluster 7.2 is the GA, production-ready release of MySQL Cluster. This first Labs Release of MySQL Cluster 7.3 gives you the opportunity to preview and evaluate future developments in the MySQL Cluster database, and we are very excited to be able to share that with you. Let us know how you get along with MySQL Cluster 7.3, and other features that you want to see in future releases. * Safe Harbor Statement This information is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Read the article

The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

Read the article

The Shift: how Orchard painlessly shifted to document storage, and how it’ll affect you

- by Bertrand Le Roy

We’ve known it all along. The storage for Orchard content items would be much more efficient using a document database than a relational one. Orchard content items are composed of parts that serialize naturally into infoset kinds of documents. Storing them as relational data like we’ve done so far was unnatural and requires the data for a single item to span multiple tables, related through 1-1 relationships. This means lots of joins in queries, and a great potential for Select N+1 problems. Document databases, unfortunately, are still a tough sell in many places that prefer the more familiar relational model. Being able to x-copy Orchard to hosters has also been a basic constraint in the design of Orchard. Combine those with the necessity at the time to run in medium trust, and with license compatibility issues, and you’ll find yourself with very few reasonable choices. So we went, a little reluctantly, for relational SQL stores, with the dream of one day transitioning to document storage. We have played for a while with the idea of building our own document storage on top of SQL databases, and Sébastien implemented something more than decent along those lines, but we had a better way all along that we didn’t notice until recently… In Orchard, there are fields, which are named properties that you can add dynamically to a content part. Because they are so dynamic, we have been storing them as XML into a column on the main content item table. This infoset storage and its associated API are fairly generic, but were only used for fields. The breakthrough was when Sébastien realized how this existing storage could give us the advantages of document storage with minimal changes, while continuing to use relational databases as the substrate. public bool CommercialPrices { get { return this.Retrieve(p => p.CommercialPrices); } set { this.Store(p => p.CommercialPrices, value); } } This code is very compact and efficient because the API can infer from the expression what the type and name of the property are. It is then able to do the proper conversions for you. For this code to work in a content part, there is no need for a record at all. This is particularly nice for site settings: one query on one table and you get everything you need. This shows how the existing infoset solves the data storage problem, but you still need to query. Well, for those properties that need to be filtered and sorted on, you can still use the current record-based relational system. This of course continues to work. We do however provide APIs that make it trivial to store into both record properties and the infoset storage in one operation: public double Price { get { return Retrieve(r => r.Price); } set { Store(r => r.Price, value); } } This code looks strikingly similar to the non-record case above. The difference is that it will manage both the infoset and the record-based storages. The call to the Store method will send the data in both places, keeping them in sync. The call to the Retrieve method does something even cooler: if the property you’re looking for exists in the infoset, it will return it, but if it doesn’t, it will automatically look into the record for it. And if that wasn’t cool enough, it will take that value from the record and store it into the infoset for the next time it’s required. This means that your data will start automagically migrating to infoset storage just by virtue of using the code above instead of the usual: public double Price { get { return Record.Price; } set { Record.Price = value; } } As your users browse the site, it will get faster and faster as Select N+1 issues will optimize themselves away. If you preferred, you could still have explicit migration code, but it really shouldn’t be necessary most of the time. If you do already have code using QueryHints to mitigate Select N+1 issues, you might want to reconsider those, as with the new system, you’ll want to avoid joins that you don’t need for filtering or sorting, further optimizing your queries. There are some rare cases where the storage of the property must be handled differently. Check out this string[] property on SearchSettingsPart for example: public string[] SearchedFields { get { return (Retrieve<string>("SearchedFields") ?? "") .Split(new[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries); } set { Store("SearchedFields", String.Join(", ", value)); } } The array of strings is transformed by the property accessors into and from a comma-separated list stored in a string. The Retrieve and Store overloads used in this case are lower-level versions that explicitly specify the type and name of the attribute to retrieve or store. You may be wondering what this means for code or operations that look directly at the database tables instead of going through the new infoset APIs. Even if there is a record, the infoset version of the property will win if it exists, so it is necessary to keep the infoset up-to-date. It’s not very complicated, but definitely something to keep in mind. Here is what a product record looks like in Nwazet.Commerce for example: And here is the same data in the infoset: The infoset is stored in Orchard_Framework_ContentItemRecord or Orchard_Framework_ContentItemVersionRecord, depending on whether the content type is versionable or not. A good way to find what you’re looking for is to inspect the record table first, as it’s usually easier to read, and then get the item record of the same id. Here is the detailed XML document for this product: <Data> <ProductPart Inventory="40" Price="18" Sku="pi-camera-box" OutOfStockMessage="" AllowBackOrder="false" Weight="0.2" Size="" ShippingCost="null" IsDigital="false" /> <ProductAttributesPart Attributes="" /> <AutoroutePart DisplayAlias="camera-box" /> <TitlePart Title="Nwazet Pi Camera Box" /> <BodyPart Text="[...]" /> <CommonPart CreatedUtc="2013-09-10T00:39:00Z" PublishedUtc="2013-09-14T01:07:47Z" /> </Data> The data is neatly organized under each part. It is easy to see how that document is all you need to know about that content item, all in one table. If you want to modify that data directly in the database, you should be careful to do it in both the record table and the infoset in the content item record. In this configuration, the record is now nothing more than an index, and will only be used for sorting and filtering. Of course, it’s perfectly fine to mix record-backed properties and record-less properties on the same part. It really depends what you think must be sorted and filtered on. In turn, this potentially simplifies migrations considerably. So here it is, the great shift of Orchard to document storage, something that Orchard has been designed for all along, and that we were able to implement with a satisfying and surprising economy of resources. Expect this code to make its way into the 1.8 version of Orchard when that’s available.

Read the article

Packaging Swing apps with integrated JavaFX content

- by igor

JavaFX provides a lot of interesting capabilities for developing rich client applications in Java, but what if you are working on an existing Swing application and you want to take advantage of these new features? Maybe you want to use one or two controls like the LineChart or a MediaView. Maybe you want to embed a large Scene Graph as an initial step in porting your application to FX. A hybrid Swing/FX application might just be the answer. Developing a hybrid Swing + JavaFX application is not terribly difficult, but until recently the deployment of hybrid applications has not simple as a "pure" JavaFX application. The existing tools focused on packaging FX Applications, or Swing applications - they did not account for hybrid applications. But with JavaFX 2.2 the tools include support for this hybrid application use case. Solution In JavaFX 2.2 we extended the packaging ant tasks to greatly simplify deploying hybrid applications. You now use the same deployment approach as you would for pure JavaFX applications. Just bundle your main application jar with the fx:jar ant task and then generate html/jnlp files using fx:deploy. The only difference is setting toolkit attribute for the fx:application tag as shown below: <fx:application id="swingFXApp" mainClass="${main.class}" toolkit="swing"/> The value of ${main.class} in the example above is your application class which has a main method. It does not need to extend JavaFX Application class. The resulting package provides support for the same set of execution modes as a package for a JavaFX application, although the packages which are created are not identical to the packages created for a pure FX application. You will see two JNLP files generated in the case of a hybrid application - one for use from Swing applet and another for the webstart launch. Note that these improvements do not alter the set of features available to Swing applications. The packaging tools just make it easier to use the advanced features of JavaFX in your Swing application. The same limits still apply, for example a Swing application can not use JavaFX Preloaders and code changes are necessary to support HTML splash screens. Why should I use the JavaFX ant tasks for packaging my Swing application? While using FX packaging tool for a Swing application may seem like a mismatch at face value, there are some really good reasons to use this approach. The primary justification for our packaging tools is to simplify the creation of your application artifacts, and to reduce manual errors. Plus, no one should have to write JNLP by hand. Some specific benefits include: Your application jar will include a launcher program. This improves your standalone launch by: checking for the JavaFX runtime guiding the user through any necessary installations setting the system proxy for Java The ant tasks will generate JNLP and HTML files for your swing app: avoids learning unnecessary details about JNLP, and eliminates the error-prone hand editing of JNLP files simplifies using advanced features like embedding JNLP and signing jars as BLOBs to improve launch performance.you can also embed the signing certificate details to improve the user's experience allows the use of web page templates to inject the generated code directly into your actual web page instead of being forced to copy/paste the generated code snippets. What about native packing? Absolutely! The very same ant task can generate a native bundle for a Swing application with JavaFX content. Try running one of these sample native bundles for the "SwingInterop" FX example: exe and dmg. I also used another feature on these examples: a click-through license agreement for .exe installers and OS X DMG drag installers. Small Caveat This packaging procedure is optimized around using the JavaFX packaging tools for your entire Swing application. If you are trying to embed JavaFX content into existing project (with an existing build/packing process) then you may need to experiment in order to find the best way to integrate the JavaFX packaging steps into your existing build procedure. As long as you can use ant in your build process this should be a workable approach. It some cases solution could be less than ideal. For example, you need to use fx:jar to package your main jar file in order to produce a double-clickable jar or a native bundle. The jar will be created from scratch, but you may already be creating the main jar file with a custom manifest. This may lead to some redundant steps in your build process. Hopefully the benefits will outweigh the problems. This is an area of ongoing development for the team, and we will continue to refine and improve both the tools and the process. Please share your experiences and suggestions with us. You can comment here on the blog or file issues to JIRA. Sample code Here is the full ant code used to package SwingInterop. You can grab latest JavaFX samples and try it yourself: <target name="-post-jar"> <taskdef resource="com/sun/javafx/tools/ant/antlib.xml" uri="javafx:com.sun.javafx.tools.ant" classpath="${javafx.tools.ant.jar}"/>  <fx:application id="swingFXApp" mainClass="${main.class}" toolkit="swing"/>  <fx:jar destfile="${dist.jar}"> <fileset dir="${build.classes.dir}"/> <fx:application refid="swingFXApp" name="SwingInterop"/> <manifest> <attribute name="Implementation-Vendor" value="${application.vendor}"/> <attribute name="Implementation-Title" value="${application.title}"/> <attribute name="Implementation-Version" value="1.0"/> </manifest> </fx:jar>  <delete file="${build.dir}/test.keystore"/> <genkey alias="TestAlias" storepass="xyz123" keystore="${build.dir}/test.keystore" dname="CN=Samples, OU=JavaFX Dev, O=Oracle, C=US"/> <fx:signjar keystore="${build.dir}/test.keystore" alias="TestAlias" storepass="xyz123"> <fileset file="${dist.jar}"/> </fx:signjar>  <fx:deploy width="960" height="720" includeDT="true" nativeBundles="all" outdir="${basedir}/${dist.dir}" embedJNLP="true" outfile="${application.title}"> <fx:application refId="swingFXApp"/> <fx:resources> <fx:fileset dir="${basedir}/${dist.dir}" includes="SwingInterop.jar"/> </fx:resources> <fx:permissions/> <info title="Sample app: ${application.title}" vendor="${application.vendor}"/> </fx:deploy> </target>

Read the article

Extending Oracle CEP with Predictive Analytics

- by vikram.shukla(at)oracle.com

Introduction: OCEP is often used as a business rules engine to execute a set of business logic rules via CQL statements, and take decisions based on the outcome of those rules. There are times where configuring rules manually is sufficient because an application needs to deal with only a small and well-defined set of static rules. However, in many situations customers don't want to pre-define such rules for two reasons. First, they are dealing with events with lots of columns and manually crafting such rules for each column or a set of columns and combinations thereof is almost impossible. Second, they are content with probabilistic outcomes and do not care about 100% precision. The former is the case when a user is dealing with data with high dimensionality, the latter when an application can live with "false" positives as they can be discarded after further inspection, say by a Human Task component in a Business Process Management software. The primary goal of this blog post is to show how this can be achieved by combining OCEP with Oracle Data Mining® and leveraging the latter's rich set of algorithms and functionality to do predictive analytics in real time on streaming events. The secondary goal of this post is also to show how OCEP can be extended to invoke any arbitrary external computation in an RDBMS from within CEP. The extensible facility is known as the JDBC cartridge. The rest of the post describes the steps required to achieve this: We use the dataset available at http://blogs.oracle.com/datamining/2010/01/fraud_and_anomaly_detection_made_simple.html to showcase the capabilities. We use it to show how transaction anomalies or fraud can be detected. Building the model: Follow the self-explanatory steps described at the above URL to build the model. It is very simple - it uses built-in Oracle Data Mining PL/SQL packages to cleanse, normalize and build the model out of the dataset. You can also use graphical Oracle Data Miner® to build the models. To summarize, it involves: Specifying which algorithms to use. In this case we use Support Vector Machines as we're trying to find anomalies in highly dimensional dataset.Build model on the data in the table for the algorithms specified. For this example, the table was populated in the scott/tiger schema with appropriate privileges. Configuring the Data Source: This is the first step in building CEP application using such an integration. Our datasource looks as follows in the server config file. It is advisable that you use the Visualizer to add it to the running server dynamically, rather than manually edit the file. <data-source> <name>DataMining</name> <data-source-params> <jndi-names> <element>DataMining</element> </jndi-names> <global-transactions-protocol>OnePhaseCommit</global-transactions-protocol> </data-source-params> <connection-pool-params> <credential-mapping-enabled></credential-mapping-enabled> <test-table-name>SQL SELECT 1 from DUAL</test-table-name> <initial-capacity>1</initial-capacity> <max-capacity>15</max-capacity> <capacity-increment>1</capacity-increment> </connection-pool-params> <driver-params> <use-xa-data-source-interface>true</use-xa-data-source-interface> <driver-name>oracle.jdbc.OracleDriver</driver-name> <url>jdbc:oracle:thin:@localhost:1522:orcl</url> <properties> <element> <value>scott</value> <name>user</name> </element> <element> <value>{Salted-3DES}AzFE5dDbO2g=</value> <name>password</name> </element> <element> <name>com.bea.core.datasource.serviceName</name> <value>oracle11.2g</value> </element> <element> <name>com.bea.core.datasource.serviceVersion</name> <value>11.2.0</value> </element> <element> <name>com.bea.core.datasource.serviceObjectClass</name> <value>java.sql.Driver</value> </element> </properties> </driver-params> </data-source> Designing the EPN: The EPN is very simple in this example. We briefly describe each of the components. The adapter ("DataMiningAdapter") reads data from a .csv file and sends it to the CQL processor downstream. The event payload here is same as that of the table in the database (refer to the attached project or do a "desc table-name" from a SQL*PLUS prompt). While this is for convenience in this example, it need not be the case. One can still omit fields in the streaming events, and need not match all columns in the table on which the model was built. Better yet, it does not even need to have the same name as columns in the table, as long as you alias them in the USING clause of the mining function. (Caveat: they still need to draw values from a similar universe or domain, otherwise it constitutes incorrect usage of the model). There are two things in the CQL processor ("DataMiningProc") that make scoring possible on streaming events. 1. User defined cartridge function Please refer to the OCEP CQL reference manual to find more details about how to define such functions. We include the function below in its entirety for illustration. <?xml version="1.0" encoding="UTF-8"?> <jdbcctxconfig:config xmlns:jdbcctxconfig="http://www.bea.com/ns/wlevs/config/application" xmlns:jc="http://www.oracle.com/ns/ocep/config/jdbc"> <jc:jdbc-ctx> <name>Oracle11gR2</name> <data-source>DataMining</data-source> <function name="prediction2"> <param name="CQLMONTH" type="char"/> <param name="WEEKOFMONTH" type="int"/> <param name="DAYOFWEEK" type="char" /> <param name="MAKE" type="char" /> <param name="ACCIDENTAREA" type="char" /> <param name="DAYOFWEEKCLAIMED" type="char" /> <param name="MONTHCLAIMED" type="char" /> <param name="WEEKOFMONTHCLAIMED" type="int" /> <param name="SEX" type="char" /> <param name="MARITALSTATUS" type="char" /> <param name="AGE" type="int" /> <param name="FAULT" type="char" /> <param name="POLICYTYPE" type="char" /> <param name="VEHICLECATEGORY" type="char" /> <param name="VEHICLEPRICE" type="char" /> <param name="FRAUDFOUND" type="int" /> <param name="POLICYNUMBER" type="int" /> <param name="REPNUMBER" type="int" /> <param name="DEDUCTIBLE" type="int" /> <param name="DRIVERRATING" type="int" /> <param name="DAYSPOLICYACCIDENT" type="char" /> <param name="DAYSPOLICYCLAIM" type="char" /> <param name="PASTNUMOFCLAIMS" type="char" /> <param name="AGEOFVEHICLES" type="char" /> <param name="AGEOFPOLICYHOLDER" type="char" /> <param name="POLICEREPORTFILED" type="char" /> <param name="WITNESSPRESNT" type="char" /> <param name="AGENTTYPE" type="char" /> <param name="NUMOFSUPP" type="char" /> <param name="ADDRCHGCLAIM" type="char" /> <param name="NUMOFCARS" type="char" /> <param name="CQLYEAR" type="int" /> <param name="BASEPOLICY" type="char" /> <return-component-type>char</return-component-type> <sql><![CDATA[ SELECT to_char(PREDICTION_PROBABILITY(CLAIMSMODEL, '0' USING *)) AS probability FROM (SELECT :CQLMONTH AS MONTH, :WEEKOFMONTH AS WEEKOFMONTH, :DAYOFWEEK AS DAYOFWEEK, :MAKE AS MAKE, :ACCIDENTAREA AS ACCIDENTAREA, :DAYOFWEEKCLAIMED AS DAYOFWEEKCLAIMED, :MONTHCLAIMED AS MONTHCLAIMED, :WEEKOFMONTHCLAIMED, :SEX AS SEX, :MARITALSTATUS AS MARITALSTATUS, :AGE AS AGE, :FAULT AS FAULT, :POLICYTYPE AS POLICYTYPE, :VEHICLECATEGORY AS VEHICLECATEGORY, :VEHICLEPRICE AS VEHICLEPRICE, :FRAUDFOUND AS FRAUDFOUND, :POLICYNUMBER AS POLICYNUMBER, :REPNUMBER AS REPNUMBER, :DEDUCTIBLE AS DEDUCTIBLE, :DRIVERRATING AS DRIVERRATING, :DAYSPOLICYACCIDENT AS DAYSPOLICYACCIDENT, :DAYSPOLICYCLAIM AS DAYSPOLICYCLAIM, :PASTNUMOFCLAIMS AS PASTNUMOFCLAIMS, :AGEOFVEHICLES AS AGEOFVEHICLES, :AGEOFPOLICYHOLDER AS AGEOFPOLICYHOLDER, :POLICEREPORTFILED AS POLICEREPORTFILED, :WITNESSPRESNT AS WITNESSPRESENT, :AGENTTYPE AS AGENTTYPE, :NUMOFSUPP AS NUMOFSUPP, :ADDRCHGCLAIM AS ADDRCHGCLAIM, :NUMOFCARS AS NUMOFCARS, :CQLYEAR AS YEAR, :BASEPOLICY AS BASEPOLICY FROM dual) ]]> </sql> </function> </jc:jdbc-ctx> </jdbcctxconfig:config> 2. Invoking the function for each event. Once this function is defined, you can invoke it from CQL as follows: <?xml version="1.0" encoding="UTF-8"?> <wlevs:config xmlns:wlevs="http://www.bea.com/ns/wlevs/config/application"> <processor> <name>DataMiningProc</name> <rules> <query id="q1"><![CDATA[ ISTREAM(SELECT S.CQLMONTH, S.WEEKOFMONTH, S.DAYOFWEEK, S.MAKE, : S.BASEPOLICY, C.F AS probability FROM StreamDataChannel [NOW] AS S, TABLE(prediction2@Oracle11gR2(S.CQLMONTH, S.WEEKOFMONTH, S.DAYOFWEEK, S.MAKE, ..., S.BASEPOLICY) AS F of char) AS C) ]]></query> </rules> </processor> </wlevs:config> Finally, the last stage in the EPN prints out the probability of the event being an anomaly. One can also define a threshold in CQL to filter out events that are normal, i.e., below a certain mark as defined by the analyst or designer. Sample Runs: Now let's see how this behaves when events are streamed through CEP. We use only two events for brevity, one normal and other one not. This is one of the "normal" looking events and the probability of it being anomalous is less than 60%. Event is: eventType=DataMiningOutEvent object=q1 time=2904821976256 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=300, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.58931702982118561 isTotalOrderGuarantee=true\nAnamoly probability: .58931702982118561 However, the following event is scored as an anomaly with a very high probability of 89%. So there is likely to be something wrong with it. A close look reveals that the value of "deductible" field (10000) is not "normal". What exactly constitutes normal here?. If you run the query on the database to find ALL distinct values for the "deductible" field, it returns the following set: {300, 400, 500, 700} Event is: eventType=DataMiningOutEvent object=q1 time=2598483773496 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=10000, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.89171554529576691 isTotalOrderGuarantee=true\nAnamoly probability: .89171554529576691 Conclusion: By way of this example, we show: real-time scoring of events as they flow through CEP leveraging Oracle Data Mining.how CEP applications can invoke complex arbitrary external computations (function shipping) in an RDBMS.

Read the article

WhatsApp & Tasker for Android – Read & Write messages

- by Shaurya Anand

So, I finally gave up on all my previous the Microsoft Mobile/Phone OS devices and made my switch to Android this year. I am using my Samsung Galaxy Note GT-N7000 with CyanogenMod 9.1.0 (http://get.cm/get/jenkins/7086/cm-9.1.0-n7000.zip) and ClockworkMod 6.0.1.2 (http://download2.clockworkmod.com/recoveries/recovery-clockwork-6.0.1.2-n7000.zip) since August this year and I am so happy with the performance and the flexibility it offers me. As a software developer by profession, I would expect most of my gadget to be highly customizable and programmable (one time or at intervals) to suit my needs as close as it can. I was introduced to Automation for Android – Tasker (https://play.google.com/store/apps/details?id=net.dinglisch.android.taskerm&hl=en) via reddit (http://www.reddit.com/r/tasker) and the word ‘automation’ was enough for me to dive right into this app. Only automation that I did earlier was switching profiles depending on location on there phones. And now, just imagine a complete set of possibilities that can be automate on the phone or via the phone. I did my research and found a couple of other tools that do the same/as close as what Tasker can do and few of them are even free. There’s one even by Microsoft called on{X} (https://play.google.com/store/apps/details?id=com.microsoft.onx.app&hl=en). Microsoft’s on{X} really caught my eye. You can write code for your phone on the web application by them, deploy it on your phone and even trace the flow all using your PC. Really brilliant, I love the fact that it’s all JavaScript. Here comes the but, it is still very very young and it’s policy of accessing my News Feed on Facebook is not something that I can not digest. On{X} is good, but as I said earlier, the API is not very mature and hence, I gave up on it. I bought Tasker, the best 5,00 € I spent in ages and I want to talk about it in this post. I am still a “noob” while operating this tool, but I tried my shot at automating WhatsApp (https://play.google.com/store/apps/details?id=com.whatsapp&hl=en), a popular messenger for various platform. The requirement for the automation is that, if I send a WhatsApp ‘wru’ message to the phone, it should respond back giving the location and battery level of my phone. It could be useful, if you like to locate your misplaced phone or automatically reply to your partner/friend, honestly, I don’t know what you will use it - through this post, I am just introducing automating WhatsApp using Tasker. Before we begin, the following script only works when your phone is rooted as we will be accessing the WhatsApp database and type some special characters like ‘:’. Let’s follow the code line by line: Profile: Location request from XYZ. (12) // Name of your profile. Event: Notification [ Owner Application:WhatsApp Title:* ] // When a new notification comes from WhatsApp, this event is fired. Read the end note, if you face problems with Chrome app after enabling Tasker accessibility. Enter: A1: Run Shell [ Command:sqlite3 // We will access the WhatsApp database and check if the message comes from designated phone number or not. We mustn’t reply to every message. /data/data/com.whatsapp/databases/msgstore.db "SELECT _id, data FROM messages WHERE key_from_me='0' AND key_remote_jid LIKE '%XXXXXXXXXXX%' // Replace XXXXXXXXXXX with the phone number of your message sender. ORDER BY _id DESC LIMIT 1;" Timeout (Seconds):10 Use Root:On Store // I made a timeout for 10 seconds, if in case WhatsApp is busy accessing the database. Result In:%WHATSAPP_CURRREQ ] // Store the read Id and the last message on to the variable %WHATSAPP_CURRREQ A2: If [ %WHATSAPP_CURRREQ ~R .*[wW][rR][uU].* ] // Check if the pattern of the message is correct and we are all set to send the location. A3: If [ %WHATSAPP_CURRREQ !~ %WHATSAPP_LASTREQ ] // Verify that the message is different from the last request. Remember every message has a unique Id. A4: Notify [ Title:WhatsApp location request... Text:Sending location // Just a notification that the location message is being prepared. to Krati Gupta... Icon:<icon> Number:0 Permanent:On Priority:3 ] // Make a note it is a permanent notification, we will clear it later. A5: Secure Settings [ Configuration:Pattern Lock Disabled // I am disabling the pattern lock, that I use using the plugin Secure Settings. Package:com.intangibleobject.securesettings.plugin Name:Secure // You can download the plugin from here: https://play.google.com/store/apps/details?id=com.intangibleobject.securesettings.plugin&hl=en Settings ] A6: Secure Settings [ Configuration:Keyguard Disabled // Disable the keygaurd, it is useful, when your phone is on lock and you want to automate everything, even the typing. Package:com.intangibleobject.securesettings.plugin Name:Secure Settings ] A7: Secure Settings [ Configuration:GPS Enabled // Pretty clear, turn on the GPS and get location at A8 Package:com.intangibleobject.securesettings.plugin Name:Secure Settings ] A8: AutoShortcut [ Configuration:WhatsApp: Some One // I am using AutoShortcut plugin (https://play.google.com/store/apps/details?id=com.joaomgcd.autoshortcut) to start WhatsApp with the indented recipient. Package:com.joaomgcd.autoshortcut Name:AutoShortcut ] // Replace Some One, actually choose it from the plugin, the right recipient. A9: Get Location [ Source:Any Timeout (Seconds):30 Continue Task // I am getting the location, timeout is 30 seconds, adjust it accordingly. Immediately:Off Keep Tracking:Off ] A10: Secure Settings [ Configuration:Screen Dim // Now, this extension of the plugin Secure Settings, wakes your device so that you can type out the string on the WhatsApp app. 5 Seconds Package:com.intangibleobject.securesettings.plugin Name:Secure Settings ] A11: Run Shell [ Command:input text // Now, I am using the shell script to type the text to the window, because the ‘:’ while not be typed from the Type task in Tasker. LOCATION:maps.google.com/maps?q=%LOC Timeout (Seconds):0 Use Root:On // And also, this is way faster, but remember you need root for this, not for the other way of typing. Store Result In: ] A12: Dpad [ Button:Right Repeat Times:1 ] // Focus the Send button A13: Dpad [ Button:Press Repeat Times:1 ] // And press it. A14: Dpad [ Button:Left Repeat Times:1 ] // Get back to the typing box. A15: Run Shell [ Command:input text LOCATION_ACCURACY:%LOCACC Timeout (Seconds):0 Use Root:On Store Result In: ] A16: Dpad [ Button:Right Repeat Times:1 ] A17: Dpad [ Button:Press Repeat Times:1 ] A18: Dpad [ Button:Left Repeat Times:1 ] A19: Run Shell [ Command:input text BATTERY_LEVEL:%BATT% Timeout // I am adding Battery level in my case as well. (Seconds):0 Use Root:On Store Result In: ] A20: Dpad [ Button:Right Repeat Times:1 ] A21: Dpad [ Button:Press Repeat Times:1 ] A22: Variable Set [ Name:%WHATSAPP_LASTREQ To:%WHATSAPP_CURRREQ Do // And now, we say, request is done. Maths:Off Append:Off ] A23: Button [ Button:Back ] // I am exiting the WhatsApp nicely and not killing it. If you are the murderer kind, kill it, just know, you don’t have any place in the heaven. A24: Button [ Button:Back ] A25: Notify Cancel [ Title: Warn Not Exist:Off ] // Remove the permanent notification. A26: Notify [ Title:WhatsApp location request Text:Location sent // Make a temporary notification, and say, location is sent. successfully. Icon:<icon> Number:0 Permanent:Off Priority:3 ] A27: Secure Settings [ Configuration:GPS Disabled // Disable all the horrible things we turned on earlier. Package:com.intangibleobject.securesettings.plugin Name:Secure Settings ] A28: Secure Settings [ Configuration:Pattern Lock Enabled Package:com.intangibleobject.securesettings.plugin Name:Secure Settings ] A29: Secure Settings [ Configuration:Keyguard Enabled Package:com.intangibleobject.securesettings.plugin Name:Secure Settings ] A30: End If A31: End If Download this Task from here: http://db.tt/9vRmbhyb That’s it in the above small example – you can read/write messages from/to WhatsApp app. I am using n7000-cm9.1-cwr6. Oh yea, and if you are having the Talkback auto enabled for Chrome browser, you need to turn Off the Web scripts to run. Tasker is amazing, I have automated a lot of tasks using this tool. I will share a few none generic ones with you in my coming post here.

Search Results

Search found 24201 results on 969 pages for 'andrew case'.

Page 401/969 | < Previous Page | 397 398 399 400 401 402 403 404 405 406 407 408 | Next Page >

- by Pinal Dave

- by Mysticgeek

- by shawn

- by ScottGu

- by Stefan Krantz

- by jorg

- by Rob Farley

- by hajan

- by Charles Young

- by Paul White

- by BuckWoody

- by James Michael Hare

- by Ferhat Hatay

- by Ali Bahrami

- by extended_events

- by Bertrand Le Roy

- by BuckWoody

- by Rob Farley

- by Rob Farley

- by Mat Keep

- by BuckWoody

- by Bertrand Le Roy

- by igor

- by vikram.shukla(at)oracle.com

- by Shaurya Anand

< Previous Page | 397 398 399 400 401 402 403 404 405 406 407 408 | Next Page >