Search Results

Search found 6441 results on 258 pages for 'schema compare'.

Page 240/258 | < Previous Page | 236 237 238 239 240 241 242 243 244 245 246 247 | Next Page >

CodePlex Daily Summary for Monday, June 14, 2010

CodePlex Daily Summary for Monday, June 14, 2010New ProjectsBD File Hash: BD File Hash is a convenient file hash and hash compare tool for Windows which currently works with MD5, SHA-1, and SHA-256 algorithms. FileScan: This is an application that searches through a drive or directory structure for files matching a filter. This project was converted from VB to ...genesis9: genesis9HeinanOS: HeinanOS is an operating system developed mainly in C++. HeinanOS is a light OS (1.44 MB image) with a lot of capabilites and many more are being ...MediaBrowserWS - Creates a Web Service for the popular MediaBrowser plugin: Creates a web service in Media Center for accessing your MediaBrowser collection. Allows for external devices (Tablets/phones/laptops) to access a ...MME: New Edition of Managed Menu Extensions for Visual Studio 2010 The Main goal of "MME" is to provide easy access to adding Right Click menus in the ...MVMMapper: Generate the ViewModel and its mapping to the Model when implementing MVVM in .NET. Developed using T4 templates. Current version supports Silver...ProjectArDotNet: Si te agarro te parto! Si te agarro te emperno no me importa que seas menor de edad!Scriptagility for DotNetNuke: Scriptagility is a DotNetNuke module for Javascript developers. This module provides dynamic client scripting infrastructure for developing javascr...simpleLinux Distro: SimpleLinux. is a Linux distributions that is easy to use. Simple Linux website: http://simplelinux.tkTag Cloud Control for asp.net: Tag Cloud Control for asp.net allows the user to display the most important keywords to display in tag cloud. Each Tag has it own navigation url to...thefreeimdb: fsadie qwUppityUp: UppityUp is a simple and light-weight tray application which monitors a remote server and shows a notification when it comes online. This is usefu...Vivid3D 2 - DirectX 10 3D ToolKit: The sequel to my first ever engine wrote several years ago. It is not based on it in anyway. VSIDev: VSI DevXTQXK_WORK: Actionscript 3.0东坡博客: 这是一个ASP。net mvc 2博客。New Releases.NET Extensions - Extension Methods Library: Release 2010.08: Added extension methods for Bitmap manipulation (scaling for now): - Bitmap.ScaleToSize() - Bitmap.ScaleToSizeProportional() - Bitmap.ScaleProport...Black Falcon Software's Database Data-Access-Layers: “SQLHELPER”, “ORAHELPER” - Handling Binary Data: See attached document...BTech Networking Library: BTech Networking Library: Same as pervious just new namespace, extended networking coming soon!!!Community Forums NNTP bridge: Community Forums NNTP Bridge V37: Release of the Community Forums NNTP Bridge to access the social and anwsers MS forums with a single, open source NNTP bridge. This release has ad...Generic Entity Model 2: GEM2 build 54383: This is second BETA release of GEM2! Please see source code change sets for updates! Following implementation is not included in this release: My...Hades: Projet Hadès - Official Demo - Version 0.1.0 Beta: ---------------------------------------------------------------------------- - Projet Hadès - Official Demo - Version 0.1.0 Beta ------------------...HeinanOS: HeinanOS M1 Source Code: You can download HeinanOS M1 Source Code and contribute to HeinanOS development! Be aware that you should not use this code for your own systems! ...HeinanOS: Milestone 1: This is the first major release for HeinanOS 1.0 Please note this is a PRE-RELEASE! This release includes the following features: -Bootable DOS-...HKGolden Express: HKGoldenExpress (Build 201006131900): New features: (None) Bug fix: Incorrect message submit date of message/ replies. (Note: Showing message submit date is enabled since Build 20100...HKGolden Express: HKGoldenExpress (Build 201006140110): New features: (None) Bug fix: (None) Improvements: (None) Other changes: Set time zone of message date as Hong Kong. Adjusted the format of messa...MediaCoder.NET: MediaCoder.NET v1.0 Beta 1.5: Installer file for MediaCoder.NET v1.0 beta 1.5. Now converts multiple files.MME: First release: Features of this release 1. One installer MME.msi. However you can also install MMEMenuManagerSetup.vsix which installs a project template that e...MSBuild Launch Pad (mPad): 1.1 Beta 1: Platform selection box is added.MVMMapper: MVMMapper Release v 1.0.1: This release has no downloadable documentation. Please use the Documentation section to get started.NginxTray: NginxTray 0.7 RC2: NginxTray 0.7 RC2PowerAuras: PowerAuras-3.0.0K-beta3: New Auras: Item Name Equipment Slot Tracking Changes from beta1 5 new aura textures Fixed Tracking bug Added graphical equipment slot sele...PowerAuras: PowerAuras-3.0.0K-beta4: New Auras: Item Name Equipment Slot Tracking Changes from beta1 5 new aura textures Fixed Tracking bug Added graphical equipment slot sele...Scriptagility for DotNetNuke: Scriptagility 1.0 (Beta): Initial public release please evaluate and feedbackSharpDevelop: SharpDevelop 4.0 Beta 1: Release notes: http://community.sharpdevelop.net/forums/t/11388.aspxsimpleLinux Distro: Project X3: This is an example of download for simpleLinuxSOAPI - StackOverflow API Parser/Wrapper Generator: SOAPI Beta 3: The SOAPI Beta 3 download will be made availabe later today when the initial documentation is complete. The previously available Beta 1 download h...Sofa: Initial release V1.0: This is the first release of Sofa. As it is made of code being previously used, as we tested it is a stable release. But bugs are always possible,...Tag Cloud Control for asp.net: Tag Cloud Control for asp.net: Tag Cloud Control for asp.net allows the user to display the most important keywords to display in tag cloud. Each Tag has it own navigation url to...UppityUp: UppityUp v0.1: First functional version, supports monitoring availability by ping (ICMP) requests. Fit for general use. Consists of one standalone .exe file - no...VCC: Latest build, v2.1.30613.0: Automatic drop of latest buildWindStyle ExifInfo for Windows Live Writer: 1.1.0.0: Add: Multiple Language(English and Simplified Chinese); Add: Insert multiple files; Fix: Error when insert pictures without Exif info; Update: Icon...Work Recorder - Hold on own time!: WorkRecorder 1.2: +Add a whole day chartXsltDb - DotNetNuke Module Builder: 01.01.24: Syntax highlighting delivered!New samples for RadControls. On single page you can find RadTreeView, RadRating, RadChart, RadFormDecorator, RadEdito...xUnit.net Contrib: xunitcontrib 0.4 (ReSharper 5.0 RTM + dotCover): xunitcontrib release 0.4 (ReSharper runner) This release provides a test runner plugin for Resharper 5.0, 4.5 and 4.1, targetting all versions of x...Most Popular ProjectsCommunity Forums NNTP bridgeRIA Services EssentialsNeatUploadBxf (Basic XAML Framework)Agile Personal Development Methodology.NET Transactional File ManagerSOLID by exampleASP.NET MVC Time PlannerWEI ShareSiverlight ProjectMost Active ProjectsjQuery Library for SharePoint Web Servicespatterns & practices – Enterprise LibraryNB_Store - Free DotNetNuke Ecommerce Catalog ModuleRhyduino - Arduino and Managed CodeCommunity Forums NNTP bridgeCassandraemonBlogEngine.NETLightweight Fluent WorkflowMediaCoder.NETAndrew's XNA Helpers

Read the article
Interview with Lenz Grimmer about MySQL Connect

- by Keith Larson

Keith Larson: Thank you for allowing me to do this interview with you. I have been talking with a few different Oracle ACEs about the MySQL Connect Conference. I figured the MySQL community might be missing you as well. You have been very busy with Oracle Linux but I know you still have an eye on the MySQL Community. How have things been?Lenz Grimmer: Thanks for including me in this series of interviews, I feel honored! I've read the other interviews, and really liked them. I still try to follow what's going on over in the MySQL community and it's good to see that many of the familiar faces are still around. Over the course of the 9 years that I was involved with MySQL, many colleagues and contacts turned into good friends and we still maintain close relationships.It's been almost 1.5 years ago that I moved into my new role here in the Linux team at Oracle, and I really enjoy working on a Linux distribution again (I worked for SUSE before I joined MySQL AB in 2002). I'm still learning a lot - Linux in the data center has greatly evolved in so many ways and there are a lot of new and exciting technologies to explore. Keith Larson: What were your thoughts when you heard that Oracle was going to deliver the MySQL Connect conference to the MySQL Community?Lenz Grimmer: I think it's testament to the fact that Oracle deeply cares about MySQL, despite what many skeptics may say. What started as "MySQL Sunday" two years ago has now evolved into a full-blown sub-conference, with 80 sessions at one of the largest corporate IT events in the world. I find this quite telling, not many products at Oracle enjoy this level of exposure! So it certainly makes me feel proud to see how far MySQL has come. Keith Larson: Have you had a chance to look over the sessions? What are your thoughts on them?Lenz Grimmer: I did indeed look at the final schedule.The content committee did a great job with selecting these sessions. I'm glad to see that the content selection was influenced by involving well-known and respected members of the MySQL community. The sessions cover a broad range of topics and technologies, both covering established topics as well as recent developments. Keith Larson: When you get a chance, what sessions do you plan on attending?Lenz Grimmer: I will actually be manning the Oracle booth in the exhibition area on one of these days, so I'm not sure if I'll have a lot of time attending sessions. But if I do, I'd love to see the keynotes and catch some of the sessions that talk about recent developments and new features in MySQL, High Availability and Clustering . Quite a lot has happened and it's hard to keep up with this constant flow of new MySQL releases.In particular, the following sessions caught my attention: MySQL Connect Keynote: The State of the Dolphin Evaluating MySQL High-Availability Alternatives CERN’s MySQL “as a Service” Deployment with Oracle VM: Empowering Users MySQL 5.6 Replication: Taking Scalability and High Availability to the Next Level What’s New in MySQL Server 5.6? MySQL Security: Past and Present MySQL at Twitter: Development and Deployment MySQL Community BOF MySQL Connect Keynote: MySQL Perspectives Keith Larson: So I will ask you just like I have asked the others I have interviewed, any tips that you would give to people for handling the long hours at conferences?Lenz Grimmer: Wear comfortable shoes and make sure to drink a lot! Also prepare a plan of the sessions you would like to attend beforehand and familiarize yourself with the venue, so you can get to the next talk in time without scrambling to find the location. The good thing about piggybacking on such a large conference like Oracle OpenWorld is that you benefit from the whole infrastructure. For example, there is a nice schedule builder that helps you to keep track of your sessions of interest. Other than that, bring enough business cards and talk to people, build up your network among your peers and other MySQL professionals! Keith Larson: What features of the MySQL 5.6 release do you look forward to the most ?Lenz Grimmer: There has been solid progress in so many areas like the InnoDB Storage Engine, the Optimizer, Replication or Performance Schema, it's hard for me to really highlight anything in particular. All in all, MySQL 5.6 sounds like a very promising release. I'm confident it will follow the tradition that Oracle already established with MySQL 5.5, which received a lot of praise even from very critical members of the MySQL community. If I had to name a single feature, I'm particularly and personally happy that the precise GIS functions have finally made it into a GA release - that was long overdue. Keith Larson: In your opinion what is the best reason for someone to attend this event?Lenz Grimmer: This conference is an excellent opportunity to get in touch with the key people in the MySQL community and ecosystem and to get facts and information from the domain experts and developers that work on MySQL. The broad range of topics should attract people from a variety of roles and relations to MySQL, beginning with Developers and DBAs, to CIOs considering MySQL as a viable solution for their requirements. Keith Larson: You will be attending MySQL Connect and have some Oracle Linux Demos, do you see a growing demand for MySQL on Oracle Linux ?Lenz Grimmer: Yes! Oracle Linux is our recommended Linux distribution and we have a good relationship to the MySQL engineering group. They use Oracle Linux as a base Linux platform for development and QA, so we make sure that MySQL and Oracle Linux are well tested together. Setting up a MySQL server on Oracle Linux can be done very quickly, and many customers recognize the benefits of using them both in combination.Because Oracle Linux is available for free (including free bug fixes and errata), it's an ideal choice for running MySQL in your data center. You can run the same Linux distribution on both your development/staging systems as well as on the production machines, you decide which of these should be covered by a support subscription and at which level of support. This gives you flexibility and provides some really attractive cost-saving opportunities. Keith Larson: Since I am a Linux user and fan, what is on the horizon for Oracle Linux?Lenz Grimmer: We're working hard on broadening the ecosystem around Oracle Linux, building up partnerships with ISVs and IHVs to certify Oracle Linux as a fully supported platform for their products. We also continue to collaborate closely with the Linux kernel community on various projects, to make sure that Linux scales and performs well on large systems and meets the demands of today's data centers. These improvements and enhancements will then rolled into the Unbreakable Enterprise Kernel, which is the key ingredient that sets Oracle Linux apart from other distributions. We also have a number of ongoing projects which are making good progress, and I'm sure you'll hear more about this at the upcoming OpenWorld conference :) Keith Larson: What is something that more people should be aware of when it comes to Oracle Linux and MySQL ?Lenz Grimmer: Many people assume that Oracle Linux is just tuned for Oracle products, such as the Oracle Database or our Engineered Systems. While it's of course true that we do a lot of testing and optimization for these workloads, Oracle Linux is and will remain a general-purpose Linux distribution that is a very good foundation for setting up a LAMP-Stack, for example. We also provide MySQL RPM packages for Oracle Linux, so you can easily stay up to date if you need something newer than what's included in the stock distribution.One more thing that is really unique to Oracle Linux is Ksplice, which allows you to apply security patches to the running Linux kernel, without having to reboot. This ensures that your MySQL database server keeps up and running and is not affected by any downtime. Keith Larson: What else would you like to add ?Lenz Grimmer: Thanks again for getting in touch with me, I appreciated the opportunity. I'm looking forward to MySQL Connect and Oracle OpenWorld and to meet you and many other people from the MySQL community that I haven't seen for quite some time! Keith Larson: Thank you Lenz!

Read the article
SQL SERVER – Introduction to SQL Server 2014 In-Memory OLTP

- by Pinal Dave

In SQL Server 2014 Microsoft has introduced a new database engine component called In-Memory OLTP aka project “Hekaton” which is fully integrated into the SQL Server Database Engine. It is optimized for OLTP workloads accessing memory resident data. In-memory OLTP helps us create memory optimized tables which in turn offer significant performance improvement for our typical OLTP workload. The main objective of memory optimized table is to ensure that highly transactional tables could live in memory and remain in memory forever without even losing out a single record. The most significant part is that it still supports majority of our Transact-SQL statement. Transact-SQL stored procedures can be compiled to machine code for further performance improvements on memory-optimized tables. This engine is designed to ensure higher concurrency and minimal blocking. In-Memory OLTP alleviates the issue of locking, using a new type of multi-version optimistic concurrency control. It also substantially reduces waiting for log writes by generating far less log data and needing fewer log writes. Points to remember Memory-optimized tables refer to tables using the new data structures and key words added as part of In-Memory OLTP. Disk-based tables refer to your normal tables which we used to create in SQL Server since its inception. These tables use a fixed size 8 KB pages that need to be read from and written to disk as a unit. Natively compiled stored procedures refer to an object Type which is new and is supported by in-memory OLTP engine which convert it into machine code, which can further improve the data access performance for memory –optimized tables. Natively compiled stored procedures can only reference memory-optimized tables, they can’t be used to reference any disk –based table. Interpreted Transact-SQL stored procedures, which is what SQL Server has always used. Cross-container transactions refer to transactions that reference both memory-optimized tables and disk-based tables. Interop refers to interpreted Transact-SQL that references memory-optimized tables. Using In-Memory OLTP In-Memory OLTP engine has been available as part of SQL Server 2014 since June 2013 CTPs. Installation of In-Memory OLTP is part of the SQL Server setup application. The In-Memory OLTP components can only be installed with a 64-bit edition of SQL Server 2014 hence they are not available with 32-bit editions. Creating Databases Any database that will store memory-optimized tables must have a MEMORY_OPTIMIZED_DATA filegroup. This filegroup is specifically designed to store the checkpoint files needed by SQL Server to recover the memory-optimized tables, and although the syntax for creating the filegroup is almost the same as for creating a regular filestream filegroup, it must also specify the option CONTAINS MEMORY_OPTIMIZED_DATA. Here is an example of a CREATE DATABASE statement for a database that can support memory-optimized tables: CREATE DATABASE InMemoryDB ON PRIMARY(NAME = [InMemoryDB_data], FILENAME = 'D:\data\InMemoryDB_data.mdf', size=500MB), FILEGROUP [SampleDB_mod_fg] CONTAINS MEMORY_OPTIMIZED_DATA (NAME = [InMemoryDB_mod_dir], FILENAME = 'S:\data\InMemoryDB_mod_dir'), (NAME = [InMemoryDB_mod_dir], FILENAME = 'R:\data\InMemoryDB_mod_dir') LOG ON (name = [SampleDB_log], Filename='L:\log\InMemoryDB_log.ldf', size=500MB) COLLATE Latin1_General_100_BIN2; Above example code creates files on three different drives (D: S: and R:) for the data files and in memory storage so if you would like to run this code kindly change the drive and folder locations as per your convenience. Also notice that binary collation was specified as Windows (non-SQL). BIN2 collation is the only collation support at this point for any indexes on memory optimized tables. It is also possible to add a MEMORY_OPTIMIZED_DATA file group to an existing database, use the below command to achieve the same. ALTER DATABASE AdventureWorks2012 ADD FILEGROUP hekaton_mod CONTAINS MEMORY_OPTIMIZED_DATA; GO ALTER DATABASE AdventureWorks2012 ADD FILE (NAME='hekaton_mod', FILENAME='S:\data\hekaton_mod') TO FILEGROUP hekaton_mod; GO Creating Tables There is no major syntactical difference between creating a disk based table or a memory –optimized table but yes there are a few restrictions and a few new essential extensions. Essentially any memory-optimized table should use the MEMORY_OPTIMIZED = ON clause as shown in the Create Table query example. DURABILITY clause (SCHEMA_AND_DATA or SCHEMA_ONLY) Memory-optimized table should always be defined with a DURABILITY value which can be either SCHEMA_AND_DATA or SCHEMA_ONLY the former being the default. A memory-optimized table defined with DURABILITY=SCHEMA_ONLY will not persist the data to disk which means the data durability is compromised whereas DURABILITY= SCHEMA_AND_DATA ensures that data is also persisted along with the schema. Indexing Memory Optimized Table A memory-optimized table must always have an index for all tables created with DURABILITY= SCHEMA_AND_DATA and this can be achieved by declaring a PRIMARY KEY Constraint at the time of creating a table. The following example shows a PRIMARY KEY index created as a HASH index, for which a bucket count must also be specified. CREATE TABLE Mem_Table ( [Name] VARCHAR(32) NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000), [City] VARCHAR(32) NULL, [State_Province] VARCHAR(32) NULL, [LastModified] DATETIME NOT NULL, ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA); Now as you can see in the above query example we have used the clause MEMORY_OPTIMIZED = ON to make sure that it is considered as a memory optimized table and not just a normal table and also used the DURABILITY Clause= SCHEMA_AND_DATA which means it will persist data along with metadata and also you can notice this table has a PRIMARY KEY mentioned upfront which is also a mandatory clause for memory-optimized tables. We will talk more about HASH Indexes and BUCKET_COUNT in later articles on this topic which will be focusing more on Row and Index storage on Memory-Optimized tables. So stay tuned for that as well. Now as we covered the basics of Memory Optimized tables and understood the key things to remember while using memory optimized tables, let’s explore more using examples to understand the Performance gains using memory-optimized tables. I will be using the database which i created earlier in this article i.e. InMemoryDB in the below Demo Exercise. USE InMemoryDB GO -- Creating a disk based table CREATE TABLE dbo.Disktable ( Id INT IDENTITY, Name CHAR(40) ) GO CREATE NONCLUSTERED INDEX IX_ID ON dbo.Disktable (Id) GO -- Creating a memory optimized table with similar structure and DURABILITY = SCHEMA_AND_DATA CREATE TABLE dbo.Memorytable_durable ( Id INT NOT NULL PRIMARY KEY NONCLUSTERED Hash WITH (bucket_count =1000000), Name CHAR(40) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA) GO -- Creating an another memory optimized table with similar structure but DURABILITY = SCHEMA_Only CREATE TABLE dbo.Memorytable_nondurable ( Id INT NOT NULL PRIMARY KEY NONCLUSTERED Hash WITH (bucket_count =1000000), Name CHAR(40) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_only) GO -- Now insert 100000 records in dbo.Disktable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Disktable(Name) VALUES('sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END -- Do the same inserts for Memory table dbo.Memorytable_durable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Memorytable_durable VALUES(@i_t, 'sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END -- Now finally do the same inserts for Memory table dbo.Memorytable_nondurable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Memorytable_nondurable VALUES(@i_t, 'sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END The above 3 Inserts took 1.20 minutes, 54 secs, and 2 secs respectively to insert 100000 records on my machine with 8 Gb RAM. This proves the point that memory-optimized tables can definitely help businesses achieve better performance for their highly transactional business table and memory- optimized tables with Durability SCHEMA_ONLY is even faster as it does not bother persisting its data to disk which makes it supremely fast. Koenig Solutions is one of the few organizations which offer IT training on SQL Server 2014 and all its updates. Now, I leave the decision on using memory_Optimized tables on you, I hope you like this article and it helped you understand the fundamentals of IN-Memory OLTP . Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Koenig

Read the article
Integrating Flickr with ASP.Net application

- by sreejukg

Flickr is the popular photo management and sharing application offered by yahoo. The services from flicker allow you to store and share photos and videos online. Flicker offers strong API support for almost all services they provide. Using this API, developers can integrate photos to their public website. Since 2005, developers have collaborated on top of Flickr's APIs to build fun, creative, and gorgeous experiences around photos that extend beyond Flickr. In this article I am going to demonstrate how easily you can bring the photos stored on flicker to your website. Let me explain the scenario this article is trying to address. I have a flicker account where I upload photos and share in many ways offered by Flickr. Now I have a public website, instead of re-upload the photos again to public website, I want to show this from Flickr. Also I need complete control over what photo to display. So I went and referred the Flickr documentation and there is API support ready to address my scenario (and more… ). FlickerAPI for ASP.Net To Integrate Flicker with ASP.Net applications, there is a library available in CodePlex. You can find it here http://flickrnet.codeplex.com/ Visit the URL and download the latest version. The download includes a Zip file, when you unzip you will get a number of dlls. Since I am going to use ASP.Net application, I need FlickrNet.dll. See the screenshot of all the dlls, and there is a help file available in the download (.chm) for your reference. Once you have the dll, you need to use Flickr API from your website. I assume you have a flicker account and you are familiar with Flicker services. Arrange your photos using Sets in Flickr In flicker, you can define sets and add your uploaded photos to sets. You can compare set to photo album. A set is a logical collection of photos, which is an excellent option for you to categorize your photos. Typically you will have a number of sets each set having few photos. You can write application that brings photos from sets to your website. For the purpose of this article I already created a set Flickr and added some photos to it. Once you logged in to Flickr, you can see the Sets under the Menu. In the Sets page, you will see all the sets you have created. As you notice, you can see certain sample images I have uploaded just to test the functionality. Though I wish I couldn’t create good photos so please bear with me. I have created 2 photo sets named Blue Album and Red Album. Click on the image for the set, will take you to the corresponding set page. In the set “Red Album” there are 4 photos and the set has a unique ID (highlighted in the URL). You can simply retrieve the photos with the set id from your application. In this article I am going to retrieve the images from Red album in my ASP.Net page. For that First I need to setup FlickrAPI for my usage. Configure Flickr API Key As I mentioned, we are going to use Flickr API to retrieve the photos stored in Flickr. In order to get access to Flickr API, you need an API key. To create an API key, navigate to the URL http://www.flickr.com/services/apps/create/ Click on Request an API key link, now you need to tell Flickr whether your application in commercial or non-commercial. I have selected a non-commercial key. Now you need to enter certain information about your application. Once you enter the details, Click on the submit button. Now Flickr will create the API key for your application. Generating non-commercial API key is very easy, in couple of steps the key will be generated and you can use the key in your application immediately. ASP.Net application for retrieving photos Now we need write an ASP.Net application that display pictures from Flickr. Create an empty web application (I named this as FlickerIntegration) and add a reference to FlickerNet.dll. Add a web form page to the application where you will retrieve and display photos(I have named this as Gallery.aspx). After doing all these, the solution explorer will look similar to following. I have used the below code in the Gallery.aspx page. The output for the above code is as follows. I am going to explain the code line by line here. First it is adding a reference to the FlickrNet namespace. using FlickrNet; Then create a Flickr object by using your API key. Flickr f = new Flickr("<yourAPIKey>"); Now when you retrieve photos, you can decide what all fields you need to retrieve from Flickr. Every photo in Flickr contains lots of information. Retrieving all will affect the performance. For the demonstration purpose, I have retrieved all the available fields as follows. PhotoSearchExtras.All But if you want to specify the fields you can use logical OR operator(|). For e.g. the following statement will retrieve owner name and date taken. PhotoSearchExtras extraInfo = PhotoSearchExtras.OwnerName | PhotoSearchExtras.DateTaken; Then retrieve all the photos from a photo set using PhotoSetsGetPhotos method. I have passed the PhotoSearchExtras object created earlier. PhotosetPhotoCollection photos = f.PhotosetsGetPhotos("72157629872940852", extraInfo); The PhotoSetsGetPhotos method will return a collection of Photo objects. You can just navigate through the collection using a foreach statement. foreach (Photo p in photos) { //access each photo properties } Photo class have lot of properties that map with the properties from Flickr. The chm documentation comes along with the CodePlex download is a great asset for you to understand the fields. In the above code I just used the following p.LargeUrl – retrieves the large image url for the photo. p.ThumbnailUrl – retrieves the thumbnail url for the photo p.Title – retrieves the Title of the photo p.DateUploaded – retrieves the date of upload Visual Studio intellisense will give you all properties, so it is easy, you can just try with Visual Studio intellisense to find the right properties you are looking for. Most of hem are self-explanatory. So you can try retrieving the required properties. In the above code, I just pushed the photos to the page. In real time you can use the retrieved photos along with JQuery libraries to create animated photo galleries, slideshows etc. Configuration and Troubleshooting If you get access denied error while executing the code, you need to disable the caching in Flickr API. FlickrNet cache the photos to your local disk when retrieved. You can specify a cache folder where the application need write permission. You can specify the Cache folder in the code as follows. Flickr.CacheLocation = Server.MapPath("./FlickerCache/"); If the application doesn’t have have write permission to the cache folder, the application will throw access denied error. If you cannot give write permission to the cache folder, then you must disable the caching. You can do this from code as follows. Flickr.CacheDisabled = true; Disabling cache will have an impact on the performance. Take care! Also you can define the Flickr settings in web.config file.You can find the documentation here. http://flickrnet.codeplex.com/wikipage?title=ExampleConfigFile&ProjectName=flickrnet Flickr is a great place for storing and sharing photos. The API access allows developers to do seamless integration with the photos uploaded on Flickr.

Read the article
Microsoft TechEd 2010 - Day 3 @ Bangalore

- by sathya

Microsoft TechEd 2010 - Day 3 @ Bangalore Sorry for my delayed post on day 3 because I had to travel from Blore to Chennai So I couldnt write for the past two days. On day 3 as usual we had lot of simultaneous tracks on various sessions. This day I choose the Your Data, Our Platform Track. It had sessions on the following 5 topics : Developing Data-tier Applications in Visual Studio 2010 - by Sanjay Nagamangalam SQL Server Query Optimization, Execution and Debugging Query Performance - by Vinod Kumar M SQL Server Utility - Its about more than 1 SQL Server - by Vinod Kumar Jagannathan Data Recovery / Consistency with CheckDB - by Vinod Kumar M Developing with SQL Server Spatial and Deep dive into Spatial Indexing - by Pinal Dave Developing Data-tier Applications in Visual Studio 2010 - by Sanjay Nagamangalam This was one of the superb sessions i have attended. He explained all the concepts in detail with a demo. The important thing in this is there is something called Data-Tier application project which is newly introduced in this VS2010 with which we can manage all our data along with our application inside our VS itself. We can create DB,Tables,Procs,Views etc. here itself and once we deploy it creates a compressed file called .dacpac which stores all the changes in Table Schema,Created procs, etc. on to that single file which reduces our (developer's) effort in preparing the deployment scripts and giving it to the DBA. It also has some policy configurations which can be managed easily by checking some rules like in outlook. For Ex : IF the SQL Server Version > 10 then deploy else dont. This rule specifies that even if we try to deploy on SQL Server DB with version less than 10 It will not do it. And if we deploy some .dacpac to SQL server production db with the option upgrade DB with this dacpac once everything completes successfully it will say success else it rollsback to the prior version. Even if it gets deployed successfully and later @ a point of time you wish to revert it back to the prior version, you can go ahead and delete the existing dacpac version so that it reverts to the older version of the db changes. And for the good questions that were asked in the session T-Shirts were given. SQL Server Query Optimization, Execution and Debugging Query Performance - by Vinod Kumar M This one too was the best session. The speaker Vinod explained everything very much clearly. This was really useful session and you dont believe, as per my knowledge, in the total 3 days in the TechEd except the Keynote, for this session seats were full (House FULL) People were even standing out to attend this session. Such a great one it was. The speaker did a deep dive in to the Query Plan section and showed which actually causes the problem. Its all about the thing that we need to understand about the execution of SQL server Queries. We think in a way and SQL Server never executes in that way. We need to understand that first. He also told about there might be two plans generated for a single query at a point of time because of parallel processors in the system. The Key is here in every query. There is something called Estimated Row Count and Actual Row Count in the query plan. If the estimated row count by SQL server tallies with the actual row count your performance will be awesome. He said some tweaks to achieve the same. After this as usual we had lunch SQL Server Utility - Its about more than 1 SQL Server - by Vinod Kumar Jagannathan This was more of a DBA's session. Am really sorry I was totally blank and I was not interested to attend this session and walked out to attend Migrating to the cloud by Harish Ranganathan (My favorite Speaker) but unfortunately that was some other persons session. There the speaker was telling about how to configure the connection strings in such a way that we can connect to the SQL Azure platform from our VS and also showed us how to deploy the same in to Windows Azure. In between there were lot of technical problems like laptop hang, user locked and he was switching between systems, also i came in the half so i wasnt able to listen that fully. In between, Since I got an MCTS certification they gave me T-Shirt with the lines 'Iam Certified. Are you?' and they asked me to wear that. If we wear that we might get spotted and they would give us some goodies So on the 3rd day I was wearing that T-Shirt. I got spotted by the person Tarun who was coordinating things about the certification, and he was accompanied with a cameraman and they interviewed me about the certification and I was shown live in the Teched and was seen by 60000 live viewers of the TechEd. I was really happy on that. Data Recovery / Consistency with CheckDB - by Vinod Kumar M This was one of the best sessions too in the TechEd. This guy is really amazing. In front of us he crashed a DB and showed how to recover the same in 6 different ways for different no of failures. Showed about Different types of error msgs like : 823,824,825 msdb..suspect_pages DBCC CheckDB (different parameters to it) I am really waiting for his session to get uploaded live in the Teched Website. Here is his contact info If you wish to connect to him : Twitter : @vinodk_sql Website : www.ExtremeExperts.com Blog : http://blogs.sqlxml.org/vinodkumar Developing with SQL Server Spatial and Deep dive into Spatial Indexing - by Pinal Dave Pinal Dave is a King in SQL and he is a SQL MVP and he is the owner of SQLAuthority.com He took the session on Spatial Databases from the start. Showed about the different types of Spatial : Geometric and Geographic Geometric : x and y axis its a planar surface Geographic : Spherical surface with 3600 as the maximum which is used to represent the geographic points on the earth and easy to draw maps of different kinds. He had a lot of obstacles during his session like rain coming inside the hall, mic wires got bursted due to rain, Videos off on the display screens. In spite of that he asked the audience to come in the front rows and managed to take a good session without ppts and finally we got the displays on and he was showing demos on the same what he explained orally. That was really a fun filled informative session. He gave some books for the persons who asked good questions and answered well for his questions and I got one too (It was a book on Data Mining - Wrox Publishers) And finally after all these things there was Keynote session for close of the TechEd. and we all assembled in a big hall where Mr.Ashok Soota, a man of age around 70 co-founder of Mindtree was called to give some lecture on his successes. He was explaining about his past and what all companies he switched and for what reasons and what are all his successes and what are all his failures and the learnings of him from his past failures. and his success and failures on his partnerships with the other concern. And there were some questions for him like What is your suggestion on young entrepreneur? How did you learn from past failures? What is reiterating your success? What is your suggestion on partnerships? How to choose partnerships? etc. And they said @ 7.30 Pm there would be a party night, but unfortunately i was not able to attend that because I had to catch my train and before that i had to pack things, so I started @ 7 itself. Thats it about the TechED!!! Stay tuned for further Technology updates.

Read the article
New Validation Attributes in ASP.NET MVC 3 Future

- by imran_ku07

Introduction: Validating user inputs is an very important step in collecting information from users because it helps you to prevent errors during processing data. Incomplete or improperly formatted user inputs will create lot of problems for your application. Fortunately, ASP.NET MVC 3 makes it very easy to validate most common input validations. ASP.NET MVC 3 includes Required, StringLength, Range, RegularExpression, Compare and Remote validation attributes for common input validation scenarios. These validation attributes validates most of your user inputs but still validation for Email, File Extension, Credit Card, URL, etc are missing. Fortunately, some of these validation attributes are available in ASP.NET MVC 3 Future. In this article, I will show you how to leverage Email, Url, CreditCard and FileExtensions validation attributes(which are available in ASP.NET MVC 3 Future) in ASP.NET MVC 3 application. Description: First of all you need to download ASP.NET MVC 3 RTM Source Code from here. Then extract all files in a folder. Then open MvcFutures project from mvc3-rtm-sources\mvc3\src\MvcFutures folder. Build the project. In case, if you get compile time error(s) then simply remove the reference of System.Web.WebPages and System.Web.Mvc assemblies and add the reference of System.Web.WebPages and System.Web.Mvc 3 assemblies again but from the .NET tab and then build the project again, it will create a Microsoft.Web.Mvc assembly inside mvc3-rtm-sources\mvc3\src\MvcFutures\obj\Debug folder. Now we can use Microsoft.Web.Mvc assembly inside our application. Create a new ASP.NET MVC 3 application. For demonstration purpose, I will create a dummy model UserInformation. So create a new class file UserInformation.cs inside Model folder and add the following code, public class UserInformation { [Required] public string Name { get; set; } [Required] [EmailAddress] public string Email { get; set; } [Required] [Url] public string Website { get; set; } [Required] [CreditCard] public string CreditCard { get; set; } [Required] [FileExtensions(Extensions = "jpg,jpeg")] public string Image { get; set; } } Inside UserInformation class, I am using Email, Url, CreditCard and FileExtensions validation attributes which are defined in Microsoft.Web.Mvc assembly. By default FileExtensionsAttribute allows png, jpg, jpeg and gif extensions. You can override this by using Extensions property of FileExtensionsAttribute class. Then just open(or create) HomeController.cs file and add the following code, public class HomeController : Controller { public ActionResult Index() { return View(); } [HttpPost] public ActionResult Index(UserInformation u) { return View(); } } Next just open(or create) Index view for Home controller and add the following code, @model NewValidationAttributesinASPNETMVC3Future.Model.UserInformation @{ ViewBag.Title = "Index"; Layout = "~/Views/Shared/_Layout.cshtml"; } <h2>Index</h2> <script src="@Url.Content("~/Scripts/jquery.validate.min.js")" type="text/javascript"></script> <script src="@Url.Content("~/Scripts/jquery.validate.unobtrusive.min.js")" type="text/javascript"></script> @using (Html.BeginForm()) { @Html.ValidationSummary(true) <fieldset> <legend>UserInformation</legend> <div class="editor-label"> @Html.LabelFor(model => model.Name) </div> <div class="editor-field"> @Html.EditorFor(model => model.Name) @Html.ValidationMessageFor(model => model.Name) </div> <div class="editor-label"> @Html.LabelFor(model => model.Email) </div> <div class="editor-field"> @Html.EditorFor(model => model.Email) @Html.ValidationMessageFor(model => model.Email) </div> <div class="editor-label"> @Html.LabelFor(model => model.Website) </div> <div class="editor-field"> @Html.EditorFor(model => model.Website) @Html.ValidationMessageFor(model => model.Website) </div> <div class="editor-label"> @Html.LabelFor(model => model.CreditCard) </div> <div class="editor-field"> @Html.EditorFor(model => model.CreditCard) @Html.ValidationMessageFor(model => model.CreditCard) </div> <div class="editor-label"> @Html.LabelFor(model => model.Image) </div> <div class="editor-field"> @Html.EditorFor(model => model.Image) @Html.ValidationMessageFor(model => model.Image) </div> <p> <input type="submit" value="Save" /> </p> </fieldset> } <div> @Html.ActionLink("Back to List", "Index") </div> Now just run your application. You will find that both client side and server side validation for the above validation attributes works smoothly. Summary: Email, URL, Credit Card and File Extension input validations are very common. In this article, I showed you how you can validate these input validations into your application. I explained this with an example. I am also attaching a sample application which also includes Microsoft.Web.Mvc.dll. So you can add a reference of Microsoft.Web.Mvc assembly directly instead of doing any manual work. Hope you will enjoy this article too. SyntaxHighlighter.all()

Read the article
Fragmented Log files could be slowing down your database

- by Fatherjack

Something that is sometimes forgotten by a lot of DBAs is the fact that database log files get fragmented in the same way that you get fragmentation in a data file. The cause is very different but the effect is the same – too much effort reading and writing data. Data files get fragmented as data is changed through normal system activity, INSERTs, UPDATEs and DELETEs cause fragmentation and most experienced DBAs are monitoring their indexes for fragmentation and dealing with it accordingly. However, you don’t hear about so many working on their log files. How can a log file get fragmented? I’m glad you asked. When you create a database there are at least two files created on the disk storage; an mdf for the data and an ldf for the log file (you can also have ndf files for extra data storage but that’s off topic for now). It is wholly possible to have more than one log file but in most cases there is little point in creating more than one as the log file is written to in a ‘wrap-around’ method (more on that later). When a log file is created at the time that a database is created the file is actually sub divided into a number of virtual log files (VLFs). The number and size of these VLFs depends on the size chosen for the log file. VLFs are also created in the space added to a log file when a log file growth event takes place. Do you have your log files set to auto grow? Then you have potentially been introducing many VLFs into your log file. Let’s get to see how many VLFs we have in a brand new database. USE master GO CREATE DATABASE VLF_Test ON ( NAME = VLF_Test, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.ROCK_2008\MSSQL\DATA\VLF_Test.mdf', SIZE = 100, MAXSIZE = 500, FILEGROWTH = 50 ) LOG ON ( NAME = VLF_Test_Log, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.ROCK_2008\MSSQL\DATA\VLF_Test_log.ldf', SIZE = 5MB, MAXSIZE = 250MB, FILEGROWTH = 5MB ); go USE VLF_Test go DBCC LOGINFO; The results of this are firstly a new database is created with specified files sizes and the the DBCC LOGINFO results are returned to the script editor. The DBCC LOGINFO results have plenty of interesting information in them but lets first note there are 4 rows of information, this relates to the fact that 4 VLFs have been created in the log file. The values in the FileSize column are the sizes of each VLF in bytes, you will see that the last one to be created is slightly larger than the others. So, a 5MB log file has 4 VLFs of roughly 1.25 MB. Lets alter the CREATE DATABASE script to create a log file that’s a bit bigger and see what happens. Alter the code above so that the log file details are replaced by LOG ON ( NAME = VLF_Test_Log, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.ROCK_2008\MSSQL\DATA\VLF_Test_log.ldf', SIZE = 1GB, MAXSIZE = 25GB, FILEGROWTH = 1GB ); With a bigger log file specified we get more VLFs What if we make it bigger again? LOG ON ( NAME = VLF_Test_Log, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.ROCK_2008\MSSQL\DATA\VLF_Test_log.ldf', SIZE = 5GB, MAXSIZE = 250GB, FILEGROWTH = 5GB ); This time we see more VLFs are created within our log file. We now have our 5GB log file comprised of 16 files of 320MB each. In fact these sizes fall into all the ranges that control the VLF creation criteria – what a coincidence! The rules that are followed when a log file is created or has it’s size increased are pretty basic. If the file growth is lower than 64MB then 4 VLFs are created If the growth is between 64MB and 1GB then 8 VLFs are created If the growth is greater than 1GB then 16 VLFs are created. Now the potential for chaos comes if the default values and settings for log file growth are used. By default a database log file gets a 1MB log file with unlimited growth in steps of 10%. The database we just created is 6 MB, let’s add some data and see what happens. USE vlf_test go -- we need somewhere to put the data so, a table is in order IF OBJECT_ID('A_Table') IS NOT NULL DROP TABLE A_Table go CREATE TABLE A_Table ( Col_A int IDENTITY, Col_B CHAR(8000) ) GO -- Let's check the state of the log file -- 4 VLFs found EXECUTE ('DBCC LOGINFO'); go -- We can go ahead and insert some data and then check the state of the log file again INSERT A_Table (col_b) SELECT TOP 500 REPLICATE('a',2000) FROM sys.columns AS sc, sys.columns AS sc2 GO -- insert 500 rows and we get 22 VLFs EXECUTE ('DBCC LOGINFO'); go -- Let's insert more rows INSERT A_Table (col_b) SELECT TOP 2000 REPLICATE('a',2000) FROM sys.columns AS sc, sys.columns AS sc2 GO 10 -- insert 2000 rows, in 10 batches and we suddenly have 107 VLFs EXECUTE ('DBCC LOGINFO'); Well, that escalated quickly! Our log file is split, internally, into 107 fragments after a few thousand inserts. The same happens with any logged transactions, I just chose to illustrate this with INSERTs. Having too many VLFs can cause performance degradation at times of database start up, log backup and log restore operations so it’s well worth keeping a check on this property. How do we prevent excessive VLF creation? Creating the database with larger files and also with larger growth steps and actively choosing to grow your databases rather than leaving it to the Auto Grow event can make sure that the growths are made with a size that is optimal. How do we resolve a situation of a database with too many VLFs? This process needs to be done when the database is under little or no stress so that you don’t affect system users. The steps are: BACKUP LOG YourDBName TO YourBackupDestinationOfChoice Shrink the log file to its smallest possible size DBCC SHRINKFILE(FileNameOfTLogHere, TRUNCATEONLY) * Re-size the log file to the size you want it to, taking in to account your expected needs for the coming months or year. ALTER DATABASE YourDBName MODIFY FILE ( NAME = FileNameOfTLogHere, SIZE = TheSizeYouWantItToBeIn_MB) * – If you don’t know the file name of your log file then run sp_helpfile while you are connected to the database that you want to work on and you will get the details you need. The resize step can take quite a while This is already detailed far better than I can explain it by Kimberley Tripp in her blog 8-Steps-to-better-Transaction-Log-throughput.aspx. The result of this will be a log file with a VLF count according to the bullet list above. Knowing when VLFs are being created By complete coincidence while I have been writing this blog (it’s been quite some time from it’s inception to going live) Jonathan Kehayias from SQLSkills.com has written a great article on how to track database file growth using Event Notifications and Service Broker. I strongly recommend taking a look at it as this is going to catch any sneaky auto grows that take place and let you know about them right away. Hassle free monitoring of VLFs If you are lucky or wise enough to be using SQL Monitor or another monitoring tool that let’s you write your own custom metrics then you can keep an eye on this very easily. There is a custom metric for VLFs (written by Stuart Ainsworth) already on the site and there are some others there are very useful so take a moment or two to look around while you are there. Resources MSDN – http://msdn.microsoft.com/en-us/library/ms179355(v=sql.105).aspx Kimberly Tripp from SQLSkills.com – http://www.sqlskills.com/BLOGS/KIMBERLY/post/8-Steps-to-better-Transaction-Log-throughput.aspx Thomas LaRock at Simple-Talk.com – http://www.simple-talk.com/sql/database-administration/monitoring-sql-server-virtual-log-file-fragmentation/ Disclosure I am a Friend of Red Gate. This means that I am more than likely to say good things about Red Gate DBA and Developer tools. No matter how awesome I make them sound, take the time to compare them with other products before you contact the Red Gate sales team to make your order.

Read the article
ASP.NET MVC 3 Hosting :: Rolling with Razor in MVC v3 Preview

- by mbridge

Razor is an alternate view engine for asp.net MVC. It was introduced in the “WebMatrix” tool and has now been released as part of the asp.net MVC 3 preview 1. Basically, Razor allows us to replace the clunky <% %> syntax with a much cleaner coding model, which integrates very nicely with HTML. Additionally, it provides some really nice features for master page type scenarios and you don’t lose access to any of the features you are currently familiar with, such as HTML helper methods. First, download and install the ASP.NET MVC Preview 1. You can find this at http://www.microsoft.com/downloads/details.aspx?FamilyID=cb42f741-8fb1-4f43-a5fa-812096f8d1e8&displaylang=en. Now, follow these steps to create your first asp.net mvc project using Razor: 1. Open Visual Studio 2010 2. Create a new project. Select File->New->Project (Shift Control N) 3. You will see the list of project types which should look similar to what’s shown: 4. Select “ASP.NET MVC 3 Web Application (Razor).” Set the application name to RazorTest and the path to c:projectsRazorTest for this tutorial. If you select accidently select ASPX, you will end up with the standard asp.net view engine and template, which isn’t what you want. 5. For this tutorial, and ONLY for this tutorial, select “No, do not create a unit test project.” In general, you should create and use a unit test project. Code without unit tests is kind of like diet ice cream. It just isn’t very good. Now, once we have this done, our brand new project will be created. In all likelihood, Visual Studio will leave you looking at the “HomeController.cs” class, as shown below: Immediately, you should notice one difference. The Index action used to look like: public ActionResult Index () { ViewData[“Message”] = “Welcome to ASP.Net MVC!”; Return View(); } While this will still compile and run just fine, ASP.Net MVC 3 has a much nicer way of doing this: public ActionResult Index() { ViewModel.Message = “Welcome to ASP.Net MVC!”; Return View(); } Instead of using ViewData we are using the new ViewModel object, which uses the new dynamic data typing of .Net 4.0 to allow us to express ourselves much more cleanly. This isn’t a tutorial on ALL of MVC 3, but the ViewModel concept is one we will need as we dig into Razor. What comes in the box? When we create a project using the ASP.Net MVC 3 Template with Razor, we get a standard project setup, just like we did in ASP.NET MVC 2.0 but with some differences. Instead of seeing “.aspx” view files and “.ascx” files, we see files with the “.cshtml” which is the default razor extension. Before we discuss the details of a razor file, one thing to keep in mind is that since this is an extremely early preview, intellisense is not currently enabled with the razor view engine. This is promised as an updated before the final release. Just like with the aspx view engine, the convention of the folder name for a set of views matching the controller name without the word “Controller” still stands. Similarly, each action in the controller will usually have a corresponding view file in the appropriate view directory. Remember, in asp.net MVC, convention over configuration is key to successful development! The initial template organizes views in the following folders, located in the project under Views: - Account – The default account management views used by the Account controller. Each file represents a distinct view. - Home – Views corresponding to the appropriate actions within the home controller. - Shared – This contains common view objects used by multiple views. Within here, master pages are stored, as well as partial page views (user controls). By convention, these partial views are named “_XXXPartial.cshtml” where XXX is the appropriate name, such as _LogonPartial.cshtml. Additionally, display templates are stored under here. With this in mind, let us take a look at the index.cshtml file under the home view directory. When you open up index.cshtml you should see 1: @inherits System.Web.Mvc.WebViewPage 2: @{ 3: View.Title = "Home Page"; 4: LayoutPage = "~/Views/Shared/_Layout.cshtml"; 5: } 6: <h2>@View.Message</h2> 7: <p> 8: To learn more about ASP.NET MVC visit <a href="http://asp.net/mvc" title="ASP.NET MVC 9: Website">http://asp.net/mvc</a>. 10: </p> So looking through this, we observe the following facts: Line 1 imports the base page that all views (using Razor) are based on, which is System.Web.Mvc.WebViewPage. Note that this is different than System.Web.MVC.ViewPage which is used by asp.net MVC 2.0 Also note that instead of the <% %> syntax, we use the very simple ‘@’ sign. The View Engine contains enough context sensitive logic that it can even distinguish between @ in code and @ in an email. It’s a very clean markup. Line 2 introduces the idea of a code block in razor. A code block is a scoping mechanism just like it is in a normal C# class. It is designated by @{… } and any C# code can be placed in between. Note that this is all server side code just like it is when using the aspx engine and <% %>. Line 3 allows us to set the page title in the client page’s file. This is a new feature which I’ll talk more about when we get to master pages, but it is another of the nice things razor brings to asp.net mvc development. Line 4 is where we specify our “master” page, but as you can see, you can place it almost anywhere you want, because you tell it where it is located. A Layout Page is similar to a master page, but it gains a bit when it comes to flexibility. Again, we’ll come back to this in a later installment. Line 6 and beyond is where we display the contents of our view. No more using <%: %> intermixed with code. Instead, we get to use very clean syntax such as @View.Message. This is a lot easier to read than <%:@View.Message%> especially when intermixed with html. For example: <p> My name is @View.Name and I live at @View.Address </p> Compare this to the equivalent using the aspx view engine <p> My name is <%:View.Name %> and I live at <%: View.Address %> </p> While not an earth shaking simplification, it is easier on the eyes. As we explore other features, this clean markup will become more and more valuable.

Read the article
SPARC T4-4 Beats 8-CPU IBM POWER7 on TPC-H @3000GB Benchmark

- by Brian

Oracle's SPARC T4-4 server delivered a world record TPC-H @3000GB benchmark result for systems with four processors. This result beats eight processor results from IBM (POWER7) and HP (x86). The SPARC T4-4 server also delivered better performance per core than these eight processor systems from IBM and HP. Comparisons below are based upon system to system comparisons, highlighting Oracle's complete software and hardware solution. This database world record result used Oracle's Sun Storage 2540-M2 arrays (rotating disk) connected to a SPARC T4-4 server running Oracle Solaris 11 and Oracle Database 11g Release 2 demonstrating the power of Oracle's integrated hardware and software solution. The SPARC T4-4 server based configuration achieved a TPC-H scale factor 3000 world record for four processor systems of 205,792 QphH@3000GB with price/performance of $4.10/QphH@3000GB. The SPARC T4-4 server with four SPARC T4 processors (total of 32 cores) is 7% faster than the IBM Power 780 server with eight POWER7 processors (total of 32 cores) on the TPC-H @3000GB benchmark. The SPARC T4-4 server is 36% better in price performance compared to the IBM Power 780 server on the TPC-H @3000GB Benchmark. The SPARC T4-4 server is 29% faster than the IBM Power 780 for data loading. The SPARC T4-4 server is up to 3.4 times faster than the IBM Power 780 server for the Refresh Function. The SPARC T4-4 server with four SPARC T4 processors is 27% faster than the HP ProLiant DL980 G7 server with eight x86 processors on the TPC-H @3000GB benchmark. The SPARC T4-4 server is 52% faster than the HP ProLiant DL980 G7 server for data loading. The SPARC T4-4 server is up to 3.2 times faster than the HP ProLiant DL980 G7 for the Refresh Function. The SPARC T4-4 server achieved a peak IO rate from the Oracle database of 17 GB/sec. This rate was independent of the storage used, as demonstrated by the TPC-H @3000TB benchmark which used twelve Sun Storage 2540-M2 arrays (rotating disk) and the TPC-H @1000TB benchmark which used four Sun Storage F5100 Flash Array devices (flash storage). [*] The SPARC T4-4 server showed linear scaling from TPC-H @1000GB to TPC-H @3000GB. This demonstrates that the SPARC T4-4 server can handle the increasingly larger databases required of DSS systems. [*] The SPARC T4-4 server benchmark results demonstrate a complete solution of building Decision Support Systems including data loading, business questions and refreshing data. Each phase usually has a time constraint and the SPARC T4-4 server shows superior performance during each phase. [*] The TPC believes that comparisons of results published with different scale factors are misleading and discourages such comparisons. Performance Landscape The table lists the leading TPC-H @3000GB results for non-clustered systems. TPC-H @3000GB, Non-Clustered Systems System Processor P/C/T – Memory Composite(QphH) $/perf($/QphH) Power(QppH) Throughput(QthH) Database Available SPARC Enterprise M9000 3.0 GHz SPARC64 VII+ 64/256/256 – 1024 GB 386,478.3 $18.19 316,835.8 471,428.6 Oracle 11g R2 09/22/11 SPARC T4-4 3.0 GHz SPARC T4 4/32/256 – 1024 GB 205,792.0 $4.10 190,325.1 222,515.9 Oracle 11g R2 05/31/12 SPARC Enterprise M9000 2.88 GHz SPARC64 VII 32/128/256 – 512 GB 198,907.5 $15.27 182,350.7 216,967.7 Oracle 11g R2 12/09/10 IBM Power 780 4.1 GHz POWER7 8/32/128 – 1024 GB 192,001.1 $6.37 210,368.4 175,237.4 Sybase 15.4 11/30/11 HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 8/64/128 – 512 GB 162,601.7 $2.68 185,297.7 142,685.6 SQL Server 2008 10/13/10 P/C/T = Processors, Cores, Threads QphH = the Composite Metric (bigger is better) $/QphH = the Price/Performance metric in USD (smaller is better) QppH = the Power Numerical Quantity QthH = the Throughput Numerical Quantity The following table lists data load times and refresh function times during the power run. TPC-H @3000GB, Non-Clustered Systems Database Load & Database Refresh System Processor Data Loading(h:m:s) T4Advan RF1(sec) T4Advan RF2(sec) T4Advan SPARC T4-4 3.0 GHz SPARC T4 04:08:29 1.0x 67.1 1.0x 39.5 1.0x IBM Power 780 4.1 GHz POWER7 05:51:50 1.5x 147.3 2.2x 133.2 3.4x HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 08:35:17 2.1x 173.0 2.6x 126.3 3.2x Data Loading = database load time RF1 = power test first refresh transaction RF2 = power test second refresh transaction T4 Advan = the ratio of time to T4 time Complete benchmark results found at the TPC benchmark website http://www.tpc.org. Configuration Summary and Results Hardware Configuration: SPARC T4-4 server 4 x SPARC T4 3.0 GHz processors (total of 32 cores, 128 threads) 1024 GB memory 8 x internal SAS (8 x 300 GB) disk drives External Storage: 12 x Sun Storage 2540-M2 array storage, each with 12 x 15K RPM 300 GB drives, 2 controllers, 2 GB cache Software Configuration: Oracle Solaris 11 11/11 Oracle Database 11g Release 2 Enterprise Edition Audited Results: Database Size: 3000 GB (Scale Factor 3000) TPC-H Composite: 205,792.0 QphH@3000GB Price/performance: $4.10/QphH@3000GB Available: 05/31/2012 Total 3 year Cost: $843,656 TPC-H Power: 190,325.1 TPC-H Throughput: 222,515.9 Database Load Time: 4:08:29 Benchmark Description The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC. TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system. The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor. Key Points and Best Practices Twelve Sun Storage 2540-M2 arrays were used for the benchmark. Each Sun Storage 2540-M2 array contains 12 15K RPM drives and is connected to a single dual port 8Gb FC HBA using 2 ports. Each Sun Storage 2540-M2 array showed 1.5 GB/sec for sequential read operations and showed linear scaling, achieving 18 GB/sec with twelve Sun Storage 2540-M2 arrays. These were stand alone IO tests. The peak IO rate measured from the Oracle database was 17 GB/sec. Oracle Solaris 11 11/11 required very little system tuning. Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric in which to compare systems. The SPARC T4-4 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle Database parallel processes. Six Sun Storage 2540-M2 arrays were mirrored to another six Sun Storage 2540-M2 arrays on which all of the Oracle database files were placed. IO performance was high and balanced across all the arrays. The TPC-H Refresh Function (RF) simulates periodical refresh portion of Data Warehouse by adding new sales and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are a key for this function and the SPARC T4-4 server outperformed both the IBM POWER7 server and HP ProLiant DL980 G7 server. (See the RF columns above.) See Also Transaction Processing Performance Council (TPC) Home Page Ideas International Benchmark Page SPARC T4-4 Server oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN Sun Storage 2540-M2 Array oracle.com OTN Disclosure Statement TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org. SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 QphH@3000GB, 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB available 10/13/10, 8 processors, 64 cores, 128 threads.

Read the article
Authenticating clients in the new WCF Http stack

- by cibrax

About this time last year, I wrote a couple of posts about how to use the “Interceptors” from the REST starker kit for implementing several authentication mechanisms like “SAML”, “Basic Authentication” or “OAuth” in the WCF Web programming model. The things have changed a lot since then, and Glenn finally put on our hands a new version of the Web programming model that deserves some attention and I believe will help us a lot to build more Http oriented services in the .NET stack. What you can get today from wcf.codeplex.com is a preview with some cool features like Http Processors (which I already discussed here), a new and improved version of the HttpClient library, Dependency injection and better TDD support among others. However, the framework still does not support an standard way of doing client authentication on the services (This is something planned for the upcoming releases I believe). For that reason, moving the existing authentication interceptors to this new programming model was one of the things I did in the last few days. In order to make authentication simple and easy to extend, I first came up with a model based on what I called “Authentication Interceptors”. An authentication interceptor maps to an existing Http authentication mechanism and implements the following interface, public interface IAuthenticationInterceptor{ string Scheme { get; } bool DoAuthentication(HttpRequestMessage request, HttpResponseMessage response, out IPrincipal principal);} An authentication interceptors basically needs to returns the http authentication schema that implements in the property “Scheme”, and implements the authentication mechanism in the method “DoAuthentication”. As you can see, this last method “DoAuthentication” only relies on the HttpRequestMessage and HttpResponseMessage classes, making the testing of this interceptor very simple (There is no need to do some black magic with the WCF context or messages). After this, I implemented a couple of interceptors for supporting basic authentication and brokered authentication with SAML (using WIF) in my services. The following code illustrates how the basic authentication interceptors looks like. public class BasicAuthenticationInterceptor : IAuthenticationInterceptor{ Func<UsernameAndPassword, bool> userValidation; string realm; public BasicAuthenticationInterceptor(Func<UsernameAndPassword, bool> userValidation, string realm) { if (userValidation == null) throw new ArgumentNullException("userValidation"); if (string.IsNullOrEmpty(realm)) throw new ArgumentNullException("realm"); this.userValidation = userValidation; this.realm = realm; } public string Scheme { get { return "Basic"; } } public bool DoAuthentication(HttpRequestMessage request, HttpResponseMessage response, out IPrincipal principal) { string[] credentials = ExtractCredentials(request); if (credentials.Length == 0 || !AuthenticateUser(credentials[0], credentials[1])) { response.StatusCode = HttpStatusCode.Unauthorized; response.Content = new StringContent("Access denied"); response.Headers.WwwAuthenticate.Add(new AuthenticationHeaderValue("Basic", "realm=" + this.realm)); principal = null; return false; } else { principal = new GenericPrincipal(new GenericIdentity(credentials[0]), new string[] {}); return true; } } private string[] ExtractCredentials(HttpRequestMessage request) { if (request.Headers.Authorization != null && request.Headers.Authorization.Scheme.StartsWith("Basic")) { string encodedUserPass = request.Headers.Authorization.Parameter.Trim(); Encoding encoding = Encoding.GetEncoding("iso-8859-1"); string userPass = encoding.GetString(Convert.FromBase64String(encodedUserPass)); int separator = userPass.IndexOf(':'); string[] credentials = new string[2]; credentials[0] = userPass.Substring(0, separator); credentials[1] = userPass.Substring(separator + 1); return credentials; } return new string[] { }; } private bool AuthenticateUser(string username, string password) { var usernameAndPassword = new UsernameAndPassword { Username = username, Password = password }; if (this.userValidation(usernameAndPassword)) { return true; } return false; }} This interceptor receives in the constructor a callback in the form of a Func delegate for authenticating the user and the “realm”, which is required as part of the implementation. The rest is a general implementation of the basic authentication mechanism using standard http request and response messages. I also implemented another interceptor for authenticating a SAML token with WIF. public class SamlAuthenticationInterceptor : IAuthenticationInterceptor{ SecurityTokenHandlerCollection handlers = null; public SamlAuthenticationInterceptor(SecurityTokenHandlerCollection handlers) { if (handlers == null) throw new ArgumentNullException("handlers"); this.handlers = handlers; } public string Scheme { get { return "saml"; } } public bool DoAuthentication(HttpRequestMessage request, HttpResponseMessage response, out IPrincipal principal) { SecurityToken token = ExtractCredentials(request); if (token != null) { ClaimsIdentityCollection claims = handlers.ValidateToken(token); principal = new ClaimsPrincipal(claims); return true; } else { response.StatusCode = HttpStatusCode.Unauthorized; response.Content = new StringContent("Access denied"); principal = null; return false; } } private SecurityToken ExtractCredentials(HttpRequestMessage request) { if (request.Headers.Authorization != null && request.Headers.Authorization.Scheme == "saml") { XmlTextReader xmlReader = new XmlTextReader(new StringReader(request.Headers.Authorization.Parameter)); var col = SecurityTokenHandlerCollection.CreateDefaultSecurityTokenHandlerCollection(); SecurityToken token = col.ReadToken(xmlReader); return token; } return null; }}This implementation receives a “SecurityTokenHandlerCollection” instance as part of the constructor. This class is part of WIF, and basically represents a collection of token managers to know how to handle specific xml authentication tokens (SAML is one of them). I also created a set of extension methods for injecting these interceptors as part of a service route when the service is initialized. var basicAuthentication = new BasicAuthenticationInterceptor((u) => true, "ContactManager");var samlAuthentication = new SamlAuthenticationInterceptor(serviceConfiguration.SecurityTokenHandlers); // use MEF for providing instancesvar catalog = new AssemblyCatalog(typeof(Global).Assembly);var container = new CompositionContainer(catalog);var configuration = new ContactManagerConfiguration(container); RouteTable.Routes.AddServiceRoute<ContactResource>("contact", configuration, basicAuthentication, samlAuthentication);RouteTable.Routes.AddServiceRoute<ContactsResource>("contacts", configuration, basicAuthentication, samlAuthentication); In the code above, I am injecting the basic authentication and saml authentication interceptors in the “contact” and “contacts” resource implementations that come as samples in the code preview. I will use another post to discuss more in detail how the brokered authentication with SAML model works with this new WCF Http bits. The code is available to download in this location.

Read the article
Portal And Content - Content Integration - Best Practices

- by Stefan Krantz

Lately we have seen an increase in projects that have failed to either get user friendly content integration or non satisfactory performance. Our intention is to mitigate any knowledge gap that our previous post might have left you with, therefore this post will repeat some recommendation or reference back to old useful post. Moreover this post will help you understand ground up how to design, architect and implement business enabled, responsive and performing portals with complex requirements on business centric information publishing. Design the Information Model The key to successful portal deployments is Information modeling, it's a key task to understand the use case you designing for, therefore I have designed a set of question you need to ask yourself or your customer: Question: Who will own the content, IT or Business? Answer: BusinessQuestion: Who will publish the content, IT or Business? Answer: BusinessQuestion: Will there be multiple publishers? Answer: YesQuestion: Are the publishers computer scientist?Answer: NoQuestion: How often do the information changes, daily, weekly, monthly?Answer: Daily, weekly If your answers to the questions matches at least 2, we strongly recommend you design your content with following principles: Divide your pages in to logical sections, where each section is marked with its purpose Assign capabilities to each section, does it contain text, images, formatting and/or is it static and is populated through other contextual information Select editor/design element type WYSIWYG - Rich Text Plain Text - non-format text Image - Image object Static List - static list of formatted informationDynamic Data List - assembled information from multiple data files through CMIS query The result of such design map could look like following below examples: Based on the outcome of the required elements in the design column 3 from the left you will now simply design a data model in WebCenter Content - Site Studio by creating a Region Definition structure matching your design requirements.For more information on how to create a Region definition see following post: Region Definition Post - note see instruction 7 for details. Each region definition can now be used to instantiate data files, a data file will hold the actual data for each element in the region definition. Another way you can see this is to compare the region definition as an extension to the metadata model in WebCenter Content for each data file item. Design content templates With a solid dependable information model we can now proceed to template creation and page design, in this phase focuses on how to place the content sections from the region definition on the page via a Content Presenter template. Remember by creating content presenter templates you will leverage the latest and most integrated technology WebCenter has to offer. This phase is much easier since the you already have the information model and design wire-frames to base the logic on, however there is still few considerations to pay attention to: Base the template on ADF and make only necessary exceptions to markup when required Leverage ADF design components for Tabs, Accordions and other similar components, this way the design in the content published areas will comply with other design areas based on custom ADF taskflows There is no performance impact when using meta data or region definition based data All data access regardless of type, metadata or xml data it can be accessed via the Content Presenter - Node. See below for applied examples on how to access data Access metadata property from Document - #{node.propertyMap['myProp'].value}myProp in this example can be for instance (dDocName, dDocTitle, xComments or any other available metadata) Access element data from data file xml - #{node.propertyMap['[Region Definition Name]:[Element name]'].asTextHtml}Region Definition Name is the expect region definition that the current data file is instantiatingElement name is the element value you like to grab from the data file I recommend you read following useful post on content template topic:CMIS queries and template creation - note see instruction 9 for detailsStatic List template rendering For more information on templates:Single Item Content TemplateMulti Item Content TemplateExpression Language Internationalization Considerations When integrating content assets via content presenter you by now probably understand that the content item/data file is wired to the page, what is also pretty common at this stage is that the content item/data file only support one language since its not practical or business friendly to mix that into a complex structure. Therefore you will be left with a very common dilemma that you will have to either build a complete new portal for each locale, which is not an good option! However with little bit of information modeling and clear naming convention this can be addressed. Basically you can simply make sure that all content item/data file are named with a predictable naming convention like "Content1_EN" for the English rendition and "Content1_ES" for the Spanish rendition. This way through simple none complex customizations you will be able to dynamically switch the actual content item/data file just before rendering. By following proposed approach above you not only enable a simple mechanism for internationalized content you also preserve the functionality in the content presenter to support business accessible run-time publishing of information on existing and new pages. I recommend you read following useful post on Internationalization topics:Internationalize with Content Presenter Integrate with Review & Approval processes Today the Review and approval functionality and configuration is based out of WebCenter Content - Criteria Workflows. Criteria Workflows uses the metadata of the checked in document to evaluate if the document is under any review/approval process. So for instance if a Criteria Workflow is configured to force any documents with Version = "2" or "higher" and Content Type is "Instructions", any matching content item version on check in will now enter the workflow before getting released for general access. Few things to consider when configuring Criteria Workflows: Make sure to not trigger on version one for Content Items that are Data Files - if you trigger on version 1 you will not only approve an empty document you will also have a content presenter pointing to a none existing document - since the document will only be available after successful completion of the workflow Approval workflows sometimes requires more complex criteria, the recommendation if that is the case is that the meta data triggering such criteria is automatically populated, this can be achieved through many approaches including Content Profiles Criteria workflows are configured and managed in WebCenter Content Administration Applets where you can configure one or more workflows. When you configured Criteria workflows the Content Presenter will support the editors with the approval process directly inline in the "Contribution mode" of the portal. In addition to approve/reject and details of the task, the content presenter natively support the user to view the current and future version of the change he/she is approving. See below for example: Architectural recommendation To support review&approval processes - minimize the amount of data files per page Each CMIS query can consume significant time depending on the complexity of the query - minimize the amount of CMIS queries per page Use Content Presenter Templates based on ADF - this way you minimize the design considerations and optimize the usage of caching Implement the page in as few Data files as possible - simplifies publishing process, increases performance and simplifies release process Named data file (node) or list of named nodes when integrating to pages increases performance vs. querying for data Named data file (node) or list of named nodes when integrating to pages enables business centric page creation and publishing and reduces the need for IT department interaction Summary Just because one architectural decision solves a business problem it doesn't mean its the right one, when designing portals all architecture has to be in harmony and not impacting each other. For instance the most technical complex solution is not always the best since it will most likely defeat the business accessibility, performance or both, therefore the best approach is to first design for simplicity that even a non-technical user can operate, after that consider the performance impact and final look at the technology challenges these brings and workaround them first with out-of-the-box features, after that design and develop functions to complement the short comings.

Read the article
Fun with Aggregates

- by Paul White

There are interesting things to be learned from even the simplest queries. For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join. The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units. The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too? To answer that, let’s look at some other ways to formulate this query. This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones. The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above. It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join! The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join. In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting. The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set. In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL). To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709; -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case. The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate. This is something I would like to see exposed in show plan so I suggested it on Connect. Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all. Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans. Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :) The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier. What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too). Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places. It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates. As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

Read the article
How to create a simple adf dashboard application with EJB 3.0

- by Rodrigues, Raphael

In this month's Oracle Magazine, Frank Nimphius wrote a very good article about an Oracle ADF Faces dashboard application to support persistent user personalization. You can read this entire article clicking here. The idea in this article is to extend the dashboard application. My idea here is to create a similar dashboard application, but instead ADF BC model layer, I'm intending to use EJB3.0. There are just a one small trick here and I'll show you. I'm using the HR usual oracle schema. The steps are: 1. Create a ADF Fusion Application with EJB as a layer model 2. Generate the entities from table (I'm using Department and Employees only) 3. Create a new Session Bean. I called it: HRSessionEJB 4. Create a new method like that: public List getAllDepartmentsHavingEmployees(){ JpaEntityManager jpaEntityManager = (JpaEntityManager)em.getDelegate(); Query query = jpaEntityManager.createNamedQuery("Departments.allDepartmentsHavingEmployees"); JavaBeanResult.setQueryResultClass(query, AggregatedDepartment.class); return query.getResultList(); } 5. In the Departments entity, create a new native query annotation: @Entity @NamedQueries( { @NamedQuery(name = "Departments.findAll", query = "select o from Departments o") }) @NamedNativeQueries({ @NamedNativeQuery(name="Departments.allDepartmentsHavingEmployees", query = "select e.department_id, d.department_name , sum(e.salary), avg(e.salary) , max(e.salary), min(e.salary) from departments d , employees e where d.department_id = e.department_id group by e.department_id, d.department_name")}) public class Departments implements Serializable {...} 6. Create a new POJO called AggregatedDepartment: package oramag.sample.dashboard.model; import java.io.Serializable; import java.math.BigDecimal; public class AggregatedDepartment implements Serializable{ @SuppressWarnings("compatibility:5167698678781240729") private static final long serialVersionUID = 1L; private BigDecimal departmentId; private String departmentName; private BigDecimal sum; private BigDecimal avg; private BigDecimal max; private BigDecimal min; public AggregatedDepartment() { super(); } public AggregatedDepartment(BigDecimal departmentId, String departmentName, BigDecimal sum, BigDecimal avg, BigDecimal max, BigDecimal min) { super(); this.departmentId = departmentId; this.departmentName = departmentName; this.sum = sum; this.avg = avg; this.max = max; this.min = min; } public void setDepartmentId(BigDecimal departmentId) { this.departmentId = departmentId; } public BigDecimal getDepartmentId() { return departmentId; } public void setDepartmentName(String departmentName) { this.departmentName = departmentName; } public String getDepartmentName() { return departmentName; } public void setSum(BigDecimal sum) { this.sum = sum; } public BigDecimal getSum() { return sum; } public void setAvg(BigDecimal avg) { this.avg = avg; } public BigDecimal getAvg() { return avg; } public void setMax(BigDecimal max) { this.max = max; } public BigDecimal getMax() { return max; } public void setMin(BigDecimal min) { this.min = min; } public BigDecimal getMin() { return min; } } 7. Create the util java class called JavaBeanResult. The function of this class is to configure a native SQL query to return POJOs in a single line of code using the utility class. Credits: http://onpersistence.blogspot.com.br/2010/07/eclipselink-jpa-native-constructor.html package oramag.sample.dashboard.model.util; /******************************************************************************* * Copyright (c) 2010 Oracle. All rights reserved. * This program and the accompanying materials are made available under the * terms of the Eclipse Public License v1.0 and Eclipse Distribution License v. 1.0 * which accompanies this distribution. * The Eclipse Public License is available at http://www.eclipse.org/legal/epl-v10.html * and the Eclipse Distribution License is available at * http://www.eclipse.org/org/documents/edl-v10.php. * * @author shsmith ******************************************************************************/ import java.lang.reflect.Constructor; import java.lang.reflect.InvocationTargetException; import java.util.ArrayList; import java.util.List; import javax.persistence.Query; import org.eclipse.persistence.exceptions.ConversionException; import org.eclipse.persistence.internal.helper.ConversionManager; import org.eclipse.persistence.internal.sessions.AbstractRecord; import org.eclipse.persistence.internal.sessions.AbstractSession; import org.eclipse.persistence.jpa.JpaHelper; import org.eclipse.persistence.queries.DatabaseQuery; import org.eclipse.persistence.queries.QueryRedirector; import org.eclipse.persistence.sessions.Record; import org.eclipse.persistence.sessions.Session; /*** * This class is a simple query redirector that intercepts the result of a * native query and builds an instance of the specified JavaBean class from each * result row. The order of the selected columns musts match the JavaBean class * constructor arguments order. * * To configure a JavaBeanResult on a native SQL query use: * JavaBeanResult.setQueryResultClass(query, SomeBeanClass.class); * where query is either a JPA SQL Query or native EclipseLink DatabaseQuery. * * @author shsmith * */ public final class JavaBeanResult implements QueryRedirector { private static final long serialVersionUID = 3025874987115503731L; protected Class resultClass; public static void setQueryResultClass(Query query, Class resultClass) { JavaBeanResult javaBeanResult = new JavaBeanResult(resultClass); DatabaseQuery databaseQuery = JpaHelper.getDatabaseQuery(query); databaseQuery.setRedirector(javaBeanResult); } public static void setQueryResultClass(DatabaseQuery query, Class resultClass) { JavaBeanResult javaBeanResult = new JavaBeanResult(resultClass); query.setRedirector(javaBeanResult); } protected JavaBeanResult(Class resultClass) { this.resultClass = resultClass; } @SuppressWarnings("unchecked") public Object invokeQuery(DatabaseQuery query, Record arguments, Session session) { List results = new ArrayList(); try { Constructor[] constructors = resultClass.getDeclaredConstructors(); Constructor javaBeanClassConstructor = null; // (Constructor) resultClass.getDeclaredConstructors()[0]; Class[] constructorParameterTypes = null; // javaBeanClassConstructor.getParameterTypes(); List rows = (List) query.execute( (AbstractSession) session, (AbstractRecord) arguments); for (Object[] columns : rows) { boolean found = false; for (Constructor constructor : constructors) { javaBeanClassConstructor = constructor; constructorParameterTypes = javaBeanClassConstructor.getParameterTypes(); if (columns.length == constructorParameterTypes.length) { found = true; break; } // if (columns.length != constructorParameterTypes.length) { // throw new ColumnParameterNumberMismatchException( // resultClass); // } } if (!found) throw new ColumnParameterNumberMismatchException( resultClass); Object[] constructorArgs = new Object[constructorParameterTypes.length]; for (int j = 0; j < columns.length; j++) { Object columnValue = columns[j]; Class parameterType = constructorParameterTypes[j]; // convert the column value to the correct type--if possible constructorArgs[j] = ConversionManager.getDefaultManager() .convertObject(columnValue, parameterType); } results.add(javaBeanClassConstructor.newInstance(constructorArgs)); } } catch (ConversionException e) { throw new ColumnParameterMismatchException(e); } catch (IllegalArgumentException e) { throw new ColumnParameterMismatchException(e); } catch (InstantiationException e) { throw new ColumnParameterMismatchException(e); } catch (IllegalAccessException e) { throw new ColumnParameterMismatchException(e); } catch (InvocationTargetException e) { throw new ColumnParameterMismatchException(e); } return results; } public final class ColumnParameterMismatchException extends RuntimeException { private static final long serialVersionUID = 4752000720859502868L; public ColumnParameterMismatchException(Throwable t) { super( "Exception while processing query results-ensure column order matches constructor parameter order", t); } } public final class ColumnParameterNumberMismatchException extends RuntimeException { private static final long serialVersionUID = 1776794744797667755L; public ColumnParameterNumberMismatchException(Class clazz) { super( "Number of selected columns does not match number of constructor arguments for: " + clazz.getName()); } } } 8. Create the DataControl and a jsf or jspx page 9. Drag allDepartmentsHavingEmployees from DataControl and drop in your page 10. Choose Graph > Type: Bar (Normal) > any layout 11. In the wizard screen, Bars label, adds: sum, avg, max, min. In the X Axis label, adds: departmentName, and click in OK button 12. Run the page, the result is showed below: You can download the workspace here . It was using the latest jdeveloper version 11.1.2.2.

Read the article
SQL SERVER – SSMS: Disk Usage Report

- by Pinal Dave

Let us start with humor! I think we the series on various reports, we come to a logical point. We covered all the reports at server level. This means the reports we saw were targeted towards activities that are related to instance level operations. These are mostly like how a doctor diagnoses a patient. At this point I am reminded of a dialog which I read somewhere: Patient: Doc, It hurts when I touch my head. Doc: Ok, go on. What else have you experienced? Patient: It hurts even when I touch my eye, it hurts when I touch my arms, it even hurts when I touch my feet, etc. Doc: Hmmm … Patient: I feel it hurts when I touch anywhere in my body. Doc: Ahh … now I get it. You need a plaster to your finger John. Sometimes the server level gives an indicator to what is happening in the system, but we need to get to the root cause for a specific database. So, this is the first blog in series where we would start discussing about database level reports. To launch database level reports, expand selected server in Object Explorer, expand the Databases folder, and then right-click any database for which we want to look at reports. From the menu, select Reports, then Standard Reports, and then any of database level reports. In this blog, we would talk about four “disk” reports because they are similar: Disk Usage Disk Usage by Top Tables Disk Usage by Table Disk Usage by Partition Disk Usage This report shows multiple information about the database. Let us discuss them one by one. We have divided the output into 5 different sections. Section 1 shows the high level summary of the database. It shows the space used by database files (mdf and ldf). Under the hood, the report uses, various DMVs and DBCC Commands, it is using sys.data_spaces and DBCC SHOWFILESTATS. Section 2 and 3 are pie charts. One for data file allocation and another for the transaction log file. Pie chart for “Data Files Space Usage (%)” shows space consumed data, indexes, allocated to the SQL Server database, and unallocated space which is allocated to the SQL Server database but not yet filled with anything. “Transaction Log Space Usage (%)” used DBCC SQLPERF (LOGSPACE) and shows how much empty space we have in the physical transaction log file. Section 4 shows the data from Default Trace and looks at Event IDs 92, 93, 94, 95 which are for “Data File Auto Grow”, “Log File Auto Grow”, “Data File Auto Shrink” and “Log File Auto Shrink” respectively. Here is an expanded view for that section. If default trace is not enabled, then this section would be replaced by the message “Trace Log is disabled” as highlighted below. Section 5 of the report uses DBCC SHOWFILESTATS to get information. Here is the enhanced version of that section. This shows the physical layout of the file. In case you have In-Memory Objects in the database (from SQL Server 2014), then report would show information about those as well. Here is the screenshot taken for a different database, which has In-Memory table. I have highlighted new things which are only shown for in-memory database. The new sections which are highlighted above are using sys.dm_db_xtp_checkpoint_files, sys.database_files and sys.data_spaces. The new type for in-memory OLTP is ‘FX’ in sys.data_space. The next set of reports is targeted to get information about a table and its storage. These reports can answer questions like: Which is the biggest table in the database? How many rows we have in table? Is there any table which has a lot of reserved space but its unused? Which partition of the table is having more data? Disk Usage by Top Tables This report provides detailed data on the utilization of disk space by top 1000 tables within the Database. The report does not provide data for memory optimized tables. Disk Usage by Table This report is same as earlier report with few difference. First Report shows only 1000 rows First Report does order by values in DMV sys.dm_db_partition_stats whereas second one does it based on name of the table. Both of the reports have interactive sort facility. We can click on any column header and change the sorting order of data. Disk Usage by Partition This report shows the distribution of the data in table based on partition in the table. This is so similar to previous output with the partition details now. Here is the query taken from profiler. SELECT row_number() OVER (ORDER BY a1.used_page_count DESC, a1.index_id) AS row_number , (dense_rank() OVER (ORDER BY a5.name, a2.name))%2 AS l1 , a1.OBJECT_ID , a5.name AS [schema] , a2.name , a1.index_id , a3.name AS index_name , a3.type_desc , a1.partition_number , a1.used_page_count * 8 AS total_used_pages , a1.reserved_page_count * 8 AS total_reserved_pages , a1.row_count FROM sys.dm_db_partition_stats a1 INNER JOIN sys.all_objects a2 ON ( a1.OBJECT_ID = a2.OBJECT_ID) AND a1.OBJECT_ID NOT IN (SELECT OBJECT_ID FROM sys.tables WHERE is_memory_optimized = 1) INNER JOIN sys.schemas a5 ON (a5.schema_id = a2.schema_id) LEFT OUTER JOIN sys.indexes a3 ON ( (a1.OBJECT_ID = a3.OBJECT_ID) AND (a1.index_id = a3.index_id) ) WHERE (SELECT MAX(DISTINCT partition_number) FROM sys.dm_db_partition_stats a4 WHERE (a4.OBJECT_ID = a1.OBJECT_ID)) >= 1 AND a2.TYPE <> N'S' AND a2.TYPE <> N'IT' ORDER BY a5.name ASC, a2.name ASC, a1.index_id, a1.used_page_count DESC, a1.partition_number Using all of the above reports, you should be able to get the usage of database files and also space used by tables. I think this is too much disk information for a single blog and I hope you have used them in the past to get data. Do let me know if you found anything interesting using these reports in your environments. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
Building KPIs to monitor your business Its not really about the Technology

When I have discussions with people about Business Intelligence, one of the questions the inevitably come up is about building KPIs and how to accomplish that. From a technical level the concept of a KPI is very simple, almost too simple in that it is like the tip of an iceberg floating above the water. The key to that iceberg is not really the tip, but the mass of the iceberg that is hidden beneath the surface upon which the tip sits. The analogy of the iceberg is not meant to indicate that the foundation of the KPI is overly difficult or complex. The disparity in size in meant to indicate that the larger thing that needs to be defined is not the technical tip, but the underlying business definition of what the KPI means. From a technical perspective the KPI consists of primarily the following items: Actual Value This is the actual value data point that is being measured. An example would be something like the amount of sales. Target Value This is the target goal for the KPI. This is a number that can be measured against Actual Value. An example would be $10,000 in monthly sales. Target Indicator Range This is the definition of ranges that define what type of indicator the user will see comparing the Actual Value to the Target Value. Most often this is defined by stoplight, but can be any indicator that is going to show a status in a quick fashion to the user. Typically this would be something like: Red Light = Actual Value more than 5% below target; Yellow Light = Within 5% of target either direction; Green Light = More than 5% higher than Target Value Status\Trend Indicator This is an optional attribute of a KPI that is typically used to show some kind of trend. The vast majority of these indicators are used to show some type of progress against a previous period. As an example, the status indicator might be used to show how the monthly sales compare to last month. With this type of indicator there needs to be not only a definition of what the ranges are for your status indictor, but then also what value the number needs to be compared against. So now we have an idea of what data points a KPI consists of from a technical perspective lets talk a bit about tools. As you can see technically there is not a whole lot to them and the choice of technology is not as important as the definition of the KPIs, which we will get to in a minute. There are many different types of tools in the Microsoft BI stack that you can use to expose your KPI to the business. These include Performance Point, SharePoint, Excel, and SQL Reporting Services. There are pluses and minuses to each technology and the right technology is based a lot on your goals and how you want to deliver the information to the users. Additionally, there are other non-Microsoft tools that can be used to expose KPI indicators to your business users. Regardless of the technology used as your front end, the heavy lifting of KPI is in the business definition of the values and benchmarks for that KPI. The discussion about KPIs is very dependent on the history of an organization and how much they are exposed to the attributes of a KPI. Often times when discussing KPIs with a business contact who has not been exposed to KPIs the discussion tends to also be a session educating the business user about what a KPI is and what goes into the definition of a KPI. The majority of times the business user has an idea of what their actual values are and they have been tracking those numbers for some time, generally in Excel and all manually. So they will know the amount of sales last month along with sales two years ago in the same month. Where the conversation tends to get stuck is when you start discussing what the target value should be. The actual value is answering the What and How much questions. When you are talking about the Target values you are asking the question Is this number good or bad. Typically, the user will know whether or not the value is good or bad, but most of the time they are not able to quantify what is good or bad. Their response is usually something like I just know. Because they have been watching the sales quantity for years now, they can tell you that a 5% decrease in sales this month might actually be a good thing, maybe because the salespeople are all waiting until next month when the new versions come out. It can sometimes be very hard to break the business people of this habit. One of the fears generally is that the status indicator is not subjective. Thus, in the scenario above, the business user is going to be fearful that their boss, just looking at a negative red indicator, is going to haul them out to the woodshed for a bad month. But, on the flip side, if all you are displaying is the amount of sales, only a person with knowledge of last month sales and the target amount for this month would have any idea if $10,000 in sales is good or not. Here is where a key point about KPIs needs to be communicated to both the business user and any user who might be viewing the results of that KPI. The KPI is just one tool that is used to report on business performance. The KPI is meant as a quick indicator of one business statistic. It is not meant to tell the entire story. It does not answer the question Why. Its primary purpose is to objectively and quickly expose an area of the business that might warrant more review. There is always going to be the need to do further analysis on any potential negative or neutral KPI. So, hopefully, once you have convinced your business user to come up with some target numbers and ranges for status indicators, you then need to take the next step and help them answer the Why question. The main question here to ask is, Okay, you see the indicator and you need to discover why the number is what is, where do you go?. The answer is usually a combination of sources. A sales manager might have some of the following items at their disposal (Marketing report showing a decrease in the promotional discounts for the month, Pricing Report showing the reduction of prices of older models, an Inventory Report showing the discontinuation of a particular product line, or a memo showing the ending of a large affiliate partnership. The answers to the question Why are never as simple as a single indicator value. Bring able to quickly get to this information is all about designing how a user accesses the KPIs and then also how easily they can get to the additional information they need. This is where a Dashboard mentality can come in handy. For example, the business user can have a dashboard that shows their KPIs, but also has links to some of the common reports that they run regarding Sales Data. The users boss may have the same KPIs on their dashboard, but instead of links to individual reports they are going to have a link to a status report that was created by the user that pulls together all the data about the KPI in a summary format the users boss can review. So some of the key things to think about when building or evaluating KPIs for your organization: Technology should not be the driving factor KPIs are of little value without some indicator for whether a value is good, bad or neutral. KPIs only give an answer to the Is this number good\bad? question Make sure the ability to drill into the Why of a KPI is close at hand and relevant to the user who is viewing the KPI. The KPI is a key business tool when defined properly to help monitor business performance across the enterprise in an objective and consistent manner. At times it might feel like the process of defining the business aspects of a KPI can sometimes be arduous, the payoff in the end can far outweigh the costs. Some of the benefits of going through this process are a better understanding of the key metrics for an organization and the measure of those metrics and a consistent snapshot of business performance that can be utilized across the organization. And I think that these are benefits to any organization regardless of the technology or the implementation.Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
New Feature in ODI 11.1.1.6: ODI for Big Data

- by Julien Testut

Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} By Ananth Tirupattur Starting with Oracle Data Integrator 11.1.1.6.0, ODI is offering a solution to process Big Data. This post provides an overview of this feature. With all the buzz around Big Data and before getting into the details of ODI for Big Data, I will provide a brief introduction to Big Data and Oracle Solution for Big Data. So, what is Big Data? Big data includes: structured data (this includes data from relation data stores, xml data stores), semi-structured data (this includes data from weblogs) unstructured data (this includes data from text blob, images) Traditionally, business decisions are based on the information gathered from transactional data. For example, transactional Data from CRM applications is fed to a decision system for analysis and decision making. Products such as ODI play a key role in enabling decision systems. However, with the emergence of massive amounts of semi-structured and unstructured data it is important for decision system to include them in the analysis to achieve better decision making capability. While there is an abundance of opportunities for business for gaining competitive advantages, process of Big Data has challenges. The challenges of processing Big Data include: Volume of data Velocity of data - The high Rate at which data is generated Variety of data In order to address these challenges and convert them into opportunities, we would need an appropriate framework, platform and the right set of tools. Hadoop is an open source framework which is highly scalable, fault tolerant system, for storage and processing large amounts of data. Hadoop provides 2 key services, distributed and reliable storage called Hadoop Distributed File System or HDFS and a framework for parallel data processing called Map-Reduce. Innovations in Hadoop and its related technology continue to rapidly evolve, hence therefore, it is highly recommended to follow information on the web to keep up with latest information. Oracle's vision is to provide a comprehensive solution to address the challenges faced by Big Data. Oracle is providing the necessary Hardware, software and tools for processing Big Data Oracle solution includes: Big Data Appliance Oracle NoSQL Database Cloudera distribution for Hadoop Oracle R Enterprise- R is a statistical package which is very popular among data scientists. ODI solution for Big Data Oracle Loader for Hadoop for loading data from Hadoop to Oracle. Further details can be found here: http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html ODI Solution for Big Data: ODI’s goal is to minimize the need to understand the complexity of Hadoop framework and simplify the adoption of processing Big Data seamlessly in an enterprise. ODI is providing the capabilities for an integrated architecture for processing Big Data. This includes capability to load data in to Hadoop, process data in Hadoop and load data from Hadoop into Oracle. ODI is expanding its support for Big Data by providing the following out of the box Knowledge Modules (KMs). IKM File to Hive (LOAD DATA).Load unstructured data from File (Local file system or HDFS ) into Hive IKM Hive Control AppendTransform and validate structured data on Hive IKM Hive TransformTransform unstructured data on Hive IKM File/Hive to Oracle (OLH)Load processed data in Hive to Oracle RKM HiveReverse engineer Hive tables to generate models Using the Loading KM you can map files (local and HDFS files) to the corresponding Hive tables. For example, you can map weblog files categorized by date into a corresponding partitioned Hive table schema. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Using the Hive control Append KM you can validate and transform data in Hive. In the below example, two source Hive tables are joined and mapped to a target Hive table. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} The Hive Transform KM facilitates processing of semi-structured data in Hive. In the below example, the data from weblog is processed using a Perl script and mapped to target Hive table. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Using the Oracle Loader for Hadoop (OLH) KM you can load data from Hive table or HDFS to a corresponding table in Oracle. OLH is available as a standalone product. ODI greatly enhances OLH capability by generating the configuration and mapping files for OLH based on the configuration provided in the interface and KM options. ODI seamlessly invokes OLH when executing the scenario. In the below example, a HDFS file is mapped to a table in Oracle. Development and Deployment:The following diagram illustrates the development and deployment of ODI solution for Big Data. Using the ODI Studio on your development machine create and develop ODI solution for processing Big Data by connecting to a MySQL DB or Oracle database on a BDA machine or Hadoop cluster. Schedule the ODI scenarios to be executed on the ODI agent deployed on the BDA machine or Hadoop cluster. ODI Solution for Big Data provides several exciting new capabilities to facilitate the adoption of Big Data in an enterprise. You can find more information about the Oracle Big Data connectors on OTN. You can find an overview of all the new features introduced in ODI 11.1.1.6 in the following document: ODI 11.1.1.6 New Features Overview

Read the article
Silverlight for Everyone!!

- by subodhnpushpak

Someone asked me to compare Silverlight / HTML development. I realized that the question can be answered in many ways: Below is the high level comparison between a HTML /JavaScript client and Silverlight client and why silverlight was chosen over HTML / JavaScript client (based on type of users and major functionalities provided): 1. For end users Browser compatibility Silverlight is a plug-in and requires installation first. However, it does provides consistent look and feel across all browsers. For HTML / DHTML, there is a need to tweak JavaScript for each of the browser supported. In fact, tags like <span> and <div> works differently on different browser / version. So, HTML works on most of the systems but also requires lot of efforts coding-wise to adhere to all standards/ browsers / versions. Out of browser support No support in HTML. Third party tools like Google gears offers some functionalities but there are lots of issues around platform and accessibility. Out of box support for out-of-browser support. provides features like drag and drop onto application surface. Cut and copy paste in HTML HTML is displayed in browser; which, in turn provides facilities for cut copy and paste. Silverlight (specially 4) provides rich features for cut-copy-paste along with full control over what can be cut copy pasted by end users and .advanced features like visual tree printing. Rich user experience HTML can provide some rich experience by use of some JavaScript libraries like JQuery. However, extensive use of JavaScript combined with various versions of browsers and the supported JavaScript makes the solution cumbersome. Silverlight is meant for RIA experience. User data storage on client end In HTML only small amount of data can be stored that too in cookies. In Silverlight large data may be stored, that too in secure way. This increases the response time. Post back In HTML / JavaScript the post back can be stopped by use of AJAX. Extensive use of AJAX can be a bottleneck as browser stack is used for the calls. Both look and feel and data travel over network. In Silverlight everything run the client side. Calls are made to server ONLY for data; which also reduces network traffic in long run. 2. For Developers Coding effort HTML / JavaScript can take considerable amount to code if features (requirements) are rich. For AJAX like interfaces; knowledge of third party kits like DOJO / Yahoo UI / JQuery is required which has steep learning curve. ASP .Net coding world revolves mostly along <table> tags for alignments whereas most popular tools provide <div> tags; which requires lots of tweaking. AJAX calls can be a bottlenecks for performance, if the calls are many. In Silverlight; coding is in C#, which is managed code. XAML is also very intuitive and Blend can be used to provide look and feel. Event handling is much clean than in JavaScript. Provides for many clean patterns like MVVM and composable application. Each call to server is asynchronous in silverlight. AJAX is in built into silverlight. Threading can be done at the client side itself to provide for better responsiveness; etc. Debugging Debugging in HTML / JavaScript is difficult. As JavaScript is interpreted; there is NO compile time error handling. Debugging in Silverlight is very helpful. As it is compiled; it provides rich features for both compile time and run time error handling. Multi -targeting browsers HTML / JavaScript have different rendering behaviours in different browsers / and their versions. JavaScript have to be written to sublime the differences in browser behaviours. Silverlight works exactly the same in all browsers and works on almost all popular browser. Multi-targeting desktop No support in HTML / JavaScript Silverlight is very close to WPF. Bot the platform may be easily targeted while maintaining the same source code. Rich toolkit HTML /JavaScript have limited toolkit as controls Silverlight provides a rich set of controls including graphs, audio, video, layout, etc. 3. For Architects Design Patterns Silverlight provides for patterns like MVVM (MVC) and rich (fat) client architecture. This segregates the "separation of concern" very clearly. Client (silverlight) does what it is expected to do and server does what it is expected of. In HTML / JavaScript world most of the processing is done on the server side. Extensibility Silverlight provides great deal of extensibility as custom controls may be made. Extensibility is NOT restricted by browser but by the plug-in silverlight runs in. HTML / JavaScript works in a certain way and extensibility is generally done on the server side rather than client end. Client side is restricted by the limitations of the browser. Performance Silverlight provides localized storage which may be used for cached data storage. this reduces the response time. As processing can be done on client side itself; there is no need for server round trips. this decreases the round about time. Look and feel of the application is downloaded ONLY initially, afterwards ONLY data is fetched form the server. Security Silverlight is compiled code downloaded as .XAP; As compared to HTML / JavaScript, it provides more secure sandboxed approach. Cross - scripting is inherently prohibited in silverlight by default. If proper guidelines are followed silverlight provides much robust security mechanism as against HTML / JavaScript world. For example; knowing server Address in obfuscated JavaScript is easier than a compressed compiled obfuscated silverlight .XAP file. Some of these like (offline and Canvas support) will be available in HTML5. However, the timelines are not encouraging at all. According to Ian Hickson, editor of the HTML5 specification, the specification to reach the W3C Candidate Recommendation stage during 2012, and W3C Recommendation in the year 2022 or later. see http://en.wikipedia.org/wiki/HTML5 for details. The above is MY opinion. I will love to hear yours; do let me know via comments. Technorati Tags: Silverlight

Read the article
SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

- by Pinal Dave

Update: This blog post is written based on the SafePeak, which is available for free download. Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. Safepeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications. I’d like to look more closely at some of these options, as some of these capabilities could help you address lagging database and performance on your systems. To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”. Basic Caching Basic Caching (or the stale and static cache) is an ability to put the results from a query into cache for a certain period of time. It is based on TTL, or Time-to-live, and is designed to stay in cache no matter what happens to the data. For example, although the actual data can be modified due to DML commands (update/insert/delete), the cache will still hold the same obsolete query data. Meaning that with the Basic Caching is really static / stale cache. As you can tell, this approach has its limitations. Dynamic Caching Dynamic Caching (or the non-stale cache) is an ability to put the results from a query into cache while maintaining the cache transaction awareness looking for possible data modifications. The modifications can come as a result of: DML commands (update/insert/delete), indirect modifications due to triggers on other tables, executions of stored procedures with internal DML commands complex cases of stored procedures with multiple levels of internal stored procedures logic. When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) become the actual data updates commands. Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper. SafePeak: A comprehensive and versatile caching platform SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and Data managers to reach excellent performance acceleration and application scalability for a wide range of business cases and applications. Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any types of data, including Blobs, XMLs, Texts as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, identifying directly and indirectly accessed tables, views, functions and stored procedures; Automated caching of Stored Procedures: Fully or semi-automated caching of all read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s Metadata-Learning process, their SQL schemas are parsed – resulting with a full understanding of the underlying code, objects dependencies (tables, views, functions, sub-procedures) enabling automated or semi-automated (manually review and activate by a mouse-click) cache activation, with full understanding of the transaction logic for cache real-time invalidation; Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs); Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases; Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching; Semi-Automated Caching: SQL Patterns categorized as “Read and Non deterministic” are patterns of SQL queries and stored procedures that contain reference to non-deterministic functions, like getdate(). Such SQL Patterns are reviewed by the SafePeak administrator and in usually most of them are activated manually for caching (point and click activation); Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting; Semi Dynamic Caching: A manual cache configuration option enabling reducing the sensitivity of specific SQL Patterns to “write” commands to certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archive order details), or when the application sensitivity to fresh data is not critical and can be stale for short period of time (gaining better performance and reduced load); Scheduled Cache Eviction: A manual cache configuration option enabling scheduling SQL Pattern cache eviction based on certain time(s) during a day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: “select customers that today is their birthday”, an SQL with getdate() function, which can and should be cached, but the data stays relevant only until 00:00 (midnight); Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to too complex dynamic SQL or unfamiliar syntax), are signed as “Dynamic Objects” with highest transaction safety settings (such as: Full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness, provides easy configuration interface, allowing you to improve cache hits and reduce cache global evictions. Usually this is the first configuration in a deployment; Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure; Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedure (with lists of parameters) for periodically automated pre-fetching and caching. An advanced tool allowing you to handle more rare but very performance sensitive queries pre-fetch them into cache allowing high performance for users’ data access; Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with database objects and SQL Patterns heat-map. The performance driven configuration helps you to focus on the most important settings that bring you the highest performance gains. Use of SafePeak SQL Analytics allows continuous performance monitoring and analysis, easy identification of bottlenecks of both real-time and historical data; Cloud Ready: Available for instant deployment on Amazon Web Services (AWS). As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started. You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
C#: LINQ vs foreach - Round 1.

- by James Michael Hare

So I was reading Peter Kellner's blog entry on Resharper 5.0 and its LINQ refactoring and thought that was very cool. But that raised a point I had always been curious about in my head -- which is a better choice: manual foreach loops or LINQ? The answer is not really clear-cut. There are two sides to any code cost arguments: performance and maintainability. The first of these is obvious and quantifiable. Given any two pieces of code that perform the same function, you can run them side-by-side and see which piece of code performs better. Unfortunately, this is not always a good measure. Well written assembly language outperforms well written C++ code, but you lose a lot in maintainability which creates a big techncial debt load that is hard to offset as the application ages. In contrast, higher level constructs make the code more brief and easier to understand, hence reducing technical cost. Now, obviously in this case we're not talking two separate languages, we're comparing doing something manually in the language versus using a higher-order set of IEnumerable extensions that are in the System.Linq library. Well, before we discuss any further, let's look at some sample code and the numbers. First, let's take a look at the for loop and the LINQ expression. This is just a simple find comparison: // find implemented via LINQ public static bool FindViaLinq(IEnumerable<int> list, int target) { return list.Any(item => item == target); } // find implemented via standard iteration public static bool FindViaIteration(IEnumerable<int> list, int target) { foreach (var i in list) { if (i == target) { return true; } } return false; } Okay, looking at this from a maintainability point of view, the Linq expression is definitely more concise (8 lines down to 1) and is very readable in intention. You don't have to actually analyze the behavior of the loop to determine what it's doing. So let's take a look at performance metrics from 100,000 iterations of these methods on a List<int> of varying sizes filled with random data. For this test, we fill a target array with 100,000 random integers and then run the exact same pseudo-random targets through both searches. List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower Any 10 26 0.00046 30.00% Iteration 10 20 0.00023 - Any 100 116 0.00201 18.37% Iteration 100 98 0.00118 - Any 1000 1058 0.01853 16.78% Iteration 1000 906 0.01155 - Any 10,000 10,383 0.18189 17.41% Iteration 10,000 8843 0.11362 - Any 100,000 104,004 1.8297 18.27% Iteration 100,000 87,941 1.13163 - The LINQ expression is running about 17% slower for average size collections and worse for smaller collections. Presumably, this is due to the overhead of the state machine used to track the iterators for the yield returns in the LINQ expressions, which seems about right in a tight loop such as this. So what about other LINQ expressions? After all, Any() is one of the more trivial ones. I decided to try the TakeWhile() algorithm using a Count() to get the position stopped like the sample Pete was using in his blog that Resharper refactored for him into LINQ: // Linq form public static int GetTargetPosition1(IEnumerable<int> list, int target) { return list.TakeWhile(item => item != target).Count(); } // traditionally iterative form public static int GetTargetPosition2(IEnumerable<int> list, int target) { int count = 0; foreach (var i in list) { if(i == target) { break; } ++count; } return count; } Once again, the LINQ expression is much shorter, easier to read, and should be easier to maintain over time, reducing the cost of technical debt. So I ran these through the same test data: List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Iteration 100,000 81635 0.81635 - Wow! I expected some overhead due to the state machines iterators produce, but 90% slower? That seems a little heavy to me. So then I thought, well, what if TakeWhile() is not the right tool for the job? The problem is TakeWhile returns each item for processing using yield return, whereas our for-loop really doesn't care about the item beyond using it as a stop condition to evaluate. So what if that back and forth with the iterator state machine is the problem? Well, we can quickly create an (albeit ugly) lambda that uses the Any() along with a count in a closure (if a LINQ guru knows a better way PLEASE let me know!), after all , this is more consistent with what we're trying to do, we're trying to find the first occurence of an item and halt once we find it, we just happen to be counting on the way. This mostly matches Any(). // a new method that uses linq but evaluates the count in a closure. public static int TakeWhileViaLinq2(IEnumerable<int> list, int target) { int count = 0; list.Any(item => { if(item == target) { return true; } ++count; return false; }); return count; } Now how does this one compare? List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Any w/Closure 10 23 0.00023 28% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Any w/Closure 100 116 0.00116 27% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Any w/Closure 1000 1101 0.01101 33% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Any w/Closure 10,000 10802 0.10802 32% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Any w/Closure 100,000 108378 1.08378 33% Iteration 100,000 81635 0.81635 - Much better! It seems that the overhead of TakeAny() returning each item and updating the state in the state machine is drastically reduced by using Any() since Any() iterates forward until it finds the value we're looking for -- for the task we're attempting to do. So the lesson there is, make sure when you use a LINQ expression you're choosing the best expression for the job, because if you're doing more work than you really need, you'll have a slower algorithm. But this is true of any choice of algorithm or collection in general. Even with the Any() with the count in the closure it is still about 30% slower, but let's consider that angle carefully. For a list of 100,000 items, it was the difference between 1.01 ms and 0.82 ms roughly in a List<T>. That's really not that bad at all in the grand scheme of things. Even running at 90% slower with TakeWhile(), for the vast majority of my projects, an extra millisecond to save potential errors in the long term and improve maintainability is a small price to pay. And if your typical list is 1000 items or less we're talking only microseconds worth of difference. It's like they say: 90% of your performance bottlenecks are in 2% of your code, so over-optimizing almost never pays off. So personally, I'll take the LINQ expression wherever I can because they will be easier to read and maintain (thus reducing technical debt) and I can rely on Microsoft's development to have coded and unit tested those algorithm fully for me instead of relying on a developer to code the loop logic correctly. If something's 90% slower, yes, it's worth keeping in mind, but it's really not until you start get magnitudes-of-order slower (10x, 100x, 1000x) that alarm bells should really go off. And if I ever do need that last millisecond of performance? Well then I'll optimize JUST THAT problem spot. To me it's worth it for the readability, speed-to-market, and maintainability.

Read the article
Try a sample: Using the counter predicate for event sampling

- by extended_events

Extended Events offers a rich filtering mechanism, called predicates, that allows you to reduce the number of events you collect by specifying criteria that will be applied during event collection. (You can find more information about predicates in Using SQL Server 2008 Extended Events (by Jonathan Kehayias)) By evaluating predicates early in the event firing sequence we can reduce the performance impact of collecting events by stopping event collection when the criteria are not met. You can specify predicates on both event fields and on a special object called a predicate source. Predicate sources are similar to action in that they typically are related to some type of global information available from the server. You will find that many of the actions available in Extended Events have equivalent predicate sources, but actions and predicates sources are not the same thing. Applying predicates, whether on a field or predicate source, is very similar to what you are used to in T-SQL in terms of how they work; you pick some field/source and compare it to a value, for example, session_id = 52. There is one predicate source that merits special attention though, not just for its special use, but for how the order of predicate evaluation impacts the behavior you see. I’m referring to the counter predicate source. The counter predicate source gives you a way to sample a subset of events that otherwise meet the criteria of the predicate; for example you could collect every other event, or only every tenth event. Simple CountingThe counter predicate source works by creating an in memory counter that increments every time the predicate statement is evaluated. Here is a simple example with my favorite event, sql_statement_completed, that only collects the second statement that is run. (OK, that’s not much of a sample, but this is for demonstration purposes. Here is the session definition: CREATE EVENT SESSION counter_test ON SERVERADD EVENT sqlserver.sql_statement_completed (ACTION (sqlserver.sql_text) WHERE package0.counter = 2)ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS) You can find general information about the session DDL syntax in BOL and from Pedro’s post Introduction to Extended Events. The important part here is the WHERE statement that defines that I only what the event where package0.count = 2; in other words, only the second instance of the event. Notice that I need to provide the package name along with the predicate source. You don’t need to provide the package name if you’re using event fields, only for predicate sources. Let’s say I run the following test queries: -- Run three statements to test the sessionSELECT 'This is the first statement'GOSELECT 'This is the second statement'GOSELECT 'This is the third statement';GO Once you return the event data from the ring buffer and parse the XML (see my earlier post on reading event data) you should see something like this: event_name sql_text sql_statement_completed SELECT ‘This is the second statement’ You can see that only the second statement from the test was actually collected. (Feel free to try this yourself. Check out what happens if you remove the WHERE statement from your session. Go ahead, I’ll wait.) Percentage Sampling OK, so that wasn’t particularly interesting, but you can probably see that this could be interesting, for example, lets say I need a 25% sample of the statements executed on my server for some type of QA analysis, that might be more interesting than just the second statement. All comparisons of predicates are handled using an object called a predicate comparator; the simple comparisons such as equals, greater than, etc. are mapped to the common mathematical symbols you know and love (eg. = and >), but to do the less common comparisons you will need to use the predicate comparators directly. You would probably look to the MOD operation to do this type sampling; we would too, but we don’t call it MOD, we call it divides_by_uint64. This comparator evaluates whether one number is divisible by another with no remainder. The general syntax for using a predicate comparator is pred_comp(field, value), field is always first and value is always second. So lets take a look at how the session changes to answer our new question of 25% sampling: CREATE EVENT SESSION counter_test_25 ON SERVERADD EVENT sqlserver.sql_statement_completed (ACTION (sqlserver.sql_text) WHERE package0.divides_by_uint64(package0.counter,4))ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS)GO Here I’ve replaced the simple equivalency check with the divides_by_uint64 comparator to check if the counter is evenly divisible by 4, which gives us back every fourth record. I’ll leave it as an exercise for the reader to test this session. Why order matters I indicated at the start of this post that order matters when it comes to the counter predicate – it does. Like most other predicate systems, Extended Events evaluates the predicate statement from left to right; as soon as the predicate statement is proven false we abandon evaluation of the remainder of the statement. The counter predicate source is only incremented when it is evaluated so whether or not the counter is incremented will depend on where it is in the predicate statement and whether a previous criteria made the predicate false or not. Here is a generic example: Pred1: (WHERE statement_1 AND package0.counter = 2)Pred2: (WHERE package0.counter = 2 AND statement_1) Let’s say I cause a number of events as follows and examine what happens to the counter predicate source. Iteration Statement Pred1 Counter Pred2 Counter A Not statement_1 0 1 B statement_1 1 2 C Not statement_1 1 3 D statement_1 2 4 As you can see, in the case of Pred1, statement_1 is evaluated first, when it fails (A & C) predicate evaluation is stopped and the counter is not incremented. With Pred2 the counter is evaluated first, so it is incremented on every iteration of the event and the remaining parts of the predicate are then evaluated. In this example, Pred1 would return an event for D while Pred2 would return an event for B. But wait, there is an interesting side-effect here; consider Pred2 if I had run my statements in the following order: Not statement_1 Not statement_1 statement_1 statement_1 In this case I would never get an event back from the system because the point at which counter=2, the rest of the predicate evaluates as false so the event is not returned. If you’re using the counter target for sampling and you’re not getting the expected events, or any events, check the order of the predicate criteria. As a general rule I’d suggest that the counter criteria should be the last element of your predicate statement since that will assure that your sampling rate will apply to the set of event records defined by the rest of your predicate. Aside: I’m interested in hearing about uses for putting the counter predicate criteria earlier in the predicate statement. If you have one, post it in a comment to share with the class. - Mike Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
SQL SERVER – Weekly Series – Memory Lane – #052

- by Pinal Dave

Let us continue with the final episode of the Memory Lane Series. Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Set Server Level FILLFACTOR Using T-SQL Script Specifies a percentage that indicates how full the Database Engine should make the leaf level of each index page during index creation or alteration. fillfactor must be an integer value from 1 to 100. The default is 0. Limitation of Online Index Rebuld Operation Online operation means when online operations are happening in the database are in normal operational condition, the processes which are participating in online operations does not require exclusive access to the database. Get Permissions of My Username / Userlogin on Server / Database A few days ago, I was invited to one of the largest database company. I was asked to review database schema and propose changes to it. There was special username or user logic was created for me, so I can review their database. I was very much interested to know what kind of permissions I was assigned per server level and database level. I did not feel like asking Sr. DBA the question about permissions. Simple Example of WHILE Loop With CONTINUE and BREAK Keywords This question is one of those questions which is very simple and most of the users get it correct, however few users find it confusing for the first time. I have tried to explain the usage of simple WHILE loop in the first example. BREAK keyword will exit the stop the while loop and control is moved to the next statement after the while loop. CONTINUE keyword skips all the statement after its execution and control is sent to the first statement of while loop. Forced Parameterization and Simple Parameterization – T-SQL and SSMS When the PARAMETERIZATION option is set to FORCED, any literal value that appears in a SELECT, INSERT, UPDATE or DELETE statement is converted to a parameter during query compilation. When the PARAMETERIZATION database option is SET to SIMPLE, the SQL Server query optimizer may choose to parameterize the queries. 2008 Transaction and Local Variables – Swap Variables – Update All At Once Concept Summary : Transaction have no effect over memory variables. When UPDATE statement is applied over any table (physical or memory) all the updates are applied at one time together when the statement is committed. First of all I suggest that you read the article listed above about the effect of transaction on local variant. As seen there local variables are independent of any transaction effect. Simulate INNER JOIN using LEFT JOIN statement – Performance Analysis Just a day ago, while I was working with JOINs I find one interesting observation, which has prompted me to create following example. Before we continue further let me make very clear that INNER JOIN should be used where it cannot be used and simulating INNER JOIN using any other JOINs will degrade the performance. If there are scopes to convert any OUTER JOIN to INNER JOIN it should be done with priority. 2009 Introduction to Business Intelligence – Important Terms & Definitions Business intelligence (BI) is a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data from various data sources, thus providing enterprise users with reliable and timely information and analysis for improved decision making. Difference Between Candidate Keys and Primary Key Candidate Key – A Candidate Key can be any column or a combination of columns that can qualify as unique key in database. There can be multiple Candidate Keys in one table. Each Candidate Key can qualify as Primary Key. Primary Key – A Primary Key is a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be Primary Key. 2010 Taking Multiple Backup of Database in Single Command – Mirrored Database Backup I recently had a very interesting experience. In one of my recent consultancy works, I was told by our client that they are going to take the backup of the database and will also a copy of it at the same time. I expressed that it was surely possible if they were going to use a mirror command. In addition, they told me that whenever they take two copies of the database, the size of the database, is always reduced. Now this was something not clear to me, I said it was not possible and so I asked them to show me the script. Corrupted Backup File and Unsuccessful Restore The CTO, who was also present at the location, got very upset with this situation. He then asked when the last successful restore test was done. As expected, the answer was NEVER.There were no successful restore tests done before. During that time, I was present and I could clearly see the stress, confusion, carelessness and anger around me. I did not appreciate the feeling and I was pretty sure that no one in there wanted the atmosphere like me. 2011 TRACEWRITE – Wait Type – Wait Related to Buffer and Resolution SQL Trace is a SQL Server database engine technology which monitors specific events generated when various actions occur in the database engine. When any event is fired it goes through various stages as well various routes. One of the routes is Trace I/O Provider, which sends data to its final destination either as a file or rowset. DATEDIFF – Accuracy of Various Dateparts If you want to have accuracy in seconds, you need to use a different approach. In the first example, the accurate method is to find the number of seconds first and then divide it by 60 to convert it in minutes. Dedicated Access Control for SQL Server Express Edition http://www.youtube.com/watch?v=1k00z82u4OI Book Signing at SQLPASS 2012 Who I Am And How I Got Here – True Story as Blog Post If there was a shortcut to success – I want to know. I learnt SQL Server hard way and I am still learning. There are so many things, I have to learn. There is not enough time to learn everything which we want to learn. I am constantly working on it every day. I welcome you to join my journey as well. Please join me in my journey to learn SQL Server – more the merrier. Vacation, Travel and Study – A New Concept Even those who have advanced degrees and went to college for years, or even decades, find studying hard. There is a difference between studying for a career and studying for a certification. At least to get a degree there is a variety of subjects, with labs, exams, and practice problems to make things more interesting. Order By Numeric Values Formatted as String We have a table which has a column containing alphanumeric data. The data always has first as an integer and later part as a string. The business need is to order the data based on the first part of the alphanumeric data which is an integer. Now the problem is that no matter how we use ORDER BY the result is not produced as expected. Let us understand this with an example. Resolving SQL Server Connection Errors – SQL in Sixty Seconds #030 – Video One of the most famous errors related to SQL Server is about connecting to SQL Server itself. Here is how it goes, most of the time developers have worked with SQL Server and knows pretty much every error which they face during development language. However, hardly they install fresh SQL Server. As the installation of the SQL Server is a rare occasion unless you are a DBA who is responsible for such an instance – the error faced during installations are pretty rare as well. http://www.youtube.com/watch?v=1k00z82u4OI Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
MySQL Cluster 7.3 Labs Release – Foreign Keys Are In!

- by Mat Keep

0 0 1 1097 6254 Homework 52 14 7337 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} Summary (aka TL/DR): Support for Foreign Key constraints has been one of the most requested feature enhancements for MySQL Cluster. We are therefore extremely excited to announce that Foreign Keys are part of the first Labs Release of MySQL Cluster 7.3 – available for download, evaluation and feedback now! (Select the mysql-cluster-7.3-labs-June-2012 build) In this blog, I will attempt to discuss the design rationale, implementation, configuration and steps to get started in evaluating the first MySQL Cluster 7.3 Labs Release. Pace of Innovation It was only a couple of months ago that we announced the General Availability (GA) of MySQL Cluster 7.2, delivering 1 billion Queries per Minute, with 70x higher cross-shard JOIN performance, Memcached NoSQL key-value API and cross-data center replication. This release has been a huge hit, with downloads and deployments quickly reaching record levels. The announcement of the first MySQL Cluster 7.3 Early Access lab release at today's MySQL Innovation Day event demonstrates the continued pace in Cluster development, and provides an opportunity for the community to evaluate and feedback on new features they want to see. What’s the Plan for MySQL Cluster 7.3? Well, Foreign Keys, as you may have gathered by now (!), and this is the focus of this first Labs Release. As with MySQL Cluster 7.2, we plan to publish a series of preview releases for 7.3 that will incrementally add new candidate features for a final GA release (subject to usual safe harbor statement below*), including: - New NoSQL APIs; - Features to automate the configuration and provisioning of multi-node clusters, on premise or in the cloud; - Performance and scalability enhancements; - Taking advantage of features in the latest MySQL 5.x Server GA. Design Rationale MySQL Cluster is designed as a “Not-Only-SQL” database. It combines attributes that enable users to blend the best of both relational and NoSQL technologies into solutions that deliver web scalability with 99.999% availability and real-time performance, including: Concurrent NoSQL and SQL access to the database; Auto-sharding with simple scale-out across commodity hardware; Multi-master replication with failover and recovery both within and across data centers; Shared-nothing architecture with no single point of failure; Online scaling and schema changes; ACID compliance and support for complex queries, across shards. Native support for Foreign Key constraints enables users to extend the benefits of MySQL Cluster into a broader range of use-cases, including: - Packaged applications in areas such as eCommerce and Web Content Management that prescribe databases with Foreign Key support. - In-house developments benefiting from Foreign Key constraints to simplify data models and eliminate the additional application logic needed to maintain data consistency and integrity between tables. Implementation The Foreign Key functionality is implemented directly within MySQL Cluster’s data nodes, allowing any client API accessing the cluster to benefit from them – whether using SQL or one of the NoSQL interfaces (Memcached, C++, Java, JPA or HTTP/REST.) The core referential actions defined in the SQL:2003 standard are implemented: CASCADE RESTRICT NO ACTION SET NULL In addition, the MySQL Cluster implementation supports the online adding and dropping of Foreign Keys, ensuring the Cluster continues to serve both read and write requests during the operation. An important difference to note with the Foreign Key implementation in InnoDB is that MySQL Cluster does not support the updating of Primary Keys from within the Data Nodes themselves - instead the UPDATE is emulated with a DELETE followed by an INSERT operation. Therefore an UPDATE operation will return an error if the parent reference is using a Primary Key, unless using CASCADE action, in which case the delete operation will result in the corresponding rows in the child table being deleted. The Engineering team plans to change this behavior in a subsequent preview release. Also note that when using InnoDB "NO ACTION" is identical to "RESTRICT". In the case of MySQL Cluster “NO ACTION” means “deferred check”, i.e. the constraint is checked before commit, allowing user-defined triggers to automatically make changes in order to satisfy the Foreign Key constraints. Configuration There is nothing special you have to do here – Foreign Key constraint checking is enabled by default. If you intend to migrate existing tables from another database or storage engine, for example from InnoDB, there are a couple of best practices to observe: 1. Analyze the structure of the Foreign Key graph and run the ALTER TABLE ENGINE=NDB in the correct sequence to ensure constraints are enforced 2. Alternatively drop the Foreign Key constraints prior to the import process and then recreate when complete. Getting Started Read this blog for a demonstration of using Foreign Keys with MySQL Cluster. You can download MySQL Cluster 7.3 Labs Release with Foreign Keys today - (select the mysql-cluster-7.3-labs-June-2012 build) If you are new to MySQL Cluster, the Getting Started guide will walk you through installing an evaluation cluster on a singe host (these guides reflect MySQL Cluster 7.2, but apply equally well to 7.3) Post any questions to the MySQL Cluster forum where our Engineering team will attempt to assist you. Post any bugs you find to the MySQL bug tracking system (select MySQL Cluster from the Category drop-down menu) And if you have any feedback, please post them to the Comments section of this blog. Summary MySQL Cluster 7.2 is the GA, production-ready release of MySQL Cluster. This first Labs Release of MySQL Cluster 7.3 gives you the opportunity to preview and evaluate future developments in the MySQL Cluster database, and we are very excited to be able to share that with you. Let us know how you get along with MySQL Cluster 7.3, and other features that you want to see in future releases. * Safe Harbor Statement This information is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Read the article
Recap: Oracle Fusion Middleware Strategies Driving Business Innovation

- by Harish Gaur

Hasan Rizvi, Executive Vice President of Oracle Fusion Middleware & Java took the stage on Tuesday to discuss how Oracle Fusion Middleware helps enable business innovation. Through a series of product demos and customer showcases, Hassan demonstrated how Oracle Fusion Middleware is a complete platform to harness the latest technological innovations (cloud, mobile, social and Fast Data) throughout the application lifecycle. Fig 1: Oracle Fusion Middleware is the foundation of business innovation This Session included 4 demonstrations to illustrate these strategies: 1. Build and deploy native mobile applications using Oracle ADF Mobile 2. Empower business user to model processes, design user interface and have rich mobile experience for process interaction using Oracle BPM Suite PS6. 3. Create collaborative user experience and integrate social sign-on using Oracle WebCenter Portal, Oracle WebCenter Content, Oracle Social Network & Oracle Identity Management 11g R2 4. Deploy and manage business applications on Oracle Exalogic Nike, LA Department of Water & Power and Nintendo joined Hasan on stage to share how their organizations are leveraging Oracle Fusion Middleware to enable business innovation. Managing Performance in the Wrld of Social and Mobile How do you provide predictable scalability and performance for an application that monitors active lifestyle of 8 million users on a daily basis? Nike’s answer is Oracle Coherence, a component of Oracle Fusion Middleware and Oracle Exadata. Fig 2: Oracle Coherence enabled data grid improves performance of Nike+ Digital Sports Platform Nicole Otto, Sr. Director of Consumer Digital Technology discussed the vision of the Nike+ platform, a platform which represents a shift for NIKE from a "product" to a "product +" experience. There are currently nearly 8 million users in the Nike+ system who are using digitally-enabled Nike+ devices. Once data from the Nike+ device is transmitted to Nike+ application, users access the Nike+ website or via the Nike mobile applicatoin, seeing metrics around their daily active lifestyle and even engage in socially compelling experiences to compare, compete or collaborate their data with their friends. Nike expects the number of users to grow significantly this year which will drive an explosion of data and potential new experiences. To deal with this challenge, Nike envisioned building a shared platform that would drive a consumer-centric model for the company. Nike built this new platform using Oracle Coherence and Oracle Exadata. Using Coherence, Nike built a data grid tier as a distributed cache, thereby provide low-latency access to most recent and relevant data to consumers. Nicole discussed how Nike+ Digital Sports Platform is unique in the way that it utilizes the Coherence Grid. Nike takes advantage of Coherence as a traditional cache using both cache-aside and cache-through patterns. This new tier has enabled Nike to create a horizontally scalable distributed event-driven processing architecture. Current data grid volume is approximately 150,000 request per minute with about 40 million objects at any given time on the grid. Improving Customer Experience Across Multiple Channels Customer experience is on top of every CIO's mind. Customer Experience needs to be consistent and secure across multiple devices consumers may use. This is the challenge Matt Lampe, CIO of Los Angeles Department of Water & Power (LADWP) was faced with. Despite being the largest utilities company in the country, LADWP had been relying on a 38 year old customer information system for serving its customers. Their prior system had been unable to keep up with growing customer demands. Last year, LADWP embarked on a journey to improve customer experience for 1.6million LA DWP customers using Oracle WebCenter platform. Figure 3: Multi channel & Multi lingual LADWP.com built using Oracle WebCenter & Oracle Identity Management platform Matt shed light on his efforts to drive customer self-service across 3 dimensions – new website, new IVR platform and new bill payment service. LADWP has built a new portal to increase customer self-service while reducing the transactions via IVR. LADWP's website is powered Oracle WebCenter Portal and is accessible by desktop and mobile devices. By leveraging Oracle WebCenter, LADWP eliminated the need to build, format, and maintain individual mobile applications or websites for different devices. Their entire content is managed using Oracle WebCenter Content and secured using Oracle Identity Management. This new portal automated their paper based processes to web based workflows for customers. This includes automation of Self Service implemented through My Account - like Bill Pay, Payment History, Bill History and Usage Analysis. LADWP's solution went live in April 2012. Matt indicated that LADWP's Self-Service Portal has greatly improved customer satisfaction. In a JD Power Associates website satisfaction survey, results indicate rankings have climbed by 25+ points, marking a remarkable increase in user experience. Bolstering Performance and Simplifying Manageability of Business Applications Ingvar Petursson, Senior Vice Preisdent of IT at Nintendo America joined Hasan on-stage to discuss their choice of Exalogic. Nintendo had significant new requirements coming their way for business systems, both internal and external, in the years to come, especially with new products like the WiiU on the horizon this holiday season. Nintendo needed a platform that could give them performance, availability and ease of management as they deploy business systems. Ingvar selected Engineered Systems for two reasons: 1. High performance 2. Ease of management Figure 4: Nintendo relies on Oracle Exalogic to run ATG eCommerce, Oracle e-Business Suite and several business applications Nintendo made a decision to run their business applications (ATG eCommerce, E-Business Suite) and several Fusion Middleware components on the Exalogic platform. What impressed Ingvar was the "stress” testing results during evaluation. Oracle Exalogic could handle their 3-year load estimates for many functions, which was better than Nintendo expected without any hardware expansion. Faster Processing of Big Data Middleware plays an increasingly important role in Big Data. Last year, we announced at OpenWorld the introduction of Oracle Data Integrator for Hadoop and Oracle Loader for Hadoop which helps in the ability to move, transform, load data to and from Big Data Appliance to Exadata. This year, we’ve added new capabilities to find, filter, and focus data using Oracle Event Processing. This product can natively integrate with Big Data Appliance or runs standalone. Hasan briefly discussed how NTT Docomo, largest mobile operator in Japan, leverages Oracle Event Processing & Oracle Coherence to process mobile data (from 13 million smartphone users) at a speed of 700K events per second before feeding it Hadoop for distributed processing of big data. Figure 5: Mobile traffic data processing at NTT Docomo with Oracle Event Processing & Oracle Coherence

Read the article
Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

Read the article
Cost Comparison Hard Disk Drive to Solid State Drive on Price per Gigabyte - dispelling a myth!

- by tonyrogerson

It is often said that Hard Disk Drive storage is significantly cheaper per GiByte than Solid State Devices – this is wholly inaccurate within the database space. People need to look at the cost of the complete solution and not just a single component part in isolation to what is really required to meet the business requirement. Buying a single Hitachi Ultrastar 600GB 3.5” SAS 15Krpm hard disk drive will cost approximately £239.60 (http://scan.co.uk, 22nd March 2012) compared to an OCZ 600GB Z-Drive R4 CM84 PCIe costing £2,316.54 (http://scan.co.uk, 22nd March 2012); I’ve not included FusionIO ioDrive because there is no public pricing available for it – something I never understand and personally when companies do this I immediately think what are they hiding, luckily in FusionIO’s case the product is proven though is expensive compared to OCZ enterprise offerings. On the face of it the single 15Krpm hard disk has a price per GB of £0.39, the SSD £3.86; this is what you will see in the press and this is what sales people will use in comparing the two technologies – do not be fooled by this bullshit people! What is the requirement? The requirement is the database will have a static size of 400GB kept static through archiving so growth and trim will balance the database size, the client requires resilience, there will be several hundred call centre staff querying the database where queries will read a small amount of data but there will be no hot spot in the data so the randomness will come across the entire 400GB of the database, estimates predict that the IOps required will be approximately 4,000IOps at peak times, because it’s a call centre system the IO latency is important and must remain below 5ms per IO. The balance between read and write is 70% read, 30% write. The requirement is now defined and we have three of the most important pieces of the puzzle – space required, estimated IOps and maximum latency per IO. Something to consider with regard SQL Server; write activity requires synchronous IO to the storage media specifically the transaction log; that means the write thread will wait until the IO is completed and hardened off until the thread can continue execution, the requirement has stated that 30% of the system activity will be write so we can expect a high amount of synchronous activity. The hardware solution needs to be defined; two possible solutions: hard disk or solid state based; the real question now is how many hard disks are required to achieve the IO throughput, the latency and resilience, ditto for the solid state. Hard Drive solution On a test on an HP DL380, P410i controller using IOMeter against a single 15Krpm 146GB SAS drive, the throughput given on a transfer size of 8KiB against a 40GiB file on a freshly formatted disk where the partition is the only partition on the disk thus the 40GiB file is on the outer edge of the drive so more sectors can be read before head movement is required: For 100% sequential IO at a queue depth of 16 with 8 worker threads 43,537 IOps at an average latency of 2.93ms (340 MiB/s), for 100% random IO at the same queue depth and worker threads 3,733 IOps at an average latency of 34.06ms (34 MiB/s). The same test was done on the same disk but the test file was 130GiB: For 100% sequential IO at a queue depth of 16 with 8 worker threads 43,537 IOps at an average latency of 2.93ms (340 MiB/s), for 100% random IO at the same queue depth and worker threads 528 IOps at an average latency of 217.49ms (4 MiB/s). From the result it is clear random performance gets worse as the disk fills up – I’m currently writing an article on short stroking which will cover this in detail. Given the work load is random in nature looking at the random performance of the single drive when only 40 GiB of the 146 GB is used gives near the IOps required but the latency is way out. Luckily I have tested 6 x 15Krpm 146GB SAS 15Krpm drives in a RAID 0 using the same test methodology, for the same test above on a 130 GiB for each drive added the performance boost is near linear, for each drive added throughput goes up by 5 MiB/sec, IOps by 700 IOps and latency reducing nearly 50% per drive added (172 ms, 94 ms, 65 ms, 47 ms, 37 ms, 30 ms). This is because the same 130GiB is spread out more as you add drives 130 / 1, 130 / 2, 130 / 3 etc. so implicit short stroking is occurring because there is less file on each drive so less head movement required. The best latency is still 30 ms but we have the IOps required now, but that’s on a 130GiB file and not the 400GiB we need. Some reality check here: a) the drive randomness is more likely to be 50/50 and not a full 100% but the above has highlighted the effect randomness has on the drive and the more a drive fills with data the worse the effect. For argument sake let us assume that for the given workload we need 8 disks to do the job, for resilience reasons we will need 16 because we need to RAID 1+0 them in order to get the throughput and the resilience, RAID 5 would degrade performance. Cost for hard drives: 16 x £239.60 = £3,833.60 For the hard drives we will need disk controllers and a separate external disk array because the likelihood is that the server itself won’t take the drives, a quick spec off DELL for a PowerVault MD1220 which gives the dual pathing with 16 disks 146GB 15Krpm 2.5” disks is priced at £7,438.00, note its probably more once we had two controller cards to sit in the server in, racking etc. Minimum cost taking the DELL quote as an example is therefore: {Cost of Hardware} / {Storage Required} £7,438.60 / 400 = £18.595 per GB £18.59 per GiB is a far cry from the £0.39 we had been told by the salesman and the myth. Yes, the storage array is composed of 16 x 146 disks in RAID 10 (therefore 8 usable) giving an effective usable storage availability of 1168GB but the actual storage requirement is only 400 and the extra disks have had to be purchased to get the IOps up. Solid State Drive solution A single card significantly exceeds the IOps and latency required, for resilience two will be required. ( £2,316.54 * 2 ) / 400 = £11.58 per GB With the SSD solution only two PCIe sockets are required, no external disk units, no additional controllers, no redundant controllers etc. Conclusion I hope by showing you an example that the myth that hard disk drives are cheaper per GiB than Solid State has now been dispelled - £11.58 per GB for SSD compared to £18.59 for Hard Disk. I’ve not even touched on the running costs, compare the costs of running 18 hard disks, that’s a lot of heat and power compared to two PCIe cards!Just a quick note: I've left a fair amount of information out due to this being a blog! If in doubt, email me :)I'll also deal with the myth that SSD's wear out at a later date as well - that's just way over done still, yes, 5 years ago, but now - no.

Read the article

Search Results

Search found 6441 results on 258 pages for 'schema compare'.

Page 240/258 | < Previous Page | 236 237 238 239 240 241 242 243 244 245 246 247 | Next Page >

- by Keith Larson

- by Pinal Dave

- by sreejukg

- by sathya

- by imran_ku07

- by Fatherjack

- by mbridge

- by Brian

- by cibrax

- by Stefan Krantz

- by Paul White

- by Rodrigues, Raphael

- by Pinal Dave

- by Julien Testut

- by subodhnpushpak

- by Pinal Dave

- by James Michael Hare

- by extended_events

- by Pinal Dave

- by Mat Keep

- by Harish Gaur

- by Dejan Sarka

- by tonyrogerson

< Previous Page | 236 237 238 239 240 241 242 243 244 245 246 247 | Next Page >