Search Results

Search found 5313 results on 213 pages for 'steve care'.

Page 202/213 | < Previous Page | 198 199 200 201 202 203 204 205 206 207 208 209 | Next Page >

Replication Services as ETL extraction tool

- by jorg

In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool: “Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards. In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!” Well I have tried it out and I must say it worked well. I was able to let replication services do work in a fraction of the time it would cost me to do the same in SSIS. What I did was the following: Configure snapshot replication for some Adventure Works tables, this was quite simple and straightforward. Create an SSIS package that executes the snapshot replication on demand and waits for its completion. This is something that you can’t do with out of the box functionality. While configuring the snapshot replication two SQL Agent Jobs are created, one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are asynchronous which means that if you execute them they immediately report back if the job started successfully or not, they do not wait for completion and report its result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues. Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it. Configure snapshot replication The first step is to create a publication on the database you want to replicate. Connect to SQL Server Management Studio and right-click Replication, choose for New.. Publication… The New Publication Wizard appears, click Next Choose your “source” database and click Next Choose Snapshot publication and click Next You can now select tables and other objects that you want to publish Expand Tables and select the tables that are needed in your ETL process In the next screen you can add filters on the selected tables which can be very useful. Think about selecting only the last x days of data for example. Its possible to filter out rows and/or columns. In this example I did not apply any filters. Schedule the Snapshot Agent to run at a desired time, by doing this a SQL Agent Job is created which we need to execute from a SSIS package later on. Next you need to set the Security Settings for the Snapshot Agent. Click on the Security Settings button. In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines! On the next screen choose to create the publication at the end of the wizard Give the publication a name (SnapshotTest) and complete the wizard The publication is created and the articles (tables in this case) are added Now the publication is created successfully its time to create a new subscription for this publication. Expand the Replication folder in SSMS and right click Local Subscriptions, choose New Subscriptions The New Subscription Wizard appears Select the publisher on which you just created your publication and select the database and publication (SnapshotTest) You can now choose where the Distribution Agent should run. If it runs at the distributor (push subscriptions) it causes extra processing overhead. If you use a separate server for your ETL process and databases choose to run each agent at its subscriber (pull subscriptions) to reduce the processing overhead at the distributor. Of course we need a database for the subscription and fortunately the Wizard can create it for you. Choose for New database Give the database the desired name, set the desired options and click OK You can now add multiple SQL Server Subscribers which is not necessary in this case but can be very useful. You now need to set the security settings for the Distribution Agent. Click on the …. button Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here Click Next Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on. Initialize the subscription at first synchronization Select the first box to create the subscription when finishing this wizard Complete the wizard by clicking Finish The subscription will be created In SSMS you see a new database is created, the subscriber. There are no tables or other objects in the database available yet because the replication jobs did not ran yet. Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot: Rename this job to “CreateSnapshot” Now search for the job that distributes the snapshot: Rename this job to “DistributeSnapshot” Create an SSIS package that executes the snapshot replication We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs. After the DistributeSnapshot job has started the package needs to wait until its finished before the package execution finishes. The Execute SQL Server Agent Job Task is designed to execute SQL Agent Jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back if the job started succesfully or not, it does not report if the job actually completed with success or failure. This is because these jobs are asynchronous. The SSIS package I’ve created does the following: It runs the CreateSnapshot job It checks every 5 seconds if the job is completed with a for loop When the CreateSnapshot job is completed it starts the DistributeSnapshot job And again it waits until the snapshot is delivered before the package will finish successfully Quite simple and the package is ready to use as standalone extract mechanism. After executing the package the replicated tables are added to the subscriber database and are filled with data: Download the SSIS package here (SSIS 2008) Conclusion In this example I only replicated 5 tables, I could create a SSIS package that does the same in approximately the same amount of time. But if I replicated all the 70+ AdventureWorks tables I would save a lot of time and boring work! With replication services you also benefit from the feature that schema changes are applied automatically which means your entire extract phase wont break. Because a snapshot is created using the bcp utility (bulk copy) it’s also quite fast, so the performance will be quite good. Disadvantages of using snapshot replication as extraction tool is the limitation on source systems. You can only choose SQL Server or Oracle databases to act as a publisher. So if you plan to build an extract phase for your ETL process that will invoke a lot of tables think about replication services, it would save you a lot of time and thanks to the Extract SSIS package I’ve created you can perfectly fit it in your usual SSIS ETL process.

Read the article
Conversion of BizTalk Projects to Use the New WCF-SAP Adaptor

- by Geordie

We are in the process of upgrading our BizTalk Environment from BizTalk 2006 R2 to BizTalk 2010. The SAP adaptor in BizTalk 2010 is an all new and more powerful WCF-SAP adaptor. When my colleagues tested out the new adaptor they discovered that the format of the data extracted from SAP was not identical to the old adaptor. This is not a big deal if the structure of the messages from SAP is simple. In this case we were receiving the delivery and invoice iDocs. Both these structures are complex especially the delivery document. Over the past few years I have tweaked the delivery mapping to remove bugs from original mapping. The idea of redoing these maps did not appeal and due to the current work load was not even an option. I opted for a rather crude alternative of pulling in the iDoc in the new typed format and then adding a static map at the start of the orchestration to convert the data to the old schema. Note WCF-SAP data formats (on the binding tab of the configuration dialog box is the ‘RecieiveIdocFormat’ field): Typed: Returns a XML document with the hierarchy represented in XML and all fields being represented by XML tags. RFC: Returns an XML document with the hierarchy represented in XML but the iDoc lines in flat file format. String: This returns the iDoc in a format that is closest to the original flat file format but is still wrapped with some top level XML tags. The files also contained some strange characters at the end of each line. I started with the invoice document and it was quite straight forward to add the mapping but this is where my problems started. The orchestrations for these documents are dynamic and so require the identity of the partner to be able to correctly configure the orchestration. The partner identity is in the EDI_DC40 segment of the iDoc. In the old project the RECPRN node of the segment was promoted. The code to set a variable to the partner ID was now failing. After lot of head scratching I discovered the problem was due to the addition of Namespaces to the fields in the EDI_DC40 segment. To overcome this I needed to use an xPath query with a Namespace Manager. This had to be done in custom code. I now tried to repeat the process with the delivery document. Unfortunately when we tried to get sample typed data from SAP an exception was thrown. The adapter "WCF-SAP" raised an error message. Details "Microsoft.ServiceModel.Channels.Common.XmlReaderGenerationException: The segment or group definition E2EDKA1001 was not found in the IDoc metadata. The UniqueId of the IDoc type is: IDOCTYP/3/DESADV01/ZASNEXT1/640. For Receive operations, the SAP adapter does not support unreleased segments. Our guess is that when the WCF-SAP adaptor tries to down load the data it retrieves a data schema from SAP. For some reason the schema does not match the data. This may be due to the version of SAP we are running or due to a customization. Either way resolving this problem did not look easy. When doing some research on this problem I found an article showing me how to get the data from SAP using the WCF-SAP adaptor without any XML tags. http://blogs.msdn.com/b/adapters/archive/2007/10/05/receiving-idocs-getting-the-raw-idoc-data.aspx Reproduction of Mustansir blog: Since the WCF based SAP Adapter is ... well, WCF based, all data flowing in and out of the adapter is encapsulated within a SOAP message. Which means there are those pesky xml tags all over the place. If you want to receive an Idoc from SAP, you can receive it in "Typed" format (in which case each column in each segment of the idoc appears within its own xml tag), or you can receive it in "String" format (in which case there are just 2 xml tags at the top, the raw xml data in string/flat file format, and the 2 closing xml tags). In "String" format, an incoming idoc (for ORDERS05, containing 5 data records) would look like: <ReceiveIdoc ><idocData>EDI_DC40 8000000000001064985620 E2EDK01005 800000000000106498500000100000001 E2EDK14 8000000000001064985000002000000020111000 E2EDK14 8000000000001064985000003000000020081000 E2EDK14 80000000000010649850000040000000200710 E2EDK14 80000000000010649850000050000000200600</idocData></ReceiveIdoc> (I have trimmed part of the control record so that it fits cleanly here on one line). Now, you're only interested in the IDOC data, and don't care much for the XML tags. It isn't that difficult to write your own pipeline component, or even some logic in the orchestration to remove the tags, right? Well, you don't need to write any extra code at all - the WCF Adapter can help you here! During the configuration of your one-way Receive Location using WCF-Custom, navigate to the Messages tab. Under the section "Inbound BizTalk Messge Body", select the "Path" radio button, and: (a) Enter the body path expression as: /*[local-name()='ReceiveIdoc']/*[local-name()='idocData'] (b) Choose "String" for the Node Encoding. What we've done is, used an XPATH to pull out the value of the "idocData" node from the XML. Your Receive Location will now emit text containing only the idoc data. You can at this point, for example, put the Flat File Pipeline component to convert the flat text into a different xml format based on some other schema you already have, and receive your version of the xml formatted message in your orchestration. This was potentially a much easier solution than adding the static maps to the orchestrations and overcame the issue with ‘Typed’ delivery documents. Not quite so fast… Note: When I followed Mustansir’s blog the characters at the end of each line disappeared. After configuring the adaptor and passing the iDoc data into the original flat file receive pipelines I was receiving exceptions. There was a failure executing the receive pipeline: "PAPINETPipelines.DeliveryFlatFileReceive, CustomerIntegration2.PAPINET.Pipelines, Version=1.0.0.0, Culture=neutral, PublicKeyToken=4ca3635fbf092bbb" Source: "Pipeline " Receive Port: "recSAP_Delivery" URI: "D:\CustomerIntegration2\SAP\Delivery\*.xml" Reason: An error occurred when parsing the incoming document: "Unexpected data found while looking for: 'Z2EDPZ7' The current definition being parsed is E2EDP07GRP. The stream offset where the error occured is 8859. The line number where the error occured is 23. The column where the error occured is 0.". Although the new flat file looked the same as the old one there was a differences. In the original file all lines in the document were exactly 1064 character long. In the new file all lines were truncated to the last alphanumeric character. The final piece of the puzzle was to add a custom pipeline component to pad all the lines to 1064 characters. This component was added to the decode node of the custom delivery and invoice flat file disassembler pipelines. Execute method of the custom pipeline component: public IBaseMessage Execute(IPipelineContext pc, IBaseMessage inmsg) { //Convert Stream to a string Stream s = null; IBaseMessagePart bodyPart = inmsg.BodyPart; // NOTE inmsg.BodyPart.Data is implemented only as a setter in the http adapter API and a //getter and setter for the file adapter. Use GetOriginalDataStream to get data instead. if (bodyPart != null) s = bodyPart.GetOriginalDataStream(); string newMsg = string.Empty; string strLine; try { StreamReader sr = new StreamReader(s); strLine = sr.ReadLine(); while (strLine != null) { //Execute padding code if (strLine != null) strLine = strLine.PadRight(1064, ' ') + "\r\n"; newMsg += strLine; strLine = sr.ReadLine(); } sr.Close(); } catch (IOException ex) { throw new Exception("Error occured trying to pad the message to 1064 charactors"); } //Convert back to stream and set to Data property inmsg.BodyPart.Data = new MemoryStream(Encoding.UTF8.GetBytes(newMsg)); ; //reset the position of the stream to zero inmsg.BodyPart.Data.Position = 0; return inmsg; }

Read the article
C#/.NET Little Wonders: The Timeout static class

- by James Michael Hare

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. When I started the “Little Wonders” series, I really wanted to pay homage to parts of the .NET Framework that are often small but can help in big ways. The item I have to discuss today really is a very small item in the .NET BCL, but once again I feel it can help make the intention of code much clearer and thus is worthy of note. The Problem - Magic numbers aren’t very readable or maintainable In my first Little Wonders Post (Five Little Wonders That Make Code Better) I mention the TimeSpan factory methods which, I feel, really help the readability of constructed TimeSpan instances. Just to quickly recap that discussion, ask yourself what the TimeSpan specified in each case below is 1: // Five minutes? Five Seconds? 2: var fiveWhat1 = new TimeSpan(0, 0, 5); 3: var fiveWhat2 = new TimeSpan(0, 0, 5, 0); 4: var fiveWhat3 = new TimeSpan(0, 0, 5, 0, 0); You’d think they’d all be the same unit of time, right? After all, most overloads tend to tack additional arguments on the end. But this is not the case with TimeSpan, where the constructor forms are: TimeSpan(int hours, int minutes, int seconds); TimeSpan(int days, int hours, int minutes, int seconds); TimeSpan(int days, int hours, int minutes, int seconds, int milliseconds); Notice how in the 4 and 5 parameter version we suddenly have the parameter days slipping in front of hours? This can make reading constructors like those above much harder. Fortunately, there are TimeSpan factory methods to help make your intention crystal clear: 1: // Ah! Much clearer! 2: var fiveSeconds = TimeSpan.FromSeconds(5); These are great because they remove all ambiguity from the reader! So in short, magic numbers in constructors and methods can be ambiguous, and anything we can do to clean up the intention of the developer will make the code much easier to read and maintain. Timeout – Readable identifiers for infinite timeout values In a similar way to TimeSpan, let’s consider specifying timeouts for some of .NET’s (or our own) many methods that allow you to specify timeout periods. For example, in the TPL Task class, there is a family of Wait() methods that can take TimeSpan or int for timeouts. Typically, if you want to specify an infinite timeout, you’d just call the version that doesn’t take a timeout parameter at all: 1: myTask.Wait(); // infinite wait But there are versions that take the int or TimeSpan for timeout as well: 1: // Wait for 100 ms 2: myTask.Wait(100); 3: 4: // Wait for 5 seconds 5: myTask.Wait(TimeSpan.FromSeconds(5); Now, if we want to specify an infinite timeout to wait on the Task, we could pass –1 (or a TimeSpan set to –1 ms), which what the .NET BCL methods with timeouts use to represent an infinite timeout: 1: // Also infinite timeouts, but harder to read/maintain 2: myTask.Wait(-1); 3: myTask.Wait(TimeSpan.FromMilliseconds(-1)); However, these are not as readable or maintainable. If you were writing this code, you might make the mistake of thinking 0 or int.MaxValue was an infinite timeout, and you’d be incorrect. Also, reading the code above it isn’t as clear that –1 is infinite unless you happen to know that is the specified behavior. To make the code like this easier to read and maintain, there is a static class called Timeout in the System.Threading namespace which contains definition for infinite timeouts specified as both int and TimeSpan forms: Timeout.Infinite An integer constant with a value of –1 Timeout.InfiniteTimeSpan A static readonly TimeSpan which represents –1 ms (only available in .NET 4.5+) This makes our calls to Task.Wait() (or any other calls with timeouts) much more clear: 1: // intention to wait indefinitely is quite clear now 2: myTask.Wait(Timeout.Infinite); 3: myTask.Wait(Timeout.InfiniteTimeSpan); But wait, you may say, why would we care at all? Why not use the version of Wait() that takes no arguments? Good question! When you’re directly calling the method with an infinite timeout that’s what you’d most likely do, but what if you are just passing along a timeout specified by a caller from higher up? Or perhaps storing a timeout value from a configuration file, and want to default it to infinite? For example, perhaps you are designing a communications module and want to be able to shutdown gracefully, but if you can’t gracefully finish in a specified amount of time you want to force the connection closed. You could create a Shutdown() method in your class, and take a TimeSpan or an int for the amount of time to wait for a clean shutdown – perhaps waiting for client to acknowledge – before terminating the connection. So, assume we had a pub/sub system with a class to broadcast messages: 1: // Some class to broadcast messages to connected clients 2: public class Broadcaster 3: { 4: // ... 5: 6: // Shutdown connection to clients, wait for ack back from clients 7: // until all acks received or timeout, whichever happens first 8: public void Shutdown(int timeout) 9: { 10: // Kick off a task here to send shutdown request to clients and wait 11: // for the task to finish below for the specified time... 12: 13: if (!shutdownTask.Wait(timeout)) 14: { 15: // If Wait() returns false, we timed out and task 16: // did not join in time. 17: } 18: } 19: } We could even add an overload to allow us to use TimeSpan instead of int, to give our callers the flexibility to specify timeouts either way: 1: // overload to allow them to specify Timeout in TimeSpan, would 2: // just call the int version passing in the TotalMilliseconds... 3: public void Shutdown(TimeSpan timeout) 4: { 5: Shutdown(timeout.TotalMilliseconds); 6: } Notice in case of this class, we don’t assume the caller wants infinite timeouts, we choose to rely on them to tell us how long to wait. So now, if they choose an infinite timeout, they could use the –1, which is more cryptic, or use Timeout class to make the intention clear: 1: // shutdown the broadcaster, waiting until all clients ack back 2: // without timing out. 3: myBroadcaster.Shutdown(Timeout.Infinite); We could even add a default argument using the int parameter version so that specifying no arguments to Shutdown() assumes an infinite timeout: 1: // Modified original Shutdown() method to add a default of 2: // Timeout.Infinite, works because Timeout.Infinite is a compile 3: // time constant. 4: public void Shutdown(int timeout = Timeout.Infinite) 5: { 6: // same code as before 7: } Note that you can’t default the ShutDown(TimeSpan) overload with Timeout.InfiniteTimeSpan since it is not a compile-time constant. The only acceptable default for a TimeSpan parameter would be default(TimeSpan) which is zero milliseconds, which specified no wait, not infinite wait. Summary While Timeout.Infinite and Timeout.InfiniteTimeSpan are not earth-shattering classes in terms of functionality, they do give you very handy and readable constant values that you can use in your programs to help increase readability and maintainability when specifying infinite timeouts for various timeouts in the BCL and your own applications. Technorati Tags: C#,CSharp,.NET,Little Wonders,Timeout,Task

Read the article
Windows Azure Myths

- by BuckWoody

Windows Azure is part of the Microsoft "stack" - the suite of software and services we offer. Because we have so many products in almost every part of technology, it's hard to know everything about all parts of what we do - even for those of us who work here. So it's no surprise that some folks are not as familiar with Windows and SQL Azure as they are, say Windows Server or XBox. As I chat with folks about a solution for a business or organization need, I put Windows Azure into the mix. I always start off with "What do you already know about Windows Azure?" so that I don't bore folks with information they already have. I some cases they've checked out the product ahead of time and have specific questions, in others they aren't as familiar, and in still others there is a fair amount of mis-information. Sometimes that's because of a marketing failure, sometimes it's hearsay, and somtetimes it's active misinformation. I thought I might lay out a few of these misconceptions. As always - do your fact-checking! Never take anyone's word alone (including mine) as gospel. Make sure you educate yourself on your options. Your company or your clients depend on you to have the right information on IT, so make sure you live up to that. Myth 1: Nobody uses Windows Azure It's true that we don't give out numbers on the amount of clients on Windows and SQL Azure. But lots of folks are here - companies you may have heard of like Boeing, NASA, Fujitsu, The City of London, Nuedesic, and many others. I deal with firms small and large that use Windows Azure for mission-critical applications, sometimes totally on Windows and/or SQL Azure, sometimes in conjunction with an on-premises system, sometimes for only a specific component in Windows Azure like storage. The interesting thing is that many sites you visit have a Windows Azure component, or are running on Windows Azure. They just don't announce it. Just like the other cloud providers, the companies have asked to be completely branded themselves - they don't want you to be aware or care that they are on Windows Azure. Sometimes that's for security, other times it's for different reasons. It's just like the web sites you visit. For the most part, they don't advertise which OS or Web Server they use. It really just shouldn't matter. The point is that they just use what works to solve a given problem. Check out a few public case studies here: https://www.windowsazure.com/en-us/home/case-studies/ Myth 2: It's only for Microsoft stuff - can't use Open Source This is the one I face the most, and am the most dismayed by. We work just fine with many open source products, including Java, NodeJS, PHP, Ruby, Python, Hadoop, and many other languages and applications. You can quickly deploy a Wordpress, Umbraco and other "kits". We have software development kits (SDK's) for iPhones, iPads, Android, Windows phones and more. We have an SDK to work with FaceBook and other social networks. In short, we play well with others. More on the languages and runtimes we support here: https://www.windowsazure.com/en-us/develop/overview/ More on the SDK's here: http://www.wadewegner.com/2011/05/windows-azure-toolkit-for-ios/, http://www.wadewegner.com/2011/08/windows-azure-toolkits-for-devices-now-with-android/, http://azuretoolkit.codeplex.com/ Myth 3: Microsoft expects me to switch everything to "the cloud" No, we don't. That would be disasterous, unless the only things you run in your company uses works perfectly in Azure. Use Windows Azure - or any cloud for that matter - where it works. Whenever I talk to companies, I focus on two things: Something that is broken and needs to be re-architected Something you want to do that is new If something is broken, and you need new tools to scale, extend, add capacity dynamically and so on, then you can consider using Windows or SQL Azure. It can help solve problems that you have, or it may include a component you don't want to write or architect yourself. Sometimes you want to do something new, like extend your company's offerings to mobile phones, to the web, or to a social network. More info on where it works here: http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx Myth 4: I have to write code to use Windows and SQL Azure If Windows Azure is a PaaS - a Platform as a Service - then don't you have to write code to use it? Nope. Windows and SQL Azure are made up of various components. Some of those components allow you to write and deploy code (like Compute) and others don't. We have lots of customers using Windows Azure storage as a backup, to securely share files instead of using DropBox, to distribute videos or code or firmware, and more. Others use our High Performance Computing (HPC) offering to rent a supercomputer when they need one. You can even throw workloads at that using Excel! In addition there are lots of other components in Windows Azure you can use, from the Windows Azure Media Services to others. More here: https://www.windowsazure.com/en-us/home/scenarios/saas/ Myth 5: Windows Azure is just another form of "vendor lock-in" Windows Azure uses .NET, OSS languages and standard interfaces for the code. Sure, you're not going to take the code line-for-line and run it on a mainframe, but it's standard code that you write, and can port to something else. And the data is yours - you can bring it back whever you want. It's either in text or binary form, that you have complete control over. There are no licenses - you can "pay as you go", and when you're done, you can leave the service and take all your code, data and IP with you. So go out there, read up, try it. Use it where it works. And don't believe everything you hear - sometimes the Internet doesn't get it all correct. :)

Read the article
Rockmelt, the technology adoption model, and Facebook's spare internet

- by Roger Hart

Regardless of how good it is, you'd have to have a heart of stone not to make snide remarks about Rockmelt. After all, on the surface it looks a lot like some people spent two years building a browser instead of just bashing out a Chrome extension over a wet weekend. It probably does some more stuff. I don't know for sure because artificial scarcity is cool, apparently, so the "invitation" is still in the post*. I may in fact never know for sure, because I'm not wild about Facebook sign-in as a prerequisite for anything. From the video, and some initial reviews, my early reaction was: I have a browser, I have a Twitter client; what on earth is this for? The answer, of course, is "not me". Rockmelt is, in a way, quite audacious. Oh, sure, on launch day it's Bay Area bar-chat for the kids with no lenses in their retro specs and trousers that give you deep-vein thrombosis, but it's not really about them. Likewise, Facebook just launched Google Wave, or something. And all the tech snobbery and scorn packed into describing it that way is irrelevant next to what they're doing with their platform. Here's something I drew in MS Paint** because I don't want to get sued: (see: The technology adoption lifecycle) A while ago in the Guardian, John Lanchester dusted off the idiom that "technology is stuff that doesn't work yet". The rest of the article would be quite interesting if it wasn't largely about MySpace, and he's sort of got a point. If you bolt on the sentiment that risk-averse businessmen like things that work, you've got the essence of Crossing the Chasm. Products for the mainstream market don't look much like technology. Think for a second about early (1980s ish) hi-fi systems, with all the knobs and fiddly bits, their ostentatious technophile aesthetic. Then consider their sleeker and less (or at least less conspicuously) functional successors in the 1990s. The theory goes that innovators and early adopters like technology, it's a hobby in itself. The rest of the humans seem to like magic boxes with very few buttons that make stuff happen and never trouble them about why. Personally, I consider Apple's maddening insistence that iTunes is an acceptable way to move files around to be more or less morally unacceptable. Most people couldn't care less. Hence Rockmelt, and hence Facebook's continued growth. Rockmelt looks pointless to me, because I aggregate my social gubbins with Digsby, or use TweetDeck. But my use case is different and so are my enthusiasms. If I want to share photos, I'll use Flickr - but Facebook has photo sharing. If I want a short broadcast message, I'll use Twitter - Facebook has status updates. If I want to sell something with relatively little hassle, there's eBay - or Facebook marketplace. YouTube - check, FB Video. Email - messaging. Calendaring apps, yeah there are loads, or FB Events. What if I want to host a simple web page? Sure, they've got pages. Also Notes for blogging, and more games than I can count. This stuff is right there, where millions and millions of users are already, and for what they need it just works. It's not about me, because I'm not in the big juicy area under the curve. It's what 1990s portal sites could never have dreamed of achieving. Facebook is AOL on speed, crack, and some designer drugs it had specially imported from the future. It's a n00b-friendly gateway to the internet that just happens to serve up all the things you want to do online, right where you are. Oh, and everybody else is there too. The price of having all this and the social graph too is that you have all of this, and the social graph too. But plenty of folks have more incisive things to say than me about the whole privacy shebang, and it's not really what I'm talking about. Facebook is maintaining a vast, and fairly fully-featured training-wheels internet. And it makes up a large proportion of the online experience for a lot of people***. It's the entire web (2.0?) experience for the early and late majority. And sure, no individual bit of it is quite as slick or as fully-realised as something like Flickr (which wows me a bit every time I use it. Those guys are good at the web), but it doesn't have to be. It has to be unobtrusively good enough for the regular humans. It has to not feel like technology. This is what Rockmelt sort of is. You're online, you want something nebulously social, and you don't want to faff about with, say, Twitter clients. Wow! There it is on a really distracting sidebar, right in your browser. No effort! Yeah - fish nor fowl, much? It might work, I guess. There may be a demographic who want their social web experience more simply than tech tinkering, and who aren't just getting it from Facebook (or, for that matter, mobile devices). But I'd be surprised. Rockmelt feels like an attempt to grab a slice of Facebook-style "Look! It's right here, where you already are!", but it's still asking the mature market to install a new browser. Presumably this is where that Facebook sign-in predicate comes in handy, though it'll take some potent awareness marketing to make it fly. Meanwhile, Facebook quietly has the entire rest of the internet as a product management resource, and can continue to give most of the people most of what they want. Something that has not gone un-noticed in its potential to look a little sinister. But heck, they might even make Google Wave popular. *This was true last week when I drafted this post. I got an invite subsequently, hence the screenshot. **MS Paint is no fun any more. It's actually good in Windows 7. Farewell ironically-shonky diagrams. *** It's also behind a single sign-in, lending a veneer of confidence, and partially solving the problem of usernames being crummy unique identifiers. I'll be blogging about that at some point.

Read the article
Of C# Iterators and Performance

- by James Michael Hare

Some of you reading this will be wondering, "what is an iterator" and think I'm locked in the world of C++. Nope, I'm talking C# iterators. No, not enumerators, iterators. So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well. Iterators have been around for a bit now, and there are still a bunch of people who don't know what they are or what they do. I don't know how many times at work I've had a code review on my code and have someone ask me, "what's that yield word do?" Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> -- I'll post some of the fun ones in a later post. Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration? So, to begin, let's look at a couple of methods. This is a new (albeit contrived) method called Every(...). The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first). So Every(2) would return items 0, 2, 4, 6, etc. Now, if you wanted to write this in the traditional way, you may come up with something like this: public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval) { List<T> newList = new List<T>(); int count = 0; foreach (var i in list) { if ((count++ % interval) == 0) { newList.Add(i); } } return newList; } So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item. Pretty straight forward. The problem? Well, Every<T>(...) will construct a list containing every nth item whether or not you care. What happens if you were searching this result for a certain item and find that item after five tries? You would have generated the rest of the list for nothing. Enter iterators. This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for. This can be very handy if the evaluation itself is expensive or if there's a fair chance you'll never want to fully evaluate a list. We see this all the time in Linq, where many expressions are chained together to do complex processing on a list. This would be very expensive if each of these expressions evaluated their entire possible result set on call. Let's look at the same example function, this time using an iterator: public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval) { int count = 0; foreach (var i in list) { if ((count++ % interval) == 0) { yield return i; } } } Notice it does not create a new return value explicitly, the only evidence of a return is the "yield return" statement. What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends. Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time. It doesn't seem like a big deal, does it? But keep in mind the key point here: it only returns items as they are requested. Thus if there's a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit. This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained. For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10. We could write that as: List<int> someList = new List<int>(); // fill list here someList.Every(10).TakeWhile(i => i <= 10); Now is the difference more apparent? If we use the first form of Every that makes a copy of the list. It's going to copy the entire list whether we will need those items or not, that can be costly! With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated. So, sounds neat eh? But what's the cost is what you're probably wondering. So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things. Now, iteration isn't free. If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead: Copy vs Iterator on 100% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 5 Copy 5 5 5 Iterator 5 50 50 Copy 28 50 50 Iterator 27 500 500 Copy 227 500 500 Iterator 247 5000 5000 Copy 2266 5000 5000 Iterator 2444 50,000 50,000 Copy 24,443 50,000 50,000 Iterator 24,719 500,000 500,000 Copy 250,024 500,000 500,000 Iterator 251,521 Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists. In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price. However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect. However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead. Let's look at what happens if you stop looking after 80% of the list: Copy vs Iterator on 80% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 4 Copy 5 5 4 Iterator 5 50 40 Copy 27 50 40 Iterator 23 500 400 Copy 215 500 400 Iterator 200 5000 4000 Copy 2099 5000 4000 Iterator 1962 50,000 40,000 Copy 22,385 50,000 40,000 Iterator 19,599 500,000 400,000 Copy 236,427 500,000 400,000 Iterator 196,010 Notice that the iterator form is now operating quite a bit faster. But the savings really add up if you stop on average at 50% (which most searches would typically do): Copy vs Iterator on 50% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 2 Copy 5 5 2 Iterator 4 50 25 Copy 25 50 25 Iterator 16 500 250 Copy 188 500 250 Iterator 126 5000 2500 Copy 1854 5000 2500 Iterator 1226 50,000 25,000 Copy 19,839 50,000 25,000 Iterator 12,233 500,000 250,000 Copy 208,667 500,000 250,000 Iterator 122,336 Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time. And this is only for one level deep. If we are using this in a chain of query expressions it only adds to the savings. So my recommendation? If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator. The cost if they iterate the whole thing does not add much at all -- and if they consume only partially, you reap some really good performance gains. Next time I'll discuss some of my favorite extensions I've created to make development life a little easier and maintainability a little better.

Read the article
C#: Does an IDisposable in a Halted Iterator Dispose?

- by James Michael Hare

If that sounds confusing, let me give you an example. Let's say you expose a method to read a database of products, and instead of returning a List<Product> you return an IEnumerable<Product> in iterator form (yield return). This accomplishes several good things: The IDataReader is not passed out of the Data Access Layer which prevents abstraction leak and resource leak potentials. You don't need to construct a full List<Product> in memory (which could be very big) if you just want to forward iterate once. If you only want to consume up to a certain point in the list, you won't incur the database cost of looking up the other items. This could give us an example like: 1: // a sample data access object class to do standard CRUD operations. 2: public class ProductDao 3: { 4: private DbProviderFactory _factory = SqlClientFactory.Instance 5: 6: // a method that would retrieve all available products 7: public IEnumerable<Product> GetAvailableProducts() 8: { 9: // must create the connection 10: using (var con = _factory.CreateConnection()) 11: { 12: con.ConnectionString = _productsConnectionString; 13: con.Open(); 14: 15: // create the command 16: using (var cmd = _factory.CreateCommand()) 17: { 18: cmd.Connection = con; 19: cmd.CommandText = _getAllProductsStoredProc; 20: cmd.CommandType = CommandType.StoredProcedure; 21: 22: // get a reader and pass back all results 23: using (var reader = cmd.ExecuteReader()) 24: { 25: while(reader.Read()) 26: { 27: yield return new Product 28: { 29: Name = reader["product_name"].ToString(), 30: ... 31: }; 32: } 33: } 34: } 35: } 36: } 37: } The database details themselves are irrelevant. I will say, though, that I'm a big fan of using the System.Data.Common classes instead of your provider specific counterparts directly (SqlCommand, OracleCommand, etc). This lets you mock your data sources easily in unit testing and also allows you to swap out your provider in one line of code. In fact, one of the shared components I'm most proud of implementing was our group's DatabaseUtility library that simplifies all the database access above into one line of code in a thread-safe and provider-neutral way. I went with my own flavor instead of the EL due to the fact I didn't want to force internal company consumers to use the EL if they didn't want to, and it made it easy to allow them to mock their database for unit testing by providing a MockCommand, MockConnection, etc that followed the System.Data.Common model. One of these days I'll blog on that if anyone's interested. Regardless, you often have situations like the above where you are consuming and iterating through a resource that must be closed once you are finished iterating. For the reasons stated above, I didn't want to return IDataReader (that would force them to remember to Dispose it), and I didn't want to return List<Product> (that would force them to hold all products in memory) -- but the first time I wrote this, I was worried. What if you never consume the last item and exit the loop? Are the reader, command, and connection all disposed correctly? Of course, I was 99.999999% sure the creators of C# had already thought of this and taken care of it, but inspection in Reflector was difficult due to the nature of the state machines yield return generates, so I decided to try a quick example program to verify whether or not Dispose() will be called when an iterator is broken from outside the iterator itself -- i.e. before the iterator reports there are no more items. So I wrote a quick Sequencer class with a Dispose() method and an iterator for it. Yes, it is COMPLETELY contrived: 1: // A disposable sequence of int -- yes this is completely contrived... 2: internal class Sequencer : IDisposable 3: { 4: private int _i = 0; 5: private readonly object _mutex = new object(); 6: 7: // Constructs an int sequence. 8: public Sequencer(int start) 9: { 10: _i = start; 11: } 12: 13: // Gets the next integer 14: public int GetNext() 15: { 16: lock (_mutex) 17: { 18: return _i++; 19: } 20: } 21: 22: // Dispose the sequence of integers. 23: public void Dispose() 24: { 25: // force output immediately (flush the buffer) 26: Console.WriteLine("Disposed with last sequence number of {0}!", _i); 27: Console.Out.Flush(); 28: } 29: } And then I created a generator (infinite-loop iterator) that did the using block for auto-Disposal: 1: // simply defines an extension method off of an int to start a sequence 2: public static class SequencerExtensions 3: { 4: // generates an infinite sequence starting at the specified number 5: public static IEnumerable<int> GetSequence(this int starter) 6: { 7: // note the using here, will call Dispose() when block terminated. 8: using (var seq = new Sequencer(starter)) 9: { 10: // infinite loop on this generator, means must be bounded by caller! 11: while(true) 12: { 13: yield return seq.GetNext(); 14: } 15: } 16: } 17: } This is really the same conundrum as the database problem originally posed. Here we are using iteration (yield return) over a large collection (infinite sequence of integers). If we cut the sequence short by breaking iteration, will that using block exit and hence, Dispose be called? Well, let's see: 1: // The test program class 2: public class IteratorTest 3: { 4: // The main test method. 5: public static void Main() 6: { 7: Console.WriteLine("Going to consume 10 of infinite items"); 8: Console.Out.Flush(); 9: 10: foreach(var i in 0.GetSequence()) 11: { 12: // could use TakeWhile, but wanted to output right at break... 13: if(i >= 10) 14: { 15: Console.WriteLine("Breaking now!"); 16: Console.Out.Flush(); 17: break; 18: } 19: 20: Console.WriteLine(i); 21: Console.Out.Flush(); 22: } 23: 24: Console.WriteLine("Done with loop."); 25: Console.Out.Flush(); 26: } 27: } So, what do we see? Do we see the "Disposed" message from our dispose, or did the Dispose get skipped because from an "eyeball" perspective we should be locked in that infinite generator loop? Here's the results: 1: Going to consume 10 of infinite items 2: 0 3: 1 4: 2 5: 3 6: 4 7: 5 8: 6 9: 7 10: 8 11: 9 12: Breaking now! 13: Disposed with last sequence number of 11! 14: Done with loop. Yes indeed, when we break the loop, the state machine that C# generates for yield iterate exits the iteration through the using blocks and auto-disposes the IDisposable correctly. I must admit, though, the first time I wrote one, I began to wonder and that led to this test. If you've never seen iterators before (I wrote a previous entry here) the infinite loop may throw you, but you have to keep in mind it is not a linear piece of code, that every time you hit a "yield return" it cedes control back to the state machine generated for the iterator. And this state machine, I'm happy to say, is smart enough to clean up the using blocks correctly. I suspected those wily guys and gals at Microsoft engineered it well, and I wasn't disappointed. But, I've been bitten by assumptions before, so it's good to test and see. Yes, maybe you knew it would or figured it would, but isn't it nice to know? And as those campy 80s G.I. Joe cartoon public service reminders always taught us, "Knowing is half the battle...". Technorati Tags: C#,.NET

Read the article
C#: LINQ vs foreach - Round 1.

- by James Michael Hare

So I was reading Peter Kellner's blog entry on Resharper 5.0 and its LINQ refactoring and thought that was very cool. But that raised a point I had always been curious about in my head -- which is a better choice: manual foreach loops or LINQ? The answer is not really clear-cut. There are two sides to any code cost arguments: performance and maintainability. The first of these is obvious and quantifiable. Given any two pieces of code that perform the same function, you can run them side-by-side and see which piece of code performs better. Unfortunately, this is not always a good measure. Well written assembly language outperforms well written C++ code, but you lose a lot in maintainability which creates a big techncial debt load that is hard to offset as the application ages. In contrast, higher level constructs make the code more brief and easier to understand, hence reducing technical cost. Now, obviously in this case we're not talking two separate languages, we're comparing doing something manually in the language versus using a higher-order set of IEnumerable extensions that are in the System.Linq library. Well, before we discuss any further, let's look at some sample code and the numbers. First, let's take a look at the for loop and the LINQ expression. This is just a simple find comparison: // find implemented via LINQ public static bool FindViaLinq(IEnumerable<int> list, int target) { return list.Any(item => item == target); } // find implemented via standard iteration public static bool FindViaIteration(IEnumerable<int> list, int target) { foreach (var i in list) { if (i == target) { return true; } } return false; } Okay, looking at this from a maintainability point of view, the Linq expression is definitely more concise (8 lines down to 1) and is very readable in intention. You don't have to actually analyze the behavior of the loop to determine what it's doing. So let's take a look at performance metrics from 100,000 iterations of these methods on a List<int> of varying sizes filled with random data. For this test, we fill a target array with 100,000 random integers and then run the exact same pseudo-random targets through both searches. List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower Any 10 26 0.00046 30.00% Iteration 10 20 0.00023 - Any 100 116 0.00201 18.37% Iteration 100 98 0.00118 - Any 1000 1058 0.01853 16.78% Iteration 1000 906 0.01155 - Any 10,000 10,383 0.18189 17.41% Iteration 10,000 8843 0.11362 - Any 100,000 104,004 1.8297 18.27% Iteration 100,000 87,941 1.13163 - The LINQ expression is running about 17% slower for average size collections and worse for smaller collections. Presumably, this is due to the overhead of the state machine used to track the iterators for the yield returns in the LINQ expressions, which seems about right in a tight loop such as this. So what about other LINQ expressions? After all, Any() is one of the more trivial ones. I decided to try the TakeWhile() algorithm using a Count() to get the position stopped like the sample Pete was using in his blog that Resharper refactored for him into LINQ: // Linq form public static int GetTargetPosition1(IEnumerable<int> list, int target) { return list.TakeWhile(item => item != target).Count(); } // traditionally iterative form public static int GetTargetPosition2(IEnumerable<int> list, int target) { int count = 0; foreach (var i in list) { if(i == target) { break; } ++count; } return count; } Once again, the LINQ expression is much shorter, easier to read, and should be easier to maintain over time, reducing the cost of technical debt. So I ran these through the same test data: List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Iteration 100,000 81635 0.81635 - Wow! I expected some overhead due to the state machines iterators produce, but 90% slower? That seems a little heavy to me. So then I thought, well, what if TakeWhile() is not the right tool for the job? The problem is TakeWhile returns each item for processing using yield return, whereas our for-loop really doesn't care about the item beyond using it as a stop condition to evaluate. So what if that back and forth with the iterator state machine is the problem? Well, we can quickly create an (albeit ugly) lambda that uses the Any() along with a count in a closure (if a LINQ guru knows a better way PLEASE let me know!), after all , this is more consistent with what we're trying to do, we're trying to find the first occurence of an item and halt once we find it, we just happen to be counting on the way. This mostly matches Any(). // a new method that uses linq but evaluates the count in a closure. public static int TakeWhileViaLinq2(IEnumerable<int> list, int target) { int count = 0; list.Any(item => { if(item == target) { return true; } ++count; return false; }); return count; } Now how does this one compare? List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Any w/Closure 10 23 0.00023 28% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Any w/Closure 100 116 0.00116 27% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Any w/Closure 1000 1101 0.01101 33% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Any w/Closure 10,000 10802 0.10802 32% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Any w/Closure 100,000 108378 1.08378 33% Iteration 100,000 81635 0.81635 - Much better! It seems that the overhead of TakeAny() returning each item and updating the state in the state machine is drastically reduced by using Any() since Any() iterates forward until it finds the value we're looking for -- for the task we're attempting to do. So the lesson there is, make sure when you use a LINQ expression you're choosing the best expression for the job, because if you're doing more work than you really need, you'll have a slower algorithm. But this is true of any choice of algorithm or collection in general. Even with the Any() with the count in the closure it is still about 30% slower, but let's consider that angle carefully. For a list of 100,000 items, it was the difference between 1.01 ms and 0.82 ms roughly in a List<T>. That's really not that bad at all in the grand scheme of things. Even running at 90% slower with TakeWhile(), for the vast majority of my projects, an extra millisecond to save potential errors in the long term and improve maintainability is a small price to pay. And if your typical list is 1000 items or less we're talking only microseconds worth of difference. It's like they say: 90% of your performance bottlenecks are in 2% of your code, so over-optimizing almost never pays off. So personally, I'll take the LINQ expression wherever I can because they will be easier to read and maintain (thus reducing technical debt) and I can rely on Microsoft's development to have coded and unit tested those algorithm fully for me instead of relying on a developer to code the loop logic correctly. If something's 90% slower, yes, it's worth keeping in mind, but it's really not until you start get magnitudes-of-order slower (10x, 100x, 1000x) that alarm bells should really go off. And if I ever do need that last millisecond of performance? Well then I'll optimize JUST THAT problem spot. To me it's worth it for the readability, speed-to-market, and maintainability.

Read the article
SQL SERVER – Weekly Series – Memory Lane – #048

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Order of Result Set of SELECT Statement on Clustered Indexed Table When ORDER BY is Not Used Above theory is true in most of the cases. However SQL Server does not use that logic when returning the resultset. SQL Server always returns the resultset which it can return fastest.In most of the cases the resultset which can be returned fastest is the resultset which is returned using clustered index. Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT One of the Jr. Developer asked me this question (What will be the Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT?) while I was rushing to an important meeting. I was getting late so I asked him to talk with his Application Tech Lead. When I came back from meeting both of them were looking for me. They said they are confused. I quickly wrote down following example for them. 2008 SQL SERVER – Guidelines and Coding Standards Complete List Download Coding standards and guidelines are very important for any developer on the path of a successful career. A coding standard is a set of guidelines, rules and regulations on how to write code. Coding standards should be flexible enough or should take care of the situation where they should not prevent best practices for coding. They are basically the guidelines that one should follow for better understanding. Download Guidelines and Coding Standards complete List Download Get Answer in Float When Dividing of Two Integer Many times we have requirements of some calculations amongst different fields in Tables. One of the software developers here was trying to calculate some fields having integer values and divide it which gave incorrect results in integer where accurate results including decimals was expected. Puzzle – Computed Columns Datatype Explanation SQL Server automatically does a cast to the data type having the highest precedence. So the result of INT and INT will be INT, but INT and FLOAT will be FLOAT because FLOAT has a higher precedence. If you want a different data type, you need to do an EXPLICIT cast. Renaming SP is Not Good Idea – Renaming Stored Procedure Does Not Update sys.procedures I have written many articles about renaming a tables, columns and procedures SQL SERVER – How to Rename a Column Name or Table Name, here I found something interesting about renaming the stored procedures and felt like sharing it with you all. The interesting fact is that when we rename a stored procedure using SP_Rename command, the Stored Procedure is successfully renamed. But when we try to test the procedure using sp_helptext, the procedure will be having the old name instead of new names. 2009 Insert Values of Stored Procedure in Table – Use Table Valued Function It is clear from the result set that , where I have converted stored procedure logic into the table valued function, is much better in terms of logic as it saves a large number of operations. However, this option should be used carefully. The performance of the stored procedure is “usually” better than that of functions. Interesting Observation – Index on Index View Used in Similar Query Recently, I was working on an optimization project for one of the largest organizations. While working on one of the queries, we came across a very interesting observation. We found that there was a query on the base table and when the query was run, it used the index, which did not exist in the base table. On careful examination, we found that the query was using the index that was on another view. This was very interesting as I have personally never experienced a scenario like this. In simple words, “Query on the base table can use the index created on the indexed view of the same base table.” Interesting Observation – Execution Plan and Results of Aggregate Concatenation Queries Working with SQL Server has never seemed to be monotonous – no matter how long one has worked with it. Quite often, I come across some excellent comments that I feel like acknowledging them as blog posts. Recently, I wrote an article on SQL SERVER – Execution Plan and Results of Aggregate Concatenation Queries Depend Upon Expression Location, which is well received in the community. 2010 I encourage all of you to go through complete series and write your own on the subject. If you write an article and send it to me, I will publish it on this blog with due credit to you. If you write on your own blog, I will update this blog post pointing to your blog post. SQL SERVER – ORDER BY Does Not Work – Limitation of the View 1 SQL SERVER – Adding Column is Expensive by Joining Table Outside View – Limitation of the View 2 SQL SERVER – Index Created on View not Used Often – Limitation of the View 3 SQL SERVER – SELECT * and Adding Column Issue in View – Limitation of the View 4 SQL SERVER – COUNT(*) Not Allowed but COUNT_BIG(*) Allowed – Limitation of the View 5 SQL SERVER – UNION Not Allowed but OR Allowed in Index View – Limitation of the View 6 SQL SERVER – Cross Database Queries Not Allowed in Indexed View – Limitation of the View 7 SQL SERVER – Outer Join Not Allowed in Indexed Views – Limitation of the View 8 SQL SERVER – SELF JOIN Not Allowed in Indexed View – Limitation of the View 9 SQL SERVER – Keywords View Definition Must Not Contain for Indexed View – Limitation of the View 10 SQL SERVER – View Over the View Not Possible with Index View – Limitations of the View 11 2011 Startup Parameters Easy to Configure If you are a regular reader of this blog, you must be aware that I have written about SQL Server Denali recently. Here is the quickest way to reach into the screen where we can change the startup parameters. Go to SQL Server Configuration Manager >> SQL Server Services >> Right Click on the Server >> Properties >> Startup Parameters 2012 Validating Unique Columnname Across Whole Database I sometimes come across very strange requirements and often I do not receive a proper explanation of the same. Here is the one of those examples. For example “Our business requirement is when we add new column we want it unique across current database.” Read the solution to this strange request in this blog post. Excel Losing Decimal Values When Value Pasted from SSMS ResultSet It is very common when users are coping the resultset to Excel, the floating point or decimals are missed. The solution is very much simple and it requires a small adjustment in the Excel. By default Excel is very smart and when it detects the value which is getting pasted is numeric it changes the column format to accommodate that. Basic Calculation and PEMDAS Order of Operation Read this interesting blog post for fantastic conversation about the subject. Copy Column Headers from Resultset – SQL in Sixty Seconds #027 – Video http://www.youtube.com/watch?v=x_-3tLqTRv0 Delete From Multiple Table – Update Multiple Table in Single Statement There are two questions which I get every single day multiple times. In my gmail, I have created standard canned reply for them. Let us see the questions here. I want to delete from multiple table in a single statement how will I do it? I want to update multiple table in a single statement how will I do it? Read the answer in the blog post. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Windows Azure Myths

- by BuckWoody

Windows Azure is part of the Microsoft "stack" - the suite of software and services we offer. Because we have so many products in almost every part of technology, it's hard to know everything about all parts of what we do - even for those of us who work here. So it's no surprise that some folks are not as familiar with Windows and SQL Azure as they are, say Windows Server or XBox. As I chat with folks about a solution for a business or organization need, I put Windows Azure into the mix. I always start off with "What do you already know about Windows Azure?" so that I don't bore folks with information they already have. I some cases they've checked out the product ahead of time and have specific questions, in others they aren't as familiar, and in still others there is a fair amount of mis-information. Sometimes that's because of a marketing failure, sometimes it's hearsay, and somtetimes it's active misinformation. I thought I might lay out a few of these misconceptions. As always - do your fact-checking! Never take anyone's word alone (including mine) as gospel. Make sure you educate yourself on your options. Your company or your clients depend on you to have the right information on IT, so make sure you live up to that. Myth 1: Nobody uses Windows Azure It's true that we don't give out numbers on the amount of clients on Windows and SQL Azure. But lots of folks are here - companies you may have heard of like Boeing, NASA, Fujitsu, The City of London, Nuedesic, and many others. I deal with firms small and large that use Windows Azure for mission-critical applications, sometimes totally on Windows and/or SQL Azure, sometimes in conjunction with an on-premises system, sometimes for only a specific component in Windows Azure like storage. The interesting thing is that many sites you visit have a Windows Azure component, or are running on Windows Azure. They just don't announce it. Just like the other cloud providers, the companies have asked to be completely branded themselves - they don't want you to be aware or care that they are on Windows Azure. Sometimes that's for security, other times it's for different reasons. It's just like the web sites you visit. For the most part, they don't advertise which OS or Web Server they use. It really just shouldn't matter. The point is that they just use what works to solve a given problem. Check out a few public case studies here: https://www.windowsazure.com/en-us/home/case-studies/ Myth 2: It's only for Microsoft stuff - can't use Open Source This is the one I face the most, and am the most dismayed by. We work just fine with many open source products, including Java, NodeJS, PHP, Ruby, Python, Hadoop, and many other languages and applications. You can quickly deploy a Wordpress, Umbraco and other "kits". We have software development kits (SDK's) for iPhones, iPads, Android, Windows phones and more. We have an SDK to work with FaceBook and other social networks. In short, we play well with others. More on the languages and runtimes we support here: https://www.windowsazure.com/en-us/develop/overview/ More on the SDK's here: http://www.wadewegner.com/2011/05/windows-azure-toolkit-for-ios/, http://www.wadewegner.com/2011/08/windows-azure-toolkits-for-devices-now-with-android/, http://azuretoolkit.codeplex.com/ Myth 3: Microsoft expects me to switch everything to "the cloud" No, we don't. That would be disasterous, unless the only things you run in your company uses works perfectly in Azure. Use Windows Azure - or any cloud for that matter - where it works. Whenever I talk to companies, I focus on two things: Something that is broken and needs to be re-architected Something you want to do that is new If something is broken, and you need new tools to scale, extend, add capacity dynamically and so on, then you can consider using Windows or SQL Azure. It can help solve problems that you have, or it may include a component you don't want to write or architect yourself. Sometimes you want to do something new, like extend your company's offerings to mobile phones, to the web, or to a social network. More info on where it works here: http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx Myth 4: I have to write code to use Windows and SQL Azure If Windows Azure is a PaaS - a Platform as a Service - then don't you have to write code to use it? Nope. Windows and SQL Azure are made up of various components. Some of those components allow you to write and deploy code (like Compute) and others don't. We have lots of customers using Windows Azure storage as a backup, to securely share files instead of using DropBox, to distribute videos or code or firmware, and more. Others use our High Performance Computing (HPC) offering to rent a supercomputer when they need one. You can even throw workloads at that using Excel! In addition there are lots of other components in Windows Azure you can use, from the Windows Azure Media Services to others. More here: https://www.windowsazure.com/en-us/home/scenarios/saas/ Myth 5: Windows Azure is just another form of "vendor lock-in" Windows Azure uses .NET, OSS languages and standard interfaces for the code. Sure, you're not going to take the code line-for-line and run it on a mainframe, but it's standard code that you write, and can port to something else. And the data is yours - you can bring it back whever you want. It's either in text or binary form, that you have complete control over. There are no licenses - you can "pay as you go", and when you're done, you can leave the service and take all your code, data and IP with you. So go out there, read up, try it. Use it where it works. And don't believe everything you hear - sometimes the Internet doesn't get it all correct. :)

Read the article
Integrating Coherence & Java EE 6 Applications using ActiveCache

- by Ricardo Ferreira

OK, so you are a developer and are starting a new Java EE 6 application using the most wonderful features of the Java EE platform like Enterprise JavaBeans, JavaServer Faces, CDI, JPA e another cool stuff technologies. And your architecture need to hold piece of data into distributed caches to improve application's performance, scalability and reliability? If this is your current facing scenario, maybe you should look closely in the solutions provided by Oracle WebLogic Server. Oracle had integrated WebLogic Server and its champion data caching technology called Oracle Coherence. This seamless integration between this two products provides a comprehensive environment to develop applications without the complexity of extra Java code to manage cache as a dependency, since Oracle provides an DI ("Dependency Injection") mechanism for Coherence, the same DI mechanism available in standard Java EE applications. This feature is called ActiveCache. In this article, I will show you how to configure ActiveCache in WebLogic and at your Java EE application. Configuring WebLogic to manage Coherence Before you start changing your application to use Coherence, you need to configure your Coherence distributed cache. The good news is, you can manage all this stuff without writing a single line of code of XML or even Java. This configuration can be done entirely in the WebLogic administration console. The first thing to do is the setup of a Coherence cluster. A Coherence cluster is a set of Coherence JVMs configured to form one single view of the cache. This means that you can insert or remove members of the cluster without the client application (the application that generates or consume data from the cache) knows about the changes. This concept allows your solution to scale-out without changing the application server JVMs. You can growth your application only in the data grid layer. To start the configuration, you need to configure an machine that points to the server in which you want to execute the Coherence JVMs. WebLogic Server allows you to do this very easily using the Administration Console. In this example, I will call the machine as "coherence-server". Remember that in order to the machine concept works, you need to ensure that the NodeManager are being executed in the target server that the machine points to. The NodeManager executable can be found in <WLS_HOME>/server/bin/startNodeManager.sh. The next thing to do is to configure a Coherence cluster. In the WebLogic administration console, go to Environment > Coherence Clusters and click in "New". Call this Coherence cluster of "my-coherence-cluster". Click in next. Specify a valid cluster address and port. The Coherence members will communicate with each other through this address and port. Our Coherence cluster are now configured. Now it is time to configure the Coherence members and add them to this cluster. In the WebLogic administration console, go to Environment > Coherence Servers and click in "New". In the field "Name" set to "coh-server-1". In the field "Machine", associate this Coherence server to the machine "coherence-server". In the field "Cluster", associate this Coherence server to the cluster named "my-coherence-cluster". Click in "Finish". Start the Coherence server using the "Control" tab of WebLogic administration console. This will instruct WebLogic to start a new JVM of Coherence in the target machine that should join the pre-defined Coherence cluster. Configuring your Java EE Application to Access Coherence Now lets pass to the funny part of the configuration. The first thing to do is to inform your Java EE application which Coherence cluster to join. Oracle had updated WebLogic server deployment descriptors so you will not have to change your code or the containers deployment descriptors like application.xml, ejb-jar.xml or web.xml. In this example, I will show you how to enable DI ("Dependency Injection") to a Coherence cache from a Servlet 3.0 component. In the WEB-INF/weblogic.xml deployment descriptor, put the following metadata information: <?xml version="1.0" encoding="UTF-8"?> <wls:weblogic-web-app xmlns:wls="http://xmlns.oracle.com/weblogic/weblogic-web-app" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd http://xmlns.oracle.com/weblogic/weblogic-web-app http://xmlns.oracle.com/weblogic/weblogic-web-app/1.4/weblogic-web-app.xsd"> <wls:context-root>myWebApp</wls:context-root> <wls:coherence-cluster-ref> <wls:coherence-cluster-name>my-coherence-cluster</wls:coherence-cluster-name> </wls:coherence-cluster-ref> </wls:weblogic-web-app> As you can see, using the "coherence-cluster-name" tag, we are informing our Java EE application that it should join the "my-coherence-cluster" when it loads in the web container. Without this information, the application will not be able to access the predefined Coherence cluster. It will form its own Coherence cluster without any members. So never forget to put this information. Now put the coherence.jar and active-cache-1.0.jar dependencies at your WEB-INF/lib application classpath. You need to deploy this dependencies so ActiveCache can automatically take care of the Coherence cluster join phase. This dependencies can be found in the following locations: - <WLS_HOME>/common/deployable-libraries/active-cache-1.0.jar - <COHERENCE_HOME>/lib/coherence.jar Finally, you need to write down the access code to the Coherence cache at your Servlet. In the following example, we have a Servlet 3.0 component that access a Coherence cache named "transactions" and prints into the browser output the content (the ammount property) of one specific transaction. package com.oracle.coherence.demo.activecache; import java.io.IOException; import javax.annotation.Resource; import javax.servlet.ServletException; import javax.servlet.annotation.WebServlet; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import com.tangosol.net.NamedCache; @WebServlet("/demo/specificTransaction") public class TransactionServletExample extends HttpServlet { @Resource(mappedName = "transactions") NamedCache transactions; protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { int transId = Integer.parseInt(request.getParameter("transId")); Transaction transaction = (Transaction) transactions.get(transId); response.getWriter().println("<center>" + transaction.getAmmount() + "</center>"); } } Thats it! No more configuration is necessary and you have all set to start producing and getting data to/from Coherence. As you can see in the example code, the Coherence cache are treated as a normal dependency in the Java EE container. The magic happens behind the scenes when the ActiveCache allows your application to join the defined Coherence cluster. The most interesting thing about this approach is, no matter which type of Coherence cache your are using (Distributed, Partitioned, Replicated, WAN-Remote) for the client application, it is just a simple attribute member of com.tangosol.net.NamedCache type. And its all managed by the Java EE container as an dependency. This means that if you inject the same dependency (the Coherence cache named "transactions") in another Java EE component (JSF managed-bean, Stateless EJB) the cache will be the same. Cool isn't it? Thanks to the CDI technology, we can extend the same support for non-Java EE standards components like simple POJOs. This means that you are not forced to only use Servlets, EJBs or JSF in order to inject Coherence caches. You can do the same approach for regular POJOs created for you and managed by lightweight containers like Spring or Seam.

Read the article
Windows Azure Virtual Machine Readiness and Capacity Assessment for SQL Server

- by SQLOS Team

Windows Azure Virtual Machine Readiness and Capacity Assessment for Windows Server Machine Running SQL Server With the release of MAP Toolkit 8.0 Beta, we have added a new scenario to assess your Windows Azure Virtual Machine Readiness. The MAP 8.0 Beta performs a comprehensive assessment of Windows Servers running SQL Server to determine you level of readiness to migrate an on-premise physical or virtual machine to Windows Azure Virtual Machines. The MAP Toolkit then offers suggested changes to prepare the machines for migration, such as upgrading the operating system or SQL Server. MAP Toolkit 8.0 Beta is available for download here Your participation and feedback is very important to make the MAP Toolkit work better for you. We encourage you to participate in the beta program and provide your feedback at [email protected] or through one of our surveys. Now, let’s walk through the MAP Toolkit task for completing the Windows Azure Virtual Machine assessment and capacity planning. The tasks include the following: Perform an inventory View the Windows Azure VM Readiness results and report Collect performance data for determine VM sizing View the Windows Azure Capacity results and report Perform an inventory: 1. To perform an inventory against a single machine or across a complete environment, choose Perform an Inventory to launch the Inventory and Assessment Wizard as shown below: 2. After the Inventory and Assessment Wizard launches, select either the Windows computers or SQL Server scenario to inventory Windows machines. HINT: If you don’t care about completely inventorying a machine, just select the SQL Server scenario. Click Next to Continue. 3. On the Discovery Methods page, select how you want to discover computers and then click Next to continue. Description of Discovery Methods: Use Active Directory Domain Services -- This method allows you to query a domain controller via the Lightweight Directory Access Protocol (LDAP) and select computers in all or specific domains, containers, or OUs. Use this method if all computers and devices are in AD DS. Windows networking protocols -- This method uses the WIN32 LAN Manager application programming interfaces to query the Computer Browser service for computers in workgroups and Windows NT 4.0–based domains. If the computers on the network are not joined to an Active Directory domain, use only the Windows networking protocols option to find computers. System Center Configuration Manager (SCCM) -- This method enables you to inventory computers managed by System Center Configuration Manager (SCCM). You need to provide credentials to the System Center Configuration Manager server in order to inventory the managed computers. When you select this option, the MAP Toolkit will query SCCM for a list of computers and then MAP will connect to these computers. Scan an IP address range -- This method allows you to specify the starting address and ending address of an IP address range. The wizard will then scan all IP addresses in the range and inventory only those computers. Note: This option can perform poorly, if many IP addresses aren’t being used within the range. Manually enter computer names and credentials -- Use this method if you want to inventory a small number of specific computers. Import computer names from a files -- Using this method, you can create a text file with a list of computer names that will be inventoried. 4. On the All Computers Credentials page, enter the accounts that have administrator rights to connect to the discovered machines. This does not need to a domain account, but needs to be a local administrator. I have entered my domain account that is an administrator on my local machine. Click Next after one or more accounts have been added. NOTE: The MAP Toolkit primarily uses Windows Management Instrumentation (WMI) to collect hardware, device, and software information from the remote computers. In order for the MAP Toolkit to successfully connect and inventory computers in your environment, you have to configure your machines to inventory through WMI and also allow your firewall to enable remote access through WMI. The MAP Toolkit also requires remote registry access for certain assessments. In addition to enabling WMI, you need accounts with administrative privileges to access desktops and servers in your environment. 5. On the Credentials Order page, select the order in which want the MAP Toolkit to connect to the machine and SQL Server. Generally just accept the defaults and click Next. 6. On the Enter Computers Manually page, click Create to pull up at dialog to enter one or more computer names. 7. On the Summary page confirm your settings and then click Finish. After clicking Finish the inventory process will start, as shown below: Windows Azure Readiness results and report After the inventory progress has completed, you can review the results under the Database scenario. On the tile, you will see the number of Windows Server machine with SQL Server that were analyzed, the number of machines that are ready to move without changes and the number of machines that require further changes. If you click this Azure VM Readiness tile, you will see additional details and can generate the Windows Azure VM Readiness Report. After the report is generated, select View | Saved Reports and Proposals to view the location of the report. Open up WindowsAzureVMReadiness* report in Excel. On the Windows tab, you can see the results of the assessment. This report has a column for the Operating System and SQL Server assessment and provides a recommendation on how to resolve, if there a component is not supported. Collect Performance Data Launch the Performance Wizard to collect performance information for the Windows Server machines that you would like the MAP Toolkit to suggest a Windows Azure VM size for. Windows Azure Capacity results and report After the performance metrics are collected, the Azure VM Capacity title will display the number of Virtual Machine sizes that are suggested for the Windows Server and Linux machines that were analyzed. You can then click on the Azure VM Capacity tile to see the capacity details and generate the Windows Azure VM Capacity Report. Within this report, you can view the performance data that was collected and the Virtual Machine sizes. MAP Toolkit 8.0 Beta is available for download here Your participation and feedback is very important to make the MAP Toolkit work better for you. We encourage you to participate in the beta program and provide your feedback at [email protected] or through one of our surveys. Useful References: Windows Azure Homepage How to guides for Windows Azure Virtual Machines Provisioning a SQL Server Virtual Machine on Windows Azure Windows Azure Pricing Peter Saddow Senior Program Manager – MAP Toolkit Team

Read the article
Personal Development : Time, Planning , Repairs & Maintenance

- by Rajesh Pillai

Personal Development : Time, Planning, Repairs & Maintenance These are just my thoughts, but some you may find something interesting in it. Please think over it. We may know many things, but still we always keeps procrastinating it. I have written this as I have heard many people coming back and saying they don’t have time to do things they like. These are my thoughts buy may be useful to someone else too. Certain things in life needs periodic repairs and maintenance. To cite some examples , your CAR, your HOUSE, your personal laptop/desktop, your health etc. Likewise there are certain other things in professional life that requires repair/ maintenance /or some kind of polishing, so that you always stay on top of it. But they are not always obvious. Some of them are - Improving your communication skills - Increasing your vocabulary - Upgrading your technical skills - Pursuing your hobby - Increasing your knowledge/awareness etc… etc… And then there are certain things that we are always short of…. one is TIME. We all know TIME is one of the most precious things in life and yet we all are very miserable at managing it. Remember you can only manage it and not control it. You can only control which you own or which you create. In theory time is infinite. So, there should be abundant of it. But remember one thing, you know this, it’s not reversible. Once it has elapsed you cannot live it again. Think over it. So, how do find that golden 25th hour every day. To find the 25th hour you need to reflect back on your current daily activities. Analyze them and see where you are spending most of your time and is it really important. Even the 8 hours that you spent in the office, is it spent fruitfully. At the end of the day is the 8 precious hour that you spent was worth it. Just reflect back on your activities. Did you learn something? If yes did you make a point to NOTE IT. If you didn’t NOTED it then was the time you spent really worth it. Just ponder over it. Some calculations of your daily activities where most of the time is spent. Let’s start (in no particular order though) - Sleep (6.5 hours) [Remember you only require 6 good hours of sleep every day]. Some may thing it is 8, but it’s a myth. o To achive 6 hours of sleep and be in good health you can practice 15 minutes of daily meditation. So effectively you can round it to 6.5 hours. - Morning chores(2 hours) : Some may need to prepare breakfast and all other things. - Office commuting (avg. to and fro 3 hours) - Office Work (avg 9.5 hours) Total Hours: 21 hours effective time which is spent irrespective of what you do. There may be some variations here and there. Still you have 3 hours EXTRA. Where do these 3 hours go? If you can find it, then you may get that golden 25th hour out of these 3 hours. Let’s discount 2 hours for contingencies, still you have 1 hour with you. If you can’t find it then you are living a direction less life. As you can see, the 25th Hour lies within the 24 hours of the day. It’s upto each one of us to find and make use of it. Now what can you do with that 25th hour i.e. 1 hour extra of your life. Imagine the possibility. Again some calculations 1 hour daily * 30 days = 30 hours every month 30 hours pm * 12 month = 360 hours every year. 360 hours every year seems very promising. Let’s add some contingencies, say, let’s be optimistic and say 50 % contingency. Still you have 180 hours every year. That leaves with 30 minutes every day of extra time. That’s hell a lot of time, if you could manage it. These may sound like a high talk [yes, it is, unless you apply these simple rules and rationalize your everyday living and stop procrastinating]. NOTE: I haven’t taken weekend, holidays and leaves into account. So, that leaves us with a lot of buffer time. You can meet family friends, relatives, other tasks, and yet have these 180 pure hours of joy every year. Do whatever you want to do with it. So, how important is this 180 hours per year to you? Just think over it. You may use it the way you like - 50 hours [pursue your hobby like drawing, crafting, learn dance, learn juggling, learn swimming, travelling hmm.. anything you like doing and you didn’t had time to do it.] - 30 hours you can learn a new programming language or technology (i.e. you can get comfortable with it) - 50 hours [improve existing skills] - 20 hours [improve you communication skill]. Do some light reading. - 30 hours [YOU DECIDE WHAT TO DO]? So, if you had done this for one year you would have learnt a new programming language, upgraded existing skills, improved you communication etc.. If you had done this for two years.. imagine the level of personal development or growth which you may have attained….. If you had done this for three years….. NOW I think I don’t need to mention this… So, you still have TIME, as they say TIME is infinite. So, make judicious use of this precious thing. And never ever comeback saying “I don’t have time”. So, if you are RICH in TIME, everything else will be automatically taken care of, as those things may just be a byproduct of how you spend your time… So, happy TIMING your TIME everyday.

Read the article
The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

Read the article
SQL SERVER – Thinking about Deprecated, Discontinued Features and Breaking Changes while Upgrading to SQL Server 2012 – Guest Post by Nakul Vachhrajani

- by pinaldave

Nakul Vachhrajani is a Technical Specialist and systems development professional with iGATE having a total IT experience of more than 7 years. Nakul is an active blogger with BeyondRelational.com (150+ blogs), and can also be found on forums at SQLServerCentral and BeyondRelational.com. Nakul has also been a guest columnist for SQLAuthority.com and SQLServerCentral.com. Nakul presented a webcast on the “Underappreciated Features of Microsoft SQL Server” at the Microsoft Virtual Tech Days Exclusive Webcast series (May 02-06, 2011) on May 06, 2011. He is also the author of a research paper on Database upgrade methodologies, which was published in a CSI journal, published nationwide. In addition to his passion about SQL Server, Nakul also contributes to the academia out of personal interest. He visits various colleges and universities as an external faculty to judge project activities being carried out by the students. Disclaimer: The opinions expressed herein are his own personal opinions and do not represent his employer’s view in anyway. Blog | LinkedIn | Twitter | Google+ Let us hear the thoughts of Nakul in first person - Those who have been following my blogs would be aware that I am recently running a series on the database engine features that have been deprecated in Microsoft SQL Server 2012. Based on the response that I have received, I was quite surprised to know that most of the audience found these to be breaking changes, when in fact, they were not! It was then that I decided to write a little piece on how to plan your database upgrade such that it works with the next version of Microsoft SQL Server. Please note that the recommendations made in this article are high-level markers and are intended to help you think over the specific steps that you would need to take to upgrade your database. Refer the documentation – Understand the terms Change is the only constant in this world. Therefore, whenever customer requirements, newer architectures and designs require software vendors to make a change to the keywords, functions, etc; they ensure that they provide their end users sufficient time to migrate over to the new standards before dropping off the old ones. Microsoft does that too with it’s Microsoft SQL Server product. Whenever a new SQL Server release is announced, it comes with a list of the following features: Breaking changes These are changes that would break your currently running applications, scripts or functionalities that are based on earlier version of Microsoft SQL Server These are mostly features whose behavior has been changed keeping in mind the newer architectures and designs Lesson: These are the changes that you need to be most worried about! Discontinued features These features are no longer available in the associated version of Microsoft SQL Server These features used to be “deprecated” in the prior release Lesson: Without these changes, your database would not be compliant/may not work with the version of Microsoft SQL Server under consideration Deprecated features These features are those that are still available in the current version of Microsoft SQL Server, but are scheduled for removal in a future version. These may be removed in either the next version or any other future version of Microsoft SQL Server The features listed for deprecation will compose the list of discontinued features in the next version of SQL Server Lesson: Plan to make necessary changes required to remove/replace usage of the deprecated features with the latest recommended replacements Once a feature appears on the list, it moves from bottom to the top, i.e. it is first marked as “Deprecated” and then “Discontinued”. We know of “Breaking change” comes later on in the product life cycle. What this means is that if you want to know what features would not work with SQL Server 2012 (and you are currently using SQL Server 2008 R2), you need to refer the list of breaking changes and discontinued features in SQL Server 2012. Use the tools! There are a lot of tools and technologies around us, but it is rarely that I find teams using these tools religiously and to the best of their potential. Below are the top two tools, from Microsoft, that I use every time I plan a database upgrade. The SQL Server Upgrade Advisor Ever since SQL Server 2005 was announced, Microsoft provides a small, very light-weight tool called the “SQL Server upgrade advisor”. The upgrade advisor analyzes installed components from earlier versions of SQL Server, and then generates a report that identifies issues to fix either before or after you upgrade. The analysis examines objects that can be accessed, such as scripts, stored procedures, triggers, and trace files. Upgrade Advisor cannot analyze desktop applications or encrypted stored procedures. Refer the links towards the end of the post to know how to get the Upgrade Advisor. The SQL Server Profiler Another great tool that you can use is the one most SQL Server developers & administrators use often – the SQL Server profiler. SQL Server Profiler provides functionality to monitor the “Deprecation” event, which contains: Deprecation announcement – equivalent to features to be deprecated in a future release of SQL Server Deprecation final support – equivalent to features to be deprecated in the next release of SQL Server You can learn more using the links towards the end of the post. A basic checklist There are a lot of finer points that need to be taken care of when upgrading your database. But, it would be worth-while to identify a few basic steps in order to make your database compliant with the next version of SQL Server: Monitor the current application workload (on a test bed) via the Profiler in order to identify usage of features marked as Deprecated If none appear, you are all set! (This almost never happens) Note down all the offending queries and feature usages Run analysis sessions using the SQL Server upgrade advisor on your database Based on the inputs from the analysis report and Profiler trace sessions, Incorporate solutions for the breaking changes first Next, incorporate solutions for the discontinued features Revisit and document the upgrade strategy for your deployment scenarios Revisit the fall-back, i.e. rollback strategies in case the upgrades fail Because some programming changes are dependent upon the SQL server version, this may need to be done in consultation with the development teams Before any other enhancements are incorporated by the development team, send out the database changes into QA QA strategy should involve a comparison between an environment running the old version of SQL Server against the new one Because minimal application changes have gone in (essential changes for SQL Server version compliance only), this would be possible As an ongoing activity, keep incorporating changes recommended as per the deprecated features list As a DBA, update your coding standards to ensure that the developers are using ANSI compliant code – this code will require a change only if the ANSI standard changes Remember this: Change management is a continuous process. Keep revisiting the product release notes and incorporate recommended changes to stay prepared for the next release of SQL Server. May the power of SQL Server be with you! Links Referenced in this post Breaking changes in SQL Server 2012: Link Discontinued features in SQL Server 2012: Link Get the upgrade advisor from the Microsoft Download Center at: Link Upgrade Advisor page on MSDN: Link Profiler: Review T-SQL code to identify objects no longer supported by Microsoft: Link Upgrading to SQL Server 2012 by Vinod Kumar: Link Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Upgrade

Read the article
CodePlex Daily Summary for Sunday, March 28, 2010

CodePlex Daily Summary for Sunday, March 28, 2010New ProjectsFeed Tracker: Feed Tracker allows you to track your favorite feeds (RSS 2.0 and Atom 1.0) and open them up directly in your browser.FIM 2010 Resource Management Client: The Forefront Identity Management 2010 Resource Management Client is a library to communicate with the FIM 2010 web service. The development langu...Infection Protection: A game about controlling disease outbreak in a city. Developed for OGPC 2010, using Qt.OrthoLab: Homepage of Orthocone open-source laboratory.Paragliding ThermalMarker: Paragliding / Hanggliding Windows Application that receives waypoint files and returns only the thermals that get triggered more often in a place.RSSFalls: RssFalls makes it easier for developers to download RSS or Podcast enclosures.String Library for C++ Language: StrLib++ is a string library for C++ language. for now it support only ANSI strings, later Unicode support will added for UT8, UTF16 and UTF32 for...Sweeper: Sweeper is a Visual Studio 2008 add-in for C# that takes care of many of the trivial code-formatting issues that developers run into - particularly...System.Common: A .Net library that provides methods, properties and more that the .Net Framework doesn't provide.Tiveriad: The framework is designed to help you more easily build modular Windows applicationT-Shirts Online: Online shop build in Silverlight 4 using DIBS as payment module.New ReleasesArkSwitch: ArkSwitch v1.1.3: This release has some important changes. Thanks to MichyPrima for helping with some of the code. 1. Improved theming to more easily support multip...Catharsis: Catharsis 2.5 on catarsa.com: The Catharsis framework has finally its own portal http://catarsa.com The latest release version is 2.5 - string names of properties are not any ...EffiProz - A Pure C# Database: EffiProz CF 1.0: EffiProz for .Net compact framework.Encrypted Notes: Encrypted Notes 1.6: This is the latest version of Encrypted Notes (1.6). It has an installer - it will create a directory 'CPascoe' in My Documents. Once you have ext...Extend SmallBasic: Teaching Extensions v.009: Added Pentagon Crazy Recipe QuizGapi.NET - .NET (C#) wrapper for Google API: Gapi.NET 0.5.0.0: - Fixed some minor bugs. - Add minor features. - Performance improvement. See code check-ins for detailed informationHouseFly experimental controls: HouseFly experimental control: Alpha version of HouseFly experimental controlsiTuner - The iTunes Companion: iTuner 1.2.3738 Beta 2: V1.2 allows you to synchronize one or more iTunes playlists to a USB MP3 player. Beta 2 resolves all known issues. This continues the evolution ye...jQuery Library for SharePoint Web Services: SPServices 0.5.4: IMPORTANT NOTE: This release is in an alpha state. You should only download it if you know what you are getting and are interested in testing it f...JSINQ - LINQ to Objects for JavaScript: JSINQ 1.0: This is the first stable release of JSINQ. It is fully compatible with the 0.9 beta release. It contains the following new features: Now supports ...MSBuild Mercurial Tasks: 1.0.0 Beta: First release of the application. This version integrates all the basic functionalities of Mercurial as defined in the Use Case 1.Open Portal Foundation: Open Portal Foundation V1.4.2: What's news? noscript template was updated naming convention for layout autogenerated controls now use the "master" prefix. The documentation ce...Open Portal Foundation: Open Portal Foundation V1.4.4: What's news? ASP .NET Master page support for custom aspx integrated pages New usercontrols for ASCX, ASPX and Master page integration : Link: f...Paint.NET PSD Plugin: 1.5.0: RLE compression is now working fully on save. File sizes are now competitive with Photoshop's. Saving takes about twice as long with RLE compressi...Paragliding ThermalMarker: ThermalMarker_Alfa0.1: Release Alfa 1Simple Service Locator: Simple Service Locator v0.7: The Simple Service Locator is an easy-to-use Inversion of Control library that is a complete implementation of the Common Service Locator interface...String Library for C++ Language: Release 0.9: version 0.9 beta release DO NOT USE IN SERIOUS PROJECTS this release use default application heap, and because visual studio is using special debug...Sweeper: Sweeper Alpha 1: SweeperA Visual Studio Add-in for C# Code Formatting - Visual Studio 2008 Includes: A UI for options, Enable or disable any specific task you want ...T-Shirts Online: 1.0: First release of the online shop.Twilio Server Library for .NET (TSL.NET): v0.1.0 Beta: This is the first release of TSL.NET. This v0.1.0 release is a Beta. Subsequent builds will be posted as v0.1.x and release-candidate Betas will be...Vr30 OS: Facebook 1.0: Connect you to Facebook without your web browser.Vr30 OS: SkyBlog 1.0: SkyBlog without web browser.Vr30 OS: YouTube 1.0: Youtube without web browserWeb Image Resize Handler: Web Image Resize, Zoom, Rotate and Greyscale v.1.0: Efficient Web Image Resize, Zoom, Rotate and Greyscale cacheing handler for ASP.Net.WinXound: WinXound 3.3.0 Beta 1 for Mac OsX: This is the first Beta release for Apple Mac OsX (Universal Binary). DEBUG HELP NEEDED ! Please signal bugs, suggestions or feedback to: stefano_b...Most Popular ProjectsMetaSharpRawrWBFS ManagerASP.NET Ajax LibraryMicrosoft SQL Server Product Samples: DatabaseSilverlight ToolkitAJAX Control ToolkitLiveUpload to FacebookWindows Presentation Foundation (WPF)ASP.NETMost Active ProjectsRawrjQuery Library for SharePoint Web ServicesManaged Extensibility FrameworkBlogEngine.NETMicrosoft Biology Foundationpatterns & practices: Composite WPF and SilverlightLINQ to TwitterFarseer Physics EngineTable2ClassNB_Store - Free DotNetNuke Ecommerce Catalog Module

Read the article
Five Ways Enterprise 2.0 Can Transform Your Business - Q&A from the Webcast

- by [email protected]

A few weeks ago, Vince Casarez and I presented with KMWorld on the Five Ways Enterprise 2.0 Can Transform Your Business. It was an enjoyable, interactive webcast in which Vince and I discussed the ways Enterprise 2.0 can transform your business and more importantly, highlighted key customer examples of how to do so. If you missed the webcast, you can catch a replay here. We had a lot of audience participation in some of the polls we conducted and in the Q&A session. We weren't able to address all of the questions during the broadcast, so we attempted to answer them here: Q: Which area within your firm focuses on Web 2.0? Meaning, do you find new departments developing just to manage the web 2.0 (Twitter, Facebook, etc.) user experience or are you structuring current departments? A: There are three distinct efforts within Oracle. The first is around delivery of these Web 2.0 services for enterprise deployments. This is the focus of the WebCenter team. The second effort is injecting these Web 2.0 services into use cases that drive the different enterprise applications. This effort is focused on how to manage these external services and bring them into a cohesive flow for marketing programs, customer care, and purchasing. The third effort is how we consume these services internally to enhance Oracle's business delivery. It leverages the technologies and use cases of the first two but also pushes the envelope with regards to future directions of these other two areas. Q: In a business, Web 2.0 is mostly like action logs. How can we leverage the official process practice versus the logs of a recent action? Example: a system configuration modified last night on a call out versus the official practice that everybody would use in the morning.A: The key thing to remember is that most Web 2.0 actions / activity streams today are based on collaboration and communication type actions. At least with public social sites like Facebook and Twitter. What we're delivering as part of the WebCenter Suite are not just these types of activities but also enterprise application activities. These enterprise application activities come from different application modules: purchasing, HR, order entry, sales opportunity, etc. The actions within these systems are normally tied to a business object or process: purchase order/customer, employee or department, customer and supplier, customer and product, respectively. Therefore, the activities or "logs" as you name them are able to be "typed" so that as a viewer, you can filter or decide to see only certain types of information. In your example, you could have a view that only showed you recent "configuration" changes and this could be right next to a view that showed off the items to be watched every morning. Q: It's great to hear about customers using the software but is there any plan for future webinars to show what the products/installs look like? That would be very helpful.A: We don't have a webinar planned to show off the install process. However, we have a viewlet that's posted on Oracle Technology Network. You can see it here:http://www.oracle.com/technetwork/testcontent/wcs-install-098014.htmlAnd we've got excellent documentation that walks you through the steps here:http://download.oracle.com/docs/cd/E14571_01/install.1111/e12001/install.htmAnd there's a whole set of demos and examples of what WebCenter can do at this URL:http://www.oracle.com/technetwork/middleware/webcenter/release11-demos-097468.html Q: How do you anticipate managing metadata across the enterprise to make content findable?A: We need to first make sure we are all talking about the same thing when we use a word like "metadata". Here's why... For a developer, metadata means information that describes key elements of the portal or application and what the portal or application can do. For content systems, metadata means key terms that provide a taxonomy or folksonomy about the information that is being indexed, ordered, and managed. For business intelligence systems, metadata means key terms that provide labels to groups of data that most non-mathematicians need to understand. And for SOA, metadata means labels for parts of the processes that business owners should understand that connect development terminology. There are also additional requirements for metadata to be available to the team building these new solutions as well as requirements to make this metadata available to the running system. These requirements are often separated by "design time" and "run time" respectively. So clearly, a general goal of managing metadata across the enterprise is very challenging. We've invested a huge amount of resources around Oracle Metadata Services (MDS) to be able to provide a more generic system for all of these elements. No other vendor has anything like this technology foundation in their products. This provides a huge benefit to our customers as they will now be able to find content, processes, people, and information from a common set of search interfaces with consistent enterprise wide results. Q: Can you give your definition of terms as to document and content, please?A: Content applies to a broad category of information from Word documents, presentations and reports through attachments to invoices and/or purchase orders. Content is essentially any type of digital asset including images, video, and voice. A document is just one type of content. Q: Do you have special integration tools to realize an interaction between UCM and WebCenter Spaces/Services?A: Yes, we've dedicated a whole team of engineers to exploit the key features of Oracle UCM within WebCenter. While ensuring that WebCenter can connect to other non-Oracle systems, we've made sure that with the combined set of Oracle technology, no other solution can match the combined power and integration. This is part of the Oracle Fusion Middleware strategy which is to provide best in class capabilities for Content and Portals. When combined together, the synergy between the two products enables users to quickly add capabilities when they are needed. For example, simple document sharing is part of the combined product offering, but if legal discovery or archiving is required, Oracle UCM product includes these capabilities that can be quickly added. There's no need to move content around or add another system to support this, it's just a feature that gets turned on within Oracle UCM. Q: All customers have some interaction with their applications and have many older versions, how do you see some of these new Enterprise 2.0 capabilities adding value to existing enterprise application deployments?A: Just as Service Oriented Architectures allowed for connecting the processes of different applications systems to work together, there's a need for a similar approach with regards to these enterprise 2.0 capabilities. Oracle WebCenter is built on a core architecture that allows for SOA of these Enterprise 2.0 services so that one set of scalable services can be used and integrated directly into any type of application. In this way, users can get immediate value out of the Enterprise 2.0 capabilities without having to wait for the next major release or upgrade. These centrally managed WebCenter services expose a set of standard interfaces that make it extremely easy to add them into existing applications no matter what technology the application has been implemented. Q: We've heard about Oracle Next Generation applications called "Fusion Applications", can you tell me how all this works together?A: Oracle WebCenter powers the core collaboration and social computing services found within Fusion Applications. It is the core user experience technology for how all the application screens have been implemented. And the core concept of task flows allows for all the Fusion Applications modules to be adaptable and composable by business users and IT without needing to be a professional developer. Oracle WebCenter is at the heart of the new Fusion Applications. In addition, the same patterns and technologies are now being added to the existing applications including JD Edwards, Siebel, Peoplesoft, and eBusiness Suite. The core technology enables all these customers to have a much smoother upgrade path to Fusion Applications. They get immediate benefits of injecting new user interactions into their existing applications without having to completely move to Fusion Applications. And then when the time comes, their users will already be well versed in how the new capabilities work. Q: Does any of this work with non Oracle software? Other databases? Other application servers? etc.A: We have made sure that Oracle WebCenter delivers the broadest set of development choices so that no matter what technology you developers are using, WebCenter capabilities can be quickly and easily added to the site or application. In addition, we have certified Oracle WebCenter to run against non-Oracle databases like DB2 and SQLServer. We have stated plans for certification against MySQL as well. Later in CY 2011, Oracle will provide certification on non-Oracle application servers such as WebSphere and JBoss. Q: How do we balance User and IT requirements in regards to Enterprise 2.0 technologies?A: Wrong decisions are often made because employee knowledge is not tapped efficiently and opportunities to innovate are often missed because the right people do not work together. Collaboration amongst workers in the right business context is critical for success. While standalone Enterprise 2.0 technologies can improve collaboration for collaboration's sake, using social collaboration tools in the context of business applications and processes will improve business responsiveness and lead companies to a more competitive position. As these systems become more mission critical it is essential that they maintain the highest level of performance and availability while scaling to support larger communities. Q: What are the ways in which Enterprise 2.0 can improve business responsiveness?A: With a wide range of Enterprise 2.0 tools in the marketplace, CIOs need to deploy solutions that will meet the requirements from users as well as address the requirements from IT. Workers want a next-generation user experience that is personalized and aggregates their daily tools and tasks, while IT needs to ensure the solution is secure, scalable, flexible, reliable and easily integrated with existing systems. An open and integrated approach to deploying portals, content management, and collaboration can enhance your business by addressing both the needs of knowledge workers for better information and the IT mandate to conserve resources by simplifying, consolidating and centralizing infrastructure and administration.

Read the article
Curing the Database-Application mismatch

- by Phil Factor

If an application requires access to a database, then you have to be able to deploy it so as to be version-compatible with the database, in phase. If you can deploy both together, then the application and database must normally be deployed at the same version in which they, together, passed integration and functional testing. When a single database supports more than one application, then the problem gets more interesting. I’ll need to be more precise here. It is actually the application-interface definition of the database that needs to be in a compatible ‘version’. Most databases that get into production have no separate application-interface; in other words they are ‘close-coupled’. For this vast majority, the whole database is the application-interface, and applications are free to wander through the bowels of the database scot-free. If you’ve spurned the perceived wisdom of application architects to have a defined application-interface within the database that is based on views and stored procedures, any version-mismatch will be as sensitive as a kitten. A team that creates an application that makes direct access to base tables in a database will have to put a lot of energy into keeping Database and Application in sync, to say nothing of having to tackle issues such as security and audit. It is not the obvious route to development nirvana. I’ve been in countless tense meetings with application developers who initially bridle instinctively at the apparent restrictions of being ‘banned’ from the base tables or routines of a database. There is no good technical reason for needing that sort of access that I’ve ever come across. Everything that the application wants can be delivered via a set of views and procedures, and with far less pain for all concerned: This is the application-interface. If more than zero developers are creating a database-driven application, then the project will benefit from the loose-coupling that an application interface brings. What is important here is that the database development role is separated from the application development role, even if it is the same developer performing both roles. The idea of an application-interface with a database is as old as I can remember. The big corporate or government databases generally supported several applications, and there was little option. When a new application wanted access to an existing corporate database, the developers, and myself as technical architect, would have to meet with hatchet-faced DBAs and production staff to work out an interface. Sure, they would talk up the effort involved for budgetary reasons, but it was routine work, because it decoupled the database from its supporting applications. We’d be given our own stored procedures. One of them, I still remember, had ninety-two parameters. All database access was encapsulated in one application-module. If you have a stable defined application-interface with the database (Yes, one for each application usually) you need to keep the external definitions of the components of this interface in version control, linked with the application source, and carefully track and negotiate any changes between database developers and application developers. Essentially, the application development team owns the interface definition, and the onus is on the Database developers to implement it and maintain it, in conformance. Internally, the database can then make all sorts of changes and refactoring, as long as source control is maintained. If the application interface passes all the comprehensive integration and functional tests for the particular version they were designed for, nothing is broken. Your performance-testing can ‘hang’ on the same interface, since databases are judged on the performance of the application, not an ‘internal’ database process. The database developers have responsibility for maintaining the application-interface, but not its definition, as they refactor the database. This is easily tested on a daily basis since the tests are normally automated. In this setting, the deployment can proceed if the more stable application-interface, rather than the continuously-changing database, passes all tests for the version of the application. Normally, if all goes well, a database with a well-designed application interface can evolve gracefully without changing the external appearance of the interface, and this is confirmed by integration tests that check the interface, and which hopefully don’t need to be altered at all often. If the application is rapidly changing its ‘domain model’ in the light of an increased understanding of the application domain, then it can change the interface definitions and the database developers need only implement the interface rather than refactor the underlying database. The test team will also have to redo the functional and integration tests which are, of course ‘written to’ the definition. The Database developers will find it easier if these tests are done before their re-wiring job to implement the new interface. If, at the other extreme, an application receives no further development work but survives unchanged, the database can continue to change and develop to keep pace with the requirements of the other applications it supports, and needs only to take care that the application interface is never broken. Testing is easy since your automated scripts to test the interface do not need to change. The database developers will, of course, maintain their own source control for the database, and will be likely to maintain versions for all major releases. However, this will not need to be shared with the applications that the database servers. On the other hand, the definition of the application interfaces should be within the application source. Changes in it have to be subject to change-control procedures, as they will require a chain of tests. Once you allow, instead of an application-interface, an intimate relationship between application and database, we are in the realms of impedance mismatch, over and above the obvious security problems. Part of this impedance problem is a difference in development practices. Whereas the application has to be regularly built and integrated, this isn’t necessarily the case with the database. An RDBMS is inherently multi-user and self-integrating. If the developers work together on the database, then a subsequent integration of the database on a staging server doesn’t often bring nasty surprises. A separate database-integration process is only needed if the database is deliberately built in a way that mimics the application development process, but which hampers the normal database-development techniques. This process is like demanding a official walking with a red flag in front of a motor car. In order to closely coordinate databases with applications, entire databases have to be ‘versioned’, so that an application version can be matched with a database version to produce a working build without errors. There is no natural process to ‘version’ databases. Each development project will have to define a system for maintaining the version level. A curious paradox occurs in development when there is no formal application-interface. When the strains and cracks happen, the extra meetings, bureaucracy, and activity required to maintain accurate deployments looks to IT management like work. They see activity, and it looks good. Work means progress. Management then smile on the design choices made. In IT, good design work doesn’t necessarily look good, and vice versa.

Read the article
Java collision detection and player movement: tips

- by Loris

I have read a short guide for game develompent (java, without external libraries). I'm facing with collision detection and player (and bullets) movements. Now i put the code. Most of it is taken from the guide (should i link this guide?). I'm just trying to expand and complete it. This is the class that take care of updates movements and firing mechanism (and collision detection): public class ArenaController { private Arena arena; /** selected cell for movement */ private float targetX, targetY; /** true if droid is moving */ private boolean moving = false; /** true if droid is shooting to enemy */ private boolean shooting = false; private DroidController droidController; public ArenaController(Arena arena) { this.arena = arena; this.droidController = new DroidController(arena); } public void update(float delta) { Droid droid = arena.getDroid(); //droid movements if (moving) { droidController.moveDroid(delta, targetX, targetY); //check if arrived if (droid.getX() == targetX && droid.getY() == targetY) moving = false; } //firing mechanism if(shooting) { //stop shot if there aren't bullets if(arena.getBullets().isEmpty()) { shooting = false; } for(int i = 0; i < arena.getBullets().size(); i++) { //current bullet Bullet bullet = arena.getBullets().get(i); System.out.println(bullet.getBounds()); //angle calculation double angle = Math.atan2(bullet.getEnemyY() - bullet.getY(), bullet.getEnemyX() - bullet.getX()); //increments x and y bullet.setX((float) (bullet.getX() + (Math.cos(angle) * bullet.getSpeed() * delta))); bullet.setY((float) (bullet.getY() + (Math.sin(angle) * bullet.getSpeed() * delta))); //collision with obstacles for(int j = 0; j < arena.getObstacles().size(); j++) { Obstacle obs = arena.getObstacles().get(j); if(bullet.getBounds().intersects(obs.getBounds())) { System.out.println("Collision detect!"); arena.removeBullet(bullet); } } //collisions with enemies for(int j = 0; j < arena.getEnemies().size(); j++) { Enemy ene = arena.getEnemies().get(j); if(bullet.getBounds().intersects(ene.getBounds())) { System.out.println("Collision detect!"); arena.removeBullet(bullet); } } } } } public boolean onClick(int x, int y) { //click on empty cell if(arena.getGrid()[(int)(y / Arena.TILE)][(int)(x / Arena.TILE)] == null) { //coordinates targetX = x / Arena.TILE; targetY = y / Arena.TILE; //enables movement moving = true; return true; } //click on enemy: fire if(arena.getGrid()[(int)(y / Arena.TILE)][(int)(x / Arena.TILE)] instanceof Enemy) { //coordinates float enemyX = x / Arena.TILE; float enemyY = y / Arena.TILE; //new bullet Bullet bullet = new Bullet(); //start coordinates bullet.setX(arena.getDroid().getX()); bullet.setY(arena.getDroid().getY()); //end coordinates (enemie) bullet.setEnemyX(enemyX); bullet.setEnemyY(enemyY); //adds bullet to arena arena.addBullet(bullet); //enables shooting shooting = true; return true; } return false; } As you can see for collision detection i'm trying to use Rectangle object. Droid example: import java.awt.geom.Rectangle2D; public class Droid { private float x; private float y; private float speed = 20f; private float rotation = 0f; private float damage = 2f; public static final int DIAMETER = 32; private Rectangle2D rectangle; public Droid() { rectangle = new Rectangle2D.Float(x, y, DIAMETER, DIAMETER); } public float getX() { return x; } public void setX(float x) { this.x = x; //rectangle update rectangle.setRect(x, y, DIAMETER, DIAMETER); } public float getY() { return y; } public void setY(float y) { this.y = y; //rectangle update rectangle.setRect(x, y, DIAMETER, DIAMETER); } public float getSpeed() { return speed; } public void setSpeed(float speed) { this.speed = speed; } public float getRotation() { return rotation; } public void setRotation(float rotation) { this.rotation = rotation; } public float getDamage() { return damage; } public void setDamage(float damage) { this.damage = damage; } public Rectangle2D getRectangle() { return rectangle; } } For now, if i start the application and i try to shot to an enemy, is immediately detected a collision and the bullet is removed! Can you help me with this? If the bullet hit an enemy or an obstacle in his way, it must disappear. Ps: i know that the movements of the bullets should be managed in another class. This code is temporary. update I realized what happens, but not why. With those for loops (which checks collisions) the movements of the bullets are instantaneous instead of gradual. In addition to this, if i add the collision detection to the Droid, the method intersects returns true ALWAYS while the droid is moving! public void moveDroid(float delta, float x, float y) { Droid droid = arena.getDroid(); int bearing = 1; if (droid.getX() > x) { bearing = -1; } if (droid.getX() != x) { droid.setX(droid.getX() + bearing * droid.getSpeed() * delta); //obstacles collision detection for(Obstacle obs : arena.getObstacles()) { if(obs.getRectangle().intersects(droid.getRectangle())) { System.out.println("Collision detected"); //ALWAYS HERE } } //controlla se è arrivato if ((droid.getX() < x && bearing == -1) || (droid.getX() > x && bearing == 1)) droid.setX(x); } bearing = 1; if (droid.getY() > y) { bearing = -1; } if (droid.getY() != y) { droid.setY(droid.getY() + bearing * droid.getSpeed() * delta); if ((droid.getY() < y && bearing == -1) || (droid.getY() > y && bearing == 1)) droid.setY(y); } }

Read the article
Is Financial Inclusion an Obligation or an Opportunity for Banks?

- by tushar.chitra

Why should banks care about financial inclusion? First, the statistics, I think this will set the tone for this blog post. There are close to 2.5 billion people who are excluded from the banking stream and out of this, 2.2 billion people are from the continents of Africa, Latin America and Asia (McKinsey on Society: Global Financial Inclusion). However, this is not just a third-world phenomenon. According to Federal Deposit Insurance Corp (FDIC), in the US, post 2008 financial crisis, one family out of five has either opted out of the banking system or has been moved out (American Banker). Moving this huge unbanked population into mainstream banking is both an opportunity and a challenge for banks. An obvious opportunity is the significant untapped customer base that banks can target, so is the positive brand equity a bank can build by fulfilling its social responsibilities. Also, as banks target the cost-conscious unbanked customer, they will be forced to look at ways to offer cost-effective products and services, necessitating technology upgrades and innovations. However, cost is not the only hurdle in increasing the adoption of banking services. The potential users need to be convinced of the benefits of banking and banks will also face stiff competition from unorganized players. Finally, the banks will have to believe in the viability of this business opportunity, and not treat financial inclusion as an obligation. In what ways can banks target the unbanked For financial inclusion to be a success, banks should adopt innovative business models to develop products that address the stated and unstated needs of the unbanked population and also design delivery channels that are cost effective and viable in the long run. Through business correspondents and facilitators In rural and remote areas, one of the major hurdles in increasing banking penetration is connectivity and accessibility to banking services, which makes last mile inclusion a daunting challenge. To address this, banks can avail the services of business correspondents or facilitators. This model allows banks to establish greater connectivity through a trusted and reliable intermediary. In India, for instance, banks can leverage the local Kirana stores (the mom & pop stores) to service rural and remote areas. With a supportive nudge from the central bank, the commercial banks can enlist these shop owners as business correspondents to increase their reach. Since these neighborhood stores are acquainted with the local population, they can help banks manage the KYC norms, besides serving as a conduit for remittance. Banks also have an opportunity over a period of time to cross-sell other financial products such as micro insurance, mutual funds and pension products through these correspondents. To exercise greater operational control over the business correspondents, banks can also adopt a combination of branch and business correspondent models to deliver financial inclusion. Through mobile devices According to a 2012 world bank report on financial inclusion, out of a world population of 7 billion, over 5 billion or 70% have mobile phones and only 2 billion or 30% have a bank account. What this means for banks is that there is scope for them to leverage this phenomenal growth in mobile usage to serve the unbanked population. Banks can use mobile technology to service the basic banking requirements of their customers with no frills accounts, effectively bringing down the cost per transaction. As I had discussed in my earlier post on mobile payments, though non-traditional players have taken the lead in P2P mobile payments, banks still hold an edge in terms of infrastructure and reliability. Through crowd-funding According to the Crowdfunding Industry Report by Massolution, the global crowdfunding industry raised $2.7 billion in 2012, and is projected to grow to $5.1 billion in 2013. With credit policies becoming tighter and banks becoming more circumspect in terms of loan disbursals, crowdfunding has emerged as an alternative channel for lending. Typically, these initiatives target the unbanked population by offering small loans that are unviable for larger banks. Though a significant proportion of crowdfunding initiatives globally are run by non-banking institutions, banks are also venturing into this space. The next step towards inclusive finance Banks by themselves cannot make financial inclusion a success. There is a need for a whole ecosystem that is supportive of this mission. The policy makers, that include the regulators and government bodies, must be in sync, the IT solution providers must put on their thinking caps to come out with innovative products and solutions, communication channels such as internet and mobile need to expand their reach, and the media and the public need to play an active part. The other challenge for financial inclusion is from the banks themselves. While it is true that financial inclusion will unleash a hitherto hugely untapped market, the normal banking model may be found wanting because of issues such as flexibility, convenience and reliability. The business will be viable only when there is a focus on increasing the usage of existing infrastructure and that is possible when the banks can offer the entire range of products and services to the large number of users of essential banking services. Apart from these challenges, banks will also have to quickly master and replicate the business model to extend their reach to the remotest regions in their respective geographies. They will need to ensure that the transactions deliver a viable business benefit to the bank. For tapping cross-sell opportunities, banks will have to quickly roll-out customized and segment-specific products. The bank staff should be brought in sync with the business plan by convincing them of the viability of the business model and the need for a business correspondent delivery model. Banks, in collaboration with the government and NGOs, will have to run an extensive financial literacy program to educate the unbanked about the benefits of banking. Finally, with the growing importance of retail banking and with many unconventional players eyeing the opportunity in payments and other lucrative areas of banking, banks need to understand the importance of micro and small branches. These micro and small branches can help banks increase their presence without a huge cost burden, provide bankers an opportunity to cross sell micro products and offer a window of opportunity for the large non-banked population to transact without any interference from intermediaries. These branches can also help diminish the role of the unorganized financial sector, such as local moneylenders and unregistered credit societies. This will also help banks build a brand awareness and loyalty among the users, which by itself has a cascading effect on the business operations, especially among the rural and un-banked centers. In conclusion, with the increasingly competitive banking sector facing frequent slowdowns and downturns, the unbanked population presents a huge opportunity for banks to enhance their customer base and fulfill their social responsibility.

Read the article
The SSIS tuning tip that everyone misses

- by Rob Farley

I know that everyone misses this, because I’m yet to find someone who doesn’t have a bit of an epiphany when I describe this. When tuning Data Flows in SQL Server Integration Services, people see the Data Flow as moving from the Source to the Destination, passing through a number of transformations. What people don’t consider is the Source, getting the data out of a database. Remember, the source of data for your Data Flow is not your Source Component. It’s wherever the data is, within your database, probably on a disk somewhere. You need to tune your query to optimise it for SSIS, and this is what most people fail to do. I’m not suggesting that people don’t tune their queries – there’s plenty of information out there about making sure that your queries run as fast as possible. But for SSIS, it’s not about how fast your query runs. Let me say that again, but in bolder text: The speed of an SSIS Source is not about how fast your query runs. If your query is used in a Source component for SSIS, the thing that matters is how fast it starts returning data. In particular, those first 10,000 rows to populate that first buffer, ready to pass down the rest of the transformations on its way to the Destination. Let’s look at a very simple query as an example, using the AdventureWorks database: We’re picking the different Weight values out of the Product table, and it’s doing this by scanning the table and doing a Sort. It’s a Distinct Sort, which means that the duplicates are discarded. It'll be no surprise to see that the data produced is sorted. Obvious, I know, but I'm making a comparison to what I'll do later. Before I explain the problem here, let me jump back into the SSIS world... If you’ve investigated how to tune an SSIS flow, then you’ll know that some SSIS Data Flow Transformations are known to be Blocking, some are Partially Blocking, and some are simply Row transformations. Take the SSIS Sort transformation, for example. I’m using a larger data set for this, because my small list of Weights won’t demonstrate it well enough. Seven buffers of data came out of the source, but none of them could be pushed past the Sort operator, just in case the last buffer contained the data that would be sorted into the first buffer. This is a blocking operation. Back in the land of T-SQL, we consider our Distinct Sort operator. It’s also blocking. It won’t let data through until it’s seen all of it. If you weren’t okay with blocking operations in SSIS, why would you be happy with them in an execution plan? The source of your data is not your OLE DB Source. Remember this. The source of your data is the NCIX/CIX/Heap from which it’s being pulled. Picture it like this... the data flowing from the Clustered Index, through the Distinct Sort operator, into the SELECT operator, where a series of SSIS Buffers are populated, flowing (as they get full) down through the SSIS transformations. Alright, I know that I’m taking some liberties here, because the two queries aren’t the same, but consider the visual. The data is flowing from your disk and through your execution plan before it reaches SSIS, so you could easily find that a blocking operation in your plan is just as painful as a blocking operation in your SSIS Data Flow. Luckily, T-SQL gives us a brilliant query hint to help avoid this. OPTION (FAST 10000) This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant. First let’s consider a simple example, then we’ll look at a larger one. Consider our weights. We don’t have 10,000, so I’m going to use OPTION (FAST 1) instead. You’ll notice that the query is more expensive, using a Flow Distinct operator instead of the Distinct Sort. This operator is consuming 84% of the query, instead of the 59% we saw from the Distinct Sort. But the first row could be returned quicker – a Flow Distinct operator is non-blocking. The data here isn’t sorted, of course. It’s in the same order that it came out of the index, just with duplicates removed. As soon as a Flow Distinct sees a value that it hasn’t come across before, it pushes it out to the operator on its left. It still has to maintain the list of what it’s seen so far, but by handling it one row at a time, it can push rows through quicker. Overall, it’s a lot more work than the Distinct Sort, but if the priority is the first few rows, then perhaps that’s exactly what we want. The Query Optimizer seems to do this by optimising the query as if there were only one row coming through: This 1 row estimation is caused by the Query Optimizer imagining the SELECT operation saying “Give me one row” first, and this message being passed all the way along. The request might not make it all the way back to the source, but in my simple example, it does. I hope this simple example has helped you understand the significance of the blocking operator. Now I’m going to show you an example on a much larger data set. This data was fetching about 780,000 rows, and these are the Estimated Plans. The data needed to be Sorted, to support further SSIS operations that needed that. First, without the hint. ...and now with OPTION (FAST 10000): A very different plan, I’m sure you’ll agree. In case you’re curious, those arrows in the top one are 780,000 rows in size. In the second, they’re estimated to be 10,000, although the Actual figures end up being 780,000. The top one definitely runs faster. It finished several times faster than the second one. With the amount of data being considered, these numbers were in minutes. Look at the second one – it’s doing Nested Loops, across 780,000 rows! That’s not generally recommended at all. That’s “Go and make yourself a coffee” time. In this case, it was about six or seven minutes. The faster one finished in about a minute. But in SSIS-land, things are different. The particular data flow that was consuming this data was significant. It was being pumped into a Script Component to process each row based on previous rows, creating about a dozen different flows. The data flow would take roughly ten minutes to run – ten minutes from when the data first appeared. The query that completes faster – chosen by the Query Optimizer with no hints, based on accurate statistics (rather than pretending the numbers are smaller) – would take a minute to start getting the data into SSIS, at which point the ten-minute flow would start, taking eleven minutes to complete. The query that took longer – chosen by the Query Optimizer pretending it only wanted the first 10,000 rows – would take only ten seconds to fill the first buffer. Despite the fact that it might have taken the database another six or seven minutes to get the data out, SSIS didn’t care. Every time it wanted the next buffer of data, it was already available, and the whole process finished in about ten minutes and ten seconds. When debugging SSIS, you run the package, and sit there waiting to see the Debug information start appearing. You look for the numbers on the data flow, and seeing operators going Yellow and Green. Without the hint, I’d sit there for a minute. With the hint, just ten seconds. You can imagine which one I preferred. By adding this hint, it felt like a magic wand had been waved across the query, to make it run several times faster. It wasn’t the case at all – but it felt like it to SSIS.

Read the article
SQL University: What and why of database testing

- by Mladen Prajdic

This is a post for a great idea called SQL University started by Jorge Segarra also famously known as SqlChicken on Twitter. It’s a collection of blog posts on different database related topics contributed by several smart people all over the world. So this week is mine and we’ll be talking about database testing and refactoring. In 3 posts we’ll cover: SQLU part 1 - What and why of database testing SQLU part 2 - What and why of database refactoring SQLU part 2 – Tools of the trade With that out of the way let us sharpen our pencils and get going. Why test a database The sad state of the industry today is that there is very little emphasis on testing in general. Test driven development is still a small niche of the programming world while refactoring is even smaller. The cause of this is the inability of developers to convince themselves and their managers that writing tests is beneficial. At the moment they are mostly viewed as waste of time. This is because the average person (let’s not fool ourselves, we’re all average) is unable to think about lower future costs in relation to little more current work. It’s orders of magnitude easier to know about the current costs in relation to current amount of work. That’s why programmers convince themselves testing is a waste of time. However we have to ask ourselves what tests are really about? Maybe finding bugs? No, not really. If we introduce bugs, we’re likely to write test around those bugs too. But yes we can find some bugs with tests. The main point of tests is to have reproducible repeatability in our systems. By having a code base largely covered by tests we can know with better certainty what a small code change can break in other parts of the system. By having repeatability we can make code changes with confidence, since we know we’ll see what breaks in other tests. And here comes the inability to estimate future costs. By spending just a few more hours writing those tests we’d know instantly what broke where. Imagine we fix a reported bug. We check-in the code, deploy it and the users are happy. Until we get a call 2 weeks later about a certain monthly process has stopped working. What we don’t know is that this process was developed by a long gone coworker and for some reason it relied on that same bug we’ve happily fixed. There’s no way we could’ve known that. We say OK and go in and fix the monthly process. But what we have no clue about is that there’s this ETL job that relied on data from that monthly process. Now that we’ve fixed the process it’s giving unexpected (yet correct since we fixed it) data to the ETL job. So we have to fix that too. But there’s this part of the app we coded that relies on data from that exact ETL job. And just like that we enter the “Loop of maintenance horror”. With the loop eventually comes blame. Here’s a nice tip for all developers and DBAs out there: If you make a mistake man up and admit to it. All of the above is valid for any kind of software development. Keeping this in mind the database is nothing other than just a part of the application. But a big part! One reason why testing a database is even more important than testing an application is that one database is usually accessed from multiple applications and processes. This makes it the central and vital part of the enterprise software infrastructure. Knowing all this can we really afford not to have tests? What to test in a database Now that we’ve decided we’ll dive into this testing thing we have to ask ourselves what needs to be tested? The short answer is: everything. The long answer is: read on! There are 2 main ways of doing tests: Black box and White box testing. Black box testing means we have no idea how the system internals are built and we only have access to it’s inputs and outputs. With it we test that the internal changes to the system haven’t caused the input/output behavior of the system to change. The most important thing to test here are the edge conditions. It’s where most programs break. Having good edge condition tests we can be more confident that the systems changes won’t break. White box testing has the full knowledge of the system internals. With it we test the internal system changes, different states of the application, etc… White and Black box tests should be complementary to each other as they are very much interconnected. Testing database routines includes testing stored procedures, views, user defined functions and anything you use to access the data with. Database routines are your input/output interface to the database system. They count as black box testing. We test then for 2 things: Data and schema. When testing schema we only care about the columns and the data types they’re returning. After all the schema is the contract to the out side systems. If it changes we usually have to change the applications accessing it. One helpful T-SQL command when doing schema tests is SET FMTONLY ON. It tells the SQL Server to return only empty results sets. This speeds up tests because it doesn’t return any data to the client. After we’ve validated the schema we have to test the returned data. There no other way to do this but to have expected data known before the tests executes and comparing that data to the database routine output. Testing Authentication and Authorization helps us validate who has access to the SQL Server box (Authentication) and who has access to certain database objects (Authorization). For desktop applications and windows authentication this works well. But the biggest problem here are web apps. They usually connect to the database as a single user. Please ensure that that user is not SA or an account with admin privileges. That is just bad. Load testing ensures us that our database can handle peak loads. One often overlooked tool for load testing is Microsoft’s OSTRESS tool. It’s part of RML utilities (x86, x64) for SQL Server and can help determine if our database server can handle loads like 100 simultaneous users each doing 10 requests per second. SQL Profiler can also help us here by looking at why certain queries are slow and what to do to fix them. One particular problem to think about is how to begin testing existing databases. First thing we have to do is to get to know those databases. We can’t test something when we don’t know how it works. To do this we have to talk to the users of the applications accessing the database, run SQL Profiler to see what queries are being run, use existing documentation to decipher all the object relationships, etc… The way to approach this is to choose one part of the database (say a logical grouping of tables that go together) and filter our traces accordingly. Once we’ve done that we move on to the next grouping and so on until we’ve covered the whole database. Then we move on to the next one. Database Testing is a topic that we can spent many hours discussing but let this be a nice intro to the world of database testing. See you in the next post.

Read the article
PASS: Election Changes for 2011

- by Bill Graziano

Last year after the election, the PASS Board created an Election Review Committee. This group was charged with reviewing our election procedures and making suggestions to improve the process. You can read about the formation of the group and review some of the intermediate work on the site – especially in the forums. I was one of the members of the group along with Joe Webb (Chair), Lori Edwards, Brian Kelley, Wendy Pastrick, Andy Warren and Allen White. This group worked from October to April on our election process. Along the way we: Interviewed interested parties including former NomCom members, Board candidates and anyone else that came forward. Held a session at the Summit to allow interested parties to discuss the issues Had numerous conference calls and worked through the various topics I can’t thank these people enough for the work they did. They invested a tremendous number of hours thinking, talking and writing about our elections. I’m proud to say I was a member of this group and thoroughly enjoyed working with everyone (even if I did finally get tired of all the calls.) The ERC delivered their recommendations to the PASS Board prior to our May Board meeting. We reviewed those and made a few modifications. I took their recommendations and rewrote them as procedures while incorporating those changes. Their original recommendations as well as our final document are posted at the ERC documents page. Please take a second and read them BEFORE we start the elections. If you have any questions please post them in the forums on the ERC site. (My final document includes a change log at the end that I decided to leave in. If you want to know which areas to pay special attention to that’s a good start.) Many of those recommendations were already posted in the forums or in the blogs of individual ERC members. Hopefully nothing in the ERC document is too surprising. In this post I’m going to walk through some of the key changes and talk about what I remember from both ERC and Board discussions. I’ll pay a little extra attention to things the Board changed from the ERC. I’d also encourage any of the Board or ERC members to blog their thoughts on this. The Nominating Committee will continue to exist. Personally, I was curious to see what the non-Board ERC members would think about the NomCom. There was broad agreement that a group to vet candidates had value to the organization. The NomCom will be composed of five members. Two will be Board members and three will be from the membership at large. The only requirement for the three community members is that you’ve volunteered in some way (and volunteering is defined very broadly). We expect potential at-large NomCom members to participate in a forum on the PASS site to answer questions from the other PASS members. We’re going to hold an election to determine the three community members. It will be closer to voting for Summit sessions than voting for Board members. That means there won’t be multiple dedicated emails. If you’re at all paying attention it will be easy to participate. Personally I wanted it easy for those that cared to participate but not overwhelm those that didn’t care. I think this strikes a good balance. There’s also a clause that in order to be considered a winner in this NomCom election, you must receive 10 votes. This is something I suggested. I have no idea how popular the NomCom election is going to be. I just wanted a fallback that if no one participated and some random person got in with one or two votes. Any open slots will be filled by the NomCom chair (usually the PASS Immediate Past President). My assumption is that they would probably take the next highest vote getters unless they were throwing flames in the forums or clearly unqualified. As a final check, the Board still approves the final NomCom. The NomCom is going to rank candidates instead of rating them. This has interesting implications. This was championed by another ERC member and I’m hoping they write something about it. This will really force the NomCom to make decisions between candidates. You can’t just rate everyone a 3 and be done with it. It may also make candidates appear further apart than they actually are. I’m looking forward talking with the NomCom after this election and getting their feedback on this. The PASS Board added an option to remove a candidate with a unanimous vote of the NomCom. This was primarily put in place to handle people that lied on their application or had a criminal background or some other unusual situation and we figured it out. We list an explicit goal of three candidate per open slot. We also wanted an easy way to find the NomCom candidate rankings from the ballot. Hopefully this will satisfy those that want a broad candidate pool and those that want the NomCom to identify the most qualified candidates. The primary spokesperson for the NomCom is the committee chair. After the issues around the election last year we didn’t have a good communication plan in place. We should have and that was a failure on the part of the Board. If there is criticism of the election this year I hope that falls squarely on the Board. The community members of the NomCom shouldn’t be fielding complaints over the election process. That said, the NomCom is ranking candidates and we are forcing them to rank some lower than others. I’m sure you’ll each find someone that you think should have been ranked differently. I also want to highlight one other change to the process that we started last year and isn’t included in these documents. I think the candidate forums on the PASS site were tremendously helpful last year in helping people to find out more about candidates. That gives our members a way to ask hard questions of the candidates and publicly see their answers. This year we have two important groups to fill. The first is the NomCom. We need three people from our membership to step up and fill this role. It won’t be easy. You will have to make subjective rankings of your fellow community members. Your actions will be important in deciding who the future leaders of PASS will be. There’s a 50/50 chance that one of the people you interview will be the President of PASS someday. This is not a responsibility to be taken lightly. The second is the slate of candidates. If you’ve ever thought about running for the Board this is the year. We’ve never had nine candidates on the ballot before. Your chance of making it through the NomCom are higher than in any previous year. Unfortunately the more of you that run, the more of you that will lose in the election. And hopefully that competition will mean more community involvement and better Board members for PASS. Is this the end of changes to the election process? It isn’t. Every year that I’ve been on the Board the election process has changed. Some years there have been small changes and some years there have been large changes. After this election we’ll look at how the process worked and decide what steps to take – just like we do every year.

Read the article
Securing an ADF Application using OES11g: Part 1

- by user12587121

Future releases of the Oracle stack should allow ADF applications to be secured natively with Oracle Entitlements Server (OES). In a sequence of postings here I explore one way to achive this with the current technology, namely OES 11.1.1.5 and ADF 11.1.1.6. ADF Security Basics ADF Bascis The Application Development Framework (ADF) is Oracle’s preferred technology for developing GUI based Java applications. It can be used to develop a UI for Swing applications or, more typically in the Oracle stack, for Web and J2EE applications. ADF is based on and extends the Java Server Faces (JSF) technology. To get an idea, Oracle provides an online demo to showcase ADF components. ADF can be used to develop just the UI part of an application, where, for example, the data access layer is implemented using some custom Java beans or EJBs. However ADF also has it’s own data access layer, ADF Business Components (ADF BC) that will allow rapid integration of data from data bases and Webservice interfaces to the ADF UI component. In this way ADF helps implement the MVC approach to building applications with UI and data components. The canonical tutorial for ADF is to open JDeveloper, define a connection to a database, drag and drop a table from the database view to a UI page, build and deploy. One has an application up and running very quickly with the ability to quickly integrate changes to, for example, the DB schema. ADF allows web pages to be created graphically and components like tables, forms, text fields, graphs and so on to be easily added to a page. On top of JSF Oracle have added drag and drop tooling with JDeveloper and declarative binding of the UI to the data layer, be it database, WebService or Java beans. An important addition is the bounded task flow which is a reusable set of pages and transitions. ADF adds some steps to the page lifecycle defined in JSF and adds extra widgets including powerful visualizations. It is worth pointing out that the Oracle Web Center product (portal, content management and so on) is based on and extends ADF. ADF Security ADF comes with it’s own security mechanism that is exposed by JDeveloper at development time and in the WLS Console and Enterprise Manager (EM) at run time. The security elements that need to be addressed in an ADF application are: authentication, authorization of access to web pages, task-flows, components within the pages and data being returned from the model layer. One typically relies on WLS to handle authentication and because of this users and groups will also be handled by WLS. Typically in a Dev environment, users and groups are stored in the WLS embedded LDAP server. One has a choice when enabling ADF security (Application->Secure->Configure ADF Security) about whether to turn on ADF authorization checking or not: In the case where authorization is enabled for ADF one defines a set of roles in which we place users and then we grant access to these roles to the different ADF elements (pages or task flows or elements in a page). An important notion here is the difference between Enterprise Roles and Application Roles. The idea behind an enterprise role is that is defined in terms of users and LDAP groups from the WLS identity store. “Enterprise” in the sense that these are things available for use to all applications that use that store. The other kind of role is an Application Role and the idea is that a given application will make use of Enterprise roles and users to build up a set of roles for it’s own use. These application roles will be available only to that application. The general idea here is that the enterprise roles are relatively static (for example an Employees group in the LDAP directory) while application roles are more dynamic, possibly depending on time, location, accessed resource and so on. One of the things that OES adds that is that we can define these dynamic membership conditions in Role Mapping Policies. To make this concrete, here is how, at design time in Jdeveloper, one assigns these rights in Jdeveloper, which puts them into a file called jazn-data.xml: When the ADF app is deployed to a WLS this JAZN security data is pushed to the system-jazn-data.xml file of the WLS deployment for the policies and application roles and to the WLS backing LDAP for the users and enterprise roles. Note the difference here: after deploying the application we will see the users and enterprise roles show up in the WLS LDAP server. But the policies and application roles are defined in the system-jazn-data.xml file. Consult the embedded WLS LDAP server to manage users and enterprise roles by going to the domain console and then Security Realms->myrealm->Users and Groups: For production environments (or in future to share this data with OES) one would then perform the operation of “reassociating” this security policy and application role data to a DB schema (or an LDAP). This is done in the EM console by reassociating the Security Provider. This blog posting has more explanations and references on this reassociation process. If ADF Authentication and Authorization are enabled then the Security Policies for a deployed application can be managed in EM. Our goal is to be able to manage security policies for the applicaiton rather via OES and it's console. Security Requirements for an ADF Application With this package tour of ADF security we can see that to secure an ADF application with we would expect to be able to take care of at least the following items: Authentication, including a user and user-group store Authorization for page access Authorization for bounded Task Flow access. A bounded task flow has only one point of entry and so if we protect that entry point by calling to OES then all the pages in the flow are protected. Authorization for viewing data coming from the data access layer In the next posting we will describe a sample ADF application and required security policies. References ADF Dev Guide: Fusion Middleware Fusion Developer's Guide for Oracle Application Development Framework: Enabling ADF Security in a Fusion Web Application Oracle tutorial on securing a sample ADF application, appears to require ADF 11.1.2 Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

Read the article
Thou shalt not put code on a piedestal - Code is a tool, no more, no less

- by Ralf Westphal

“Write great code and everything else becomes easier” is what Paul Pagel believes in. That´s his version of an adage by Brian Marick he cites: “treat code as an end, not just a means.” And he concludes: “My post-Agile world is software craftsmanship.” I wonder, if that´s really the way to go. Will “simply” writing great code lead the software industry into the light? He´s alluding to the philosopher Kant who proposed, a human beings should never be treated as a means, but always as an end. But should we transfer this ethical statement into the world of software? I doubt it. Reason #1: Human beings are categorially different from code. They are autonomous entities who need to find a way of living happily together. To Kant it seemed this goal could only be reached if nobody (ab)used a human being for his/her purposes. Because using a human being, i.e. treating it as a means, would contradict the fundamental autonomy and freedom of human beings. People should hold up a symmetric view of their relationships: Since nobody wants to be (ab)used, nobody should (ab)use anybody else. If you want to be treated decently, with respect, in accordance with your own free will - which means as an end - then do the same to other people. Code is dead, it´s a product, it´s a tool for people to reach their goals. No company spends any money on code other than to save money or earn money in the long run. Code is not a puppy. Enterprises do not commission software development to just feel good in its company. Code is not a buddy. Code is a slave, if you will. A mechanical slave, a non-tangible robot. Code is a tool, is a tool. And if we start to treat it differently, if we elevate its status unduely… I guess that will contort our relationship in a contraproductive way. Please get me right: Just because something is “just a tool”, “just a product” does not mean we should not be careful while designing, building, using it. Right to the contrary. We should be very careful when writing code – but not for the code´s sake! We should be careful because we respect our customers who are fellow human beings who should be treated as an end. If we are careless, neglectful, ignorant when producing code on their behalf, then we´re using them. Being sloppy means you´re caring more for yourself that for your customer. You´re then treating the customer as a means to fulfill some of your own needs. That´s plain unethical behavior. Reason #2: The focus should always be on your purpose, not on any tool. But if code is treated as an end, then the focus is on the code. That might sound right, because where else should be your focus as a software developer? But, well, I´d say, your focus should be on delivering value to your customer. Because in the end your customer does not care if you write a single line of code. She just wants her problem to be solved. Solving problems is the purpose of any contractor. Code must be treated just as a means, a tool we know how to handle very well. But if we´re really trying to be craftsmen then we should be conscious about exactly that and act ethically. That means we must never be so focused on our tool as to be unable to suggest better solutions to the problems of our customers than code. I´m all with Paul when he urges us to “Write great code”. Sure, if you need to write code, then by all means do so. Write the best code you can think of – and then try to improve it. Paul has all the best intentions when he signs Brians “treat code as an end” - but as we all know: “The road to hell is paved with best intentions” ;-) Yes, I can imagine a “hell of code focus”. In fact, I don´t need to imagine it, I´m seeing it quite often. Because code hell is whereever two developers stand together and are so immersed in talking about all sorts of coding tricks, design patterns, code smells, technologies, platforms, tools that they lose sight of the big picture. Talking about TDD or SOLID or refactoring is a sign of consciousness – relative to the “cowboy coders” view of the world. But from yet another point of view TDD, SOLID, and refactoring are just cures for ailments within a system. And I fear, if “Writing great code” is the only focus or the main focus of software development, then we as an industry lose the ability to see that. Focus draws a line around something, it defines a horizon for perceptions and thinking. So if we focus on code our horizon ends where “the land of code” ends. I don´t think that should be our professional attitude. So what about Software Craftsmanship as the next big thing after Agility? I think Software Craftsmanship has an important message for all software developers and beyond. But to make it the successor of the Agility movement seems to miss a point. Agility never claimed to solve all software development problems, I´d say. So to blame it for having missed out on certain aspects of it is wrong. If I had to summarize Agility in one word I´d say “Value”. Agility put value for the customer back in software development. Focus on delivering value early and often – that´s Agility´s mantra. All else follows from that. And I ask you: Is that obsolete? Is delivering value not hip anymore? No, sure not. That´s our very purpose as software developers. So how can Agility become obsolete and need to be replaced? We need to do away with this “either/or”-thinking. It´s either Agility or Lean or Software Craftsmanship or whatnot. Instead we should start integrating concepts and movements. Think “both/and”. Think Agility plus Software Craftsmanship plus Lean plus whatnot. We don´t neet to tear down anything from a piedestal and replace it with a new idol. Instead we should do away with piedestals and arrange whatever is helpful is a circle. Then we can turn to concepts, movements for whatever they are best. After 10 years of Agility we should be able to identify what it was good at – and keep that. Keep Agility around and add whatever Agility was lacking or never concerned with. Add whatever is at the core of Software Craftsmanship. Add whatever is at the core of Lean etc. But don´t call out the age of Post-Agility. Because it better never will end. Because once we start to lose Agility´s core we´re losing focus of the customer.

Read the article

Search Results

Search found 5313 results on 213 pages for 'steve care'.

Page 202/213 | < Previous Page | 198 199 200 201 202 203 204 205 206 207 208 209 | Next Page >

- by jorg

- by Geordie

- by James Michael Hare

- by BuckWoody

- by Roger Hart

- by James Michael Hare

- by James Michael Hare

- by James Michael Hare

- by Pinal Dave

- by BuckWoody

- by Ricardo Ferreira

- by SQLOS Team

- by Rajesh Pillai

- by BuckWoody

- by pinaldave

- by [email protected]

- by Phil Factor

- by Loris

- by tushar.chitra

- by Rob Farley

- by Mladen Prajdic

- by Bill Graziano

- by user12587121

- by Ralf Westphal

< Previous Page | 198 199 200 201 202 203 204 205 206 207 208 209 | Next Page >