Search Results

Search found 3996 results on 160 pages for 'operations'.

Page 145/160 | < Previous Page | 141 142 143 144 145 146 147 148 149 150 151 152 | Next Page >

Master Data Management and Cloud Computing

- by david.butler(at)oracle.com

Cloud Computing is all the rage these days. There are many reasons why this is so. But like its predecessor, Service Oriented Architecture, it can fall on hard times if the underlying data is left unmanaged. Master Data Management is the perfect Cloud companion. It can materially increase the chances for successful Cloud initiatives. In this blog, I'll review the nature of the Cloud and show how MDM fits in. Here's the National Institute of Standards and Technology Cloud definition: • Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud architectures have three main layers: applications or Software as a Service (SaaS), Platforms as a Service (PaaS), and Infrastructure as a Service (IaaS). SaaS generally refers to applications that are delivered to end-users over the Internet. Oracle CRM On Demand is an example of a SaaS application. Today there are hundreds of SaaS providers covering a wide variety of applications including Salesforce.com, Workday, and Netsuite. Oracle MDM applications are located in this layer of Oracle's On Demand enterprise Cloud platform. We call it Master Data as a Service (MDaaS). PaaS generally refers to an application deployment platform delivered as a service. They are often built on a grid computing architecture and include database and middleware. Oracle Fusion Middleware is in this category and includes the SOA and Data Integration products used to connect SaaS applications including MDM. Finally, IaaS generally refers to computing hardware (servers, storage and network) delivered as a service. This typically includes the associated software as well: operating systems, virtualization, clustering, etc. Cloud Computing benefits are compelling for a large number of organizations. These include significant cost savings, increased flexibility, and fast deployments. Cost advantages include paying for just what you use. This is especially critical for organizations with variable or seasonal usage. Companies don't have to invest to support peak computing periods. Costs are also more predictable and controllable. Increased agility includes access to the latest technology and experts without making significant up front investments. While Cloud Computing is certainly very alluring with a clear value proposition, it is not without its challenges. An IDC survey of 244 IT executives/CIOs and their line-of-business (LOB) colleagues identified a number of issues: Security - 74% identified security as an issue involving data privacy and resource access control. Integration - 61% found that it is hard to integrate Cloud Apps with in-house applications. Operational Costs - 50% are worried that On Demand will actually cost more given the impact of poor data quality on the rest of the enterprise. Compliance - 49% felt that compliance with required regulatory, legal and general industry requirements (such as PCI, HIPAA and Sarbanes-Oxley) would be a major issue. When control is lost, the ability of a provider to directly manage how and where data is deployed, used and destroyed is negatively impacted. There are others, but I singled out these four top issues because Master Data Management, properly incorporated into a Cloud Computing infrastructure, can significantly ameliorate all of these problems. Cloud Computing can literally rain raw data across the enterprise. According to fellow blogger, Mike Ferguson, "the fracturing of data caused by the adoption of cloud computing raises the importance of MDM in keeping disparate data synchronized." David Linthicum, CTO Blue Mountain Labs blogs that "the lack of MDM will become more of an issue as cloud computing rises. We're moving from complex federated on-premise systems, to complex federated on-premise and cloud-delivered systems." Left unmanaged, non-standard, inconsistent, ungoverned data with questionable quality can pollute analytical systems, increase operational costs, and reduce the ROI in Cloud and On-Premise applications. As cloud computing becomes more relevant, and more data, applications, services, and processes are moved out to cloud computing platforms, the need for MDM becomes ever more important. Oracle's MDM suite is designed to deal with all four of the above Cloud issues listed in the IDC survey. Security - MDM manages all master data attribute privacy and resource access control issues. Integration - MDM pre-integrates Cloud Apps with each other and with On Premise applications at the data level. Operational Costs - MDM significantly reduces operational costs by increasing data quality, thereby improving enterprise business processes efficiency. Compliance - MDM, with its built in Data Governance capabilities, insures that the data is governed according to organizational standards. This facilitates rapid and accurate reporting for compliance purposes. Oracle MDM creates governed high quality master data. A unified cleansed and standardized data view is produced. The Oracle Customer Hub creates a single view of the customer. The Oracle Product Hub creates high quality product data designed to support all go-to-market processes. Oracle Supplier Hub dramatically reduces the chances of 'supplier exceptions'. Oracle Site Hub masters locations. And Oracle Hyperion Data Relationship Management masters financial reference data and manages enterprise hierarchies across operational areas from ERP to EPM and CRM to SCM. Oracle Fusion Middleware connects Cloud and On Premise applications to MDM Hubs and brings high quality master data to your enterprise business processes. An independent analyst once said "Poor data quality is like dirt on the windshield. You may be able to drive for a long time with slowly degrading vision, but at some point, you either have to stop and clear the windshield or risk everything." Cloud Computing has the potential to significantly degrade data quality across the enterprise over time. Deploying a Master Data Management solution prior to or in conjunction with a move to the Cloud can insure that the data flowing into the enterprise from the Cloud is clean and governed. This will in turn insure that expected returns on the investment in Cloud Computing will be realized. Oracle MDM has proven its metal in this area and has the customers to back that up. In fact, I will be hosting a webcast on Tuesday, April 10th at 10 am PT with one of our top Cloud customers, the Church Pension Group. They have moved all mainline applications to a hosted model and use Oracle MDM to insure the master data is managed and cleansed before it is propagated to other cloud and internal systems. I invite you join Martin Hossfeld, VP, IT Operations, and Danette Patterson, Enterprise Data Manager as they review business drivers for MDM and hosted applications, how they did it, the benefits achieved, and lessons learned. You can register for this free webcast here. Hope to see you there.

Read the article
SPARC T3-1 Record Results Running JD Edwards EnterpriseOne Day in the Life Benchmark with Added Batch Component

- by Brian

Using Oracle's SPARC T3-1 server for the application tier and Oracle's SPARC Enterprise M3000 server for the database tier, a world record result was produced running the Oracle's JD Edwards EnterpriseOne applications Day in the Life benchmark run concurrently with a batch workload. The SPARC T3-1 server based result has 25% better performance than the IBM Power 750 POWER7 server even though the IBM result did not include running a batch component. The SPARC T3-1 server based result has 25% better space/performance than the IBM Power 750 POWER7 server as measured by the online component. The SPARC T3-1 server based result is 5x faster than the x86-based IBM x3650 M2 server system when executing the online component of the JD Edwards EnterpriseOne 9.0.1 Day in the Life benchmark. The IBM result did not include a batch component. The SPARC T3-1 server based result has 2.5x better space/performance than the x86-based IBM x3650 M2 server as measured by the online component. The combination of SPARC T3-1 and SPARC Enterprise M3000 servers delivered a Day in the Life benchmark result of 5000 online users with 0.875 seconds of average transaction response time running concurrently with 19 Universal Batch Engine (UBE) processes at 10 UBEs/minute. The solution exercises various JD Edwards EnterpriseOne applications while running Oracle WebLogic Server 11g Release 1 and Oracle Web Tier Utilities 11g HTTP server in Oracle Solaris Containers, together with the Oracle Database 11g Release 2. The SPARC T3-1 server showed that it could handle the additional workload of batch processing while maintaining the same number of online users for the JD Edwards EnterpriseOne Day in the Life benchmark. This was accomplished with minimal loss in response time. JD Edwards EnterpriseOne 9.0.1 takes advantage of the large number of compute threads available in the SPARC T3-1 server at the application tier and achieves excellent response times. The SPARC T3-1 server consolidates the application/web tier of the JD Edwards EnterpriseOne 9.0.1 application using Oracle Solaris Containers. Containers provide flexibility, easier maintenance and better CPU utilization of the server leaving processing capacity for additional growth. A number of Oracle advanced technology and features were used to obtain this result: Oracle Solaris 10, Oracle Solaris Containers, Oracle Java Hotspot Server VM, Oracle WebLogic Server 11g Release 1, Oracle Web Tier Utilities 11g, Oracle Database 11g Release 2, the SPARC T3 and SPARC64 VII+ based servers. This is the first published result running both online and batch workload concurrently on the JD Enterprise Application server. No published results are available from IBM running the online component together with a batch workload. The 9.0.1 version of the benchmark saw some minor performance improvements relative to 9.0. When comparing between 9.0.1 and 9.0 results, the reader should take this into account when the difference between results is small. Performance Landscape JD Edwards EnterpriseOne Day in the Life Benchmark Online with Batch Workload This is the first publication on the Day in the Life benchmark run concurrently with batch jobs. The batch workload was provided by Oracle's Universal Batch Engine. System RackUnits Online Users Resp Time (sec) BatchConcur(# of UBEs) BatchRate(UBEs/m) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII+ (2.86 GHz), Solaris 10 4 5000 0.88 19 10 9.0.1 Resp Time (sec) — Response time of online jobs reported in seconds Batch Concur (# of UBEs) — Batch concurrency presented in the number of UBEs Batch Rate (UBEs/m) — Batch transaction rate in UBEs/minute. JD Edwards EnterpriseOne Day in the Life Benchmark Online Workload Only These results are for the Day in the Life benchmark. They are run without any batch workload. System RackUnits Online Users ResponseTime (sec) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII (2.75 GHz), Solaris 10 4 5000 0.52 9.0.1 IBM Power 750, 1xPOWER7 (3.55 GHz), IBM i7.1 4 4000 0.61 9.0 IBM x3650M2, 2xIntel X5570 (2.93 GHz), OVM 2 1000 0.29 9.0 IBM result from http://www-03.ibm.com/systems/i/advantages/oracle/, IBM used WebSphere Configuration Summary Hardware Configuration: 1 x SPARC T3-1 server 1 x 1.65 GHz SPARC T3 128 GB memory 16 x 300 GB 10000 RPM SAS 1 x Sun Flash Accelerator F20 PCIe Card, 92 GB 1 x 10 GbE NIC 1 x SPARC Enterprise M3000 server 1 x 2.86 SPARC64 VII+ 64 GB memory 1 x 10 GbE NIC 2 x StorageTek 2540 + 2501 Software Configuration: JD Edwards EnterpriseOne 9.0.1 with Tools 8.98.3.3 Oracle Database 11g Release 2 Oracle 11g WebLogic server 11g Release 1 version 10.3.2 Oracle Web Tier Utilities 11g Oracle Solaris 10 9/10 Mercury LoadRunner 9.10 with Oracle Day in the Life kit for JD Edwards EnterpriseOne 9.0.1 Oracle’s Universal Batch Engine - Short UBEs and Long UBEs Benchmark Description JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations. Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and other manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company. The workload consists of online transactions and the UBE workload of 15 short and 4 long UBEs. LoadRunner runs the DIL workload, collects the user’s transactions response times and reports the key metric of Combined Weighted Average Transaction Response time. The UBE processes workload runs from the JD Enterprise Application server. Oracle's UBE processes come as three flavors: Short UBEs < 1 minute engage in Business Report and Summary Analysis, Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address, Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs. The UBE workload generates large numbers of PDF files reports and log files. The UBE Queues are categorized as the QBATCHD, a single threaded queue for large UBEs, and the QPROCESS queue for short UBEs run concurrently. One of the Oracle Solaris Containers ran 4 Long UBEs, while another Container ran 15 short UBEs concurrently. The mixed size UBEs ran concurrently from the SPARC T3-1 server with the 5000 online users driven by the LoadRunner. Oracle’s UBE process performance metric is Number of Maximum Concurrent UBE processes at transaction rate, UBEs/minute. Key Points and Best Practices Two JD Edwards EnterpriseOne Application Servers and two Oracle Fusion Middleware WebLogic Servers 11g R1 coupled with two Oracle Fusion Middleware 11g Web Tier HTTP Server instances on the SPARC T3-1 server were hosted in four separate Oracle Solaris Containers to demonstrate consolidation of multiple application and web servers. See Also SPARC T3-1 oracle.com SPARC Enterprise M3000 oracle.com Oracle Solaris oracle.com JD Edwards EnterpriseOne oracle.com Oracle Database 11g Release 2 Enterprise Edition oracle.com Disclosure Statement Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 6/27/2011.

Read the article
Conversion of BizTalk Projects to Use the New WCF-SAP Adaptor

- by Geordie

We are in the process of upgrading our BizTalk Environment from BizTalk 2006 R2 to BizTalk 2010. The SAP adaptor in BizTalk 2010 is an all new and more powerful WCF-SAP adaptor. When my colleagues tested out the new adaptor they discovered that the format of the data extracted from SAP was not identical to the old adaptor. This is not a big deal if the structure of the messages from SAP is simple. In this case we were receiving the delivery and invoice iDocs. Both these structures are complex especially the delivery document. Over the past few years I have tweaked the delivery mapping to remove bugs from original mapping. The idea of redoing these maps did not appeal and due to the current work load was not even an option. I opted for a rather crude alternative of pulling in the iDoc in the new typed format and then adding a static map at the start of the orchestration to convert the data to the old schema. Note WCF-SAP data formats (on the binding tab of the configuration dialog box is the ‘RecieiveIdocFormat’ field): Typed: Returns a XML document with the hierarchy represented in XML and all fields being represented by XML tags. RFC: Returns an XML document with the hierarchy represented in XML but the iDoc lines in flat file format. String: This returns the iDoc in a format that is closest to the original flat file format but is still wrapped with some top level XML tags. The files also contained some strange characters at the end of each line. I started with the invoice document and it was quite straight forward to add the mapping but this is where my problems started. The orchestrations for these documents are dynamic and so require the identity of the partner to be able to correctly configure the orchestration. The partner identity is in the EDI_DC40 segment of the iDoc. In the old project the RECPRN node of the segment was promoted. The code to set a variable to the partner ID was now failing. After lot of head scratching I discovered the problem was due to the addition of Namespaces to the fields in the EDI_DC40 segment. To overcome this I needed to use an xPath query with a Namespace Manager. This had to be done in custom code. I now tried to repeat the process with the delivery document. Unfortunately when we tried to get sample typed data from SAP an exception was thrown. The adapter "WCF-SAP" raised an error message. Details "Microsoft.ServiceModel.Channels.Common.XmlReaderGenerationException: The segment or group definition E2EDKA1001 was not found in the IDoc metadata. The UniqueId of the IDoc type is: IDOCTYP/3/DESADV01/ZASNEXT1/640. For Receive operations, the SAP adapter does not support unreleased segments. Our guess is that when the WCF-SAP adaptor tries to down load the data it retrieves a data schema from SAP. For some reason the schema does not match the data. This may be due to the version of SAP we are running or due to a customization. Either way resolving this problem did not look easy. When doing some research on this problem I found an article showing me how to get the data from SAP using the WCF-SAP adaptor without any XML tags. http://blogs.msdn.com/b/adapters/archive/2007/10/05/receiving-idocs-getting-the-raw-idoc-data.aspx Reproduction of Mustansir blog: Since the WCF based SAP Adapter is ... well, WCF based, all data flowing in and out of the adapter is encapsulated within a SOAP message. Which means there are those pesky xml tags all over the place. If you want to receive an Idoc from SAP, you can receive it in "Typed" format (in which case each column in each segment of the idoc appears within its own xml tag), or you can receive it in "String" format (in which case there are just 2 xml tags at the top, the raw xml data in string/flat file format, and the 2 closing xml tags). In "String" format, an incoming idoc (for ORDERS05, containing 5 data records) would look like: <ReceiveIdoc ><idocData>EDI_DC40 8000000000001064985620 E2EDK01005 800000000000106498500000100000001 E2EDK14 8000000000001064985000002000000020111000 E2EDK14 8000000000001064985000003000000020081000 E2EDK14 80000000000010649850000040000000200710 E2EDK14 80000000000010649850000050000000200600</idocData></ReceiveIdoc> (I have trimmed part of the control record so that it fits cleanly here on one line). Now, you're only interested in the IDOC data, and don't care much for the XML tags. It isn't that difficult to write your own pipeline component, or even some logic in the orchestration to remove the tags, right? Well, you don't need to write any extra code at all - the WCF Adapter can help you here! During the configuration of your one-way Receive Location using WCF-Custom, navigate to the Messages tab. Under the section "Inbound BizTalk Messge Body", select the "Path" radio button, and: (a) Enter the body path expression as: /*[local-name()='ReceiveIdoc']/*[local-name()='idocData'] (b) Choose "String" for the Node Encoding. What we've done is, used an XPATH to pull out the value of the "idocData" node from the XML. Your Receive Location will now emit text containing only the idoc data. You can at this point, for example, put the Flat File Pipeline component to convert the flat text into a different xml format based on some other schema you already have, and receive your version of the xml formatted message in your orchestration. This was potentially a much easier solution than adding the static maps to the orchestrations and overcame the issue with ‘Typed’ delivery documents. Not quite so fast… Note: When I followed Mustansir’s blog the characters at the end of each line disappeared. After configuring the adaptor and passing the iDoc data into the original flat file receive pipelines I was receiving exceptions. There was a failure executing the receive pipeline: "PAPINETPipelines.DeliveryFlatFileReceive, CustomerIntegration2.PAPINET.Pipelines, Version=1.0.0.0, Culture=neutral, PublicKeyToken=4ca3635fbf092bbb" Source: "Pipeline " Receive Port: "recSAP_Delivery" URI: "D:\CustomerIntegration2\SAP\Delivery\*.xml" Reason: An error occurred when parsing the incoming document: "Unexpected data found while looking for: 'Z2EDPZ7' The current definition being parsed is E2EDP07GRP. The stream offset where the error occured is 8859. The line number where the error occured is 23. The column where the error occured is 0.". Although the new flat file looked the same as the old one there was a differences. In the original file all lines in the document were exactly 1064 character long. In the new file all lines were truncated to the last alphanumeric character. The final piece of the puzzle was to add a custom pipeline component to pad all the lines to 1064 characters. This component was added to the decode node of the custom delivery and invoice flat file disassembler pipelines. Execute method of the custom pipeline component: public IBaseMessage Execute(IPipelineContext pc, IBaseMessage inmsg) { //Convert Stream to a string Stream s = null; IBaseMessagePart bodyPart = inmsg.BodyPart; // NOTE inmsg.BodyPart.Data is implemented only as a setter in the http adapter API and a //getter and setter for the file adapter. Use GetOriginalDataStream to get data instead. if (bodyPart != null) s = bodyPart.GetOriginalDataStream(); string newMsg = string.Empty; string strLine; try { StreamReader sr = new StreamReader(s); strLine = sr.ReadLine(); while (strLine != null) { //Execute padding code if (strLine != null) strLine = strLine.PadRight(1064, ' ') + "\r\n"; newMsg += strLine; strLine = sr.ReadLine(); } sr.Close(); } catch (IOException ex) { throw new Exception("Error occured trying to pad the message to 1064 charactors"); } //Convert back to stream and set to Data property inmsg.BodyPart.Data = new MemoryStream(Encoding.UTF8.GetBytes(newMsg)); ; //reset the position of the stream to zero inmsg.BodyPart.Data.Position = 0; return inmsg; }

Read the article
Pre-rentrée Oracle Open World 2012 : à vos agendas

- by Eric Bezille

A maintenant moins d'un mois de l’événement majeur d'Oracle, qui se tient comme chaque année à San Francisco, fin septembre, début octobre, les spéculations vont bon train sur les annonces qui vont y être dévoilées... Et sans lever le voile, je vous engage à prendre connaissance des sujets des "Key Notes" qui seront tenues par Larry Ellison, Mark Hurd, Thomas Kurian (responsable des développements logiciels) et John Fowler (responsable des développements systèmes) afin de vous donner un avant goût. Stratégie et Roadmaps Oracle Bien entendu, au-delà des séances plénières qui vous donnerons une vision précise de la stratégie, et pour ceux qui seront sur place, je vous engage à ne pas manquer les séances d'approfondissement qui auront lieu dans la semaine, dont voici quelques morceaux choisis : "Accelerate your Business with the Oracle Hardware Advantage" avec John Fowler, le lundi 1er Octobre, 3:15pm-4:15pm "Why Oracle Softwares Runs Best on Oracle Hardware" , avec Bradley Carlile, le responsable des Benchmarks, le lundi 1er Octobre, 12:15pm-13:15pm "Engineered Systems - from Vision to Game-changing Results", avec Robert Shimp, le lundi 1er Octobre 1:45pm-2:45pm "Database and Application Consolidation on SPARC Supercluster", avec Hugo Rivero, responsable dans les équipes d'intégration matériels et logiciels, le lundi 1er Octobre, 4:45pm-5:45pm "Oracle’s SPARC Server Strategy Update", avec Masood Heydari, responsable des développements serveurs SPARC, le mardi 2 Octobre, 10:15am - 11:15am "Oracle Solaris 11 Strategy, Engineering Insights, and Roadmap", avec Markus Flier, responsable des développements Solaris, le mercredi 3 Octobre, 10:15am - 11:15am "Oracle Virtualization Strategy and Roadmap", avec Wim Coekaerts, responsable des développement Oracle VM et Oracle Linux, le lundi 1er Octobre, 12:15pm-1:15pm "Big Data: The Big Story", avec Jean-Pierre Dijcks, responsable du développement produits Big Data, le lundi 1er Octobre, 3:15pm-4:15pm "Scaling with the Cloud: Strategies for Storage in Cloud Deployments", avec Christine Rogers, Principal Product Manager, et Chris Wood, Senior Product Specialist, Stockage , le lundi 1er Octobre, 10:45am-11:45am Retours d'expériences et témoignages Si Oracle Open World est l'occasion de partager avec les équipes de développement d'Oracle en direct, c'est aussi l'occasion d'échanger avec des clients et experts qui ont mis en oeuvre nos technologies pour bénéficier de leurs retours d'expériences, comme par exemple : "Oracle Optimized Solution for Siebel CRM at ACCOR", avec les témoignages d'Eric Wyttynck, directeur IT Multichannel & CRM et Pascal Massenet, VP Loyalty & CRM systems, sur les bénéfices non seulement métiers, mais également projet et IT, le mercredi 3 Octobre, 1:15pm-2:15pm "Tips from AT&T: Oracle E-Business Suite, Oracle Database, and SPARC Enterprise", avec le retour d'expérience des experts Oracle, le mardi 2 Octobre, 11:45am-12:45pm "Creating a Maximum Availability Architecture with SPARC SuperCluster", avec le témoignage de Carte Wright, Database Engineer à CKI, le mercredi 3 Octobre, 11:45am-12:45pm "Multitenancy: Everybody Talks It, Oracle Walks It with Pillar Axiom Storage", avec le témoignage de Stephen Schleiger, Manager Systems Engineering de Navis, le lundi 1er Octobre, 1:45pm-2:45pm "Oracle Exadata for Database Consolidation: Best Practices", avec le retour d'expérience des experts Oracle ayant participé à la mise en oeuvre d'un grand client du monde bancaire, le lundi 1er Octobre, 4:45pm-5:45pm "Oracle Exadata Customer Panel: Packaged Applications with Oracle Exadata", animé par Tim Shetler, VP Product Management, mardi 2 Octobre, 1:15pm-2:15pm "Big Data: Improving Nearline Data Throughput with the StorageTek SL8500 Modular Library System", avec le témoignage du CTO de CSC, Alan Powers, le jeudi 4 Octobre, 12:45pm-1:45pm "Building an IaaS Platform with SPARC, Oracle Solaris 11, and Oracle VM Server for SPARC", avec le témoignage de Syed Qadri, Lead DBA et Michael Arnold, System Architect d'US Cellular, le mardi 2 Octobre, 10:15am-11:15am "Transform Data Center TCO with Oracle Optimized Servers: A Customer Panel", avec les témoignages notamment d'AT&T et Liberty Global, le mardi 2 Octobre, 11:45am-12:45pm "Data Warehouse and Big Data Customers’ View of the Future", avec The Nielsen Company US, Turkcell, GE Retail Finance, Allianz Managed Operations and Services SE, le lundi 1er Octobre, 4:45pm-5:45pm "Extreme Storage Scale and Efficiency: Lessons from a 100,000-Person Organization", le témoignage de l'IT interne d'Oracle sur la transformation et la migration de l'ensemble de notre infrastructure de stockage, mardi 2 Octobre, 1:15pm-2:15pm Echanges avec les groupes d'utilisateurs et les équipes de développement Oracle Si vous avez prévu d'arriver suffisamment tôt, vous pourrez également échanger dès le dimanche avec les groupes d'utilisateurs, ou tous les soirs avec les équipes de développement Oracle sur des sujets comme : "To Exalogic or Not to Exalogic: An Architectural Journey", avec Todd Sheetz - Manager of DBA and Enterprise Architecture, Veolia Environmental Services, le dimanche 30 Septembre, 2:30pm-3:30pm "Oracle Exalytics and Oracle TimesTen for Exalytics Best Practices", avec Mark Rittman, de Rittman Mead Consulting Ltd, le dimanche 30 Septembre, 10:30am-11:30am "Introduction of Oracle Exadata at Telenet: Bringing BI to Warp Speed", avec Rudy Verlinden & Eric Bartholomeus - Managers IT infrastructure à Telenet, le dimanche 30 Septembre, 1:15pm-2:00pm "The Perfect Marriage: Sun ZFS Storage Appliance with Oracle Exadata", avec Melanie Polston, directeur, Data Management, de Novation et Charles Kim, Managing Director de Viscosity, le dimanche 30 Septembre, 9:00am-10am "Oracle’s Big Data Solutions: NoSQL, Connectors, R, and Appliance Technologies", avec Jean-Pierre Dijcks et les équipes de développement Oracle, le lundi 1er Octobre, 6:15pm-7:00pm Testez et évaluez les solutions Et pour finir, vous pouvez même tester les technologies au travers du Oracle DemoGrounds, (1133 Moscone South pour la partie Systèmes Oracle, OS, et Virtualisation) et des "Hands-on-Labs", comme : "Deploying an IaaS Environment with Oracle VM", le mardi 2 Octobre, 10:15am-11:15am "Virtualize and Deploy Oracle Applications in Minutes with Oracle VM: Hands-on Lab", le mardi 2 Octobre, 11:45am-12:45pm (il est fortement conseillé d'avoir suivi le "Hands-on-Labs" précédent avant d'effectuer ce Lab. "x86 Enterprise Cloud Infrastructure with Oracle VM 3.x and Sun ZFS Storage Appliance", le mercredi 3 Octobre, 5:00pm-6:00pm "StorageTek Tape Analytics: Managing Tape Has Never Been So Simple", le mercredi 3 Octobre, 1:15pm-2:15pm "Oracle’s Pillar Axiom 600 Storage System: Power and Ease", le lundi 1er Octobre, 12:15pm-1:15pm "Enterprise Cloud Infrastructure for SPARC with Oracle Enterprise Manager Ops Center 12c", le lundi 1er Octobre, 1:45pm-2:45pm "Managing Storage in the Cloud", le mardi 2 Octobre, 5:00pm-6:00pm "Learn How to Write MapReduce on Oracle’s Big Data Platform", le lundi 1er Octobre, 12:15pm-1:15pm "Oracle Big Data Analytics and R", le mardi 2 Octobre, 1:15pm-2:15pm "Reduce Risk with Oracle Solaris Access Control to Restrain Users and Isolate Applications", le lundi 1er Octobre, 10:45am-11:45am "Managing Your Data with Built-In Oracle Solaris ZFS Data Services in Release 11", le lundi 1er Octobre, 4:45pm-5:45pm "Virtualizing Your Oracle Solaris 11 Environment", le mardi 2 Octobre, 1:15pm-2:15pm "Large-Scale Installation and Deployment of Oracle Solaris 11", le mercredi 3 Octobre, 3:30pm-4:30pm En conclusion, une semaine très riche en perspective, et qui vous permettra de balayer l'ensemble des sujets au coeur de vos préoccupations, de la stratégie à l'implémentation... Cette semaine doit se préparer, pour tailler votre agenda sur mesure, à travers les plus de 2000 sessions dont je ne vous ai fait qu'un extrait, et dont vous pouvez retrouver l'ensemble en ligne.

Read the article
SQL SERVER – SSMS: Disk Usage Report

- by Pinal Dave

Let us start with humor! I think we the series on various reports, we come to a logical point. We covered all the reports at server level. This means the reports we saw were targeted towards activities that are related to instance level operations. These are mostly like how a doctor diagnoses a patient. At this point I am reminded of a dialog which I read somewhere: Patient: Doc, It hurts when I touch my head. Doc: Ok, go on. What else have you experienced? Patient: It hurts even when I touch my eye, it hurts when I touch my arms, it even hurts when I touch my feet, etc. Doc: Hmmm … Patient: I feel it hurts when I touch anywhere in my body. Doc: Ahh … now I get it. You need a plaster to your finger John. Sometimes the server level gives an indicator to what is happening in the system, but we need to get to the root cause for a specific database. So, this is the first blog in series where we would start discussing about database level reports. To launch database level reports, expand selected server in Object Explorer, expand the Databases folder, and then right-click any database for which we want to look at reports. From the menu, select Reports, then Standard Reports, and then any of database level reports. In this blog, we would talk about four “disk” reports because they are similar: Disk Usage Disk Usage by Top Tables Disk Usage by Table Disk Usage by Partition Disk Usage This report shows multiple information about the database. Let us discuss them one by one. We have divided the output into 5 different sections. Section 1 shows the high level summary of the database. It shows the space used by database files (mdf and ldf). Under the hood, the report uses, various DMVs and DBCC Commands, it is using sys.data_spaces and DBCC SHOWFILESTATS. Section 2 and 3 are pie charts. One for data file allocation and another for the transaction log file. Pie chart for “Data Files Space Usage (%)” shows space consumed data, indexes, allocated to the SQL Server database, and unallocated space which is allocated to the SQL Server database but not yet filled with anything. “Transaction Log Space Usage (%)” used DBCC SQLPERF (LOGSPACE) and shows how much empty space we have in the physical transaction log file. Section 4 shows the data from Default Trace and looks at Event IDs 92, 93, 94, 95 which are for “Data File Auto Grow”, “Log File Auto Grow”, “Data File Auto Shrink” and “Log File Auto Shrink” respectively. Here is an expanded view for that section. If default trace is not enabled, then this section would be replaced by the message “Trace Log is disabled” as highlighted below. Section 5 of the report uses DBCC SHOWFILESTATS to get information. Here is the enhanced version of that section. This shows the physical layout of the file. In case you have In-Memory Objects in the database (from SQL Server 2014), then report would show information about those as well. Here is the screenshot taken for a different database, which has In-Memory table. I have highlighted new things which are only shown for in-memory database. The new sections which are highlighted above are using sys.dm_db_xtp_checkpoint_files, sys.database_files and sys.data_spaces. The new type for in-memory OLTP is ‘FX’ in sys.data_space. The next set of reports is targeted to get information about a table and its storage. These reports can answer questions like: Which is the biggest table in the database? How many rows we have in table? Is there any table which has a lot of reserved space but its unused? Which partition of the table is having more data? Disk Usage by Top Tables This report provides detailed data on the utilization of disk space by top 1000 tables within the Database. The report does not provide data for memory optimized tables. Disk Usage by Table This report is same as earlier report with few difference. First Report shows only 1000 rows First Report does order by values in DMV sys.dm_db_partition_stats whereas second one does it based on name of the table. Both of the reports have interactive sort facility. We can click on any column header and change the sorting order of data. Disk Usage by Partition This report shows the distribution of the data in table based on partition in the table. This is so similar to previous output with the partition details now. Here is the query taken from profiler. SELECT row_number() OVER (ORDER BY a1.used_page_count DESC, a1.index_id) AS row_number , (dense_rank() OVER (ORDER BY a5.name, a2.name))%2 AS l1 , a1.OBJECT_ID , a5.name AS [schema] , a2.name , a1.index_id , a3.name AS index_name , a3.type_desc , a1.partition_number , a1.used_page_count * 8 AS total_used_pages , a1.reserved_page_count * 8 AS total_reserved_pages , a1.row_count FROM sys.dm_db_partition_stats a1 INNER JOIN sys.all_objects a2 ON ( a1.OBJECT_ID = a2.OBJECT_ID) AND a1.OBJECT_ID NOT IN (SELECT OBJECT_ID FROM sys.tables WHERE is_memory_optimized = 1) INNER JOIN sys.schemas a5 ON (a5.schema_id = a2.schema_id) LEFT OUTER JOIN sys.indexes a3 ON ( (a1.OBJECT_ID = a3.OBJECT_ID) AND (a1.index_id = a3.index_id) ) WHERE (SELECT MAX(DISTINCT partition_number) FROM sys.dm_db_partition_stats a4 WHERE (a4.OBJECT_ID = a1.OBJECT_ID)) >= 1 AND a2.TYPE <> N'S' AND a2.TYPE <> N'IT' ORDER BY a5.name ASC, a2.name ASC, a1.index_id, a1.used_page_count DESC, a1.partition_number Using all of the above reports, you should be able to get the usage of database files and also space used by tables. I think this is too much disk information for a single blog and I hope you have used them in the past to get data. Do let me know if you found anything interesting using these reports in your environments. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
Taking the training wheels off: Accelerating the Business with Oracle IAM by Brian Mozinski (Accenture)

- by Greg Jensen

Today, technical requirements for IAM are evolving rapidly, and the bar is continuously raised for high performance IAM solutions as organizations look to roll out high volume use cases on the back of legacy systems. Existing solutions were often designed and architected to support offline transactions and manual processes, and the business owners today demand globally scalable infrastructure to support the growth their business cases are expected to deliver. To help IAM practitioners address these challenges and make their organizations and themselves more successful, this series we will outline the: • Taking the training wheels off: Accelerating the Business with Oracle IAM The explosive growth in expectations for IAM infrastructure, and the business cases they support to gain investment in new security programs. • "Necessity is the mother of invention": Technical solutions developed in the field Well proven tricks of the trade, used by IAM guru’s to maximize your solution while addressing the requirements of global organizations. • The Art & Science of Performance Tuning of Oracle IAM 11gR2 Real world examples of performance tuning with Oracle IAM • No Where to go but up: Extending the benefits of accelerated IAM Anything is possible, compelling new solutions organizations are unlocking with accelerated Oracle IAM Let’s get started … by talking about the changing dynamics driving these discussions. Big Companies are getting bigger everyday, and increasingly organizations operate across state lines, multiple times zones, and in many countries or continents at the same time. No longer is midnight to 6am a safe time to take down the system for upgrades, to run recon’s and import or update user accounts and attributes. Further IT organizations are operating as shared services with SLA’s similar to telephone carrier levels expected by their “clients”. Workers are moved in and out of roles on a weekly, daily, or even hourly rate and IAM is expected to support those rapid changes. End users registering for services during business hours in Singapore are expected their access to be green-lighted in custom apps hosted in Portugal within the hour. Many of the expectations of asynchronous systems and batched updates are not adequate and the number and types of users is growing. When organizations acted more like independent teams at functional or geographic levels it was manageable to have processes that relied on a handful of people who knew how to make things work …. Knew how to get you access to the key systems to get your job done. Today everyone is expected to do more with less, the finance administrator previously supporting their local Atlanta sales office might now be asked to help close the books for the Johannesburg team, and access certification process once completed monthly by Joan on the 3rd floor is now done by a shared pool of resources in Sao Paulo. Fragmented processes that rely on institutional knowledge to get access to systems and get work done quickly break down in these scenarios. Highly robust processes that have automated workflows for connected or disconnected systems give organizations the dynamic flexibility to share work across these lines and cut costs or increase productivity. As the IT industry computing paradigms continue to change with the passing of time, and as mature or proven approaches become clear, it is normal for organizations to adjust accordingly. Businesses must manage identity in an increasingly hybrid world in which legacy on-premises IAM infrastructures are extended or replaced to support more and more interconnected and interdependent services to a wider range of users. The old legacy IAM implementation models we had relied on to manage identities no longer apply. End users expect to self-request access to services from their tablet, get supervisor approval over mobile devices and email, and launch the application even if is hosted on the cloud, or run by a partner, vendor, or service provider. While user expectations are higher, they are also simpler … logging into custom desktop apps to request approvals, or going through email or paper based processes for certification is unacceptable. Users expect security to operate within the paradigm of the application … i.e. feel like the application they are using. Citizen and customer facing applications have evolved from every where, with custom applications, 3rd party tools, and merging in from acquired entities or 3rd party OEM’s resold to expand your portfolio of services. These all have their own user stores, authentication models, user lifecycles, session management, etc. Often the designers/developers are no longer accessible and the documentation is limited. Bringing together underlying directories to scale for growth, and improve user experience is critical for revenue … but also for operations. Job functions are more dynamic.... take the Olympics for example. Endless organizations from corporations broadcasting, endorsing, or marketing through the event … to non-profit athletic foundations and public/government entities for athletes and public safety, all operate simultaneously on the world stage. Each organization needs to spin up short-term teams, often dealing with proprietary information from hot ads to racing strategies or security plans. IAM is expected to enable team’s to spin up, enable new applications, protect privacy, and secure critical infrastructure. Then it needs to be disabled just as quickly as users go back to their previous responsibilities. On a more technical level … Optimized system directory; tuning guidelines and parameters are needed by businesses today. Business’s need to be making the right choices (virtual directories) and considerations via choosing the correct architectural patterns (virtual, direct, replicated, and tuning), challenge is that business need to assess and chose the correct architectural patters (centralized, virtualized, and distributed) Today's Business organizations have very complex heterogeneous enterprises that contain diverse and multifaceted information. With today's ever changing global landscape, the strategic end goal in challenging times for business is business agility. The business of identity management requires enterprise's to be more agile and more responsive than ever before. The continued proliferation of networking devices (PC, tablet, PDA's, notebooks, etc.) has caused the number of devices and users to be granted access to these devices to grow exponentially. Business needs to deploy an IAM system that can account for the demands for authentication and authorizations to these devices. Increased innovation is forcing business and organizations to centralize their identity management services. Access management needs to handle traditional web based access as well as handle new innovations around mobile, as well as address insufficient governance processes which can lead to rouge identity accounts, which can then become a source of vulnerabilities within a business’s identity platform. Risk based decisions are providing challenges to business, for an adaptive risk model to make proper access decisions via standard Web single sign on for internal and external customers,. Organizations have to move beyond simple login and passwords to address trusted relationship questions such as: Is this a trusted customer, client, or citizen? Is this a trusted employee, vendor, or partner? Is this a trusted device? Without a solid technological foundation, organizational performance, collaboration, constituent services, or any other organizational processes will languish. A Single server location presents not only network concerns for distributed user base, but identity challenges. The network risks are centered on latency of the long trip that the traffic has to take. Other risks are a performance around availability and if the single identity server is lost, all access is lost. As you can see, there are many reasons why performance tuning IAM will have a substantial impact on the success of your organization. In our next installment in the series we roll up our sleeves and get into detailed tuning techniques used everyday by thought leaders in the field implementing Oracle Identity & Access Management Solutions.

Read the article
Physical Directories vs. MVC View Paths

- by Rick Strahl

This post falls into the bucket of operator error on my part, but I want to share this anyway because it describes an issue that has bitten me a few times now and writing it down might keep it a little stronger in my mind. I've been working on an MVC project the last few days, and at the end of a long day I accidentally moved one of my View folders from the MVC Root Folder to the project root. It must have been at the very end of the day before shutting down because tests and manual site navigation worked fine just before I quit for the night. I checked in changes and called it a night. Next day I came back, started running the app and had a lot of breaks with certain views. Oddly custom routes to these controllers/views worked, but stock /{controller}/{action} routes would not. After a bit of spelunking I realized that "Hey one of my View Folders is missing", which made some sense given the error messages I got. I looked in the recycle bin - nothing there, so rather than try to figure out what the hell happened, just restored from my last SVN checkin. At this point the folders are back… but… view access still ends up breaking for this set of views. Specifically I'm getting the Yellow Screen of Death with: CS0103: The name 'model' does not exist in the current context Here's the full error: Server Error in '/ClassifiedsWeb' Application. Compilation ErrorDescription: An error occurred during the compilation of a resource required to service this request. Please review the following specific error details and modify your source code appropriately.Compiler Error Message: CS0103: The name 'model' does not exist in the current contextSource Error: Line 1: @model ClassifiedsWeb.EntryViewModel Line 2: @{ Line 3: ViewBag.Title = Model.Entry.Title + " - " + ClassifiedsBusiness.App.Configuration.ApplicationName; Source File: c:\Projects2010\Clients\GorgeNet\Classifieds\ClassifiedsWeb\Classifieds\Show.cshtml Line: 1 Compiler Warning Messages: Show Detailed Compiler Output: Show Complete Compilation Source: Version Information: Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.272 Here's what's really odd about this error: The views now do exist in the /Views/Classifieds folder of the project, but it appears like MVC is trying to execute the views directly. This is getting pretty weird, man! So I hook up some break points in my controllers to see if my controller actions are getting fired - and sure enough it turns out they are not - but only for those views that were previously 'lost' and then restored from SVN. WTF? At this point I'm thinking that I must have messed up one of the config files, but after some more spelunking and realizing that all the other Controller views work, I give up that idea. Config's gotta be OK if other controllers and views are working. Root Folders and MVC Views don't mix As I mentioned the problem was the fact that I inadvertantly managed to drag my View folder to the root folder of the project. Here's what this looks like in my FUBAR'd project structure after I copied back /Views/Classifieds folder from SVN: There's the actual root folder in the /Views folder and the accidental copy that sits of the root. I of course did not notice the /Classifieds folder at the root because it was excluded and didn't show up in the project. Now, before you call me a complete idiot remember that this happened by accident - an accidental drag probably just before shutting down for the night. :-) So why does this break? MVC should be happy with views in the /Views/Classifieds folder right? While MVC might be happy, IIS is not. The fact that there is a physical folder on disk takes precedence over MVC's routing. In other words if a URL exists that matches a route the pysical path is accessed first. What happens here is that essentially IIS is trying to execute the .cshtml pages directly without ever routing to the Controller methods. In the error page I showed above my clue should have been that the view was served as: c:\Projects2010\Clients\GorgeNet\Classifieds\ClassifiedsWeb\Classifieds\Show.cshtml rather than c:\Projects2010\Clients\GorgeNet\Classifieds\ClassifiedsWeb\Views\Classifieds\Show.cshtml But of course I didn't notice that right away, just skimming to the end and looking at the file name. The reason that /classifieds/list actually fires that file is that the ASP.NET Web Pages engine looks for physical files on disk that match a path. IOW, when calling Web Pages you drop the .cshtml off the Razor page and IIS will serve that just fine. So: /classifieds/list looks and tries to find /classifieds/list.cshtml and executes that script. And that is exactly what's happening. Web Pages is trying to execute the .cshtml file and it fails because Web Pages knows nothing about the @model tag which is an MVC specific template extension. This is why my breakpoints in the controller methods didn't fire and it also explains why the error mentions that the @model key word is invalid (@model is an MVC provided template enhancement to the Razor Engine). The solution of course is super simple: Delete the accidentally created root folder and the problem is solved. Routing and Physical Paths I've run into problems with this before actually. In the past I've had a number of applications that had a physical /Admin folder which also would conflict with an MVC Admin controller. More than once I ended up wondering why the index route (/Admin/) was not working properly. If a physical /Admin folder exists /Admin will not route to the Index action (or whatever default action you have set up, but instead try to list the directory or show the default document in the folder. The only way to force the index page through MVC is to explicitly use /Admin/Index. Makes perfect sense once you realize the physical folder is there, but that's easy to forget in an MVC application. As you might imagine after a few times of running into this I gave up on the Admin folder and moved everything into MVC views to handle those operations. Still it's one of those things that can easily bite you, because the behavior and error messages seem to point at completely different problems. Moral of the story is: If you see routing problems where routes are not reaching obvious controller methods, always check to make sure there's isn't a physical path being mapped by IIS instead. That way you won't feel stupid like I did after trying a million things for about an hour before discovering my sloppy mousing behavior :-)© Rick Strahl, West Wind Technologies, 2005-2012Posted in MVC IIS7 Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
C#: Does an IDisposable in a Halted Iterator Dispose?

- by James Michael Hare

If that sounds confusing, let me give you an example. Let's say you expose a method to read a database of products, and instead of returning a List<Product> you return an IEnumerable<Product> in iterator form (yield return). This accomplishes several good things: The IDataReader is not passed out of the Data Access Layer which prevents abstraction leak and resource leak potentials. You don't need to construct a full List<Product> in memory (which could be very big) if you just want to forward iterate once. If you only want to consume up to a certain point in the list, you won't incur the database cost of looking up the other items. This could give us an example like: 1: // a sample data access object class to do standard CRUD operations. 2: public class ProductDao 3: { 4: private DbProviderFactory _factory = SqlClientFactory.Instance 5: 6: // a method that would retrieve all available products 7: public IEnumerable<Product> GetAvailableProducts() 8: { 9: // must create the connection 10: using (var con = _factory.CreateConnection()) 11: { 12: con.ConnectionString = _productsConnectionString; 13: con.Open(); 14: 15: // create the command 16: using (var cmd = _factory.CreateCommand()) 17: { 18: cmd.Connection = con; 19: cmd.CommandText = _getAllProductsStoredProc; 20: cmd.CommandType = CommandType.StoredProcedure; 21: 22: // get a reader and pass back all results 23: using (var reader = cmd.ExecuteReader()) 24: { 25: while(reader.Read()) 26: { 27: yield return new Product 28: { 29: Name = reader["product_name"].ToString(), 30: ... 31: }; 32: } 33: } 34: } 35: } 36: } 37: } The database details themselves are irrelevant. I will say, though, that I'm a big fan of using the System.Data.Common classes instead of your provider specific counterparts directly (SqlCommand, OracleCommand, etc). This lets you mock your data sources easily in unit testing and also allows you to swap out your provider in one line of code. In fact, one of the shared components I'm most proud of implementing was our group's DatabaseUtility library that simplifies all the database access above into one line of code in a thread-safe and provider-neutral way. I went with my own flavor instead of the EL due to the fact I didn't want to force internal company consumers to use the EL if they didn't want to, and it made it easy to allow them to mock their database for unit testing by providing a MockCommand, MockConnection, etc that followed the System.Data.Common model. One of these days I'll blog on that if anyone's interested. Regardless, you often have situations like the above where you are consuming and iterating through a resource that must be closed once you are finished iterating. For the reasons stated above, I didn't want to return IDataReader (that would force them to remember to Dispose it), and I didn't want to return List<Product> (that would force them to hold all products in memory) -- but the first time I wrote this, I was worried. What if you never consume the last item and exit the loop? Are the reader, command, and connection all disposed correctly? Of course, I was 99.999999% sure the creators of C# had already thought of this and taken care of it, but inspection in Reflector was difficult due to the nature of the state machines yield return generates, so I decided to try a quick example program to verify whether or not Dispose() will be called when an iterator is broken from outside the iterator itself -- i.e. before the iterator reports there are no more items. So I wrote a quick Sequencer class with a Dispose() method and an iterator for it. Yes, it is COMPLETELY contrived: 1: // A disposable sequence of int -- yes this is completely contrived... 2: internal class Sequencer : IDisposable 3: { 4: private int _i = 0; 5: private readonly object _mutex = new object(); 6: 7: // Constructs an int sequence. 8: public Sequencer(int start) 9: { 10: _i = start; 11: } 12: 13: // Gets the next integer 14: public int GetNext() 15: { 16: lock (_mutex) 17: { 18: return _i++; 19: } 20: } 21: 22: // Dispose the sequence of integers. 23: public void Dispose() 24: { 25: // force output immediately (flush the buffer) 26: Console.WriteLine("Disposed with last sequence number of {0}!", _i); 27: Console.Out.Flush(); 28: } 29: } And then I created a generator (infinite-loop iterator) that did the using block for auto-Disposal: 1: // simply defines an extension method off of an int to start a sequence 2: public static class SequencerExtensions 3: { 4: // generates an infinite sequence starting at the specified number 5: public static IEnumerable<int> GetSequence(this int starter) 6: { 7: // note the using here, will call Dispose() when block terminated. 8: using (var seq = new Sequencer(starter)) 9: { 10: // infinite loop on this generator, means must be bounded by caller! 11: while(true) 12: { 13: yield return seq.GetNext(); 14: } 15: } 16: } 17: } This is really the same conundrum as the database problem originally posed. Here we are using iteration (yield return) over a large collection (infinite sequence of integers). If we cut the sequence short by breaking iteration, will that using block exit and hence, Dispose be called? Well, let's see: 1: // The test program class 2: public class IteratorTest 3: { 4: // The main test method. 5: public static void Main() 6: { 7: Console.WriteLine("Going to consume 10 of infinite items"); 8: Console.Out.Flush(); 9: 10: foreach(var i in 0.GetSequence()) 11: { 12: // could use TakeWhile, but wanted to output right at break... 13: if(i >= 10) 14: { 15: Console.WriteLine("Breaking now!"); 16: Console.Out.Flush(); 17: break; 18: } 19: 20: Console.WriteLine(i); 21: Console.Out.Flush(); 22: } 23: 24: Console.WriteLine("Done with loop."); 25: Console.Out.Flush(); 26: } 27: } So, what do we see? Do we see the "Disposed" message from our dispose, or did the Dispose get skipped because from an "eyeball" perspective we should be locked in that infinite generator loop? Here's the results: 1: Going to consume 10 of infinite items 2: 0 3: 1 4: 2 5: 3 6: 4 7: 5 8: 6 9: 7 10: 8 11: 9 12: Breaking now! 13: Disposed with last sequence number of 11! 14: Done with loop. Yes indeed, when we break the loop, the state machine that C# generates for yield iterate exits the iteration through the using blocks and auto-disposes the IDisposable correctly. I must admit, though, the first time I wrote one, I began to wonder and that led to this test. If you've never seen iterators before (I wrote a previous entry here) the infinite loop may throw you, but you have to keep in mind it is not a linear piece of code, that every time you hit a "yield return" it cedes control back to the state machine generated for the iterator. And this state machine, I'm happy to say, is smart enough to clean up the using blocks correctly. I suspected those wily guys and gals at Microsoft engineered it well, and I wasn't disappointed. But, I've been bitten by assumptions before, so it's good to test and see. Yes, maybe you knew it would or figured it would, but isn't it nice to know? And as those campy 80s G.I. Joe cartoon public service reminders always taught us, "Knowing is half the battle...". Technorati Tags: C#,.NET

Read the article
SQL SERVER – Weekly Series – Memory Lane – #048

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Order of Result Set of SELECT Statement on Clustered Indexed Table When ORDER BY is Not Used Above theory is true in most of the cases. However SQL Server does not use that logic when returning the resultset. SQL Server always returns the resultset which it can return fastest.In most of the cases the resultset which can be returned fastest is the resultset which is returned using clustered index. Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT One of the Jr. Developer asked me this question (What will be the Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT?) while I was rushing to an important meeting. I was getting late so I asked him to talk with his Application Tech Lead. When I came back from meeting both of them were looking for me. They said they are confused. I quickly wrote down following example for them. 2008 SQL SERVER – Guidelines and Coding Standards Complete List Download Coding standards and guidelines are very important for any developer on the path of a successful career. A coding standard is a set of guidelines, rules and regulations on how to write code. Coding standards should be flexible enough or should take care of the situation where they should not prevent best practices for coding. They are basically the guidelines that one should follow for better understanding. Download Guidelines and Coding Standards complete List Download Get Answer in Float When Dividing of Two Integer Many times we have requirements of some calculations amongst different fields in Tables. One of the software developers here was trying to calculate some fields having integer values and divide it which gave incorrect results in integer where accurate results including decimals was expected. Puzzle – Computed Columns Datatype Explanation SQL Server automatically does a cast to the data type having the highest precedence. So the result of INT and INT will be INT, but INT and FLOAT will be FLOAT because FLOAT has a higher precedence. If you want a different data type, you need to do an EXPLICIT cast. Renaming SP is Not Good Idea – Renaming Stored Procedure Does Not Update sys.procedures I have written many articles about renaming a tables, columns and procedures SQL SERVER – How to Rename a Column Name or Table Name, here I found something interesting about renaming the stored procedures and felt like sharing it with you all. The interesting fact is that when we rename a stored procedure using SP_Rename command, the Stored Procedure is successfully renamed. But when we try to test the procedure using sp_helptext, the procedure will be having the old name instead of new names. 2009 Insert Values of Stored Procedure in Table – Use Table Valued Function It is clear from the result set that , where I have converted stored procedure logic into the table valued function, is much better in terms of logic as it saves a large number of operations. However, this option should be used carefully. The performance of the stored procedure is “usually” better than that of functions. Interesting Observation – Index on Index View Used in Similar Query Recently, I was working on an optimization project for one of the largest organizations. While working on one of the queries, we came across a very interesting observation. We found that there was a query on the base table and when the query was run, it used the index, which did not exist in the base table. On careful examination, we found that the query was using the index that was on another view. This was very interesting as I have personally never experienced a scenario like this. In simple words, “Query on the base table can use the index created on the indexed view of the same base table.” Interesting Observation – Execution Plan and Results of Aggregate Concatenation Queries Working with SQL Server has never seemed to be monotonous – no matter how long one has worked with it. Quite often, I come across some excellent comments that I feel like acknowledging them as blog posts. Recently, I wrote an article on SQL SERVER – Execution Plan and Results of Aggregate Concatenation Queries Depend Upon Expression Location, which is well received in the community. 2010 I encourage all of you to go through complete series and write your own on the subject. If you write an article and send it to me, I will publish it on this blog with due credit to you. If you write on your own blog, I will update this blog post pointing to your blog post. SQL SERVER – ORDER BY Does Not Work – Limitation of the View 1 SQL SERVER – Adding Column is Expensive by Joining Table Outside View – Limitation of the View 2 SQL SERVER – Index Created on View not Used Often – Limitation of the View 3 SQL SERVER – SELECT * and Adding Column Issue in View – Limitation of the View 4 SQL SERVER – COUNT(*) Not Allowed but COUNT_BIG(*) Allowed – Limitation of the View 5 SQL SERVER – UNION Not Allowed but OR Allowed in Index View – Limitation of the View 6 SQL SERVER – Cross Database Queries Not Allowed in Indexed View – Limitation of the View 7 SQL SERVER – Outer Join Not Allowed in Indexed Views – Limitation of the View 8 SQL SERVER – SELF JOIN Not Allowed in Indexed View – Limitation of the View 9 SQL SERVER – Keywords View Definition Must Not Contain for Indexed View – Limitation of the View 10 SQL SERVER – View Over the View Not Possible with Index View – Limitations of the View 11 2011 Startup Parameters Easy to Configure If you are a regular reader of this blog, you must be aware that I have written about SQL Server Denali recently. Here is the quickest way to reach into the screen where we can change the startup parameters. Go to SQL Server Configuration Manager >> SQL Server Services >> Right Click on the Server >> Properties >> Startup Parameters 2012 Validating Unique Columnname Across Whole Database I sometimes come across very strange requirements and often I do not receive a proper explanation of the same. Here is the one of those examples. For example “Our business requirement is when we add new column we want it unique across current database.” Read the solution to this strange request in this blog post. Excel Losing Decimal Values When Value Pasted from SSMS ResultSet It is very common when users are coping the resultset to Excel, the floating point or decimals are missed. The solution is very much simple and it requires a small adjustment in the Excel. By default Excel is very smart and when it detects the value which is getting pasted is numeric it changes the column format to accommodate that. Basic Calculation and PEMDAS Order of Operation Read this interesting blog post for fantastic conversation about the subject. Copy Column Headers from Resultset – SQL in Sixty Seconds #027 – Video http://www.youtube.com/watch?v=x_-3tLqTRv0 Delete From Multiple Table – Update Multiple Table in Single Statement There are two questions which I get every single day multiple times. In my gmail, I have created standard canned reply for them. Let us see the questions here. I want to delete from multiple table in a single statement how will I do it? I want to update multiple table in a single statement how will I do it? Read the answer in the blog post. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Writing the tests for FluentPath

- by Bertrand Le Roy

Writing the tests for FluentPath is a challenge. The library is a wrapper around a legacy API (System.IO) that wasn’t designed to be easily testable. If it were more testable, the sensible testing methodology would be to tell System.IO to act against a mock file system, which would enable me to verify that my code is doing the expected file system operations without having to manipulate the actual, physical file system: what we are testing here is FluentPath, not System.IO. Unfortunately, that is not an option as nothing in System.IO enables us to plug a mock file system in. As a consequence, we are left with few options. A few people have suggested me to abstract my calls to System.IO away so that I could tell FluentPath – not System.IO – to use a mock instead of the real thing. That in turn is getting a little silly: FluentPath already is a thin abstraction around System.IO, so layering another abstraction between them would double the test surface while bringing little or no value. I would have to test that new abstraction layer, and that would bring us back to square one. Unless I’m missing something, the only option I have here is to bite the bullet and test against the real file system. Of course, the tests that do that can hardly be called unit tests. They are more integration tests as they don’t only test bits of my code. They really test the successful integration of my code with the underlying System.IO. In order to write such tests, the techniques of BDD work particularly well as they enable you to express scenarios in natural language, from which test code is generated. Integration tests are being better expressed as scenarios orchestrating a few basic behaviors, so this is a nice fit. The Orchard team has been successfully using SpecFlow for integration tests for a while and I thought it was pretty cool so that’s what I decided to use. Consider for example the following scenario: Scenario: Change extension Given a clean test directory When I change the extension of bar\notes.txt to foo Then bar\notes.txt should not exist And bar\notes.foo should exist This is human readable and tells you everything you need to know about what you’re testing, but it is also executable code. What happens when SpecFlow compiles this scenario is that it executes a bunch of regular expressions that identify the known Given (set-up phases), When (actions) and Then (result assertions) to identify the code to run, which is then translated into calls into the appropriate methods. Nothing magical. Here is the code generated by SpecFlow: [NUnit.Framework.TestAttribute()] [NUnit.Framework.DescriptionAttribute("Change extension")] public virtual void ChangeExtension() { TechTalk.SpecFlow.ScenarioInfo scenarioInfo = new TechTalk.SpecFlow.ScenarioInfo("Change extension", ((string[])(null))); #line 6 this.ScenarioSetup(scenarioInfo); #line 7 testRunner.Given("a clean test directory"); #line 8 testRunner.When("I change the extension of " + "bar\\notes.txt to foo"); #line 9 testRunner.Then("bar\\notes.txt should not exist"); #line 10 testRunner.And("bar\\notes.foo should exist"); #line hidden testRunner.CollectScenarioErrors();} The #line directives are there to give clues to the debugger, because yes, you can put breakpoints into a scenario: The way you usually write tests with SpecFlow is that you write the scenario first, let it fail, then write the translation of your Given, When and Then into code if they don’t already exist, which results in running but failing tests, and then you write the code to make your tests pass (you implement the scenario). In the case of FluentPath, I built a simple Given method that builds a simple file hierarchy in a temporary directory that all scenarios are going to work with: [Given("a clean test directory")] public void GivenACleanDirectory() { _path = new Path(SystemIO.Path.GetTempPath()) .CreateSubDirectory("FluentPathSpecs") .MakeCurrent(); _path.GetFileSystemEntries() .Delete(true); _path.CreateFile("foo.txt", "This is a text file named foo."); var bar = _path.CreateSubDirectory("bar"); bar.CreateFile("baz.txt", "bar baz") .SetLastWriteTime(DateTime.Now.AddSeconds(-2)); bar.CreateFile("notes.txt", "This is a text file containing notes."); var barbar = bar.CreateSubDirectory("bar"); barbar.CreateFile("deep.txt", "Deep thoughts"); var sub = _path.CreateSubDirectory("sub"); sub.CreateSubDirectory("subsub"); sub.CreateFile("baz.txt", "sub baz") .SetLastWriteTime(DateTime.Now); sub.CreateFile("binary.bin", new byte[] {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0xFF}); } Then, to implement the scenario that you can read above, I had to write the following When: [When("I change the extension of (.*) to (.*)")] public void WhenIChangeTheExtension( string path, string newExtension) { var oldPath = Path.Current.Combine(path.Split('\\')); oldPath.Move(p => p.ChangeExtension(newExtension)); } As you can see, the When attribute is specifying the regular expression that will enable the SpecFlow engine to recognize what When method to call and also how to map its parameters. For our scenario, “bar\notes.txt” will get mapped to the path parameter, and “foo” to the newExtension parameter. And of course, the code that verifies the assumptions of the scenario: [Then("(.*) should exist")] public void ThenEntryShouldExist(string path) { Assert.IsTrue(_path.Combine(path.Split('\\')).Exists); } [Then("(.*) should not exist")] public void ThenEntryShouldNotExist(string path) { Assert.IsFalse(_path.Combine(path.Split('\\')).Exists); } These steps should be written with reusability in mind. They are building blocks for your scenarios, not implementation of a specific scenario. Think small and fine-grained. In the case of the above steps, I could reuse each of those steps in other scenarios. Those tests are easy to write and easier to read, which means that they also constitute a form of documentation. Oh, and SpecFlow is just one way to do this. Rob wrote a long time ago about this sort of thing (but using a different framework) and I highly recommend this post if I somehow managed to pique your interest: http://blog.wekeroad.com/blog/make-bdd-your-bff-2/ And this screencast (Rob always makes excellent screencasts): http://blog.wekeroad.com/mvc-storefront/kona-3/ (click the “Download it here” link)

Read the article
Changing an HTML Form's Target with jQuery

- by Rick Strahl

This is a question that comes up quite frequently: I have a form with several submit or link buttons and one or more of the buttons needs to open a new Window. How do I get several buttons to all post to the right window? If you're building ASP.NET forms you probably know that by default the Web Forms engine sends button clicks back to the server as a POST operation. A server form has a <form> tag which expands to this: <form method="post" action="default.aspx" id="form1"> Now you CAN change the target of the form and point it to a different window or frame, but the problem with that is that it still affects ALL submissions of the current form. If you multiple buttons/links and they need to go to different target windows/frames you can't do it easily through the <form runat="server"> tag. Although this discussion uses ASP.NET WebForms as an example, realistically this is a general HTML problem although likely more common in WebForms due to the single form metaphor it uses. In ASP.NET MVC for example you'd have more options by breaking out each button into separate forms with its own distinct target tag. However, even with that option it's not always possible to break up forms - for example if multiple targets are required but all targets require the same form data to the be posted. A common scenario here is that you might have a button (or link) that you click where you still want some server code to fire but at the end of the request you actually want to display the content in a new window. A common operation where this happens is report generation: You click a button and the server generates a report say in PDF format and you then want to display the PDF result in a new window without killing the content in the current window. Assuming you have other buttons on the same Page that need to post to base window how do you get the button click to go to a new window? Can't you just use a LinkButton or other Link Control? At first glance you might think an easy way to do this is to use an ASP.NET LinkButton to do this - after all a LinkButton creates a hyper link that CAN accept a target and it also posts back to the server, right? However, there's no Target property, although you can set the target HTML attribute easily enough. Code like this looks reasonable: <asp:LinkButton runat="server" ID="btnNewTarget" Text="New Target" target="_blank" OnClick="bnNewTarget_Click" /> But if you try this you'll find that it doesn't work. Why? Because ASP.NET creates postbacks with JavaScript code that operates on the current window/frame: <a id="btnNewTarget" target="_blank" href="javascript:__doPostBack('btnNewTarget','')">New Target</a> What happens with a target tag is that before the JavaScript actually executes a new window is opened and the focus shifts to the new window. The new window of course is empty and has no __doPostBack() function nor access to the old document. So when you click the link a new window opens but the window remains blank without content - no server postback actually occurs. Natch that idea. Setting the Form Target for a Button Control or LinkButton So, in order to send Postback link controls and buttons to another window/frame, both require that the target of the form gets changed dynamically when the button or link is clicked. Luckily this is rather easy to do however using a little bit of script code and jQuery. Imagine you have two buttons like this that should go to another window: <asp:LinkButton runat="server" ID="btnNewTarget" Text="New Target" OnClick="ClickHandler" /> <asp:Button runat="server" ID="btnButtonNewTarget" Text="New Target Button" OnClick="ClickHandler" /> ClickHandler in this case is any routine that generates the output you want to display in the new window. Generally this output will not come from the current page markup but is generated externally - like a PDF report or some report generated by another application component or tool. The output generally will be either generated by hand or something that was generated to disk to be displayed with Response.Redirect() or Response.TransmitFile() etc. Here's the dummy handler that just generates some HTML by hand and displays it: protected void ClickHandler(object sender, EventArgs e) { // Perform some operation that generates HTML or Redirects somewhere else Response.Write("Some custom output would be generated here (PDF, non-Page HTML etc.)"); // Make sure this response doesn't display the page content // Call Response.End() or Response.Redirect() Response.End(); } To route this oh so sophisticated output to an alternate window for both the LinkButton and Button Controls, you can use the following simple script code: <script type="text/javascript"> $("#btnButtonNewTarget,#btnNewTarget").click(function () { $("form").attr("target", "_blank"); }); </script> So why does this work where the target attribute did not? The difference here is that the script fires BEFORE the target is changed to the new window. When you put a target attribute on a link or form the target is changed as the very first thing before the link actually executes. IOW, the link literally executes in the new window when it's done this way. By attaching a click handler, though we're not navigating yet so all the operations the script code performs (ie. __doPostBack()) and the collection of Form variables to post to the server all occurs in the current page. By changing the target from within script code the target change fires as part of the form submission process which means it runs in the correct context of the current page. IOW - the input for the POST is from the current page, but the output is routed to a new window/frame. Just what we want in this scenario. Voila you can dynamically route output to the appropriate window.© Rick Strahl, West Wind Technologies, 2005-2011Posted in ASP.NET HTML jQuery

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
The Shift: how Orchard painlessly shifted to document storage, and how it’ll affect you

- by Bertrand Le Roy

We’ve known it all along. The storage for Orchard content items would be much more efficient using a document database than a relational one. Orchard content items are composed of parts that serialize naturally into infoset kinds of documents. Storing them as relational data like we’ve done so far was unnatural and requires the data for a single item to span multiple tables, related through 1-1 relationships. This means lots of joins in queries, and a great potential for Select N+1 problems. Document databases, unfortunately, are still a tough sell in many places that prefer the more familiar relational model. Being able to x-copy Orchard to hosters has also been a basic constraint in the design of Orchard. Combine those with the necessity at the time to run in medium trust, and with license compatibility issues, and you’ll find yourself with very few reasonable choices. So we went, a little reluctantly, for relational SQL stores, with the dream of one day transitioning to document storage. We have played for a while with the idea of building our own document storage on top of SQL databases, and Sébastien implemented something more than decent along those lines, but we had a better way all along that we didn’t notice until recently… In Orchard, there are fields, which are named properties that you can add dynamically to a content part. Because they are so dynamic, we have been storing them as XML into a column on the main content item table. This infoset storage and its associated API are fairly generic, but were only used for fields. The breakthrough was when Sébastien realized how this existing storage could give us the advantages of document storage with minimal changes, while continuing to use relational databases as the substrate. public bool CommercialPrices { get { return this.Retrieve(p => p.CommercialPrices); } set { this.Store(p => p.CommercialPrices, value); } } This code is very compact and efficient because the API can infer from the expression what the type and name of the property are. It is then able to do the proper conversions for you. For this code to work in a content part, there is no need for a record at all. This is particularly nice for site settings: one query on one table and you get everything you need. This shows how the existing infoset solves the data storage problem, but you still need to query. Well, for those properties that need to be filtered and sorted on, you can still use the current record-based relational system. This of course continues to work. We do however provide APIs that make it trivial to store into both record properties and the infoset storage in one operation: public double Price { get { return Retrieve(r => r.Price); } set { Store(r => r.Price, value); } } This code looks strikingly similar to the non-record case above. The difference is that it will manage both the infoset and the record-based storages. The call to the Store method will send the data in both places, keeping them in sync. The call to the Retrieve method does something even cooler: if the property you’re looking for exists in the infoset, it will return it, but if it doesn’t, it will automatically look into the record for it. And if that wasn’t cool enough, it will take that value from the record and store it into the infoset for the next time it’s required. This means that your data will start automagically migrating to infoset storage just by virtue of using the code above instead of the usual: public double Price { get { return Record.Price; } set { Record.Price = value; } } As your users browse the site, it will get faster and faster as Select N+1 issues will optimize themselves away. If you preferred, you could still have explicit migration code, but it really shouldn’t be necessary most of the time. If you do already have code using QueryHints to mitigate Select N+1 issues, you might want to reconsider those, as with the new system, you’ll want to avoid joins that you don’t need for filtering or sorting, further optimizing your queries. There are some rare cases where the storage of the property must be handled differently. Check out this string[] property on SearchSettingsPart for example: public string[] SearchedFields { get { return (Retrieve<string>("SearchedFields") ?? "") .Split(new[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries); } set { Store("SearchedFields", String.Join(", ", value)); } } The array of strings is transformed by the property accessors into and from a comma-separated list stored in a string. The Retrieve and Store overloads used in this case are lower-level versions that explicitly specify the type and name of the attribute to retrieve or store. You may be wondering what this means for code or operations that look directly at the database tables instead of going through the new infoset APIs. Even if there is a record, the infoset version of the property will win if it exists, so it is necessary to keep the infoset up-to-date. It’s not very complicated, but definitely something to keep in mind. Here is what a product record looks like in Nwazet.Commerce for example: And here is the same data in the infoset: The infoset is stored in Orchard_Framework_ContentItemRecord or Orchard_Framework_ContentItemVersionRecord, depending on whether the content type is versionable or not. A good way to find what you’re looking for is to inspect the record table first, as it’s usually easier to read, and then get the item record of the same id. Here is the detailed XML document for this product: <Data> <ProductPart Inventory="40" Price="18" Sku="pi-camera-box" OutOfStockMessage="" AllowBackOrder="false" Weight="0.2" Size="" ShippingCost="null" IsDigital="false" /> <ProductAttributesPart Attributes="" /> <AutoroutePart DisplayAlias="camera-box" /> <TitlePart Title="Nwazet Pi Camera Box" /> <BodyPart Text="[...]" /> <CommonPart CreatedUtc="2013-09-10T00:39:00Z" PublishedUtc="2013-09-14T01:07:47Z" /> </Data> The data is neatly organized under each part. It is easy to see how that document is all you need to know about that content item, all in one table. If you want to modify that data directly in the database, you should be careful to do it in both the record table and the infoset in the content item record. In this configuration, the record is now nothing more than an index, and will only be used for sorting and filtering. Of course, it’s perfectly fine to mix record-backed properties and record-less properties on the same part. It really depends what you think must be sorted and filtered on. In turn, this potentially simplifies migrations considerably. So here it is, the great shift of Orchard to document storage, something that Orchard has been designed for all along, and that we were able to implement with a satisfying and surprising economy of resources. Expect this code to make its way into the 1.8 version of Orchard when that’s available.

Read the article
Optimizing AES modes on Solaris for Intel Westmere

- by danx

Optimizing AES modes on Solaris for Intel Westmere Review AES is a strong method of symmetric (secret-key) encryption. It is a U.S. FIPS-approved cryptographic algorithm (FIPS 197) that operates on 16-byte blocks. AES has been available since 2001 and is widely used. However, AES by itself has a weakness. AES encryption isn't usually used by itself because identical blocks of plaintext are always encrypted into identical blocks of ciphertext. This encryption can be easily attacked with "dictionaries" of common blocks of text and allows one to more-easily discern the content of the unknown cryptotext. This mode of encryption is called "Electronic Code Book" (ECB), because one in theory can keep a "code book" of all known cryptotext and plaintext results to cipher and decipher AES. In practice, a complete "code book" is not practical, even in electronic form, but large dictionaries of common plaintext blocks is still possible. Here's a diagram of encrypting input data using AES ECB mode: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 What's the solution to the same cleartext input producing the same ciphertext output? The solution is to further process the encrypted or decrypted text in such a way that the same text produces different output. This usually involves an Initialization Vector (IV) and XORing the decrypted or encrypted text. As an example, I'll illustrate CBC mode encryption: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ IV >----->(XOR) +------------->(XOR) +---> . . . . | | | | | | | | \/ | \/ | AESKey-->(AES Encryption) | AESKey-->(AES Encryption) | | | | | | | | | \/ | \/ | CipherTextOutput ------+ CipherTextOutput -------+ Block 1 Block 2 The steps for CBC encryption are: Start with a 16-byte Initialization Vector (IV), choosen randomly. XOR the IV with the first block of input plaintext Encrypt the result with AES using a user-provided key. The result is the first 16-bytes of output cryptotext. Use the cryptotext (instead of the IV) of the previous block to XOR with the next input block of plaintext Another mode besides CBC is Counter Mode (CTR). As with CBC mode, it also starts with a 16-byte IV. However, for subsequent blocks, the IV is just incremented by one. Also, the IV ix XORed with the AES encryption result (not the plain text input). Here's an illustration: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ IV >----->(XOR) IV + 1 >---->(XOR) IV + 2 ---> . . . . | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 Optimization Which of these modes can be parallelized? ECB encryption/decryption can be parallelized because it does more than plain AES encryption and decryption, as mentioned above. CBC encryption can't be parallelized because it depends on the output of the previous block. However, CBC decryption can be parallelized because all the encrypted blocks are known at the beginning. CTR encryption and decryption can be parallelized because the input to each block is known--it's just the IV incremented by one for each subsequent block. So, in summary, for ECB, CBC, and CTR modes, encryption and decryption can be parallelized with the exception of CBC encryption. How do we parallelize encryption? By interleaving. Usually when reading and writing data there are pipeline "stalls" (idle processor cycles) that result from waiting for memory to be loaded or stored to or from CPU registers. Since the software is written to encrypt/decrypt the next data block where pipeline stalls usually occurs, we can avoid stalls and crypt with fewer cycles. This software processes 4 blocks at a time, which ensures virtually no waiting ("stalling") for reading or writing data in memory. Other Optimizations Besides interleaving, other optimizations performed are Loading the entire key schedule into the 128-bit %xmm registers. This is done once for per 4-block of data (since 4 blocks of data is processed, when present). The following is loaded: the entire "key schedule" (user input key preprocessed for encryption and decryption). This takes 11, 13, or 15 registers, for AES-128, AES-192, and AES-256, respectively The input data is loaded into another %xmm register The same register contains the output result after encrypting/decrypting Using SSSE 4 instructions (AESNI). Besides the aesenc, aesenclast, aesdec, aesdeclast, aeskeygenassist, and aesimc AESNI instructions, Intel has several other instructions that operate on the 128-bit %xmm registers. Some common instructions for encryption are: pxor exclusive or (very useful), movdqu load/store a %xmm register from/to memory, pshufb shuffle bytes for byte swapping, pclmulqdq carry-less multiply for GCM mode Combining AES encryption/decryption with CBC or CTR modes processing. Instead of loading input data twice (once for AES encryption/decryption, and again for modes (CTR or CBC, for example) processing, the input data is loaded once as both AES and modes operations occur at in the same function Performance Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 running on a Piketon Platform system with a 4-core Intel Clarkdale processor @3.20GHz. Clarkdale which is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case already has been optimized with hand-coded Intel AESNI assembly. The "after" case has combined AES-NI and mode instructions, interleaved 4 blocks at-a-time. « For the first table, lower is better (milliseconds). The first table shows the performance improvement using the Solaris encrypt(1) and decrypt(1) CLI commands. I encrypted and decrypted a 1/2 GByte file on /tmp (swap tmpfs). Encryption improved by about 40% and decryption improved by about 80%. AES-128 is slighty faster than AES-256, as expected. The second table shows more detail timings for CBC, CTR, and ECB modes for the 3 AES key sizes and different data lengths. » The results shown are the percentage improvement as shown by an internal PKCS#11 microbenchmark. And keep in mind the previous baseline code already had optimized AESNI assembly! The keysize (AES-128, 192, or 256) makes little difference in relative percentage improvement (although, of course, AES-128 is faster than AES-256). Larger data sizes show better improvement than 128-byte data. Availability This software is in Solaris 11 FCS. It is available in the 64-bit libcrypto library and the "aes" Solaris kernel module. You must be running hardware that supports AESNI (for example, Intel Westmere and Sandy Bridge, microprocessor architectures). The easiest way to determine if AES-NI is available is with the isainfo(1) command. For example, $ isainfo -v 64-bit amd64 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this software. Solaris libraries and kernel automatically determine if it's running on AESNI-capable machines and execute the correctly-tuned software for the current microprocessor. Summary Maximum throughput of AES cipher modes can be achieved by combining AES encryption with modes processing, interleaving encryption of 4 blocks at a time, and using Intel's wide 128-bit %xmm registers and instructions. References "Block cipher modes of operation", Wikipedia Good overview of AES modes (ECB, CBC, CTR, etc.) "Advanced Encryption Standard", Wikipedia "Current Modes" describes NIST-approved block cipher modes (ECB,CBC, CFB, OFB, CCM, GCM)

Read the article
SQL SERVER – Core Concepts – Elasticity, Scalability and ACID Properties – Exploring NuoDB an Elastically Scalable Database System

- by pinaldave

I have been recently exploring Elasticity and Scalability attributes of databases. You can see that in my earlier blog posts about NuoDB where I wanted to look at Elasticity and Scalability concepts. The concepts are very interesting, and intriguing as well. I have discussed these concepts with my friend Joyti M and together we have come up with this interesting read. The goal of this article is to answer following simple questions What is Elasticity? What is Scalability? How ACID properties vary from NOSQL Concepts? What are the prevailing problems in the current database system architectures? Why is NuoDB an innovative and welcome change in database paradigm? Elasticity This word’s original form is used in many different ways and honestly it does do a decent job in holding things together over the years as a person grows and contracts. Within the tech world, and specifically related to software systems (database, application servers), it has come to mean a few things - allow stretching of resources without reaching the breaking point (on demand). What are resources in this context? Resources are the usual suspects – RAM/CPU/IO/Bandwidth in the form of a container (a process or bunch of processes combined as modules). When it is about increasing resources the simplest idea which comes to mind is the addition of another container. Another container means adding a brand new physical node. When it is about adding a new node there are two questions which comes to mind. 1) Can we add another node to our software system? 2) If yes, does adding new node cause downtime for the system? Let us assume we have added new node, let us see what the new needs of the system are when a new node is added. Balancing incoming requests to multiple nodes Synchronization of a shared state across multiple nodes Identification of “downstate” and resolution action to bring it to “upstate” Well, adding a new node has its advantages as well. Here are few of the positive points Throughput can increase nearly horizontally across the node throughout the system Response times of application will increase as in-between layer interactions will be improved Now, Let us put the above concepts in the perspective of a Database. When we mention the term “running out of resources” or “application is bound to resources” the resources can be CPU, Memory or Bandwidth. The regular approach to “gain scalability” in the database is to look around for bottlenecks and increase the bottlenecked resource. When we have memory as a bottleneck we look at the data buffers, locks, query plans or indexes. After a point even this is not enough as there needs to be an efficient way of managing such large workload on a “single machine” across memory and CPU bound (right kind of scheduling) workload. We next move on to either read/write separation of the workload or functionality-based sharing so that we still have control of the individual. But this requires lots of planning and change in client systems in terms of knowing where to go/update/read and for reporting applications to “aggregate the data” in an intelligent way. What we ideally need is an intelligent layer which allows us to do these things without us getting into managing, monitoring and distributing the workload. Scalability In the context of database/applications, scalability means three main things Ability to handle normal loads without pressure E.g. X users at the Y utilization of resources (CPU, Memory, Bandwidth) on the Z kind of hardware (4 processor, 32 GB machine with 15000 RPM SATA drives and 1 GHz Network switch) with T throughput Ability to scale up to expected peak load which is greater than normal load with acceptable response times Ability to provide acceptable response times across the system E.g. Response time in S milliseconds (or agreed upon unit of measure) – 90% of the time The Issue – Need of Scale In normal cases one can plan for the load testing to test out normal, peak, and stress scenarios to ensure specific hardware meets the needs. With help from Hardware and Software partners and best practices, bottlenecks can be identified and requisite resources added to the system. Unfortunately this vertical scale is expensive and difficult to achieve and most of the operational people need the ability to scale horizontally. This helps in getting better throughput as there are physical limits in terms of adding resources (Memory, CPU, Bandwidth and Storage) indefinitely. Today we have different options to achieve scalability: Read & Write Separation The idea here is to do actual writes to one store and configure slaves receiving the latest data with acceptable delays. Slaves can be used for balancing out reads. We can also explore functional separation or sharing as well. We can separate data operations by a specific identifier (e.g. region, year, month) and consolidate it for reporting purposes. For functional separation the major disadvantage is when schema changes or workload pattern changes. As the requirement grows one still needs to deal with scale need in manual ways by providing an abstraction in the middle tier code. Using NOSQL solutions The idea is to flatten out the structures in general to keep all values which are retrieved together at the same store and provide flexible schema. The issue with the stores is that they are compromising on mostly consistency (no ACID guarantees) and one has to use NON-SQL dialect to work with the store. The other major issue is about education with NOSQL solutions. Would one really want to make these compromises on the ability to connect and retrieve in simple SQL manner and learn other skill sets? Or for that matter give up on ACID guarantee and start dealing with consistency issues? Hybrid Deployment – Mac, Linux, Cloud, and Windows One of the challenges today that we see across On-premise vs Cloud infrastructure is a difference in abilities. Take for example SQL Azure – it is wonderful in its concepts of throttling (as it is shared deployment) of resources and ability to scale using federation. However, the same abilities are not available on premise. This is not a mistake, mind you – but a compromise of the sweet spot of workloads, customer requirements and operational SLAs which can be supported by the team. In today’s world it is imperative that databases are available across operating systems – which are a commodity and used by developers of all hues. An Ideal Database Ability List A system which allows a linear scale of the system (increase in throughput with reasonable response time) with the addition of resources A system which does not compromise on the ACID guarantees and require developers to learn new paradigms A system which does not force fit a new way interacting with database by learning Non-SQL dialect A system which does not force fit its mechanisms for providing availability across its various modules. Well NuoDB is the first database which has all of the above abilities and much more. In future articles I will cover my hands-on experience with it. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

Read the article
SortedDictionary and SortedList

- by Simon Cooper

Apart from Dictionary<TKey, TValue>, there's two other dictionaries in the BCL - SortedDictionary<TKey, TValue> and SortedList<TKey, TValue>. On the face of it, these two classes do the same thing - provide an IDictionary<TKey, TValue> interface where the iterator returns the items sorted by the key. So what's the difference between them, and when should you use one rather than the other? (as in my previous post, I'll assume you have some basic algorithm & datastructure knowledge) SortedDictionary We'll first cover SortedDictionary. This is implemented as a special sort of binary tree called a red-black tree. Essentially, it's a binary tree that uses various constraints on how the nodes of the tree can be arranged to ensure the tree is always roughly balanced (for more gory algorithmical details, see the wikipedia link above). What I'm concerned about in this post is how the .NET SortedDictionary is actually implemented. In .NET 4, behind the scenes, the actual implementation of the tree is delegated to a SortedSet<KeyValuePair<TKey, TValue>>. One example tree might look like this: Each node in the above tree is stored as a separate SortedSet<T>.Node object (remember, in a SortedDictionary, T is instantiated to KeyValuePair<TKey, TValue>): class Node { public bool IsRed; public T Item; public SortedSet<T>.Node Left; public SortedSet<T>.Node Right; } The SortedSet only stores a reference to the root node; all the data in the tree is accessed by traversing the Left and Right node references until you reach the node you're looking for. Each individual node can be physically stored anywhere in memory; what's important is the relationship between the nodes. This is also why there is no constructor to SortedDictionary or SortedSet that takes an integer representing the capacity; there are no internal arrays that need to be created and resized. This may seen trivial, but it's an important distinction between SortedDictionary and SortedList that I'll cover later on. And that's pretty much it; it's a standard red-black tree. Plenty of webpages and datastructure books cover the algorithms behind the tree itself far better than I could. What's interesting is the comparions between SortedDictionary and SortedList, which I'll cover at the end. As a side point, SortedDictionary has existed in the BCL ever since .NET 2. That means that, all through .NET 2, 3, and 3.5, there has been a bona-fide sorted set class in the BCL (called TreeSet). However, it was internal, so it couldn't be used outside System.dll. Only in .NET 4 was this class exposed as SortedSet. SortedList Whereas SortedDictionary didn't use any backing arrays, SortedList does. It is implemented just as the name suggests; two arrays, one containing the keys, and one the values (I've just used random letters for the values): The items in the keys array are always guarenteed to be stored in sorted order, and the value corresponding to each key is stored in the same index as the key in the values array. In this example, the value for key item 5 is 'z', and for key item 8 is 'm'. Whenever an item is inserted or removed from the SortedList, a binary search is run on the keys array to find the correct index, then all the items in the arrays are shifted to accomodate the new or removed item. For example, if the key 3 was removed, a binary search would be run to find the array index the item was at, then everything above that index would be moved down by one: and then if the key/value pair {7, 'f'} was added, a binary search would be run on the keys to find the index to insert the new item, and everything above that index would be moved up to accomodate the new item: If another item was then added, both arrays would be resized (to a length of 10) before the new item was added to the arrays. As you can see, any insertions or removals in the middle of the list require a proportion of the array contents to be moved; an O(n) operation. However, if the insertion or removal is at the end of the array (ie the largest key), then it's only O(log n); the cost of the binary search to determine it does actually need to be added to the end (excluding the occasional O(n) cost of resizing the arrays to fit more items). As a side effect of using backing arrays, SortedList offers IList Keys and Values views that simply use the backing keys or values arrays, as well as various methods utilising the array index of stored items, which SortedDictionary does not (and cannot) offer. The Comparison So, when should you use one and not the other? Well, here's the important differences: Memory usage SortedDictionary and SortedList have got very different memory profiles. SortedDictionary... has a memory overhead of one object instance, a bool, and two references per item. On 64-bit systems, this adds up to ~40 bytes, not including the stored item and the reference to it from the Node object. stores the items in separate objects that can be spread all over the heap. This helps to keep memory fragmentation low, as the individual node objects can be allocated wherever there's a spare 60 bytes. In contrast, SortedList... has no additional overhead per item (only the reference to it in the array entries), however the backing arrays can be significantly larger than you need; every time the arrays are resized they double in size. That means that if you add 513 items to a SortedList, the backing arrays will each have a length of 1024. To conteract this, the TrimExcess method resizes the arrays back down to the actual size needed, or you can simply assign list.Capacity = list.Count. stores its items in a continuous block in memory. If the list stores thousands of items, this can cause significant problems with Large Object Heap memory fragmentation as the array resizes, which SortedDictionary doesn't have. Performance Operations on a SortedDictionary always have O(log n) performance, regardless of where in the collection you're adding or removing items. In contrast, SortedList has O(n) performance when you're altering the middle of the collection. If you're adding or removing from the end (ie the largest item), then performance is O(log n), same as SortedDictionary (in practice, it will likely be slightly faster, due to the array items all being in the same area in memory, also called locality of reference). So, when should you use one and not the other? As always with these sort of things, there are no hard-and-fast rules. But generally, if you: need to access items using their index within the collection are populating the dictionary all at once from sorted data aren't adding or removing keys once it's populated then use a SortedList. But if you: don't know how many items are going to be in the dictionary are populating the dictionary from random, unsorted data are adding & removing items randomly then use a SortedDictionary. The default (again, there's no definite rules on these sort of things!) should be to use SortedDictionary, unless there's a good reason to use SortedList, due to the bad performance of SortedList when altering the middle of the collection.

Read the article
Monitoring Html Element CSS Changes in JavaScript

- by Rick Strahl

[ updated Feb 15, 2011: Added event unbinding to avoid unintended recursion ] Here's a scenario I've run into on a few occasions: I need to be able to monitor certain CSS properties on an HTML element and know when that CSS element changes. For example, I have a some HTML element behavior plugins like a drop shadow that attaches to any HTML element, but I then need to be able to automatically keep the shadow in sync with the window if the element dragged around the window or moved via code. Unfortunately there's no move event for HTML elements so you can't tell when it's location changes. So I've been looking around for some way to keep track of the element and a specific CSS property, but no luck. I suspect there's nothing native to do this so the only way I could think of is to use a timer and poll rather frequently for the property. I ended up with a generic jQuery plugin that looks like this: (function($){ $.fn.watch = function (props, func, interval, id) { /// <summary> /// Allows you to monitor changes in a specific /// CSS property of an element by polling the value. /// when the value changes a function is called. /// The function called is called in the context /// of the selected element (ie. this) /// </summary> /// <param name="prop" type="String">CSS Properties to watch sep. by commas</param> /// <param name="func" type="Function"> /// Function called when the value has changed. /// </param> /// <param name="interval" type="Number"> /// Optional interval for browsers that don't support DOMAttrModified or propertychange events. /// Determines the interval used for setInterval calls. /// </param> /// <param name="id" type="String">A unique ID that identifies this watch instance on this element</param> /// <returns type="jQuery" /> if (!interval) interval = 200; if (!id) id = "_watcher"; return this.each(function () { var _t = this; var el$ = $(this); var fnc = function () { __watcher.call(_t, id) }; var itId = null; var data = { id: id, props: props.split(","), func: func, vals: [props.split(",").length], fnc: fnc, origProps: props, interval: interval }; $.each(data.props, function (i) { data.vals[i] = el$.css(data.props[i]); }); el$.data(id, data); hookChange(el$, id, data.fnc); }); function hookChange(el$, id, fnc) { el$.each(function () { var el = $(this); if (typeof (el.get(0).onpropertychange) == "object") el.bind("propertychange." + id, fnc); else if ($.browser.mozilla) el.bind("DOMAttrModified." + id, fnc); else itId = setInterval(fnc, interval); }); } function __watcher(id) { var el$ = $(this); var w = el$.data(id); if (!w) return; var _t = this; if (!w.func) return; // must unbind or else unwanted recursion may occur el$.unwatch(id); var changed = false; var i = 0; for (i; i < w.props.length; i++) { var newVal = el$.css(w.props[i]); if (w.vals[i] != newVal) { w.vals[i] = newVal; changed = true; break; } } if (changed) w.func.call(_t, w, i); // rebind event hookChange(el$, id, w.fnc); } } $.fn.unwatch = function (id) { this.each(function () { var el = $(this); var fnc = el.data(id).fnc; try { if (typeof (this.onpropertychange) == "object") el.unbind("propertychange." + id, fnc); else if ($.browser.mozilla) el.unbind("DOMAttrModified." + id, fnc); else clearInterval(id); } // ignore if element was already unbound catch (e) { } }); return this; } })(jQuery); With this I can now monitor movement by monitoring say the top CSS property of the element. The following code creates a box and uses the draggable (jquery.ui) plugin and a couple of custom plugins that center and create a shadow. Here's how I can set this up with the watcher: $("#box") .draggable() .centerInClient() .shadow() .watch("top", function() { $(this).shadow(); },70,"_shadow"); ... $("#box") .unwatch("_shadow") .shadow("remove"); This code basically sets up the window to be draggable and initially centered and then a shadow is added. The .watch() call then assigns a CSS property to monitor (top in this case) and a function to call in response. The component now sets up a setInterval call and keeps on pinging this property every time. When the top value changes the supplied function is called. While this works and I can now drag my window around with the shadow following suit it's not perfect by a long shot. The shadow move is delayed and so drags behind the window, but using a higher timer value is not appropriate either as the UI starts getting jumpy if the timer's set with too small of an increment. This sort of monitor can be useful for other things as well where operations are maybe not quite as time critical as a UI operation taking place. Can anybody see a better a better way of capturing movement of an element on the page?© Rick Strahl, West Wind Technologies, 2005-2011Posted in ASP.NET JavaScript jQuery

Read the article
Modern/Metro Internet Explorer: What were they thinking???

- by Rick Strahl

As I installed Windows 8.1 last week I decided that I really should take a closer look at Internet Explorer in the Modern/Metro environment again. Right away I ran into two issues that are real head scratchers to me.Modern Split Windows don't resize Viewport but Zoom OutThis one falls in the "WTF, really?" department: It looks like Modern Internet Explorer's Modern doesn't resize the browser window as every other browser (including IE 11 on the desktop) does, but rather tries to adjust the zoom to the width of the browser. This means that if you use the Modern IE browser and you split the display between IE and another application, IE will be zoomed out, with text becoming much, much smaller, rather than resizing the browser viewport and adjusting the pixel width as you would when a browser window is typically resized.Here's what I'm talking about in a couple of pictures. First here's the full screen Internet Explorer version (this shot is resized down since it's full screen at 1080p, click to see the full image):This brings up the first issue which is: On the desktop who wants to browse a site full screen? Most sites aren't fully optimized for 1080p widescreen experience and frankly most content that wide just looks weird. Even in typical 10" resolutions of 1280 width it's weird to look at things this way. At least this issue can be worked around with @media queries and either constraining the view, or adding additional content to make use of the extra space. Still running a desktop browser full screen is not optimal on a desktop machine - ever.Regardless, this view, while oversized, is what I expect: Everything is rendered in the right ratios, with font-size and the responsive design styling properly respected.But now look what happens when you split the desktop windows and show half desktop and have modern IE (this screen shot is not resized but cropped - this is actual size content as you can see in the cropped Twitter window on the right half of the screen):What's happening here is that IE is zooming out of the content to make it fit into the smaller width, shrinking the content rather than resizing the viewport's pixel width. In effect it looks like the pixel width stays at 1080px and the viewport expands out height-wise in response resulting in some crazy long portrait view.There goes responsive design - out the window literally. If you've built your site using @media queries and fixed viewport sizes, Internet Explorer completely screws you in this split view. On my 1080p monitor, the site shown at a little under half width becomes completely unreadable as the fonts are too small and break up. As you go into split view and you resize the window handle the content of the browser gets smaller and smaller (and effectively longer and longer on the bottom) effectively throwing off any responsive layout to the point of un-readability even on a big display, let alone a small tablet screen.What could POSSIBLY be the benefit of this screwed up behavior? I checked around a bit trying different pages in this shrunk down view. Other than the Microsoft home page, every page I went to was nearly unreadable at a quarter width. The only page I found that worked 'normally' was the Microsoft home page which undoubtedly is optimized just for Internet Explorer specifically.Bottom Address Bar opaquely overlays ContentAnother problematic feature for me is the browser address bar on the bottom. Modern IE shows the status bar opaquely on the bottom, overlaying the content area of the Web Page - until you click on the page. Until you do though, the address bar overlays the bottom content solidly. And not just a little bit but by good sizable chunk.In the application from the screen shot above I have an application toolbar on the bottom and the IE Address bar completely hides that bottom toolbar when the page is first loaded, until the user clicks into the content at which point the address bar shrinks down to a fat border style bar with a … on it. Toolbars on the bottom are pretty common these days, especially for mobile optimized applications, so I'd say this is a common use case. But even if you don't have toolbars on the bottom maybe there's other fixed content on the bottom of the page that is vital to display. While other browsers often also show address bars and then later hide them, these other browsers tend to resize the viewport when the address bar status changes, so the content can respond to the size change. Not so with Modern IE. The address bar overlays content and stays visible until content is clicked. No resize notification or viewport height change is sent to the browser.So basically Internet Explorer is telling me: "Our toolbar is more important than your content!" - AND gives me no chance to re-act to that behavior. The result on this page/application is that the user sees no actionable operations until he or she clicks into the content area, which is terrible from a UI perspective as the user has no idea what options are available on initial load.It's doubly confounding in that IE is running in full screen mode and has an the entire height of the screen at its disposal - there's plenty of real estate available to not require this sort of hiding of content in the first place. Heck, even Windows Phone with its more constrained size doesn't hide content - in fact the address bar on Windows Phone 8 is always visible.What were they thinking?Every time I use anything in the Modern Metro interface in Windows 8/8.1 I get angry. I can pretty much ignore Metro/Modern for my everyday usage, but unfortunately with Internet Explorer in the modern shell I have to live with, because there will be users using it to access my sites. I think it's inexcusable by Microsoft to build such a crappy shell around the browser that impacts the actual usability of Web content. In both of the cases above I can only scratch my head at what could have possibly motivated anybody designing the UI for the browser to make these screwed up choices, that manipulate the content in a totally unmaintainable way.© Rick Strahl, West Wind Technologies, 2005-2013Posted in Windows HTML5 Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
SQL SERVER – Using expressor Composite Types to Enforce Business Rules

- by pinaldave

One of the features that distinguish the expressor Data Integration Platform from other products in the data integration space is its concept of composite types, which provide an effective and easily reusable way to clearly define the structure and characteristics of data within your application. An important feature of the composite type approach is that it allows you to easily adjust the content of a record to its ultimate purpose. For example, a record used to update a row in a database table is easily defined to include only the minimum set of columns, that is, a value for the key column and values for only those columns that need to be updated. Much like a class in higher level programming languages, you can also use the composite type as a way to enforce business rules onto your data by encapsulating a datum’s name, data type, and constraints (for example, maximum, minimum, or acceptable values) as a single entity, which ensures that your data can not assume an invalid value. To what extent you use this functionality is a decision you make when designing your application; the expressor design paradigm does not force this approach on you. Let’s take a look at how these features are used. Suppose you want to create a group of applications that maintain the employee table in your human resources database. Your table might have a structure similar to the HumanResources.Employee table in the AdventureWorks database. This table includes two columns, EmployeID and rowguid, that are maintained by the relational database management system; you cannot provide values for these columns when inserting new rows into the table. Additionally, there are columns such as VacationHours and SickLeaveHours that you might choose to update for all employees on a monthly basis, which justifies creation of a dedicated application. By creating distinct composite types for the read, insert and update operations against this table, you can more easily manage this table’s content. When developing this application within expressor Studio, your first task is to create a schema artifact for the database table. This process is completely driven by a wizard, only requiring that you select the desired database schema and table. The resulting schema artifact defines the mapping of result set records to a record within the expressor data integration application. The structure of the record within the expressor application is a composite type that is given the default name CompositeType1. As you can see in the following figure, all columns from the table are included in the result set and mapped to an identically named attribute in the default composite type. If you are developing an application that needs to read this table, perhaps to prepare a year-end report of employees by department, you would probably not be interested in the data in the rowguid and ModifiedDate columns. A typical approach would be to drop this unwanted data in a downstream operator. But using an alternative composite type provides a better approach in which the unwanted data never enters your application. While working in expressor Studio’s schema editor, simply create a second composite type within the same schema artifact, which you could name ReadTable, and remove the attributes corresponding to the unwanted columns. The value of an alternative composite type is even more apparent when you want to insert into or update the table. In the composite type used to insert rows, remove the attributes corresponding to the EmployeeID primary key and rowguid uniqueidentifier columns since these values are provided by the relational database management system. And to update just the VacationHours and SickLeaveHours columns, use a composite type that includes only the attributes corresponding to the EmployeeID, VacationHours, SickLeaveHours and ModifiedDate columns. By specifying this schema artifact and composite type in a Write Table operator, your upstream application need only deal with the four required attributes and there is no risk of unintentionally overwriting a value in a column that does not need to be updated. Now, what about the option to use the composite type to enforce business rules? If you review the composition of the default composite type CompositeType1, you will note that the constraints defined for many of the attributes mirror the table column specifications. For example, the maximum number of characters in the NationaIDNumber, LoginID and Title attributes is equivalent to the maximum width of the target column, and the size of the MaritalStatus and Gender attributes is limited to a single character as required by the table column definition. If your application code leads to a violation of these constraints, an error will be raised. The expressor design paradigm then allows you to handle the error in a way suitable for your application. For example, a string value could be truncated or a numeric value could be rounded. Moreover, you have the option of specifying additional constraints that support business rules unrelated to the table definition. Let’s assume that the only acceptable values for marital status are S, M, and D. Within the schema editor, double-click on the MaritalStatus attribute to open the Edit Attribute window. Then click the Allowed Values checkbox and enter the acceptable values into the Constraint Value text box. The schema editor is updated accordingly. There is one more option that the expressor semantic type paradigm supports. Since the MaritalStatus attribute now clearly specifies how this type of information should be represented (a single character limited to S, M or D), you can convert this attribute definition into a shared type, which will allow you to quickly incorporate this definition into another composite type or into the description of an output record from a transform operator. Again, double-click on the MaritalStatus attribute and in the Edit Attribute window, click Convert, which opens the Share Local Semantic Type window that you use to name this shared type. There’s no requirement that you give the shared type the same name as the attribute from which it was derived. You should supply a name that makes it obvious what the shared type represents. In this posting, I’ve overviewed the expressor semantic type paradigm and shown how it can be used to make your application development process more productive. The beauty of this feature is that you choose when and to what extent you utilize the functionality, but I’m certain that if you opt to follow this approach your efforts will become more efficient and your work will progress more quickly. As always, I encourage you to download and evaluate expressor Studio for your current and future data integration needs. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: CodeProject, Pinal Dave, PostADay, SQL, SQL Authority, SQL Documentation, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
SQL SERVER – SSIS Look Up Component – Cache Mode – Notes from the Field #028

- by Pinal Dave

[Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems. Similarly most of the people think lookup component is just component which does look up for additional information and does not pay much attention to it. Due to the same reason they do not pay attention to the same and eventually get very bad performance. Linchpin People are database coaches and wellness experts for a data driven world. In this 28th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to write a good lookup component with Cache Mode. In SQL Server Integration Services, the lookup component is one of the most frequently used tools for data validation and completion. The lookup component is provided as a means to virtually join one set of data to another to validate and/or retrieve missing values. Properly configured, it is reliable and reasonably fast. Among the many settings available on the lookup component, one of the most critical is the cache mode. This selection will determine whether and how the distinct lookup values are cached during package execution. It is critical to know how cache modes affect the result of the lookup and the performance of the package, as choosing the wrong setting can lead to poorly performing packages, and in some cases, incorrect results. Full Cache The full cache mode setting is the default cache mode selection in the SSIS lookup transformation. Like the name implies, full cache mode will cause the lookup transformation to retrieve and store in SSIS cache the entire set of data from the specified lookup location. As a result, the data flow in which the lookup transformation resides will not start processing any data buffers until all of the rows from the lookup query have been cached in SSIS. The most commonly used cache mode is the full cache setting, and for good reason. The full cache setting has the most practical applications, and should be considered the go-to cache setting when dealing with an untested set of data. With a moderately sized set of reference data, a lookup transformation using full cache mode usually performs well. Full cache mode does not require multiple round trips to the database, since the entire reference result set is cached prior to data flow execution. There are a few potential gotchas to be aware of when using full cache mode. First, you can see some performance issues – memory pressure in particular – when using full cache mode against large sets of reference data. If the table you use for the lookup is very large (either deep or wide, or perhaps both), there’s going to be a performance cost associated with retrieving and caching all of that data. Also, keep in mind that when doing a lookup on character data, full cache mode will always do a case-sensitive (and in some cases, space-sensitive) string comparison even if your database is set to a case-insensitive collation. This is because the in-memory lookup uses a .NET string comparison (which is case- and space-sensitive) as opposed to a database string comparison (which may be case sensitive, depending on collation). There’s a relatively easy workaround in which you can use the UPPER() or LOWER() function in the pipeline data and the reference data to ensure that case differences do not impact the success of your lookup operation. Again, neither of these present a reason to avoid full cache mode, but should be used to determine whether full cache mode should be used in a given situation. Full cache mode is ideally useful when one or all of the following conditions exist: The size of the reference data set is small to moderately sized The size of the pipeline data set (the data you are comparing to the lookup table) is large, is unknown at design time, or is unpredictable Each distinct key value(s) in the pipeline data set is expected to be found multiple times in that set of data Partial Cache When using the partial cache setting, lookup values will still be cached, but only as each distinct value is encountered in the data flow. Initially, each distinct value will be retrieved individually from the specified source, and then cached. To be clear, this is a row-by-row lookup for each distinct key value(s). This is a less frequently used cache setting because it addresses a narrower set of scenarios. Because each distinct key value(s) combination requires a relational round trip to the lookup source, performance can be an issue, especially with a large pipeline data set to be compared to the lookup data set. If you have, for example, a million records from your pipeline data source, you have the potential for doing a million lookup queries against your lookup data source (depending on the number of distinct values in the key column(s)). Therefore, one has to be keenly aware of the expected row count and value distribution of the pipeline data to safely use partial cache mode. Using partial cache mode is ideally suited for the conditions below: The size of the data in the pipeline (more specifically, the number of distinct key column) is relatively small The size of the lookup data is too large to effectively store in cache The lookup source is well indexed to allow for fast retrieval of row-by-row values No Cache As you might guess, selecting no cache mode will not add any values to the lookup cache in SSIS. As a result, every single row in the pipeline data set will require a query against the lookup source. Since no data is cached, it is possible to save a small amount of overhead in SSIS memory in cases where key values are not reused. In the real world, I don’t see a lot of use of the no cache setting, but I can imagine some edge cases where it might be useful. As such, it’s critical to know your data before choosing this option. Obviously, performance will be an issue with anything other than small sets of data, as the no cache setting requires row-by-row processing of all of the data in the pipeline. I would recommend considering the no cache mode only when all of the below conditions are true: The reference data set is too large to reasonably be loaded into SSIS memory The pipeline data set is small and is not expected to grow There are expected to be very few or no duplicates of the key values(s) in the pipeline data set (i.e., there would be no benefit from caching these values) Conclusion The cache mode, an often-overlooked setting on the SSIS lookup component, represents an important design decision in your SSIS data flow. Choosing the right lookup cache mode directly impacts the fidelity of your results and the performance of package execution. Know how this selection impacts your ETL loads, and you’ll end up with more reliable, faster packages. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SSIS

Read the article
C# Performance Pitfall – Interop Scenarios Change the Rules

- by Reed

C# and .NET, overall, really do have fantastic performance in my opinion. That being said, the performance characteristics dramatically differ from native programming, and take some relearning if you’re used to doing performance optimization in most other languages, especially C, C++, and similar. However, there are times when revisiting tricks learned in native code play a critical role in performance optimization in C#. I recently ran across a nasty scenario that illustrated to me how dangerous following any fixed rules for optimization can be… The rules in C# when optimizing code are very different than C or C++. Often, they’re exactly backwards. For example, in C and C++, lifting a variable out of loops in order to avoid memory allocations often can have huge advantages. If some function within a call graph is allocating memory dynamically, and that gets called in a loop, it can dramatically slow down a routine. This can be a tricky bottleneck to track down, even with a profiler. Looking at the memory allocation graph is usually the key for spotting this routine, as it’s often “hidden” deep in call graph. For example, while optimizing some of my scientific routines, I ran into a situation where I had a loop similar to: for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i]); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This loop was at a fairly high level in the call graph, and often could take many hours to complete, depending on the input data. As such, any performance optimization we could achieve would be greatly appreciated by our users. After a fair bit of profiling, I noticed that a couple of function calls down the call graph (inside of ProcessElement), there was some code that effectively was doing: // Allocate some data required DataStructure* data = new DataStructure(num); // Call into a subroutine that passed around and manipulated this data highly CallSubroutine(data); // Read and use some values from here double values = data->Foo; // Cleanup delete data; // ... return bar; Normally, if “DataStructure” was a simple data type, I could just allocate it on the stack. However, it’s constructor, internally, allocated it’s own memory using new, so this wouldn’t eliminate the problem. In this case, however, I could change the call signatures to allow the pointer to the data structure to be passed into ProcessElement and through the call graph, allowing the inner routine to reuse the same “data” memory instead of allocating. At the highest level, my code effectively changed to something like: DataStructure* data = new DataStructure(numberToProcess); for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i], data); } delete data; Granted, this dramatically reduced the maintainability of the code, so it wasn’t something I wanted to do unless there was a significant benefit. In this case, after profiling the new version, I found that it increased the overall performance dramatically – my main test case went from 35 minutes runtime down to 21 minutes. This was such a significant improvement, I felt it was worth the reduction in maintainability. In C and C++, it’s generally a good idea (for performance) to: Reduce the number of memory allocations as much as possible, Use fewer, larger memory allocations instead of many smaller ones, and Allocate as high up the call stack as possible, and reuse memory I’ve seen many people try to make similar optimizations in C# code. For good or bad, this is typically not a good idea. The garbage collector in .NET completely changes the rules here. In C#, reallocating memory in a loop is not always a bad idea. In this scenario, for example, I may have been much better off leaving the original code alone. The reason for this is the garbage collector. The GC in .NET is incredibly effective, and leaving the allocation deep inside the call stack has some huge advantages. First and foremost, it tends to make the code more maintainable – passing around object references tends to couple the methods together more than necessary, and overall increase the complexity of the code. This is something that should be avoided unless there is a significant reason. Second, (unlike C and C++) memory allocation of a single object in C# is normally cheap and fast. Finally, and most critically, there is a large advantage to having short lived objects. If you lift a variable out of the loop and reuse the memory, its much more likely that object will get promoted to Gen1 (or worse, Gen2). This can cause expensive compaction operations to be required, and also lead to (at least temporary) memory fragmentation as well as more costly collections later. As such, I’ve found that it’s often (though not always) faster to leave memory allocations where you’d naturally place them – deep inside of the call graph, inside of the loops. This causes the objects to stay very short lived, which in turn increases the efficiency of the garbage collector, and can dramatically improve the overall performance of the routine as a whole. In C#, I tend to: Keep variable declarations in the tightest scope possible Declare and allocate objects at usage While this tends to cause some of the same goals (reducing unnecessary allocations, etc), the goal here is a bit different – it’s about keeping the objects rooted for as little time as possible in order to (attempt) to keep them completely in Gen0, or worst case, Gen1. It also has the huge advantage of keeping the code very maintainable – objects are used and “released” as soon as possible, which keeps the code very clean. It does, however, often have the side effect of causing more allocations to occur, but keeping the objects rooted for a much shorter time. Now – nowhere here am I suggesting that these rules are hard, fast rules that are always true. That being said, my time spent optimizing over the years encourages me to naturally write code that follows the above guidelines, then profile and adjust as necessary. In my current project, however, I ran across one of those nasty little pitfalls that’s something to keep in mind – interop changes the rules. In this case, I was dealing with an API that, internally, used some COM objects. In this case, these COM objects were leading to native allocations (most likely C++) occurring in a loop deep in my call graph. Even though I was writing nice, clean managed code, the normal managed code rules for performance no longer apply. After profiling to find the bottleneck in my code, I realized that my inner loop, a innocuous looking block of C# code, was effectively causing a set of native memory allocations in every iteration. This required going back to a “native programming” mindset for optimization. Lifting these variables and reusing them took a 1:10 routine down to 0:20 – again, a very worthwhile improvement. Overall, the lessons here are: Always profile if you suspect a performance problem – don’t assume any rule is correct, or any code is efficient just because it looks like it should be Remember to check memory allocations when profiling, not just CPU cycles Interop scenarios often cause managed code to act very differently than “normal” managed code. Native code can be hidden very cleverly inside of managed wrappers

Read the article
Is Financial Inclusion an Obligation or an Opportunity for Banks?

- by tushar.chitra

Why should banks care about financial inclusion? First, the statistics, I think this will set the tone for this blog post. There are close to 2.5 billion people who are excluded from the banking stream and out of this, 2.2 billion people are from the continents of Africa, Latin America and Asia (McKinsey on Society: Global Financial Inclusion). However, this is not just a third-world phenomenon. According to Federal Deposit Insurance Corp (FDIC), in the US, post 2008 financial crisis, one family out of five has either opted out of the banking system or has been moved out (American Banker). Moving this huge unbanked population into mainstream banking is both an opportunity and a challenge for banks. An obvious opportunity is the significant untapped customer base that banks can target, so is the positive brand equity a bank can build by fulfilling its social responsibilities. Also, as banks target the cost-conscious unbanked customer, they will be forced to look at ways to offer cost-effective products and services, necessitating technology upgrades and innovations. However, cost is not the only hurdle in increasing the adoption of banking services. The potential users need to be convinced of the benefits of banking and banks will also face stiff competition from unorganized players. Finally, the banks will have to believe in the viability of this business opportunity, and not treat financial inclusion as an obligation. In what ways can banks target the unbanked For financial inclusion to be a success, banks should adopt innovative business models to develop products that address the stated and unstated needs of the unbanked population and also design delivery channels that are cost effective and viable in the long run. Through business correspondents and facilitators In rural and remote areas, one of the major hurdles in increasing banking penetration is connectivity and accessibility to banking services, which makes last mile inclusion a daunting challenge. To address this, banks can avail the services of business correspondents or facilitators. This model allows banks to establish greater connectivity through a trusted and reliable intermediary. In India, for instance, banks can leverage the local Kirana stores (the mom & pop stores) to service rural and remote areas. With a supportive nudge from the central bank, the commercial banks can enlist these shop owners as business correspondents to increase their reach. Since these neighborhood stores are acquainted with the local population, they can help banks manage the KYC norms, besides serving as a conduit for remittance. Banks also have an opportunity over a period of time to cross-sell other financial products such as micro insurance, mutual funds and pension products through these correspondents. To exercise greater operational control over the business correspondents, banks can also adopt a combination of branch and business correspondent models to deliver financial inclusion. Through mobile devices According to a 2012 world bank report on financial inclusion, out of a world population of 7 billion, over 5 billion or 70% have mobile phones and only 2 billion or 30% have a bank account. What this means for banks is that there is scope for them to leverage this phenomenal growth in mobile usage to serve the unbanked population. Banks can use mobile technology to service the basic banking requirements of their customers with no frills accounts, effectively bringing down the cost per transaction. As I had discussed in my earlier post on mobile payments, though non-traditional players have taken the lead in P2P mobile payments, banks still hold an edge in terms of infrastructure and reliability. Through crowd-funding According to the Crowdfunding Industry Report by Massolution, the global crowdfunding industry raised $2.7 billion in 2012, and is projected to grow to $5.1 billion in 2013. With credit policies becoming tighter and banks becoming more circumspect in terms of loan disbursals, crowdfunding has emerged as an alternative channel for lending. Typically, these initiatives target the unbanked population by offering small loans that are unviable for larger banks. Though a significant proportion of crowdfunding initiatives globally are run by non-banking institutions, banks are also venturing into this space. The next step towards inclusive finance Banks by themselves cannot make financial inclusion a success. There is a need for a whole ecosystem that is supportive of this mission. The policy makers, that include the regulators and government bodies, must be in sync, the IT solution providers must put on their thinking caps to come out with innovative products and solutions, communication channels such as internet and mobile need to expand their reach, and the media and the public need to play an active part. The other challenge for financial inclusion is from the banks themselves. While it is true that financial inclusion will unleash a hitherto hugely untapped market, the normal banking model may be found wanting because of issues such as flexibility, convenience and reliability. The business will be viable only when there is a focus on increasing the usage of existing infrastructure and that is possible when the banks can offer the entire range of products and services to the large number of users of essential banking services. Apart from these challenges, banks will also have to quickly master and replicate the business model to extend their reach to the remotest regions in their respective geographies. They will need to ensure that the transactions deliver a viable business benefit to the bank. For tapping cross-sell opportunities, banks will have to quickly roll-out customized and segment-specific products. The bank staff should be brought in sync with the business plan by convincing them of the viability of the business model and the need for a business correspondent delivery model. Banks, in collaboration with the government and NGOs, will have to run an extensive financial literacy program to educate the unbanked about the benefits of banking. Finally, with the growing importance of retail banking and with many unconventional players eyeing the opportunity in payments and other lucrative areas of banking, banks need to understand the importance of micro and small branches. These micro and small branches can help banks increase their presence without a huge cost burden, provide bankers an opportunity to cross sell micro products and offer a window of opportunity for the large non-banked population to transact without any interference from intermediaries. These branches can also help diminish the role of the unorganized financial sector, such as local moneylenders and unregistered credit societies. This will also help banks build a brand awareness and loyalty among the users, which by itself has a cascading effect on the business operations, especially among the rural and un-banked centers. In conclusion, with the increasingly competitive banking sector facing frequent slowdowns and downturns, the unbanked population presents a huge opportunity for banks to enhance their customer base and fulfill their social responsibility.

Read the article
The Internet of Things Is Really the Internet of People

- by HCM-Oracle

By Mark Hurd - Originally Posted on LinkedIn As I speak with CEOs around the world, our conversations invariably come down to this central question: Can we change our corporate cultures and the ways we train and reward our people as rapidly as new technology is changing the work we do, the products we make and how we engage with customers? It’s a critical consideration given today’s pace of disruption, which already is straining traditional management models and HR strategies. Winning companies will bring innovation and vision to their employees and partners by attracting people who will thrive in this emerging world of relentless data, predictive analytics and unlimited what-if scenarios. So, where are we going to find employees who are as familiar with complex data as I am with orderly financial statements and business plans? I’m not just talking about high-end data scientists who most certainly will sit at or near the top of the new decision-making pyramid. Global organizations will need creative and motivated people who will devote their time to manipulating, reviewing, analyzing, sorting and reshaping data to drive business and delight customers. This might seem evident, but my conversations with business people across the globe indicate that only a small number of companies get it. In the past few years, executives have been busy keeping pace with seismic upheavals, including the rise of social customer engagement, the rapid acceleration of product-development cycles and the relentless move to mobile-first. But all of that, I think, is the start of an uphill climb to the top of a roller-coaster. Today, about 10 billion devices across the globe are connected to the Internet. In a couple of years, that number will probably double, and not because we will have bought 10 billion more computers, smart phones and tablets. This unprecedented explosion of Big Data is being triggered by the Internet of Things, which is another way of saying that the numerous intelligent devices touching our everyday lives are all becoming interconnected. Home appliances, food, industrial equipment, pets, pharmaceutical products, pallets, cars, luggage, packaged goods, athletic equipment, even clothing will be streaming data. Some data will provide important information about how to run our businesses and lead healthier lives. Much of it will be extraneous. How does a CEO cope with this unimaginable volume and velocity of data, much less harness it to excite and delight customers? Here are three things CEOs must do to tackle this challenge: 1) Take care of your employees, take care of your customers. Larry Ellison recently noted that the two most important priorities for any CEO today revolve around people: Taking care of your employees and taking care of your customers. Companies in today’s hypercompetitive business environment simply won’t be able to survive unless they’ve got world-class people at all levels of the organization. CEOs must demonstrate a commitment to employees by becoming champions for HR systems that empower every employee to fully understand his or her job, how it ties into the corporate framework, what’s expected of them, what training is available, and how they can use an embedded social network to communicate, collaborate and excel. Over the next several years, many of the world’s top industrialized economies will see a turnover in the workforce on an unprecedented scale. Across the United States, Europe, China and Japan, the “baby boomer” generation will be retiring and, by 2020, we’ll see turnovers in those regions ranging from 10 to 30 percent. How will companies replace all that brainpower, experience and know-how? How will CEOs perpetuate the best elements of their corporate cultures in the midst of this profound turnover? The challenge will be daunting, but it can be met with world-class HR technology. As companies begin replacing up to 30 percent of their workforce, they will need thousands of new types of data-native workers to exploit the Internet of Things in the service of the Internet of People. The shift in corporate mindset here can’t be overstated. The CEO has to be at the forefront of this new way of recruiting, training, motivating, aligning and developing truly 21-century talent. 2) Start thinking today about the Internet of People. Some forward-looking companies have begun pursuing the “democratization of data.” This allows more people within a company greater access to data that can help them make better decisions, move more quickly and keep pace with the changing interests and demands of their customers. As a result, we’ve seen organizations flatten out, growing numbers of well-informed people authorized to make decisions without corporate approval and a movement of engagement away from headquarters to the point of contact with the customer. These are profound changes, and I’m a huge proponent. As I think about what the next few years will bring as companies become deluged with unprecedented streams of data, I’m convinced that we’ll need dramatically different organizational structures, decision-making models, risk-management profiles and reward systems. For example, if a car company’s marketing department mines incoming data to determine that customers are shifting rapidly toward neon-green models, how many layers of approval, review, analysis and sign-off will be needed before the factory starts cranking out more neon-green cars? Will we continue to have organizations where too many people are empowered to say “No” and too few are allowed to say “Yes”? If so, how will those companies be able to compete in a world in which customers have more choices, instant access to more information and less loyalty than ever before? That’s why I think CEOs need to begin thinking about this problem right now, not in a year or two when competitors are already reshaping their organizations to match the marketplace’s new realities. 3) Partner with universities to help create a new type of highly skilled workers. Several years ago, universities introduced new undergraduate as well as graduate-level programs in analytics and informatics as the business need for deeper insights into the booming world of data began to explode. Today, as the growth rate of data continues to soar, we know that the Internet of Things will only intensify that growth. Moreover, as Big Data fuels insights that can be shaped into products and services that generate revenue, the demand for data scientists and data specialists will go on unabated. Beyond that top-level expertise, companies are going to need data-native thinkers at all levels of the organization. Where will this new type of worker come from? I think it’s incumbent on the business community to collaborate with universities to develop new curricula designed to turn out graduates who can capitalize on the data-driven world that the Internet of Things is surely going to create. These new workers will create opportunities to help their companies in fields as diverse as product design, customer service, marketing, manufacturing and distribution. They will become innovative leaders in fashioning an entirely new type of workforce and organizational structure optimized to fully exploit the Internet of Things so that it becomes a high-value enabler of the Internet of People. Mark Hurd is President of Oracle Corporation and a member of the company's Board of Directors. He joined Oracle in 2010, bringing more than 30 years of technology industry leadership, computer hardware expertise, and executive management experience to his role with the company. As President, Mr. Hurd oversees the corporate direction and strategy for Oracle's global field operations, including marketing, sales, consulting, alliances and channels, and support. He focuses on strategy, leadership, innovation, and customers.

Read the article
Nashorn, the rhino in the room

- by costlow

Nashorn is a new runtime within JDK 8 that allows developers to run code written in JavaScript and call back and forth with Java. One advantage to the Nashorn scripting engine is that is allows for quick prototyping of functionality or basic shell scripts that use Java libraries. The previous JavaScript runtime, named Rhino, was introduced in JDK 6 (released 2006, end of public updates Feb 2013). Keeping tradition amongst the global developer community, "Nashorn" is the German word for rhino. The Java platform and runtime is an intentional home to many languages beyond the Java language itself. OpenJDK’s Da Vinci Machine helps coordinate work amongst language developers and tool designers and has helped different languages by introducing the Invoke Dynamic instruction in Java 7 (2011), which resulted in two major benefits: speeding up execution of dynamic code, and providing the groundwork for Java 8’s lambda executions. Many of these improvements are discussed at the JVM Language Summit, where language and tool designers get together to discuss experiences and issues related to building these complex components. There are a number of benefits to running JavaScript applications on JDK 8’s Nashorn technology beyond writing scripts quickly: Interoperability with Java and JavaScript libraries. Scripts do not need to be compiled. Fast execution and multi-threading of JavaScript running in Java’s JRE. The ability to remotely debug applications using an IDE like NetBeans, Eclipse, or IntelliJ (instructions on the Nashorn blog). Automatic integration with Java monitoring tools, such as performance, health, and SIEM. In the remainder of this blog post, I will explain how to use Nashorn and the benefit from those features. Nashorn execution environment The Nashorn scripting engine is included in all versions of Java SE 8, both the JDK and the JRE. Unlike Java code, scripts written in nashorn are interpreted and do not need to be compiled before execution. Developers and users can access it in two ways: Users running JavaScript applications can call the binary directly:jre8/bin/jjs This mechanism can also be used in shell scripts by specifying a shebang like #!/usr/bin/jjs Developers can use the API and obtain a ScriptEngine through:ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn"); When using a ScriptEngine, please understand that they execute code. Avoid running untrusted scripts or passing in untrusted/unvalidated inputs. During compilation, consider isolating access to the ScriptEngine and using Type Annotations to only allow @Untainted String arguments. One noteworthy difference between JavaScript executed in or outside of a web browser is that certain objects will not be available. For example when run outside a browser, there is no access to a document object or DOM tree. Other than that, all syntax, semantics, and capabilities are present. Examples of Java and JavaScript The Nashorn script engine allows developers of all experience levels the ability to write and run code that takes advantage of both languages. The specific dialect is ECMAScript 5.1 as identified by the User Guide and its standards definition through ECMA international. In addition to the example below, Benjamin Winterberg has a very well written Java 8 Nashorn Tutorial that provides a large number of code samples in both languages. Basic Operations A basic Hello World application written to run on Nashorn would look like this: #!/usr/bin/jjs print("Hello World"); The first line is a standard script indication, so that Linux or Unix systems can run the script through Nashorn. On Windows where scripts are not as common, you would run the script like: jjs helloWorld.js. Receiving Arguments In order to receive program arguments your jjs invocation needs to use the -scripting flag and a double-dash to separate which arguments are for jjs and which are for the script itself:jjs -scripting print.js -- "This will print" #!/usr/bin/jjs var whatYouSaid = $ARG.length==0 ? "You did not say anything" : $ARG[0] print(whatYouSaid); Interoperability with Java libraries (including 3rd party dependencies) Another goal of Nashorn was to allow for quick scriptable prototypes, allowing access into Java types and any libraries. Resources operate in the context of the script (either in-line with the script or as separate threads) so if you open network sockets and your script terminates, those sockets will be released and available for your next run. Your code can access Java types the same as regular Java classes. The “import statements” are written somewhat differently to accommodate for language. There is a choice of two styles: For standard classes, just name the class: var ServerSocket = java.net.ServerSocket For arrays or other items, use Java.type: var ByteArray = Java.type("byte[]")You could technically do this for all. The same technique will allow your script to use Java types from any library or 3rd party component and quickly prototype items. Building a user interface One major difference between JavaScript inside and outside of a web browser is the availability of a DOM object for rendering views. When run outside of the browser, JavaScript has full control to construct the entire user interface with pre-fabricated UI controls, charts, or components. The example below is a variation from the Nashorn and JavaFX guide to show how items work together. Nashorn has a -fx flag to make the user interface components available. With the example script below, just specify: jjs -fx -scripting fx.js -- "My title" #!/usr/bin/jjs -fx var Button = javafx.scene.control.Button; var StackPane = javafx.scene.layout.StackPane; var Scene = javafx.scene.Scene; var clickCounter=0; $STAGE.title = $ARG.length>0 ? $ARG[0] : "You didn't provide a title"; var button = new Button(); button.text = "Say 'Hello World'"; button.onAction = myFunctionForButtonClicking; var root = new StackPane(); root.children.add(button); $STAGE.scene = new Scene(root, 300, 250); $STAGE.show(); function myFunctionForButtonClicking(){ var text = "Click Counter: " + clickCounter; button.setText(text); clickCounter++; print(text); } For a more advanced post on using Nashorn to build a high-performing UI, see JavaFX with Nashorn Canvas example. Interoperable with frameworks like Node, Backbone, or Facebook React The major benefit of any language is the interoperability gained by people and systems that can read, write, and use it for interactions. Because Nashorn is built for the ECMAScript specification, developers familiar with JavaScript frameworks can write their code and then have system administrators deploy and monitor the applications the same as any other Java application. A number of projects are also running Node applications on Nashorn through Project Avatar and the supported modules. In addition to the previously mentioned Nashorn tutorial, Benjamin has also written a post about Using Backbone.js with Nashorn. To show the multi-language power of the Java Runtime, there is another interesting example that unites Facebook React and Clojure on JDK 8’s Nashorn. Summary Nashorn provides a simple and fast way of executing JavaScript applications and bridging between the best of each language. By making the full range of Java libraries to JavaScript applications, and the quick prototyping style of JavaScript to Java applications, developers are free to work as they see fit. Software Architects and System Administrators can take advantage of one runtime and leverage any work that they have done to tune, monitor, and certify their systems. Additional information is available within: The Nashorn Users’ Guide Java Magazine’s article "Next Generation JavaScript Engine for the JVM." The Nashorn team’s primary blog or a very helpful collection of Nashorn links.

Read the article

Search Results

Search found 3996 results on 160 pages for 'operations'.

Page 145/160 | < Previous Page | 141 142 143 144 145 146 147 148 149 150 151 152 | Next Page >

- by david.butler(at)oracle.com

- by Brian

- by Geordie

- by Eric Bezille

- by Pinal Dave

- by Greg Jensen

- by Rick Strahl

- by James Michael Hare

- by Pinal Dave

- by Bertrand Le Roy

- by Rick Strahl

- by Rob Farley

- by Rob Farley

- by Bertrand Le Roy

- by danx

- by pinaldave

- by Simon Cooper

- by Rick Strahl

- by Rick Strahl

- by pinaldave

- by Pinal Dave

- by Reed

- by tushar.chitra

- by HCM-Oracle

- by costlow

< Previous Page | 141 142 143 144 145 146 147 148 149 150 151 152 | Next Page >