sql 2005 - Page 732 - Developer IT

How the "migrations" approach makes database continuous integration possible

- by David Atkinson

Testing a database upgrade script as part of a continuous integration process will only work if there is an easy way to automate the generation of the upgrade scripts. There are two common approaches to managing upgrade scripts.

The first is to maintain a set of scripts as you go along. Many SQL developers I've encountered store these in a folder, prefixed numerically so that they are ordered as they are intended to be run. Occasionally there is an accompanying document or a batch file that ensures that the scripts are run in the defined order. Writing these scripts during the course of development requires discipline. It's all too easy to load up the table designer and make a change directly to the development database, rather than to save off the ALTER statement that is required when the same change is made to production. This discipline can add considerable overhead to the development process. However, come the end of the project, everything is ready for final testing and deployment.

The second development paradigm is to not do the above. Changes are made to the development database without considering the incremental update scripts required to effect the changes. At the end of the project, the SQL developer or DBA is tasked with working out what changes have been made, and with hand-crafting the upgrade scripts retrospectively. The end of the project is the wrong time to be doing this, as the pressure is mounting to ship the product. And where data deployment is involved, it is prudent not to feel rushed.

Schema comparison tools such as SQL Compare have made this latter technique more bearable. These tools work by analyzing the before and after states of a database schema, and calculating the SQL required to transition the database. Problem solved? Not entirely. Schema comparison tools are huge time savers, but they have their limitations. There are certain changes that can be made to a database that can't be determined purely from observing the static schema states. If a column is split, how do we determine the algorithm required to copy the data into the new columns? If a NOT NULL column is added without a default, how do we populate the new field for existing records in the target? If we rename a table, how do we know we've done a rename, as we could equally have dropped a table and created a new one? All of the above are examples of situations where developer intent is required to supplement the script generation engine.

SQL Source Control 3 and SQL Compare 10 introduced a new feature, migration scripts, which allows developers to add custom scripts that replace the default script generation behavior. These scripts are committed to source control alongside the schema changes, and are associated with one or more changesets. Before this capability was introduced, any schema change that required additional developer intent would break any attempt at auto-generation of the upgrade script, rendering deployment testing as part of continuous integration useless. SQL Compare will now generate upgrade scripts not only using its diffing engine, but also using the knowledge supplied by developers in the guise of migration scripts. In future posts I will describe the command line syntax needed to use this feature as part of an automated build process such as continuous integration.
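The first approach described above (numbered scripts run in a fixed order) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not how SQL Compare or SQL Source Control work internally: it uses Python with an in-memory SQLite database, and the `MIGRATIONS` list, the `schema_version` table and the `apply_migrations` helper are all hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical numbered migration scripts, listed in the order they must run.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"),
    (2, "INSERT INTO customer (full_name) VALUES ('Ada Lovelace')"),
    # Developer intent that a schema diff cannot infer on its own: a NOT NULL
    # column is added together with the value existing rows should receive.
    (3, "ALTER TABLE customer ADD COLUMN country TEXT NOT NULL DEFAULT 'unknown'"),
]


def apply_migrations(conn, migrations):
    """Run every migration that has not been recorded yet, in version order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already run against this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = apply_migrations(conn, MIGRATIONS)   # applies versions 1, 2 and 3
second_run = apply_migrations(conn, MIGRATIONS)  # nothing left to apply
```

Because the applied versions are recorded in the database itself, the same runner can be executed on every build, which is what makes this style of upgrade script easy to test automatically.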


Trace flags - TF 1117

- by Damian

I gave a session about trace flags this year at the SQL Day 2014 conference, which was held in Wroclaw at the end of April. The topic is important to most DBAs, and the reason I did it was that I sometimes forget about various trace flags :). So I decided to prepare a presentation, but I think it is a good idea to write posts about trace flags, too. Let's start then: today I will describe TF 1117. I assume that we all know how to set up a TF using startup parameters, the registry, in the session, or at the query level. I will always state whether a trace flag is local or global to make sure we know how to use it.

Why do we need this trace flag? Let's create a test database first. It is a quite ordinary database with two data files (4 MB each) and a 1 MB log file. The data files grow by 1 MB and the log file grows by 10%:

    USE [master]
    GO
    CREATE DATABASE [TF1117]
    ON PRIMARY
    ( NAME = N'TF1117',
      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\TF1117.mdf',
      SIZE = 4096KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),
    ( NAME = N'TF1117_1',
      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\TF1117_1.ndf',
      SIZE = 4096KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
    LOG ON
    ( NAME = N'TF1117_log',
      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\TF1117_log.ldf',
      SIZE = 1024KB, MAXSIZE = 2048GB, FILEGROWTH = 10% )
    GO

Without TF 1117 turned on, the data files don't all grow at once. When the first file is full, SQL Server expands it, but the other file is not expanded until it is full too. Why is that so important? The SQL Server proportional fill algorithm directs new extent allocations to the file with the most available space, so new extents will be written to the file that was just expanded. When TF 1117 is enabled, it causes all files in the filegroup to auto grow by their specified increment. That means all files will have the same percentage of free space, so we still have the benefit of evenly distributed IO. TF 1117 is a global flag, so it affects all databases on the instance. Of course, if a filegroup contains only one file, the TF does not have any effect on it.

Now let's do a simple test. First let's create a table in which each row occupies a single page. The table definition is pretty simple: it has two integer columns and one fixed-size character column of 8000 bytes:

    create table TF1117Tab
    (
        col1 int,
        col2 int,
        col3 char (8000)
    )
    go

Now I load some data into the table to make sure that one of the data files must grow:

    declare @i int
    select @i = 1
    while (@i < 800)
    begin
        insert into TF1117Tab values (@i, @i+1000, 'hello')
        select @i = @i + 1
    end

I can check the actual file sizes in the sys.database_files catalog view:

    SELECT name, (size*8)/1024 'Size in MB'
    FROM sys.database_files
    GO

As you can see, only the first data file was expanded and the other still has its initial size:

    name          Size in MB
    ------------- -----------
    TF1117        5
    TF1117_log    1
    TF1117_1      4

There are also other methods of looking at file autogrow events. One possibility is to create an Extended Events session, and the other is to look into the default trace file:

    DECLARE @path NVARCHAR(260);

    SELECT @path = REVERSE(SUBSTRING(REVERSE([path]), CHARINDEX('\', REVERSE([path])), 260)) + N'log.trc'
    FROM sys.traces
    WHERE is_default = 1;

    SELECT DatabaseName, [FileName], SPID, Duration, StartTime, EndTime,
           FileType = CASE EventClass WHEN 92 THEN 'Data' WHEN 93 THEN 'Log' END
    FROM sys.fn_trace_gettable(@path, DEFAULT)
    WHERE EventClass IN (92,93)
      AND StartTime > '2014-07-12'
      AND DatabaseName = N'TF1117'
    ORDER BY StartTime DESC;

After running the query I can see that the file was expanded and how long the process took, which might be useful from the performance perspective.

Now it's time to turn on flag 1117. Because the flag is global, it is enabled with the -1 argument:

    DBCC TRACEON(1117, -1)

I dropped the database and recreated it. Then I ran the queries again and observed the results. After loading the records I see that both files were evenly expanded:

    name          Size in MB
    ------------- -----------
    TF1117        5
    TF1117_log    1
    TF1117_1      5

I also found the information in the default trace. The query returned three rows. The last one relates to my first experiment, when the TF was turned off. The other two rows show that the first file was expanded by 1 MB and, right after that operation, the second file was expanded too. This is what this TF is all about :)
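The behaviour described above can be mimicked with a toy model. The following Python sketch is an assumption-laden simplification and not SQL Server's actual allocation algorithm: it treats every allocation as a 1 MB unit, always writes to the file with the most free space, and grows either only the full file or all files depending on a hypothetical `grow_all` switch that stands in for TF 1117.

```python
def simulate(allocations, grow_all, size_mb=4, growth_mb=1, files=2):
    """Toy model of autogrowth for data files that share one filegroup.

    Each allocation takes 1 MB from the file with the most free space.
    When that file has no free space left (so every file is full), either
    only that file grows (grow_all=False) or all files grow (grow_all=True).
    """
    sizes = [size_mb] * files
    used = [0] * files
    for _ in range(allocations):
        free = [size - taken for size, taken in zip(sizes, used)]
        target = max(range(files), key=lambda i: free[i])
        if free[target] == 0:
            to_grow = range(files) if grow_all else [target]
            for i in to_grow:
                sizes[i] += growth_mb
        used[target] += 1
    return sizes


# Nine 1 MB allocations into two 4 MB files force exactly one growth event.
without_flag = simulate(9, grow_all=False)
with_flag = simulate(9, grow_all=True)
```

In this simplified model the run without the switch ends with files of 5 MB and 4 MB, and the run with the switch ends with two 5 MB files, which mirrors the sizes reported in the experiment.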


Database Web Service using Toplink DB Provider

- by Vishal Jain

With JDeveloper 11gR2 you can now create database-based web services using the JAX-WS Provider. The key differences between this and the already existing PL/SQL Web Services support are:

- Based on JAX-WS Provider
- Supports SQL queries for creating web services
- Supports table CRUD operations

This is present as a new option in the New Gallery under 'Web Services'. When you invoke the New Gallery option, it presents you with three options to choose from. In this entry I will explain the options of creating a service based on SQL queries and on table CRUD operations.

SQL Query based Service

When you select this option, the 'Next' page asks you for the DB connection details. You can also choose whether you want the SOAP 1.1 or 1.2 format. For this example, I will proceed with SOAP 1.1, the default option. On the next page, you can give the SQL query. The wizard supports bind variables, so you can parametrize your queries. Give "?" for each input parameter you want to supply at runtime, and the "Bind Variables" button will be enabled. Here you can specify the name and type of the variable. Finish the wizard. Now you can test your service in the Analyzer. Note that the bind variable you specified appears as an input parameter in the Analyzer input form.

CRUD Operations

For this, at Step 2 of the wizard, select the radio button "Generate Table CRUD Service Provider". At the next step, select the DB connection and the table for which you want to generate the default set of operations. Finish the wizard. Now run the service in the Analyzer for a quick check and see that all the basic operations are exposed.
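The "?" bind-variable idea is not specific to the wizard. As a rough illustration only, the sketch below uses Python's sqlite3 module rather than JDeveloper or JAX-WS, and the `employees` table and `employees_in` function are hypothetical names made up for the example. The query text stays fixed, and the value for "?" is supplied at call time, much like the input parameter that shows up in the Analyzer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [("Ann", "Sales"), ("Bob", "IT"), ("Cid", "Sales")],
)


def employees_in(dept):
    """Run a fixed query; the '?' placeholder is bound to dept at runtime."""
    query = "SELECT name FROM employees WHERE dept = ? ORDER BY name"
    return [row[0] for row in conn.execute(query, (dept,))]


sales = employees_in("Sales")
```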


Microsoft Codename Houston

- by kaleidoscope

In one of the final talks about SQL Azure on Day 3 of PDC09, David Robinson, Senior PM on the Azure team, announced a project codenamed ‘Houston’, which is basically a Silverlight equivalent of SQL Server Management Studio. The concept comes from SQL Azure being in the cloud: if the only way to interact with it is by installing SSMS locally, then it does not feel like a consistent story. Judging from the limited preview, it only contains the basics, but it clearly lets you create tables, stored procedures and views, edit them, and even add data to tables in a grid view reminiscent of Microsoft Access. The UI is based around the standard ribbon bar, with an object window on the left and a working pane on the right. As of now this tool is still pre-alpha, and it seems like a basic tool that will facilitate rapid database development in the cloud. When asked about general availability, no dates were given, but calendar 2010 was indicated as the target.

More information can be found at: http://sqlfascination.com/2009/11/20/pdc-09-day-3-sql-azure-and-codename-houston-announcement/

Tinu, O


Nesting Linq-to-Objects query within Linq-to-Entities query - what is happening under the covers?

- by carewithl

    var numbers = new int[] { 1, 2, 3, 4, 5 };

    var contacts = from c in context.Contacts
                   where c.ContactID == numbers.Max() | c.ContactID == numbers.FirstOrDefault()
                   select c;

    foreach (var item in contacts)
        Console.WriteLine(item.ContactID);

A Linq-to-Entities query is first translated into a Linq expression tree, which is then converted by Object Services into a command tree. And if a Linq-to-Entities query nests a Linq-to-Objects query, then this nested query also gets translated into an expression tree.

a) I assume none of the operators of the nested Linq-to-Objects query actually get executed, but instead the data provider for the particular DB (or perhaps Object Services) knows how to transform the logic of Linq-to-Objects operators into appropriate SQL statements?

b) The data provider knows how to create equivalent SQL statements only for some of the Linq-to-Objects operators?

c) Similarly, the data provider knows how to create equivalent SQL statements only for some of the non-Linq methods in the .NET Framework class library?

EDIT: I know only some SQL so I can't be completely sure, but reading the SQL query generated for the above code, it seems the data provider didn't actually execute the numbers.Max method. Instead it somehow figured out that numbers.Max should return the maximum value, and then included a call to T-SQL's built-in MAX function in the generated SQL query. It also put all the values held by the numbers array into the SQL query:
    SELECT
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1],
        [Extent1].[ContactID] AS [ContactID],
        [Extent1].[FirstName] AS [FirstName],
        [Extent1].[LastName] AS [LastName],
        [Extent1].[Title] AS [Title],
        [Extent1].[AddDate] AS [AddDate],
        [Extent1].[ModifiedDate] AS [ModifiedDate],
        [Extent1].[RowVersion] AS [RowVersion],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[CustomerTypeID] END AS [C2],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[InitialDate] END AS [C3],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[PrimaryDesintation] END AS [C4],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[SecondaryDestination] END AS [C5],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[PrimaryActivity] END AS [C6],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[SecondaryActivity] END AS [C7],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[Notes] END AS [C8],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[RowVersion] END AS [C9],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[BirthDate] END AS [C10],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[HeightInches] END AS [C11],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[WeightPounds] END AS [C12],
        CASE WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[DietaryRestrictions] END AS [C13]
    FROM [dbo].[Contact] AS [Extent1]
    LEFT OUTER JOIN (SELECT
            [Extent2].[ContactID] AS [ContactID],
            [Extent2].[BirthDate] AS [BirthDate],
            [Extent2].[HeightInches] AS [HeightInches],
            [Extent2].[WeightPounds] AS [WeightPounds],
            [Extent2].[DietaryRestrictions] AS [DietaryRestrictions],
            [Extent3].[CustomerTypeID] AS [CustomerTypeID],
            [Extent3].[InitialDate] AS [InitialDate],
            [Extent3].[PrimaryDesintation] AS [PrimaryDesintation],
            [Extent3].[SecondaryDestination] AS [SecondaryDestination],
            [Extent3].[PrimaryActivity] AS [PrimaryActivity],
            [Extent3].[SecondaryActivity] AS [SecondaryActivity],
            [Extent3].[Notes] AS [Notes],
            [Extent3].[RowVersion] AS [RowVersion],
            cast(1 as bit) AS [C1]
        FROM [dbo].[ContactPersonalInfo] AS [Extent2]
        INNER JOIN [dbo].[Customers] AS [Extent3] ON [Extent2].[ContactID] = [Extent3].[ContactID]) AS [Project1]
        ON [Extent1].[ContactID] = [Project1].[ContactID]
    LEFT OUTER JOIN (SELECT TOP (1) [c].[C1] AS [C1] FROM
        (SELECT [UnionAll3].[C1] AS [C1] FROM
            (SELECT [UnionAll2].[C1] AS [C1] FROM
                (SELECT [UnionAll1].[C1] AS [C1] FROM
                    (SELECT 1 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable1]
                     UNION ALL
                     SELECT 2 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable2]) AS [UnionAll1]
                 UNION ALL
                 SELECT 3 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable3]) AS [UnionAll2]
             UNION ALL
             SELECT 4 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable4]) AS [UnionAll3]
         UNION ALL
         SELECT 5 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable5]) AS [c]) AS [Limit1]
        ON 1 = 1
    LEFT OUTER JOIN (SELECT TOP (1) [c].[C1] AS [C1] FROM
        (SELECT [UnionAll7].[C1] AS [C1] FROM
            (SELECT [UnionAll6].[C1] AS [C1] FROM
                (SELECT [UnionAll5].[C1] AS [C1] FROM
                    (SELECT 1 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable6]
                     UNION ALL
                     SELECT 2 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable7]) AS [UnionAll5]
                 UNION ALL
                 SELECT 3 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable8]) AS [UnionAll6]
             UNION ALL
             SELECT 4 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable9]) AS [UnionAll7]
         UNION ALL
         SELECT 5 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable10]) AS [c]) AS [Limit2]
        ON 1 = 1
    CROSS JOIN (SELECT MAX([UnionAll12].[C1]) AS [A1] FROM
        (SELECT [UnionAll11].[C1] AS [C1] FROM
            (SELECT [UnionAll10].[C1] AS [C1] FROM
                (SELECT [UnionAll9].[C1] AS [C1] FROM
                    (SELECT 1 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable11]
                     UNION ALL
                     SELECT 2 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable12]) AS [UnionAll9]
                 UNION ALL
                 SELECT 3 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable13]) AS [UnionAll10]
             UNION ALL
             SELECT 4 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable14]) AS [UnionAll11]
         UNION ALL
         SELECT 5 AS [C1] FROM (SELECT 1 AS X) AS [SingleRowTable15]) AS [UnionAll12]) AS [GroupBy1]
    WHERE [Extent1].[ContactID] IN ([GroupBy1].[A1], (CASE WHEN ([Limit1].[C1] IS NULL) THEN 0 ELSE [Limit2].[C1] END))

Based on this, is it possible that the Linq2Entities provider indeed doesn't execute non-Linq and Linq-to-Objects methods, but instead creates equivalent SQL statements for some of them (and for others it throws an exception)?

Thank you in advance
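One way to picture the idea raised in the question is a translator that walks a tree of nodes and emits SQL text instead of calling the methods the nodes describe. The Python sketch below is only a toy model under that assumption; it is not Entity Framework code, and every class and name in it (`Column`, `Constant`, `MethodCall`, `SUPPORTED_METHODS`, `to_sql`) is hypothetical. Methods with a known SQL equivalent are translated, and anything else raises an error rather than being executed.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str


@dataclass
class Constant:
    value: object


@dataclass
class MethodCall:
    method: str
    argument: object


@dataclass
class Equals:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


# Hypothetical translation table: method name -> SQL aggregate function.
SUPPORTED_METHODS = {"Max": "MAX", "Min": "MIN"}


def to_sql(node):
    """Translate a tree node into SQL text; no method in the tree is ever run."""
    if isinstance(node, Column):
        return f"[{node.name}]"
    if isinstance(node, Constant):
        if isinstance(node.value, (list, tuple)):
            # An in-memory collection is inlined as a chain of single-row SELECTs.
            return " UNION ALL ".join(f"SELECT {v} AS v" for v in node.value)
        return str(node.value)
    if isinstance(node, MethodCall):
        if node.method not in SUPPORTED_METHODS:
            raise NotImplementedError(f"no SQL translation for {node.method}")
        aggregate = SUPPORTED_METHODS[node.method]
        return f"(SELECT {aggregate}(v) FROM ({to_sql(node.argument)}) AS t)"
    if isinstance(node, Equals):
        return f"{to_sql(node.left)} = {to_sql(node.right)}"
    if isinstance(node, Or):
        return f"({to_sql(node.left)} OR {to_sql(node.right)})"
    raise TypeError(f"unknown node type: {type(node).__name__}")


numbers = Constant([1, 2, 3, 4, 5])
predicate = Equals(Column("ContactID"), MethodCall("Max", numbers))
sql = to_sql(predicate)
```

The resulting string has the same overall shape as the generated query above: the array values appear as a UNION ALL chain and the Max call becomes a MAX aggregate over it.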


Eager Loading more than 1 table in LinqtoSql

- by Michael Freidgeim

When I tried to load a table with two child tables in Linq2Sql, I noticed that multiple SQL statements were generated. I found that this is a known issue: if you try to specify more than one relationship to pre-load, it just picks one to pre-load and leaves the others deferred (simply ignoring those LoadWith hints).

There are more explanations in http://codebetter.com/blogs/david.hayden/archive/2007/08/06/linq-to-sql-query-tuning-appears-to-break-down-in-more-advanced-scenarios.aspx

"The reason the relationship in your blog post above is generating multiple queries is that you have two (1:n) relationship (Customers->Orders) and (Orders->OrderDetails). If you just had one (1:n) relationship (Customer->Orders) or (Orders->OrderDetails) LINQ to SQL would optimize and grab it in one query (using a JOIN)."

The alternative is to use SQL and POCO classes; see http://stackoverflow.com/questions/238504/linq-to-sql-loading-child-entities-without-using-dataloadoptions?rq=1

Fortunately, the problem does not apply to Entity Framework, which we want to use instead of Linq2Sql in future development:

    Product firstProduct = db.Product.Include("OrderDetail").Include("Supplier").First();
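The difference between deferred loading and a single joined query can be seen without any ORM. The sketch below is an illustration in Python with an in-memory SQLite database, not Linq2Sql itself; the tables, the `run` helper and the statement counter are all hypothetical and exist only to count how many statements each strategy sends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

statements_sent = 0


def run(sql, params=()):
    """Execute a statement and count it, so both strategies can be compared."""
    global statements_sent
    statements_sent += 1
    return conn.execute(sql, params).fetchall()


def load_deferred():
    """One query for the parents, then one more query per parent row."""
    result = {}
    for customer_id, name in run("SELECT id, name FROM customers ORDER BY id"):
        rows = run(
            "SELECT id FROM orders WHERE customer_id = ? ORDER BY id", (customer_id,)
        )
        result[name] = [order_id for (order_id,) in rows]
    return result


def load_joined():
    """A single JOIN that returns parents and children together."""
    result = {}
    rows = run(
        "SELECT c.name, o.id FROM customers c "
        "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.id"
    )
    for name, order_id in rows:
        result.setdefault(name, [])
        if order_id is not None:
            result[name].append(order_id)
    return result


statements_sent = 0
deferred = load_deferred()
deferred_count = statements_sent

statements_sent = 0
joined = load_joined()
joined_count = statements_sent
```

Both strategies return the same data, but the deferred version needs one statement plus one per customer, while the joined version needs a single statement.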


VB.NET Class Library: DBIO dBase

A plug-in database I/O layer for dBase databases with automatic SQL statements


My First Post @ geekswithblogs

- by sathya

Dear Friends,

Here is my first post on geekswithblogs. I am happy that I have got a separate space here to blog. I am an MCTS-certified professional in .NET 2.0 web applications, working as a Senior Software Engineer, and I am willing to share my knowledge on every topic I know. I am also an active presenter / speaker in the Microsoft Developer User Group HyderabadTechies, and I have presented many online sessions there. I keep myself updated on the latest Microsoft technologies.

You can see my posts here on the following subjects:

- C#
- ASP.NET
- SQL Server
- SQL Server Integration Services (SSIS)
- SQL Server Analysis Services (SSAS)
- SQL Server Reporting Services (SSRS)

I have a personal blog too, where I share my knowledge. Please take note of it: http://cybersathya.blogspot.com

You will often see me here posting updates on technologies, the technical challenges I have faced, and the solutions to them.

Stay Tuned !!!

Regards
Sathya Narayanan Srinivasan


Could not retrieve backup settings for primary ID in Log shipping

- by user1723139

I am doing log shipping between two Amazon EC2 instances running Windows Server 2008 R2 with SQL Server 2008 R2 Standard Edition. Both instances are in the same domain, and I can access the shared folders between the instances. The SQL Server service account and the agent service account both run under a domain account. When I activate log shipping (with standby mode restore on the secondary server), the initial backup gets restored on the secondary. After that the backup operation fails and I get the following error message:

    *** Error: Could not retrieve backup settings for primary ID 'xxxxxx-xxxx-xxxx-xxxx-4d772cd7337e'.(Microsoft.SqlServer.Management.LogShipping) ***
    *** Error: Failed to connect to server IP-0A7653F2.(Microsoft.SqlServer.ConnectionInfo) ***
    *** Error: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)(.Net SqlClient Data Provider) ***
    ----- END OF TRANSACTION LOG BACKUP -----

Any ideas?


ETPM Environment Health Monitoring Tools

- by Paula Speranza-Hadley

This post provides some useful information about the tools typically used by Oracle ETPM implementations for performance tuning and analysis. This includes tools to monitor and gather performance information and statistics on the Database, Application Server, and Client (browser).

Enterprise Monitoring Tools

Oracle Enterprise Manager - OEM Grid Control comes with a comprehensive set of performance and health metrics that allow monitoring of key components in your environment such as applications, application servers and databases, as well as the back-end components on which they rely, such as hosts, operating systems and storage.

Tools for the Database

Oracle Diagnostics Pack

- Automatic Workload Repository (AWR) - this tool gets statistics from memory about the Time Model or DB Time, Wait Events, Active Session History and High Load SQL queries.
- Automatic Database Diagnostic Monitor (ADDM) - this self-diagnostic software is built into the database. It examines and analyzes data captured in AWR to determine possible performance issues. It locates the root cause of the issue, provides recommendations for correcting the issues and quantifies the expected benefit.

Oracle Database Tuning Pack

- SQL Tuning Advisor - this enables you to submit one or more SQL statements as input and receive output in the form of specific advice or recommendations on how to tune the statements. The recommendations relate to the collection of statistics on objects, the creation of new indexes and the restructuring of SQL statements.
- SQL Access Advisor - this enables you to optimize the data access paths of SQL queries by recommending a proper set of materialized views, indexes and partitions for a given SQL workload.

Tools for the Application Server

WebLogic Console - a web-based user interface used to configure and control a set of WebLogic servers or clusters (i.e. a "domain"). In any logical group of WebLogic servers there must exist one admin server, which hosts the WebLogic Admin console application and manages the associated configuration files. WebLogic administrators will use the Administration Console for a number of tasks, including:

- Starting and stopping WebLogic servers or entire clusters.
- Configuring server parameters, security, database connections and deployed applications.
- Viewing server status, health and metrics.

YourKit for Profiling - helps analyze synchronization issues, including:

- Which threads were calling wait(), and for how long
- Which threads were blocked on an attempt to acquire a monitor held by another thread (synchronized methods/blocks), and for how long

Tools for the Client

Fiddler - allows you to inspect traffic logs, debug and set breakpoints.

Firebug - allows you to inspect and edit HTML, monitor network activity and debug JavaScript.


OBJECT_Name parameters and dbid

- by steveh99999

If you've been using SQL Server for a long time, you may be used to using the OBJECT_NAME system function, which is especially useful for converting table IDs into table names when querying sysobjects and sysindexes. However, if you're an old-school DBA, did you know that since SQL 2005 Service Pack 2 it accepts a second parameter, database_id?

For example, this can be used to summarize some useful information from sys.dm_exec_query_stats. When reviewing SQL Server performance, it can be useful to look at the most heavily used stored procedures rather than at inefficient but less frequently used procedures. Here's a query to summarize performance data on the most heavily used stored procedures across all databases on a server:

    SELECT TOP 20
        DENSE_RANK() OVER (ORDER BY SUM(execution_count) DESC) AS rank,
        OBJECT_NAME(qt.objectid, qt.dbid) AS 'proc name',
        (CASE WHEN qt.dbid = 32767 THEN 'mssqlresource' ELSE DB_NAME(qt.dbid) END) AS 'Database',
        OBJECT_SCHEMA_NAME(qt.objectid, qt.dbid) AS 'schema',
        SUM(execution_count) AS 'TotalExecutions',
        -- the DMV reports times in microseconds, so divide to get milliseconds
        SUM(total_worker_time) / 1000 AS 'TotalCPUTimeMS',
        SUM(total_elapsed_time) / 1000 AS 'TotalRunTimeMS',
        SUM(total_logical_reads) AS 'TotalLogicalReads',
        SUM(total_logical_writes) AS 'TotalLogicalWrites',
        MIN(creation_time) AS 'earliestPlan',
        MAX(last_execution_time) AS 'lastExecutionTime'
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
    WHERE OBJECT_NAME(qt.objectid, qt.dbid) IS NOT NULL
    GROUP BY OBJECT_NAME(qt.objectid, qt.dbid), qt.dbid, OBJECT_SCHEMA_NAME(qt.objectid, qt.dbid)
    -- without an ORDER BY, TOP 20 would return an arbitrary 20 rows
    ORDER BY SUM(execution_count) DESC


Windows Azure Recipe: Big Data

- by Clint Edmonson

As the name implies, what we're talking about here is the explosion of electronic data that comes from huge volumes of transactions, devices, and sensors being captured by businesses today. This data often comes in unstructured formats and/or too fast for us to effectively process in real time. Collectively, we call these the 4 big data V's: Volume, Velocity, Variety, and Variability. These qualities make this type of data best managed by NoSQL systems like Hadoop, rather than by a conventional Relational Database Management System (RDBMS).

We know that there are patterns hidden inside this data that might provide competitive insight into market trends. The key is knowing when and how to leverage these "No SQL" tools combined with traditional business tools such as SQL-based relational databases and warehouses and other business intelligence tools.

Drivers

- Petabyte scale data collection and storage
- Business intelligence and insight

Solution

The sketch below shows one of many big data solutions using Hadoop's unique highly scalable storage and parallel processing capabilities combined with Microsoft Office's Business Intelligence Components to access the data in the cluster.

Ingredients

Hadoop - this big data industry heavyweight provides both large scale data storage infrastructure and a highly parallelized map-reduce processing engine to crunch through the data efficiently. Here are the key pieces of the environment:

- Pig - a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
- Mahout - a machine learning library with algorithms for clustering, classification and batch based collaborative filtering that are implemented on top of Apache Hadoop using the map/reduce paradigm.
- Hive - data warehouse software built on top of Apache Hadoop that facilitates querying and managing large datasets residing in distributed storage. Directly accessible to Microsoft Office and other consumers via add-ins and the Hive ODBC data driver.
- Pegasus - a peta-scale graph mining system that runs in a parallel, distributed manner on top of Hadoop and that provides algorithms for important graph mining tasks such as Degree, PageRank, Random Walk with Restart (RWR), Radius, and Connected Components.
- Sqoop - a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases.
- Flume - a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data to HDFS.

Database - directly accessible to Hadoop via the Sqoop based Microsoft SQL Server Connector for Apache Hadoop, data can be efficiently transferred to traditional relational data stores for replication, reporting, or other needs.

Reporting - provides easily consumable reporting when combined with a database being fed from the Hadoop environment.

Training

These links point to online Windows Azure training labs where you can learn more about the individual ingredients described above.

- Hadoop Learning Resources (20+ tutorials and labs): a huge collection of resources for learning about all aspects of Apache Hadoop-based development on Windows Azure and the Hadoop and Windows Azure ecosystems.
- SQL Azure (7 labs): Microsoft SQL Azure delivers on the Microsoft Data Platform vision of extending the SQL Server capabilities to the cloud as web-based services, enabling you to store structured, semi-structured, and unstructured data.

See my Windows Azure Resource Guide for more guidance on how to get started, including links to web portals, training kits, samples, and blogs related to Windows Azure.
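For readers new to the map/reduce paradigm mentioned above, here is a single-process Python sketch of its three steps (map, shuffle, reduce) applied to a word count. It is only a conceptual illustration and does not use Hadoop or any of its APIs; the function names and the sample records are made up for the example, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (key, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical sample input: three short log records.
records = ["sensor A ok", "sensor B fail", "sensor A ok"]

mapped = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Each map call only looks at one record and each reduce call only looks at one key, which is what lets a framework spread the work over a cluster.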


Roadmap for Thinktecture IdentityServer

- by Your DisplayName here!

I got asked today if I could publish a roadmap for thinktecture IdentityServer (idrsv in short). Well, I got a lot of feedback after B1, and one of the biggest points was the data access layer. So I made two changes:

- I moved the configuration database access code to EF 4.1 code first. That makes it much easier to change the underlying database, so it is now just a matter of changing the connection string to use real SQL Server instead of SQL Compact. This is important when you plan to scale out.
- I included the ASP.NET Universal Providers in the download. This adds official support for SQL Azure, SQL Server and SQL Compact for the membership, roles and profile features. Unfortunately the Universal Providers use a different schema than the original ASP.NET providers (that sucks btw!), so I made them optional. If you want to use them, go to web.config and uncomment the new provider.

Then there are some other small changes:

- The relying party registration entries now have added fields for extra data that you want to couple with the RP. One use case could be to give the UI a hint about how the login experience should look per RP. This allows a different look and feel for different relying parties. I also included a small helper API that you can use to retrieve the RP record based on the incoming WS-Federation query string.
- WS-Federation single sign out now conforms to the spec.
- I made certificate based endpoint identities for SSL endpoints optional. They caused some problems with configuration and versioning of existing clients.

I hope I can release the RC in the next few days. If there are no major issues, there will be an RTM very soon!


Logging connection strings

If you use some of the dynamic features of SSIS, such as package configurations or property expressions, then trying to work out where your connections are pointing can sometimes be a bit confusing. You will work it out in the end, but it can be useful to explicitly log this information so that when things go wrong you can just review the logs. You may wish to develop this idea further and encapsulate such logging into a custom task, but for now let's keep it simple and use the Script Task. The Script Task code below will raise an Information event showing the name and connection string for a connection.

    Imports System
    Imports Microsoft.SqlServer.Dts.Runtime

    Public Class ScriptMain

        Public Sub Main()
            Dim fireAgain As Boolean

            ' Get the connection string, we need to know the name of the connection
            Dim connectionName As String = "My OLE-DB Connection"
            Dim connectionString As String = Dts.Connections(connectionName).ConnectionString

            ' Format the message and log it via an information event
            Dim message As String = String.Format("Connection ""{0}"" has a connection string of ""{1}"".", _
                connectionName, connectionString)
            Dts.Events.FireInformation(0, "Information", message, Nothing, 0, fireAgain)

            Dts.TaskResult = Dts.Results.Success
        End Sub

    End Class

Building on that example, it is probably more flexible to log all connections in a package, as shown in the next example.

    Imports System
    Imports Microsoft.SqlServer.Dts.Runtime

    Public Class ScriptMain

        Public Sub Main()
            Dim fireAgain As Boolean

            ' Loop through all connections in the package
            For Each connection As ConnectionManager In Dts.Connections
                ' Get the connection string and log it via an information event
                Dim message As String = String.Format("Connection ""{0}"" has a connection string of ""{1}"".", _
                    connection.Name, connection.ConnectionString)
                Dts.Events.FireInformation(0, "Information", message, Nothing, 0, fireAgain)
            Next

            Dts.TaskResult = Dts.Results.Success
        End Sub

    End Class

Using the Information event makes the message readily available in the designer, for example in the Visual Studio Output window (Ctrl+Alt+O) or the package designer Execution Results tab, and also allows you to control the logging by choosing which events to log in the normal way.

Now before somebody starts commenting that this is a security risk, I would like to highlight good practice for building connection managers. Firstly, the Password property, or any other similar sensitive property, is always defined as write-only, and secondly, the connection string property only uses the public properties to assemble the connection string value when requested. In other words, the connection string will never contain the password. I have seen a couple of cases where this is not true, but that was just bad development by third parties; you won't find anything like that in the box from Microsoft.

Whilst writing this code it made me wish that there was a custom log entry that you could just turn on that did this for you, but alas connection managers do not even seem to support custom events. It did however remind me of a very useful event that is often overlooked and fits rather well alongside connection string logging: the Execute SQL Task's custom ExecuteSQLExecutingQuery event. To quote the help reference, Custom Messages for Logging:

    Provides information about the execution phases of the SQL statement. Log entries are written when the task acquires connection to the database, when the task starts to prepare the SQL statement, and after the execution of the SQL statement is completed. The log entry for the prepare phase includes the SQL statement that the task uses.

It is the last part that is so useful. How often have you used an expression to derive a SQL statement and wanted to log it to make sure the correct SQL is being returned? You need to turn it on, as by default no custom log events are captured, but I'll refer you to a walkthrough on setting up the logging for ExecuteSQLExecutingQuery by Jamie.

Read the article

High Jinks, Hi Jacks, Exceptional DBA Awards and PASS

- by Rodney

The countdown to PASS has counted down. The day after tomorrow I will board a plane, like many others, on my way for the 4th year in a row to SQL PASS Summit. The anticipation has been excruciating, but luckily I have this little thing called a day job as a DBA that has kept me busy and not thinking too much about the event. Well, that is not exactly true, since my beautiful wife works for PASS, so we get to talk about SQL from the time we wake up until late in the evening. I would not have it any other way, and I feel very fortunate to be a part of this great event and to have been chosen as the Exceptional DBA Award judge, also for the 4th year in a row. This year I have again been tasked with presenting the award to the winner, Mr. Jeff Moden, and it will be a true honor to meet him in person, as I have read many of his articles on SSC and have attended his session at PASS previously. The speech is all ready but one item remains, which will be a surprise to all who attend the party on Tuesday night in Seattle (see links below).

Let's face it, Exceptional DBAs everywhere work very hard protecting our data stores, tuning queries, mentoring, saving money, installing clusters, etc., and once in a while there is time to be exceptionally non-professional and have a bit of fun. One incident that happened this year that falls under the High Jinks category was when my network admin asked if I could Telnet into a SQL instance and see if I could make the connection through the firewall that he had just configured. I was able to establish a connection on port 1433, and it occurred to me that it would be very interesting if I could actually run T-SQL queries via a Telnet session, much like you might do with an SMTP server. With that thought, I proceeded to demonstrate this could be possible by convincing my senior DBA Shawn McGehee that I was able to do so. At first he did not believe me. It shook his world view. It was inconceivable.
What I had done, behind the scenes, of course, was to copy and rename SQLCMD.exe to Telnet.exe and use it to connect and run a simple "Select * from sys.databases" on the SQL instance. I think if it had been anyone other than Shawn I could have extended this ruse indefinitely, but he caught on within 30 seconds. It was a fun thirty seconds though.

On the High Jacks side of the house, which is really meant to be SQL HACKS: after several years of struggling with how to connect to an untrusted domain (like in a DMZ) with a Windows account in SSMS, I finally stumbled upon a solution that does away with the requirement to use SQL Authentication. While "Runas" is a great command to use to run an application with a higher privileged account, I had not previously been able to figure out how to connect to the remote domain with SSMS and "Runas". It never connected and caused a login failure every time for the remote Windows domain account. Then I ran across an option for "Runas", "/netonly". This option postpones the login until a connection is made and only then passes the remote login you supply when you first launch SSMS with the "Runas" command. So a typical shortcut would look like:

    C:\Windows\System32\runas.exe /netonly /user:remotedomain.com\rodlandrum "C:\Program Files\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\Ssms.exe"

You will want to make sure the passwords are synced between the two domains, your local domain and the remote domain, otherwise you may have account lockout issues, but I have found in weeks of testing that this is a stable solution.

Now it is time to get ready to head for Seattle. Please, if you see me (@SQLBeat) or my wife (@Karlakay22), run up and high five me (wait... High Jinks. High Jacks. High Fives. Need to change the title) or give me a big bear hug if you are strong enough to lift me off the ground.
And if you do actually do that, I will think you are awesome and will not embarrass you by crying out for help or complaining of a broken back or sciatic nerve damage. And now the links to others who have all of the details. First, for the MVP Deep Dives 2, of which, like John, I was lucky enough to be able to participate in this year. http://www.simple-talk.com/community/blogs/johnm/archive/2011/09/29/103577.aspx And the details of the SSC party where the Exceptional DBA of 2011, Jeff Moden, will be awarded. http://www.simple-talk.com/community/blogs/rebecca_amos/archive/2011/10/05/103661.aspx Cheers! Rodney

Read the article

Attunity Oracle CDC Solution for SSIS - Beta

We in no way work for Attunity, but we were asked to test drive a beta version of their Oracle CDC solution for SSIS. Everybody should know that moving more data than you need to takes too much time and uses resources that may better be employed doing something else. Change Data Capture is a technology designed to help you identify only the data that has had something done to it, so that you can move only what is needed. Microsoft have implemented this exact functionality in SQL Server 2008 and I really like it there. Attunity though are doing it on Oracle.

DISCLAIMER: This is a BETA release and some of the parts are a bit ugly/difficult to work with. The idea though is definitely right, and the product once working does exactly what it says on the tin. They have always been helpful to me when I have had a problem with the product, and if that continues then beta testing pain should be eased somewhat. In due course I am going to be doing some videos of me using the product. If you use Oracle and SSIS then give it a go.

Here is their product description.

Attunity is a Microsoft SQL Server technology partner and the creator of the Microsoft Connectors for Oracle and Teradata, currently available in SQL Server 2008 Enterprise Edition. Attunity released a beta version of the Attunity Oracle-CDC for SSIS, a product that integrates continually changing Oracle data into SSIS, efficiently and in real-time. Attunity designed the product and integrated it into SSIS to simplify the creation of change data capture (CDC) solutions, accelerate implementation time, and reduce resources and costs. They also utilize log-based CDC so the solution has minimal impact on the Oracle source system. You can use the product to implement enterprise-class data replication, synchronization, and real-time business intelligence (BI) and data warehousing projects, quickly and efficiently, leveraging their existing SQL Server investments and resource skills.
Attunity architected the product specifically for the Microsoft SSIS developer community and the product is available for both SQL Server 2005 and SQL Server 2008. It offers the following key capabilities:

· Log-based, non-intrusive Oracle CDC
· Full integration into SSIS and the Business Intelligence Developer Studio
· Automatic generation of SSIS packages for CDC as well as full-loads of Oracle data
· Filtering of Oracle tables and columns at the source
· Monitoring and control of CDC processing

Click to learn more and download the beta.

Read the article

Instead of alter table column to turn IDENTITY on and off, turn IDENTITY_INSERT on and off

- by Kevin Shyr

First of all, I don't know which version of SQL this post (http://www.techonthenet.com/sql/tables/alter_table.php) is based on, but at least for Microsoft SQL Server 2008, the syntax is not:

    ALTER TABLE [table_name] MODIFY [column_name] [data_type] NOT NULL;

Instead, it should be:

    ALTER TABLE [table_name] ALTER COLUMN [column_name] [data_type] NOT NULL;

Then, as several posts point out, you can't use T-SQL to turn an existing column into an IDENTITY column. Instead, use IDENTITY_INSERT to copy data from other tables.

http://msdn.microsoft.com/en-us/library/ms188059.aspx

    SET IDENTITY_INSERT [table_name] ON
    INSERT ....
    SET IDENTITY_INSERT [table_name] OFF

http://www.sqlservercentral.com/Forums/Topic126147-8-1.aspx
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=65257

Read the article

Indexing data from multiple tables with Oracle Text

- by Roger Ford

It's well known that Oracle Text indexes perform best when all the data to be indexed is combined into a single index. The query

    select * from mytable where contains (title, 'dog') > 0 or contains (body, 'cat') > 0

will tend to perform much worse than

    select * from mytable where contains (text, 'dog WITHIN title OR cat WITHIN body') > 0

For this reason, Oracle Text provides the MULTI_COLUMN_DATASTORE, which will combine data from multiple columns into a single index. Effectively, it constructs a "virtual document" at indexing time, which might look something like:

    <title>the big dog</title>
    <body>the ginger cat smiles</body>

This virtual document can be indexed using either AUTO_SECTION_GROUP, or by explicitly defining sections for title and body, allowing the query as expressed above. Note that we've used a column called "text". This might have been a dummy column added to the table simply to allow us to create an index on it, or we could have created the index on either of the "real" columns, title or body.

It should be noted that MULTI_COLUMN_DATASTORE doesn't automatically handle updates to the columns it uses. If you create the index on the column text, but specify that columns title and body are to be indexed, you will need to arrange triggers such that the text column is updated whenever title or body is altered.

That works fine for single tables. But what if we actually want to combine data from multiple tables? In that case there are two approaches which work well:

1. Create a real table which contains a summary of the information, and create the index on that using the MULTI_COLUMN_DATASTORE. This is simple and effective, but it does use a lot of disk space as the information to be indexed has to be duplicated.
2. Create our own "virtual" documents using the USER_DATASTORE. The user datastore allows us to specify a PL/SQL procedure which will be used to fetch the data to be indexed, returned in a CLOB, or occasionally in a BLOB or VARCHAR2. This PL/SQL procedure is called once for each row in the table to be indexed, and is passed the ROWID value of the current row being indexed. The actual contents of the procedure are entirely up to the owner, but it is normal to fetch data from one or more columns from database tables.

In both cases, we still need to take care of updates, making sure that we have all the triggers necessary to update the indexed column (and, in case 1, the summary table) whenever any of the data to be indexed gets changed.

I've written full examples of both these techniques, as SQL scripts to be run in the SQL*Plus tool. You will need to run them as a user who has the CTXAPP role and the CREATE DIRECTORY privilege. Part of the data to be indexed is a Microsoft Word file called "1.doc". You should create this file in Word, preferably containing the single line of text: "test document". This file can be saved anywhere, but the SQL scripts need to be changed so that the "create or replace directory" command refers to the right location. In the example, I've used C:\doc.

multi_table_indexing_1.sql : creates a summary table containing all the data, and uses multi_column_datastore
multi_table_indexing_2.sql : creates "virtual" documents using a procedure as a user_datastore
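To make the "virtual document" idea concrete, here is a minimal Python sketch (not Oracle code) of the kind of string that is conceptually assembled from one row before indexing. The helper name build_virtual_document and the dictionary row format are assumptions made up for illustration, and the sketch ignores any escaping or section handling that Oracle Text itself performs.

```python
def build_virtual_document(row, columns):
    """Wrap each selected column value in a tag named after the column.

    Hypothetical helper: it only mimics the shape of the "virtual
    document" described in the post, not Oracle Text's implementation.
    """
    parts = []
    for name in columns:
        value = row.get(name)
        if value is None:
            continue  # skip missing columns rather than emit empty sections
        parts.append(f"<{name}>{value}</{name}>")
    return " ".join(parts)


# One row with the two columns used in the post's example
row = {"title": "the big dog", "body": "the ginger cat smiles"}
doc = build_virtual_document(row, ["title", "body"])
print(doc)
```

A user datastore procedure plays a similar role on the database side: given a row, it returns one combined document, possibly built from several tables.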

Read the article

RoundhousE now supports Oracle, SQL2000

- by Robz / Fervent Coder

RoundhousE, the database migration software that is based on SQL scripts, has added support for Oracle and SQL 2000. There have also been numerous other little things, including better logging and a script run errors table. The script errors table captures what went wrong when/if your scripts are not quite up to par or there is some other issue. A special thanks goes out to http://twitter.com/PascalMestdach and http://twitter.com/jochenjonc. They worked hard on this and all I did was provide guidance and help bring it back to the trunk.

This is what an entry in the database looks like:

This is a preview of the new log:

    ==================================================
    Versioning
    ==================================================
    Attempting to resolve version from C:\code\roundhouse\code_drop\sample\deployment\_BuildInfo.xml using //buildInfo/version.
    Found version 0.5.0.188 from C:\code\roundhouse\code_drop\sample\deployment\_BuildInfo.xml.
    Migrating TestRoundhousE from version 0 to 0.5.0.188.
    Versioning TestRoundhousE database with version 0.5.0.188 based on http://roundhouse.googlecode.com/svn.
    ==================================================
    Migration Scripts
    ==================================================
    Looking for Update scripts in "C:\code\roundhouse\code_drop\sample\deployment\..\db\TestRoundhousE\up". These should be one time only scripts.
    --------------------------------------------------
    Running 0001_CreateTables.sql on (local) - TestRoundhousE.
    Running 0002_ChangeTable.sql on (local) - TestRoundhousE.
    Running 0003_TestBatchSplitter.sql on (local) - TestRoundhousE.
    --------------------------------------------------

But what are you waiting for? Head out and grab the latest release today!
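The log above shows numbered "one time only" scripts being run in order. As a rough illustration of that idea only, here is a Python sketch that picks the scripts still to be run and orders them by numeric prefix. The function name pending_scripts and the already_run set are hypothetical; this is not how RoundhousE itself is implemented.

```python
import re


def pending_scripts(script_names, already_run):
    """Return scripts not yet run, ordered by their numeric prefix."""
    def prefix(name):
        match = re.match(r"(\d+)", name)
        if match is None:
            raise ValueError(f"script name has no numeric prefix: {name}")
        return int(match.group(1))

    todo = [name for name in script_names if name not in already_run]
    return sorted(todo, key=prefix)


# Script names taken from the log; the "already run" record is assumed
scripts = [
    "0003_TestBatchSplitter.sql",
    "0001_CreateTables.sql",
    "0002_ChangeTable.sql",
]
to_run = pending_scripts(scripts, already_run={"0001_CreateTables.sql"})
print(to_run)
```

Keeping a record of what has already run is what makes such scripts safe to apply repeatedly against the same database.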

Read the article

What are some of the benefits of a "Micro-ORM"?

- by Wayne M

I've been looking into the so-called "Micro ORMs" like Dapper and (to a lesser extent, as it relies on .NET 4.0) Massive, as these might be easier to implement at work than a full-blown ORM, since our current system is highly reliant on stored procedures and would require significant refactoring to work with an ORM like NHibernate or EF.

What is the benefit of using one of these over a full-featured ORM? It seems like just a thin layer around a database connection that still forces you to write raw SQL. Perhaps I'm wrong, but I was always told the reason for ORMs in the first place is so you didn't have to write SQL; it could be automatically generated, especially for multi-table joins and mapping relationships between tables, which are a pain to do in pure SQL but trivial with an ORM.

For instance, looking at an example of Dapper:

    var connection = new SqlConnection(); // setup here...
    var person = connection.Query<Person>("select * from people where PersonId = @personId", new { PersonId = 42 });

How is that any different than using a handrolled ADO.NET data layer, except that you don't have to write the command, set the parameters and, I suppose, map the entity back using a Builder? It looks like you could even use a stored procedure call as the SQL string.

Are there other tangible benefits that I'm missing here where a Micro ORM makes sense to use? I'm not really seeing how it's saving anything over the "old" way of using ADO.NET except maybe a few lines of code. You still have to figure out what SQL you need to execute (which can get hairy) and you still have to map relationships between tables (the part that IMHO ORMs help the most with).
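For readers unfamiliar with what the "thin layer" in question actually does, here is a minimal sketch in Python of the core of a micro-ORM: run hand-written, parameterized SQL and map each result row onto a typed object by column name. It uses the standard sqlite3 module and an in-memory table purely for illustration; the query helper and the Person fields are assumptions and say nothing about how Dapper is implemented.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Person:
    person_id: int
    name: str


def query(connection, cls, sql, params=()):
    """Run parameterized SQL and map each row onto `cls` by column name."""
    cursor = connection.execute(sql, params)
    column_names = [description[0] for description in cursor.description]
    wanted = {f.name for f in fields(cls)}
    results = []
    for row in cursor.fetchall():
        record = dict(zip(column_names, row))
        # Ignore columns the class does not declare
        results.append(cls(**{k: v for k, v in record.items() if k in wanted}))
    return results


connection = sqlite3.connect(":memory:")
connection.execute("create table people (person_id integer primary key, name text)")
connection.execute("insert into people values (42, 'Ada')")

people = query(
    connection,
    Person,
    "select person_id, name from people where person_id = :person_id",
    {"person_id": 42},
)
print(people)
```

The SQL is still written by hand; what the layer removes is the command, parameter and row-to-object plumbing, which is the trade-off the question is weighing.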

Read the article

Is DQS-in-the-cloud on its way?

- by jamiet

LinkedIn profiles are always a useful place to find out what's really going on in Microsoft. Today I stumbled upon this little nugget from former SSIS product team member Matt Carroll:

    March 2012 – December 2012 (10 months) Redmond, WA
    Took ownership of the SQL 2012 Data Quality Services box product and re-architected and extended it to become a cloud service. Led team and managed product to add dynamic scale, security, multi-tenancy, deployment, logging, monitoring, and telemetry as well as creating new Excel add-in and new ecosystem experience around easily sharing and finding cleansing agents. Personally designed, coded, and unit tested in-memory trigram matching algorithm core to better performance, scale and maintainability. Delivered and supported successful private preview of the new service prior to SQL wide reorganization.
    http://www.linkedin.com/profile/view?id=9657184

Sounds as though a Data-Quality-Services-in-the-cloud (which I spoke of as being a useful addition to Microsoft's BI portfolio in my previous blog post, Thoughts on Power BI for Office 365) might be on its way some time in the future. And what's this SQL wide reorganization? Interesting stuff.

@Jamiet

Read the article

Virtual NIC on VM couldn't ping externally after Vmotion

- by ToreTrygg

Today I vmotioned 5 MS SQL 2005 servers over to a new DRS Cluster. All SQL servers use the "Production_LAN" network and a single virtual NIC of type "VMXNET 3". The first 4 SQL VMs (Windows 2003 Standard or Enterprise, 32-bit) vmotioned over without a hitch. The last SQL VM (Windows 2003 Standard R2, 64-bit) vmotioned over without error, but upon completion I could no longer ping the VM. I went into the VM and could not even ping the gateway; however, I could ping the loopback. This SQL server is extremely busy in comparison with the previous 4 VMs. I restarted the server and it came back up with the virtual NIC working just fine. The build of both servers (vmotioner and vmotionee) is ESX 4.0.0 175625, so pre-Update 1. Should I suspect the network switch/VM for possibly not updating the MAC table on the switch? Has anybody else ever had this issue, or know what may have caused it? Thank you!

Read the article

Math with Timestamp

- by Knut Vatsendvik

Got this little SQL quiz from a colleague. How to add or subtract exactly 1 second from a Timestamp? Sounded simple enough at first glance, but was a bit trickier than expected. If the data type had been a Date, we knew that we could add or subtract days, minutes or seconds using + or -:

    sysdate + 1            to add one day
    sysdate - (1 / 24)     to subtract one hour
    sysdate + (1 / 86400)  to add one second

Would the same arithmetic work with Timestamp as with Date? Let's test it out with the following query:

    SELECT systimestamp
         , systimestamp + (1 / 86400)
    FROM dual;
    ----------
    03.05.2010 22.11.50,240887 +02:00
    03.05.2010

The first result line shows us the system time down to fractions of seconds. The second result line shows the result as a Date (as used for date calculation), meaning that the granularity is now reduced to a second. By using the dump() function, we can confirm this with the following query:

    SELECT dump(systimestamp)
         , dump(systimestamp + (1 / 86400))
    FROM dual;
    ----------
    Typ=188 Len=20: 218,7,5,4,8,53,9,0,200,46,89,20,2,0,5,0,0,0,0,0
    Typ=13 Len=8: 218,7,5,4,10,53,10,0

Where typ=13 is a runtime representation for Date. So how can we keep the precision so that it includes fractions of a second? After investigating it a bit, we found out that the interval data type INTERVAL DAY TO SECOND could be used, with the result of the addition or subtraction being a Timestamp. Let's try our first query again, now using the interval data type.

    SELECT systimestamp,
           systimestamp + INTERVAL '0 00:00:01.0' DAY TO SECOND(1)
    FROM dual;
    ----------
    03.05.2010 22.58.32,723659000 +02:00
    03.05.2010 22.58.33,723659000 +02:00

Yes, it worked! To finish the story, here is one example showing how to specify an interval of 2 days, 6 hours, 30 minutes, 4 seconds and 111 thousandths of a second.

    INTERVAL '2 6:30:4.111' DAY TO SECOND(3)
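The same arithmetic can be cross-checked outside the database. The Python sketch below is only an analogy, not Oracle code: it adds a one-second interval to a timestamp with fractional seconds and parses a day-to-second literal of the form shown above. The parser name parse_day_to_second is made up for illustration and handles only the simple 'D H:M:S.FFF' shape.

```python
from datetime import datetime, timedelta


def parse_day_to_second(literal):
    """Parse a 'D H:M:S.FFF' style interval literal into a timedelta.

    Hypothetical helper covering only the simple form used in the post.
    """
    days_part, time_part = literal.split()
    hours, minutes, seconds = time_part.split(":")
    whole, _, fraction = seconds.partition(".")
    # Pad or trim the fraction to microseconds without float rounding
    microseconds = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    return timedelta(
        days=int(days_part),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(whole),
        microseconds=microseconds,
    )


# Timestamp taken from the post's second query result
stamp = datetime(2010, 5, 3, 22, 58, 32, 723659)
plus_one_second = stamp + timedelta(seconds=1)  # fraction is preserved
interval = parse_day_to_second("2 6:30:4.111")
print(plus_one_second, interval)
```

As with the INTERVAL version of the query, adding an interval type keeps the fractional seconds, whereas the numeric "fraction of a day" form in Oracle converts the value to a Date first.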

Read the article

Microsoft and innovation: IIF() method

This Saturday I was watching a couple of eLearning videos from TrainSignal (thanks to the subscription I have with Pluralsight) on Querying Microsoft SQL Server 2012 (exam 70-461).

'Innovation' by Microsoft

I kept myself busy learning 'new' things about Microsoft SQL Server 2012 and some best practices. It was incredibly 'innovative' to see that there is an additional logic function called IIF() available now:

    Returns one of two values depending on the value of a logical expression.
    IIF(lExpression, eExpression1, eExpression2)

Oops, my bad... That's actually taken from the syntax page of Visual FoxPro 9.0 SP 2. And tada, at least seven (7+) years later, there's the recent IIF() Transact-SQL version of that function:

    Returns one of two values, depending on whether the Boolean expression evaluates to true or false in SQL Server 2012.
    IIF ( boolean_expression, true_value, false_value )

Now, that's what I call innovation! But we all know what happened to Visual FoxPro... It has been reincarnated in the form of Visual Studio LightSwitch (and SQL Server). Enough ranting... Happy coding!

Read the article

Solving Big Problems with Oracle R Enterprise, Part II

- by dbayard

Part II – Solving Big Problems with Oracle R Enterprise

In the first post in this series (see https://blogs.oracle.com/R/entry/solving_big_problems_with_oracle), we showed how you can use R to perform historical rate of return calculations against investment data sourced from a spreadsheet. We demonstrated the calculations against sample data for a small set of accounts. While this worked fine, in the real world the problem is much bigger because the amount of data is much bigger. So much bigger that our approach in the previous post won't scale to meet the real-world needs. From our previous post, here are the challenges we need to conquer:

· The actual data that needs to be used lives in a database, not in a spreadsheet
· The actual data is much, much bigger: too big to fit into the normal R memory space and too big to want to move across the network
· The overall process needs to run fast, much faster than a single processor
· The actual data needs to be kept secured, another reason to not want to move it from the database and across the network
· And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRRs can be calculated as part of the data warehouse refresh processes

In this post, we will show how we moved from a sample data environment to working with full-scale data. This post is based on actual work we did for a financial services customer during a recent proof-of-concept.

Getting started with the Database

At this point, we have some sample data and our IRR function. We were at a similar point in our customer proof-of-concept exercise: we had sample data but we did not have the full customer data yet. So our database was empty. But this was easily rectified by leveraging the transparency features of Oracle R Enterprise (see https://blogs.oracle.com/R/entry/analyzing_big_data_using_the).
The following code shows how we took our sample data SimpleMWRRData and easily turned it into a new Oracle database table called IRR_DATA via ore.create(). The code also shows how we can access the database table IRR_DATA as if it were a normal R data.frame named IRR_DATA.

If we go to sql*plus, we can also check out our new IRR_DATA table:

At this point, we now have our sample data loaded in the database as a normal Oracle table called IRR_DATA. So, we now proceeded to test our R function working with database data. As our first test, we retrieved the data for a single account from the IRR_DATA table, pulled it into local R memory, then called our IRR function. This worked. No SQL coding required!

Going from Crawling to Walking

Now that we have shown using our R code with database-resident data for a single account, we wanted to experiment with doing this for multiple accounts. In other words, we wanted to implement the split-apply-combine technique we discussed in our first post in this series. Fortunately, Oracle R Enterprise provides a very scalable way to do this with a function called ore.groupApply(). You can read more about ore.groupApply() here: https://blogs.oracle.com/R/entry/analyzing_big_data_using_the1

Here is an example of how we ask ORE to take our IRR_DATA table in the database, split it by the ACCOUNT column, apply a function that calls our SimpleMWRR() calculation, and then combine the results. (If you are following along at home, be sure to have installed our myIRR package on your database server via "R CMD INSTALL myIRR".)

The interesting thing about ore.groupApply is that the calculation is not actually performed in my desktop R environment from which I am running. What actually happens is that ore.groupApply uses the Oracle database to perform the work. And the Oracle database is what actually splits the IRR_DATA table by ACCOUNT.
Then the Oracle database takes the data for each account and sends it to an embedded R engine running on the database server to apply our R function. Then the Oracle database combines all the individual results from the calls to the R function.

This is significant because now the embedded R engine only needs to deal with the data for a single account at a time. Regardless of whether we have 20 accounts or 1 million accounts or more, the R engine that performs the calculation does not care. Given that normal R has a finite amount of memory to hold data, the ore.groupApply approach overcomes the R memory scalability problem since we only need to fit the data from a single account in R memory (not all of the data for all of the accounts).

Additionally, the IRR_DATA does not need to be sent from the database to my desktop R program. Even though I am invoking ore.groupApply from my desktop R program, because the actual SimpleMWRR calculation is run by the embedded R engine on the database server, the IRR_DATA does not need to leave the database server. This is both a performance benefit, because network transmission of large amounts of data takes time, and a security benefit, because it is harder to protect private data once you start shipping it around your intranet.

Another benefit, which we will discuss in a few paragraphs, is the ability to leverage Oracle database parallelism to run these calculations for dozens of accounts at once.

From Walking to Running

ore.groupApply is rather nice, but it still has the drawback that I run this from a desktop R instance. This is not ideal for integrating into typical operational processes like nightly data warehouse refreshes or monthly statement generation. But this is not an issue for ORE. Oracle R Enterprise lets us run this from the database using regular SQL, which is easily integrated into standard operations. That is extremely exciting and the way we actually did these calculations in the customer proof.
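The split-apply-combine pattern described here can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not ORE code: group_apply stands in for the role ore.groupApply plays, and irr is a simple bisection-based stand-in for the post's SimpleMWRR function, which is not shown. The sketch assumes one cash flow per period and consecutive periods starting at 0.

```python
from collections import defaultdict


def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-9):
    """Periodic internal rate of return found by bisection.

    cash_flows[t] is the net flow at period t (assumed stand-in,
    not the SimpleMWRR function from the post).
    """
    def npv(rate):
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

    f_lo, f_hi = npv(lo), npv(hi)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid)
        if f_lo * f_mid <= 0:
            hi = mid              # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2.0


def group_apply(rows, func):
    """Split rows by account, apply func to each group, combine results."""
    groups = defaultdict(list)
    for account, period, amount in rows:            # split
        groups[account].append((period, amount))
    results = {}
    for account, flows in groups.items():
        ordered = [amount for _, amount in sorted(flows)]
        results[account] = func(ordered)            # apply, then combine
    return results


# Made-up sample rows: (account, period, net cash flow)
rows = [
    ("A", 0, -100.0), ("A", 1, 110.0),
    ("B", 0, -100.0), ("B", 1, 0.0), ("B", 2, 121.0),
]
rates = group_apply(rows, irr)
print(rates)
```

The point made in the text carries over: the function applied to each group only ever sees one account's data, so the per-call memory requirement does not grow with the number of accounts.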
As part of Oracle R Enterprise, it provides a SQL equivalent to ore.groupApply which it refers to as "rqGroupEval". To use rqGroupEval via SQL, there is a bit of simple setup needed. Basically, the Oracle Database needs to know the structure of the input table and the grouping column, which we are able to define using the database's pipeline table function mechanisms. Here is the setup script:

At this point, our initial setup of rqGroupEval is done for the IRR_DATA table. The next step is to define our R function to the database. We do that via a call to ORE's rqScriptCreate.

Now we can test it. The SQL you use to run rqGroupEval uses the Oracle database pipeline table function syntax. The first argument to irr_dataGroupEval is a cursor defining our input. You can add additional where clauses and subqueries to this cursor as appropriate. The second argument is any additional inputs to the R function. The third argument is the text of a dummy select statement. The dummy select statement is used by the database to identify the columns and datatypes to expect the R function to return. The fourth argument is the column of the input table to split/group by. The final argument is the name of the R function as you defined it when you called rqScriptCreate().

The Real-World Results

In our real customer proof-of-concept, we had more sophisticated calculation requirements than shown in this simplified blog example. For instance, we had to perform the rate of return calculations for 5 separate time periods, so the R code was enhanced to do so. In addition, some accounts needed a time-weighted rate of return to be calculated, so we extended our approach and added an R function to do that. And finally, there were also a few more real-world data irregularities that we needed to account for, so we added logic to our R functions to deal with those exceptions. For the full-scale customer test, we loaded the customer data onto a Half-Rack Exadata X2-2 Database Machine.
As our half-rack had 48 physical cores (and 96 threads if you consider hyperthreading), we wanted to take advantage of that CPU horsepower to speed up our calculations. To do so with ORE, it is as simple as leveraging the Oracle Database Parallel Query features. Let's look at the SQL used in the customer proof:

Notice that we use a parallel hint on the cursor that is the input to our rqGroupEval function. That is all we need to do to enable Oracle to use parallel R engines.

Here are a few screenshots of what this SQL looked like in the Real-Time SQL Monitor when we ran this during the proof of concept (hint: you might need to right-click on these images and view them full-screen to see the entire image):

From the above, you can notice a few things (numbers 1 thru 5 below correspond with highlighted numbers on the images above):

1. The SQL completed in 110 seconds (1.8 minutes)
2. We calculated rates of return for 5 time periods for each of 911k accounts (the number of actual rows returned by the IRRSTAGEGROUPEVAL operation)
3. We accessed 103m rows of detailed cash flow/market value data (the number of actual rows returned by the IRR_STAGE2 operation)
4. We ran with 72 degrees of parallelism spread across 4 database servers
5. Most of our 110 seconds was spent in the "External Procedure call" event

· On average, we performed 8,200 executions of our R function per second (911k accounts/110s)
· On average, each execution was passed 110 rows of data (103m detail rows/911k accounts)
· On average, we did 41,000 single time period rate of return calculations per second (each of the 8,200 executions of our R function did rate of return calculations for 5 time periods)
· On average, we processed over 900,000 rows of database data in R per second (103m detail rows/110s)

R + Oracle R Enterprise: Best of R + Best of Oracle Database

This blog post series started by describing a real
customer problem: how to perform a lot of calculations on a lot of data in a short period of time. While standard R proved to be a very good fit for writing the necessary calculations, the challenge of working with a lot of data in a short period of time remained. This blog post series showed how Oracle R Enterprise enables R to be used in conjunction with the Oracle Database to overcome the data volume and performance issues (as well as simplifying the operations and security issues). It also showed that we could calculate 5 time periods of rates of return for almost a million individual accounts in less than 2 minutes.

In a future post, we will take the same R function and show how Oracle R Connector for Hadoop can be used in the Hadoop world. In that next post, instead of having our data in an Oracle database, our data will live in Hadoop and we will show how to use the Oracle R Connector for Hadoop and other Oracle Big Data Connectors to move data between Hadoop, R, and the Oracle Database easily.
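The throughput averages quoted in the results can be reproduced from the three headline numbers in the post (911k accounts, 103m detail rows, 110 seconds). The short Python check below uses the rounded figures as stated, so its results are approximate; the variable names are just for illustration.

```python
# Rounded headline figures as quoted in the post
accounts = 911_000
detail_rows = 103_000_000
elapsed_seconds = 110
periods_per_account = 5

# One R function execution per account
executions_per_second = accounts / elapsed_seconds
rows_per_execution = detail_rows / accounts
calculations_per_second = executions_per_second * periods_per_account
rows_per_second = detail_rows / elapsed_seconds

print(f"executions per second:   {executions_per_second:,.0f}")
print(f"rows per execution:      {rows_per_execution:,.0f}")
print(f"calculations per second: {calculations_per_second:,.0f}")
print(f"rows per second:         {rows_per_second:,.0f}")
```

With these inputs the figures come out slightly above the rounded values in the post (roughly 8,280 executions per second and about 113 rows per execution), which is consistent with the post rounding down.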
