Search Results

Search found 74473 results on 2979 pages for 'data table'.

Page 7/2979 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

Why would you use data structures (ie Binary Trees, Linked Lists) in your jobs/side projects? [closed]

- by Chris2021

It seems to me that, for everyday use, more primitive data structures like arrays get the job done just as well as a binary tree would. My question is how common is to use these structures when writing code for projects at work or projects that you pursue in your free time? I understand the better insertion time/deletion time/sorting time for certain structures but would that really matter that much if you were working with a relatively small amount of data?

Read the article
Multi-statement Table Valued Function vs Inline Table Valued Function

- by AndyC

ie: CREATE FUNCTION MyNS.GetUnshippedOrders() RETURNS TABLE AS RETURN SELECT a.SaleId, a.CustomerID, b.Qty FROM Sales.Sales a INNER JOIN Sales.SaleDetail b ON a.SaleId = b.SaleId INNER JOIN Production.Product c ON b.ProductID = c.ProductID WHERE a.ShipDate IS NULL GO versus: CREATE FUNCTION MyNS.GetLastShipped(@CustomerID INT) RETURNS @CustomerOrder TABLE (SaleOrderID INT NOT NULL, CustomerID INT NOT NULL, OrderDate DATETIME NOT NULL, OrderQty INT NOT NULL) AS BEGIN DECLARE @MaxDate DATETIME SELECT @MaxDate = MAX(OrderDate) FROM Sales.SalesOrderHeader WHERE CustomerID = @CustomerID INSERT @CustomerOrder SELECT a.SalesOrderID, a.CustomerID, a.OrderDate, b.OrderQty FROM Sales.SalesOrderHeader a INNER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID INNER JOIN Production.Product c ON b.ProductID = c.ProductID WHERE a.OrderDate = @MaxDate AND a.CustomerID = @CustomerID RETURN END GO Is there an advantage to using one over the other? Is there certain scenarios when one is better than the other or are the differences purely syntactical? I realise the 2 example queries are doing different things but is there a reason I would write them in that way? Reading about them and the advantages/differences haven't really been explained. Thanks

Read the article
Queued Loadtest to remove Concurrency issues using Shared Data Service in OpenScript

- by stefan.thieme(at)oracle.com

Queued Processing to remove Concurrency issues in Loadtest ScriptsSome scripts act on information returned by the server, e.g. act on first item in the returned list of pending tasks/actions. This may lead to concurrency issues if the virtual users simulated in a load test scenario are not synchronized in some way.As the load test cases should be carried out in a comparable and straight forward manner simply cancel a transaction in case a collision occurs is clearly not an option. In case you increase the number of virtual users this approach would lead to a high number of requests for the early steps in your transaction (e.g. login, retrieve list of action points, assign an action point to the virtual user) but later steps would be rarely visited successfully or at all, depending on the application logic.A way to tackle this problem is to enqueue the virtual users in a Shared Data Service queue. Only the first virtual user in this queue will be allowed to carry out the critical steps (retrieve list of action points, assign an action point to the virtual user) in your transaction at any one time.Once a virtual user has passed the critical path it will dequeue himself from the head of the queue and continue with his actions. This does theoretically allow virtual users to run in parallel all steps of the transaction which are not part of the critical path.In practice it has been seen this is rarely the case, though it does not allow adding more than N users to perform a transaction without causing delays due to virtual users waiting in the queue. N being the time of the total transaction divided by the sum of the time of all critical steps in this transaction.While this problem can be circumvented by allowing multiple queues to act on individual segments of the list of actions, e.g. per country filter, ends with 0..9 filter, etc.This would require additional handling of these additional queues of slots for the virtual users at the head of the queue in order to maintain the mutually exclusive access to the first element in the list returned by the server at any one time of the load test. Such an improved handling of multiple queues and/or multiple slots is above the subject of this paper.Shared Data Services Pre-RequisitesStart WebLogic Server to host Shared Data ServicesYou will have to make sure that your WebLogic server is installed and started. Shared Data Services may not work if you installed only the minimal installation package for OpenScript. If however you installed the default package including OLT and OTM, you may follow the instructions below to start and verify WebLogic installation.To start the WebLogic Server deployed underneath of Oracle Load Testing and/or Oracle Test Manager you can go to your Start menu, Oracle Application Testing Suite and select the Restart Oracle Application Testing Suite Application Service entry from the Tools submenu.To verify the service has been started you can run the Microsoft Management Console for Services by Selecting Run from the Start Menu and entering services.msc. Look for the entry that reads Oracle Application Testing Suite Application Service, once it has changed it status from Starting to Started you can proceed to verify the login. Please note that this may take several minutes, I would say up to 10 minutes depending on the strength of your CPU horse-power.Verify WebLogic Server user credentialsYou will have to make sure that your WebLogic Server is installed and started. Next open the Oracle WebLogic Server Adminstration Console on http://localhost:8088/console.It may take a while until the application is deployed and started. It may display the following until the Administration Console has been deployed on the fly.Afterwards you can login using the username oats and the password that you selected during install time for your Application Testing Suite administrative purposes.This will bring up the Home page of you WebLogic Server. You have actually verified that you are able to login with these credentials already. However if you want to check the details, navigate to Security Realms, myrealm, Users and Groups tab.Here you could add users to your WebLogic Server which could be used in the later steps. Details on the Groups required for such a custom user to work are exceeding this quick overview and have to be selected with the WebLogic Server Adminstration Guide in mind.Shared Data Services pre-requisites for Load testingOpenScript Preferences have to be set to enable Encryption and provide a default Shared Data Service Connection for Playback.These are pre-requisites you want to use for load testing with Shared Data Services.Please note that the usage of the Connection Parameters (individual directive in the script) for Shared Data Services did not playback reliably in the current version 9.20.0370 of Oracle Load Testing (OLT) and encryption of credentials still seemed to be mandatory as well.General Encryption settingsSelect OpenScript Preferences from the View menu and navigate to the General, Encryption entry in the tree on the left. Select the Encrypt script data option from the list and enter the same password that you used for securing your WebLogic Server Administration Console.Enable global shared data access credentialsSelect OpenScript Preferences from the View menu and navigate to the Playback, Shared Data entry in the tree on the left. Enable the global shared data access credentials and enter the Address, User name and Password determined for your WebLogic Server to host Shared Data Services.Please note, that you may want to replace the localhost in Address with the hosts realname in case you plan to run load tests with Loadtest Agents running on remote systems.Queued Processing of TransactionsEnable Shared Data Services Module in Script PropertiesThe Shared Data Services Module has to be enabled for each Script that wants to employ the Shared Data Service Queue functionality in OpenScript. It can be enabled under the Script menu selecting Script Properties. On the Script Properties Dialog select the Modules section and check Shared Data to enable Shared Data Service Module for your script. Checking the Shared Data Services option will effectively add a line to your script code that adds the sharedData ScriptService to your script class of IteratingVUserScript.@ScriptService oracle.oats.scripting.modules.sharedData.api.SharedDataService sharedData;Record your scriptRecord your script as usual and then add the following things for Queue handling in the Initialize code block, before the first step and after the last step of your critical path and in the Finalize code block.The java code to be added at individual locations is explained in the following sections in full detail.Create a Shared Data Queue in InitializeTo create a Shared Data Queue go to the Java view of your script and enter the following statements to the initialize() code block.info("Create queueA with life time of 120 minutes");sharedData.createQueue("queueA", 120);This will create an instantiation of the Shared Data Queue object named queueA which is maintained for upto 120 minutes.If you want to use the code for multiple scripts, make sure to use a different queue name for each one here and in the subsequent steps. You may even consider to use a dynamic queueName based on filters of your result list being concurrently accessed.Prepare a unique id for each IterationIn order to keep track of individual virtual users in our queue we need to create a unique identifier from the virtual user id and the used username right after retrieving the next record from our databank file.getDatabank("Usernames").getNextDatabankRecord();getVariables().set("usernameValue1","VU_{{@vuid}}_{{@iterationnum}}_{{db.Usernames.Username}}_{{@timestamp}}_{{@random(10000)}}");String usernameValue = getVariables().get("usernameValue1");info("Now running virtual user " + usernameValue);As you can see from the above code block, we have set the OpenScript variable usernameValue1 to VU_{{@vuid}}_{{@iterationnum}}_{{db.Usernames.Username}}_{{@timestamp}}_{{@random(10000)}} which is a concatenation of the virtual user id and the iterationnumber for general uniqueness; as well as the username from our databank, the timestamp and a random number for making it further unique and ease spotting of errors.Not all of these fields are actually required to make it really unique, but adding the queue name may also be considered to help troubleshoot multiple queues.The value is then retrieved with the getVariables.get() method call and assigned to the usernameValue String used throughout the script.Please note that moving the getDatabank("Usernames").getNextDatabankRecord(); call to the initialize block was later considered to remove concurrency of multiple virtual users running with the same userid and therefor accessing the same "My Inbox" in step 6. This will effectively give each virtual user a userid from the databank file. Make sure you have enough userids to remove this second hurdle.Enqueue and attend Queue before Critical PathTo maintain the right order of virtual users being allowed into the critical path of the transaction the following pseudo step has to be added in front of the first critical step. In the case of this example this is right in front of the step where we retrieve the list of actions from which we select the first to be assigned to us.beginStep("[0] Waiting in the Queue", 0);{info("Enqueued virtual user " + usernameValue + " at the end of queueA");sharedData.offerLast("queueA", usernameValue);info("Wait until the user is the first in queueA");String queueValue1 = null;do {// we wait for at least 0.7 seconds before we check the head of the// queue. This is the time it takes one user to move through the// critical path, i.e. pass steps [5] Enter country and [6] Assign// to meThread.sleep(700);queueValue1 = (String) sharedData.peekFirst("queueA");info("The first user in queueA is currently: '" + queueValue1 + "' " + queueValue1.getClass() + " length " + queueValue1.length() );info("The current user is '"+ usernameValue + "' " + usernameValue.getClass() + " length " + usernameValue.length() + ": indexOf " + usernameValue.indexOf(queueValue1) + " equals " + usernameValue.equals(queueValue1) );} while ( queueValue1.indexOf(usernameValue) < 0 );info("Now the user is the first in queueA");}endStep();This will enqueue the username to the tail of our Queue. It will will wait for at least 700 milliseconds, the time it takes for one user to exit the critical path and then compare the head of our queue with it's username. This last step will be repeated while the two are not equal (indexOf less than zero). If they are equal the indexOf will yield a value of zero or larger and we will perform the critical steps.Dequeue after Critical PathAfter the virtual user has left the critical path and complete its last step the following code block needs to dequeue the virtual user. In the case of our example this is right after the action has been actually assigned to the virtual user. This will allow the next virtual user to retrieve the list of actions still available and in turn let him make his selection/assignment.info("Get and remove the current user from the head of queueA");String pollValue1 = (String) sharedData.pollFirst("queueA");The current user is removed from the head of the queue. The next one will now be able to match his username against the head of the queue.Clear and Destroy Queue for FinishWhen the script has completed, it should clear and destroy the queue. This code block can be put in the finish block of your script and/or in a separate script in order to clear and remove the queue in case you have spotted an error or want to reset the queue for some reason.info("Clear queueA");sharedData.clearQueue("queueA");info("Destroy queueA");sharedData.destroyQueue("queueA");The users waiting in queueA are cleared and the queue is destroyed. If you have scripts still executing they will be caught in a loop.I found it better to maintain a separate Reset Queue script which contained only the following code in the initialize() block. I use to call this script to make sure the queue is cleared in between multiple Loadtest runs. This script could also even be added as the first in a larger scenario, which would execute it only once at very start of the Loadtest and make sure the queues do not contain any stale entries.info("Create queueA with life time of 120 minutes");sharedData.createQueue("queueA", 120);info("Clear queueA");sharedData.clearQueue("queueA");This will create a Shared Data Queue instance of queueA and clear all entries from this queue.Monitoring QueueWhile creating the scripts it was useful to monitor the contents, i.e. the current first user in the Queue. The following code block will make sure the Shared Data Queue is accessible in the initialize() block.info("Create queueA with life time of 120 minutes");sharedData.createQueue("queueA", 120);In the run() block the following code will continuously monitor the first element of the Queue and write an informational message with the current username Value to the Result window.info("Monitor the first users in queueA");String queueValue1 = null;do {queueValue1 = (String) sharedData.peekFirst("queueA");if (queueValue1 != null)info("The first user in queueA is currently: '" + queueValue1 + "' " + queueValue1.getClass() + " length " + queueValue1.length() );} while ( true );This script can be run from OpenScript parallel to a loadtest performed by the Oracle Load Test.However it is not recommend to run this in a production loadtest as the performance impact is unknown. Accessing the Queue's head with the peekFirst() method has been reported with about 2 seconds response time by both OpenScript and OTL. It is advised to log a Service Request to see if this could be lowered in future releases of Application Testing Suite, as the pollFirst() and even offerLast() writing to the tail of the Queue usually returned after an average 0.1 seconds.Debugging QueueWhile debugging the scripts the following was useful to remove single entries from its head, i.e. the current first user in the Queue. The following code block will make sure the Shared Data Queue is accessible in the initialize() block.info("Create queueA with life time of 120 minutes");sharedData.createQueue("queueA", 120);In the run() block the following code will remove the first element of the Queue and write an informational message with the current username Value to the Result window.info("Get and remove the current user from the head of queueA");String pollValue1 = (String) sharedData.pollFirst("queueA");info("The first user in queueA was currently: '" + pollValue1 + "' " + pollValue1.getClass() + " length " + pollValue1.length() );ReferencesOracle Functional Testing OpenScript User's Guide Version 9.20 [E15488-05]Chapter 17 Using the Shared Data Modulehttp://download.oracle.com/otn/nt/apptesting/oats-docs-9.21.0030.zipOracle Fusion Middleware Oracle WebLogic Server Administration Console Online Help 11g Release 1 (10.3.4) [E13952-04]Administration Console Online Help - Manage users and groupshttp://download.oracle.com/docs/cd/E17904_01/apirefs.1111/e13952/taskhelp/security/ManageUsersAndGroups.htm

Read the article
New version of SQL Server Data Tools is now available

- by jamiet

If you don’t follow the SQL Server Data Tools (SSDT) blog then you may not know that two days ago an updated version of SSDT was released (and by SSDT I mean the database projects, not the SSIS/SSRS/SSAS stuff) along with a new version of the SSDT Power Tools. This release incorporates a an updated version of the SQL Server Data Tier Application Framework (aka DAC Framework, aka DacFX) which you can read about on Adam Mahood’s blog post SQL Server Data-Tier Application Framework (September 2012) Available. DacFX is essentially all the gubbins that you need to extract and publish .dacpacs and according to Adam’s post it incorporates a new feature that I think is very interesting indeed: Extract DACPAC with data – Creates a database snapshot file (.dacpac) from a live SQL Server or Windows Azure SQL Database that contains data from user tables in addition to the database schema. These packages can be published to a new or existing SQL Server or Windows Azure SQL Database using the SqlPackage.exe Publish action. Data contained in package replaces the existing data in the target database. In short, .dacpacs can now include data as well as schema. I’m very excited about this because one of my long-standing complaints about SSDT (and its many forebears) is that whilst it has great support for declarative development of schema it does not provide anything similar for data – if you want to deploy data from your SSDT projects then you have to write Post-Deployment MERGE scripts. This new feature for .dacpacs does not change that situation yet however it is a very important pre-requisite so I am hoping that a feature to provide declaration of data (in addition to declaration of schema which we have today) is going to light up in SSDT in the not too distant future. Read more about the latest SSDT, Power Tools & DacFX releases at: Now available: SQL Server Data Tools - September 2012 update! by Janet Yeilding New SSDT Power Tools! Now for both Visual Studio 2010 and Visual Studio 2012 by Sarah McDevitt SQL Server Data-Tier Application Framework (September 2012) Available by Adam Mahood @Jamiet

Read the article
How Oracle Data Integration Customers Differentiate Their Business in Competitive Markets

- by Irem Radzik

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 With data being a central force in driving innovation and competing effectively, data integration has become a key IT approach to remove silos and ensure working with consistent and trusted data. Especially with the release of 12c version, Oracle Data Integrator and Oracle GoldenGate offer easy-to-use and high-performance solutions that help companies with their critical data initiatives, including big data analytics, moving to cloud architectures, modernizing and connecting transactional systems and more. In a recent press release we announced the great momentum and analyst recognition Oracle Data Integration products have achieved in the data integration and replication market. In this press release we described some of the key new features of Oracle Data Integrator 12c and Oracle GoldenGate 12c. In addition, a few from our 4500+ customers explained how Oracle’s data integration platform helped them achieve their business goals. In this blog post I would like to go over what these customers shared about their experience. Land O’Lakes is one of America’s premier member-owned cooperatives, and offers an extensive line of agricultural supplies, as well as production and business services. Rich Bellefeuille, manager, ETL & data warehouse for Land O’Lakes told us how GoldenGate helped them modernize their critical ERP system without impacting service and how they are moving to new projects with Oracle Data Integrator 12c: “With Oracle GoldenGate 11g, we've been able to migrate our enterprise-wide implementation of Oracle’s JD Edwards EnterpriseOne, ERP system, to a new database and application server platform with minimal downtime to our business. Using Oracle GoldenGate 11g we reduced database migration time from nearly 30 hours to less than 30 minutes. Given our quick success, we are considering expansion of our Oracle GoldenGate 12c footprint. We are also in the midst of deploying a solution leveraging Oracle Data Integrator 12c to manage our pricing data to handle orders more effectively and provide a better relationship with our clients. We feel we are gaining higher productivity and flexibility with Oracle's data integration products." ICON, a global provider of outsourced development services to the pharmaceutical, biotechnology and medical device industries, highlighted the competitive advantage that a solid data integration foundation brings. Diarmaid O’Reilly, enterprise data warehouse manager, ICON plc said “Oracle Data Integrator enables us to align clinical trials intelligence with the information needs of our sponsors. It helps differentiate ICON’s services in an increasingly competitive drug-development industry." You can find more info on ICON's implementation here. A popular use case for Oracle GoldenGate’s real-time data integration is offloading operational reporting from critical transaction processing systems. SolarWorld, one of the world’s largest solar-technology producers and the largest U.S. solar panel manufacturer, implemented Oracle GoldenGate for real-time data integration of manufacturing data for fast analysis. Russ Toyama, U.S. senior database administrator for SolarWorld told us real-time data helps their operations and GoldenGate’s solution supports high performance of their manufacturing systems: “We use Oracle GoldenGate for real-time data integration into our decision support system, which performs real-time analysis for manufacturing operations to continuously improve product quality, yield and efficiency. With reliable and low-impact data movement capabilities, Oracle GoldenGate also helps ensure that our critical manufacturing systems are stable and operate with high performance." You can watch the full interview with SolarWorld's Russ Toyama here. Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Starwood Hotels and Resorts is one of the many customers that found out how well Oracle Data Integration products work with Oracle Exadata. Gordon Light, senior director of information technology for StarWood Hotels, says they had notable performance gain in loading Oracle Exadata reporting environment: “We leverage Oracle GoldenGate to replicate data from our central reservations systems and other OLTP databases – significantly decreasing the overall ETL duration. Moving forward, we plan to use Oracle GoldenGate to help the company achieve near-real-time reporting.”You can listen about Starwood Hotels' implementation here. Many companies combine the power of Oracle GoldenGate with Oracle Data Integrator to have a single, integrated data integration platform for variety of use cases across the enterprise. Ufone is another good example of that. The leading mobile communications service provider of Pakistan has improved customer service using timely customer data in its data warehouse. Atif Aslam, head of management information systems for Ufone says: “Oracle Data Integrator and Oracle GoldenGate help us integrate information from various systems and provide up-to-date and real-time CRM data updates hourly, rather than daily. The applications have simplified data warehouse operations and allowed business users to make faster and better informed decisions to protect revenue in the fast-moving Pakistani telecommunications market.” You can read more about Ufone's use case here. In our Oracle Data Integration 12c launch webcast back in November we also heard from BT’s CTO Surren Parthab about their use of GoldenGate for moving to private cloud architecture. Surren also shared his perspectives on Oracle Data Integrator 12c and Oracle GoldenGate 12c releases. You can watch the video here. These are only a few examples of leading companies that have made data integration and real-time data access a key part of their data governance and IT modernization initiatives. They have seen real improvements in how their businesses operate and differentiate in today’s competitive markets. You can read about other customer examples in our Ebook: The Path to the Future and access resources including white papers, data sheets, podcasts and more via our Oracle Data Integration resource kit. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

Read the article
Parse and transform XML with missing elements into table structure

- by dnlbrky

I'm trying to parse an XML file. A simplified version of it looks like this: x <- '<grandparent><parent><child1>ABC123</child1><child2>1381956044</child2></parent><parent><child2>1397527137</child2></parent><parent><child3>4675</child3></parent><parent><child1>DEF456</child1><child3>3735</child3></parent><parent><child1/><child3>3735</child3></parent></grandparent>' library(XML) xmlRoot(xmlTreeParse(x)) ## <grandparent> ## <parent> ## <child1>ABC123</child1> ## <child2>1381956044</child2> ## </parent> ## <parent> ## <child2>1397527137</child2> ## </parent> ## <parent> ## <child3>4675</child3> ## </parent> ## <parent> ## <child1>DEF456</child1> ## <child3>3735</child3> ## </parent> ## <parent> ## <child1/> ## <child3>3735</child3> ## </parent> ## </grandparent> I'd like to transform the XML into a data.frame / data.table that looks like this: parent <- data.frame(child1=c("ABC123",NA,NA,"DEF456",NA), child2=c(1381956044, 1397527137, rep(NA, 3)), child3=c(rep(NA, 2), 4675, 3735, 3735)) parent ## child1 child2 child3 ## 1 ABC123 1381956044 NA ## 2 <NA> 1397527137 NA ## 3 <NA> NA 4675 ## 4 DEF456 NA 3735 ## 5 <NA> NA 3735 If each parent node always contained all of the possible elements ("child1", "child2", "child3", etc.), I could use xmlToList and unlist to flatten it, and then dcast to put it into a table. But the XML often has missing child elements. Here is an attempt with incorrect output: library(data.table) ## Flatten: dt <- as.data.table(unlist(xmlToList(x)), keep.rownames=T) setnames(dt, c("column", "value")) ## Add row numbers, but they're incorrect due to missing XML elements: dt[, row:=.SD[,.I], by=column][] column value row 1: parent.child1 ABC123 1 2: parent.child2 1381956044 1 3: parent.child2 1397527137 2 4: parent.child3 4675 1 5: parent.child1 DEF456 2 6: parent.child3 3735 2 7: parent.child3 3735 3 ## Reshape from long to wide, but some value are in the wrong row: dcast.data.table(dt, row~column, value.var="value", fill=NA) ## row parent.child1 parent.child2 parent.child3 ## 1: 1 ABC123 1381956044 4675 ## 2: 2 DEF456 1397527137 3735 ## 3: 3 NA NA 3735 I won't know ahead of time the names of the child elements, or the count of unique element names for children of the grandparent, so the answer should be flexible.

Read the article
Importing data from text file to specific columns using BULK INSERT

- by Dinesh Asanka

Bulk insert is much faster than using other techniques such as SSIS. However, when you are using bulk insert you can’t insert to specific columns. If, for example, there are five columns in a table you should have five values for each record in the text file you are importing from. This is an issue when you are expecting default values to be inserted into tables. Let us say you have table as below: In this table, you are expecting ID, Status and CreatedDate to be updated automatically, so your text file may only have FirstName LastName values as below: Dinesh,Asanka Saman,Liyanage Ruwan,Silva Susantha,Bathige Jude,Peires Sanjeewa,Jayawickrama If you use bulk insert to this table like follows, You will be returned an error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (ID). To avoid this you will need to create a view with the columns you are expecting to fill and use bulk insert against it. If you check the table now, you will see table with values in the text file and the default values.

Read the article
Data Security Through Structure, Procedures, Policies, and Governance

Security Structure and Procedures One of the easiest ways to implement security is through the use of structure, in particular the structure in which data is stored. The preferred method for this through the use of User Roles, these Roles allow for specific access to be granted based on what role a user plays in relation to the data that they are manipulating. Typical data access actions are defined by the CRUD Principle. CRUD Principle: Create New Data Read Existing Data Update Existing Data Delete Existing Data Based on the actions assigned to a role assigned, User can manipulate data as they need to preform daily business operations. An example of this can be seen in a hospital where doctors have been assigned Create, Read, Update, and Delete access to their patient’s prescriptions so that a doctor can prescribe and adjust any existing prescriptions as necessary. However, a nurse will only have Read access on the patient’s prescriptions so that they will know what medicines to give to the patients. If you notice, they do not have access to prescribe new prescriptions, update or delete existing prescriptions because only the patient’s doctor has access to preform those actions. With User Roles comes responsibility, companies need to constantly monitor data access to ensure that the proper roles have the most appropriate access levels to ensure users are not exposed to inappropriate data. In addition this also protects rouge employees from gaining access to critical business information that could be destroyed, altered or stolen. It is important that all data access is monitored because of this threat. Security Governance Current Data Governance laws regarding security Health Insurance Portability and Accountability Act (HIPAA) Sarbanes-Oxley Act Database Breach Notification Act The US Department of Health and Human Services defines HIIPAA as a Privacy Rule. This legislation protects the privacy of individually identifiable health information. Currently, HIPAA sets the national standards for securing electronically protected health records. Additionally, its confidentiality provisions protect identifiable information being used to analyze patient safety events and improve patient safety. In 2002 after the wake of the Enron and World Com Financial scandals Senator Paul Sarbanes and Representative Michael Oxley lead the creation of the Sarbanes-Oxley Act. This act administered by the Securities and Exchange Commission (SEC) dramatically altered corporate financial practices and data governance. In addition, it also set specific deadlines for compliance. The Sarbanes-Oxley is not a set of standard business rules and does not specify how a company should retain its records; In fact, this act outlines which pieces of data are to be stored as well as the storage duration. The Database Breach Notification Act requires companies, in the event of a data breach containing personally identifiable information, to notify all California residents whose information was stored on the compromised system at the time of the event, according to Gregory Manter. He further explains that this act is only California legislation. However, it does affect “any person or business that conducts business in California, and that owns or licenses computerized data that includes personal information,” regardless of where the compromised data is located. This will force any business that maintains at least limited interactions with California residents will find themselves subject to the Act’s provisions. Security Policies All companies must work in accordance with the appropriate city, county, state, and federal laws. One way to ensure that a company is legally compliant is to enforce security policies that adhere to the appropriate legislation in their area or areas that they service. These types of polices need to be mandated by a company’s Security Officer. For smaller companies, these policies need to come from executives, Directors, and Owners.

Read the article
Multiple foreign keys in one table to 1 other table in mysql

- by djerry

Hey guys, I got 2 tables in my database: user and call. User exists of 3 fields: id, name, number and call : id, 'source', 'destination', 'referred', date. I need to monitor calls in my app. The 3 ' ' fields above are actually userid numbers. now i'm wondering, can i make those 3 field foreign key elements of the id-field in table user? Thanks in advance...

Read the article
Excel Help: Data Input Help

- by B-Ballerl

Everyday I download data from a site that will have rows each filled with individual data for clients. I'm able to input the data into excel as a whole but after that I'm having trouble figuring out how to put it into a chart. For example Web visits time. So say Client 1 stayed for 5 min increasing his total time on the site to 20 min and Client 2 stayed for 0 min keeping his time of 10 min and they were both registered on new years eve, and R1's last login was today and R2's was yesterday. (R for some reason repersents Client, no idea why...). Client 3 hasn't been on since he registered keeping his total at 4 min So my data would look something like this for Today (20110104) R1,20101231,20110104,20 R2,20101231,20110103,10 R3,20101231,20101231,4 And this for the day before (201101030), R1,20101231,20110102,15 R2,20101231,20110103,10 R3,20101231,20101231,4 I get about 200+ client rows each day where even the names of the Client list are changing. Is it possible to import the data each day and fill it in a excel sheet where the Client number is off on the left hand side in a table, and the amount of time (Whole Number ex. 4) each day it spends on the site extend to the right under it's specific date see Picture? I've manage to create a manual sheet but have been unsucessful at getting excel to do any of it for me. Here are two pictures:

Read the article
Recover data from hard drive with partitions (but not most data) overwritten

- by Macha

I have a 500GB hard drive I've been keeping around to recover data from that I removed from a failing NAS drive that got sort of... erratic at the end. I finally got rid of the NAS when during a firmware update it removed the partition table. Fast forward to a week ago, when I was building a new PC, and a mixup resulted in me placing the hard drive in question in the new PC and installing Windows XP on the first 100GB. I'm presuming any data on that first 100GB is now gone, but for the rest of it, is there any way I can recover it at home, as professional data recovery is currently too expensive? I have a blank 1TB HDD if I can store any images of that hard drive on. The problem was definitely with the NAS and not the hard drive, as the hard drive had a successful install of Windows when mistakenly place in the new PC, and there were capacitors in the NAS's circuitry clearly broken. The data I want to recover (in order of priority) is: High: Some jpgs of family photos. Medium: Some RAW files. (There are also jpg versions of all of these) Low: Some mp3s, avis and ISOs, I can re-rip most of these if need be, but it'd be handy not to have to. (I don't need a backup lecture, and if you can hold it in from nagging Jeff Atwood for it, you can hold it in from nagging me for it) In short: The partition tables are gone and overwritten. The data is not overwritten, except for an amount equal to the size of a Windows XP SP3 installation.

Read the article
Why Oracle Delivers More Value than IBM in Data Integration Solutions

- by irem.radzik(at)oracle.com

For data integration projects, IT organization look for a robust but an easy-to-use solution, which simplifies enterprise data architecture while providing exceptional value-- not one that adds complexity and costs. This is a major challenge today for customers who are using IBM InfoSphere products like DataStage or Change Data Capture. Whereas, Oracle consistently delivers higher level value with its data integration products such as Oracle Data Integrator, Oracle GoldenGate. There are many differentiators for Oracle's Data Integration offering in comparison to IBM. Here are the top five: Lower cost of ownership Higher performance in both real-time and bulk data movement Ease of use and flexibility Reliability Complete, Open, and Integrated Middleware Offering Architectural differences between products contribute a great deal to these differences. First of all, Oracle's ETL architecture does not require a middle-tier transformation server, something IBM does require. Not only it costs more to manage an additional transformation server including energy costs, but it adds a performance bottleneck as well. In addition, IBM's data integration products are complex and often require lengthy professional services engagements to integrate. This translates to higher costs and delayed time to market. Then there's the reliability factor. Our customers choose Oracle GoldenGate over IBM's InfoSphere Change Data Capture product because Oracle GoldenGate is designed for mission-critical systems that require guaranteed data delivery and automatic recovery in case of process interruptions. On Thursday we will discuss these key differentiators in detail and provide customer examples that chose Oracle over IBM in data integration projects. Join us on Thursday Feb 10th at 11am PT to learn how Oracle delivers more value than IBM in data integration solutions.

Read the article
multiple pivot table consolidation to another pivot table

- by phill

I have to SQL Server views being drawn to 2 seperate worksheets as pivot tables in an excel 2007 file. the results on worksheet1 include example data: - company_name, tickets, month, year company1, 3, 1,2009 company2, 4, 1,2009 company3, 5, 1,2009 company3, 2, 2,2009 results from worksheet2 include example data: company_name, month, year , fee company1, 1 , 2009 , 2.00 company2, 1 , 2009 , 3.00 company3, 1 , 2009 , 4.00 company3, 2 , 2009 , 2.00 I would like the results of one worksheet to be reflected onto the pivot table of another with their corresponding companies. for example in this case: - company_name, tickets, month, year, fee company1, 3, 1,2009 , 2 company2, 4, 1,2009 , 3 company3, 5, 1,2009 , 4 company3, 2, 2,2009 , 2 Is there a way to do this without vba? thanks in advance

Read the article
Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
How to obtain a random sub-datatable from another data table

- by developerit

Introduction In this article, I’ll show how to get a random subset of data from a DataTable. This is useful when you already have queries that are filtered correctly but returns all the rows. Analysis I came across this situation when I wanted to display a random tag cloud. I already had the query to get the keywords ordered by number of clicks and I wanted to created a tag cloud. Tags that are the most popular should have more chance to get picked and should be displayed larger than less popular ones. Implementation In this code snippet, there is everything you need. ' Min size, in pixel for the tag Private Const MIN_FONT_SIZE As Integer = 9 ' Max size, in pixel for the tag Private Const MAX_FONT_SIZE As Integer = 14 ' Basic function that retreives Tags from a DataBase Public Shared Function GetTags() As MediasTagsDataTable ' Simple call to the TableAdapter, to get the Tags ordered by number of clicks Dim dt As MediasTagsDataTable = taMediasTags.GetDataValide ' If the query returned no result, return an empty DataTable If dt Is Nothing OrElse dt.Rows.Count < 1 Then Return New MediasTagsDataTable End If ' Set the font-size of the group of data ' We are dividing our results into sub set, according to their number of clicks ' Example: 10 results -> [0,2] will get font size 9, [3,5] will get font size 10, [6,8] wil get 11, ... ' This is the number of elements in one group Dim groupLenth As Integer = CType(Math.Floor(dt.Rows.Count / (MAX_FONT_SIZE - MIN_FONT_SIZE)), Integer) ' Counter of elements in the same group Dim counter As Integer = 0 ' Counter of groups Dim groupCounter As Integer = 0 ' Loop througt the list For Each row As MediasTagsRow In dt ' Set the font-size in a custom column row.c_FontSize = MIN_FONT_SIZE + groupCounter ' Increment the counter counter += 1 ' If the group counter is less than the counter If groupLenth <= counter Then ' Start a new group counter = 0 groupCounter += 1 End If Next ' Return the new DataTable with font-size Return dt End Function ' Function that generate the random sub set Public Shared Function GetRandomSampleTags(ByVal KeyCount As Integer) As MediasTagsDataTable ' Get the data Dim dt As MediasTagsDataTable = GetTags() ' Create a new DataTable that will contains the random set Dim rep As MediasTagsDataTable = New MediasTagsDataTable ' Count the number of row in the new DataTable Dim count As Integer = 0 ' Random number generator Dim rand As New Random() While count < KeyCount Randomize() ' Pick a random row Dim r As Integer = rand.Next(0, dt.Rows.Count - 1) Dim tmpRow As MediasTagsRow = dt(r) ' Import it into the new DataTable rep.ImportRow(tmpRow) ' Remove it from the old one, to be sure not to pick it again dt.Rows.RemoveAt(r) ' Increment the counter count += 1 End While ' Return the new sub set Return rep End Function Pro’s This method is good because it doesn’t require much work to get it work fast. It is a good concept when you are working with small tables, let says less than 100 records. Con’s If you have more than 100 records, out of memory exception may occur since we are coping and duplicating rows. I would consider using a stored procedure instead.

Read the article
SQL SERVER – Size of Index Table for Each Index – Solution 2

- by pinaldave

Earlier I had ran puzzle where I asked question regarding size of index table for each index in database over here SQL SERVER – Size of Index Table – A Puzzle to Find Index Size for Each Index on Table. I had received good amount answers and I had blogged about that here SQL SERVER – Size of Index Table for Each Index – Solution. As a comment to that blog I have received another very interesting comment and that provides near accurate answers to original question. Many thanks to Rama Mathanmohan for providing wonderful solution. SELECT OBJECT_NAME(i.OBJECT_ID) AS TableName, i.name AS IndexName, i.index_id AS IndexID, 8 * SUM(a.used_pages) AS 'Indexsize(KB)' FROM sys.indexes AS i JOIN sys.partitions AS p ON p.OBJECT_ID = i.OBJECT_ID AND p.index_id = i.index_id JOIN sys.allocation_units AS a ON a.container_id = p.partition_id GROUP BY i.OBJECT_ID,i.index_id,i.name ORDER BY OBJECT_NAME(i.OBJECT_ID),i.index_id Let me know if you have any better script for the same. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Pinal Dave, Readers Contribution, SQL, SQL Authority, SQL Data Storage, SQL Index, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Big GRC: Turning Data into Actionable GRC Intelligence

- by Jenna Danko

While it’s no longer headline news that Governments have carried out large scale data-mining programmes aimed at terrorism detection and identifying other patterns of interest across a wide range of digital data sources, the debate over the ethics and justification over this action, will clearly continue for some time to come. What is becoming clear is that these programmes are a framework for the collation and aggregation of massive amounts of unstructured data and from this, the creation of actionable intelligence from analyses that allowed the analysts to explore and extract a variety of patterns and then direct resources. This data included audio and video chats, phone calls, photographs, e-mails, documents, internet searches, social media posts and mobile phone logs and connections. Although Governance, Risk and Compliance (GRC) professionals are not looking at the implementation of such programmes, there are many similar GRC “Big data” challenges to be faced and potential lessons to be learned from these high profile government programmes that can be applied a lot closer to home. For example, how can GRC professionals collect, manage and analyze an enormous and disparate volume of data to create and manage their own actionable intelligence covering hidden signs and patterns of criminal activity, the early or retrospective, violation of regulations/laws/corporate policies and procedures, emerging risks and weakening controls etc. Not exactly the stuff of James Bond to be sure, but it is certainly more applicable to most GRC professional’s day to day challenges. So what is Big Data and how can it benefit the GRC process? Although it often varies, the definition of Big Data largely refers to the following types of data: Traditional Enterprise Data – includes customer information from CRM systems, transactional ERP data, web store transactions, and general ledger data. Machine-Generated /Sensor Data – includes Call Detail Records (“CDR”), weblogs and trading systems data. Social Data – includes customer feedback streams, micro-blogging sites like Twitter, and social media platforms like Facebook. The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020. But while it’s often the most visible parameter, volume of data is not the only characteristic that matters. In fact, according to sources such as Forrester there are four key characteristics that define big data: Volume. Machine-generated data is produced in much larger quantities than non-traditional data. This is all the data generated by IT systems that power the enterprise. This includes live data from packaged and custom applications – for example, app servers, Web servers, databases, networks, virtual machines, telecom equipment, and much more. Velocity. Social media data streams – while not as massive as machine-generated data – produce a large influx of opinions and relationships valuable to customer relationship management as well as offering early insight into potential reputational risk issues. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day) need to be managed. Variety. Traditional data formats tend to be relatively well defined by a data schema and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change. Without question, all GRC professionals work in a dynamic environment and as new services, new products, new business lines are added or new marketing campaigns executed for example, new data types are needed to capture the resultant information. Value. The economic value of data varies significantly. Typically, there is good information hidden amongst a larger body of non-traditional data that GRC professionals can use to add real value to the organisation; the greater challenge is identifying what is valuable and then transforming and extracting that data for analysis and action. For example, customer service calls and emails have millions of useful data points and have long been a source of information to GRC professionals. Those calls and emails are critical in helping GRC professionals better identify hidden patterns and implement new policies that can reduce the amount of customer complaints. Now on a scale and depth far beyond those in place today, all that unstructured call and email data can be captured, stored and analyzed to reveal the reasons for the contact, perhaps with the aggregated customer results cross referenced against what is being said about the organization or a similar peer organization on social media. The organization can then take positive actions, communicating to the market in advance of issues reaching the press, strengthening controls, adjusting risk profiles, changing policy and procedures and completely minimizing, if not eliminating, complaints and compensation for that specific reason in the future. In this one example of many similar ones, the GRC team(s) has demonstrated real and tangible business value. Big Challenges - Big Opportunities As pointed out by recent Forrester research, high performing companies (those that are growing 15% or more year-on-year compared to their peers) are taking a selective approach to investing in Big Data. "Tomorrow's winners understand this, and they are making selective investments aimed at specific opportunities with tangible benefits where big data offers a more economical solution to meet a need." (Forrsights Strategy Spotlight: Business Intelligence and Big Data, Q4 2012) As pointed out earlier, with the ever increasing volume of regulatory demands and fines for getting it wrong, limited resource availability and out of date or inadequate GRC systems all contributing to a higher cost of compliance and/or higher risk profile than desired – a big data investment in GRC clearly falls into this category. However, to make the most of big data organizations must evolve both their business and IT procedures, processes, people and infrastructures to handle these new high-volume, high-velocity, high-variety sources of data and be able integrate them with the pre-existing company data to be analyzed. GRC big data clearly allows the organization access to and management over a huge amount of often very sensitive information that although can help create a more risk intelligent organization, also presents numerous data governance challenges, including regulatory compliance and information security. In addition to client and regulatory demands over better information security and data protection the sheer amount of information organizations deal with the need to quickly access, classify, protect and manage that information can quickly become a key issue from a legal, as well as technical or operational standpoint. However, by making information governance processes a bigger part of everyday operations, organizations can make sure data remains readily available and protected. The Right GRC & Big Data Partnership Becomes Key The "getting it right first time" mantra used in so many companies remains essential for any GRC team that is sponsoring, helping kick start, or even overseeing a big data project. To make a big data GRC initiative work and get the desired value, partnerships with companies, who have a long history of success in delivering successful GRC solutions as well as being at the very forefront of technology innovation, becomes key. Clearly solutions can be built in-house more cheaply than through vendor, but as has been proven time and time again, when it comes to self built solutions covering AML and Fraud for example, few have able to scale or adapt appropriately to meet the changing regulations or challenges that the GRC teams face on a daily basis. This has led to the creation of GRC silo’s that are causing so many headaches today. The solutions that stand out and should be explored are the ones that can seamlessly merge the traditional world of well-known data, analytics and visualization with the new world of seemingly innumerable data sources, utilizing Big Data technologies to generate new GRC insights right across the enterprise.Ultimately, Big Data is here to stay, and organizations that embrace its potential and outline a viable strategy, as well as understand and build a solid analytical foundation, will be the ones that are well positioned to make the most of it. A Blueprint and Roadmap Service for Big Data Big data adoption is first and foremost a business decision. As such it is essential that your partner can align your strategies, goals, and objectives with an architecture vision and roadmap to accelerate adoption of big data for your environment, as well as establish practical, effective governance that will maintain a well managed environment going forward. Key Activities: While your initiatives will clearly vary, there are some generic starting points the team and organization will need to complete: Clearly define your drivers, strategies, goals, objectives and requirements as it relates to big data Conduct a big data readiness and Information Architecture maturity assessment Develop future state big data architecture, including views across all relevant architecture domains; business, applications, information, and technology Provide initial guidance on big data candidate selection for migrations or implementation Develop a strategic roadmap and implementation plan that reflects a prioritization of initiatives based on business impact and technology dependency, and an incremental integration approach for evolving your current state to the target future state in a manner that represents the least amount of risk and impact of change on the business Provide recommendations for practical, effective Data Governance, Data Quality Management, and Information Lifecycle Management to maintain a well-managed environment Conduct an executive workshop with recommendations and next steps There is little debate that managing risk and data are the two biggest obstacles encountered by financial institutions. Big data is here to stay and risk management certainly is not going anywhere, and ultimately financial services industry organizations that embrace its potential and outline a viable strategy, as well as understand and build a solid analytical foundation, will be best positioned to make the most of it. Matthew Long is a Financial Crime Specialist for Oracle Financial Services. He can be reached at matthew.long AT oracle.com.

Read the article
Introducing Data Annotations Extensions

- by srkirkland

Validation of user input is integral to building a modern web application, and ASP.NET MVC offers us a way to enforce business rules on both the client and server using Model Validation. The recent release of ASP.NET MVC 3 has improved these offerings on the client side by introducing an unobtrusive validation library built on top of jquery.validation. Out of the box MVC comes with support for Data Annotations (that is, System.ComponentModel.DataAnnotations) and can be extended to support other frameworks. Data Annotations Validation is becoming more popular and is being baked in to many other Microsoft offerings, including Entity Framework, though with MVC it only contains four validators: Range, Required, StringLength and Regular Expression. The Data Annotations Extensions project attempts to augment these validators with additional attributes while maintaining the clean integration Data Annotations provides. A Quick Word About Data Annotations Extensions The Data Annotations Extensions project can be found at http://dataannotationsextensions.org/, and currently provides 11 additional validation attributes (ex: Email, EqualTo, Min/Max) on top of Data Annotations’ original 4. You can find a current list of the validation attributes on the afore mentioned website. The core library provides server-side validation attributes that can be used in any .NET 4.0 project (no MVC dependency). There is also an easily pluggable client-side validation library which can be used in ASP.NET MVC 3 projects using unobtrusive jquery validation (only MVC3 included javascript files are required). On to the Preview Let’s say you had the following “Customer” domain model (or view model, depending on your project structure) in an MVC 3 project: public class Customer { public string Email { get; set; } public int Age { get; set; } public string ProfilePictureLocation { get; set; } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } When it comes time to create/edit this Customer, you will probably have a CustomerController and a simple form that just uses one of the Html.EditorFor() methods that the ASP.NET MVC tooling generates for you (or you can write yourself). It should look something like this: With no validation, the customer can enter nonsense for an email address, and then can even report their age as a negative number! With the built-in Data Annotations validation, I could do a bit better by adding a Range to the age, adding a RegularExpression for email (yuck!), and adding some required attributes. However, I’d still be able to report my age as 10.75 years old, and my profile picture could still be any string. Let’s use Data Annotations along with this project, Data Annotations Extensions, and see what we can get: public class Customer { [Email] [Required] public string Email { get; set; } [Integer] [Min(1, ErrorMessage="Unless you are benjamin button you are lying.")] [Required] public int Age { get; set; } [FileExtensions("png|jpg|jpeg|gif")] public string ProfilePictureLocation { get; set; } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now let’s try to put in some invalid values and see what happens: That is very nice validation, all done on the client side (will also be validated on the server). Also, the Customer class validation attributes are very easy to read and understand. Another bonus: Since Data Annotations Extensions can integrate with MVC 3’s unobtrusive validation, no additional scripts are required! Now that we’ve seen our target, let’s take a look at how to get there within a new MVC 3 project. Adding Data Annotations Extensions To Your Project First we will File->New Project and create an ASP.NET MVC 3 project. I am going to use Razor for these examples, but any view engine can be used in practice. Now go into the NuGet Extension Manager (right click on references and select add Library Package Reference) and search for “DataAnnotationsExtensions.” You should see the following two packages: The first package is for server-side validation scenarios, but since we are using MVC 3 and would like comprehensive sever and client validation support, click on the DataAnnotationsExtensions.MVC3 project and then click Install. This will install the Data Annotations Extensions server and client validation DLLs along with David Ebbo’s web activator (which enables the validation attributes to be registered with MVC 3). Now that Data Annotations Extensions is installed you have all you need to start doing advanced model validation. If you are already using Data Annotations in your project, just making use of the additional validation attributes will provide client and server validation automatically. However, assuming you are starting with a blank project I’ll walk you through setting up a controller and model to test with. Creating Your Model In the Models folder, create a new User.cs file with a User class that you can use as a model. To start with, I’ll use the following class: public class User { public string Email { get; set; } public string Password { get; set; } public string PasswordConfirm { get; set; } public string HomePage { get; set; } public int Age { get; set; } } Next, create a simple controller with at least a Create method, and then a matching Create view (note, you can do all of this via the MVC built-in tooling). Your files will look something like this: UserController.cs: public class UserController : Controller { public ActionResult Create() { return View(new User()); } [HttpPost] public ActionResult Create(User user) { if (!ModelState.IsValid) { return View(user); } return Content("User valid!"); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Create.cshtml: @model NuGetValidationTester.Models.User @{ ViewBag.Title = "Create"; } <h2>Create</h2> <script src="@Url.Content("~/Scripts/jquery.validate.min.js")" type="text/javascript"></script> <script src="@Url.Content("~/Scripts/jquery.validate.unobtrusive.min.js")" type="text/javascript"></script> @using (Html.BeginForm()) { @Html.ValidationSummary(true) <fieldset> <legend>User</legend> @Html.EditorForModel() <p> <input type="submit" value="Create" /> </p> </fieldset> } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } In the Create.cshtml view, note that we are referencing jquery validation and jquery unobtrusive (jquery is referenced in the layout page). These MVC 3 included scripts are the only ones you need to enjoy both the basic Data Annotations validation as well as the validation additions available in Data Annotations Extensions. These references are added by default when you use the MVC 3 “Add View” dialog on a modification template type. Now when we go to /User/Create we should see a form for editing a User Since we haven’t yet added any validation attributes, this form is valid as shown (including no password, email and an age of 0). With the built-in Data Annotations attributes we can make some of the fields required, and we could use a range validator of maybe 1 to 110 on Age (of course we don’t want to leave out supercentenarians) but let’s go further and validate our input comprehensively using Data Annotations Extensions. The new and improved User.cs model class. { [Required] [Email] public string Email { get; set; } [Required] public string Password { get; set; } [Required] [EqualTo("Password")] public string PasswordConfirm { get; set; } [Url] public string HomePage { get; set; } [Integer] [Min(1)] public int Age { get; set; } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now let’s re-run our form and try to use some invalid values: All of the validation errors you see above occurred on the client, without ever even hitting submit. The validation is also checked on the server, which is a good practice since client validation is easily bypassed. That’s all you need to do to start a new project and include Data Annotations Extensions, and of course you can integrate it into an existing project just as easily. Nitpickers Corner ASP.NET MVC 3 futures defines four new data annotations attributes which this project has as well: CreditCard, Email, Url and EqualTo. Unfortunately referencing MVC 3 futures necessitates taking an dependency on MVC 3 in your model layer, which may be unadvisable in a multi-tiered project. Data Annotations Extensions keeps the server and client side libraries separate so using the project’s validation attributes don’t require you to take any additional dependencies in your model layer which still allowing for the rich client validation experience if you are using MVC 3. Custom Error Message and Globalization: Since the Data Annotations Extensions are build on top of Data Annotations, you have the ability to define your own static error messages and even to use resource files for very customizable error messages. Available Validators: Please see the project site at http://dataannotationsextensions.org/ for an up-to-date list of the new validators included in this project. As of this post, the following validators are available: CreditCard Date Digits Email EqualTo FileExtensions Integer Max Min Numeric Url Conclusion Hopefully I’ve illustrated how easy it is to add server and client validation to your MVC 3 projects, and how to easily you can extend the available validation options to meet real world needs. The Data Annotations Extensions project is fully open source under the BSD license. Any feedback would be greatly appreciated. More information than you require, along with links to the source code, is available at http://dataannotationsextensions.org/. Enjoy!

Read the article
SQL Server – Undelete a Table and Restore a Single Table from Backup

- by Mladen Prajdic

This post is part of the monthly community event called T-SQL Tuesday started by Adam Machanic (blog|twitter) and hosted by someone else each month. This month the host is Sankar Reddy (blog|twitter) and the topic is Misconceptions in SQL Server. You can follow posts for this theme on Twitter by looking at #TSQL2sDay hashtag. Let me start by saying: This code is a crazy hack that is to never be used unless you really, really have to. Really! And I don’t think there’s a time when you would really have to use it for real. Because it’s a hack there are number of things that can go wrong so play with it knowing that. I’ve managed to totally corrupt one database. :) Oh… and for those saying: yeah yeah.. you have a single table in a file group and you’re restoring that, I say “nay nay” to you. As we all know SQL Server can’t do single table restores from backup. This is kind of a obvious thing due to different relational integrity (RI) concerns. Since we have to maintain that we have to restore all tables represented in a RI graph. For this exercise i say BAH! to those concerns. Note that this method “works” only for simple tables that don’t have LOB and off rows data. The code can be expanded to include those but I’ve tried to leave things “simple”. Note that for this to work our table needs to be relatively static data-wise. This doesn’t work for OLTP table. Products are a perfect example of static data. They don’t change much between backups, pretty much everything depends on them and their table is one of those tables that are relatively easy to accidentally delete everything from. This only works if the database is in Full or Bulk-Logged recovery mode for tables where the contents have been deleted or truncated but NOT when a table was dropped. Everything we’ll talk about has to be done before the data pages are reused for other purposes. After deletion or truncation the pages are marked as reusable so you have to act fast. The best thing probably is to put the database into single user mode ASAP while you’re performing this procedure and return it to multi user after you’re done. How do we do it? We will be using an undocumented but known DBCC commands: DBCC PAGE, an undocumented function sys.fn_dblog and a little known DATABASE RESTORE PAGE option. All tests will be on a copy of Production.Product table in AdventureWorks database called Production.Product1 because the original table has FK constraints that prevent us from truncating it for testing. -- create a duplicate table. This doesn't preserve indexes!SELECT *INTO AdventureWorks.Production.Product1FROM AdventureWorks.Production.Product After we run this code take a full back to perform further testing. First let’s see what the difference between DELETE and TRUNCATE is when it comes to logging. With DELETE every row deletion is logged in the transaction log. With TRUNCATE only whole data page deallocations are logged in the transaction log. Getting deleted data pages is simple. All we have to look for is row delete entry in the sys.fn_dblog output. But getting data pages that were truncated from the transaction log presents a bit of an interesting problem. I will not go into depths of IAM(Index Allocation Map) and PFS (Page Free Space) pages but suffice to say that every IAM page has intervals that tell us which data pages are allocated for a table and which aren’t. If we deep dive into the sys.fn_dblog output we can see that once you truncate a table all the pages in all the intervals are deallocated and this is shown in the PFS page transaction log entry as deallocation of pages. For every 8 pages in the same extent there is one PFS page row in the transaction log. This row holds information about all 8 pages in CSV format which means we can get to this data with some parsing. A great help for parsing this stuff is Peter Debetta’s handy function dbo.HexStrToVarBin that converts hexadecimal string into a varbinary value that can be easily converted to integer tus giving us a readable page number. The shortened (columns removed) sys.fn_dblog output for a PFS page with CSV data for 1 extent (8 data pages) looks like this: -- [Page ID] is displayed in hex format. -- To convert it to readable int we'll use dbo.HexStrToVarBin function found at -- http://sqlblog.com/blogs/peter_debetta/archive/2007/03/09/t-sql-convert-hex-string-to-varbinary.aspx -- This function must be installed in the master databaseSELECT Context, AllocUnitName, [Page ID], DescriptionFROM sys.fn_dblog(NULL, NULL)WHERE [Current LSN] = '00000031:00000a46:007d' The pages at the end marked with 0x00—> are pages that are allocated in the extent but are not part of a table. We can inspect the raw content of each data page with a DBCC PAGE command: -- we need this trace flag to redirect output to the query window.DBCC TRACEON (3604); -- WITH TABLERESULTS gives us data in table format instead of message format-- we use format option 3 because it's the easiest to read and manipulate further onDBCC PAGE (AdventureWorks, 1, 613, 3) WITH TABLERESULTS Since the DBACC PAGE output can be quite extensive I won’t put it here. You can see an example of it in the link at the beginning of this section. Getting deleted data back When we run a delete statement every row to be deleted is marked as a ghost record. A background process periodically cleans up those rows. A huge misconception is that the data is actually removed. It’s not. Only the pointers to the rows are removed while the data itself is still on the data page. We just can’t access it with normal means. To get those pointers back we need to restore every deleted page using the RESTORE PAGE option mentioned above. This restore must be done from a full backup, followed by any differential and log backups that you may have. This is necessary to bring the pages up to the same point in time as the rest of the data. However the restore doesn’t magically connect the restored page back to the original table. It simply replaces the current page with the one from the backup. After the restore we use the DBCC PAGE to read data directly from all data pages and insert that data into a temporary table. To finish the RESTORE PAGE procedure we finally have to take a tail log backup (simple backup of the transaction log) and restore it back. We can now insert data from the temporary table to our original table by hand. Getting truncated data back When we run a truncate the truncated data pages aren’t touched at all. Even the pointers to rows stay unchanged. Because of this getting data back from truncated table is simple. we just have to find out which pages belonged to our table and use DBCC PAGE to read data off of them. No restore is necessary. Turns out that the problems we had with finding the data pages is alleviated by not having to do a RESTORE PAGE procedure. Stop stalling… show me The Code! This is the code for getting back deleted and truncated data back. It’s commented in all the right places so don’t be afraid to take a closer look. Make sure you have a full backup before trying this out. Also I suggest that the last step of backing and restoring the tail log is performed by hand. USE masterGOIF OBJECT_ID('dbo.HexStrToVarBin') IS NULL RAISERROR ('No dbo.HexStrToVarBin installed. Go to http://sqlblog.com/blogs/peter_debetta/archive/2007/03/09/t-sql-convert-hex-string-to-varbinary.aspx and install it in master database' , 18, 1) SET NOCOUNT ONBEGIN TRY DECLARE @dbName VARCHAR(1000), @schemaName VARCHAR(1000), @tableName VARCHAR(1000), @fullBackupName VARCHAR(1000), @undeletedTableName VARCHAR(1000), @sql VARCHAR(MAX), @tableWasTruncated bit; /* THE FIRST LINE ARE OUR INPUT PARAMETERS In this case we're trying to recover Production.Product1 table in AdventureWorks database. My full backup of AdventureWorks database is at e:\AW.bak */ SELECT @dbName = 'AdventureWorks', @schemaName = 'Production', @tableName = 'Product1', @fullBackupName = 'e:\AW.bak', @undeletedTableName = '##' + @tableName + '_Undeleted', @tableWasTruncated = 0, -- copy the structure from original table to a temp table that we'll fill with restored data @sql = 'IF OBJECT_ID(''tempdb..' + @undeletedTableName + ''') IS NOT NULL DROP TABLE ' + @undeletedTableName + ' SELECT *' + ' INTO ' + @undeletedTableName + ' FROM [' + @dbName + '].[' + @schemaName + '].[' + @tableName + ']' + ' WHERE 1 = 0' EXEC (@sql) IF OBJECT_ID('tempdb..#PagesToRestore') IS NOT NULL DROP TABLE #PagesToRestore /* FIND DATA PAGES WE NEED TO RESTORE*/ CREATE TABLE #PagesToRestore ([ID] INT IDENTITY(1,1), [FileID] INT, [PageID] INT, [SQLtoExec] VARCHAR(1000)) -- DBCC PACE statement to run later RAISERROR ('Looking for deleted pages...', 10, 1) -- use T-LOG direct read to get deleted data pages INSERT INTO #PagesToRestore([FileID], [PageID], [SQLtoExec]) EXEC('USE [' + @dbName + '];SELECT FileID, PageID, ''DBCC TRACEON (3604); DBCC PAGE ([' + @dbName + '], '' + FileID + '', '' + PageID + '', 3) WITH TABLERESULTS'' as SQLToExecFROM (SELECT DISTINCT LEFT([Page ID], 4) AS FileID, CONVERT(VARCHAR(100), ' + 'CONVERT(INT, master.dbo.HexStrToVarBin(SUBSTRING([Page ID], 6, 20)))) AS PageIDFROM sys.fn_dblog(NULL, NULL)WHERE AllocUnitName LIKE ''%' + @schemaName + '.' + @tableName + '%'' ' + 'AND Context IN (''LCX_MARK_AS_GHOST'', ''LCX_HEAP'') AND Operation in (''LOP_DELETE_ROWS''))t');SELECT *FROM #PagesToRestore -- if upper EXEC returns 0 rows it means the table was truncated so find truncated pages IF (SELECT COUNT(*) FROM #PagesToRestore) = 0 BEGIN RAISERROR ('No deleted pages found. Looking for truncated pages...', 10, 1) -- use T-LOG read to get truncated data pages INSERT INTO #PagesToRestore([FileID], [PageID], [SQLtoExec]) -- dark magic happens here -- because truncation simply deallocates pages we have to find out which pages were deallocated. -- we can find this out by looking at the PFS page row's Description column. -- for every deallocated extent the Description has a CSV of 8 pages in that extent. -- then it's just a matter of parsing it. -- we also remove the pages in the extent that weren't allocated to the table itself -- marked with '0x00-->00' EXEC ('USE [' + @dbName + '];DECLARE @truncatedPages TABLE(DeallocatedPages VARCHAR(8000), IsMultipleDeallocs BIT);INSERT INTO @truncatedPagesSELECT REPLACE(REPLACE(Description, ''Deallocated '', ''Y''), ''0x00-->00 '', ''N'') + '';'' AS DeallocatedPages, CHARINDEX('';'', Description) AS IsMultipleDeallocsFROM (SELECT DISTINCT LEFT([Page ID], 4) AS FileID, CONVERT(VARCHAR(100), CONVERT(INT, master.dbo.HexStrToVarBin(SUBSTRING([Page ID], 6, 20)))) AS PageID, DescriptionFROM sys.fn_dblog(NULL, NULL)WHERE Context IN (''LCX_PFS'') AND Description LIKE ''Deallocated%'' AND AllocUnitName LIKE ''%' + @schemaName + '.' + @tableName + '%'') t;SELECT FileID, PageID , ''DBCC TRACEON (3604); DBCC PAGE ([' + @dbName + '], '' + FileID + '', '' + PageID + '', 3) WITH TABLERESULTS'' as SQLToExecFROM (SELECT LEFT(PageAndFile, 1) as WasPageAllocatedToTable , SUBSTRING(PageAndFile, 2, CHARINDEX('':'', PageAndFile) - 2 ) as FileID , CONVERT(VARCHAR(100), CONVERT(INT, master.dbo.HexStrToVarBin(SUBSTRING(PageAndFile, CHARINDEX('':'', PageAndFile) + 1, LEN(PageAndFile))))) as PageIDFROM ( SELECT SUBSTRING(DeallocatedPages, delimPosStart, delimPosEnd - delimPosStart) as PageAndFile, IsMultipleDeallocs FROM ( SELECT *, CHARINDEX('';'', DeallocatedPages)*(N-1) + 1 AS delimPosStart, CHARINDEX('';'', DeallocatedPages)*N AS delimPosEnd FROM @truncatedPages t1 CROSS APPLY (SELECT TOP (case when t1.IsMultipleDeallocs = 1 then 8 else 1 end) ROW_NUMBER() OVER(ORDER BY number) as N FROM master..spt_values) t2 )t)t)tWHERE WasPageAllocatedToTable = ''Y''') SELECT @tableWasTruncated = 1 END DECLARE @lastID INT, @pagesCount INT SELECT @lastID = 1, @pagesCount = COUNT(*) FROM #PagesToRestore SELECT @sql = 'Number of pages to restore: ' + CONVERT(VARCHAR(10), @pagesCount) IF @pagesCount = 0 RAISERROR ('No data pages to restore.', 18, 1) ELSE RAISERROR (@sql, 10, 1) -- If the table was truncated we'll read the data directly from data pages without restoring from backup IF @tableWasTruncated = 0 BEGIN -- RESTORE DATA PAGES FROM FULL BACKUP IN BATCHES OF 200 WHILE @lastID <= @pagesCount BEGIN -- create CSV string of pages to restore SELECT @sql = STUFF((SELECT ',' + CONVERT(VARCHAR(100), FileID) + ':' + CONVERT(VARCHAR(100), PageID) FROM #PagesToRestore WHERE ID BETWEEN @lastID AND @lastID + 200 ORDER BY ID FOR XML PATH('')), 1, 1, '') SELECT @sql = 'RESTORE DATABASE [' + @dbName + '] PAGE = ''' + @sql + ''' FROM DISK = ''' + @fullBackupName + '''' RAISERROR ('Starting RESTORE command:' , 10, 1) WITH NOWAIT; RAISERROR (@sql , 10, 1) WITH NOWAIT; EXEC(@sql); RAISERROR ('Restore DONE' , 10, 1) WITH NOWAIT; SELECT @lastID = @lastID + 200 END /* If you have any differential or transaction log backups you should restore them here to bring the previously restored data pages up to date */ END DECLARE @dbccSinglePage TABLE ( [ParentObject] NVARCHAR(500), [Object] NVARCHAR(500), [Field] NVARCHAR(500), [VALUE] NVARCHAR(MAX) ) DECLARE @cols NVARCHAR(MAX), @paramDefinition NVARCHAR(500), @SQLtoExec VARCHAR(1000), @FileID VARCHAR(100), @PageID VARCHAR(100), @i INT = 1 -- Get deleted table columns from information_schema view -- Need sp_executeSQL because database name can't be passed in as variable SELECT @cols = 'select @cols = STUFF((SELECT '', ['' + COLUMN_NAME + '']''FROM ' + @dbName + '.INFORMATION_SCHEMA.COLUMNSWHERE TABLE_NAME = ''' + @tableName + ''' AND TABLE_SCHEMA = ''' + @schemaName + '''ORDER BY ORDINAL_POSITIONFOR XML PATH('''')), 1, 2, '''')', @paramDefinition = N'@cols nvarchar(max) OUTPUT' EXECUTE sp_executesql @cols, @paramDefinition, @cols = @cols OUTPUT -- Loop through all the restored data pages, -- read data from them and insert them into temp table -- which you can then insert into the orignial deleted table DECLARE dbccPageCursor CURSOR GLOBAL FORWARD_ONLY FOR SELECT [FileID], [PageID], [SQLtoExec] FROM #PagesToRestore ORDER BY [FileID], [PageID] OPEN dbccPageCursor; FETCH NEXT FROM dbccPageCursor INTO @FileID, @PageID, @SQLtoExec; WHILE @@FETCH_STATUS = 0 BEGIN RAISERROR ('---------------------------------------------', 10, 1) WITH NOWAIT; SELECT @sql = 'Loop iteration: ' + CONVERT(VARCHAR(10), @i); RAISERROR (@sql, 10, 1) WITH NOWAIT; SELECT @sql = 'Running: ' + @SQLtoExec RAISERROR (@sql, 10, 1) WITH NOWAIT; -- if something goes wrong with DBCC execution or data gathering, skip it but print error BEGIN TRY INSERT INTO @dbccSinglePage EXEC (@SQLtoExec) -- make the data insert magic happen here IF (SELECT CONVERT(BIGINT, [VALUE]) FROM @dbccSinglePage WHERE [Field] LIKE '%Metadata: ObjectId%') = OBJECT_ID('['+@dbName+'].['+@schemaName +'].['+@tableName+']') BEGIN DELETE @dbccSinglePage WHERE NOT ([ParentObject] LIKE 'Slot % Offset %' AND [Object] LIKE 'Slot % Column %') SELECT @sql = 'USE tempdb; ' + 'IF (OBJECTPROPERTY(object_id(''' + @undeletedTableName + '''), ''TableHasIdentity'') = 1) ' + 'SET IDENTITY_INSERT ' + @undeletedTableName + ' ON; ' + 'INSERT INTO ' + @undeletedTableName + '(' + @cols + ') ' + STUFF((SELECT ' UNION ALL SELECT ' + STUFF((SELECT ', ' + CASE WHEN VALUE = '[NULL]' THEN 'NULL' ELSE '''' + [VALUE] + '''' END FROM ( -- the unicorn help here to correctly set ordinal numbers of columns in a data page -- it's turning STRING order into INT order (1,10,11,2,21 into 1,2,..10,11...21) SELECT [ParentObject], [Object], Field, VALUE, RIGHT('00000' + O1, 6) AS ParentObjectOrder, RIGHT('00000' + REVERSE(LEFT(O2, CHARINDEX(' ', O2)-1)), 6) AS ObjectOrder FROM ( SELECT [ParentObject], [Object], Field, VALUE, REPLACE(LEFT([ParentObject], CHARINDEX('Offset', [ParentObject])-1), 'Slot ', '') AS O1, REVERSE(LEFT([Object], CHARINDEX('Offset ', [Object])-2)) AS O2 FROM @dbccSinglePage WHERE t.ParentObject = ParentObject )t)t ORDER BY ParentObjectOrder, ObjectOrder FOR XML PATH('')), 1, 2, '') FROM @dbccSinglePage t GROUP BY ParentObject FOR XML PATH('') ), 1, 11, '') + ';' RAISERROR (@sql, 10, 1) WITH NOWAIT; EXEC (@sql) END END TRY BEGIN CATCH SELECT @sql = 'ERROR!!!' + CHAR(10) + CHAR(13) + 'ErrorNumber: ' + ERROR_NUMBER() + '; ErrorMessage' + ERROR_MESSAGE() + CHAR(10) + CHAR(13) + 'FileID: ' + @FileID + '; PageID: ' + @PageID RAISERROR (@sql, 10, 1) WITH NOWAIT; END CATCH DELETE @dbccSinglePage SELECT @sql = 'Pages left to process: ' + CONVERT(VARCHAR(10), @pagesCount - @i) + CHAR(10) + CHAR(13) + CHAR(10) + CHAR(13) + CHAR(10) + CHAR(13), @i = @i+1 RAISERROR (@sql, 10, 1) WITH NOWAIT; FETCH NEXT FROM dbccPageCursor INTO @FileID, @PageID, @SQLtoExec; END CLOSE dbccPageCursor; DEALLOCATE dbccPageCursor; EXEC ('SELECT ''' + @undeletedTableName + ''' as TableName; SELECT * FROM ' + @undeletedTableName)END TRYBEGIN CATCH SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage IF CURSOR_STATUS ('global', 'dbccPageCursor') >= 0 BEGIN CLOSE dbccPageCursor; DEALLOCATE dbccPageCursor; ENDEND CATCH-- if the table was deleted we need to finish the restore page sequenceIF @tableWasTruncated = 0BEGIN -- take a log tail backup and then restore it to complete page restore process DECLARE @currentDate VARCHAR(30) SELECT @currentDate = CONVERT(VARCHAR(30), GETDATE(), 112) RAISERROR ('Starting Log Tail backup to c:\Temp ...', 10, 1) WITH NOWAIT; PRINT ('BACKUP LOG [' + @dbName + '] TO DISK = ''c:\Temp\' + @dbName + '_TailLogBackup_' + @currentDate + '.trn''') EXEC ('BACKUP LOG [' + @dbName + '] TO DISK = ''c:\Temp\' + @dbName + '_TailLogBackup_' + @currentDate + '.trn''') RAISERROR ('Log Tail backup done.', 10, 1) WITH NOWAIT; RAISERROR ('Starting Log Tail restore from c:\Temp ...', 10, 1) WITH NOWAIT; PRINT ('RESTORE LOG [' + @dbName + '] FROM DISK = ''c:\Temp\' + @dbName + '_TailLogBackup_' + @currentDate + '.trn''') EXEC ('RESTORE LOG [' + @dbName + '] FROM DISK = ''c:\Temp\' + @dbName + '_TailLogBackup_' + @currentDate + '.trn''') RAISERROR ('Log Tail restore done.', 10, 1) WITH NOWAIT;END-- The last step is manual. Insert data from our temporary table to the original deleted table The misconception here is that you can do a single table restore properly in SQL Server. You can't. But with little experimentation you can get pretty close to it. One way to possible remove a dependency on a backup to retrieve deleted pages is to quickly run a similar script to the upper one that gets data directly from data pages while the rows are still marked as ghost records. It could be done if we could beat the ghost record cleanup task.

Read the article
Finding values from a table that are *not* in a grouping of another table and what group that value

- by Bkins

I hope I am not missing something very simple here. I have done a Google search(es) and searched through stackoverflow. Here is the situation: For simplicity's sake let's say I have a table called "PeoplesDocs", in a SQL Server 2008 DB, that holds a bunch of people and all the documents that they own. So one person can have several documents. I also have a table called "RequiredDocs" that simply holds all the documents that a person should have. Here is sort of what it looks like: PeoplesDocs: PersonID DocID -------- ----- 1 A 1 B 1 C 1 D 2 C 2 D 3 A 3 B 3 C RequiredDocs: DocID DocName ----- --------- A DocumentA B DocumentB C DocumentC D DocumentD How do I write a SQL query that returns some variation of: PersonID MissingDocs -------- ----------- 2 DocumentA 2 DocumentB 3 DocumentD I have tried, and most of my searching has pointed to, something like: SELECT DocID FROM DocsRequired WHERE NOT EXIST IN ( SELECT DocID FROM PeoplesDocs) but obviously this will not return anything in this example because everyone has at least one of the documents. Also, if a person does not have any documents then there will be one record in the PeoplesDocs table with the DocID set to NULL. Thank you in advance for any help you can provide, Ben

Read the article
Cleaning a dataset of song data - what sort of problem is this?

- by Rob Lourens

I have a set of data about songs. Each entry is a line of text which includes the artist name, song title, and some extra text. Some entries are only "extra text". My goal is to resolve as many of these as possible to songs on Spotify using their web API. My strategy so far has been to search for the entry via the API - if there are no results, apply a transformation such as "remove all text between ( )" and search again. I have a list of heuristics and I've had reasonable success with this but as the code gets more and more convoluted I keep thinking there must be a more generic and consistent way. I don't know where to look - any suggestions for what to try, topics to study, buzzwords to google?

Read the article
Augmenting your Social Efforts via Data as a Service (DaaS)

- by Mike Stiles

The following is the 3rd in a series of posts on the value of leveraging social data across your enterprise by Oracle VP Product Development Don Springer and Oracle Cloud Data and Insight Service Sr. Director Product Management Niraj Deo. In this post, we will discuss the approach and value of integrating additional “public” data via a cloud-based Data-as-as-Service platform (or DaaS) to augment your Socially Enabled Big Data Analytics and CX Management. Let’s assume you have a functional Social-CRM platform in place. You are now successfully and continuously listening and learning from your customers and key constituents in Social Media, you are identifying relevant posts and following up with direct engagement where warranted (both 1:1, 1:community, 1:all), and you are starting to integrate signals for communication into your appropriate Customer Experience (CX) Management systems as well as insights for analysis in your business intelligence application. What is the next step? Augmenting Social Data with other Public Data for More Advanced Analytics When we say advanced analytics, we are talking about understanding causality and correlation from a wide variety, volume and velocity of data to Key Performance Indicators (KPI) to achieve and optimize business value. And in some cases, to predict future performance to make appropriate course corrections and change the outcome to your advantage while you can. The data to acquire, process and analyze this is very nuanced: It can vary across structured, semi-structured, and unstructured data It can span across content, profile, and communities of profiles data It is increasingly public, curated and user generated The key is not just getting the data, but making it value-added data and using it to help discover the insights to connect to and improve your KPIs. As we spend time working with our larger customers on advanced analytics, we have seen a need arise for more business applications to have the ability to ingest and use “quality” curated, social, transactional reference data and corresponding insights. The challenge for the enterprise has been getting this data inline into an easily accessible system and providing the contextual integration of the underlying data enriched with insights to be exported into the enterprise’s business applications. The following diagram shows the requirements for this next generation data and insights service or (DaaS): Some quick points on these requirements: Public Data, which in this context is about Common Business Entities, such as - Customers, Suppliers, Partners, Competitors (all are organizations) Contacts, Consumers, Employees (all are people) Products, Brands This data can be broadly categorized incrementally as - Base Utility data (address, industry classification) Public Master Reference data (trade style, hierarchy) Social/Web data (News, Feeds, Graph) Transactional Data generated by enterprise process, workflows etc. This Data has traits of high-volume, variety, velocity etc., and the technology needed to efficiently integrate this data for your needs includes - Change management of Public Reference Data across all categories Applied Big Data to extract statics as well as real-time insights Knowledge Diagnostics and Data Mining As you consider how to deploy this solution, many of our customers will be using an online “cloud” service that provides quality data and insights uniformly to all their necessary applications. In addition, they are requesting a service that is: Agile and Easy to Use: Applications integrated with the service can obtain data on-demand, quickly and simply Cost-effective: Pre-integrated into applications so customers don’t have to Has High Data Quality: Single point access to reference data for data quality and linkages to transactional, curated and social data Supports Data Governance: Becomes more manageable and cost-effective since control of data privacy and compliance can be enforced in a centralized place Data-as-a-Service (DaaS) Just as the cloud has transformed and now offers a better path for how an enterprise manages its IT from their infrastructure, platform, and software (IaaS, PaaS, and SaaS), the next step is data (DaaS). Over the last 3 years, we have seen the market begin to offer a cloud-based data service and gain initial traction. On one side of the DaaS continuum, we see an “appliance” type of service that provides a single, reliable source of accurate business data plus social information about accounts, leads, contacts, etc. On the other side of the continuum we see more of an online market “exchange” approach where ISVs and Data Publishers can publish and sell premium datasets within the exchange, with the exchange providing a rich set of web interfaces to improve the ease of data integration. Why the difference? It depends on the provider’s philosophy on how fast the rate of commoditization of certain data types will occur. How do you decide the best approach? Our perspective, as shown in the diagram below, is that the enterprise should develop an elastic schema to support multi-domain applicability. This allows the enterprise to take the most flexible approach to harness the speed and breadth of public data to achieve value. The key tenet of the proposed approach is that an enterprise carefully federates common utility, master reference data end points, mobility considerations and content processing, so that they are pervasively available. One way you may already be familiar with this approach is in how you do Address Verification treatments for accounts, contacts etc. If you design and revise this service in such a way that it is also easily available to social analytic needs, you could extend this to launch geo-location based social use cases (marketing, sales etc.). Our fundamental belief is that value-added data achieved through enrichment with specialized algorithms, as well as applying business “know-how” to weight-factor KPIs based on innovative combinations across an ever-increasing variety, volume and velocity of data, will be where real value is achieved. Essentially, Data-as-a-Service becomes a single entry point for the ever-increasing richness and volume of public data, with enrichment and combined capabilities to extract and integrate the right data from the right sources with the right factoring at the right time for faster decision-making and action within your core business applications. As more data becomes available (and in many cases commoditized), this value-added data processing approach will provide you with ongoing competitive advantage. Let’s look at a quick example of creating a master reference relationship that could be used as an input for a variety of your already existing business applications. In phase 1, a simple master relationship is achieved between a company (e.g. General Motors) and a variety of car brands’ social insights. The reference data allows for easy sort, export and integration into a set of CRM use cases for analytics, sales and marketing CRM. In phase 2, as you create more data relationships (e.g. competitors, contacts, other brands) to have broader and deeper references (social profiles, social meta-data) for more use cases across CRM, HCM, SRM, etc. This is just the tip of the iceberg, as the amount of master reference relationships is constrained only by your imagination and the availability of quality curated data you have to work with. DaaS is just now emerging onto the marketplace as the next step in cloud transformation. For some of you, this may be the first you have heard about it. Let us know if you have questions, or perspectives. In the meantime, we will continue to share insights as we can.Photo: Erik Araujo, stock.xchng

Read the article
How to add a footer to a table in Microsoft Word?

- by dewalla

I have a table that is longer than one page. I have found the option to make the header of the table to be added to the second portion of the table after the page break. Is there a way to do the same thing but with a footer on the table? I want to add a footer so that if my table was 1000 entries long (12 pages), that the first and last row of each page would be consistant; a header and footer for the table. If I edit the rest of the document (above the table) the table will shift up/down and I want to header and footer of the table to remain at the pagge breaks. Any Ideas? PAGE BREAK HEADER OF TABLE TBL TBL TBL TBL TBL TBL TBL TBL TBL TBL TBL TBL FOOTER OF TABLE PAGE BREAK HEADER OF TABLE TBL TBL TBL TBL TBL TBL FOOTER OF TABLE TEXTTEXTETEXT PAGE BREAK

Read the article
Working with data and meta data that are separated on different servers

- by afuzzyllama

While developing a product, I've come across a situation where my group wants to store meta data for data entry forms (questions, layout, etc) in a different database then the database where the collected data is stored. This is mostly for security because we want to be able to have our meta data public facing, while keeping collected data as secure as possible. I was thinking about writing a web service that provides the meta information that the data collection program could access. The only issue I see with this approach is the front end is going to have to match the meta data with the collected data, which would be more efficient as a join on the back end. Currently, this system is slated to run on .NET and MSSQL. I haven't played around with .NET libraries running in SQL, but I'm considering trying to create logic that would pull from the web service, convert the meta data into a table that SQL can join on, and return the combined data and meta data that way. Is this solution the wrong way to approach the problem? Is there a pattern or "industry standard" way of bringing together two datasets that don't live in the same database?

Read the article
The Business case for Big Data

- by jasonw

The Business Case for Big Data Part 1 What's the Big Deal Okay, so a new buzz word is emerging. It's gone beyond just a buzzword now, and I think it is going to change the landscape of retail, financial services, healthcare....everything. Let me spend a moment to talk about what i'm going to talk about. Massive amounts of data are being collected every second, more than ever imaginable, and the size of this data is more than can be practically managed by today’s current strategies and technologies. There is a revolution at hand centering on this groundswell of data and it will change how we execute our businesses through greater efficiencies, new revenue discovery and even enable innovation. It is the revolution of Big Data. This is more than just a new buzzword is being tossed around technology circles.This blog series for Big Data will explain this new wave of technology and provide a roadmap for businesses to take advantage of this growing trend. Cases for Big Data There is a growing list of use cases for big data. We naturally think of Marketing as the low hanging fruit. Many projects look to analyze twitter feeds to find new ways to do marketing. I think of a great example from a TED speech that I recently saw on data visualization from Facebook from my masters studies at University of Virginia. We can see when the most likely time for breaks-ups occurs by looking at status changes and updates on users Walls. This is the intersection of Big Data, Analytics and traditional structured data. Ted Video Marketers can use this to sell more stuff. I really like the following piece on looking at twitter feeds to measure mood. The following company was bought by a hedge fund. They could predict how the S&P was going to do within three days at an 85% accuracy. Link to the article Here we see a convergence of predictive analytics and Big Data. So, we'll look at a lot of these business cases and start talking about what this means for the business. It's more than just finding ways to use Hadoop + NoSql and we'll talk about that too. How do I start in Big Data? That's what is coming next post.

Read the article

Search Results

Search found 74473 results on 2979 pages for 'data table'.

Page 7/2979 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

- by Chris2021

- by AndyC

- by stefan.thieme(at)oracle.com

- by jamiet

- by Irem Radzik

- by dnlbrky

- by Dinesh Asanka

- by djerry

- by B-Ballerl

- by Macha

- by irem.radzik(at)oracle.com

- by phill

- by Paul White

- by developerit

- by pinaldave

- by Jenna Danko

- by srkirkland

- by Mladen Prajdic

- by Bkins

- by Rob Lourens

- by Mike Stiles

- by dewalla

- by afuzzyllama

- by jasonw

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >