Search Results

Search found 40744 results on 1630 pages for 'sql interview questions a'.

Page 102/1630 | < Previous Page | 98 99 100 101 102 103 104 105 106 107 108 109 | Next Page >

T-SQL Tuesday : Reflections on the PASS Summit and our community

- by AaronBertrand

Last week I attended the PASS Summit in Seattle. I blogged from both keynotes ( Keynote #1 and Keynote #2 ), as well as the WIT Luncheon - which SQL Sentry sponsored. I had a fantastic time at the conference, even though these days I attend far fewer sessions that I used to. As a company, we were overwhelmed by the positive energy in the Expo Hall. I really liked the notebook idea, where board members were assigned notebooks to carry around and take ideas from attendees. I took full advantage when...(read more)

Read the article
How SQL Saturday could be better

- by AaronBertrand

I've been to a lot of SQL Saturdays. They are great events to attend - from a community standpoint, from a learning standpoint, and from a speaker growth standpoint. Who could ask for more, right? Great sessions, from passionate speakers willing to both teach and learn, fantastic networking opportunities and lunch. All for free, or at a very low cost - some events need to recover costs and charge $10 for lunch. Still a phenomenal bargain IMHO. But we all know that these events aren't perfect... there...(read more)

Read the article
SQL Saturday #156 : Providence, RI

- by AaronBertrand

Well, East Greenwich, RI. Another successful event, this one put on by John Miner, Brandon Leach, Steve Simon, Scott Abrants and a host of other folks. Several #SQLFamily friends in attendance as well: Grant Fritchey, Mike Walsh, Jack Corbett, Wayne Sheffield and others. I gave a session in the morning and then a session to cap off the day. Thanks to everyone who attended! The downloads are here: T-SQL : Bad Habits & Best Practices The Ins & Outs of Contained Databases...(read more)

Read the article
SQL Server 2008 cluster freezing

- by Ed Leighton-Dick

We have run into a strange situation in which a SQL Server 2008 single-node cluster hangs. As background, we are rebuilding a Windows Server 2003/SQL Server 2005 two-node cluster using Windows 2008 and SQL Server 2008. Here's the timeline: Evicted the passive node (server B) from the Windows 2003/SQL 2005 cluster. The active node now functions as a single-node cluster with no problems. Wiped server B's disks and installed Windows 2008 and SQL Server 2008 as a single-node cluster. Since we do not want to the two clusters to communicate yet, we left the cluster's private network "heartbeat" adapter unconfigured. The cluster comes up and functions normally. Moved all databases to the new cluster. Cluster continues to function normally. Turned off server A (old cluster) in preparation for rebuilding as the second node of the new cluster. SQL Server instance on server B (new cluster) locks up, even though it should have no knowledge of or interaction with server A. Restarted server A. SQL Server instance on server B (new cluster) immediately begins working again. Things we have tried: The new cluster's name responds to ping and NETBIOS requests, even while the SQL Server is hung. We have confirmed that no IP address is assigned to the old heartbeat adapter, and it is not pulling an IP address from DHCP. Disabling the heartbeat's network card has the same effect. No errors were generated in any logs - Windows or SQL. When the error first occurred, it sat in the hung state for quite some time (well over 10 minutes) before anyone figured out what was going on. This would seem to eliminate any sort of normal cluster timeout in which it would have been searching for the other node (even if one had been configured). Server B is running Windows 2008 SP2, fully patched, and SQL Server 2008 SP1 CU7 (10.0.2775).

Read the article
SQL Server Express Failed to Install

- by JasCav

I am attempting to install SQL Server Express (as part of the Visual Studio 2010 Professional installation), but it is failing. I am receiving this error log. [06/22/11,16:31:39] Microsoft Visual Studio 2010 Professional - ENU: [2] UpdateFileFetcherFromMsi: Warning: Missing fwlink entry for cabinet: #SP.cab [06/22/11,16:31:40] setup.exe: [2] Duplicate module ID: {0AFE11CA-57AA-4F66-90BE-284F0F3A5ABD} [06/22/11,16:32:12] setup.exe: [2] Duplicate component in install order: SQL EULAs [06/22/11,16:32:12] setup.exe: [2] Duplicate component in install order: SQL EULAs [06/22/11,16:32:12] setup.exe: [2] Duplicate component in install order: SQL EULAs [06/22/11,17:07:55] Microsoft Visual Studio 2010 Professional - ENU: [2] UpdateFileFetcherFromMsi: Warning: Missing fwlink entry for cabinet: #SP.cab [06/22/11,17:07:55] setup.exe: [2] Duplicate module ID: {0AFE11CA-57AA-4F66-90BE-284F0F3A5ABD} [06/23/11,10:39:33] Microsoft Visual Studio 2010 Professional - ENU: [2] UpdateFileFetcherFromMsi: Warning: Missing fwlink entry for cabinet: #SP.cab [06/23/11,10:39:33] setup.exe: [2] Duplicate module ID: {0AFE11CA-57AA-4F66-90BE-284F0F3A5ABD} [06/23/11,10:40:22] setup.exe: [2] Duplicate component in install order: SQL EULAs [06/23/11,10:40:22] setup.exe: [2] Duplicate component in install order: SQL EULAs [06/23/11,10:40:22] setup.exe: [2] Duplicate component in install order: SQL EULAs [06/23/11,10:53:48] Microsoft Visual Studio 2010 Professional - ENU: [2] UpdateFileFetcherFromMsi: Warning: Missing fwlink entry for cabinet: #SP.cab [06/23/11,10:53:48] setup.exe: [2] Duplicate module ID: {0AFE11CA-57AA-4F66-90BE-284F0F3A5ABD} [06/23/11,13:19:26] Microsoft Visual Studio 2010 Professional - ENU: [2] UpdateFileFetcherFromMsi: Warning: Missing fwlink entry for cabinet: #SP.cab [06/23/11,13:19:26] setup.exe: [2] Duplicate module ID: {0AFE11CA-57AA-4F66-90BE-284F0F3A5ABD} [06/23/11,16:47:36] Microsoft SQL Server 2008 Express Service Pack 1 (x64): [2] Error code -2068643839 for this component is not recognized. [06/23/11,16:47:36] Microsoft SQL Server 2008 Express Service Pack 1 (x64): [2] Component Microsoft SQL Server 2008 Express Service Pack 1 (x64) returned an unexpected value. ***EndOfSession*** I'm reading various articles which point towards something being wrong in the register, but I can't find anything specific. Any suggestions for what I can do to fix this?

Read the article
Consolidate SQL Server Reporting Services

- by Eric C. Singer

I've been a big fan of consolidating as many DB's to a few SQL servers for a while and I've had great success with it. However, I've never had to deal with SQL reporting services. Has anyone migrated SSRS from a bunch of random SQL servers into a consolidated SQL server? I don't exactly know a whole lot about SSRS which is part of the problem. To my knowlege, it's one DB per SSRS instance, so it sounds like i'd need to find a way of exporting data and merging it. Basically the process used to look like: Move DB from SQL Express to shared SQL server Change point in APP to point at new SQL server With reporting services, how do I move the reporting service compenent of the DB as well? I realize I may need to tweak the app, but my question is on the SQL side.

Read the article
MS SQL to MySQL using MySQL Migration Toolkit: permission issue

- by Zeno

I have a MS SQL imported into SQL Server 2008 from a .bak and I set it to Mixed mode. I have a SQL user (called "test") that can correctly access the database using SQL Server. I need to convert this to a MySQL database, so I got the MySQL Migration Toolkit. I pick "MS SQL Server" and then it asks for the hostname/username/password/database. I'm not 100% sure on these, but I used "localhost" (running on same computer), left the port as is (1433) and the username/password ("test") for the SQL Server. And I used the database name for the SQL Server database I'm looking to import. I clicked next, enter my MySQL database details and then attempt to run it and I get this error: Connecting to source database and retrieve schemata names. Initializing JDBC driver ... Driver class MS SQL JDBC Driver Opening connection ... Connection jdbc:jtds:sqlserver://localhost:1433/Orders;user=test;password=blah;charset=utf-8;domain= The list of schema names could not be retrieved (error: 0). ReverseEngineeringMssql.getSchemata :Network error IOException: Connection refused: connect Details: net.sourceforge.jtds.jdbc.ConnectionJDBC2.<init>(ConnectionJDBC2.java:372) net.sourceforge.jtds.jdbc.ConnectionJDBC3.<init>(ConnectionJDBC3.java:50) net.sourceforge.jtds.jdbc.Driver.connect(Driver.java:178) java.sql.DriverManager.getConnection(Unknown Source) java.sql.DriverManager.getConnection(Unknown Source) com.mysql.grt.modules.ReverseEngineeringGeneric.establishConnection(ReverseEngineeringGeneric.java:141) com.mysql.grt.modules.ReverseEngineeringMssql.getSchemata(ReverseEngineeringMssql.java:99) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) com.mysql.grt.Grt.callModuleFunction(Unknown Source)

Read the article
Connect over WiFi to SQL Server from another computer

- by Bronzato

I tried to connect over WiFi to SQL Server with SQL Server Management Studio from another computer, but it failed. I have a computer with Windows 7 & SQL Server 2008 (lets say the server computer). Next to it I have a freshly installed computer with Windows 7 & SQL Server Management Studio (let's say the client computer). What I did on the server computer: Configure firewall by enabling port 1433 Enabled network protocols (TCP/IP) inside SQL Server Configuration Manager Checked Allow remote connections to this server in server properties in the SQL Server Management application. Started SQL Server Browser Restarted services (SQL Server Browser is stopped at this point, but I don't think it is necessary. Is it?) Next, I successfully tested a ping on the port 1433 from my client computer with a tool named tcping (ex: tcping 192.168.1.4 1433). But I still cannot connect from my client computer to SQL Server on my server computer. Ok, something new with this problem: Until now, I successfully connected to my "server computer" with Management Studio. What I did is type the computer name in the server name field in the connection window of Management Studio. My previous (failed) attempt was to type the computer name followed by the instance of SQL server (ex: COMPUTER_NAME\SQL2008). I don't know why I only have to type the computer name. Now my new challenge is to be successful in connecting my VB6 application to this remote database located on my "server computer". I have a connection string for this but it failed to connect. Here is my connection string: "Provider=SQLOLEDB.1;Password=mypassword;User ID=sa;Initial Catalog=TPB;Data Source=THIERRY-HP\SQL2008" Any idea what's going wrong?

Read the article
MSSQL Timing out, a couple of questions...

- by user29000

Hi, About once a week, my MSSQL server is timing out, or rather the machine runs out of RAM. This morning it reached 3.9GB of the available 4, with MSSQL taking up 2.5GB. I'm concerned that i've not configured SQL to release memory as it should, so I ran sp_who2 while the timeouts were occuring to see what process were running. If i could post the CSV datafile i would, however, there were 85 processes in total, mostly related to the Full Text service: FT Gatherer - About 35 of these running under the 'sa' account against the master database with status of either sleeping or background, many were dependant on other processes. Is that normal? MySite database - There were only 5 processes for the one active site/database and all were either sleeping or suspended - but their lastBatch dates were set to 1/12/2020. Is that normal? The datbase is only about 20mb in size the traffic levels are very low, so i'm thinking of maybe limiting the amount of RAM SQL has access to (from unlimted to maybe 2GB). Any thoughts / advise would be appreciated. Mny thanks Ben

Read the article
Connect by Wifi to Sql Server from another computer

- by Bronzato

I try to connect by Wifi to Sql Server with Sql Server Management Studio from another computer but it failed. I have a computer with Windows Seven & Sql Server 2008 (lets say the server computer). Next to it, I have a fresh installed computer with Windows Seven & Sql Server Management Studio (let's say the client computer). What I do on the server computer: configure firewall by enabling port 1433 enabled network protocols (TCP/IP) inside Sql Server Configuration Manager checked "Allow remote connections to this server" on server properties in Sql Server Management. started Sql Server Browser restarted services (Sql Server Browser is stopped but I think it is not neccessary, isn't it?) Next, I successfully tested a ping on the port 1433 from my client computer with a tool named tcping (ex: tcping 192.168.1.4 1433). But I still cannot connect from my client computer to Sql Server on my other computer. Ok, something new on this problem: until now, I successfully connected to my "server computer" with Management Studio. What I do is typing the computer name in the server name field in the connection window of Management Studio. My previous (failed) attempt was to type the computer name followed by the instance of sql server (ex: COMPUTER_NAME\SQL2008). I don't know why I only have to type the computer name... Nevermind. Now my new challenge is to succeed connecting my VB6 application to this remote database located on my "computer server". I have a connection string for this but it failed to connect. Here is my connection string: "Provider=SQLOLEDB.1;Password=mypassword;User ID=sa;Initial Catalog=TPB;Data Source=THIERRY-HP\SQL2008" Any idea what's wrong? Thanks

Read the article
Unable to connect to remote MS SQL Server 2008 Express SP3 instance by name

- by Max

I am trying to connect to a remote MS SQL Server 2008 SP3 x86 Instance using it's name. At the first glance all seems to work well (e.g. it is possible to connect to the server locally and succesfully telnet it's port remotely), but there is a thing I can't understand... This line should connect us to the default instance of remote SQL Server: osql -S ServerIP -d MyDatabase /U sa -P MyPassword and it does the trick, however the next one: osql -S ServerIP\MyInstance -d MyDatabase /U sa -P MyPassword ends up with the following error: [SQL Native Client]SQL Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. [SQL Native Client]Login timeout expired [SQL Native Client]An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. The only instance running on the server is MyInstance, which is (I guess) the default one. Could you please put some time in explaining the issue.

Read the article
Oracle SQL Developer v3.2.1 Now Available

- by thatjeffsmith

Oracle SQL Developer version 3.2.1 is now available. I recommend that everyone now upgrade to this release. It features more than 200 bug fixes, tweaks, and polish applied to the 3.2 edition. The high profile bug fixes submitted by customers and users on our forums are listed in all their glory for your review. I want to highlight a few of the changes though, as I recognize many of you lack the time and/or patience to ‘read the docs.’ That would include me, which is why I enjoy writing these kinds of blog posts. I’m lazy – just like you! No more artificial line breaks between CREATE OR REPLACE and your PL/SQL In versions 3.2 and older, when you pull up your stored procedural objects in our editor, you would see a line break inserted between the CREATE OR REPLACE and then the body of your code. In version 3.2.1, we have removed the line break. 3.1 3.2.1 Trivia Did You Know? The database doesn’t store the ‘CREATE’ or ‘CREATE OR REPLACE’ bit of your PL/SQL code in the database. If we look at the USER_SOURCE view, we can see that the code begins with the object name. So the CREATE OR REPLACE bit is ‘artificial’ The intent is to give you the code necessary to recreate your object – and have it ‘compile’ into the database. We pretty much HAVE to add the ‘CREATE OR REPLACE.’ From now on it will appear inline with the first line of your code. Exporting Tables & Views When exporting data from your tables or views, previous versions of SQL Developer presented a 3 step wizard. It allows you to choose your columns and apply data filters for what is exported. This was kind of redundant. The grids already allowed you to select your columns and apply filters. Wouldn’t it be more intuitive AND efficient to just make the grids behave in a What You See Is What You Get (WYSIWYG) fashion? In version 3.2.1, that is exactly what will happen. The wizard now only has two steps and the grid will export the data and columns as defined in the visible grid. Let the grid properties define what is actually exported! And here is what is pasted into my worksheet: "BREWERY"|"CITY" "3 Brewers Restaurant Micro-Brewery"|"Toronto" "Amsterdam Brewing Co."|"Toronto" "Ball Brewing Company Ltd."|"Toronto" "Big Ram Brewing Company"|"Toronto" "Black Creek Historic Brewery"|"Toronto" "Black Oak Brewing"|"Toronto" "C'est What?"|"Toronto" "Cool Beer Brewing Company"|"Toronto" "Denison's Brewing"|"Toronto" "Duggan's Brewery"|"Toronto" "Feathers"|"Toronto" "Fermentations! - Danforth"|"Toronto" "Fermentations! - Mount Pleasant"|"Toronto" "Granite Brewery & Restaurant"|"Toronto" "Labatt's Breweries of Canada"|"Toronto" "Mill Street Brew Pub"|"Toronto" "Mill Street Brewery"|"Toronto" "Molson Breweries of Canada"|"Toronto" "Molson Brewery at Air Canada Centre"|"Toronto" "Pioneer Brewery Ltd."|"Toronto" "Post-Production Bistro"|"Toronto" "Rotterdam Brewing"|"Toronto" "Steam Whistle Brewing"|"Toronto" "Strand Brasserie"|"Toronto" "Upper Canada Brewing"|"Toronto" JUST what I wanted And One Last Thing Speaking of export, sometimes I want to send data to Excel. And sometimes I want to send multiple objects to Excel – to a single Excel file that is. In version 3.2.1 you can now do that. Let’s export the bulk of the HR schema to Excel, with each table going to it’s own workbook in the same worksheet. Select many tables, put them in in a single Excel worksheet If you try this in previous versions of SQL Developer it will just write the first table to the Excel file. This is one of the bugs we addressed in v3.2.1. Here is what the output Excel file looks like now: Many tables - Many workbooks in an Excel Worksheet I have a sneaky suspicion that this will be a frequently used feature going forward. Excel seems to be the cornerstone of many of our popular features. Imagine that!

Read the article
Installed Service Pack 3 for SQL Server 2005 but it does not show up when selecting @@version

- by Blegger

We installed Service Pack 3 for SQL Server 2005 Standard Edition (64-bit). In the Add or Remove Programs menu we have the following entries under Microsoft SQL Server 2005 (64-bit): Service Pack 3 for SQL Server Integration Services 64-bit) ENU (KB955706) Service Pack 3 for SQL Server Notification Services 2005 (64-bit) ENU (KB955706) Service Pack 3 for SQL Server Database Services 2005 (64-bit) ENU (KB955706) Service Pack 3 for SQL Server Tools and Workstation Components 2005 (64-bit) ENU (KB955706) But when I run the following query on the server: select @@version I get this result: Microsoft SQL Server 2005 - 9.00.4035.00 (X64) Nov 24 2008 16:17:31 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2) Why does it not display that it is running Service Pack 3?

Read the article
Using SQL Execution Plans to discover the Swedish alphabet

- by Rob Farley

SQL Server is quite remarkable in a bunch of ways. In this post, I’m using the way that the Query Optimizer handles LIKE to keep it SARGable, the Execution Plans that result, Collations, and PowerShell to come up with the Swedish alphabet. SARGability is the ability to seek for items in an index according to a particular set of criteria. If you don’t have SARGability in play, you need to scan the whole index (or table if you don’t have an index). For example, I can find myself in the phonebook easily, because it’s sorted by LastName and I can find Farley in there by moving to the Fs, and so on. I can’t find everyone in my suburb easily, because the phonebook isn’t sorted that way. I can’t even find people who have six letters in their last name, because also the book is sorted by LastName, it’s not sorted by LEN(LastName). This is all stuff I’ve looked at before, including in the talk I gave at SQLBits in October 2010. If I try to find everyone who’s names start with F, I can do that using a query a bit like: SELECT LastName FROM dbo.PhoneBook WHERE LEFT(LastName,1) = 'F'; Unfortunately, the Query Optimizer doesn’t realise that all the entries that satisfy LEFT(LastName,1) = 'F' will be together, and it has to scan the whole table to find them. But if I write: SELECT LastName FROM dbo.PhoneBook WHERE LastName LIKE 'F%'; then SQL is smart enough to understand this, and performs an Index Seek instead. To see why, I look further into the plan, in particular, the properties of the Index Seek operator. The ToolTip shows me what I’m after: You’ll see that it does a Seek to find any entries that are at least F, but not yet G. There’s an extra Predicate in there (a Residual Predicate if you like), which checks that each LastName is really LIKE F% – I suppose it doesn’t consider that the Seek Predicate is quite enough – but most of the benefit is seen by its working out the Seek Predicate, filtering to just the “at least F but not yet G” section of the data. This got me curious though, particularly about where the G comes from, and whether I could leverage it to create the Swedish alphabet. I know that in the Swedish language, there are three extra letters that appear at the end of the alphabet. One of them is ä that appears in the word Västerås. It turns out that Västerås is quite hard to find in an index when you’re looking it up in a Swedish map. I talked about this briefly in my five-minute talk on Collation from SQLPASS (the one which was slightly less than serious). So by looking at the plan, I can work out what the next letter is in the alphabet of the collation used by the column. In other words, if my alphabet were Swedish, I’d be able to tell what the next letter after F is – just in case it’s not G. It turns out it is… Yes, the Swedish letter after F is G. But I worked this out by using a copy of my PhoneBook table that used the Finnish_Swedish_CI_AI collation. I couldn’t find how the Query Optimizer calculates the G, and my friend Paul White (@SQL_Kiwi) tells me that it’s frustratingly internal to the QO. He’s particularly smart, even if he is from New Zealand. To investigate further, I decided to do some PowerShell, leveraging the Get-SqlPlan function that I blogged about recently (make sure you also have the SqlServerCmdletSnapin100 snap-in added). I started by indicating that I was going to use Finnish_Swedish_CI_AI as my collation of choice, and that I’d start whichever letter cam straight after the number 9. I figure that this is a cheat’s way of guessing the first letter of the alphabet (but it doesn’t actually work in Unicode – luckily I’m using varchar not nvarchar. Actually, there are a few aspects of this code that only work using ASCII, so apologies if you were wanting to apply it to Greek, Japanese, etc). I also initialised my $alphabet variable. $collation = 'Finnish_Swedish_CI_AI'; $firstletter = '9'; $alphabet = ''; Now I created the table for my test. A single field would do, and putting a Clustered Index on it would suffice for the Seeks. Invoke-Sqlcmd -server . -data tempdb -query "create table dbo.collation_test (col varchar(10) collate $collation primary key);" Now I get into the looping. $c = $firstletter; $stillgoing = $true; while ($stillgoing) { I construct the query I want, seeking for entries which start with whatever $c has reached, and get the plan for it: $query = "select col from dbo.collation_test where col like '$($c)%';"; [xml] $pl = get-sqlplan $query "." "tempdb"; At this point, my $pl variable is a scary piece of XML, representing the execution plan. A bit of hunting through it showed me that the EndRange element contained what I was after, and that if it contained NULL, then I was done. $stillgoing = ($pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange -ne $null); Now I could grab the value out of it (which came with apostrophes that needed stripping), and append that to my $alphabet variable. if ($stillgoing) { $c=$pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange.RangeExpressions.ScalarOperator.ScalarString.Replace("'",""); $alphabet += $c; } Finally, finishing the loop, dropping the table, and showing my alphabet! } Invoke-Sqlcmd -server . -data tempdb -query "drop table dbo.collation_test;"; $alphabet; When I run all this, I see that the Swedish alphabet is ABCDEFGHIJKLMNOPQRSTUVXYZÅÄÖ, which matches what I see at Wikipedia. Interesting to see that the letters on the end are still there, even with Case Insensitivity. Turns out they’re not just “letters with accents”, they’re letters in their own right. I’m sure you gave up reading long ago, and really aren’t that fazed about the idea of doing this using PowerShell. I chose PowerShell because I’d already come up with an easy way of grabbing the estimated plan for a query, and PowerShell does allow for easy navigation of XML. I find the most interesting aspect of this as the fact that the Query Optimizer uses the next letter of the alphabet to maintain the SARGability of LIKE. I’m hoping they do something similar for a whole bunch of operations. Oh, and the fact that you know how to find stuff in the IKEA catalogue. Footnote: If you are interested in whether this works in other languages, you might want to consider the following screenshot, which shows that in principle, it should work with Japanese. It might be a bit harder to run this in PowerShell though, as I’m not sure how it translates. In Hiragana, the Japanese alphabet starts ?, ?, ?, ?, ?, ...

Read the article
Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

Read the article
Querying the SSIS Catalog? Here’s a handy query!

- by jamiet

I’ve been working on a SQL Server Integration Services (SSIS) solution for about 6 months now and I’ve learnt many many things that I intend to share on this blog just as soon as I get the time. Here’s a very short starter-for-ten… I’ve found the following query to be utterly invaluable when interrogating the SSIS Catalog to discover what is going on in my executions: SELECT event_message_id,MESSAGE,package_name,event_name,message_source_name,package_path,execution_path,message_type,message_source_typeFROM ( SELECT em.* FROM SSISDB.catalog.event_messages em WHERE em.operation_id = (SELECT MAX(execution_id) FROM SSISDB.catalog.executions) AND event_name NOT LIKE '%Validate%' )q/* Put in whatever WHERE predicates you might like*/--WHERE event_name = 'OnError'--WHERE package_name = 'Package.dtsx'--WHERE execution_path LIKE '%<some executable>%'ORDER BY message_time DESC Know it. Learn it. Love it. @jamiet

Read the article
T-SQL Tuesday #33: Trick Shots: Undocumented, Underdocumented, and Unknown Conspiracies!

- by Most Valuable Yak (Rob Volk)

Mike Fal (b | t) is hosting this month's T-SQL Tuesday on Trick Shots. I love this choice because I've been preoccupied with sneaky/tricky/evil SQL Server stuff for a long time and have been presenting on it for the past year. Mike's directives were "Show us a cool trick or process you developed…It doesn’t have to be useful", which most of my blogging definitely fits, and "Tell us what you learned from this trick…tell us how it gave you insight in to how SQL Server works", which is definitely a new concept. I've done a lot of reading and watching on SQL Server Internals and even attended training, but sometimes I need to go explore on my own, using my own tools and techniques. It's an itch I get every few months, and, well, it sure beats workin'. I've found some people to be intimidated by SQL Server's internals, and I'll admit there are A LOT of internals to keep track of, but there are tons of excellent resources that clearly document most of them, and show how knowing even the basics of internals can dramatically improve your database's performance. It may seem like rocket science, or even brain surgery, but you don't have to be a genius to understand it. Although being an "evil genius" can help you learn some things they haven't told you about. ;) This blog post isn't a traditional "deep dive" into internals, it's more of an approach to find out how a program works. It utilizes an extremely handy tool from an even more extremely handy suite of tools, Sysinternals. I'm not the only one who finds Sysinternals useful for SQL Server: Argenis Fernandez (b | t), Microsoft employee and former T-SQL Tuesday host, has an excellent presentation on how to troubleshoot SQL Server using Sysinternals, and I highly recommend it. Argenis didn't cover the Strings.exe utility, but I'll be using it to "hack" the SQL Server executable (DLL and EXE) files. Please note that I'm not promoting software piracy or applying these techniques to attack SQL Server via internal knowledge. This is strictly educational and doesn't reveal any proprietary Microsoft information. And since Argenis works for Microsoft and demonstrated Sysinternals with SQL Server, I'll just let him take the blame for it. :P (The truth is I've used Strings.exe on SQL Server before I ever met Argenis.) Once you download and install Strings.exe you can run it from the command line. For our purposes we'll want to run this in the Binn folder of your SQL Server instance (I'm referencing SQL Server 2012 RTM): cd "C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn" C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.dll > sqldll.txt C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.exe > sqlexe.txt I've limited myself to DLLs and EXEs that have "sql" in their names. There are quite a few more but I haven't examined them in any detail. (Homework assignment for you!) If you run this yourself you'll get 2 text files, one with all the extracted strings from every SQL DLL file, and the other with the SQL EXE strings. You can open these in Notepad, but you're better off using Notepad++, EditPad, Emacs, Vim or another more powerful text editor, as these will be several megabytes in size. And when you do open it…you'll find…a TON of gibberish. (If you think that's bad, just try opening the raw DLL or EXE file in Notepad. And by the way, don't do this in production, or even on a running instance of SQL Server.) Even if you don't clean up the file, you can still use your editor's search function to find a keyword like "SELECT" or some other item you expect to be there. As dumb as this sounds, I sometimes spend my lunch break just scanning the raw text for anything interesting. I'm boring like that. Sometimes though, having these files available can lead to some incredible learning experiences. For me the most recent time was after reading Joe Sack's post on non-parallel plan reasons. He mentions a new SQL Server 2012 execution plan element called NonParallelPlanReason, and demonstrates a query that generates "MaxDOPSetToOne". Joe (formerly on the Microsoft SQL Server product team, so he knows this stuff) mentioned that this new element was not currently documented and tried a few more examples to see what other reasons could be generated. Since I'd already run Strings.exe on the SQL Server DLLs and EXE files, it was easy to run grep/find/findstr for MaxDOPSetToOne on those extracts. Once I found which files it belonged to (sqlmin.dll) I opened the text to see if the other reasons were listed. As you can see in my comment on Joe's blog, there were about 20 additional non-parallel reasons. And while it's not "documentation" of this underdocumented feature, the names are pretty self-explanatory about what can prevent parallel processing. I especially like the ones about cursors – more ammo! - and am curious about the PDW compilation and Cloud DB replication reasons. One reason completely stumped me: NoParallelHekatonPlan. What the heck is a hekaton? Google and Wikipedia were vague, and the top results were not in English. I found one reference to Greek, stating "hekaton" can be translated as "hundredfold"; with a little more Wikipedia-ing this leads to hecto, the prefix for "one hundred" as a unit of measure. I'm not sure why Microsoft chose hekaton for such a plan name, but having already learned some Greek I figured I might as well dig some more in the DLL text for hekaton. Here's what I found: hekaton_slow_param_passing Occurs when a Hekaton procedure call dispatch goes to slow parameter passing code path The reason why Hekaton parameter passing code took the slow code path hekaton_slow_param_pass_reason sp_deploy_hekaton_database sp_undeploy_hekaton_database sp_drop_hekaton_database sp_checkpoint_hekaton_database sp_restore_hekaton_database e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\hkproc.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matgen.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matquery.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\sqlmeta.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\resultset.cpp Interesting! The first 4 entries (in red) mention parameters and "slow code". Could this be the foundation of the mythical DBCC RUNFASTER command? Have I been passing my parameters the slow way all this time? And what about those sp_xxxx_hekaton_database procedures (in blue)? Could THEY be the secret to a faster SQL Server? Could they promise a "hundredfold" improvement in performance? Are these special, super-undocumented DIB (databases in black)? I decided to look in the SQL Server system views for any objects with hekaton in the name, or references to them, in hopes of discovering some new code that would answer all my questions: SELECT name FROM sys.all_objects WHERE name LIKE '%hekaton%' SELECT name FROM sys.all_objects WHERE object_definition(OBJECT_ID) LIKE '%hekaton%' Which revealed: name ------------------------ (0 row(s) affected) name ------------------------ sp_createstats sp_recompile sp_updatestats (3 row(s) affected) Hmm. Well that didn't find much. Looks like these procedures are seriously undocumented, unknown, perhaps forbidden knowledge. Maybe a part of some unspeakable evil? (No, I'm not paranoid, I just like mysteries and thought that punching this up with that kind of thing might keep you reading. I know I'd fall asleep without it.) OK, so let's check out those 3 procedures and see what they reveal when I search for "Hekaton": sp_createstats: -- filter out local temp tables, Hekaton tables, and tables for which current user has no permissions -- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables OK, that's interesting, let's go looking down a little further: ((@table_type<>'U') or (0 = OBJECTPROPERTY(@table_id, 'TableIsInMemory'))) and -- Hekaton table Wellllll, that tells us a few new things: There's such a thing as Hekaton tables (UPDATE: I'm not the only one to have found them!) They are not standard user tables and probably not in memory UPDATE: I misinterpreted this because I didn't read all the code when I wrote this blog post. The OBJECTPROPERTY function has an undocumented TableIsInMemory option Let's check out sp_recompile: -- (3) Must not be a Hekaton procedure. And once again go a little further: if (ObjectProperty(@objid, 'IsExecuted') <> 0 AND ObjectProperty(@objid, 'IsInlineFunction') = 0 AND ObjectProperty(@objid, 'IsView') = 0 AND -- Hekaton procedure cannot be recompiled -- Make them go through schema version bumping branch, which will fail ObjectProperty(@objid, 'ExecIsCompiledProc') = 0) And now we learn that hekaton procedures also exist, they can't be recompiled, there's a "schema version bumping branch" somewhere, and OBJECTPROPERTY has another undocumented option, ExecIsCompiledProc. (If you experiment with this you'll find this option returns null, I think it only works when called from a system object.) This is neat! Sadly sp_updatestats doesn't reveal anything new, the comments about hekaton are the same as sp_createstats. But we've ALSO discovered undocumented features for the OBJECTPROPERTY function, which we can now search for: SELECT name, object_definition(OBJECT_ID) FROM sys.all_objects WHERE object_definition(OBJECT_ID) LIKE '%OBJECTPROPERTY(%' I'll leave that to you as more homework. I should add that searching the system procedures was recommended long ago by the late, great Ken Henderson, in his Guru's Guide books, as a great way to find undocumented features. That seems to be really good advice! Now if you're a programmer/hacker, you've probably been drooling over the last 5 entries for hekaton (in green), because these are the names of source code files for SQL Server! Does this mean we can access the source code for SQL Server? As The Oracle suggested to Neo, can we return to The Source??? Actually, no. Well, maybe a little bit. While you won't get the actual source code from the compiled DLL and EXE files, you'll get references to source files, debugging symbols, variables and module names, error messages, and even the startup flags for SQL Server. And if you search for "DBCC" or "CHECKDB" you'll find a really nice section listing all the DBCC commands, including the undocumented ones. Granted those are pretty easy to find online, but you may be surprised what those web sites DIDN'T tell you! (And neither will I, go look for yourself!) And as we saw earlier, you'll also find execution plan elements, query processing rules, and who knows what else. It's also instructive to see how Microsoft organizes their source directories, how various components (storage engine, query processor, Full Text, AlwaysOn/HADR) are split into smaller modules. There are over 2000 source file references, go do some exploring! So what did we learn? We can pull strings out of executable files, search them for known items, browse them for unknown items, and use the results to examine internal code to learn even more things about SQL Server. We've even learned how to use command-line utilities! We are now 1337 h4X0rz! (Not really. I hate that leetspeak crap.) Although, I must confess I might've gone too far with the "conspiracy" part of this post. I apologize for that, it's just my overactive imagination. There's really no hidden agenda or conspiracy regarding SQL Server internals. It's not The Matrix. It's not like you'd find anything like that in there: Attach Matrix Database DM_MATRIX_COMM_PIPELINES MATRIXXACTPARTICIPANTS dm_matrix_agents Alright, enough of this paranoid ranting! Microsoft are not really evil! It's not like they're The Borg from Star Trek: ALTER FEDERATION DROP ALTER FEDERATION SPLIT DROP FEDERATION #tsql2sday

Read the article
Using transactions with LINQ-to-SQL

- by Jalpesh P. Vadgama

Today one of my colleague asked that how we can use transactions with the LINQ-to-SQL Classes when we use more then one entities updated at same time. It was a good question. Here is my answer for that.For ASP.NET 2.0 or higher version have a new class called TransactionScope which can be used to manage transaction with the LINQ. Let’s take a simple scenario we are having a shopping cart application in which we are storing details or particular order placed into the database using LINQ-to-SQL. There are two tables Order and OrderDetails which will have all the information related to order. Order will store particular information about orders while OrderDetails table will have product and quantity of product for particular order.We need to insert data in both tables as same time and if any errors comes then it should rollback the transaction. To use TransactionScope in above scenario first we have add a reference to System.Transactions like below. After adding the transaction we need to drag and drop the Order and Order Details tables into Linq-To-SQL Classes it will create entities for that. Below is the code for transaction scope to use mange transaction with Linq Context. MyContextDataContext objContext = new MyContextDataContext(); using (System.Transactions.TransactionScope tScope = new System.Transactions.TransactionScope(TransactionScopeOption.Required)) { objContext.Order.InsertOnSubmit(Order); objContext.OrderDetails.InsertOnSumbit(OrderDetails); objContext.SubmitChanges(); tScope.Complete(); } Here it will commit transaction only if using blocks will run successfully. Hope this will help you. Technorati Tags: Linq,Transaction,System.Transactions,ASP.NET

Read the article
Compare those hard-to-reach servers with SQL Snapper

- by Michelle Taylor

If you’ve got an environment which is at the end of an unreliable or slow network connection, or isn’t connected to your network at all, and you want to do a deployment to that environment – then pointing SQL Compare at it directly is difficult or impossible. While you could run SQL Compare locally on that environment, if it’s a server – especially if it’s a locked-down server – you probably don’t want to go through the hassle of using another activation on it. Or possibly you’re not allowed to install software at all, because you don’t have admin rights – but you can run user-mode software. SQL Snapper is a standalone, licensing-free program which takes SQL Compare snapshots of a database. It can create a snapshot within the context of that environment which can then be moved to your working environment to run SQL Compare against, allowing you to create a deployment script for environments you can’t get SQL Compare into. Where can I find it? You can find RedGate.SQLSnapper.exe in your SQL Compare installation directory – if you haven’t changed it, that will be something like C:\Program Files (x86)\Red Gate\SQL Compare 10 (or 11 if you’re using our SQL Server 2014 support beta). As well as copying the executable, you’ll also currently need to copy the System.Threading.dll and RedGate.SOCCompareInterface.dll files from the same directory alongside it. How do I use it? SQL Snapper’s UI is just a cut-down version of the snapshot creation UI in SQL Compare – just fill in the boxes and create your snapshot, then bring it back to the place you use SQL Compare to compare against your difficult-to-reach environment. SQL Snapper also has a command-line mode if you can’t run the UI in your target environment – just specify the server, database and output location with the /server, /database and /mksnap arguments, and optionally the username and password if you’re using SQL security, e.g.: RedGate.SQLSnapper.exe /database:yourdatabase /server:yourservername /username:youruser /password:yourpassword /mksnap:filename.snp What’s the catch? There are a few limitations of SQL Snapper in its current form – notably, it can’t read encrypted objects, and you’ll also currently need to copy the System.Threading.dll and RedGate.SOCCompareInterface.dll files alongside it, which we recognise is a little awkward in some environments. If you use SQL Snapper and want to share your experiences, or help us work on improving the experience in future, please comment here or leave a request on the SQL Compare UserVoice at https://redgate.uservoice.com/forums/141379-sql-compare.

Read the article
SQL Server 2005 to 2008 upgrade - are MDF files binary compatible?

- by james

I have 50 databases on a MS SQL Server 2005 system and want to upgrade to MS SQL Server 2008. This is what I tried on some test machines: 1. copied the \DATA directory from the source (MSSQL 2005) to exactly the same path on the target (MSSQL 2008) server. 2. edited the startup parameters on the MSSQL 2008 service to point to the path of the MSSQL 2005 master database. 3. restarted MSSQL service It worked and I can access all databases, tables and data. My questions are: I go back to SQL Server 4.2 and it has never been this easy. I know it worked, but should have it worked? Am I missing something, or is there going to be a gotcha next week? These are simple databases, with just tables, views and indexes. No cross database links, no triggers etc

Read the article
Investigation: Can different combinations of components effect Dataflow performance?

- by jamiet

Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation! The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test: One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category CSPL Split by Month of Birth and Income Category Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category CSPL Split by Month of Birth 1& 2 Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

Read the article
Have SSIS' differing type systems ever caused you problems?

- by jamiet

One thing that has always infuriated me about SSIS is the fact that every package has three different type systems; to give you an idea of what I am talking about consider the following: The SSIS dataflow's type system is made up of types called DT_* (e.g. DT_STR, DT_I4) The SSIS variable type system is based on .Net datatypes (e.g. String, Int32) The types available for Execute SQL Task's parameters are based on something else - I don't exactly know what (e.g. VARCHAR, LONG) Speaking euphemistically ... this is not an optimum situation (were I not speaking euphemistically I would be a lot ruder) and hence I have submitted a suggestion to Connect at [SSIS] Consolidate three type systems into one requesting that it be remedied. This accompanying blog post is not however a request for votes (though that would be nice); the reason is actually subtler than that. Let me explain. I have been submitting bugs and suggestions pertaining to SSIS for years and have, so far, submitted over 200 Connect items. If that experience has taught me anything it is this - Connect items are not generally actioned because they are considered "nice to have". No, SSIS Connect items get actioned because they cause customers grief and if I am perfectly honest I must admit that, other than being a bit gnarly, SSIS' three type system architecture has never knowingly caused me any significant problems. The reason for this blog post is to ask if any reader out there has ever encountered any problems on account of SSIS' three type systems or have you, like me, never found them to be a problem? Errors or performance degredation caused by implicit type conversions would, I believe, present a strong case for getting this situation remedied in a future version of SSIS so if you HAVE encountered such problems I would encourage you to leave a comment on the Connect submission accordingly. Let me know in the comments too - I would be interested to hear others' opinions on this. @Jamiet

Read the article
Overview of Microsoft SQL Server 2008 Upgrade Advisor

- by Akshay Deep Lamba

Problem Like most organizations, we are planning to upgrade our database server from SQL Server 2005 to SQL Server 2008. I would like to know is there an easy way to know in advance what kind of issues one may encounter when upgrading to a newer version of SQL Server? One way of doing this is to use the Microsoft SQL Server 2008 Upgrade Advisor to plan for upgrades from SQL Server 2000 or SQL Server 2005. In this tip we will take a look at how one can use the SQL Server 2008 Upgrade Advisor to identify potential issues before the upgrade. Solution SQL Server 2008 Upgrade Advisor is a free tool designed by Microsoft to identify potential issues before upgrading your environment to a newer version of SQL Server. Below are prerequisites which need to be installed before installing the Microsoft SQL Server 2008 Upgrade Advisor. Prerequisites for Microsoft SQL Server 2008 Upgrade Advisor .Net Framework 2.0 or a higher version Windows Installer 4.5 or a higher version Windows Server 2003 SP 1 or a higher version, Windows Server 2008, Windows XP SP2 or a higher version, Windows Vista Download SQL Server 2008 Upgrade Advisor You can download SQL Server 2008 Upgrade Advisor from the following link. Once you have successfully installed Upgrade Advisor follow the below steps to see how you can use this tool to identify potential issues before upgrading your environment. 1. Click Start -> Programs -> Microsoft SQL Server 2008 -> SQL Server 2008 Upgrade Advisor. 2. Click Launch Upgrade Advisor Analysis Wizard as highlighted below to open the wizard. 2. On the wizard welcome screen click Next to continue. 3. In SQL Server Components screen, enter the Server Name and click the Detect button to identify components which need to be analyzed and then click Next to continue with the wizard. 4. In Connection Parameters screen choose Instance Name, Authentication and then click Next to continue with the wizard. 5. In SQL Server Parameters wizard screen select the Databases which you want to analysis, trace files if any and SQL batch files if any. Then click Next to continue with the wizard. 6. In Reporting Services Parameters screen you can specify the Reporting Server Instance name and then click next to continue with the wizard. 7. In Analysis Services Parameters screen you can specify an Analysis Server Instance name and then click Next to continue with the wizard. 8. In Confirm Upgrade Advisor Settings screen you will be able to see a quick summary of the options which you have selected so far. Click Run to start the analysis. 9. In Upgrade Advisor Progress screen you will be able to see the progress of the analysis. Basically, the upgrade advisor runs predefined rules which will help to identify potential issues that can affect your environment once you upgrade your server from a lower version of SQL Server to SQL Server 2008. 10. In the below snippet you can see that Upgrade Advisor has completed the analysis of SQL Server, Analysis Services and Reporting Services. To see the output click the Launch Report button at the bottom of the wizard screen. 11. In View Report screen you can see a summary of issues which can affect you once you upgrade. To learn more about each issue you can expand the issue and read the detailed description as shown in the below snippet.

Read the article
Performance-Driven Development

- by BuckWoody

I was reading a blog yesterday about the evils of SELECT *. The author pointed out that it's almost always a bad idea to use SELECT * for a query, but in the case of SQL Azure (or any cloud database, for that matter) it's especially bad, since you're paying for each transmission that comes down the line. A very good point indeed. This got me to thinking - shouldn't we treat ALL programming that way? In other words, wouldn't it make sense to pretend that we are paying for every chunk of data - a little less for a bit, a lot more for a BLOB or VARCHAR(MAX), that sort of thing? In effect, we really are paying for that. Which led me to the thought of Performance-Driven Development, or the act of programming with the goal of having the fastest code from the very outset. This isn't an original title, since a quick Bing-search shows me a couple of offerings from Forrester and a professional in Israel who already used that title, but the general idea I'm thinking of is assigning a "cost" to each code round-trip, be it network, storage, trip time and other variables, and then rewarding the developers that come up with the fastest code. I wonder what kind of throughput and round-trip times you could get if your developers were paid on a scale of how fast the application performed... Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Change Data Capture

- by Ricardo Peres

There's an hidden gem in SQL Server 2008: Change Data Capture (CDC). Using CDC we get full audit capabilities with absolutely no implementation code: we can see all changes made to a specific table, including the old and new values! You can only use CDC in SQL Server 2008 Standard or Enterprise, Express edition is not supported. Here are the steps you need to take, just remember SQL Agent must be running: use SomeDatabase; -- first create a table CREATE TABLE Author ( ID INT NOT NULL PRIMARY KEY IDENTITY(1, 1), Name NVARCHAR(20) NOT NULL, EMail NVARCHAR(50) NOT NULL, Birthday DATE NOT NULL ) -- enable CDC at the DB level EXEC sys.sp_cdc_enable_db -- check CDC is enabled for the current DB SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'SomeDatabase' -- enable CDC for table Author, all columns exec sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'Author', @role_name = null -- insert values into table Author insert into Author (Name, EMail, Birthday, Username) values ('Bla', 'bla@bla', 1990-10-10, 'bla') -- check CDC data for table Author -- __$operation: 1 = DELETE, 2 = INSERT, 3 = BEFORE UPDATE 4 = AFTER UPDATE -- __$start_lsn: operation timestamp select * from cdc.dbo_author_CT -- update table Author update Author set EMail = '[email protected]' where Name = 'Bla' -- check CDC data for table Author select * from cdc.dbo_author_CT -- delete from table Author delete from Author -- check CDC data for table Author select * from cdc.dbo_author_CT -- disable CDC for table Author -- this removes all CDC data, so be carefull exec sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'Author', @capture_instance = 'dbo_Author' -- disable CDC for the entire DB -- this removes all CDC data, so be carefull exec sys.sp_cdc_disable_db SyntaxHighlighter.config.clipboardSwf = 'http://alexgorbatchev.com/pub/sh/2.0.320/scripts/clipboard.swf'; SyntaxHighlighter.all();

Read the article

Search Results

Search found 40744 results on 1630 pages for 'sql interview questions a'.

Page 102/1630 | < Previous Page | 98 99 100 101 102 103 104 105 106 107 108 109 | Next Page >

- by AaronBertrand

- by AaronBertrand

- by AaronBertrand

- by Ed Leighton-Dick

- by JasCav

- by Eric C. Singer

- by Zeno

- by Bronzato

- by user29000

- by Bronzato

- by Max

- by thatjeffsmith

- by Blegger

- by Rob Farley

- by Dejan Sarka

- by jamiet

- by Most Valuable Yak (Rob Volk)

- by Jalpesh P. Vadgama

- by Michelle Taylor

- by james

- by jamiet

- by jamiet

- by Akshay Deep Lamba

- by BuckWoody

- by Ricardo Peres

< Previous Page | 98 99 100 101 102 103 104 105 106 107 108 109 | Next Page >