Search Results

Search found 27337 results on 1094 pages for 't sql'.


  • Fraud Detection with the SQL Server Suite Part 2

    - by Dejan Sarka
    This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post on this topic.

    My Approach to Data Mining Projects

    It is impossible to evaluate in advance the time and money needed for a complete fraud detection infrastructure. Personally, I do not know the customer’s data in advance, and I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest starting with a proof-of-concept (POC) project. A POC takes between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question and the meaning of the data at hand, while the IT expert should be familiar with the structure of the data, know how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert, the most time-consuming work – namely data preparation and overview – can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project.

    If a customer wants to have their people involved in the project directly and requests a transfer of knowledge, the project begins with training. I strongly advise this approach, as it establishes a common background for everyone involved: an understanding of how the algorithms work, an understanding of how the results should be interpreted, familiarity with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst) and the IT expert assigned to the project learn how to prepare the data in an efficient manner. Working together, our combined knowledge and expertise allow us to focus immediately on the most interesting attributes and to identify additional, calculated ones soon after. By employing our programming knowledge we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and quality of the input data.

    I welcome a customer’s decision to assign additional personnel to the project; in fact, I prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly to the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems.

    The data overview is performed simultaneously with the data preparation. The logic behind this task is the same – again I show the teams the expected results, how to achieve them and what they mean. This is done in multiple cycles, as is the case with data preparation, because, quite frankly, the two tasks are completely interleaved. One specific objective of the data overview is of principal importance: a simple star schema and a simple OLAP cube that will, first of all, simplify data discovery and interpretation of the results, and will also prove useful in the following tasks.
    The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of project without the customer’s SME.

    Once the data preparation is done and the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much better alternative is available if the initial training was performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately using several different techniques, one of which is evaluation over time using an OLAP cube. After evaluating the models, we select the most appropriate one to be deployed for a production test; this allows the team to understand the deployment process. There are many ways of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers.

    Even from a POC, the customer receives a lot of benefit, all at the sole risk of spending the money and time for a single 5 to 10 day project:

    - The customer learns the basic patterns of fraud and fraud detection
    - The customer learns how to do the entire cycle with their own people, relying on me only for the most complex problems
    - The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible
    - The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than before
    - All of the attendees of the training learn how to use their own creativity to implement further improvements to the process and procedures, even after the solution has been deployed to production
    - The POC output for a smaller company, or for a subsidiary of a larger company, can actually be considered a finished, production-ready solution
    - The results of a POC project done at the subsidiary level can be reused as a finished POC project for the entire enterprise

    Typically, the project also results in several important “side effects”:

    - Improved data quality
    - Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization
    - Because more minds eventually get involved across the enterprise, the company should expect more and better fraud detection patterns

    After the POC project is completed as described above, the actual project does not need months of engagement from my side. This is possible because of our preference for transferring knowledge to the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly.
    I usually expect to perform the following tasks:

    - Establish the final infrastructure for measuring the efficiency of the deployed models
    - Deploy the models in additional scenarios:
      - through reports
      - by including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings
      - by including data mining models as dimensions in OLAP cubes, if this was not already done during the POC project
      - by creating smart ETL applications that divert suspicious data for immediate or later inspection
    - Investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary while a central system is also available
    - Repeat the data and model preparation for the actual project, as needed

    It is virtually impossible to tell in advance how much time the deployment will take before we decide, together with the customer, what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take an additional 5 to 10 days. The approximate timeline for the POC project is as follows:

    - 1-2 days of training
    - 2-3 days for data preparation and data overview
    - 2 days for creating and evaluating the models
    - 1 day for initial preparation of the continuous learning infrastructure
    - 1 day for presentation of the results and discussion of further actions

    Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we spent just one more hour on data preparation, or created just one more model, we would get better patterns and predictions. However, we simply must stop somewhere, and the best way to do this, in my experience, is to restrict the time spent on the project in advance, by agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer is capable of doing further investigations independently and of improving the models and predictions over time, without the need for constant engagement with me.

    Read the article

  • Querying the SSIS Catalog? Here’s a handy query!

    - by jamiet
    I’ve been working on a SQL Server Integration Services (SSIS) solution for about 6 months now and I’ve learnt many many things that I intend to share on this blog just as soon as I get the time. Here’s a very short starter-for-ten… I’ve found the following query to be utterly invaluable when interrogating the SSIS Catalog to discover what is going on in my executions:

        SELECT  event_message_id, MESSAGE, package_name, event_name, message_source_name,
                package_path, execution_path, message_type, message_source_type
        FROM    (
                SELECT  em.*
                FROM    SSISDB.catalog.event_messages em
                WHERE   em.operation_id = (SELECT MAX(execution_id) FROM SSISDB.catalog.executions)
                    AND event_name NOT LIKE '%Validate%'
                ) q
        /* Put in whatever WHERE predicates you might like */
        --WHERE event_name = 'OnError'
        --WHERE package_name = 'Package.dtsx'
        --WHERE execution_path LIKE '%<some executable>%'
        ORDER BY message_time DESC

    Know it. Learn it. Love it. @jamiet

    Read the article

  • T-SQL Tuesday #33: Trick Shots: Undocumented, Underdocumented, and Unknown Conspiracies!

    - by Most Valuable Yak (Rob Volk)
    Mike Fal (b | t) is hosting this month's T-SQL Tuesday on Trick Shots.  I love this choice because I've been preoccupied with sneaky/tricky/evil SQL Server stuff for a long time and have been presenting on it for the past year.  Mike's directives were "Show us a cool trick or process you developed…It doesn’t have to be useful", which most of my blogging definitely fits, and "Tell us what you learned from this trick…tell us how it gave you insight in to how SQL Server works", which is definitely a new concept.  I've done a lot of reading and watching on SQL Server Internals and even attended training, but sometimes I need to go explore on my own, using my own tools and techniques.  It's an itch I get every few months, and, well, it sure beats workin'. I've found some people to be intimidated by SQL Server's internals, and I'll admit there are A LOT of internals to keep track of, but there are tons of excellent resources that clearly document most of them, and show how knowing even the basics of internals can dramatically improve your database's performance.  It may seem like rocket science, or even brain surgery, but you don't have to be a genius to understand it. Although being an "evil genius" can help you learn some things they haven't told you about. ;) This blog post isn't a traditional "deep dive" into internals, it's more of an approach to find out how a program works.  It utilizes an extremely handy tool from an even more extremely handy suite of tools, Sysinternals.  I'm not the only one who finds Sysinternals useful for SQL Server: Argenis Fernandez (b | t), Microsoft employee and former T-SQL Tuesday host, has an excellent presentation on how to troubleshoot SQL Server using Sysinternals, and I highly recommend it.  Argenis didn't cover the Strings.exe utility, but I'll be using it to "hack" the SQL Server executable (DLL and EXE) files. Please note that I'm not promoting software piracy or applying these techniques to attack SQL Server via internal knowledge. This is strictly educational and doesn't reveal any proprietary Microsoft information.  And since Argenis works for Microsoft and demonstrated Sysinternals with SQL Server, I'll just let him take the blame for it. :P (The truth is I've used Strings.exe on SQL Server before I ever met Argenis.) Once you download and install Strings.exe you can run it from the command line.  For our purposes we'll want to run this in the Binn folder of your SQL Server instance (I'm referencing SQL Server 2012 RTM): cd "C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn" C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.dll > sqldll.txt C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.exe > sqlexe.txt   I've limited myself to DLLs and EXEs that have "sql" in their names.  There are quite a few more but I haven't examined them in any detail. (Homework assignment for you!) If you run this yourself you'll get 2 text files, one with all the extracted strings from every SQL DLL file, and the other with the SQL EXE strings.  You can open these in Notepad, but you're better off using Notepad++, EditPad, Emacs, Vim or another more powerful text editor, as these will be several megabytes in size. And when you do open it…you'll find…a TON of gibberish.  (If you think that's bad, just try opening the raw DLL or EXE file in Notepad.  And by the way, don't do this in production, or even on a running instance of SQL Server.)  
    Even if you don't clean up the file, you can still use your editor's search function to find a keyword like "SELECT" or some other item you expect to be there. As dumb as this sounds, I sometimes spend my lunch break just scanning the raw text for anything interesting. I'm boring like that. Sometimes though, having these files available can lead to some incredible learning experiences. For me the most recent time was after reading Joe Sack's post on non-parallel plan reasons. He mentions a new SQL Server 2012 execution plan element called NonParallelPlanReason, and demonstrates a query that generates "MaxDOPSetToOne". Joe (formerly on the Microsoft SQL Server product team, so he knows this stuff) mentioned that this new element was not currently documented and tried a few more examples to see what other reasons could be generated. Since I'd already run Strings.exe on the SQL Server DLLs and EXE files, it was easy to run grep/find/findstr for MaxDOPSetToOne on those extracts. Once I found which file it belonged to (sqlmin.dll) I opened the text to see if the other reasons were listed. As you can see in my comment on Joe's blog, there were about 20 additional non-parallel reasons. And while it's not "documentation" of this underdocumented feature, the names are pretty self-explanatory about what can prevent parallel processing. I especially like the ones about cursors – more ammo! – and am curious about the PDW compilation and Cloud DB replication reasons. One reason completely stumped me: NoParallelHekatonPlan. What the heck is a hekaton? Google and Wikipedia were vague, and the top results were not in English. I found one reference to Greek, stating "hekaton" can be translated as "hundredfold"; with a little more Wikipedia-ing this leads to hecto, the prefix for "one hundred" as a unit of measure. I'm not sure why Microsoft chose hekaton for such a plan name, but having already learned some Greek I figured I might as well dig some more in the DLL text for hekaton. Here's what I found:

        hekaton_slow_param_passing
        Occurs when a Hekaton procedure call dispatch goes to slow parameter passing code path
        The reason why Hekaton parameter passing code took the slow code path
        hekaton_slow_param_pass_reason
        sp_deploy_hekaton_database
        sp_undeploy_hekaton_database
        sp_drop_hekaton_database
        sp_checkpoint_hekaton_database
        sp_restore_hekaton_database
        e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\hkproc.cpp
        e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matgen.cpp
        e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matquery.cpp
        e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\sqlmeta.cpp
        e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\resultset.cpp

    Interesting! The first 4 entries (in red) mention parameters and "slow code". Could this be the foundation of the mythical DBCC RUNFASTER command? Have I been passing my parameters the slow way all this time? And what about those sp_xxxx_hekaton_database procedures (in blue)? Could THEY be the secret to a faster SQL Server? Could they promise a "hundredfold" improvement in performance? Are these special, super-undocumented DIB (databases in black)?
    I decided to look in the SQL Server system views for any objects with hekaton in the name, or references to them, in hopes of discovering some new code that would answer all my questions:

        SELECT name FROM sys.all_objects WHERE name LIKE '%hekaton%'
        SELECT name FROM sys.all_objects WHERE object_definition(OBJECT_ID) LIKE '%hekaton%'

    Which revealed:

        name
        ------------------------
        (0 row(s) affected)

        name
        ------------------------
        sp_createstats
        sp_recompile
        sp_updatestats
        (3 row(s) affected)

    Hmm. Well that didn't find much. Looks like these procedures are seriously undocumented, unknown, perhaps forbidden knowledge. Maybe a part of some unspeakable evil? (No, I'm not paranoid, I just like mysteries and thought that punching this up with that kind of thing might keep you reading. I know I'd fall asleep without it.) OK, so let's check out those 3 procedures and see what they reveal when I search for "Hekaton". sp_createstats:

        -- filter out local temp tables, Hekaton tables, and tables for which current user has no permissions
        -- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables

    OK, that's interesting, let's go looking down a little further:

        ((@table_type<>'U') or (0 = OBJECTPROPERTY(@table_id, 'TableIsInMemory'))) and -- Hekaton table

    Wellllll, that tells us a few new things:

    - There's such a thing as Hekaton tables (UPDATE: I'm not the only one to have found them!)
    - They are not standard user tables and probably not in memory (UPDATE: I misinterpreted this because I didn't read all the code when I wrote this blog post.)
    - The OBJECTPROPERTY function has an undocumented TableIsInMemory option

    Let's check out sp_recompile:

        -- (3) Must not be a Hekaton procedure.

    And once again go a little further:

        if (ObjectProperty(@objid, 'IsExecuted') <> 0 AND
            ObjectProperty(@objid, 'IsInlineFunction') = 0 AND
            ObjectProperty(@objid, 'IsView') = 0 AND
            -- Hekaton procedure cannot be recompiled
            -- Make them go through schema version bumping branch, which will fail
            ObjectProperty(@objid, 'ExecIsCompiledProc') = 0)

    And now we learn that hekaton procedures also exist, they can't be recompiled, there's a "schema version bumping branch" somewhere, and OBJECTPROPERTY has another undocumented option, ExecIsCompiledProc. (If you experiment with this you'll find this option returns null; I think it only works when called from a system object.) This is neat! Sadly sp_updatestats doesn't reveal anything new; the comments about hekaton are the same as in sp_createstats. But we've ALSO discovered undocumented features for the OBJECTPROPERTY function, which we can now search for:

        SELECT name, object_definition(OBJECT_ID) FROM sys.all_objects
        WHERE object_definition(OBJECT_ID) LIKE '%OBJECTPROPERTY(%'

    I'll leave that to you as more homework. I should add that searching the system procedures was recommended long ago by the late, great Ken Henderson, in his Guru's Guide books, as a great way to find undocumented features. That seems to be really good advice! Now if you're a programmer/hacker, you've probably been drooling over the last 5 entries for hekaton (in green), because these are the names of source code files for SQL Server! Does this mean we can access the source code for SQL Server? As The Oracle suggested to Neo, can we return to The Source??? Actually, no. Well, maybe a little bit.
    While you won't get the actual source code from the compiled DLL and EXE files, you'll get references to source files, debugging symbols, variables and module names, error messages, and even the startup flags for SQL Server. And if you search for "DBCC" or "CHECKDB" you'll find a really nice section listing all the DBCC commands, including the undocumented ones. Granted those are pretty easy to find online, but you may be surprised what those web sites DIDN'T tell you! (And neither will I, go look for yourself!) And as we saw earlier, you'll also find execution plan elements, query processing rules, and who knows what else. It's also instructive to see how Microsoft organizes their source directories, and how various components (storage engine, query processor, Full Text, AlwaysOn/HADR) are split into smaller modules. There are over 2000 source file references, go do some exploring! So what did we learn? We can pull strings out of executable files, search them for known items, browse them for unknown items, and use the results to examine internal code to learn even more things about SQL Server. We've even learned how to use command-line utilities! We are now 1337 h4X0rz! (Not really. I hate that leetspeak crap.) Although, I must confess I might've gone too far with the "conspiracy" part of this post. I apologize for that, it's just my overactive imagination. There's really no hidden agenda or conspiracy regarding SQL Server internals. It's not The Matrix. It's not like you'd find anything like that in there:

        Attach Matrix Database
        DM_MATRIX_COMM_PIPELINES
        MATRIXXACTPARTICIPANTS
        dm_matrix_agents

    Alright, enough of this paranoid ranting! Microsoft are not really evil! It's not like they're The Borg from Star Trek:

        ALTER FEDERATION DROP
        ALTER FEDERATION SPLIT
        DROP FEDERATION

    #tsql2sday

    Read the article

  • Using transactions with LINQ-to-SQL

    - by Jalpesh P. Vadgama
    Today one of my colleagues asked how we can use transactions with LINQ-to-SQL classes when more than one entity is updated at the same time. It was a good question, and here is my answer. .NET Framework 2.0 and higher provides a class called TransactionScope which can be used to manage transactions with LINQ. Let’s take a simple scenario: we have a shopping cart application in which we store the details of each order placed in the database using LINQ-to-SQL. There are two tables, Order and OrderDetails, which hold all the information related to an order. Order stores general information about each order, while OrderDetails holds the product and product quantity for each order. We need to insert data into both tables at the same time, and if any error occurs the transaction should roll back. To use TransactionScope in the above scenario we first have to add a reference to System.Transactions. After adding the reference we drag and drop the Order and OrderDetails tables onto the LINQ-to-SQL designer, which creates entities for them. Below is the code that uses TransactionScope to manage the transaction with the LINQ context:

        MyContextDataContext objContext = new MyContextDataContext();
        using (System.Transactions.TransactionScope tScope
            = new System.Transactions.TransactionScope(TransactionScopeOption.Required))
        {
            objContext.Order.InsertOnSubmit(order);
            objContext.OrderDetails.InsertOnSubmit(orderDetails);
            objContext.SubmitChanges();
            tScope.Complete();
        }

    Here the transaction is committed only if the using block runs successfully. Hope this helps you. Technorati Tags: Linq,Transaction,System.Transactions,ASP.NET

    Read the article

  • Compare those hard-to-reach servers with SQL Snapper

    - by Michelle Taylor
    If you’ve got an environment which is at the end of an unreliable or slow network connection, or isn’t connected to your network at all, and you want to do a deployment to that environment, then pointing SQL Compare at it directly is difficult or impossible. While you could run SQL Compare locally on that environment, if it’s a server – especially a locked-down server – you probably don’t want to go through the hassle of using another activation on it. Or possibly you’re not allowed to install software at all because you don’t have admin rights – but you can run user-mode software.

    SQL Snapper is a standalone, licensing-free program which takes SQL Compare snapshots of a database. It can create a snapshot within the context of that environment which can then be moved to your working environment to run SQL Compare against, allowing you to create a deployment script for environments you can’t get SQL Compare into.

    Where can I find it? You can find RedGate.SQLSnapper.exe in your SQL Compare installation directory – if you haven’t changed it, that will be something like C:\Program Files (x86)\Red Gate\SQL Compare 10 (or 11 if you’re using our SQL Server 2014 support beta). As well as copying the executable, you’ll also currently need to copy the System.Threading.dll and RedGate.SOCCompareInterface.dll files from the same directory alongside it.

    How do I use it? SQL Snapper’s UI is just a cut-down version of the snapshot creation UI in SQL Compare – just fill in the boxes and create your snapshot, then bring it back to the place you use SQL Compare to compare against your difficult-to-reach environment. SQL Snapper also has a command-line mode if you can’t run the UI in your target environment – just specify the server, database and output location with the /server, /database and /mksnap arguments, and optionally the username and password if you’re using SQL security, e.g.:

        RedGate.SQLSnapper.exe /database:yourdatabase /server:yourservername /username:youruser /password:yourpassword /mksnap:filename.snp

    What’s the catch? There are a few limitations of SQL Snapper in its current form – notably, it can’t read encrypted objects, and you’ll also currently need to copy the System.Threading.dll and RedGate.SOCCompareInterface.dll files alongside it, which we recognise is a little awkward in some environments. If you use SQL Snapper and want to share your experiences, or help us work on improving the experience in future, please comment here or leave a request on the SQL Compare UserVoice at https://redgate.uservoice.com/forums/141379-sql-compare.

    Read the article

  • SQL Server 2005 to 2008 upgrade - are MDF files binary compatible?

    - by james
    I have 50 databases on a MS SQL Server 2005 system and want to upgrade to MS SQL Server 2008. This is what I tried on some test machines:

    1. Copied the \DATA directory from the source (MSSQL 2005) to exactly the same path on the target (MSSQL 2008) server.
    2. Edited the startup parameters on the MSSQL 2008 service to point to the path of the MSSQL 2005 master database.
    3. Restarted the MSSQL service.

    It worked and I can access all databases, tables and data. My questions are: I go back to SQL Server 4.2 and it has never been this easy. I know it worked, but should it have worked? Am I missing something, or is there going to be a gotcha next week? These are simple databases, with just tables, views and indexes. No cross-database links, no triggers etc.
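    A couple of sanity checks after such a move (a hedged sketch; the database name is a placeholder, and this is not an endorsement of the copy-the-DATA-directory approach over backup/restore):

        -- Attached 2005 databases keep compatibility level 90 on a 2008 instance.
        -- Check, and raise it only when you are ready for the 2008 behaviour:
        SELECT name, compatibility_level FROM sys.databases;
        ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 100;

        -- Verify physical consistency after the move:
        DBCC CHECKDB ([YourDatabase]) WITH NO_INFOMSGS;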

    Read the article

  • Investigation: Can different combinations of components effect Dataflow performance?

    - by jamiet
    Introduction

    The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising; it’s an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow, red or green and that have some lines drawn between them.

    Example dataflow

    In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a commonly held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. It’s not surprising that people think this (it is intuitive to think that more components mean more work); however, this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and that the number of components is actually one of the less important ones; having said that, I have never proven that assertion, and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation!

    The Setup

    I have a 2GB datafile which is a list of 4,731,904 (~4.7 million) customer records with various attributes against them, and it contains 2 columns that I am going to use for categorisation:

    [YearlyIncome]
    [BirthDate]

    The data file is a SSIS raw format file, which I chose to use because it is the quickest way of getting data into a dataflow; given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories, and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical procs) Intel Core i7 at 1.73GHz and a Samsung SSD hard drive. It’s running SQL Server 2008 R2 on Windows 7.

    The Variables

    Here are the three combinations of components that I am going to test:

    1. One Conditional Split - A single Conditional Split component, CSPL Split by Month of Birth and income category, that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use:

    2. Derived Column & Conditional Split - A Derived Column component, DER Income Category, that adds a new column [IncomeCategory] which will contain one of two possible text values {"LessThan50000","GreaterThan50000"} and uses [YearlyIncome] to determine which value each row should get.
    A Conditional Split component, CSPL Split by Month of Birth and Income Category, then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use:

    DER Income Category

    CSPL Split by Month of Birth and Income Category

    3. Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each income category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case, then, I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use:

    CSPL Split by Income Category

    CSPL Split by Month of Birth 1 & 2

    Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration, here is a screenshot of the dataflow containing three Conditional Split components:

    As you can see, these dataflows have a fair bit of work to do, and remember that they’re doing that work for 4.7 million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes:

    1. The dataflow containing just one Conditional Split (i.e. #1) will be quicker
    2. There is no significant difference between any of them
    3. One of the two dataflows containing multiple transformation components will be quicker

    Regardless of which of those outcomes comes to pass, we will have learnt something, and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS.

    The Results and Analysis

    The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three Conditional Splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it, and I’ll explain that below. Before progressing, a side note: the standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller, indicating that performance for this dataflow can also be predicted with much greater confidence.

    The Explanation

    I refer you to the screenshot above that shows how CSPL Split by Month of Birth and income category in the first dataflow is set up. Observe that there is a case for each combination of month of birth and income category – 24 in total. These expressions get evaluated in the order that they appear, and hence, if we assume that month of birth and income category are uniformly distributed in the dataset, we can deduce that the expected number of expression evaluations for each row is the minimum plus the maximum divided by 2, i.e. (1 + 24) / 2 = 12.5. Now take a look at the screenshots for the second dataflow.
    We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before; only the expression differs slightly. In this case, then, we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time of this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations, thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for, thus the expected number of expression evaluations in each is 6.5. There are two of them, but each row is routed to only one, so the expected number of expression evaluations per row for this dataflow is 1.5 + 6.5 = 8. That is already fewer than the 12.5 and 13.5 of the other two dataflows, but the bigger reason the third dataflow is so much quicker lies in the expressions themselves: the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for income category and one for month of birth – whereas the expressions in the third dataflow’s Conditional Splits each have only one predicate, thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between:

        MONTH(BirthDate) == 1 && YearlyIncome <= 50000

    and

        MONTH(BirthDate) == 1

    In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row, whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7 million rows and you get a significant number of extra CPU cycles – that’s where our duration difference comes from.

    The Wrap-up

    The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower; moreover, you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done, and that doesn’t necessarily mean using fewer components; indeed, sometimes you may be able to reduce the workload in ways that aren’t immediately obvious, as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out:

    - Inequality joins, Asynchronous transformations and Lookups
    - Destination Adapter Comparison
    - Don’t turn the dataflow into a cursor
    - SSIS Dataflow – Designing for performance (webinar)

    Any comments? Let me know! @Jamiet

    Read the article

  • Have SSIS' differing type systems ever caused you problems?

    - by jamiet
    One thing that has always infuriated me about SSIS is the fact that every package has three different type systems; to give you an idea of what I am talking about, consider the following:

    - The SSIS dataflow's type system is made up of types called DT_* (e.g. DT_STR, DT_I4)
    - The SSIS variable type system is based on .Net datatypes (e.g. String, Int32)
    - The types available for the Execute SQL Task's parameters are based on something else - I don't exactly know what (e.g. VARCHAR, LONG)

    Speaking euphemistically ... this is not an optimum situation (were I not speaking euphemistically I would be a lot ruder) and hence I have submitted a suggestion to Connect at [SSIS] Consolidate three type systems into one requesting that it be remedied. This accompanying blog post is not however a request for votes (though that would be nice); the reason is actually subtler than that. Let me explain. I have been submitting bugs and suggestions pertaining to SSIS for years and have, so far, submitted over 200 Connect items. If that experience has taught me anything it is this - Connect items are not generally actioned because they are considered "nice to have". No, SSIS Connect items get actioned because they cause customers grief, and if I am perfectly honest I must admit that, other than being a bit gnarly, SSIS' three-type-system architecture has never knowingly caused me any significant problems. The reason for this blog post is to ask if any reader out there has ever encountered any problems on account of SSIS' three type systems, or have you, like me, never found them to be a problem? Errors or performance degradation caused by implicit type conversions would, I believe, present a strong case for getting this situation remedied in a future version of SSIS, so if you HAVE encountered such problems I would encourage you to leave a comment on the Connect submission accordingly. Let me know in the comments too - I would be interested to hear others' opinions on this. @Jamiet

    Read the article

  • Performance-Driven Development

    - by BuckWoody
    I was reading a blog yesterday about the evils of SELECT *. The author pointed out that it's almost always a bad idea to use SELECT * for a query, but in the case of SQL Azure (or any cloud database, for that matter) it's especially bad, since you're paying for each transmission that comes down the line. A very good point indeed. This got me to thinking - shouldn't we treat ALL programming that way? In other words, wouldn't it make sense to pretend that we are paying for every chunk of data - a little less for a bit, a lot more for a BLOB or VARCHAR(MAX), that sort of thing? In effect, we really are paying for that. Which led me to the thought of Performance-Driven Development, or the act of programming with the goal of having the fastest code from the very outset. This isn't an original title, since a quick Bing-search shows me a couple of offerings from Forrester and a professional in Israel who already used that title, but the general idea I'm thinking of is assigning a "cost" to each code round-trip, be it network, storage, trip time and other variables, and then rewarding the developers that come up with the fastest code. I wonder what kind of throughput and round-trip times you could get if your developers were paid on a scale of how fast the application performed...

    Read the article

  • Change Data Capture

    - by Ricardo Peres
    There's a hidden gem in SQL Server 2008: Change Data Capture (CDC). Using CDC we get full audit capabilities with absolutely no implementation code: we can see all changes made to a specific table, including the old and new values! Note that CDC is an Enterprise edition feature in SQL Server 2008 (Developer and Evaluation work too); Standard and Express editions are not supported. Here are the steps you need to take; just remember SQL Agent must be running:

        use SomeDatabase;

        -- first create a table
        CREATE TABLE Author
        (
            ID INT NOT NULL PRIMARY KEY IDENTITY(1, 1),
            Name NVARCHAR(20) NOT NULL,
            EMail NVARCHAR(50) NOT NULL,
            Birthday DATE NOT NULL
        )

        -- enable CDC at the DB level
        EXEC sys.sp_cdc_enable_db

        -- check CDC is enabled for the current DB
        SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'SomeDatabase'

        -- enable CDC for table Author, all columns
        exec sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'Author', @role_name = null

        -- insert values into table Author (note the quoted date literal)
        insert into Author (Name, EMail, Birthday) values ('Bla', 'bla@bla', '1990-10-10')

        -- check CDC data for table Author
        -- __$operation: 1 = DELETE, 2 = INSERT, 3 = BEFORE UPDATE, 4 = AFTER UPDATE
        -- __$start_lsn: operation timestamp
        select * from cdc.dbo_author_CT

        -- update table Author
        update Author set EMail = '[email protected]' where Name = 'Bla'

        -- check CDC data for table Author
        select * from cdc.dbo_author_CT

        -- delete from table Author
        delete from Author

        -- check CDC data for table Author
        select * from cdc.dbo_author_CT

        -- disable CDC for table Author
        -- this removes all CDC data, so be careful
        exec sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'Author', @capture_instance = 'dbo_Author'

        -- disable CDC for the entire DB
        -- this removes all CDC data, so be careful
        exec sys.sp_cdc_disable_db
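    CDC also generates a pair of table-valued functions per capture instance, which are usually a more convenient way to consume changes than querying the change table directly. A small sketch using the capture instance created above:

        -- Read all captured changes for dbo.Author via the generated TVF:
        DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);
        SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Author');
        SET @to_lsn   = sys.fn_cdc_get_max_lsn();
        SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Author(@from_lsn, @to_lsn, N'all');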

    Read the article

  • Overview of Microsoft SQL Server 2008 Upgrade Advisor

    - by Akshay Deep Lamba
    Problem

    Like most organizations, we are planning to upgrade our database server from SQL Server 2005 to SQL Server 2008. I would like to know: is there an easy way to know in advance what kind of issues one may encounter when upgrading to a newer version of SQL Server? One way of doing this is to use the Microsoft SQL Server 2008 Upgrade Advisor to plan for upgrades from SQL Server 2000 or SQL Server 2005. In this tip we will take a look at how one can use the SQL Server 2008 Upgrade Advisor to identify potential issues before the upgrade.

    Solution

    SQL Server 2008 Upgrade Advisor is a free tool designed by Microsoft to identify potential issues before upgrading your environment to a newer version of SQL Server. Below are the prerequisites which need to be installed before installing the Microsoft SQL Server 2008 Upgrade Advisor:

    - .Net Framework 2.0 or a higher version
    - Windows Installer 4.5 or a higher version
    - Windows Server 2003 SP1 or a higher version, Windows Server 2008, Windows XP SP2 or a higher version, or Windows Vista

    You can download SQL Server 2008 Upgrade Advisor from the following link. Once you have successfully installed Upgrade Advisor, follow the steps below to see how you can use this tool to identify potential issues before upgrading your environment.

    1. Click Start -> Programs -> Microsoft SQL Server 2008 -> SQL Server 2008 Upgrade Advisor.
    2. Click Launch Upgrade Advisor Analysis Wizard as highlighted below to open the wizard.
    3. On the wizard welcome screen click Next to continue.
    4. In the SQL Server Components screen, enter the Server Name and click the Detect button to identify components which need to be analyzed, then click Next to continue with the wizard.
    5. In the Connection Parameters screen choose the Instance Name and Authentication, then click Next to continue with the wizard.
    6. In the SQL Server Parameters screen select the databases which you want to analyze, trace files if any, and SQL batch files if any. Then click Next to continue with the wizard.
    7. In the Reporting Services Parameters screen you can specify the Reporting Server instance name, then click Next to continue with the wizard.
    8. In the Analysis Services Parameters screen you can specify an Analysis Server instance name, then click Next to continue with the wizard.
    9. In the Confirm Upgrade Advisor Settings screen you will see a quick summary of the options which you have selected so far. Click Run to start the analysis.
    10. In the Upgrade Advisor Progress screen you will be able to see the progress of the analysis. Basically, the Upgrade Advisor runs predefined rules which will help to identify potential issues that can affect your environment once you upgrade your server from a lower version of SQL Server to SQL Server 2008.
    11. In the below snippet you can see that Upgrade Advisor has completed the analysis of SQL Server, Analysis Services and Reporting Services. To see the output click the Launch Report button at the bottom of the wizard screen.
    12. In the View Report screen you can see a summary of issues which can affect you once you upgrade. To learn more about each issue you can expand the issue and read the detailed description as shown in the below snippet.

    Read the article

  • Running & Managing Concurrent Queries in SQL Developer

    - by thatjeffsmith
    We’ve all been there – you’ve managed to write a query that takes longer than a few seconds to execute. Tuning aside, sometimes it takes longer than you want for a query to run. So what’s a SQL Developer user to do? I say, keep going! While you’re waiting for your query to finish, there’s no reason why you can’t continue on with your work. If you need to execute something else in a worksheet, there’s no reason to launch a 2nd or 3rd copy of SQL Developer. Just open an un-shared worksheet. Now, while you’ve got 1 or more queries running, you can easily get yourself into a situation where you’re not sure what’s running where. Or maybe you want to cancel a query or just check how long something’s been running. Just open the Task Progress panel. If a query or task in SQL Developer takes more than 3-5 seconds, it will appear in the Task Progress panel. You can then watch the throbbers go back and forth while you sip your coffee/soda/Red Bull. Run a query, spawn a new worksheet, run another query, watch them in the Task Progress panel. Kudos and thanks to @leight0nn for helping me get the title of this post right. If you’re looking for help in managing and monitoring sessions in general, check out this post.

    Read the article

  • Should we be able to deploy a single package to the SSIS Catalog?

    - by jamiet
    My buddy Sutha Thiru sent me an email recently asking about my opinion on a particular nuance of the project deployment model in SQL Server Integration Services (SSIS) 2012 and I’d like to share my response as I think it warrants a wider discussion. Sutha asked:

    "Jamie, what is your take on this? http://www.mattmasson.com/index.php/2012/07/can-i-deploy-a-single-ssis-package-from-my-project-to-the-ssis-catalog/ Overnight I was talking to Matt, who confirmed that they have no plans to change the deployment model. For example, if we have the following scenario, how do we deploy? Sprint 1: Pkg 1, 2 & 3 have been developed and deployed to UAT. Once signed off, they have been deployed to Live. Sprint 2: Pkg 4 & 5 have been developed. During this time users raised a bug on Pkg 2. We want to make the change to Pkg 2 and deploy that to UAT, and eventually to Live, without releasing Pkg 4 & 5. How do we do it? Matt pointed me to his blog entry, which I have seen before: http://www.mattmasson.com/index.php/2012/02/thoughts-on-branching-strategies-for-ssis-projects/ Thanks, Sutha"

    My response: Personally, even though I've experienced the exact problem you just outlined, I agree with the current approach. I steadfastly believe that there should not be a way for an unscrupulous developer to slide in a new version of a package under the covers. Deploying .ispac files brings a degree of rigour to your operational processes. Yes, that means that we as SSIS developers are going to have to get better at using source control and branching properly, but that is no bad thing in my opinion. Claiming to be proper "developers" is a bit of a cheap claim if we don't even do the fundamentals correctly.

    I would be interested in the thoughts of others who have used the project deployment model. Do you agree with my point of view? @Jamiet

    Read the article

  • How do I restore a database on a remote SQL server 2005 from a local backup?

    - by MatsT
    I have been given access to (parts of) a remote SQL Server 2005 with SQL Server authentication in order to be able to make changes to a database without involving other people who are not working on the project. The database has been created on my local machine. Is there any way to restore the remote database from a backup file on my local computer? I do not currently have access to the filesystem on the remote server. EDIT: To clarify, the access I have is that I can log in to the server via SQL Server Management Studio. I have one connection to my local database server and one connection to the remote server. What I basically want to do is copy the database from one connection to the other.
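    One commonly suggested approach, sketched below with placeholder names and paths, assumes the remote SQL Server's service account can read a network share on your machine (and that your login has RESTORE permission); without at least that, you are limited to schema/data-copy tooling rather than a native restore:

        -- Run against the remote connection in SSMS; the file path is resolved
        -- by the remote service, so it must be a share that service can reach.
        RESTORE DATABASE [MyProjectDb]
        FROM DISK = N'\\MyWorkstation\Backups\MyProjectDb.bak'
        WITH MOVE N'MyProjectDb'     TO N'D:\Data\MyProjectDb.mdf',
             MOVE N'MyProjectDb_log' TO N'D:\Data\MyProjectDb_log.ldf';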

    Read the article

  • How to troubleshoot SQL Server issues

    - by joe
    I have an ASP.NET application with a SQL Server database, and I am wondering if you can give your ideas on how to troubleshoot the following issue: I can insert / update / delete from any table, but I have one page that uses transactions to insert into different tables. The C# code is correct and very simple, but it fails. I used the SQL Profiler to see how my app interacts with the DB, especially since the app is using stored procedures. I can catch the exec procedure statement and run it manually from SSMS and it works fine, but the same stored procedure fails from the application!!! Which leads me to think that the issue is coming from the user account and settings. I am no expert in SQL Server and wondering if anyone can explain how to verify the required settings for a user account. Thanks

    EDIT: in web.config here is how I reference my connection:

    <connectionStrings>
      <add name="Conn"
           connectionString="Data Source=localhost;Initial Catalog=myDB;Persist Security Info=True;User ID=DbUser;Password=password1254_3"
           providerName="System.Data.SqlClient" />
    </connectionStrings>

    EDIT: I will try to describe the process here:

    1. I begin a transaction.
    2. I call a stored proc to insert (which succeeds) and return the scope identity (that will be used in the next step).
    3. I call another stored procedure to insert some more info plus the scope identity from step 2, which is a foreign key here.
    4. I get an error: foreign key violation.
    5. The transaction is rolled back; the tables are empty again... thanks
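    A minimal way to replay that sequence outside the application is to run both calls in one transaction from SSMS while logged in as the application's SQL login (DbUser), which rules out permission and session-setting differences. The procedure names and parameters below are hypothetical placeholders for the real ones:

        -- Hypothetical repro; connect as DbUser, not as your admin login.
        SET XACT_ABORT ON;
        BEGIN TRANSACTION;

        DECLARE @NewId INT;
        EXEC dbo.usp_InsertHeader @Name = N'test', @NewId = @NewId OUTPUT;  -- step 2

        -- If the proc returns the id via SELECT SCOPE_IDENTITY() rather than an
        -- OUTPUT parameter, a NULL or wrong value here would explain the FK
        -- violation in step 3 (e.g. @@IDENTITY being thrown off by a trigger).
        SELECT @NewId AS NewId;

        EXEC dbo.usp_InsertDetail @HeaderId = @NewId, @Info = N'detail';    -- step 3

        ROLLBACK TRANSACTION;  -- keep the test repeatable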

    Read the article

  • PowerShell & SQL Compare

    - by Grant Fritchey
    Just a quick blog post to share a couple of scripts for using PowerShell to call SQL Compare. This is an example from my session at SQL in the City on setting up a sandbox development process. This just runs a compare between a set of scripts and a database and deploys it.

        set-Location "c:\Program Files (x86)\Red Gate\SQL Compare 10\";
        ./sqlcompare /s2:DOJO /db2:MovieManagement_Sandbox /sourcecontrol1 /vu1:grant /vp1:12345 /r1:HEAD /sfx:scripts.xml /sync /mfx:migrations.xml /verbose;

    I would not recommend using the /verbose output for real automation, but I'm showing off how the tool works. This particular script does a compare straight from source control to a database on my server. You can use variables where I've hard coded. That's it. Works great. Just wanted to share it out there. I have others that I'll track down and put up here.

    Read the article

  • SQL Server 2008R2 Express: which is the users limit in a real case scenario?

    - by PressPlayOnTape
    I know that SQL Server Express does not have a user limit, and every application loads/stresses the server in a different way. But let's take "a typical accounting application", where users input records, retrieve data, and from time to time run some big custom queries. Can someone share their own experience and tell me what the realistic limit is on the number of users that can use a SQL Server Express instance in this scenario? I am looking for an indicative idea, like (as an example): "I had a company with an average of 40 users logged in and the application was working fine on SQL Server Express, but when the users became 60 the application started to seem unresponsive" (please note this sentence is pure imagination; I just wrote it as an example).

    Read the article

  • First Shard for SQL Azure and SQL Server

    - by Herve Roggero
    That's it!!!!! It's ready to go and be tested, abused and improved! It requires .NET 4.0 and uses some cool technologies, like caching (the new System.Runtime.Caching) and the Task Parallel Library (System.Threading.Tasks). With this library you can:

    - Define a shard of 1, 2 or 100 SQL databases (a mix of SQL Server and SQL Azure)
    - Read from the shard in parallel or sequentially, and cache resultsets
    - Update or delete a record from the shard
    - Insert records quickly in the shard with a round-robin load
    - Reset the cache

    You can download the source code and a sample application here: http://enzosqlshard.codeplex.com/

    Note about the breadcrumbs: I had to add a connection GUID in order for the library to know which database a record came from. The GUID is currently calculated on the fly in the library using some of the parameters of the connection string. The GUID is also dynamically added to the result set so the client can pass it back to the library. I am curious to get your feedback on this approach.

    ** Correction from my previous post: this is a library for a Horizontal Partition Shard (HPS): tables are split across databases horizontally. So in essence, the tables need to have the same schema across the databases.

    Read the article

  • TDD with SQL and data manipulation functions

    - by Xophmeister
    While I'm a professional programmer, I've never been formally trained in software engineering. As I'm frequently visiting here and SO, I've noticed a trend for writing unit tests whenever possible and, as my software gets more complex and sophisticated, I see automated testing as a good idea in aiding debugging. However, most of my work involves writing complex SQL and then processing the output in some way. How would you write a test to ensure your SQL was returning the correct data, for example? Then, say if the data wasn't under your control (e.g., that of a 3rd party system), how can you efficiently test your processing routines without having to hand write reams of dummy data? The best solution I can think of is making views of the data that, together, cover most cases. I can then join those views with my SQL to see if it's returning the correct records and manually process the views to see if my functions, etc. are doing what they're supposed to. Still, it seems excessive and flakey; particularly finding data to test against...
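    One way to make the view idea concrete is to keep small, hand-written fixtures in a test schema and write assertions as plain queries. A hedged sketch (the test schema, view and numbers are invented for illustration):

        -- Fixture: a tiny dataset with known totals
        -- (assumes a [test] schema already exists).
        CREATE VIEW test.CustomerFixture AS
        SELECT * FROM (VALUES
            (1, N'Alice', 100.00),
            (2, N'Bob',    50.00)
        ) AS v(CustomerId, Name, OrderTotal);
        GO

        -- A "test" is then just a query whose result you assert on;
        -- point your production SQL at the fixture and check the outcome.
        IF (SELECT SUM(OrderTotal) FROM test.CustomerFixture) <> 150.00
            RAISERROR('CustomerFixture totals are wrong', 16, 1);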

    Read the article

  • Options for transparent data encryption on SQL 2005 and 2008 DBs.

    - by Dan
    Recently in Massachusetts a law was passed (rather silently) requiring that data containing personally identifiable information must be encrypted. PII is defined by the state as containing the resident's first and last name in combination with any of:

    A. SSN
    B. Driver's license or ID card #
    C. Debit or CC #

    Due to the nature of the software we make, all of our clients use SQL as the backend. Typically servers will be running SQL 2005 Standard or above, sometimes SQL 2008. Almost all client machines use SQL 2005 Express. We use replication between client and server. Unfortunately, to get TDE you need to have SQL Enterprise on each machine, which is absolutely not an option. I'm looking for recommendations of products that will encrypt a DB. Right now, I'm not interested in whole disk encryption at all.
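    For what it's worth, cell-level (column) encryption is available in all editions since SQL Server 2005, so it can be an option where TDE is not; a minimal sketch using a passphrase (table, column and passphrase are placeholders, and key management is the hard part this glosses over):

        -- Works on SQL 2005 Express and up; SSN_Cipher must be VARBINARY.
        DECLARE @pass NVARCHAR(128);
        SET @pass = N'placeholder-passphrase';

        -- Encrypt on write:
        UPDATE dbo.Customer
        SET SSN_Cipher = EncryptByPassPhrase(@pass, SSN);

        -- Decrypt on read (DecryptByPassPhrase returns VARBINARY, so cast back):
        SELECT CONVERT(NVARCHAR(11), DecryptByPassPhrase(@pass, SSN_Cipher))
        FROM dbo.Customer;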

    Read the article

  • Design practice for securing data inside Azure SQL

    - by Sid
    Update: I'm looking for a specific design practice as we try to build our own database encryption. Azure SQL doesn't support many of the encryption features found in SQL Server (table and column encryption). We need to store some sensitive information that needs to be encrypted, and we've rolled our own using AesCryptoServiceProvider to encrypt/decrypt data to/from the database. This solves the immediate issue (no cleartext in the db) but poses other problems, like:

    - Key rotation (we have to roll our own code for this, walking through the db converting old cipher text into new cipher text)
    - Metadata mapping of which tables and which columns are encrypted. This is simple when it's just a couple of columns (send an email to all devs / document it) but that quickly gets out of hand ...

    So, what is the best practice for doing application-level encryption into a database that doesn't support encryption? In particular, what is a good design to solve the above two bullet points? If you had specific schema additions I would love it if you could give details ("Have a NVARCHAR(max) column to store the cipher metadata as JSON" or a SQL script/commands). If someone would like to recommend a library, I'd be happy to stay away from "DIY" too. Before going too deep - I assume there isn't any way I can add encryption support to Azure by creating a stored procedure, right?
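    One schema shape that addresses both bullets, offered as an assumption-laden sketch rather than a vetted design (all names are placeholders; the key material itself stays outside the database):

        -- Key versions are first-class rows; the AES keys live in config/KMS,
        -- looked up by KeyVersion on the application side.
        CREATE TABLE dbo.EncryptionKey (
            KeyVersion INT IDENTITY PRIMARY KEY,
            CreatedUtc DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
            RetiredUtc DATETIME2 NULL  -- non-NULL once rotated out
        );

        CREATE TABLE dbo.Customer (
            CustomerId INT PRIMARY KEY,
            SsnCipher  VARBINARY(MAX) NOT NULL,  -- AES output from the app
            KeyVersion INT NOT NULL REFERENCES dbo.EncryptionKey(KeyVersion)
        );

        -- Rotation = re-encrypt in batches WHERE KeyVersion < @current;
        -- the KeyVersion column doubles as queryable metadata for which
        -- rows (and, by convention, which columns) are encrypted with what.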

    Read the article

  • How to set up federated SQL data to display limited information to a Web server in the DMZ?

    - by Pcav
    I have a SQL server behind a firewall. I need to push some limited SQL 2005 information to a Web server in the DMZ so that I do not have to let database queries come all the way into the database server on our internal network. I want to push a small amount of dynamic data to a server in the DMZ and lock it down, so that our hosted website never needs to come into the internal network for information. This DMZ server will be the only server allowed any sort of connection to the back-end database, and the hosting provider just pulls the data from our server in the DMZ...

    Read the article

  • How to unlock sa user in SQL Server 2012 if Windows Authentication doesn't work?

    - by Tony_Henrich
    I am logged in as an admin on the computer, but for some reason I can't log into SQL Server 2012, which is running on the same machine. The sa user is locked out. SQL Server was installed while I was logged into my company's domain; I am not logged into the domain when I try to log in to SQL Server. I don't know if this matters at all. However, I expected to be able to log in using Windows Authentication since I am in the administrators group?
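    For reference, once a sysadmin connection is available (e.g. by starting the instance in single-user mode with the -m startup flag and connecting with sqlcmd, which lets members of the local Administrators group in as sysadmin), unlocking sa is one statement. A hedged sketch; the password is a placeholder:

        -- Run from a sysadmin connection:
        ALTER LOGIN [sa] WITH PASSWORD = N'NewStrongPassword!42' UNLOCK;
        ALTER LOGIN [sa] ENABLE;  -- in case the login was also disabled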

    Read the article
