Search Results

Search found 1369 results on 55 pages for 'in clause'.

Page 53/55 | < Previous Page | 49 50 51 52 53 54 55  | Next Page >

  • SSIS Lookup component tuning tips

    - by jamiet
    Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now. Scene setting A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time and is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have then the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible. General tips Here is a simple tick list you can follow in order to tune your lookups: Use a SQL statement to charge your cache, don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?) Only pick the columns that you need, ignore everything else Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length. Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR so if you can use DT_STR, consider doing so. Same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8. Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause! Thinking outside the box It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often though you can make better use of your available resources by, well, mixing things up a little and here are a few ideas to get your creative juices flowing: There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow. Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses a NoCache lookup thus negating the need to pull all of that least-used lookup data into memory. Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId] then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement you’ll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop. Taking the previous data partitioning idea further … a dataflow can contain more than one data path so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition. Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all once you have the key column(s) from your lookup set then you can use that key to get the rest of attributes further downstream, perhaps even in another dataflow. Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond. Above all, experiment and be creative with different combinations. You may be surprised at what works. Final  thoughts If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005 then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Listing common SQL Code Smells.

    - by Phil Factor
    Once you’ve done a number of SQL Code-reviews, you’ll know those signs in the code that all might not be well. These ’Code Smells’ are coding styles that don’t directly cause a bug, but are indicators that all is not well with the code. . Kent Beck and Massimo Arnoldi seem to have coined the phrase in the "OnceAndOnlyOnce" page of www.C2.com, where Kent also said that code "wants to be simple". Bad Smells in Code was an essay by Kent Beck and Martin Fowler, published as Chapter 3 of the book ‘Refactoring: Improving the Design of Existing Code’ (ISBN 978-0201485677) Although there are generic code-smells, SQL has its own particular coding habits that will alert the programmer to the need to re-factor what has been written. See Exploring Smelly Code   and Code Deodorants for Code Smells by Nick Harrison for a grounding in Code Smells in C# I’ve always been tempted by the idea of automating a preliminary code-review for SQL. It would be so useful to trawl through code and pick up the various problems, much like the classic ‘Lint’ did for C, and how the Code Metrics plug-in for .NET Reflector by Jonathan 'Peli' de Halleux is used for finding Code Smells in .NET code. The problem is that few of the standard procedural code smells are relevant to SQL, and we need an agreed list of code smells. Merrilll Aldrich made a grand start last year in his blog Top 10 T-SQL Code Smells.However, I'd like to make a start by discovering if there is a general opinion amongst Database developers what the most important SQL Smells are. One can be a bit defensive about code smells. I will cheerfully write very long stored procedures, even though they are frowned on. I’ll use dynamic SQL occasionally. You can only use them as an aid for your own judgment and it is fine to ‘sign them off’ as being appropriate in particular circumstances. Also, whole classes of ‘code smells’ may be irrelevant for a particular database. The use of proprietary SQL, for example, is only a ‘code smell’ if there is a chance that the database will have to be ported to another RDBMS. The use of dynamic SQL is a risk only with certain security models. As the saying goes,  a CodeSmell is a hint of possible bad practice to a pragmatist, but a sure sign of bad practice to a purist. Plamen Ratchev’s wonderful article Ten Common SQL Programming Mistakes lists some of these ‘code smells’ along with out-and-out mistakes, but there are more. The use of nested transactions, for example, isn’t entirely incorrect, even though the database engine ignores all but the outermost: but it does flag up the possibility that the programmer thinks that nested transactions are supported. If anything requires some sort of general agreement, the definition of code smells is one. I’m therefore going to make this Blog ‘dynamic, in that, if anyone twitters a suggestion with a #SQLCodeSmells tag (or sends me a twitter) I’ll update the list here. If you add a comment to the blog with a suggestion of what should be added or removed, I’ll do my best to oblige. In other words, I’ll try to keep this blog up to date. The name against each 'smell' is the name of the person who Twittered me, commented about or who has written about the 'smell'. it does not imply that they were the first ever to think of the smell! Use of deprecated syntax such as *= (Dave Howard) Denormalisation that requires the shredding of the contents of columns. (Merrill Aldrich) Contrived interfaces Use of deprecated datatypes such as TEXT/NTEXT (Dave Howard) Datatype mis-matches in predicates that rely on implicit conversion.(Plamen Ratchev) Using Correlated subqueries instead of a join   (Dave_Levy/ Plamen Ratchev) The use of Hints in queries, especially NOLOCK (Dave Howard /Mike Reigler) Few or No comments. Use of functions in a WHERE clause. (Anil Das) Overuse of scalar UDFs (Dave Howard, Plamen Ratchev) Excessive ‘overloading’ of routines. The use of Exec xp_cmdShell (Merrill Aldrich) Excessive use of brackets. (Dave Levy) Lack of the use of a semicolon to terminate statements Use of non-SARGable functions on indexed columns in predicates (Plamen Ratchev) Duplicated code, or strikingly similar code. Misuse of SELECT * (Plamen Ratchev) Overuse of Cursors (Everyone. Special mention to Dave Levy & Adrian Hills) Overuse of CLR routines when not necessary (Sam Stange) Same column name in different tables with different datatypes. (Ian Stirk) Use of ‘broken’ functions such as ‘ISNUMERIC’ without additional checks. Excessive use of the WHILE loop (Merrill Aldrich) INSERT ... EXEC (Merrill Aldrich) The use of stored procedures where a view is sufficient (Merrill Aldrich) Not using two-part object names (Merrill Aldrich) Using INSERT INTO without specifying the columns and their order (Merrill Aldrich) Full outer joins even when they are not needed. (Plamen Ratchev) Huge stored procedures (hundreds/thousands of lines). Stored procedures that can produce different columns, or order of columns in their results, depending on the inputs. Code that is never used. Complex and nested conditionals WHILE (not done) loops without an error exit. Variable name same as the Datatype Vague identifiers. Storing complex data  or list in a character map, bitmap or XML field User procedures with sp_ prefix (Aaron Bertrand)Views that reference views that reference views that reference views (Aaron Bertrand) Inappropriate use of sql_variant (Neil Hambly) Errors with identity scope using SCOPE_IDENTITY @@IDENTITY or IDENT_CURRENT (Neil Hambly, Aaron Bertrand) Schemas that involve multiple dated copies of the same table instead of partitions (Matt Whitfield-Atlantis UK) Scalar UDFs that do data lookups (poor man's join) (Matt Whitfield-Atlantis UK) Code that allows SQL Injection (Mladen Prajdic) Tables without clustered indexes (Matt Whitfield-Atlantis UK) Use of "SELECT DISTINCT" to mask a join problem (Nick Harrison) Multiple stored procedures with nearly identical implementation. (Nick Harrison) Excessive column aliasing may point to a problem or it could be a mapping implementation. (Nick Harrison) Joining "too many" tables in a query. (Nick Harrison) Stored procedure returning more than one record set. (Nick Harrison) A NOT LIKE condition (Nick Harrison) excessive "OR" conditions. (Nick Harrison) User procedures with sp_ prefix (Aaron Bertrand) Views that reference views that reference views that reference views (Aaron Bertrand) sp_OACreate or anything related to it (Bill Fellows) Prefixing names with tbl_, vw_, fn_, and usp_ ('tibbling') (Jeremiah Peschka) Aliases that go a,b,c,d,e... (Dave Levy/Diane McNurlan) Overweight Queries (e.g. 4 inner joins, 8 left joins, 4 derived tables, 10 subqueries, 8 clustered GUIDs, 2 UDFs, 6 case statements = 1 query) (Robert L Davis) Order by 3,2 (Dave Levy) MultiStatement Table functions which are then filtered 'Sel * from Udf() where Udf.Col = Something' (Dave Ballantyne) running a SQL 2008 system in SQL 2000 compatibility mode(John Stafford)

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #051

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Explanation and Understanding NOT NULL Constraint NOT NULL is integrity CONSTRAINT. It does not allow creating of the row where column contains NULL value. Most discussed questions about NULL is what is NULL? I will not go in depth analysis it. Simply put NULL is unknown or missing data. When NULL is present in database columns, it can affect the integrity of the database. I really do not prefer NULL in the database unless they are absolutely necessary. Three T-SQL Script to Create Primary Keys on Table I have always enjoyed writing about three topics Constraint and Keys, Backup and Restore and Datetime Functions. Primary Keys constraints prevent duplicate values for columns and provides a unique identifier to each column, as well it creates clustered index on the columns. 2008 Get Numeric Value From Alpha Numeric String – UDF for Get Numeric Numbers Only SQL is great with String operations. Many times, I use T-SQL to do my string operation. Let us see User Defined Function, which I wrote a few days ago, which will return only Numeric values from Alpha Numeric values. Introduction and Example of UNION and UNION ALL It is very much interesting when I get requests from blog reader to re-write my previous articles. I have received few requests to rewrite my article SQL SERVER – Union vs. Union All – Which is better for performance? with examples. I request you to read my previous article first to understand what is the concept and read this article to understand the same concept with an example. Downgrade Database for Previous Version The main questions is how they can downgrade the from SQL Server 2005 to SQL Server 2000? The answer is : Not Possible. Get Common Records From Two Tables Without Using Join Following is my scenario, Suppose Table 1 and Table 2 has same column e.g. Column1 Following is the query, 1. Select column1,column2 From Table1 2. Select column1 From Table2 I want to find common records from these tables, but I don’t want to use the Join clause because for that I need to specify the column name for Join condition. Will you help me to get common records without using Join condition? I am using SQL Server 2005. Retrieve – Select Only Date Part From DateTime – Best Practice – Part 2 A year ago I wrote a post about SQL SERVER – Retrieve – Select Only Date Part From DateTime – Best Practice where I have discussed two different methods of getting the date part from datetime. Introduction to CLR – Simple Example of CLR Stored Procedure CLR is an abbreviation of Common Language Runtime. In SQL Server 2005 and later version of it database objects can be created which are created in CLR. Stored Procedures, Functions, Triggers can be coded in CLR. CLR is faster than T-SQL in many cases. CLR is mainly used to accomplish tasks which are not possible by T-SQL or can use lots of resources. The CLR can be usually implemented where there is an intense string operation, thread management or iteration methods which can be complicated for T-SQL. Implementing CLR provides more security to the Extended Stored Procedure. 2009 Comic Slow Query – SQL Joke Before Presentation After Presentation Enable Automatic Statistic Update on Database In one of the recent projects, I found out that despite putting good indexes and optimizing the query, I could not achieve an optimized performance and I still received an unoptimized response from the SQL Server. On examination, I figured out that the culprit was statistics. The database that I was trying to optimize had auto update of the statistics was disabled. Recently Executed T-SQL Query Please refer to blog post  query to recently executed T-SQL query on database. Change Collation of Database Column – T-SQL Script – Consolidating Collations – Extention Script At some time in your DBA career, you may find yourself in a position when you sit back and realize that your database collations have somehow run amuck, or are faced with the ever annoying CANNOT RESOLVE COLLATION message when trying to join data of varying collation settings. 2010 Visiting Alma Mater – Delivering Session on Database Performance and Career – Nirma Institute of Technology Everyone always dreams of visiting their school and college, where they have studied once. It is a great feeling to see the college once again – where you have spent the wonderful golden years of your time. College time is filled with studies, education, emotions and several plans to build a future. I consider myself fortunate as I got the opportunity to study at some of the best places in the world. Change Column DataTypes There are times when I feel like writing that I am a day older in SQL Server. In fact, there are many who are looking for a solution that is simple enough. Have you ever searched online for something very simple. I often do and enjoy doing things which are straight forward and easy to change. 2011 Three DMVs – sys.dm_server_memory_dumps – sys.dm_server_services – sys.dm_server_registry In this blog post we will see three new DMVs which are introduced in Denali. The DMVs are very simple and there is not much to describe them. So here is the simple game. I will be asking a question back to you after seeing the result of the each of the DMV and you help me to complete this blog post. A Simple Quiz – T-SQL Brain Trick If you have some time, I strongly suggest you try this quiz out as it is for sure twists your brain. 2012 List All The Column With Specific Data Types in Database 5 years ago I wrote script SQL SERVER – 2005 – List All The Column With Specific Data Types, when I read it again, it is very much relevant and I liked it. This is one of the script which every developer would like to keep it handy. I have upgraded the script bit more. I have included few additional information which I believe I should have added from the beginning. It is difficult to visualize the final script when we are writing it first time. Find First Non-Numeric Character from String The function PATINDEX exists for quite a long time in SQL Server but I hardly see it being used. Well, at least I use it and I am comfortable using it. Here is a simple script which I use when I have to identify first non-numeric character. Finding Different ColumnName From Almost Identitical Tables Well here is the interesting example of how we can use sys.column catalogue views and get the details of the newly added column. I have previously written about EXCEPT over here which is very similar to MINUS of Oracle. Storing Data and Files in Cloud – Dropbox – Personal Technology Tip I thought long and hard about doing a Personal Technology Tips series for this blog.  I have so many tips I’d like to share.  I am on my computer almost all day, every day, so I have a treasure trove of interesting tidbits I like to share if given the chance.  The only thing holding me back – which tip to share first?  The first tip obviously has the weight of seeming like the most important.  But this would mean choosing amongst my favorite tricks and shortcuts.  This is a hard task. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – 3 Online SQL Courses at Pluralsight and Free Learning Resources

    - by pinaldave
    Usain Bolt is an inspiration for all. He broke his own record multiple times because he wanted to do better! Read more about him on wikipedia. He is great and indeed fastest man on the planet. Usain Bolt – World’s Fastest Man “Can you teach me SQL Server Performance Tuning?” This is one of the most popular questions which I receive all the time. The answer is YES. I would love to do performance tuning training for anyone, anywhere.  It is my favorite thing to do, and it is my favorite thing to train others in.  If possible, I would love to do training 24 hours a day, 7 days a week, 365 days a year.  To me, it doesn’t feel like a job. Of course, as much as I would love to do performance tuning 24/7/365, obviously I am just one human being and can only be in one place t one time.  It is also very difficult to train more than one person at a time, and it is difficult to train two or more people at a time, especially when the two people are at different levels.  I am also limited by geography.  I live in India, and adjust to my own time zone.  Trying to teach a live course from India to someone whose time zone is 12 or more hours off of mine is very difficult.  If I am trying to teach at 2 am, I am sure I am not at my best! There was only one solution to scale – Online Trainings. I have built 3 different courses on SQL Server Performance Tuning with Pluralsight. Now I have no problem – I am 100% scalable and available 24/7 and 365. You can make me say the same things again and again till you find it right. I am in your mobile, PC as well as on XBOX. This is why I am such a big fan of online courses.  I have recorded many performance tuning classes and you can easily access them online, at your own time.  And don’t think that just because these aren’t live classes you won’t be able to get any feedback from me.  I encourage all my viewers to go ahead and ask me questions by e-mail, Twitter, Facebook, or whatever way you can get a hold of me. Here are details of three of my courses with Pluralsight. I suggest you go over the description of the course. As an author of the course, I have few FREE codes for watching the free courses. Please leave a comment with your valid email address, I will send a few of them to random winners. SQL Server Performance: Introduction to Query Tuning  SQL Server performance tuning is an art to master – for developers and DBAs alike. This course takes a systematic approach to planning, analyzing, debugging and troubleshooting common query-related performance problems. This includes an introduction to understanding execution plans inside SQL Server. In this almost four hour course we cover following important concepts. Introduction 10:22 Execution Plan Basics 45:59 Essential Indexing Techniques 20:19 Query Design for Performance 50:16 Performance Tuning Tools 01:15:14 Tips and Tricks 25:53 Checklist: Performance Tuning 07:13 The duration of each module is mentioned besides the name of the module. SQL Server Performance: Indexing Basics This course teaches you how to master the art of performance tuning SQL Server by better understanding indexes. In this almost two hour course we cover following important concepts. Introduction 02:03 Fundamentals of Indexing 22:21 Practical Indexing Implementation Techniques 37:25 Index Maintenance 16:33 Introduction to ColumnstoreIndex 08:06 Indexing Practical Performance Tips and Tricks 24:56 Checklist : Index and Performance 07:29 The duration of each module is mentioned besides the name of the module. SQL Server Questions and Answers This course is designed to help you better understand how to use SQL Server effectively. The course presents many of the common misconceptions about SQL Server, and then carefully debunks those misconceptions with clear explanations and short but compelling demos, showing you how SQL Server really works. In this almost 2 hours and 15 minutes course we cover following important concepts. Introduction 00:54 Retrieving IDENTITY value using @@IDENTITY 08:38 Concepts Related to Identity Values 04:15 Difference between WHERE and HAVING 05:52 Order in WHERE clause 07:29 Concepts Around Temporary Tables and Table Variables 09:03 Are stored procedures pre-compiled? 05:09 UNIQUE INDEX and NULLs problem 06:40 DELETE VS TRUNCATE 06:07 Locks and Duration of Transactions 15:11 Nested Transaction and Rollback 09:16 Understanding Date/Time Datatypes 07:40 Differences between VARCHAR and NVARCHAR datatypes 06:38 Precedence of DENY and GRANT security permissions 05:29 Identify Blocking Process 06:37 NULLS usage with Dynamic SQL 08:03 Appendix Tips and Tricks with Tools 20:44 The duration of each module is mentioned besides the name of the module. SQL in Sixty Seconds You will have to login and to get subscribed to the courses to view them. Here are my free video learning resources SQL in Sixty Seconds. These are 60 second video which I have built on various subjects related to SQL Server. Do let me know what you think about them? Here are three of my latest videos: Identify Most Resource Intensive Queries – SQL in Sixty Seconds #028 Copy Column Headers from Resultset – SQL in Sixty Seconds #027 Effect of Collation on Resultset – SQL in Sixty Seconds #026 You can watch and learn at your own pace.  Then you can easily ask me any questions you have.  E-mail is easiest, but for really tough questions I’m willing to talk on Skype, Gtalk, or even Facebook chat.  Please do watch and then talk with me, I am always available on the internet! Here is the video of the world’s fastest man.Usain St. Leo Bolt inspires us that we all do better than best. We can go the next level of our own record. We all can improve if we have a will and dedication.  Watch the video from 5:00 mark. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL in Sixty Seconds, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, SQLServer, T SQL, Technology, Video

    Read the article

  • SQL SERVER – SSMS Automatically Generates TOP (100) PERCENT in Query Designer

    - by pinaldave
    Earlier this week, I was surfing various SQL forums to see what kind of help developer need in the SQL Server world. One of the question indeed caught my attention. I am here regenerating complete question as well scenario to illustrate the point in a precise manner. Additionally, I have added added second part of the question to give completeness. Question: I am trying to create a view in Query Designer (not in the New Query Window). Every time I am trying to create a view it always adds  TOP (100) PERCENT automatically on the T-SQL script. No matter what I do, it always automatically adds the TOP (100) PERCENT to the script. I have attempted to copy paste from notepad, build a query and a few other things – there is no success. I am really not sure what I am doing wrong with Query Designer. Here is my query script: (I use AdventureWorks as a sample database) SELECT Person.Address.AddressID FROM Person.Address INNER JOIN Person.AddressType ON Person.Address.AddressID = Person.AddressType.AddressTypeID ORDER BY Person.Address.AddressID This script automatically replaces by following query: SELECT TOP (100) PERCENT Person.Address.AddressID FROM Person.Address INNER JOIN Person.AddressType ON Person.Address.AddressID = Person.AddressType.AddressTypeID ORDER BY Person.Address.AddressID However, when I try to do the same from New Query Window it works totally fine. However, when I attempt to create a view of the same query it gives following error. Msg 1033, Level 15, State 1, Procedure myView, Line 6 The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. It is pretty clear to me now that the script which I have written seems to need TOP (100) PERCENT, so Query . Why do I need it? Is there any work around to this issue. I particularly find this question pretty interesting as it really touches the fundamentals of the T-SQL query writing. Please note that the query which is automatically changed is not in New Query Editor but opened from SSMS using following way. Database >> Views >> Right Click >> New View (see the image below) Answer: The answer to the above question can be very long but I will keep it simple and to the point. There are three things to discuss in above script 1) Reason for Error 2) Reason for Auto generates TOP (100) PERCENT and 3) Potential solutions to the above error. Let us quickly see them in detail. 1) Reason for Error The reason for error is already given in the error. ORDER BY is invalid in the views and a few other objects. One has to use TOP or other keywords along with it. The way semantics of the query works where optimizer only follows(honors) the ORDER BY in the same scope or the same SELECT/UPDATE/DELETE statement. There is a possibility that one can order after the scope of the view again the efforts spend to order view will be wasted. The final resultset of the query always follows the final ORDER BY or outer query’s order and due to the same reason optimizer follows the final order of the query and not of the views (as view will be used in another query for further processing e.g. in SELECT statement). Due to same reason ORDER BY is now allowed in the view. For further accuracy and clear guidance I suggest you read this blog post by Query Optimizer Team. They have explained it very clear manner the same subject. 2) Reason for Auto Generated TOP (100) PERCENT One of the most popular workaround to above error is to use TOP (100) PERCENT in the view. Now TOP (100) PERCENT allows user to use ORDER BY in the query and allows user to overcome above error which we discussed. This gives the impression to the user that they have resolved the error and successfully able to use ORDER BY in the View. Well, this is incorrect as well. The way this works is when TOP (100) PERCENT is used the result is not guaranteed as well it is ignored in our the query where the view is used. Here is the blog post on this subject: Interesting Observation – TOP 100 PERCENT and ORDER BY. Now when you create a new view in the SSMS and build a query with ORDER BY to avoid the error automatically it adds the TOP 100 PERCENT. Here is the connect item for the same issue. I am sure there will be more connect items as well but I could not find them. 3) Potential Solutions If you are reading this post from the beginning in that case, it is clear by now that ORDER BY should not be used in the View as it does not serve any purpose unless there is a specific need of it. If you are going to use TOP 100 PERCENT with ORDER BY there is absolutely no need of using ORDER BY rather avoid using it all together. Here is another blog post of mine which describes the same subject ORDER BY Does Not Work – Limitation of the Views Part 1. It is valid to use ORDER BY in a view if there is a clear business need of using TOP with any other percentage lower than 100 (for example TOP 10 PERCENT or TOP 50 PERCENT etc). In most of the cases ORDER BY is not needed in the view and it should be used in the most outer query for present result in desired order. User can remove TOP 100 PERCENT and ORDER BY from the view before using the view in any query or procedure. In the most outer query there should be ORDER BY as per the business need. I think this sums up the concept in a few words. This is a very long topic and not easy to illustrate in one single blog post. I welcome your comments and suggestions. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, SQL View, T SQL, Technology

    Read the article

  • Parallel LINQ - PLINQ

    - by nmarun
    Turns out now with .net 4.0 we can run a query like a multi-threaded application. Say you want to query a collection of objects and return only those that meet certain conditions. Until now, we basically had one ‘control’ that iterated over all the objects in the collection, checked the condition on each object and returned if it passed. We obviously agree that if we can ‘break’ this task into smaller ones, assign each task to a different ‘control’ and ask all the controls to do their job - in-parallel, the time taken the finish the entire task will be much lower. Welcome to PLINQ. Let’s take some examples. I have the following method that uses our good ol’ LINQ. 1: private static void Linq(int lowerLimit, int upperLimit) 2: { 3: // populate an array with int values from lowerLimit to the upperLimit 4: var source = Enumerable.Range(lowerLimit, upperLimit); 5:  6: // Start a timer 7: Stopwatch stopwatch = new Stopwatch(); 8: stopwatch.Start(); 9:  10: // set the expectation => build the expression tree 11: var evenNumbers =   from num in source 12: where IsDivisibleBy(num, 2) 13: select num; 14: 15: // iterate over and print the returned items 16: foreach (var number in evenNumbers) 17: { 18: Console.WriteLine(string.Format("** {0}", number)); 19: } 20:  21: stopwatch.Stop(); 22:  23: // check the metrics 24: Console.WriteLine(String.Format("Elapsed {0}ms", stopwatch.ElapsedMilliseconds)); 25: } I’ve added comments for the major steps, but the only thing I want to talk about here is the IsDivisibleBy() method. I know I could have just included the logic directly in the where clause. I called a method to add ‘delay’ to the execution of the query - to simulate a loooooooooong operation (will be easier to compare the results). 1: private static bool IsDivisibleBy(int number, int divisor) 2: { 3: // iterate over some database query 4: // to add time to the execution of this method; 5: // the TableB has around 10 records 6: for (int i = 0; i < 10; i++) 7: { 8: DataClasses1DataContext dataContext = new DataClasses1DataContext(); 9: var query = from b in dataContext.TableBs select b; 10: 11: foreach (var row in query) 12: { 13: // Do NOTHING (wish my job was like this) 14: } 15: } 16:  17: return number % divisor == 0; 18: } Now, let’s look at how to modify this to PLINQ. 1: private static void Plinq(int lowerLimit, int upperLimit) 2: { 3: // populate an array with int values from lowerLimit to the upperLimit 4: var source = Enumerable.Range(lowerLimit, upperLimit); 5:  6: // Start a timer 7: Stopwatch stopwatch = new Stopwatch(); 8: stopwatch.Start(); 9:  10: // set the expectation => build the expression tree 11: var evenNumbers = from num in source.AsParallel() 12: where IsDivisibleBy(num, 2) 13: select num; 14:  15: // iterate over and print the returned items 16: foreach (var number in evenNumbers) 17: { 18: Console.WriteLine(string.Format("** {0}", number)); 19: } 20:  21: stopwatch.Stop(); 22:  23: // check the metrics 24: Console.WriteLine(String.Format("Elapsed {0}ms", stopwatch.ElapsedMilliseconds)); 25: } That’s it, this is now in PLINQ format. Oh and if you haven’t found the difference, look line 11 a little more closely. You’ll see an extension method ‘AsParallel()’ added to the ‘source’ variable. Couldn’t be more simpler right? So this is going to improve the performance for us. Let’s test it. So in my Main method of the Console application that I’m working on, I make a call to both. 1: static void Main(string[] args) 2: { 3: // set lower and upper limits 4: int lowerLimit = 1; 5: int upperLimit = 20; 6: // call the methods 7: Console.WriteLine("Calling Linq() method"); 8: Linq(lowerLimit, upperLimit); 9: 10: Console.WriteLine(); 11: Console.WriteLine("Calling Plinq() method"); 12: Plinq(lowerLimit, upperLimit); 13:  14: Console.ReadLine(); // just so I get enough time to read the output 15: } YMMV, but here are the results that I got:    It’s quite obvious from the above results that the Plinq() method is taking considerably less time than the Linq() version. I’m sure you’ve already noticed that the output of the Plinq() method is not in order. That’s because, each of the ‘control’s we sent to fetch the results, reported with values as and when they obtained them. This is something about parallel LINQ that one needs to remember – the collection cannot be guaranteed to be undisturbed. This could be counted as a negative about PLINQ (emphasize ‘could’). Nevertheless, if we want the collection to be sorted, we can use a SortedSet (.net 4.0) or build our own custom ‘sorter’. Either way we go, there’s a good chance we’ll end up with a better performance using PLINQ. And there’s another negative of PLINQ (depending on how you see it). This is regarding the CPU cycles. See the usage for Linq() method (used ResourceMonitor): I have dual CPU’s and see the height of the peak in the bottom two blocks and now compare to what happens when I run the Plinq() method. The difference is obvious. Higher usage, but for a shorter duration (width of the peak). Both these points make sense in both cases. Linq() runs for a longer time, but uses less resources whereas Plinq() runs for a shorter time and consumes more resources. Even after knowing all these, I’m still inclined towards PLINQ. PLINQ rocks! (no hard feelings LINQ)

    Read the article

  • How to archive data from a table to a local or remote database in SQL 2005 and SQL 2008

    - by simonsabin
    Often you have the need to archive data from a table. This leads to a number of challenges 1. How can you do it without impacting users 2. How can I make it transactionally consistent, i.e. the data I put in the archive is the data I remove from the main table 3. How can I get it to perform well Points 1 is very much tied to point 3. If it doesn't perform well then the delete of data is going to cause lots of locks and thus potentially blocking. For points 1 and 3 refer to my previous posts DELETE-TOP-x-rows-avoiding-a-table-scan and UPDATE-and-DELETE-TOP-and-ORDER-BY---Part2. In essence you need to be removing small chunks of data from your table and you want to do that avoiding a table scan. So that deals with the delete approach but archiving is about inserting that data somewhere else. Well in SQL 2008 they introduced a new feature INSERT over DML (Data Manipulation Language, i.e. SQL statements that change data), or composable DML. The ability to nest DML statements within themselves, so you can past the results of an insert to an update to a merge. I've mentioned this before here SQL-Server-2008---MERGE-and-optimistic-concurrency. This feature is currently limited to being able to consume the results of a DML statement in an INSERT statement. There are many restrictions which you can find here http://msdn.microsoft.com/en-us/library/ms177564.aspx look for the section "Inserting Data Returned From an OUTPUT Clause Into a Table" Even with the restrictions what we can do is consume the OUTPUT from a DELETE and INSERT the results into a table in another database. Note that in BOL it refers to not being able to use a remote table, remote means a table on another SQL instance. To show this working use this SQL to setup two databases foo and fooArchive create database foo go --create the source table fred in database foo select * into foo..fred from sys.objects go create database fooArchive go if object_id('fredarchive',DB_ID('fooArchive')) is null begin     select getdate() ArchiveDate,* into fooArchive..FredArchive from sys.objects where 1=2       end go And then we can use this simple statement to archive the data insert into fooArchive..FredArchive select getdate(),d.* from (delete top (1)         from foo..Fred         output deleted.*) d         go In this statement the delete can be any delete statement you wish so if you are deleting by ids or a range of values then you can do that. Refer to the DELETE-TOP-x-rows-avoiding-a-table-scan post to ensure that your delete is going to perform. The last thing you want to do is to perform 100 deletes each with 5000 records for each of those deletes to do a table scan. For a solution that works for SQL2005 or if you want to archive to a different server then you can use linked servers or SSIS. This example shows how to do it with linked servers. [ONARC-LAP03] is the source server. begin transaction insert into fooArchive..FredArchive select getdate(),d.* from openquery ([ONARC-LAP03],'delete top (1)                     from foo..Fred                     output deleted.*') d commit transaction and to prove the transactions work try, you should get the same number of records before and after. select (select count(1) from foo..Fred) fred        ,(select COUNT(1) from fooArchive..FredArchive ) fredarchive   begin transaction insert into fooArchive..FredArchive select getdate(),d.* from openquery ([ONARC-LAP03],'delete top (1)                     from foo..Fred                     output deleted.*') d rollback transaction   select (select count(1) from foo..Fred) fred        ,(select COUNT(1) from fooArchive..FredArchive ) fredarchive The transactions are very important with this solution. Look what happens when you don't have transactions and an error occurs   select (select count(1) from foo..Fred) fred        ,(select COUNT(1) from fooArchive..FredArchive ) fredarchive   insert into fooArchive..FredArchive select getdate(),d.* from openquery ([ONARC-LAP03],'delete top (1)                     from foo..Fred                     output deleted.*                     raiserror (''Oh doo doo'',15,15)') d                     select (select count(1) from foo..Fred) fred        ,(select COUNT(1) from fooArchive..FredArchive ) fredarchive Before running this think what the result would be. I got it wrong. What seems to happen is that the remote query is executed as a transaction, the error causes that to rollback. However the results have already been sent to the client and so get inserted into the

    Read the article

  • How to Achieve OC4J RMI Load Balancing

    - by fip
    This is an old, Oracle SOA and OC4J 10G topic. In fact this is not even a SOA topic per se. Questions of RMI load balancing arise when you developed custom web applications accessing human tasks running off a remote SOA 10G cluster. Having returned from a customer who faced challenges with OC4J RMI load balancing, I felt there is still some confusions in the field how OC4J RMI load balancing work. Hence I decide to dust off an old tech note that I wrote a few years back and share it with the general public. Here is the tech note: Overview A typical use case in Oracle SOA is that you are building web based, custom human tasks UI that will interact with the task services housed in a remote BPEL 10G cluster. Or, in a more generic way, you are just building a web based application in Java that needs to interact with the EJBs in a remote OC4J cluster. In either case, you are talking to an OC4J cluster as RMI client. Then immediately you must ask yourself the following questions: 1. How do I make sure that the web application, as an RMI client, even distribute its load against all the nodes in the remote OC4J cluster? 2. How do I make sure that the web application, as an RMI client, is resilient to the node failures in the remote OC4J cluster, so that in the unlikely case when one of the remote OC4J nodes fail, my web application will continue to function? That is the topic of how to achieve load balancing with OC4J RMI client. Solutions You need to configure and code RMI load balancing in two places: 1. Provider URL can be specified with a comma separated list of URLs, so that the initial lookup will land to one of the available URLs. 2. Choose a proper value for the oracle.j2ee.rmi.loadBalance property, which, along side with the PROVIDER_URL property, is one of the JNDI properties passed to the JNDI lookup.(http://docs.oracle.com/cd/B31017_01/web.1013/b28958/rmi.htm#BABDGFBI) More details below: About the PROVIDER_URL The JNDI property java.name.provider.url's job is, when the client looks up for a new context at the very first time in the client session, to provide a list of RMI context The value of the JNDI property java.name.provider.url goes by the format of a single URL, or a comma separate list of URLs. A single URL. For example: opmn:ormi://host1:6003:oc4j_instance1/appName1 A comma separated list of multiple URLs. For examples:  opmn:ormi://host1:6003:oc4j_instanc1/appName, opmn:ormi://host2:6003:oc4j_instance1/appName, opmn:ormi://host3:6003:oc4j_instance1/appName When the client looks up for a new Context the very first time in the client session, it sends a query against the OPMN referenced by the provider URL. The OPMN host and port specifies the destination of such query, and the OC4J instance name and appName are actually the “where clause” of the query. When the PROVIDER URL reference a single OPMN server Let's consider the case when the provider url only reference a single OPMN server of the destination cluster. In this case, that single OPMN server receives the query and returns a list of the qualified Contexts from all OC4Js within the cluster, even though there is a single OPMN server in the provider URL. A context represent a particular starting point at a particular server for subsequent object lookup. For example, if the URL is opmn:ormi://host1:6003:oc4j_instance1/appName, then, OPMN will return the following contexts: appName on oc4j_instance1 on host1 appName on oc4j_instance1 on host2, appName on oc4j_instance1 on host3,  (provided that host1, host2, host3 are all in the same cluster) Please note that One OPMN will be sufficient to find the list of all contexts from the entire cluster that satisfy the JNDI lookup query. You can do an experiment by shutting down appName on host1, and observe that OPMN on host1 will still be able to return you appname on host2 and appName on host3. When the PROVIDER URL reference a comma separated list of multiple OPMN servers When the JNDI propery java.naming.provider.url references a comma separated list of multiple URLs, the lookup will return the exact same things as with the single OPMN server: a list of qualified Contexts from the cluster. The purpose of having multiple OPMN servers is to provide high availability in the initial context creation, such that if OPMN at host1 is unavailable, client will try the lookup via OPMN on host2, and so on. After the initial lookup returns and cache a list of contexts, the JNDI URL(s) are no longer used in the same client session. That explains why removing the 3rd URL from the list of JNDI URLs will not stop the client from getting the EJB on the 3rd server. About the oracle.j2ee.rmi.loadBalance Property After the client acquires the list of contexts, it will cache it at the client side as “list of available RMI contexts”.  This list includes all the servers in the destination cluster. This list will stay in the cache until the client session (JVM) ends. The RMI load balancing against the destination cluster is happening at the client side, as the client is switching between the members of the list. Whether and how often the client will fresh the Context from the list of Context is based on the value of the  oracle.j2ee.rmi.loadBalance. The documentation at http://docs.oracle.com/cd/B31017_01/web.1013/b28958/rmi.htm#BABDGFBI list all the available values for the oracle.j2ee.rmi.loadBalance. Value Description client If specified, the client interacts with the OC4J process that was initially chosen at the first lookup for the entire conversation. context Used for a Web client (servlet or JSP) that will access EJBs in a clustered OC4J environment. If specified, a new Context object for a randomly-selected OC4J instance will be returned each time InitialContext() is invoked. lookup Used for a standalone client that will access EJBs in a clustered OC4J environment. If specified, a new Context object for a randomly-selected OC4J instance will be created each time the client calls Context.lookup(). Please note the regardless of the setting of oracle.j2ee.rmi.loadBalance property, the “refresh” only occurs at the client. The client can only choose from the "list of available context" that was returned and cached from the very first lookup. That is, the client will merely get a new Context object from the “list of available RMI contexts” from the cache at the client side. The client will NOT go to the OPMN server again to get the list. That also implies that if you are adding a node to the server cluster AFTER the client’s initial lookup, the client would not know it because neither the server nor the client will initiate a refresh of the “list of available servers” to reflect the new node. About High Availability (i.e. Resilience Against Node Failure of Remote OC4J Cluster) What we have discussed above is about load balancing. Let's also discuss high availability. This is how the High Availability works in RMI: when the client use the context but get an exception such as socket is closed, it knows that the server referenced by that Context is problematic and will try to get another unused Context from the “list of available contexts”. Again, this list is the list that was returned and cached at the very first lookup in the entire client session.

    Read the article

  • At most how many customized P3 attributes could be added into Agile?

    - by Jie Chen
    I have one customer/Oracle Partner Consultant asking me such question: how many customized attributes can be allowed to add to Agile's subclass Page Three? I never did research against this because Agile User Guide never says this and theoretically Agile supports unlimited amount of customized attributes, unless the browser itself cannot handle them in allocated memory. However my customers says when to add almost 1000 attributes, the browser (Web Client) will not show any Page Three attributes, including all the out-of-box attributes. Let's see why. Analysis It is horrible to add 1000 attributes manually. Let's do it by a batch SQL like below to add them to Item's subclass Page Three tab. Do not execute below SQL because it will not take effect due to your different node id. CREATE OR REPLACE PROCEDURE createP3Text(v_name IN VARCHAR2) IS v_nid NUMBER; v_pid NUMBER; BEGIN select SEQNODETABLE.nextval into v_nid from dual; Insert Into nodeTable ( id,parentID,description,objType,inherit,helpID,version,name ) values ( v_nid,2473003, v_name ,1,0,0,0, v_name); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,1,0,1,925, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,0,0,0,0,1,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,0,0,0,0,2,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,2,2,0,1,3,'50'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,1,0,1,5, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,2,0,1,6,'50'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,2,0,0,7,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,8,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,9,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,2,1,0,1,10,v_name); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,0,0,0,0,11,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,11743,1,14,'2'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,1,0,1,30, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,1,0,1,38, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,451,0,59,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,451,0,60,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,724,0,61, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,2,1,0,0,232,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,451,0,233,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,12239,1,415,'13307'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,2,1,0,0,605,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,610,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,451,0,716,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,795,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,2000008821,1,864,'2'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,923,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,0,719,'0'); Insert Into tableInfo ( tabID,tableID,classID,att,ordering ) values ( 2473005,1501,2473002,v_nid,9999); commit; END createP3Text; / BEGIN FOR i in 1..1000 LOOP createP3Text('MyText' || i); END LOOP; END; / DROP PROCEDURE createP3Text; COMMIT; Now restart Agile Server and check the Server's log, we noticed below: ***** Node Created : 85625 ***** Property Created : 184579 +++++++++++++++++++++++++++++++++++++ + Agile PLM Server Starting Up... + +++++++++++++++++++++++++++++++++++++ However the previously log before batch SQL is ***** Node Created : 84625 ***** Property Created : 157579 +++++++++++++++++++++++++++++++++++++ + Agile PLM Server Starting Up... + +++++++++++++++++++++++++++++++++++++ Obviously we successfully imported 1000 (85625-84625) attributes. Now go to JavaClient and confirm if we have them or not. Theoretically we are able to open such item object and see all these 1000 attributes and their values, but we get below error. We have no error tips in server log. But never mind we have the Java Console for JavaClient. If to open the same item in JavaClient we get a clear error and detailed trace in Java Console. ORA-01795: maximum number of expressions in a list is 1000 java.sql.SQLException: ORA-01795: maximum number of expressions in a list is 1000 at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:125) ... ... at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:128) at com.agile.pc.cmserver.base.AgileFlexUtil.setFlexValuesForOneRowTable(AgileFlexUtil.java:1104) at com.agile.pc.cmserver.base.BaseFlexTableDAO.loadExtraFlexAttValues(BaseFlexTableDAO.java:111) at com.agile.pc.cmserver.base.BasePageThreeDAO.loadTable(BasePageThreeDAO.java:108) If you are interested in the background of the problem, you may de-compile the class com.agile.pc.cmserver.base.AgileFlexUtil.setFlexValuesForOneRowTable and find the root cause that Agile happens to hit Oracle Database's limitation that more than 1000 values in the "IN" clause. Check here http://ora-01795.ora-code.com If you need Oracle Agile's final solution, please contact Oracle Agile Support. Performance Below two screenshot are jvm heap usage from before-SQL and after-SQL. We can see there is no big memory gap between two cases. So definitely there is no performance impact to Agile Application Server unless you have more than 1000 attributes for EACH of your dozens of  subclasses. And for client, 1000 attributes should not impact the browser's performance because in HTML we only use dt and dd for each attribute's pair: label and value. It is quite lightweight.

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #032

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Complete Series of Database Coding Standards and Guidelines SQL SERVER Database Coding Standards and Guidelines – Introduction SQL SERVER – Database Coding Standards and Guidelines – Part 1 SQL SERVER – Database Coding Standards and Guidelines – Part 2 SQL SERVER Database Coding Standards and Guidelines Complete List Download Explanation and Example – SELF JOIN When all of the data you require is contained within a single table, but data needed to extract is related to each other in the table itself. Examples of this type of data relate to Employee information, where the table may have both an Employee’s ID number for each record and also a field that displays the ID number of an Employee’s supervisor or manager. To retrieve the data tables are required to relate/join to itself. Insert Multiple Records Using One Insert Statement – Use of UNION ALL This is very interesting question I have received from new developer. How can I insert multiple values in table using only one insert? Now this is interesting question. When there are multiple records are to be inserted in the table following is the common way using T-SQL. Function to Display Current Week Date and Day – Weekly Calendar Straight blog post with script to find current week date and day based on the parameters passed in the function.  2008 In my beginning years, I have almost same confusion as many of the developer had in their earlier years. Here are two of the interesting question which I have attempted to answer in my early year. Even if you are experienced developer may be you will still like to read following two questions: Order Of Column In Index Order of Conditions in WHERE Clauses Example of DISTINCT in Aggregate Functions Have you ever used DISTINCT with the Aggregation Function? Here is a simple example about how users can do it. Create a Comma Delimited List Using SELECT Clause From Table Column Straight to script example where I explained how to do something easy and quickly. Compound Assignment Operators SQL SERVER 2008 has introduced new concept of Compound Assignment Operators. Compound Assignment Operators are available in many other programming languages for quite some time. Compound Assignment Operators is operator where variables are operated upon and assigned on the same line. PIVOT and UNPIVOT Table Examples Here is a very interesting question – the answer to the question can be YES or NO both. “If we PIVOT any table and UNPIVOT that table do we get our original table?” Read the blog post to get the explanation of the question above. 2009 What is Interim Table – Simple Definition of Interim Table The interim table is a table that is generated by joining two tables and not the final result table. In other words, when two tables are joined they create an interim table as resultset but the resultset is not final yet. It may be possible that more tables are about to join on the interim table, and more operations are still to be applied on that table (e.g. Order By, Having etc). Besides, it may be possible that there is no interim table; sometimes final table is what is generated when the query is run. 2010 Stored Procedure and Transactions If Stored Procedure is transactional then, it should roll back complete transactions when it encounters any errors. Well, that does not happen in this case, which proves that Stored Procedure does not only provide just the transactional feature to a batch of T-SQL. Generate Database Script for SQL Azure When talking about SQL Azure the most common complaint I hear is that the script generated from stand-along SQL Server database is not compatible with SQL Azure. This was true for some time for sure but not any more. If you have SQL Server 2008 R2 installed you can follow the guideline below to generate a script which is compatible with SQL Azure. Convert IN to EXISTS – Performance Talk It is NOT necessary that every time when IN is replaced by EXISTS it gives better performance. However, in our case listed above it does for sure give better performance. You can read about this subject in the associated blog post. Subquery or Join – Various Options – SQL Server Engine Knows the Best Every single time whenever there is a performance tuning exercise, I hear the conversation from developer where some prefer subquery and some prefer join. In this two part blog post, I explain the same in the detail with examples. Part 1 | Part 2 Merge Operations – Insert, Update, Delete in Single Execution MERGE is a new feature that provides an efficient way to do multiple DML operations. In earlier versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions; however, at present, by using the MERGE statement, we can include the logic of such data changes in one statement that even checks when the data is matched and then just update it, and similarly, when the data is unmatched, it is inserted. 2011 Puzzle – Statistics are not updated but are Created Once Here is the quick scenario about my setup. Create Table Insert 1000 Records Check the Statistics Now insert 10 times more 10,000 indexes Check the Statistics – it will be NOT updated – WHY? Question to You – When to use Function and When to use Stored Procedure Personally, I believe that they are both different things - they cannot be compared. I can say, it will be like comparing apples and oranges. Each has its own unique use. However, they can be used interchangeably at many times and in real life (i.e., production environment). I have personally seen both of these being used interchangeably many times. This is the precise reason for asking this question. 2012 In year 2012 I had two interesting series ran on the blog. If there is no fun in learning, the learning becomes a burden. For the same reason, I had decided to build a three part quiz around SEQUENCE. The quiz was to identify the next value of the sequence. I encourage all of you to take part in this fun quiz. Guess the Next Value – Puzzle 1 Guess the Next Value – Puzzle 2 Guess the Next Value – Puzzle 3 Guess the Next Value – Puzzle 4 Simple Example to Configure Resource Governor – Introduction to Resource Governor Resource Governor is a feature which can manage SQL Server Workload and System Resource Consumption. We can limit the amount of CPU and memory consumption by limiting /governing /throttling on the SQL Server. If there are different workloads running on SQL Server and each of the workload needs different resources or when workloads are competing for resources with each other and affecting the performance of the whole server resource governor is a very important task. Tricks to Replace SELECT * with Column Names – SQL in Sixty Seconds #017 – Video  Retrieves unnecessary columns and increases network traffic When a new columns are added views needs to be refreshed manually Leads to usage of sub-optimal execution plan Uses clustered index in most of the cases instead of using optimal index It is difficult to debug SQL SERVER – Load Generator – Free Tool From CodePlex The best part of this SQL Server Load Generator is that users can run multiple simultaneous queries again SQL Server using different login account and different application name. The interface of the tool is extremely easy to use and very intuitive as well. A Puzzle – Swap Value of Column Without Case Statement Let us assume there is a single column in the table called Gender. The challenge is to write a single update statement which will flip or swap the value in the column. For example if the value in the gender column is ‘male’ swap it with ‘female’ and if the value is ‘female’ swap it with ‘male’. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #034

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 UDF – User Defined Function to Strip HTML – Parse HTML – No Regular Expression The UDF used in the blog does fantastic task – it scans entire HTML text and removes all the HTML tags. It keeps only valid text data without HTML task. This is one of the quite commonly requested tasks many developers have to face everyday. De-fragmentation of Database at Operating System to Improve Performance Operating system skips MDF file while defragging the entire filesystem of the operating system. It is absolutely fine and there is no impact of the same on performance. Read the entire blog post for my conversation with our network engineers. Delay Function – WAITFOR clause – Delay Execution of Commands How do you delay execution of the commands in SQL Server – ofcourse by using WAITFOR keyword. In this blog post, I explain the same with the help of T-SQL script. Find Length of Text Field To measure the length of TEXT fields the function is DATALENGTH(textfield). Len will not work for text field. As of SQL Server 2005, developers should migrate all the text fields to VARCHAR(MAX) as that is the way forward. Retrieve Current Date Time in SQL Server CURRENT_TIMESTAMP, GETDATE(), {fn NOW()} There are three ways to retrieve the current datetime in SQL SERVER. CURRENT_TIMESTAMP, GETDATE(), {fn NOW()} Explanation and Comparison of NULLIF and ISNULL An interesting observation is NULLIF returns null if it comparison is successful, whereas ISNULL returns not null if its comparison is successful. In one way they are opposite to each other. Here is my question to you - How to create infinite loop using NULLIF and ISNULL? If this is even possible? 2008 Introduction to SERVERPROPERTY and example SERVERPROPERTY is a very interesting system function. It returns many of the system values. I use it very frequently to get different server values like Server Collation, Server Name etc. SQL Server Start Time We can use DMV to find out what is the start time of SQL Server in 2008 and later version. In this blog you can see how you can do the same. Find Current Identity of Table Many times we need to know what is the current identity of the column. I have found one of my developers using aggregated function MAX () to find the current identity. However, I prefer following DBCC command to figure out current identity. Create Check Constraint on Column Some time we just need to create a simple constraint over the table but I have noticed that developers do many different things to make table column follow rules than just creating constraint. I suggest constraint is a very useful concept and every SQL Developer should pay good attention to this subject. 2009 List Schema Name and Table Name for Database This is one of the blog post where I straight forward display script. One of the kind of blog posts, which I still love to read and write. Clustered Index on Separate Drive From Table Location A table devoid of primary key index is called heap, and here data is not arranged in a particular order, which gives rise to issues that adversely affect performance. Data must be stored in some kind of order. If we put clustered index on it then the order will be forced by that index and the data will be stored in that particular order. Understanding Table Hints with Examples Hints are options and strong suggestions specified for enforcement by the SQL Server query processor on DML statements. The hints override any execution plan the query optimizer might select for a query. 2010 Data Pages in Buffer Pool – Data Stored in Memory Cache One of my earlier year article, which I still read it many times and point developers to read it again. It is clear from the Resultset that when more than one index is used, datapages related to both or all of the indexes are stored in Memory Cache separately. TRANSACTION, DML and Schema Locks Can you create a situation where you can see Schema Lock? Well, this is a very simple question, however during the interview I notice over 50 candidates failed to come up with the scenario. In this blog post, I have demonstrated the situation where we can see the schema lock in database. 2011 Solution – Puzzle – Statistics are not updated but are Created Once In this example I have created following situation: Create Table Insert 1000 Records Check the Statistics Now insert 10 times more 10,000 indexes Check the Statistics – it will be NOT updated Auto Update Statistics and Auto Create Statistics for database is TRUE Now I have requested two things in the example 1) Why this is happening? 2) How to fix this issue? Selecting Domain from Email Address This is a straight to script blog post where I explain how to select only domain name from entire email address. Solution – Generating Zero Without using Any Numbers in T-SQL How to get zero digit without using any digit? This is indeed a very interesting question and the answer is even interesting. Try to come up with answer in next 10 minutes and if you can’t come up with the answer the blog post read this post for solution. 2012 Simple Explanation and Puzzle with SOUNDEX Function and DIFFERENCE Function In simple words - SOUNDEX converts an alphanumeric string to a four-character code to find similar-sounding words or names. DIFFERENCE function returns an integer value. The  integer returned is the number of characters in the SOUNDEX values that are the same. Read Only Files and SQL Server Management Studio (SSMS) I have come across a very interesting feature in SSMS related to “Read Only” files. I believe it is a little unknown feature as well so decided to write a blog about the same. Identifying Column Data Type of uniqueidentifier without Querying System Tables How do I know if any table has a uniqueidentifier column and what is its value without using any DMV or System Catalogues? Only information you know is the table name and you are allowed to return any kind of error if the table does not have uniqueidentifier column. Read the blog post to find the answer. Solution – User Not Able to See Any User Created Object in Tables – Security and Permissions Issue Interesting question – “When I try to connect to SQL Server, it lets me connect just fine as well let me open and explore the database. I noticed that I do not see any user created instances but when my colleague attempts to connect to the server, he is able to explore the database as well see all the user created tables and other objects. Can you help me fix it?” Importing CSV File Into Database – SQL in Sixty Seconds #018 – Video Here is interesting small 60 second video on how to import CSV file into Database. ColumnStore Index – Batch Mode vs Row Mode Here is the logic behind when Columnstore Index uses Batch Mode and when it uses Row Mode. A batch typically represents about 1000 rows of data. Batch mode processing also uses algorithms that are optimized for the multicore CPUs and increased memory throughput. Follow up – Usage of $rowguid and $IDENTITY This is an excellent follow up blog post of my earlier blog post where I explain where to use $rowguid and $identity.  If you do not know the difference between them, this is a blog with a script example. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Full-text Indexing Books Online

    - by Most Valuable Yak (Rob Volk)
    While preparing for a recent SQL Saturday presentation, I was struck by a crazy idea (shocking, I know): Could someone import the content of SQL Server Books Online into a database and apply full-text indexing to it?  The answer is yes, and it's really quite easy to do. The first step is finding the installed help files.  If you have SQL Server 2012, BOL is installed under the Microsoft Help Library.  You can find the install location by opening SQL Server Books Online and clicking the gear icon for the Help Library Manager.  When the new window pops up click the Settings link, you'll get the following: You'll see the path under Library Location. Once you navigate to that path you'll have to drill down a little further, to C:\ProgramData\Microsoft\HelpLibrary\content\Microsoft\store.  This is where the help file content is kept if you downloaded it for offline use. Depending on which products you've downloaded help for, you may see a few hundred files.  Fortunately they're named well and you can easily find the "SQL_Server_Denali_Books_Online_" files.  We are interested in the .MSHC files only, and can skip the Installation and Developer Reference files. Despite the .MHSC extension, these files are compressed with the standard Zip format, so your favorite archive utility (WinZip, 7Zip, WinRar, etc.) can open them.  When you do, you'll see a few thousand files in the archive.  We are only interested in the .htm files, but there's no harm in extracting all of them to a folder.  7zip provides a command-line utility and the following will extract to a D:\SQLHelp folder previously created: 7z e –oD:\SQLHelp "C:\ProgramData\Microsoft\HelpLibrary\content\Microsoft\store\SQL_Server_Denali_Books_Online_B780_SQL_110_en-us_1.2.mshc" *.htm Well that's great Rob, but how do I put all those files into a full-text index? I'll tell you in a second, but first we have to set up a few things on the database side.  I'll be using a database named Explore (you can certainly change that) and the following setup is a fragment of the script I used in my presentation: USE Explore; GO CREATE SCHEMA help AUTHORIZATION dbo; GO -- Create default fulltext catalog for later FT indexes CREATE FULLTEXT CATALOG FTC AS DEFAULT; GO CREATE TABLE help.files(file_id int not null IDENTITY(1,1) CONSTRAINT PK_help_files PRIMARY KEY, path varchar(256) not null CONSTRAINT UNQ_help_files_path UNIQUE, doc_type varchar(6) DEFAULT('.xml'), content varbinary(max) not null); CREATE FULLTEXT INDEX ON help.files(content TYPE COLUMN doc_type LANGUAGE 1033) KEY INDEX PK_help_files; This will give you a table, default full-text catalog, and full-text index on that table for the content you're going to insert.  I'll be using the command line again for this, it's the easiest method I know: for %a in (D:\SQLHelp\*.htm) do sqlcmd -S. -E -d Explore -Q"set nocount on;insert help.files(path,content) select '%a', cast(c as varbinary(max)) from openrowset(bulk '%a', SINGLE_CLOB) as c(c)" You'll need to copy and run that as one line in a command prompt.  I'll explain what this does while you run it and watch several thousand files get imported: The "for" command allows you to loop over a collection of items.  In this case we want all the .htm files in the D:\SQLHelp folder.  For each file it finds, it will assign the full path and file name to the %a variable.  In the "do" clause, we'll specify another command to be run for each iteration of the loop.  I make a call to "sqlcmd" in order to run a SQL statement.  I pass in the name of the server (-S.), where "." represents the local default instance. I specify -d Explore as the database, and -E for trusted connection.  I then use -Q to run a query that I enclose in double quotes. The query uses OPENROWSET(BULK…SINGLE_CLOB) to open the file as a data source, and to treat it as a single character large object.  In order for full-text indexing to work properly, I have to convert the text content to varbinary. I then INSERT these contents along with the full path of the file into the help.files table created earlier.  This process continues for each file in the folder, creating one new row in the table. And that's it! 5 SQL Statements and 2 command line statements to unzip and import SQL Server Books Online!  In case you're wondering why I didn't use FILESTREAM or FILETABLE, it's simply because I haven't learned them…yet. I may return to this blog after I figure that out and update it with the steps to do so.  I believe that will make it even easier. In the spirit of exploration, I'll leave you to work on some fulltext queries of this content.  I also recommend playing around with the sys.dm_fts_xxxx DMVs (I particularly like sys.dm_fts_index_keywords, it's pretty interesting).  There are additional example queries in the download material for my presentation linked above. Many thanks to Kevin Boles (t) for his advice on (re)checking the content of the help files.  Don't let that .htm extension fool you! The 2012 help files are actually XML, and you'd need to specify '.xml' in your document type column in order to extract the full-text keywords.  (You probably noticed this in the default definition for the doc_type column.)  You can query sys.fulltext_document_types to get a complete list of the types that can be full-text indexed. I also need to thank Hilary Cotter for giving me the original idea. I believe he used MSDN content in a full-text index for an article from waaaaaaaaaaay back, that I can't find now, and had forgotten about until just a few days ago.  He is also co-author of Pro Full-Text Search in SQL Server 2008, which I highly recommend.  He also has some FTS articles on Simple Talk: http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/ http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features,-part-2/

    Read the article

  • PASS: Election Changes for 2011

    - by Bill Graziano
    Last year after the election, the PASS Board created an Election Review Committee.  This group was charged with reviewing our election procedures and making suggestions to improve the process.  You can read about the formation of the group and review some of the intermediate work on the site – especially in the forums. I was one of the members of the group along with Joe Webb (Chair), Lori Edwards, Brian Kelley, Wendy Pastrick, Andy Warren and Allen White.  This group worked from October to April on our election process.  Along the way we: Interviewed interested parties including former NomCom members, Board candidates and anyone else that came forward. Held a session at the Summit to allow interested parties to discuss the issues Had numerous conference calls and worked through the various topics I can’t thank these people enough for the work they did.  They invested a tremendous number of hours thinking, talking and writing about our elections.  I’m proud to say I was a member of this group and thoroughly enjoyed working with everyone (even if I did finally get tired of all the calls.) The ERC delivered their recommendations to the PASS Board prior to our May Board meeting.  We reviewed those and made a few modifications.  I took their recommendations and rewrote them as procedures while incorporating those changes.  Their original recommendations as well as our final document are posted at the ERC documents page.  Please take a second and read them BEFORE we start the elections.  If you have any questions please post them in the forums on the ERC site. (My final document includes a change log at the end that I decided to leave in.  If you want to know which areas to pay special attention to that’s a good start.) Many of those recommendations were already posted in the forums or in the blogs of individual ERC members.  Hopefully nothing in the ERC document is too surprising. In this post I’m going to walk through some of the key changes and talk about what I remember from both ERC and Board discussions.  I’ll pay a little extra attention to things the Board changed from the ERC.  I’d also encourage any of the Board or ERC members to blog their thoughts on this. The Nominating Committee will continue to exist.  Personally, I was curious to see what the non-Board ERC members would think about the NomCom.  There was broad agreement that a group to vet candidates had value to the organization. The NomCom will be composed of five members.  Two will be Board members and three will be from the membership at large.  The only requirement for the three community members is that you’ve volunteered in some way (and volunteering is defined very broadly).  We expect potential at-large NomCom members to participate in a forum on the PASS site to answer questions from the other PASS members. We’re going to hold an election to determine the three community members.  It will be closer to voting for Summit sessions than voting for Board members.  That means there won’t be multiple dedicated emails.  If you’re at all paying attention it will be easy to participate.  Personally I wanted it easy for those that cared to participate but not overwhelm those that didn’t care.  I think this strikes a good balance. There’s also a clause that in order to be considered a winner in this NomCom election, you must receive 10 votes.  This is something I suggested.  I have no idea how popular the NomCom election is going to be.  I just wanted a fallback that if no one participated and some random person got in with one or two votes.  Any open slots will be filled by the NomCom chair (usually the PASS Immediate Past President).  My assumption is that they would probably take the next highest vote getters unless they were throwing flames in the forums or clearly unqualified.  As a final check, the Board still approves the final NomCom. The NomCom is going to rank candidates instead of rating them.  This has interesting implications.  This was championed by another ERC member and I’m hoping they write something about it.  This will really force the NomCom to make decisions between candidates.  You can’t just rate everyone a 3 and be done with it.  It may also make candidates appear further apart than they actually are.  I’m looking forward talking with the NomCom after this election and getting their feedback on this. The PASS Board added an option to remove a candidate with a unanimous vote of the NomCom.  This was primarily put in place to handle people that lied on their application or had a criminal background or some other unusual situation and we figured it out. We list an explicit goal of three candidate per open slot. We also wanted an easy way to find the NomCom candidate rankings from the ballot.  Hopefully this will satisfy those that want a broad candidate pool and those that want the NomCom to identify the most qualified candidates. The primary spokesperson for the NomCom is the committee chair.  After the issues around the election last year we didn’t have a good communication plan in place.  We should have and that was a failure on the part of the Board.  If there is criticism of the election this year I hope that falls squarely on the Board.  The community members of the NomCom shouldn’t be fielding complaints over the election process.  That said, the NomCom is ranking candidates and we are forcing them to rank some lower than others.  I’m sure you’ll each find someone that you think should have been ranked differently.  I also want to highlight one other change to the process that we started last year and isn’t included in these documents.  I think the candidate forums on the PASS site were tremendously helpful last year in helping people to find out more about candidates.  That gives our members a way to ask hard questions of the candidates and publicly see their answers. This year we have two important groups to fill.  The first is the NomCom.  We need three people from our membership to step up and fill this role.  It won’t be easy.  You will have to make subjective rankings of your fellow community members.  Your actions will be important in deciding who the future leaders of PASS will be.  There’s a 50/50 chance that one of the people you interview will be the President of PASS someday.  This is not a responsibility to be taken lightly. The second is the slate of candidates.  If you’ve ever thought about running for the Board this is the year.  We’ve never had nine candidates on the ballot before.  Your chance of making it through the NomCom are higher than in any previous year.  Unfortunately the more of you that run, the more of you that will lose in the election.  And hopefully that competition will mean more community involvement and better Board members for PASS. Is this the end of changes to the election process?  It isn’t.  Every year that I’ve been on the Board the election process has changed.  Some years there have been small changes and some years there have been large changes.  After this election we’ll look at how the process worked and decide what steps to take – just like we do every year.

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #031

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Find Table without Clustered Index – Find Table with no Primary Key Clustered index is very important concept for any table. They impact the performance very heavily. Here is a quick script to find tables without a clustered index. Replace TEXT with VARCHAR(MAX) – Stop using TEXT, NTEXT, IMAGE Data Types Question: “Is VARCHAR (MAX) big enough to store the TEXT field?” Answer: “Yes, VARCHAR(MAX) is big enough to accommodate TEXT field. TEXT, NTEXT and IMAGE data types of SQL Server 2000 will be deprecated in a future version of SQL Server, SQL Server 2005 provides backward compatibility to data types but it is recommended to use new data types which are VARHCAR (MAX), NVARCHAR (MAX) and VARBINARY (MAX).” Limiting Result Sets by Using TABLESAMPLE – Examples Introduced in SQL Server 2005, TABLESAMPLE allows you to extract a sampling of rows from a table in the FROM clause. The rows retrieved are random and they are are not in any order. This sampling can be based on a percentage of number of rows. You can use TABLESAMPLE when only a sampling of rows is necessary for the application instead of a full result set. User Defined Functions (UDF) Limitations UDF have its own advantage and usage but in this article we will see the limitation of UDF. Things UDF can not do and why Stored Procedure are considered as more flexible then UDFs. Stored Procedure are more flexibility then User Defined Functions(UDF). However, this blog post is a good read to know what are the limitations of UDF. Change Database Compatible Level – Backward Compatibility For a long time SQL Server stayed on the compatibility level of 80 which is of SQL Server 2000. However, as soon as SQL Server 2005 introduced the issue of compatibility was quite a major issue. Since that time MS has been releasing the versions at every 2-3 years, changing compatibility is a ever popular topic. In this blog post, we learn how we can do the same using T-SQL. We can also do the same using SSMS and here is the blog post for the same: Change Database Compatible Level – Backward Compatibility – Part 2 – Management Studio. Constraint on VARCHAR(MAX) Field To Limit It Certain Length How can I limit the VARCHAR(MAX) field with maximum length of 12500 characters only. His Question was valid as our application was allowed 12500 characters. First of all – this requirement is bit strange but if someone wants to do the same, they can do it as described in this blog post. 2008 UNPIVOT Table Example Understanding UNPIVOT can be very complicated at times. In this blog post, I have attempted to explain the same concept in very simple words. Create Default Constraint Over Table Column A simple straight to script blog post – I still use this blog quite many times for my own reference. UDF – Get the Day of the Week Function It took me 4 iteration to find this very simple function which can immediately get the day of the week in a single line. 2009 Find Hostname and Current Logged In User Name There are two tricks listed in this blog post where users can find out the hostname and current logged user name immediately and very easily. Interesting Observation of Logon Trigger On All Servers When I was doing a project, I made an interesting observation of executing a logon trigger multiple times. It was absolutely unexpected for me! As I was logging only once, naturally, I was expecting the entry only once. However, it did it multiple times on different threads – indeed an eccentric phenomenon at first sight! Difference Between Candidate Keys and Primary Key One needs to be very careful in selecting the Primary Key as an incorrect selection can adversely impact the database architect and future normalization. For a Candidate Key to qualify as a Primary Key, it should be Non-NULL and unique in any domain. I have observed quite often that Primary Keys are seldom changed. I would like to have your feedback on not changing a Primary Key. Create Multiple Filegroup For Single Database Why should one create multiple file group for any database and what are the advantages of the same. In this blog post, I explain the same in detail. List All Objects Created on All Filegroups in Database In this blog post we discuss the essential question – “How can I find which object belongs to which filegroup. Is there any way to know this?” 2010 DATE and TIME in SQL Server 2008 When DATE is converted to DATETIME it adds the of midnight. When TIME is converted to DATETIME it adds the date of 1900 and it is something one wants to consider if you are going to run scripts from SQL Server 2008 to earlier version with CONVERT. Disabled Index and Update Statistics If you do not need a nonclustered index, I suggest you to drop it as keeping them disabled is an overhead on your system. This is because every time the statistics are updated for system all the statistics for disabled indexes are also updated. Precision of SMALLDATETIME – A 1 Minute Precision The precision of the datatype SMALLDATETIME is 1 minute. It discards the seconds by rounding up or rounding down any seconds greater than zero. 2011 Getting Columns Headers without Result Data – SET FMTONLY ON SET FMTONLY ON returns only metadata to the client. It can be used to test the format of the response without actually running the query. When this setting is ON the resultset only have headers of the results but no data. Copy Database from Instance to Another Instance – Copy Paste in SQL Server SQL Server has a feature which copy database from one database to another database and it can be automated as well using SSIS. Make sure you have SQL Server Agent Turned on as this feature will create a job. Puzzle – SELECT * vs SELECT COUNT(*) If you have ever wondered SELECT * gives error when executed alone but SELECT COUNT(*) does not. Why? in that case, you should read this blog post. Creating All New Database with Full Recovery Model This blog post is very based on very interesting story where the user wants to do something by default for every single new database created. Model database is a secret weapon which should be used very carefully and with proper evalution. If used carefully this can be a very much beneficiary when we need a newly created database behave in certain fashion. 2012 In year 2012 I had two interesting series ran on the blog. If there is no fun in learning, the learning becomes a burden. For the same reason, I had decided to build a three part quiz around SEQUENCE. The quiz was to identify the next value of the sequence. I encourage all of you to take part in this fun quiz. Guess the Next Value – Puzzle 1 Guess the Next Value – Puzzle 2 Guess the Next Value – Puzzle 3 Can anyone remember their final day of schooling?  This is probably a silly question because – of course you can!  Many people mark this as the most exciting, happiest day of their life.  It marks the end of testing, the end of following rules set by teachers, and the beginning of finally being able to earn money and work in your chosen field. Read five part series on developer training subject Developer Training - Importance and Significance - Part 1 Developer Training – Employee Morals and Ethics – Part 2 Developer Training – Difficult Questions and Alternative Perspective - Part 3 Developer Training – Various Options for Developer Training – Part 4 Developer Training – A Conclusive Summary- Part 5 Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Different Not Automatically Implies Better

    - by Alois Kraus
    Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/11/05/154556.aspxRecently I was digging deeper why some WCF hosted workflow application did consume quite a lot of memory although it did basically only load a xaml workflow. The first tool of choice is Process Explorer or even better Process Hacker (has more options and the best feature copy&paste does work). The three most important numbers of a process with regards to memory are Working Set, Private Working Set and Private Bytes. Working set is the currently consumed physical memory (parts can be shared between processes e.g. loaded dlls which are read only) Private Working Set is the physical memory needed by this process which is not shareable Private Bytes is the number of non shareable which is only visible in the current process (e.g. all new, malloc, VirtualAlloc calls do create private bytes) When you have a bigger workflow it can consume under 64 bit easily 500MB for a 1-2 MB xaml file. This does not look very scalable. Under 64 bit the issue is excessive private bytes consumption and not the managed heap. The picture is quite different for 32 bit which looks a bit strange but it seems that the hosted VB compiler is a lot less memory hungry under 32 bit. I did try to repro the issue with a medium sized xaml file (400KB) which does contain 1000 variables and 1000 if which can be represented by C# code like this: string Var1; string Var2; ... string Var1000; if (!String.IsNullOrEmpty(Var1) ) { Console.WriteLine(“Var1”); } if (!String.IsNullOrEmpty(Var2) ) { Console.WriteLine(“Var2”); } ....   Since WF is based on VB.NET expressions you are bound to the hosted VB.NET compiler which does result in (x64) 140 MB of private bytes which is ca. 140 KB for each if clause which is quite a lot if you think about the actually present functionality. But there is hope. .NET 4.5 does allow now C# expressions for WF which is a major step forward for all C# lovers. I did create some simple patcher to “cross compile” my xaml to C# expressions. Lets look at the result: C# Expressions VB Expressions x86 x86 On my home machine I have only 32 bit which gives you quite exactly half of the memory consumption under 64 bit. C# expressions are 10 times more memory hungry than VB.NET expressions! I wanted to do more with less memory but instead it did consume a magnitude more memory. That is surprising to say the least. The workflow does initialize in about the same time under x64 and x86 where the VB code does it in 2s whereas the C# version needs 18s. Also nearly ten times slower. That is a too high price to pay for any bigger sized xaml workflow to convert from VB.NET to C# expressions. If I do reduce the number of expressions to 500 then it does need 400MB which is about half of the memory. It seems that the cost per if does rise linear with the number of total expressions in a xaml workflow.  Expression Language Cost per IF Startup Time C# 1000 Ifs x64 1,5 MB 18s C# 500 Ifs x64 750 KB 9s VB 1000 Ifs x64 140 KB 2s VB 500 Ifs x64 70 KB 1s Now we can directly compare two MS implementations. It is clear that the VB.NET compiler uses the same underlying structure but it has much higher offset compared to the highly inefficient C# expression compiler. I have filed a connect bug here with a harsher wording about recent advances in memory consumption. The funniest thing is that one MS employee did give an Azure AppFabric demo around early 2011 which was so slow that he needed to investigate with xperf. He was after startup time and the call stacks with regards to VB.NET expression compilation were remarkably similar. In fact I only found this post by googling for parts of my call stacks. … “C# expressions will be coming soon to WF, and that will have different performance characteristics than VB” … What did he know Jan 2011 what I did no know until today? ;-). He knew that C# expression will come but that they will not be automatically have better footprint. It is about time to fix that. In its current state C# expressions are not usable for bigger workflows. That also explains the headline for today. You can cheat startup time by prestarting workflows so that the demo looks nice and snappy but it does hurt scalability a lot since you do need much more memory than necessary. I did find the stacks by enabling virtual allocation tracking within XPerf which is still the best tool out there. But first you need to look at your process to check where the memory is hiding: For the C# Expression compiler you do not need xperf. You can directly dump the managed heap and check with a profiler of your choice. But if the allocations are happening on the Private Data ( VirtualAlloc ) you can find it with xperf. There is a nice video on channel 9 explaining VirtualAlloc tracking it in greater detail. If your data allocations are on the Heap it does mean that the C/C++ runtime did create a heap for you where all malloc, new calls do allocate from it. You can enable heap tracing with xperf and full call stack support as well which is doable via xperf like it is shown also on channel 9. Or you can use WPRUI directly: To make “Heap Usage” it work you need to set for your executable the tracing flags (before you start it). For example devenv.exe HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\devenv.exe DWORD TracingFlags 1 Do not forget to disable it after you did complete profiling the process or it will impact the startup time quite a lot. You can with xperf attach directly to a running process and collect heap allocation information from a gone wild process. Very handy if you need to find out what a process was doing which has arrived in a funny state. “VirtualAlloc usage” does work without explicitly enabling stuff for a specific process and is always on machine wide. I had issues on my Windows 7 machines with the call stack collection and the latest Windows 8.1 Performance Toolkit. I was told that WPA from Windows 8.0 should work fine but I do not want to downgrade.

    Read the article

  • Parallelism in .NET – Part 9, Configuration in PLINQ and TPL

    - by Reed
    Parallel LINQ and the Task Parallel Library contain many options for configuration.  Although the default configuration options are often ideal, there are times when customizing the behavior is desirable.  Both frameworks provide full configuration support. When working with Data Parallelism, there is one primary configuration option we often need to control – the number of threads we want the system to use when parallelizing our routine.  By default, PLINQ and the TPL both use the ThreadPool to schedule tasks.  Given the major improvements in the ThreadPool in CLR 4, this default behavior is often ideal.  However, there are times that the default behavior is not appropriate.  For example, if you are working on multiple threads simultaneously, and want to schedule parallel operations from within both threads, you might want to consider restricting each parallel operation to using a subset of the processing cores of the system.  Not doing this might over-parallelize your routine, which leads to inefficiencies from having too many context switches. In the Task Parallel Library, configuration is handled via the ParallelOptions class.  All of the methods of the Parallel class have an overload which accepts a ParallelOptions argument. We configure the Parallel class by setting the ParallelOptions.MaxDegreeOfParallelism property.  For example, let’s revisit one of the simple data parallel examples from Part 2: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re looping through an image, and calling a method on each pixel in the image.  If this was being done on a separate thread, and we knew another thread within our system was going to be doing a similar operation, we likely would want to restrict this to using half of the cores on the system.  This could be accomplished easily by doing: var options = new ParallelOptions(); options.MaxDegreeOfParallelism = Math.Max(Environment.ProcessorCount / 2, 1); Parallel.For(0, pixelData.GetUpperBound(0), options, row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Now, we’re restricting this routine to using no more than half the cores in our system.  Note that I included a check to prevent a single core system from supplying zero; without this check, we’d potentially cause an exception.  I also did not hard code a specific value for the MaxDegreeOfParallelism property.  One of our goals when parallelizing a routine is allowing it to scale on better hardware.  Specifying a hard-coded value would contradict that goal. Parallel LINQ also supports configuration, and in fact, has quite a few more options for configuring the system.  The main configuration option we most often need is the same as our TPL option: we need to supply the maximum number of processing threads.  In PLINQ, this is done via a new extension method on ParallelQuery<T>: ParallelEnumerable.WithDegreeOfParallelism. Let’s revisit our declarative data parallelism sample from Part 6: double min = collection.AsParallel().Min(item => item.PerformComputation()); Here, we’re performing a computation on each element in the collection, and saving the minimum value of this operation.  If we wanted to restrict this to a limited number of threads, we would add our new extension method: int maxThreads = Math.Max(Environment.ProcessorCount / 2, 1); double min = collection .AsParallel() .WithDegreeOfParallelism(maxThreads) .Min(item => item.PerformComputation()); This automatically restricts the PLINQ query to half of the threads on the system. PLINQ provides some additional configuration options.  By default, PLINQ will occasionally revert to processing a query in parallel.  This occurs because many queries, if parallelized, typically actually cause an overall slowdown compared to a serial processing equivalent.  By analyzing the “shape” of the query, PLINQ often decides to run a query serially instead of in parallel.  This can occur for (taken from MSDN): Queries that contain a Select, indexed Where, indexed SelectMany, or ElementAt clause after an ordering or filtering operator that has removed or rearranged original indices. Queries that contain a Take, TakeWhile, Skip, SkipWhile operator and where indices in the source sequence are not in the original order. Queries that contain Zip or SequenceEquals, unless one of the data sources has an originally ordered index and the other data source is indexable (i.e. an array or IList(T)). Queries that contain Concat, unless it is applied to indexable data sources. Queries that contain Reverse, unless applied to an indexable data source. If the specific query follows these rules, PLINQ will run the query on a single thread.  However, none of these rules look at the specific work being done in the delegates, only at the “shape” of the query.  There are cases where running in parallel may still be beneficial, even if the shape is one where it typically parallelizes poorly.  In these cases, you can override the default behavior by using the WithExecutionMode extension method.  This would be done like so: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .Select(i => i.PerformComputation()) .Reverse(); Here, the default behavior would be to not parallelize the query unless collection implemented IList<T>.  We can force this to run in parallel by adding the WithExecutionMode extension method in the method chain. Finally, PLINQ has the ability to configure how results are returned.  When a query is filtering or selecting an input collection, the results will need to be streamed back into a single IEnumerable<T> result.  For example, the method above returns a new, reversed collection.  In this case, the processing of the collection will be done in parallel, but the results need to be streamed back to the caller serially, so they can be enumerated on a single thread. This streaming introduces overhead.  IEnumerable<T> isn’t designed with thread safety in mind, so the system needs to handle merging the parallel processes back into a single stream, which introduces synchronization issues.  There are two extremes of how this could be accomplished, but both extremes have disadvantages. The system could watch each thread, and whenever a thread produces a result, take that result and send it back to the caller.  This would mean that the calling thread would have access to the data as soon as data is available, which is the benefit of this approach.  However, it also means that every item is introducing synchronization overhead, since each item needs to be merged individually. On the other extreme, the system could wait until all of the results from all of the threads were ready, then push all of the results back to the calling thread in one shot.  The advantage here is that the least amount of synchronization is added to the system, which means the query will, on a whole, run the fastest.  However, the calling thread will have to wait for all elements to be processed, so this could introduce a long delay between when a parallel query begins and when results are returned. The default behavior in PLINQ is actually between these two extremes.  By default, PLINQ maintains an internal buffer, and chooses an optimal buffer size to maintain.  Query results are accumulated into the buffer, then returned in the IEnumerable<T> result in chunks.  This provides reasonably fast access to the results, as well as good overall throughput, in most scenarios. However, if we know the nature of our algorithm, we may decide we would prefer one of the other extremes.  This can be done by using the WithMergeOptions extension method.  For example, if we know that our PerformComputation() routine is very slow, but also variable in runtime, we may want to retrieve results as they are available, with no bufferring.  This can be done by changing our above routine to: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.NotBuffered) .Select(i => i.PerformComputation()) .Reverse(); On the other hand, if are already on a background thread, and we want to allow the system to maximize its speed, we might want to allow the system to fully buffer the results: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.FullyBuffered) .Select(i => i.PerformComputation()) .Reverse(); Notice, also, that you can specify multiple configuration options in a parallel query.  By chaining these extension methods together, we generate a query that will always run in parallel, and will always complete before making the results available in our IEnumerable<T>.

    Read the article

  • Implementing a generic repository for WCF data services

    - by cibrax
    The repository implementation I am going to discuss here is not exactly what someone would call repository in terms of DDD, but it is an abstraction layer that becomes handy at the moment of unit testing the code around this repository. In other words, you can easily create a mock to replace the real repository implementation. The WCF Data Services update for .NET 3.5 introduced a nice feature to support two way data bindings, which is very helpful for developing WPF or Silverlight based application but also for implementing the repository I am going to talk about. As part of this feature, the WCF Data Services Client library introduced a new collection DataServiceCollection<T> that implements INotifyPropertyChanged to notify the data context (DataServiceContext) about any change in the association links. This means that it is not longer necessary to manually set or remove the links in the data context when an item is added or removed from a collection. Before having this new collection, you basically used the following code to add a new item to a collection. Order order = new Order {   Name = "Foo" }; OrderItem item = new OrderItem {   Name = "bar",   UnitPrice = 10,   Qty = 1 }; var context = new OrderContext(); context.AddToOrders(order); context.AddToOrderItems(item); context.SetLink(item, "Order", order); context.SaveChanges(); Now, thanks to this new collection, everything is much simpler and similar to what you have in other ORMs like Entity Framework or L2S. Order order = new Order {   Name = "Foo" }; OrderItem item = new OrderItem {   Name = "bar",   UnitPrice = 10,   Qty = 1 }; order.Items.Add(item); var context = new OrderContext(); context.AddToOrders(order); context.SaveChanges(); In order to use this new feature, you first need to enable V2 in the data service, and then use some specific arguments in the datasvcutil tool (You can find more information about this new feature and how to use it in this post). DataSvcUtil /uri:"http://localhost:3655/MyDataService.svc/" /out:Reference.cs /dataservicecollection /version:2.0 Once you use those two arguments, the generated proxy classes will use DataServiceCollection<T> rather than a simple ObjectCollection<T>, which was the default collection in V1. There are some aspects that you need to know to use this feature correctly. 1. All the entities retrieved directly from the data context with a query track the changes and report those to the data context automatically. 2. A entity created with “new” does not track any change in the properties or associations. In order to enable change tracking in this entity, you need to do the following trick. public Order CreateOrder() {   var collection = new DataServiceCollection<Order>(this.context);   var order = new Order();   collection.Add(order);   return order; } You basically need to create a collection, and add the entity to that collection with the “Add” method to enable change tracking on that entity. 3. If you need to attach an existing entity (For example, if you created the entity with the “new” operator rather than retrieving it from the data context with a query) to a data context for tracking changes, you can use the “Load” method in the DataServiceCollection. var order = new Order {   Id = 1 }; var collection = new DataServiceCollection<Order>(this.context); collection.Load(order); In this case, the order with Id = 1 must exist on the data source exposed by the Data service. Otherwise, you will get an error because the entity did not exist. These cool extensions methods discussed by Stuart Leeks in this post to replace all the magic strings in the “Expand” operation with Expression Trees represent another feature I am going to use to implement this generic repository. Thanks to these extension methods, you could replace the following query with magic strings by a piece of code that only uses expressions. Magic strings, var customers = dataContext.Customers .Expand("Orders")         .Expand("Orders/Items") Expressions, var customers = dataContext.Customers .Expand(c => c.Orders.SubExpand(o => o.Items)) That query basically returns all the customers with their orders and order items. Ok, now that we have the automatic change tracking support and the expression support for explicitly loading entity associations, we are ready to create the repository. The interface for this repository looks like this,public interface IRepository { T Create<T>() where T : new(); void Update<T>(T entity); void Delete<T>(T entity); IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties); IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties); void Attach<T>(T entity); void SaveChanges(); } The Retrieve and RetrieveAll methods are used to execute queries against the data service context. While both methods receive an array of expressions to load associations explicitly, only the Retrieve method receives a predicate representing the “where” clause. The following code represents the final implementation of this repository.public class DataServiceRepository: IRepository { ResourceRepositoryContext context; public DataServiceRepository() : this (new DataServiceContext()) { } public DataServiceRepository(DataServiceContext context) { this.context = context; } private static string ResolveEntitySet(Type type) { var entitySetAttribute = (EntitySetAttribute)type.GetCustomAttributes(typeof(EntitySetAttribute), true).FirstOrDefault(); if (entitySetAttribute != null) return entitySetAttribute.EntitySet; return null; } public T Create<T>() where T : new() { var collection = new DataServiceCollection<T>(this.context); var entity = new T(); collection.Add(entity); return entity; } public void Update<T>(T entity) { this.context.UpdateObject(entity); } public void Delete<T>(T entity) { this.context.DeleteObject(entity); } public void Attach<T>(T entity) { var collection = new DataServiceCollection<T>(this.context); collection.Load(entity); } public IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query.Where(predicate); } public IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query; } public void SaveChanges() { this.context.SaveChanges(SaveChangesOptions.Batch); } } For instance, you can use the following code to retrieve customers with First name equal to “John”, and all their orders in a single call. repository.Retrieve<Customer>(    c => c.FirstName == “John”, //Where    c => c.Orders.SubExpand(o => o.Items)); In case, you want to have some pre-defined queries that you are going to use across several places, you can put them in an specific class. public static class CustomerQueries {   public static Expression<Func<Customer, bool>> LastNameEqualsTo(string lastName)   {     return c => c.LastName == lastName;   } } And then, use it with the repository. repository.Retrieve<Customer>(    CustomerQueries.LastNameEqualsTo("foo"),    c => c.Orders.SubExpand(o => o.Items));

    Read the article

  • Heaps of Trouble?

    - by Paul White NZ
    If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material.  In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all.  The problem is, predictably, that performance is not very good.  The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts. In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found.  Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here. The Example We’ll use data from the AdventureWorks database, copied to temporary unindexed tables.  A script to create these structures is shown below: CREATE TABLE #Custs ( CustomerID INTEGER NOT NULL, TerritoryID INTEGER NULL, CustomerType NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #Prods ( ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, Name NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #OrdHeader ( SalesOrderID INTEGER NOT NULL, OrderDate DATETIME NOT NULL, SalesOrderNumber NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, CustomerID INTEGER NOT NULL, ); GO CREATE TABLE #OrdDetail ( SalesOrderID INTEGER NOT NULL, OrderQty SMALLINT NOT NULL, LineTotal NUMERIC(38,6) NOT NULL, ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, ); GO INSERT #Custs ( CustomerID, TerritoryID, CustomerType ) SELECT C.CustomerID, C.TerritoryID, C.CustomerType FROM AdventureWorks.Sales.Customer C WITH (TABLOCK); GO INSERT #Prods ( ProductMainID, ProductSubID, ProductSubSubID, Name ) SELECT P.ProductID, P.ProductID, P.ProductID, P.Name FROM AdventureWorks.Production.Product P WITH (TABLOCK); GO INSERT #OrdHeader ( SalesOrderID, OrderDate, SalesOrderNumber, CustomerID ) SELECT H.SalesOrderID, H.OrderDate, H.SalesOrderNumber, H.CustomerID FROM AdventureWorks.Sales.SalesOrderHeader H WITH (TABLOCK); GO INSERT #OrdDetail ( SalesOrderID, OrderQty, LineTotal, ProductMainID, ProductSubID, ProductSubSubID ) SELECT D.SalesOrderID, D.OrderQty, D.LineTotal, D.ProductID, D.ProductID, D.ProductID FROM AdventureWorks.Sales.SalesOrderDetail D WITH (TABLOCK); The query itself is a simple join of the four tables: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #OrdDetail D ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID JOIN #OrdHeader H ON D.SalesOrderID = H.SalesOrderID JOIN #Custs C ON H.CustomerID = C.CustomerID ORDER BY P.ProductMainID ASC OPTION (RECOMPILE, MAXDOP 1); Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings).  The estimated query plan produced for the test query looks like this (click to enlarge): The Problem The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan.  The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate.  Every join in the plan shown above estimates that it will produce just a single row as output.  Brad covers the factors that lead to the low estimates in his post. In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows.  It should not surprise you that this has rather dire consequences for the remainder of the query plan.  In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables.  Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each.  The query takes somewhere in the region of twenty minutes to run to completion on my development machine. A Solution At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere. We can force the exclusive use of hash joins in several ways, the two most common being join and query hints.  A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the bottom of the query.  The difference is that using join hints also forces the order of the join, whereas the query hint gives the optimizer freedom to reorder the joins at its discretion. Adding the OPTION (HASH JOIN) hint results in this estimated plan: That produces the correct output in around seven seconds, which is quite an improvement!  As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there.  (We can improve the hashing solution a bit – I’ll come back to that later on). Faster Nested Loops It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set. The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B).  Armed with this tremendous new insight, we can rewrite the join predicates like so: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #OrdDetail D JOIN #OrdHeader H ON D.SalesOrderID >= H.SalesOrderID AND D.SalesOrderID <= H.SalesOrderID JOIN #Custs C ON H.CustomerID >= C.CustomerID AND H.CustomerID <= C.CustomerID JOIN #Prods P ON P.ProductMainID >= D.ProductMainID AND P.ProductMainID <= D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER); I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear.  The new estimated execution plan is: This new query runs in under 2 seconds. Why Is It Faster? The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools.  If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly. An eager index spool consumes all rows from the table it sits above, and builds a index suitable for the join to seek on.  Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID.  The term ‘eager’ means that the spool consumes all of its input rows when it starts up.  The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing. The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index.  From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table. A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table.  The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself.  The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation.  Nested loops join preserves the order of rows on its outer input, so sorting early is safe.  (Hash joins do not preserve order in this way, of course). The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input.  It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other. The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation.  It’s presence in a plan is often an indication that a useful index is missing. I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system. Improving the Hash Join I promised I would return to the solution that uses hash joins.  You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins.  The answer, again, is down to the poor information available to the optimizer.  Let’s look at the hash join plan again: Two of the hash joins have single-row estimates on their build inputs.  SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory. This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk.  The join process then continues, and may again run out of memory.  This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow.  The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion.  You can see this for yourself by tracing the Hash Warning event using the Profiler tool. The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this).  Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan. Ok, so now we understand the problems, what can we do to fix it?  We can address the hash spilling by forcing a different order for the joins: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #Custs C JOIN #OrdHeader H ON H.CustomerID = C.CustomerID JOIN #OrdDetail D ON D.SalesOrderID = H.SalesOrderID ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (MAXDOP 1, HASH JOIN, FORCE ORDER); With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs.  The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk.  Even so, the query runs to completion in three or four seconds.  That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery. Final Thoughts SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information.  We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics. I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world.  It’s there primarily for its educational and entertainment value.  I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index. Be sure to read Brad’s original post for more details.  My grateful thanks to him for granting permission to reuse some of his material. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

    Read the article

  • MySQL Query performance - huge difference in time

    - by Damo
    I have a query that is returning in vastly different amounts of time between 2 datasets. For one set (database A) it returns in a few seconds, for the other (database B)....well I haven't waited long enough yet, but over 10 minutes. I have dumped both of these databases to my local machine where I can reproduce the issue running MySQL 5.1.37. Curiously, database B is smaller than database A. A stripped down version of the query that reproduces the problem is: SELECT * FROM po_shipment ps JOIN po_shipment_item psi USING (ship_id) JOIN po_alloc pa ON ps.ship_id = pa.ship_id AND pa.UID_items = psi.UID_items JOIN po_header ph ON pa.hdr_id = ph.hdr_id LEFT JOIN EVENT_TABLE ev0 ON ev0.TABLE_ID1 = ps.ship_id AND ev0.EVENT_TYPE = 'MAS0' LEFT JOIN EVENT_TABLE ev1 ON ev1.TABLE_ID1 = ps.ship_id AND ev1.EVENT_TYPE = 'MAS1' LEFT JOIN EVENT_TABLE ev2 ON ev2.TABLE_ID1 = ps.ship_id AND ev2.EVENT_TYPE = 'MAS2' LEFT JOIN EVENT_TABLE ev3 ON ev3.TABLE_ID1 = ps.ship_id AND ev3.EVENT_TYPE = 'MAS3' LEFT JOIN EVENT_TABLE ev4 ON ev4.TABLE_ID1 = ps.ship_id AND ev4.EVENT_TYPE = 'MAS4' LEFT JOIN EVENT_TABLE ev5 ON ev5.TABLE_ID1 = ps.ship_id AND ev5.EVENT_TYPE = 'MAS5' WHERE ps.eta >= '2010-03-22' GROUP BY ps.ship_id LIMIT 100; The EXPLAIN query plan for the first database (A) that returns in ~2 seconds is: +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | ps | range | PRIMARY,IX_ETA_DATE | IX_ETA_DATE | 4 | NULL | 174 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | ev0 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev1 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev2 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev3 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev4 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev5 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_PROD.ps.ship_id,const | 1 | | | 1 | SIMPLE | psi | ref | PRIMARY,IX_po_shipment_item_po_shipment1,FK_po_shipment_item_po_shipment1 | IX_po_shipment_item_po_shipment1 | 4 | UNIVIS_PROD.ps.ship_id | 1 | | | 1 | SIMPLE | pa | ref | IX_po_alloc_po_shipment_item2,IX_po_alloc_po_details_old,FK_po_alloc_po_shipment1,FK_po_alloc_po_shipment_item1,FK_po_alloc_po_header1 | FK_po_alloc_po_shipment1 | 4 | UNIVIS_PROD.psi.ship_id | 5 | Using where | | 1 | SIMPLE | ph | eq_ref | PRIMARY,IX_HDR_ID | PRIMARY | 4 | UNIVIS_PROD.pa.hdr_id | 1 | | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+------------------------------+------+----------------------------------------------+ The EXPLAIN query plan for the second database (B) that returns in 600 seconds is: +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | ps | range | PRIMARY,IX_ETA_DATE | IX_ETA_DATE | 4 | NULL | 38 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | psi | ref | PRIMARY,IX_po_shipment_item_po_shipment1,FK_po_shipment_item_po_shipment1 | IX_po_shipment_item_po_shipment1 | 4 | UNIVIS_DEV01.ps.ship_id | 1 | | | 1 | SIMPLE | ev0 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev1 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev2 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.ps.ship_id,const | 1 | | | 1 | SIMPLE | ev3 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev4 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.psi.ship_id,const | 1 | | | 1 | SIMPLE | ev5 | ref | IX_EVENT_ID_EVENT_TYPE | IX_EVENT_ID_EVENT_TYPE | 36 | UNIVIS_DEV01.ps.ship_id,const | 1 | | | 1 | SIMPLE | pa | ref | IX_po_alloc_po_shipment_item2,IX_po_alloc_po_details_old,FK_po_alloc_po_shipment1,FK_po_alloc_po_shipment_item1,FK_po_alloc_po_header1 | IX_po_alloc_po_shipment_item2 | 4 | UNIVIS_DEV01.ps.ship_id | 4 | Using where | | 1 | SIMPLE | ph | eq_ref | PRIMARY,IX_HDR_ID | PRIMARY | 4 | UNIVIS_DEV01.pa.hdr_id | 1 | | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+------+----------------------------------------------+ When database B is running I can look at the MySQL Administrator and the state remains at "Copying to tmp table" indefinitely. Database A also has this state but for only a second or so. There are no differences in the table structure, indexes, keys etc between these databases (I have done show create tables and diff'd them). The sizes of the tables are: database A: po_shipment 1776 po_shipment_item 1945 po_alloc 36298 po_header 71642 EVENT_TABLE 1608 database B: po_shipment 463 po_shipment_item 470 po_alloc 3291 po_header 56149 EVENT_TABLE 1089 Some points to note: Removing the WHERE clause makes the query return < 1 sec. Removing the GROUP BY makes the query return < 1 sec. Removing ev5, ev4, ev3 etc makes the query get faster for each one removed. Can anyone suggest how to resolve this issue? What have I missed? Many Thanks.

    Read the article

  • SQL Server SQL Injection from start to end

    - by Mladen Prajdic
    SQL injection is a method by which a hacker gains access to the database server by injecting specially formatted data through the user interface input fields. In the last few years we have witnessed a huge increase in the number of reported SQL injection attacks, many of which caused a great deal of damage. A SQL injection attack takes many guises, but the underlying method is always the same. The specially formatted data starts with an apostrophe (') to end the string column (usually username) check, continues with malicious SQL, and then ends with the SQL comment mark (--) in order to comment out the full original SQL that was intended to be submitted. The really advanced methods use binary or encoded text inputs instead of clear text. SQL injection vulnerabilities are often thought to be a database server problem. In reality they are a pure application design problem, generally resulting from unsafe techniques for dynamically constructing SQL statements that require user input. It also doesn't help that many web pages allow SQL Server error messages to be exposed to the user, having no input clean up or validation, allowing applications to connect with elevated (e.g. sa) privileges and so on. Usually that's caused by novice developers who just copy-and-paste code found on the internet without understanding the possible consequences. The first line of defense is to never let your applications connect via an admin account like sa. This account has full privileges on the server and so you virtually give the attacker open access to all your databases, servers, and network. The second line of defense is never to expose SQL Server error messages to the end user. Finally, always use safe methods for building dynamic SQL, using properly parameterized statements. Hopefully, all of this will be clearly demonstrated as we demonstrate two of the most common ways that enable SQL injection attacks, and how to remove the vulnerability. 1) Concatenating SQL statements on the client by hand 2) Using parameterized stored procedures but passing in parts of SQL statements As will become clear, SQL Injection vulnerabilities cannot be solved by simple database refactoring; often, both the application and database have to be redesigned to solve this problem. Concatenating SQL statements on the client This problem is caused when user-entered data is inserted into a dynamically-constructed SQL statement, by string concatenation, and then submitted for execution. Developers often think that some method of input sanitization is the solution to this problem, but the correct solution is to correctly parameterize the dynamic SQL. In this simple example, the code accepts a username and password and, if the user exists, returns the requested data. First the SQL code is shown that builds the table and test data then the C# code with the actual SQL Injection example from beginning to the end. The comments in code provide information on what actually happens. /* SQL CODE *//* Users table holds usernames and passwords and is the object of out hacking attempt */CREATE TABLE Users( UserId INT IDENTITY(1, 1) PRIMARY KEY , UserName VARCHAR(50) , UserPassword NVARCHAR(10))/* Insert 2 users */INSERT INTO Users(UserName, UserPassword)SELECT 'User 1', 'MyPwd' UNION ALLSELECT 'User 2', 'BlaBla' Vulnerable C# code, followed by a progressive SQL injection attack. /* .NET C# CODE *//*This method checks if a user exists. It uses SQL concatination on the client, which is susceptible to SQL injection attacks*/private bool DoesUserExist(string username, string password){ using (SqlConnection conn = new SqlConnection(@"server=YourServerName; database=tempdb; Integrated Security=SSPI;")) { /* This is the SQL string you usually see with novice developers. It returns a row if a user exists and no rows if it doesn't */ string sql = "SELECT * FROM Users WHERE UserName = '" + username + "' AND UserPassword = '" + password + "'"; SqlCommand cmd = conn.CreateCommand(); cmd.CommandText = sql; cmd.CommandType = CommandType.Text; cmd.Connection.Open(); DataSet dsResult = new DataSet(); /* If a user doesn't exist the cmd.ExecuteScalar() returns null; this is just to simplify the example; you can use other Execute methods too */ string userExists = (cmd.ExecuteScalar() ?? "0").ToString(); return userExists != "0"; } }}/*The SQL injection attack example. Username inputs should be run one after the other, to demonstrate the attack pattern.*/string username = "User 1";string password = "MyPwd";// See if we can even use SQL injection.// By simply using this we can log into the application username = "' OR 1=1 --";// What follows is a step-by-step guessing game designed // to find out column names used in the query, via the // error messages. By using GROUP BY we will get // the column names one by one.// First try the Idusername = "' GROUP BY Id HAVING 1=1--";// We get the SQL error: Invalid column name 'Id'.// From that we know that there's no column named Id. // Next up is UserIDusername = "' GROUP BY Users.UserId HAVING 1=1--";// AHA! here we get the error: Column 'Users.UserName' is // invalid in the SELECT list because it is not contained // in either an aggregate function or the GROUP BY clause.// We have guessed correctly that there is a column called // UserId and the error message has kindly informed us of // a table called Users with a column called UserName// Now we add UserName to our GROUP BYusername = "' GROUP BY Users.UserId, Users.UserName HAVING 1=1--";// We get the same error as before but with a new column // name, Users.UserPassword// Repeat this pattern till we have all column names that // are being return by the query.// Now we have to get the column data types. One non-string // data type is all we need to wreck havoc// Because 0 can be implicitly converted to any data type in SQL server we use it to fill up the UNION.// This can be done because we know the number of columns the query returns FROM our previous hacks.// Because SUM works for UserId we know it's an integer type. It doesn't matter which exactly.username = "' UNION SELECT SUM(Users.UserId), 0, 0 FROM Users--";// SUM() errors out for UserName and UserPassword columns giving us their data types:// Error: Operand data type varchar is invalid for SUM operator.username = "' UNION SELECT SUM(Users.UserName) FROM Users--";// Error: Operand data type nvarchar is invalid for SUM operator.username = "' UNION SELECT SUM(Users.UserPassword) FROM Users--";// Because we know the Users table structure we can insert our data into itusername = "'; INSERT INTO Users(UserName, UserPassword) SELECT 'Hacker user', 'Hacker pwd'; --";// Next let's get the actual data FROM the tables.// There are 2 ways you can do this.// The first is by using MIN on the varchar UserName column and // getting the data from error messages one by one like this:username = "' UNION SELECT min(UserName), 0, 0 FROM Users --";username = "' UNION SELECT min(UserName), 0, 0 FROM Users WHERE UserName > 'User 1'--";// we can repeat this method until we get all data one by one// The second method gives us all data at once and we can use it as soon as we find a non string columnusername = "' UNION SELECT (SELECT * FROM Users FOR XML RAW) as c1, 0, 0 --";// The error we get is: // Conversion failed when converting the nvarchar value // '<row UserId="1" UserName="User 1" UserPassword="MyPwd"/>// <row UserId="2" UserName="User 2" UserPassword="BlaBla"/>// <row UserId="3" UserName="Hacker user" UserPassword="Hacker pwd"/>' // to data type int.// We can see that the returned XML contains all table data including our injected user account.// By using the XML trick we can get any database or server info we wish as long as we have access// Some examples:// Get info for all databasesusername = "' UNION SELECT (SELECT name, dbid, convert(nvarchar(300), sid) as sid, cmptlevel, filename FROM master..sysdatabases FOR XML RAW) as c1, 0, 0 --";// Get info for all tables in master databaseusername = "' UNION SELECT (SELECT * FROM master.INFORMATION_SCHEMA.TABLES FOR XML RAW) as c1, 0, 0 --";// If that's not enough here's a way the attacker can gain shell access to your underlying windows server// This can be done by enabling and using the xp_cmdshell stored procedure// Enable xp_cmdshellusername = "'; EXEC sp_configure 'show advanced options', 1; RECONFIGURE; EXEC sp_configure 'xp_cmdshell', 1; RECONFIGURE;";// Create a table to store the values returned by xp_cmdshellusername = "'; CREATE TABLE ShellHack (ShellData NVARCHAR(MAX))--";// list files in the current SQL Server directory with xp_cmdshell and store it in ShellHack table username = "'; INSERT INTO ShellHack EXEC xp_cmdshell \"dir\"--";// return the data via an error messageusername = "' UNION SELECT (SELECT * FROM ShellHack FOR XML RAW) as c1, 0, 0; --";// delete the table to get clean output (this step is optional)username = "'; DELETE ShellHack; --";// repeat the upper 3 statements to do other nasty stuff to the windows server// If the returned XML is larger than 8k you'll get the "String or binary data would be truncated." error// To avoid this chunk up the returned XML using paging techniques. // the username and password params come from the GUI textboxes.bool userExists = DoesUserExist(username, password ); Having demonstrated all of the information a hacker can get his hands on as a result of this single vulnerability, it's perhaps reassuring to know that the fix is very easy: use parameters, as show in the following example. /* The fixed C# method that doesn't suffer from SQL injection because it uses parameters.*/private bool DoesUserExist(string username, string password){ using (SqlConnection conn = new SqlConnection(@"server=baltazar\sql2k8; database=tempdb; Integrated Security=SSPI;")) { //This is the version of the SQL string that should be safe from SQL injection string sql = "SELECT * FROM Users WHERE UserName = @username AND UserPassword = @password"; SqlCommand cmd = conn.CreateCommand(); cmd.CommandText = sql; cmd.CommandType = CommandType.Text; // adding 2 SQL Parameters solves the SQL injection issue completely SqlParameter usernameParameter = new SqlParameter(); usernameParameter.ParameterName = "@username"; usernameParameter.DbType = DbType.String; usernameParameter.Value = username; cmd.Parameters.Add(usernameParameter); SqlParameter passwordParameter = new SqlParameter(); passwordParameter.ParameterName = "@password"; passwordParameter.DbType = DbType.String; passwordParameter.Value = password; cmd.Parameters.Add(passwordParameter); cmd.Connection.Open(); DataSet dsResult = new DataSet(); /* If a user doesn't exist the cmd.ExecuteScalar() returns null; this is just to simplify the example; you can use other Execute methods too */ string userExists = (cmd.ExecuteScalar() ?? "0").ToString(); return userExists == "1"; }} We have seen just how much danger we're in, if our code is vulnerable to SQL Injection. If you find code that contains such problems, then refactoring is not optional; it simply has to be done and no amount of deadline pressure should be a reason not to do it. Better yet, of course, never allow such vulnerabilities into your code in the first place. Your business is only as valuable as your data. If you lose your data, you lose your business. Period. Incorrect parameterization in stored procedures It is a common misconception that the mere act of using stored procedures somehow magically protects you from SQL Injection. There is no truth in this rumor. If you build SQL strings by concatenation and rely on user input then you are just as vulnerable doing it in a stored procedure as anywhere else. This anti-pattern often emerges when developers want to have a single "master access" stored procedure to which they'd pass a table name, column list or some other part of the SQL statement. This may seem like a good idea from the viewpoint of object reuse and maintenance but it's a huge security hole. The following example shows what a hacker can do with such a setup. /*Create a single master access stored procedure*/CREATE PROCEDURE spSingleAccessSproc( @select NVARCHAR(500) = '' , @tableName NVARCHAR(500) = '' , @where NVARCHAR(500) = '1=1' , @orderBy NVARCHAR(500) = '1')ASEXEC('SELECT ' + @select + ' FROM ' + @tableName + ' WHERE ' + @where + ' ORDER BY ' + @orderBy)GO/*Valid use as anticipated by a novice developer*/EXEC spSingleAccessSproc @select = '*', @tableName = 'Users', @where = 'UserName = ''User 1'' AND UserPassword = ''MyPwd''', @orderBy = 'UserID'/*Malicious use SQL injectionThe SQL injection principles are the same aswith SQL string concatenation I described earlier,so I won't repeat them again here.*/EXEC spSingleAccessSproc @select = '* FROM INFORMATION_SCHEMA.TABLES FOR XML RAW --', @tableName = '--Users', @where = '--UserName = ''User 1'' AND UserPassword = ''MyPwd''', @orderBy = '--UserID' One might think that this is a "made up" example but in all my years of reading SQL forums and answering questions there were quite a few people with "brilliant" ideas like this one. Hopefully I've managed to demonstrate the dangers of such code. Even if you think your code is safe, double check. If there's even one place where you're not using proper parameterized SQL you have vulnerability and SQL injection can bare its ugly teeth.

    Read the article

  • Do you play Sudoku ?

    - by Gilles Haro
    Did you know that 11gR2 database could solve a Sudoku puzzle with a single query and, most of the time, and this in less than a second ? The following query shows you how ! Simply pass a flattened Sudoku grid to it a get the result instantaneously ! col "Solution" format a9 col "Problem" format a9 with Iteration( initialSudoku, Step, EmptyPosition ) as ( select initialSudoku, InitialSudoku, instr( InitialSudoku, '-' )        from ( select '--64----2--7-35--1--58-----27---3--4---------4--2---96-----27--7--58-6--3----18--' InitialSudoku from dual )    union all    select initialSudoku        , substr( Step, 1, EmptyPosition - 1 ) || OneDigit || substr( Step, EmptyPosition + 1 )         , instr( Step, '-', EmptyPosition + 1 )      from Iteration         , ( select to_char( rownum ) OneDigit from dual connect by rownum <= 9 ) OneDigit     where EmptyPosition > 0       and not exists          ( select null              from ( select rownum IsPossible from dual connect by rownum <= 9 )             where OneDigit = substr( Step, trunc( ( EmptyPosition - 1 ) / 9 ) * 9 + IsPossible, 1 )   -- One line must contain the 1-9 digits                or OneDigit = substr( Step, mod( EmptyPosition - 1, 9 ) - 8 + IsPossible * 9, 1 )      -- One row must contain the 1-9 digits                or OneDigit = substr( Step, mod( trunc( ( EmptyPosition - 1 ) / 3 ), 3 ) * 3           -- One square must contain the 1-9 digits                            + trunc( ( EmptyPosition - 1 ) / 27 ) * 27 + IsPossible                            + trunc( ( IsPossible - 1 ) / 3 ) * 6 , 1 )          ) ) select initialSudoku "Problem", Step "Solution"    from Iteration  where EmptyPosition = 0 ;   The Magic thing behind this is called Recursive Subquery Factoring. The Oracle documentation gives the following definition: If a subquery_factoring_clause refers to its own query_name in the subquery that defines it, then the subquery_factoring_clause is said to be recursive. A recursive subquery_factoring_clause must contain two query blocks: the first is the anchor member and the second is the recursive member. The anchor member must appear before the recursive member, and it cannot reference query_name. The anchor member can be composed of one or more query blocks combined by the set operators: UNION ALL, UNION, INTERSECT or MINUS. The recursive member must follow the anchor member and must reference query_name exactly once. You must combine the recursive member with the anchor member using the UNION ALL set operator. This new feature is a replacement of this old Hierarchical Query feature that exists in Oracle since the days of Aladdin (well, at least, release 2 of the database in 1977). Everyone remembers the old syntax : select empno, ename, job, mgr, level      from   emp      start with mgr is null      connect by prior empno = mgr; that could/should be rewritten (but not as often as it should) as withT_Emp (empno, name, level) as        ( select empno, ename, job, mgr, level             from   emp             start with mgr is null             connect by prior empno = mgr        ) select * from   T_Emp; which uses the "with" syntax, whose main advantage is to clarify the readability of the query. Although very efficient, this syntax had the disadvantage of being a Non-Ansi Sql Syntax. Ansi-Sql version of Hierarchical Query is called Recursive Subquery Factoring. As of 11gR2, Oracle got compliant with Ansi Sql and introduced Recursive Subquery Factoring. It is basically an extension of the "With" clause that enables recursion. Now, the new syntax for the query would be with T_Emp (empno, name, job, mgr, hierlevel) as       ( select E.empno, E.ename, E.job, E.mgr, 1 from emp E where E.mgr is null         union all         select E.empno, E.ename, E.job, E.mgr, T.hierlevel + 1from emp E                                                                                                            join T_Emp T on ( E.mgr = T.empno ) ) select * from   T_Emp; The anchor member is a replacement for the "start with" The recursive member is processed through iterations. It joins the Source table (EMP) with the result from the Recursive Query itself (T_Emp) Each iteration works with the results of all its preceding iterations.     Iteration 1 works on the results of the first query     Iteration 2 works on the results of Iteration 1 and first query     Iteration 3 works on the results of Iteration 1, Iteration 2 and first query. So, knowing that, the Sudoku query it self-explaining; The anchor member contains the "Problem" : The Initial Sudoku and the Position of the first "hole" in the grid. The recursive member tries to replace the considered hole with any of the 9 digit that would satisfy the 3 rules of sudoku Recursion progress through the grid until it is complete.   Another example :  Fibonaccy Numbers :  un = (un-1) + (un-2) with Fib (u1, u2, depth) as   (select 1, 1, 1 from dual    union all    select u1+u2, u1, depth+1 from Fib where depth<10) select u1 from Fib; Conclusion Oracle brings here a new feature (which, to be honest, already existed on other concurrent systems) and extends the power of the database to new boundaries. It’s now up to developers to try and test it and find more useful application than solving puzzles… But still, solving a Sudoku in less time it takes to say it remains impressive… Interesting links: You might be interested by the following links which cover different aspects of this feature Oracle Documentation Lucas Jellema 's Blog Fibonaci Numbers

    Read the article

  • Table Variables: an empirical approach.

    - by Phil Factor
    It isn’t entirely a pleasant experience to publish an article only to have it described on Twitter as ‘Horrible’, and to have it criticized on the MVP forum. When this happened to me in the aftermath of publishing my article on Temporary tables recently, I was taken aback, because these critics were experts whose views I respect. What was my crime? It was, I think, to suggest that, despite the obvious quirks, it was best to use Table Variables as a first choice, and to use local Temporary Tables if you hit problems due to these quirks, or if you were doing complex joins using a large number of rows. What are these quirks? Well, table variables have advantages if they are used sensibly, but this requires some awareness by the developer about the potential hazards and how to avoid them. You can be hit by a badly-performing join involving a table variable. Table Variables are a compromise, and this compromise doesn’t always work out well. Explicit indexes aren’t allowed on Table Variables, so one cannot use covering indexes or non-unique indexes. The query optimizer has to make assumptions about the data rather than using column distribution statistics when a table variable is involved in a join, because there aren’t any column-based distribution statistics on a table variable. It assumes a reasonably even distribution of data, and is likely to have little idea of the number of rows in the table variables that are involved in queries. However complex the heuristics that are used might be in determining the best way of executing a SQL query, and they most certainly are, the Query Optimizer is likely to fail occasionally with table variables, under certain circumstances, and produce a Query Execution Plan that is frightful. The experienced developer or DBA will be on the lookout for this sort of problem. In this blog, I’ll be expanding on some of the tests I used when writing my article to illustrate the quirks, and include a subsequent example supplied by Kevin Boles. A simplified example. We’ll start out by illustrating a simple example that shows some of these characteristics. We’ll create two tables filled with random numbers and then see how many matches we get between the two tables. We’ll forget indexes altogether for this example, and use heaps. We’ll try the same Join with two table variables, two table variables with OPTION (RECOMPILE) in the JOIN clause, and with two temporary tables. It is all a bit jerky because of the granularity of the timing that isn’t actually happening at the millisecond level (I used DATETIME). However, you’ll see that the table variable is outperforming the local temporary table up to 10,000 rows. Actually, even without a use of the OPTION (RECOMPILE) hint, it is doing well. What happens when your table size increases? The table variable is, from around 30,000 rows, locked into a very bad execution plan unless you use OPTION (RECOMPILE) to provide the Query Analyser with a decent estimation of the size of the table. However, if it has the OPTION (RECOMPILE), then it is smokin’. Well, up to 120,000 rows, at least. It is performing better than a Temporary table, and in a good linear fashion. What about mixed table joins, where you are joining a temporary table to a table variable? You’d probably expect that the query analyzer would throw up its hands and produce a bad execution plan as if it were a table variable. After all, it knows nothing about the statistics in one of the tables so how could it do any better? Well, it behaves as if it were doing a recompile. And an explicit recompile adds no value at all. (we just go up to 45000 rows since we know the bigger picture now)   Now, if you were new to this, you might be tempted to start drawing conclusions. Beware! We’re dealing with a very complex beast: the Query Optimizer. It can come up with surprises What if we change the query very slightly to insert the results into a Table Variable? We change nothing else and just measure the execution time of the statement as before. Suddenly, the table variable isn’t looking so much better, even taking into account the time involved in doing the table insert. OK, if you haven’t used OPTION (RECOMPILE) then you’re toast. Otherwise, there isn’t much in it between the Table variable and the temporary table. The table variable is faster up to 8000 rows and then not much in it up to 100,000 rows. Past the 8000 row mark, we’ve lost the advantage of the table variable’s speed. Any general rule you may be formulating has just gone for a walk. What we can conclude from this experiment is that if you join two table variables, and can’t use constraints, you’re going to need that Option (RECOMPILE) hint. Count Dracula and the Horror Join. These tables of integers provide a rather unreal example, so let’s try a rather different example, and get stuck into some implicit indexing, by using constraints. What unusual words are contained in the book ‘Dracula’ by Bram Stoker? Here we get a table of all the common words in the English language (60,387 of them) and put them in a table. We put them in a Table Variable with the word as a primary key, a Table Variable Heap and a Table Variable with a primary key. We then take all the distinct words used in the book ‘Dracula’ (7,558 of them). We then create a table variable and insert into it all those uncommon words that are in ‘Dracula’. i.e. all the words in Dracula that aren’t matched in the list of common words. To do this we use a left outer join, where the right-hand value is null. The results show a huge variation, between the sublime and the gorblimey. If both tables contain a Primary Key on the columns we join on, and both are Table Variables, it took 33 Ms. If one table contains a Primary Key, and the other is a heap, and both are Table Variables, it took 46 Ms. If both Table Variables use a unique constraint, then the query takes 36 Ms. If neither table contains a Primary Key and both are Table Variables, it took 116383 Ms. Yes, nearly two minutes!! If both tables contain a Primary Key, one is a Table Variables and the other is a temporary table, it took 113 Ms. If one table contains a Primary Key, and both are Temporary Tables, it took 56 Ms.If both tables are temporary tables and both have primary keys, it took 46 Ms. Here we see table variables which are joined on their primary key again enjoying a  slight performance advantage over temporary tables. Where both tables are table variables and both are heaps, the query suddenly takes nearly two minutes! So what if you have two heaps and you use option Recompile? If you take the rogue query and add the hint, then suddenly, the query drops its time down to 76 Ms. If you add unique indexes, then you've done even better, down to half that time. Here are the text execution plans.So where have we got to? Without drilling down into the minutiae of the execution plans we can begin to create a hypothesis. If you are using table variables, and your tables are relatively small, they are faster than temporary tables, but as the number of rows increases you need to do one of two things: either you need to have a primary key on the column you are using to join on, or else you need to use option (RECOMPILE) If you try to execute a query that is a join, and both tables are table variable heaps, you are asking for trouble, well- slow queries, unless you give the table hint once the number of rows has risen past a point (30,000 in our first example, but this varies considerably according to context). Kevin’s Skew In describing the table-size, I used the term ‘relatively small’. Kevin Boles produced an interesting case where a single-row table variable produces a very poor execution plan when joined to a very, very skewed table. In the original, pasted into my article as a comment, a column consisted of 100000 rows in which the key column was one number (1) . To this was added eight rows with sequential numbers up to 9. When this was joined to a single-tow Table Variable with a key of 2 it produced a bad plan. This problem is unlikely to occur in real usage, and the Query Optimiser team probably never set up a test for it. Actually, the skew can be slightly less extreme than Kevin made it. The following test showed that once the table had 54 sequential rows in the table, then it adopted exactly the same execution plan as for the temporary table and then all was well. Undeniably, real data does occasionally cause problems to the performance of joins in Table Variables due to the extreme skew of the distribution. We've all experienced Perfectly Poisonous Table Variables in real live data. As in Kevin’s example, indexes merely make matters worse, and the OPTION (RECOMPILE) trick does nothing to help. In this case, there is no option but to use a temporary table. However, one has to note that once the slight de-skew had taken place, then the plans were identical across a huge range. Conclusions Where you need to hold intermediate results as part of a process, Table Variables offer a good alternative to temporary tables when used wisely. They can perform faster than a temporary table when the number of rows is not great. For some processing with huge tables, they can perform well when only a clustered index is required, and when the nature of the processing makes an index seek very effective. Table Variables are scoped to the batch or procedure and are unlikely to hang about in the TempDB when they are no longer required. They require no explicit cleanup. Where the number of rows in the table is moderate, you can even use them in joins as ‘Heaps’, unindexed. Beware, however, since, as the number of rows increase, joins on Table Variable heaps can easily become saddled by very poor execution plans, and this must be cured either by adding constraints (UNIQUE or PRIMARY KEY) or by adding the OPTION (RECOMPILE) hint if this is impossible. Occasionally, the way that the data is distributed prevents the efficient use of Table Variables, and this will require using a temporary table instead. Tables Variables require some awareness by the developer about the potential hazards and how to avoid them. If you are not prepared to do any performance monitoring of your code or fine-tuning, and just want to pummel out stuff that ‘just runs’ without considering namby-pamby stuff such as indexes, then stick to Temporary tables. If you are likely to slosh about large numbers of rows in temporary tables without considering the niceties of processing just what is required and no more, then temporary tables provide a safer and less fragile means-to-an-end for you.

    Read the article

  • A tale from a Stalker

    - by Peter Larsson
    Today I thought I should write something about a stalker I've got. Don't get me wrong, I have way more fans than stalkers, but this stalker is particular persistent towards me. It all started when I wrote about Relational Division with Sets late last year(http://weblogs.sqlteam.com/peterl/archive/2010/07/02/Proper-Relational-Division-With-Sets.aspx) and no matter what he tried, he didn't get a better performing query than me. But this I didn't click until later into this conversation. He must have saved himself for 9 months before posting to me again. Well... Some days ago I get an email from someone I thought i didn't know. Here is his first email Hi, I want a proper solution for achievement the result. The solution must be standard query, means no using as any native code like TOP clause, also the query should run in SQL Server 2000 (no CTE use). We have a table with consecutive keys (nbr) that is not exact sequence. We need bringing all values related with nearest key in the current key row. See the DDL: CREATE TABLE Nums(nbr INTEGER NOT NULL PRIMARY KEY, val INTEGER NOT NULL); INSERT INTO Nums(nbr, val) VALUES (1, 0),(5, 7),(9, 4); See the Result: pre_nbr     pre_val     nbr         val         nxt_nbr     nxt_val ----------- ----------- ----------- ----------- ----------- ----------- NULL        NULL        1           0           5           7 1           0           5           7           9           4 5           7           9           4           NULL        NULL The goal is suggesting most elegant solution. I would like see your best solution first, after that I will send my best (if not same with yours)   Notice there is no name, no please or nothing polite asking for my help. So, on the top of my head I sent him two solutions, following the rule "Work on SQL Server 2000 and only standard non-native code".     -- Peso 1 SELECT               pre_nbr,                              (                                                           SELECT               x.val                                                           FROM                dbo.Nums AS x                                                           WHERE              x.nbr = d.pre_nbr                              ) AS pre_val,                              d.nbr,                              d.val,                              d.nxt_nbr,                              (                                                           SELECT               x.val                                                           FROM                dbo.Nums AS x                                                           WHERE              x.nbr = d.nxt_nbr                              ) AS nxt_val FROM                (                                                           SELECT               (                                                                                                                     SELECT               MAX(x.nbr) AS nbr                                                                                                                     FROM                dbo.Nums AS x                                                                                                                     WHERE              x.nbr < n.nbr                                                                                        ) AS pre_nbr,                                                                                        n.nbr,                                                                                        n.val,                                                                                        (                                                                                                                     SELECT               MIN(x.nbr) AS nbr                                                                                                                     FROM                dbo.Nums AS x                                                                                                                     WHERE              x.nbr > n.nbr                                                                                        ) AS nxt_nbr                                                           FROM                dbo.Nums AS n                              ) AS d -- Peso 2 CREATE TABLE #Temp                                                         (                                                                                        ID INT IDENTITY(1, 1) PRIMARY KEY,                                                                                        nbr INT,                                                                                        val INT                                                           )   INSERT                                            #Temp                                                           (                                                                                        nbr,                                                                                        val                                                           ) SELECT                                            nbr,                                                           val FROM                                             dbo.Nums ORDER BY         nbr   SELECT                                            pre.nbr AS pre_nbr,                                                           pre.val AS pre_val,                                                           t.nbr,                                                           t.val,                                                           nxt.nbr AS nxt_nbr,                                                           nxt.val AS nxt_val FROM                                             #Temp AS pre RIGHT JOIN      #Temp AS t ON t.ID = pre.ID + 1 LEFT JOIN         #Temp AS nxt ON nxt.ID = t.ID + 1   DROP TABLE    #Temp Notice there are no indexes on #Temp table yet. And here is where the conversation derailed. First I got this response back Now my solutions: --My 1st Slt SELECT T2.*, T1.*, T3.*   FROM Nums AS T1        LEFT JOIN Nums AS T2          ON T2.nbr = (SELECT MAX(nbr)                         FROM Nums                        WHERE nbr < T1.nbr)        LEFT JOIN Nums AS T3          ON T3.nbr = (SELECT MIN(nbr)                         FROM Nums                        WHERE nbr > T1.nbr); --My 2nd Slt SELECT MAX(CASE WHEN N1.nbr > N2.nbr THEN N2.nbr ELSE NULL END) AS pre_nbr,        (SELECT val FROM Nums WHERE nbr = MAX(CASE WHEN N1.nbr > N2.nbr THEN N2.nbr ELSE NULL END)) AS pre_val,        N1.nbr AS cur_nbr, N1.val AS cur_val,        MIN(CASE WHEN N1.nbr < N2.nbr THEN N2.nbr ELSE NULL END) AS nxt_nbr,        (SELECT val FROM Nums WHERE nbr = MIN(CASE WHEN N1.nbr < N2.nbr THEN N2.nbr ELSE NULL END)) AS nxt_val   FROM Nums AS N1,        Nums AS N2  GROUP BY N1.nbr, N1.val;   /* My 1st Slt Table 'Nums'. Scan count 7, logical reads 14 My 2nd Slt Table 'Nums'. Scan count 4, logical reads 23 Peso 1 Table 'Nums'. Scan count 9, logical reads 28 Peso 2 Table '#Temp'. Scan count 0, logical reads 7 Table 'Nums'. Scan count 1, logical reads 2 Table '#Temp'. Scan count 3, logical reads 16 */  To this, I emailed him back asking for a scalability test What if you try with a Nums table with 100,000 rows? His response to that started to get nasty.  I have to say Peso 2 is not acceptable. As I said before the solution must be standard, ORDER BY is not part of standard SELECT. Try this without ORDER BY:  Truncate Table Nums INSERT INTO Nums (nbr, val) VALUES (1, 0),(9,4), (5, 7)  So now we have new rules. No ORDER BY because it's not standard SQL! Of course I asked him  Why do you have that idea? ORDER BY is not standard? To this, his replies went stranger and stranger Standard Select = Set-based (no any cursor) It’s free to know, just refer to Advanced SQL Programming by Celko or mail to him if you accept comments from him. What the stalker probably doesn't know, is that I and Mr Celko occasionally are involved in some conversation and thus we exchange emails. I don't know if this reference to Mr Celko was made to intimidate me either. So I answered him, still polite, this What do you mean? The SELECT itself has a ”cursor under the hood”. Now the stalker gets rude  But however I mean the solution must no containing any order by, top... No problem, I do not like Peso 2, it’s very non-intelligent and elementary. Yes, Peso 2 is elementary but most performing queries are... And now is the time where I started to feel the stalker really wanted to achieve something else, so I wrote to him So what is your goal? Have a query that performs well, or a query that is super-portable? My Peso 2 outperforms any of your code with a factor of 100 when using more than 100,000 rows. While I awaited his answer, I posted him this query Ok, here is another one -- Peso 3 SELECT             MAX(CASE WHEN d = 1 THEN nbr ELSE NULL END) AS pre_nbr,                    MAX(CASE WHEN d = 1 THEN val ELSE NULL END) AS pre_val,                    MAX(CASE WHEN d = 0 THEN nbr ELSE NULL END) AS nbr,                    MAX(CASE WHEN d = 0 THEN val ELSE NULL END) AS val,                    MAX(CASE WHEN d = -1 THEN nbr ELSE NULL END) AS nxt_nbr,                    MAX(CASE WHEN d = -1 THEN val ELSE NULL END) AS nxt_val FROM               (                              SELECT    nbr,                                        val,                                        ROW_NUMBER() OVER (ORDER BY nbr) AS SeqID                              FROM      dbo.Nums                    ) AS s CROSS JOIN         (                              VALUES    (-1),                                        (0),                                        (1)                    ) AS x(d) GROUP BY           SeqID + x.d HAVING             COUNT(*) > 1 And here is the stats Table 'Nums'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. It beats the hell out of your queries…. Now I finally got a response from my stalker and now I also clicked who he was. This is his reponse Why you post my original method with a bit change under you name? I do not like it. See: http://www.sqlservercentral.com/Forums/Topic468501-362-14.aspx ;WITH C AS ( SELECT seq_nbr, k,        DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct   FROM [Sample]         CROSS JOIN         (VALUES (-1), (0), (1)         ) AS D(k) ) SELECT MIN(seq_nbr) AS pre_value,        MAX(CASE WHEN k = 0 THEN seq_nbr END) AS current_value,        MAX(seq_nbr) AS next_value   FROM C GROUP BY grp_fct HAVING min(seq_nbr) < max(seq_nbr); These posts: Posted Tuesday, April 12, 2011 10:04 AM Posted Tuesday, April 12, 2011 1:22 PM Why post a solution where will not work in SQL Server 2000? Wait a minute! His own solution is using both a CTE and a ranking function so his query will not work on SQL Server 2000! Bummer... The reference to "Me not like" are my exact words in a previous topic on SQLTeam.com and when I remembered the phrasing, I also knew who he was. See this topic http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=159262 where he writes a query and posts it under my name, as if I wrote it. So I answered him this (less polite). Like I keep track of all topics in the whole world… J So you think you are the only one coming up with this idea? Besides, “M S solution” doesn’t work.   This is the result I get pre_value        current_value                             next_value 1                           1                           5 1                           5                           9 5                           9                           9   And I did nothing like you did here, where you posted a solution which you “thought” I should write http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=159262 So why are you yourself using ranking function when this was not allowed per your original email, and no cte? You use CTE in your link above, which do not work in SQL Server 2000. All this makes no sense to me, other than you are trying your best to once in a lifetime create a better performing query than me? After a few hours I get this email back. I don't fully understand it, but it's probably a language barrier. >>Like I keep track of all topics in the whole world… J So you think you are the only one coming up with this idea?<< You right, but do not think you are the first creator of this.   >>Besides, “M S Solution” doesn’t work. This is the result I get <<   Why you get so unimportant mistake? See this post to correct it: Posted 4/12/2011 8:22:23 PM >> So why are you yourself using ranking function when this was not allowed per your original email, and no cte? You use CTE in your link above, which do not work in SQL Server 2000. <<  Again, why you get some unimportant incompatibility? You offer that solution for current goals not me  >> All this makes no sense to me, other than you are trying your best to once in a lifetime create a better performing query than me? <<  No, I only wanted to know who you will solve it. Now I know you do not have a special solution. No problem. No problem for me either. So I just answered him I am not the first, and you are not the first to come up with this idea. So what is your problem? I am pretty sure other people have come up with the same idea before us. I used this technique all the way back to 2007, see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=93911 Let's see if he returns...  He did! >> So what is your problem? << Nothing Thanks for all replies; maybe we have some competitions in future, maybe. Also I like you but you do not attend it. Your behavior with me is not friendly. Not any meeting… Regards //Peso

    Read the article

  • Different Flavors of Leases Back On

    - by Theresa Hickman
    Given the continued interest regarding the proposed changes to Lease Accounting, I decided to write another entry on this controversial topic with colorful commentary from our resident accounting expert, Seamus Moran. Background (A History Lesson) Back in 1976, the FASB issued FAS 13, “Accounting for Leases” that permitted leases to be either an operating lease or capital (finance) lease. In substance, operating leases are a form of off-balance sheet financing. According to Seamus, operating leases date back to the launch of the Boeing 707 in the 1950s.  Because the aircraft was so much more expensive than previous aircrafts, the industry came up with the operating lease concept to accommodate these jet liners that dominated air transport.  How it worked was the bank would buy the plane and lease it to the airline.  Because the bank never controlled or flew the plane, they never placed the asset on their balance sheet, and because the airline never owned the plane, they didn’t place it on their balance sheet either. They simply treated the monthly lease payments as rental expenses on the P&L.   August 2010 Original Lease Accounting Changes In August 2010, FASB and IASB decided to overhaul lease accounting as part of their joint commitment “to insure that investors and other users of financial statements are provided useful, transparent, and complete information about leasing transactions in the financial statements.”  Some say that the current lease accounting standards are broken because it keeps assets off the balance sheet, hidden from investors’ view. The original proposal abolished operating leases and only permitted capital leases where all leases would be recorded on the balance sheet as assets and liabilities. The asset side would reflect the right to use the asset for the leased term, and the liability side would reflect the obligation to make lease payments.   Why Companies Were Freaking Out According to the SEC, the financial impact of the aforementioned lease changes was estimated to add more than $1.3 trillion of operating lease obligations to corporate balance sheets. Many companies in various industries, especially retail, are concerned because the changes are significant and will impact existing leases with no grandfather clause for existing operating leases. Of course, the banks and airlines I mentioned earlier really hate this because neither wants to report the airplane (now costing around $60 M) as an asset. Regular companies were concerned that they would have to report routine short term leases of real estate or equipment as fixed assets, even though they were really just longer term rentals.  One company we spoke to leased roadside billboards, and really did not consider them to be fixed assets in any way. Obviously, these changes would have had a profound and lasting effect on a company’s financial and real estate strategies and significantly impact its financial statements.  Financial statements would show higher depreciation and interest expense with significantly higher total assets and debt. In terms of financial metrics, they’re negatively impacted. It would raise a company’s debt-to-capital ratio to reflect the higher debt compared to equity, it would negatively impact their return-on-assets because now companies will appear more asset intensive, and it will decrease EPS, lowering shareholder ROI. Feb. 2011 Recent Update The comment period on leases closed in December 2010. The FASB and the IASB have met several times since then and published their initial responses to the input they received from the various interested parties.  They are “redeliberating” the principles involved in Lease Accounting.  Some of the issues they are looking at include: The core definition of a lease.  This will articulate principles on what is a lease and what is “not-a-lease.” One theory or supposition is that they might define a lease as the transfer of certain but not all major ownership attributes for a certain period of time.  So a year’s lease of an aircraft might be a “lease,” but a year’s lease of half a floor in an office building would be “not-a-lease.”  The ownership attributes transferred from the core owner to the user are different; the airline must maintain, paint, and do whatever it needs to do on the aircraft. However, the office renter will have strictly limited rights in respect to the rented space. The differences between a lease contract and service contract.  Even if they call them “leases” for the purpose of commercial law, a service contract might not be accounted for as a lease. The accounting to be done by the lessee.  They would define when the bank or landlord would retain the asset on their balance sheet, and perhaps by implication, when the lessor would not need to include the asset on theirs.  So if the finance house keeps the airplane or office on their balance sheet, the tenant doesn’t need to.  I’m not sure that I can draw the opposite conclusion where the finance house doesn’t report but the tenant must. The difference, if any, between a financing lease and other leases, and the implications to the accounting. The present value calculation when renewable terms exist. They have reduced the circumstances in which one must look at the renewable terms of a lease in calculating the present value.  In most circumstances, you will use the lease term rather than the potential renewable term. Their latest discussion this past week with the contents of the discussion was not available at the time of me writing this entry.  For more details, the results of the discussions are posted on both the FASB and the IASB websites. Implied Software Changes Whatever the final rules turn out to be, all ERP systems, such as Oracle E-Business Suite, PeopleSoft Enterprise, JD Edwards, and Oracle Hyperion will need to change their software to accommodate the new rules. The following lists some changes that might have to be made to accounting software depending on what the final standards will be in June 2011: Lease tracking may require modifications with tracking of additional lease details that might require a centralized repository to maintain Accounting may need to be modified as there are many changes to how capital leases and the new “other than finance” leases are accounted for both on the lessee and lessor side.  For example, valuation, amortization, and disclosure will be considerably different requiring different types of data to be captured. Companies may need to modify their chart of accounts depending on how they want to track leases, which could then impact financial reporting and consolidation Business processes may require changes which could then impact internal controls Software applications may need to perform more advanced computations on leases Reports and KPIs may need to reflect new operating metrics Hold Onto Your Seats           Before you redo all your lease agreements and call your software vendors asking when the changes to the software will be made, remember that the rules are not finalized yet, and from appearances, will not reflect the proposals in the exposure draft.  Not only are there objections to putting the operating lease assets on anyone’s balance sheet, there are lots of objections to subjectivity and the data required for the valuation.  According to Seamus, there is huge opposition from New York bankers, the airlines, the EU, the Communist Party of China (since it impacts their exporting business), and Republicans (hearing complaints from small and large businesses). Even if everyone can agree on the proposed changes, 2013 might be the earliest that companies would need to change how they report leases. The Boards will finish their deliberations in April, May or June 2011.  As we’ve seen with other Exposure Drafts, if the changes are minor and the principles met the General Acceptance consensus criteria, the Standard could be finalized at that time.  However, if substantial changes are made, a fresh exposure draft, comment period, and review period might be involved, too. Seamus added an interesting perspective. Even if the proposed changes do pass, don’t you think our customers, such as Boeing, GE Capital, United Airlines, etc. will be clever enough to come up with a new kind of financing arrangement that complies with the new accounting? How about the large retail customers, such as Best Buy and Macerich? Don’t you think they might simply cut deals around retail locations with new contracts that prevent their leases from being capital leases? Instead of blindly adapting the software to meet the principles outlined in the final standard, our software needs to accommodate how businesses will respond to the new rules. We cannot know our customers’ responses until the rules are finalized. Oracle is aware of the potential changes and is staying abreast of the developments through our domain expertise staff, our relationship with customers, our market awareness, and, of course, our relationships with the Big 4. This is part of our normal process with respect to worldwide regulatory compliance. Oracle products have been IFRS and GAAP compliant for years and we will continue to maintain those standards going forward.

    Read the article

  • Complex Rails queries across multiple tables, unions, and will_paginate. Solved.

    - by uberllama
    Hi folks. I've been working on a complex "user feed" type of functionality for a while now, and after experimenting with various union plugins, hacking named scopes, and brute force, have arrived at a solution I'm happy with. S.O. has been hugely helpful for me, so I thought I'd post it here in hopes that it might help others and also to get feedback -- it's very possible that I worked on this so long that I walked down an unnecessarily complicated road. For the sake of my example, I'll use users, groups, and articles. A user can follow other users to get a feed of their articles. They can also join groups and get a feed of articles that have been added to those groups. What I needed was a combined, pageable feed of distinct articles from a user's contacts and groups. Let's begin. user.rb has_many :articles has_many :contacts has_many :contacted_users, :through => :contacts has_many :memberships has_many :groups, :through => :memberships contact.rb belongs_to :user belongs_to :contacted_user, :class_name => "User", :foreign_key => "contacted_user_id" article.rb belongs_to :user has_many :submissions has_many :groups, :through => :submissions group.rb has_many :memberships has_many :users, :through => :memberships has_many :submissions has_many :articles, :through => :submissions Those are the basic models that define my relationships. Now, I add two named scopes to the Article model so that I can get separate feeds of both contact articles and group articles should I desire. article.rb # Get all articles by user's contacts named_scope :by_contacts, lambda {|user| {:joins => "inner join contacts on articles.user_id = contacts.contacted_user_id", :conditions => ["articles.published = 1 and contacts.user_id = ?", user.id]} } # Get all articles in user's groups. This does an additional query to get the user's group IDs, then uses those in an IN clause named_scope :by_groups, lambda {|user| {:select => "DISTINCT articles.*", :joins => :submissions, :conditions => {:submissions => {:group_id => user.group_ids}}} } Now I have to create a method that will provide a UNION of these two feeds into one. Since I'm using Rails 2.3.5, I have to use the construct_finder_sql method to render a scope into its base sql. In Rails 3.0, I could use the to_sql method. user.rb def feed "(#{Article.by_groups(self).send(:construct_finder_sql,{})}) UNION (#{Article.by_contacts(self).send(:construct_finder_sql,{})})" end And finally, I can now call this method and paginate it from my controller using will_paginate's paginate_by_sql method. HomeController.rb @articles = Article.paginate_by_sql(current_user.feed, :page => 1) And we're done! It may seem simple now, but it was a lot of work getting there. Feedback is always appreciated. In particular, it would be great to get away from some of the raw sql hacking. Cheers.

    Read the article

< Previous Page | 49 50 51 52 53 54 55  | Next Page >