Search Results

Search found 148 results on 6 pages for 'subqueries'.

Page 6/6 | < Previous Page | 2 3 4 5 6 

  • When to use CTEs to encapsulate sub-results, and when to let the RDBMS worry about massive joins.

    - by IanC
    This is a SQL theory question. I can provide an example, but I don't think it's needed to make my point. Anyone experienced with SQL will immediately know what I'm talking about. Usually we use joins to minimize the number of records due to matching the left and right rows. However, under certain conditions, joining tables cause a multiplication of results where the result is all permutations of the left and right records. I have a database which has 3 or 4 such joins. This turns what would be a few records into a multitude. My concern is that the tables will be large in production, so the number of these joined rows will be immense. Further, heavy math is performed on each row, and the idea of performing math on duplicate rows is enough to make anyone shudder. I have two questions. The first is, is this something I should care about, or will SQL Server intelligently realize these rows are all duplicates and optimize all processing accordingly? The second is, is there any advantage to grouping each part of the query so as to get only the distinct values going into the next part of the query, using something like: WITH t1 AS ( SELECT DISTINCT... [or GROUP BY] ), t2 AS ( SELECT DISTINCT... ), t3 AS ( SELECT DISTINCT... ) SELECT... I have often seen the use of DISTINCT applied to subqueries. There is obviously a reason for doing this. However, I'm talking about something a little different and perhaps more subtle and tricky.

    Read the article

  • Beginner having difficulty with SQL query

    - by Vulcanizer
    Hi, I've been studying SQL for 2 weeks now and I'm preparing for an SQL test. Anyway I'm trying to do this question: For the table: 1 create table data { 2 id int, 3 n1 int not null, 4 n2 int not null, 5 n3 int not null, 6 n4 int not null, 7 primary key (id) 8 } I need to return the relation with tuples (n1, n2, n3) where all the corresponding values for n4 are 0. The problem asks me to solve it WITHOUT using subqueries(nested selects/views) It also gives me an example table and the expected output from my query: 01 insert into data (id, n1, n2, n3, n4) 02 values (1, 2,4,7,0), 03 (2, 2,4,7,0), 04 (3, 3,6,9,8), 05 (4, 1,1,2,1), 06 (5, 1,1,2,0), 07 (6, 1,1,2,0), 08 (7, 5,3,8,0), 09 (8, 5,3,8,0), 10 (9, 5,3,8,0); expects (2,4,7) (5,3,8) and not (1,1,2) since that has a 1 in n4 in one of the cases. The best I could come up with was: 1 SELECT DISTINCT n1, n2, n3 2 FROM data a, data b 3 WHERE a.ID <> b.ID 4 AND a.n1 = b.n1 5 AND a.n2 = b.n2 6 AND a.n3 = b.n3 7 AND a.n4 = b.n4 8 AND a.n4 = 0 but I found out that also prints (1,1,2) since in the example (1,1,2,0) happens twice from IDs 5 and 6. Any suggestions would be really appreciated.

    Read the article

  • Composable FLinq expressions

    - by Daniel
    When doing linq-to-sql in c#, you could do something like this: var data = context.MyTable.Where(x => x.Parameter > 10); var q1 = data.Take(10); var q2 = data.Take(3); q1.ToArray(); q2.ToArray(); This would generate 2 separate SQL queries, one with TOP 10, and the other with TOP 3. In playing around with Flinq, I see that: let data = query <@ seq { for i in context.MyTable do if x.Parameter > 10 then yield i } @> data |> Seq.take 10 |> Seq.toList data |> Seq.take 3 |> Seq.toList is not doing the same thing. Here it seems to do one full query, and then do the "take" calls on the client side. An alternative that I see used is: let q1 = query <@ for i in context.MyTable do if x.Param > 10 then yield i } |> Seq.take 10 @> let q2 = query <@ for i in context.MyTable do if x.Param > 10 then yield i } |> Seq.take 3 @> These 2 generate the SQL with the appropriate TOP N filter. My problem with this is that it doesn't seem composable. I'm basically having to duplicate the "where" clause, and potentially would have to duplicate other other subqueries that I might want to run on a base query. Is there a way to have F# give me something more composable? (I originally posted this question to hubfs, where I have gotten a few answers, dealing with the fact that C# performs the query transformation "at the end", i.e. when the data is needed, where F# is doing that transformation eagerly.)

    Read the article

  • Incorrect usage of UPDATE and ORDER BY

    - by nico55555
    I have written some code to update certain rows of a table with a decreasing sequence of numbers. To select the correct rows I have to JOIN two tables. The last row in the table needs to have a value of 0, the second last -1 and so on. To achieve this I use ORDER BY DESC. Unfortunately my code brings up the following error: Incorrect usage of UPDATE and ORDER BY My reading suggests that I can't use UPDATE, JOIN and ORDER BY together. I've read that maybe subqueries might help? I don't really have any idea how to change my code to do this. Perhaps someone could post a modified version that will work? while($row = mysql_fetch_array( $result )) { $products_id = $row['products_id']; $products_stock_attributes = $row['products_stock_attributes']; mysql_query("SET @i = 0"); $result2 = mysql_query("UPDATE orders_products op, orders ord SET op.stock_when_purchased = (@i:=(@i - op.products_quantity)) WHERE op.orders_id = ord.orders_id AND op.products_id = '$products_id' AND op.products_stock_attributes = '$products_stock_attributes' AND op.stock_when_purchased < 0 AND ord.orders_status = 2 ORDER BY orders_products_id DESC") or die(mysql_error()); }

    Read the article

  • New Replication, Optimizer and High Availability features in MySQL 5.6.5!

    - by Rob Young
    As the Product Manager for the MySQL database it is always great to announce when the MySQL Engineering team delivers another great product release.  As a field DBA and developer it is even better when that release contains improvements and innovation that I know will help those currently using MySQL for apps that range from modest intranet sites to the most highly trafficked web sites on the web.  That said, it is my pleasure to take my hat off to MySQL Engineering for today's release of the MySQL 5.6.5 Development Milestone Release ("DMR"). The new highlighted features in MySQL 5.6.5 are discussed here: New Self-Healing Replication ClustersThe 5.6.5 DMR improves MySQL Replication by adding Global Transaction Ids and automated utilities for self-healing Replication clusters.  Prior to 5.6.5 this has been somewhat of a pain point for MySQL users with most developing custom solutions or looking to costly, complex third-party solutions for these capabilities.  With 5.6.5 these shackles are all but removed by a solution that is included with the GPL version of the database and supporting GPL tools.  You can learn all about the details of the great, problem solving Replication features in MySQL 5.6 in Mat Keep's Developer Zone article.  New Replication Administration and Failover UtilitiesAs mentioned above, the new Replication features, Global Transaction Ids specifically, are now supported by a set of automated GPL utilities that leverage the new GTIDs to provide administration and manual or auto failover to the most up to date slave (that is the default, but user configurable if needed) in the event of a master failure. The new utilities, along with links to Engineering related blogs, are discussed in detail in the DevZone Article noted above. Better Query Optimization and ThroughputThe MySQL Optimizer team continues to amaze with the latest round of improvements in 5.6.5. Along with much refactoring of the legacy code base, the Optimizer team has improved complex query optimization and throughput by adding these functional improvements: Subquery Optimizations - Subqueries are now included in the Optimizer path for runtime optimization.  Better throughput of nested queries enables application developers to simplify and consolidate multiple queries and result sets into a single unit or work. Optimizer now uses CURRENT_TIMESTAMP as default for DATETIME columns - For simplification, this eliminates the need for application developers to assign this value when a column of this type is blank by default. Optimizations for Range based queries - Optimizer now uses ready statistics vs Index based scans for queries with multiple range values. Optimizations for queries using filesort and ORDER BY.  Optimization criteria/decision on execution method is done now at optimization vs parsing stage. Print EXPLAIN in JSON format for hierarchical readability and Enterprise tool consumption. You can learn the details about these new features as well all of the Optimizer based improvements in MySQL 5.6 by following the Optimizer team blog. You can download and try the MySQL 5.6.5 DMR here. (look under "Development Releases")  Please let us know what you think!  The new HA utilities for Replication Administration and Failover are available as part of the MySQL Workbench Community Edition, which you can download here .Also New in MySQL LabsAs has become our tradition when announcing DMRs we also like to provide "Early Access" development features to the MySQL Community via the MySQL Labs.  Today is no exception as we are also releasing the following to Labs for you to download, try and let us know your thoughts on where we need to improve:InnoDB Online OperationsMySQL 5.6 now provides Online ADD Index, FK Drop and Online Column RENAME.  These operations are non-blocking and will continue to evolve in future DMRs.  You can learn the grainy details by following John Russell's blog.InnoDB data access via Memcached API ("NotOnlySQL") - Improved refresh of an earlier feature releaseSimilar to Cluster 7.2, MySQL 5.6 provides direct NotOnlySQL access to InnoDB data via the familiar Memcached API. This provides the ultimate in flexibility for developers who need fast, simple key/value access and complex query support commingled within their applications.Improved Transactional Performance, ScaleThe InnoDB Engineering team has once again under promised and over delivered in the area of improved performance and scale.  These improvements are also included in the aggregated Spring 2012 labs release:InnoDB CPU cache performance improvements for modern, multi-core/CPU systems show great promise with internal tests showing:    2x throughput improvement for read only activity 6x throughput improvement for SELECT range Read/Write benchmarks are in progress More details on the above are available here. You can download all of the above in an aggregated "InnoDB 2012 Spring Labs Release" binary from the MySQL Labs. You can also learn more about these improvements and about related fixes to mysys mutex and hash sort by checking out the InnoDB team blog.MySQL 5.6.5 is another installment in what we believe will be the best release of the MySQL database ever.  It also serves as a shining example of how the MySQL Engineering team at Oracle leads in MySQL innovation.You can get the overall Oracle message on the MySQL 5.6.5 DMR and Early Access labs features here. As always, thanks for your continued support of MySQL, the #1 open source database on the planet!

    Read the article

  • Programmatically specifying Django model attributes

    - by mojbro
    Hi! I would like to add attributes to a Django models programmatically, at run time. For instance, lets say I have a Car model class and want to add one price attribute (database column) per currency, given a list of currencies. What is the best way to do this? I had an approach that I thought would work, but it didn't exactly. This is how I tried doing it, using the car example above: from django.db import models class Car(models.Model): name = models.CharField(max_length=50) currencies = ['EUR', 'USD'] for currency in currencies: Car.add_to_class('price_%s' % currency.lower(), models.IntegerField()) This does seem to work pretty well at first sight: $ ./manage.py syncdb Creating table shop_car $ ./manage.py dbshell shop=# \d shop_car Table "public.shop_car" Column | Type | Modifiers -----------+-----------------------+------------------------------------------------------- id | integer | not null default nextval('shop_car_id_seq'::regclass) name | character varying(50) | not null price_eur | integer | not null price_usd | integer | not null Indexes: "shop_car_pkey" PRIMARY KEY, btree (id) But when I try to create a new Car, it doesn't really work anymore: >>> from shop.models import Car >>> mycar = Car(name='VW Jetta', price_eur=100, price_usd=130) >>> mycar <Car: Car object> >>> mycar.save() Traceback (most recent call last): File "<console>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/models/base.py", line 410, in save self.save_base(force_insert=force_insert, force_update=force_update) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/models/base.py", line 495, in save_base result = manager._insert(values, return_id=update_pk) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/models/manager.py", line 177, in _insert return insert_query(self.model, values, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/models/query.py", line 1087, in insert_query return query.execute_sql(return_id) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/models/sql/subqueries.py", line 320, in execute_sql cursor = super(InsertQuery, self).execute_sql(None) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/models/sql/query.py", line 2369, in execute_sql cursor.execute(sql, params) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/backends/util.py", line 19, in execute return self.cursor.execute(sql, params) ProgrammingError: column "price_eur" specified more than once LINE 1: ...NTO "shop_car" ("name", "price_eur", "price_usd", "price_eur... ^

    Read the article

  • LINQ aggregate left join on SQL CE

    - by P Daddy
    What I need is such a simple, easy query, it blows me away how much work I've done just trying to do it in LINQ. In T-SQL, it would be: SELECT I.InvoiceID, I.CustomerID, I.Amount AS AmountInvoiced, I.Date AS InvoiceDate, ISNULL(SUM(P.Amount), 0) AS AmountPaid, I.Amount - ISNULL(SUM(P.Amount), 0) AS AmountDue FROM Invoices I LEFT JOIN Payments P ON I.InvoiceID = P.InvoiceID WHERE I.Date between @start and @end GROUP BY I.InvoiceID, I.CustomerID, I.Amount, I.Date ORDER BY AmountDue DESC The best equivalent LINQ expression I've come up with, took me much longer to do: var invoices = ( from I in Invoices where I.Date >= start && I.Date <= end join P in Payments on I.InvoiceID equals P.InvoiceID into payments select new{ I.InvoiceID, I.CustomerID, AmountInvoiced = I.Amount, InvoiceDate = I.Date, AmountPaid = ((decimal?)payments.Select(P=>P.Amount).Sum()).GetValueOrDefault(), AmountDue = I.Amount - ((decimal?)payments.Select(P=>P.Amount).Sum()).GetValueOrDefault() } ).OrderByDescending(row=>row.AmountDue); This gets an equivalent result set when run against SQL Server. Using a SQL CE database, however, changes things. The T-SQL stays almost the same. I only have to change ISNULL to COALESCE. Using the same LINQ expression, however, results in an error: There was an error parsing the query. [ Token line number = 4, Token line offset = 9,Token in error = SELECT ] So we look at the generated SQL code: SELECT [t3].[InvoiceID], [t3].[CustomerID], [t3].[Amount] AS [AmountInvoiced], [t3].[Date] AS [InvoiceDate], [t3].[value] AS [AmountPaid], [t3].[value2] AS [AmountDue] FROM ( SELECT [t0].[InvoiceID], [t0].[CustomerID], [t0].[Amount], [t0].[Date], COALESCE(( SELECT SUM([t1].[Amount]) FROM [Payments] AS [t1] WHERE [t0].[InvoiceID] = [t1].[InvoiceID] ),0) AS [value], [t0].[Amount] - (COALESCE(( SELECT SUM([t2].[Amount]) FROM [Payments] AS [t2] WHERE [t0].[InvoiceID] = [t2].[InvoiceID] ),0)) AS [value2] FROM [Invoices] AS [t0] ) AS [t3] WHERE ([t3].[Date] >= @p0) AND ([t3].[Date] <= @p1) ORDER BY [t3].[value2] DESC Ugh! Okay, so it's ugly and inefficient when run against SQL Server, but we're not supposed to care, since it's supposed to be quicker to write, and the performance difference shouldn't be that large. But it just doesn't work against SQL CE, which apparently doesn't support subqueries within the SELECT list. In fact, I've tried several different left join queries in LINQ, and they all seem to have the same problem. Even: from I in Invoices join P in Payments on I.InvoiceID equals P.InvoiceID into payments select new{I, payments} generates: SELECT [t0].[InvoiceID], [t0].[CustomerID], [t0].[Amount], [t0].[Date], [t1].[InvoiceID] AS [InvoiceID2], [t1].[Amount] AS [Amount2], [t1].[Date] AS [Date2], ( SELECT COUNT(*) FROM [Payments] AS [t2] WHERE [t0].[InvoiceID] = [t2].[InvoiceID] ) AS [value] FROM [Invoices] AS [t0] LEFT OUTER JOIN [Payments] AS [t1] ON [t0].[InvoiceID] = [t1].[InvoiceID] ORDER BY [t0].[InvoiceID] which also results in the error: There was an error parsing the query. [ Token line number = 2, Token line offset = 5,Token in error = SELECT ] So how can I do a simple left join on a SQL CE database using LINQ? Am I wasting my time?

    Read the article

  • IP address numbers in MySQL subquery

    - by Iain Collins
    I have a problem with a subquery involving IPV4 addresses stored in MySQL (MySQL 5.0). The IP addresses are stored in two tables, both in network number format - e.g. the format output by MySQL's INET_ATON(). The first table ('events') contains lots of rows with IP addresses associated with them, the second table ('network_providers') contains a list of provider information for given netblocks. events table (~4,000,000 rows): event_id (int) event_name (varchar) ip_address (unsigned 4 byte int) network_providers table (~60,000 rows): ip_start (unsigned 4 byte int) ip_end (unsigned 4 byte int) provider_name (varchar) Simplified for the purposes of the problem I'm having, the goal is to create an export along the lines of: event_id,event_name,ip_address,provider_name If do a query along the lines of either of the following, I get the result I expect: SELECT provider_name FROM network_providers WHERE INET_ATON('192.168.0.1') >= network_providers.ip_start ORDER BY network_providers.ip_start DESC LIMIT 1 SELECT provider_name FROM network_providers WHERE 3232235521 >= network_providers.ip_start ORDER BY network_providers.ip_start DESC LIMIT 1 That is to say, it returns the correct provider_name for whatever IP I look up (of course I'm not really using 192.168.0.1 in my queries). However, when performing this same query as a subquery, in the following manner, it doesn't yield the result I would expect: SELECT event.id, event.event_name, (SELECT provider_name FROM network_providers WHERE event.ip_address >= network_providers.ip_start ORDER BY network_providers.ip_start DESC LIMIT 1) as provider FROM events Instead the a different (incorrect) value for network_provider is returned - over 90% (but curiously not all) values returned in the provider column contain the wrong provider information for that IP. Using event.ip_address in a subquery just to echo out the value confirms it contains the value I'd expect and that the subquery can parse it. Replacing event.ip_address with an actual network number also works, just using it dynamically in the subquery in this manner that doesn't work for me. I suspect the problem is there is something fundamental and important about subqueries in MySQL that I don't get. I've worked with IP addresses like this in MySQL quite a bit before, but haven't previously done lookups for them using a subquery. The question: I'd really appreciate an example of how I could get the output I want, and if someone here knows, some enlightenment as to why what I'm doing doesn't work so I can avoid making this mistake again. Notes: The actual real-world usage I'm trying to do is considerably more complicated (involving joining two or three tables). This is a simplified version, to avoid overly complicating the question. Additionally, I know I'm not using a between on ip_start & ip_end - that's intentional (the DB's can be out of date, and such cases the owner in the DB is almost always in the next specified range and 'best guess' is fine in this context) however I'm grateful for any suggestions for improvement that relate to the question. Efficiency is always nice, but in this case absolutely not essential - any help appreciated.

    Read the article

  • MSSQL 2005: Update rows in a specified order (like ORDER BY)?

    - by JMTyler
    I want to update rows of a table in a specific order, like one would expect if including an ORDER BY clause, but MS SQL does not support the ORDER BY clause in UPDATE queries. I have checked out this question which supplied a nice solution, but my query is a bit more complicated than the one specified there. UPDATE TableA AS Parent SET Parent.ColA = Parent.ColA + (SELECT TOP 1 Child.ColA FROM TableA AS Child WHERE Child.ParentColB = Parent.ColB ORDER BY Child.Priority) ORDER BY Parent.Depth DESC; So, what I'm hoping that you'll notice is that a single table (TableA) contains a hierarchy of rows, wherein one row can be the parent or child of any other row. The rows need to be updated in order from the deepest child up to the root parent. This is because TableA.ColA must contain an up-to-date concatenation of its own current value with the values of its children (I realize this query only concats with one child, but that is for the sake of simplicity - the purpose of the example in this question does not necessitate any more verbosity), therefore the query must update from the bottom up. The solution suggested in the question I noted above is as follows: UPDATE messages SET status=10 WHERE ID in (SELECT TOP (10) Id FROM Table WHERE status=0 ORDER BY priority DESC ); The reason that I don't think I can use this solution is because I am referencing column values from the parent table inside my subquery (see WHERE Child.ParentColB = Parent.ColB), and I don't think two sibling subqueries would have access to each others' data. So far I have only determined one way to merge that suggested solution with my current problem, and I don't think it works. UPDATE TableA AS Parent SET Parent.ColA = Parent.ColA + (SELECT TOP 1 Child.ColA FROM TableA AS Child WHERE Child.ParentColB = Parent.ColB ORDER BY Child.Priority) WHERE Parent.Id IN (SELECT Id FROM TableA ORDER BY Parent.Depth DESC); The WHERE..IN subquery will not actually return a subset of the rows, it will just return the full list of IDs in the order that I want. However (I don't know for sure - please tell me if I'm wrong) I think that the WHERE..IN clause will not care about the order of IDs within the parentheses - it will just check the ID of the row it currently wants to update to see if it's in that list (which, they all are) in whatever order it is already trying to update... Which would just be a total waste of cycles, because it wouldn't change anything. So, in conclusion, I have looked around and can't seem to figure out a way to update in a specified order (and included the reason I need to update in that order, because I am sure I would otherwise get the ever-so-useful "why?" answers) and I am now hitting up Stack Overflow to see if any of you gurus out there who know more about SQL than I do (which isn't saying much) know of an efficient way to do this. It's particularly important that I only use a single query to complete this action. A long question, but I wanted to cover my bases and give you guys as much info to feed off of as possible. :) Any thoughts?

    Read the article

  • SQL Server 2005: Update rows in a specified order (like ORDER BY)?

    - by JMTyler
    I want to update rows of a table in a specific order, like one would expect if including an ORDER BY clause, but SQL Server does not support the ORDER BY clause in UPDATE queries. I have checked out this question which supplied a nice solution, but my query is a bit more complicated than the one specified there. UPDATE TableA AS Parent SET Parent.ColA = Parent.ColA + (SELECT TOP 1 Child.ColA FROM TableA AS Child WHERE Child.ParentColB = Parent.ColB ORDER BY Child.Priority) ORDER BY Parent.Depth DESC; So, what I'm hoping that you'll notice is that a single table (TableA) contains a hierarchy of rows, wherein one row can be the parent or child of any other row. The rows need to be updated in order from the deepest child up to the root parent. This is because TableA.ColA must contain an up-to-date concatenation of its own current value with the values of its children (I realize this query only concats with one child, but that is for the sake of simplicity - the purpose of the example in this question does not necessitate any more verbosity), therefore the query must update from the bottom up. The solution suggested in the question I noted above is as follows: UPDATE messages SET status=10 WHERE ID in (SELECT TOP (10) Id FROM Table WHERE status=0 ORDER BY priority DESC ); The reason that I don't think I can use this solution is because I am referencing column values from the parent table inside my subquery (see WHERE Child.ParentColB = Parent.ColB), and I don't think two sibling subqueries would have access to each others' data. So far I have only determined one way to merge that suggested solution with my current problem, and I don't think it works. UPDATE TableA AS Parent SET Parent.ColA = Parent.ColA + (SELECT TOP 1 Child.ColA FROM TableA AS Child WHERE Child.ParentColB = Parent.ColB ORDER BY Child.Priority) WHERE Parent.Id IN (SELECT Id FROM TableA ORDER BY Parent.Depth DESC); The WHERE..IN subquery will not actually return a subset of the rows, it will just return the full list of IDs in the order that I want. However (I don't know for sure - please tell me if I'm wrong) I think that the WHERE..IN clause will not care about the order of IDs within the parentheses - it will just check the ID of the row it currently wants to update to see if it's in that list (which, they all are) in whatever order it is already trying to update... Which would just be a total waste of cycles, because it wouldn't change anything. So, in conclusion, I have looked around and can't seem to figure out a way to update in a specified order (and included the reason I need to update in that order, because I am sure I would otherwise get the ever-so-useful "why?" answers) and I am now hitting up Stack Overflow to see if any of you gurus out there who know more about SQL than I do (which isn't saying much) know of an efficient way to do this. It's particularly important that I only use a single query to complete this action. A long question, but I wanted to cover my bases and give you guys as much info to feed off of as possible. :) Any thoughts?

    Read the article

  • SQL query - choosing 'last updated' record in a group, better db design?

    - by Jimmy
    Hi, Let's say I have a MySQL database with 3 tables: table 1: Persons, with 1 column ID (int) table 2: Newsletters, with 1 column ID (int) table 3: Subscriptions, with columns Person_ID (int), Newsletter_ID (int), Subscribed (bool), Updated (Datetime) Subscriptions.Person_ID points to a Person, and Subscription.Newsletter_ID points to a Newsletter. Thus, each person may have 0 or more subscriptions to 0 or more magazines at once. The table Subscriptions will also store the entire history of each person's subscriptions to each newsletter. If a particular Person_ID-Newsletter_ID pair doesn't have a row in the Subscriptions table, then it's equivalent to that pair having a subscription status of 'false'. Here is a sample dataset Persons ID 1 2 3 Newsletters ID 4 5 6 Subscriptions Person_ID Newsletter_ID Subscribed Updated 2 4 true 2010-05-01 3 4 true 2010-05-01 3 5 true 2010-05-10 3 4 false 2010-05-15 Thus, as of 2010-05-16, Person 1 has no subscription, Person 2 has a subscription to Newsletter 4, and Person 3 has a subscription to Newsletter 5. Person 3 had a subscription to Newsletter 4 for a while, but not anymore. I'm trying to do 2 kinds of query. A query that shows everyone's active subscriptions as of query time (we can assume that updated will never be in the future -- thus, this means returning the record with the latest 'updated' value for each Person_ID-Newsletter_ID pair, as long as Subscribed is true (if the latest record for a Person_ID-Newsletter_ID pair has a Subscribed status of false, then I don't want that record returned)). A query that returns all active subscriptions for a specific newsletter - same qualification as in 1. regarding records with 'false' in the Subscribed column. I don't use SQL/databases often enough to tell if this design is good, or if the SQL queries needed would be slow on a database with, say, 1M records in the Subscriptions table. I was using the Visual query builder tool in Visual Studio 2010 but I can't even get the query to return the latest updated record for each Person_ID-Newsletter_ID pair. Is it possible to come up with SQL queries that don't involve using subqueries (presumably because they would become too slow with a larger data set)? If not, would it be a better design to have a separate Subscriptions_History table, and every time a subscription status for a Person_ID-Newsletter-ID pair is added to Subscriptions, any existing record for that pair is moved to Subscriptions_History (that way the Subscriptions table only ever contains the latest status update for any Person_ID-Newsletter_ID pair)? I'm using .net on Windows, so would it be easier (or the same, or harder) to do this kind of queries using Linq? Entity Framework? Thanks!

    Read the article

  • Select latest group by in nhibernate

    - by Kendrick
    I have Canine and CanineHandler objects in my application. The CanineHandler object has a PersonID (which references a completely different database), an EffectiveDate (which specifies when a handler started with the canine), and a FK reference to the Canine (CanineID). Given a specific PersonID, I want to find all canines they're currently responsible for. The (simplified) query I'd use in SQL would be: Select Canine.* from Canine inner join CanineHandler on(CanineHandler.CanineID=Canine.CanineID) inner join (select CanineID,Max(EffectiveDate) MaxEffectiveDate from caninehandler group by CanineID) as CurrentHandler on(CurrentHandler.CanineID=CanineHandler.CanineID and CurrentHandler.MaxEffectiveDate=CanineHandler.EffectiveDate) where CanineHandler.HandlerPersonID=@PersonID Edit: Added mapping files below: <class name="CanineHandler" table="CanineHandler" schema="dbo"> <id name="CanineHandlerID" type="Int32"> <generator class="identity" /> </id> <property name="EffectiveDate" type="DateTime" precision="16" not-null="true" /> <property name="HandlerPersonID" type="Int64" precision="19" not-null="true" /> <many-to-one name="Canine" class="Canine" column="CanineID" not-null="true" access="field.camelcase-underscore" /> </class> <class name="Canine" table="Canine"> <id name="CanineID" type="Int32"> <generator class="identity" /> </id> <property name="Name" type="String" length="64" not-null="true" /> ... <set name="CanineHandlers" table="CanineHandler" inverse="true" order-by="EffectiveDate desc" cascade="save-update" access="field.camelcase-underscore"> <key column="CanineID" /> <one-to-many class="CanineHandler" /> </set> <property name="IsDeleted" type="Boolean" not-null="true" /> </class> I haven't tried yet, but I'm guessing I could do this in HQL. I haven't had to write anything in HQL yet, so I'll have to tackle that eventually anyway, but my question is whether/how I can do this sub-query with the criterion/subqueries objects. I got as far as creating the following detached criteria: DetachedCriteria effectiveHandlers = DetachedCriteria.For<Canine>() .SetProjection(Projections.ProjectionList() .Add(Projections.Max("EffectiveDate"),"MaxEffectiveDate") .Add(Projections.GroupProperty("CanineID"),"handledCanineID") ); but I can't figure out how to do the inner join. If I do this: Session.CreateCriteria<Canine>() .CreateCriteria("CanineHandler", "handler", NHibernate.SqlCommand.JoinType.InnerJoin) .List<Canine>(); I get an error "could not resolve property: CanineHandler of: OPS.CanineApp.Model.Canine". Obviously I'm missing something(s) but from the documentation I got the impression that should return a list of Canines that have handlers (possibly with duplicates). Until I can make this work, adding the subquery isn't going to work... I've found similar questions, such as http://stackoverflow.com/questions/747382/only-get-latest-results-using-nhibernate but none of the answers really seem to apply with the kind of direct result I'm looking for. Any help or suggestion is greatly appreciated.

    Read the article

  • Execution plan issue requires reset on SQL Server 2005, how to determine cause?

    - by Tony Brandner
    We have a web application that delivers training to thousands of corporate students running on top of SQL Server 2005. Recently, we started seeing that a single specific query in the application went from 1 second to about 30 seconds in terms of execution time. The application started throwing timeouts in that area. Our first thought was that we may have incorrect indexes, so we reviewed the tables and indexes. However, similar queries elsewhere in the application also run quickly. Reviewing the indexes showed us that they were configured as expected. We were able to narrow it down to a single query, not a stored procedure. Running this query in SQL Studio also runs quickly. We tried running the application in a different server environment. So a different web server with the same query, parameters and database. The query still ran slow. The query is a fairly large one related to determining a student's current list of training. It includes joins and left joins on a dozen tables and subqueries. A few of the tables are fairly large (hundreds of thousands of rows) and some of the other tables are small lookup tables. The query uses a grouping clause and a few where conditions. A few of the tables are quite active and the contents change often but the volume of added rows doesn't seem extreme. These symptoms led us to consider the execution plan. First off, as soon as we reset the execution plan cache with the SQL command 'DBCC FREEPROCCACHE', the problem went away. Unfortunately, the problem started to reoccur within a few days. The problem has continued to plague us for awhile now. It's usually the same query, but we did appear to see the problem occur in another single query recently. It happens enough to be a nuisance. We're having a heck of a time trying to fix it since we can't reproduce it in any other environment other than production. I have downloaded the High Availability guide from Red Gate and I read up more on execution plans. I hope to run the profiler on the live server, but I'm a bit concerned about impact. I would like to ask - what is the best way to figure out what is triggering this problem? Has anyone else seen this same issue?

    Read the article

  • SQL SERVER – SSMS Automatically Generates TOP (100) PERCENT in Query Designer

    - by pinaldave
    Earlier this week, I was surfing various SQL forums to see what kind of help developer need in the SQL Server world. One of the question indeed caught my attention. I am here regenerating complete question as well scenario to illustrate the point in a precise manner. Additionally, I have added added second part of the question to give completeness. Question: I am trying to create a view in Query Designer (not in the New Query Window). Every time I am trying to create a view it always adds  TOP (100) PERCENT automatically on the T-SQL script. No matter what I do, it always automatically adds the TOP (100) PERCENT to the script. I have attempted to copy paste from notepad, build a query and a few other things – there is no success. I am really not sure what I am doing wrong with Query Designer. Here is my query script: (I use AdventureWorks as a sample database) SELECT Person.Address.AddressID FROM Person.Address INNER JOIN Person.AddressType ON Person.Address.AddressID = Person.AddressType.AddressTypeID ORDER BY Person.Address.AddressID This script automatically replaces by following query: SELECT TOP (100) PERCENT Person.Address.AddressID FROM Person.Address INNER JOIN Person.AddressType ON Person.Address.AddressID = Person.AddressType.AddressTypeID ORDER BY Person.Address.AddressID However, when I try to do the same from New Query Window it works totally fine. However, when I attempt to create a view of the same query it gives following error. Msg 1033, Level 15, State 1, Procedure myView, Line 6 The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. It is pretty clear to me now that the script which I have written seems to need TOP (100) PERCENT, so Query . Why do I need it? Is there any work around to this issue. I particularly find this question pretty interesting as it really touches the fundamentals of the T-SQL query writing. Please note that the query which is automatically changed is not in New Query Editor but opened from SSMS using following way. Database >> Views >> Right Click >> New View (see the image below) Answer: The answer to the above question can be very long but I will keep it simple and to the point. There are three things to discuss in above script 1) Reason for Error 2) Reason for Auto generates TOP (100) PERCENT and 3) Potential solutions to the above error. Let us quickly see them in detail. 1) Reason for Error The reason for error is already given in the error. ORDER BY is invalid in the views and a few other objects. One has to use TOP or other keywords along with it. The way semantics of the query works where optimizer only follows(honors) the ORDER BY in the same scope or the same SELECT/UPDATE/DELETE statement. There is a possibility that one can order after the scope of the view again the efforts spend to order view will be wasted. The final resultset of the query always follows the final ORDER BY or outer query’s order and due to the same reason optimizer follows the final order of the query and not of the views (as view will be used in another query for further processing e.g. in SELECT statement). Due to same reason ORDER BY is now allowed in the view. For further accuracy and clear guidance I suggest you read this blog post by Query Optimizer Team. They have explained it very clear manner the same subject. 2) Reason for Auto Generated TOP (100) PERCENT One of the most popular workaround to above error is to use TOP (100) PERCENT in the view. Now TOP (100) PERCENT allows user to use ORDER BY in the query and allows user to overcome above error which we discussed. This gives the impression to the user that they have resolved the error and successfully able to use ORDER BY in the View. Well, this is incorrect as well. The way this works is when TOP (100) PERCENT is used the result is not guaranteed as well it is ignored in our the query where the view is used. Here is the blog post on this subject: Interesting Observation – TOP 100 PERCENT and ORDER BY. Now when you create a new view in the SSMS and build a query with ORDER BY to avoid the error automatically it adds the TOP 100 PERCENT. Here is the connect item for the same issue. I am sure there will be more connect items as well but I could not find them. 3) Potential Solutions If you are reading this post from the beginning in that case, it is clear by now that ORDER BY should not be used in the View as it does not serve any purpose unless there is a specific need of it. If you are going to use TOP 100 PERCENT with ORDER BY there is absolutely no need of using ORDER BY rather avoid using it all together. Here is another blog post of mine which describes the same subject ORDER BY Does Not Work – Limitation of the Views Part 1. It is valid to use ORDER BY in a view if there is a clear business need of using TOP with any other percentage lower than 100 (for example TOP 10 PERCENT or TOP 50 PERCENT etc). In most of the cases ORDER BY is not needed in the view and it should be used in the most outer query for present result in desired order. User can remove TOP 100 PERCENT and ORDER BY from the view before using the view in any query or procedure. In the most outer query there should be ORDER BY as per the business need. I think this sums up the concept in a few words. This is a very long topic and not easy to illustrate in one single blog post. I welcome your comments and suggestions. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, SQL View, T SQL, Technology

    Read the article

  • SQL SERVER – Guest Posts – Feodor Georgiev – The Context of Our Database Environment – Going Beyond the Internal SQL Server Waits – Wait Type – Day 21 of 28

    - by pinaldave
    This guest post is submitted by Feodor. Feodor Georgiev is a SQL Server database specialist with extensive experience of thinking both within and outside the box. He has wide experience of different systems and solutions in the fields of architecture, scalability, performance, etc. Feodor has experience with SQL Server 2000 and later versions, and is certified in SQL Server 2008. In this article Feodor explains the server-client-server process, and concentrated on the mutual waits between client and SQL Server. This is essential in grasping the concept of waits in a ‘global’ application plan. Recently I was asked to write a blog post about the wait statistics in SQL Server and since I had been thinking about writing it for quite some time now, here it is. It is a wide-spread idea that the wait statistics in SQL Server will tell you everything about your performance. Well, almost. Or should I say – barely. The reason for this is that SQL Server is always a part of a bigger system – there are always other players in the game: whether it is a client application, web service, any other kind of data import/export process and so on. In short, the SQL Server surroundings look like this: This means that SQL Server, aside from its internal waits, also depends on external waits and settings. As we can see in the picture above, SQL Server needs to have an interface in order to communicate with the surrounding clients over the network. For this communication, SQL Server uses protocol interfaces. I will not go into detail about which protocols are best, but you can read this article. Also, review the information about the TDS (Tabular data stream). As we all know, our system is only as fast as its slowest component. This means that when we look at our environment as a whole, the SQL Server might be a victim of external pressure, no matter how well we have tuned our database server performance. Let’s dive into an example: let’s say that we have a web server, hosting a web application which is using data from our SQL Server, hosted on another server. The network card of the web server for some reason is malfunctioning (think of a hardware failure, driver failure, or just improper setup) and does not send/receive data faster than 10Mbs. On the other end, our SQL Server will not be able to send/receive data at a faster rate either. This means that the application users will notify the support team and will say: “My data is coming very slow.” Now, let’s move on to a bit more exciting example: imagine that there is a similar setup as the example above – one web server and one database server, and the application is not using any stored procedure calls, but instead for every user request the application is sending 80kb query over the network to the SQL Server. (I really thought this does not happen in real life until I saw it one day.) So, what happens in this case? To make things worse, let’s say that the 80kb query text is submitted from the application to the SQL Server at least 100 times per minute, and as often as 300 times per minute in peak times. Here is what happens: in order for this query to reach the SQL Server, it will have to be broken into a of number network packets (according to the packet size settings) – and will travel over the network. On the other side, our SQL Server network card will receive the packets, will pass them to our network layer, the packets will get assembled, and eventually SQL Server will start processing the query – parsing, allegorizing, generating the query execution plan and so on. So far, we have already had a serious network overhead by waiting for the packets to reach our Database Engine. There will certainly be some processing overhead – until the database engine deals with the 80kb query and its 20 subqueries. The waits you see in the DMVs are actually collected from the point the query reaches the SQL Server and the packets are assembled. Let’s say that our query is processed and it finally returns 15000 rows. These rows have a certain size as well, depending on the data types returned. This means that the data will have converted to packages (depending on the network size package settings) and will have to reach the application server. There will also be waits, however, this time you will be able to see a wait type in the DMVs called ASYNC_NETWORK_IO. What this wait type indicates is that the client is not consuming the data fast enough and the network buffers are filling up. Recently Pinal Dave posted a blog on Client Statistics. What Client Statistics does is captures the physical flow characteristics of the query between the client(Management Studio, in this case) and the server and back to the client. As you see in the image, there are three categories: Query Profile Statistics, Network Statistics and Time Statistics. Number of server roundtrips–a roundtrip consists of a request sent to the server and a reply from the server to the client. For example, if your query has three select statements, and they are separated by ‘GO’ command, then there will be three different roundtrips. TDS Packets sent from the client – TDS (tabular data stream) is the language which SQL Server speaks, and in order for applications to communicate with SQL Server, they need to pack the requests in TDS packets. TDS Packets sent from the client is the number of packets sent from the client; in case the request is large, then it may need more buffers, and eventually might even need more server roundtrips. TDS packets received from server –is the TDS packets sent by the server to the client during the query execution. Bytes sent from client – is the volume of the data set to our SQL Server, measured in bytes; i.e. how big of a query we have sent to the SQL Server. This is why it is best to use stored procedures, since the reusable code (which already exists as an object in the SQL Server) will only be called as a name of procedure + parameters, and this will minimize the network pressure. Bytes received from server – is the amount of data the SQL Server has sent to the client, measured in bytes. Depending on the number of rows and the datatypes involved, this number will vary. But still, think about the network load when you request data from SQL Server. Client processing time – is the amount of time spent in milliseconds between the first received response packet and the last received response packet by the client. Wait time on server replies – is the time in milliseconds between the last request packet which left the client and the first response packet which came back from the server to the client. Total execution time – is the sum of client processing time and wait time on server replies (the SQL Server internal processing time) Here is an illustration of the Client-server communication model which should help you understand the mutual waits in a client-server environment. Keep in mind that a query with a large ‘wait time on server replies’ means the server took a long time to produce the very first row. This is usual on queries that have operators that need the entire sub-query to evaluate before they proceed (for example, sort and top operators). However, a query with a very short ‘wait time on server replies’ means that the query was able to return the first row fast. However a long ‘client processing time’ does not necessarily imply the client spent a lot of time processing and the server was blocked waiting on the client. It can simply mean that the server continued to return rows from the result and this is how long it took until the very last row was returned. The bottom line is that developers and DBAs should work together and think carefully of the resource utilization in the client-server environment. From experience I can say that so far I have seen only cases when the application developers and the Database developers are on their own and do not ask questions about the other party’s world. I would recommend using the Client Statistics tool during new development to track the performance of the queries, and also to find a synchronous way of utilizing resources between the client – server – client. Here is another example: think about similar setup as above, but add another server to the game. Let’s say that we keep our media on a separate server, and together with the data from our SQL Server we need to display some images on the webpage requested by our user. No matter how simple or complicated the logic to get the images is, if the images are 500kb each our users will get the page slowly and they will still think that there is something wrong with our data. Anyway, I don’t mean to get carried away too far from SQL Server. Instead, what I would like to say is that DBAs should also be aware of ‘the big picture’. I wrote a blog post a while back on this topic, and if you are interested, you can read it here about the big picture. And finally, here are some guidelines for monitoring the network performance and improving it: Run a trace and outline all queries that return more than 1000 rows (in Profiler you can actually filter and sort the captured trace by number of returned rows). This is not a set number; it is more of a guideline. The general thought is that no application user can consume that many rows at once. Ask yourself and your fellow-developers: ‘why?’. Monitor your network counters in Perfmon: Network Interface:Output queue length, Redirector:Network errors/sec, TCPv4: Segments retransmitted/sec and so on. Make sure to establish a good friendship with your network administrator (buy them coffee, for example J ) and get into a conversation about the network settings. Have them explain to you how the network cards are setup – are they standalone, are they ‘teamed’, what are the settings – full duplex and so on. Find some time to read a bit about networking. In this short blog post I hope I have turned your attention to ‘the big picture’ and the fact that there are other factors affecting our SQL Server, aside from its internal workings. As a further reading I would still highly recommend the Wait Stats series on this blog, also I would recommend you have the coffee break conversation with your network admin as soon as possible. This guest post is written by Feodor Georgiev. Read all the post in the Wait Types and Queue series. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #037

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Convert Text to Numbers (Integer) – CAST and CONVERT If table column is VARCHAR and has all the numeric values in it, it can be retrieved as Integer using CAST or CONVERT function. List All Stored Procedure Modified in Last N Days If SQL Server suddenly start behaving in un-expectable behavior and if stored procedure were changed recently, following script can be used to check recently modified stored procedure. If a stored procedure was created but never modified afterwards modified date and create a date for that stored procedure are same. Count Duplicate Records – Rows Validate Field For DATE datatype using function ISDATE() We always checked DATETIME field for incorrect data type. One of the user input date as 30/2/2007. The date was sucessfully inserted in the temp table but while inserting from temp table to final table it crashed with error. We had now task to validate incorrect date value before we insert in final table. Jr. Developer asked me how can he do that? We check for incorrect data type (varchar, int, NULL) but this is incorrect date value. Regular expression works fine with them because of mm/dd/yyyy format. 2008 Find Space Used For Any Particular Table It is very simple to find out the space used by any table in the database. Two Convenient Features Inline Assignment – Inline Operations Here is the script which does both – Inline Assignment and Inline Operation DECLARE @idx INT = 0 SET @idx+=1 SELECT @idx Introduction to SPARSE Columns SPARSE column are better at managing NULL and ZERO values in SQL Server. It does not take any space in database at all. If column is created with SPARSE clause with it and it contains ZERO or NULL it will be take lesser space then regular column (without SPARSE clause). SP_CONFIGURE – Displays or Changes Global Configuration Settings If advanced settings are not enabled at configuration level SQL Server will not let user change the advanced features on server. Authorized user can turn on or turn off advance settings. 2009 Standby Servers and Types of Standby Servers Standby Server is a type of server that can be brought online in a situation when Primary Server goes offline and application needs continuous (high) availability of the server. There is always a need to set up a mechanism where data and objects from primary server are moved to secondary (standby) server. BLOB – Pointer to Image, Image in Database, FILESTREAM Storage When it comes to storing images in database there are two common methods. I had previously blogged about the same subject on my visit to Toronto. With SQL Server 2008, we have a new method of FILESTREAM storage. However, the answer on when to use FILESTREAM and when to use other methods is still vague in community. 2010 Upper Case Shortcut SQL Server Management Studio I select the word and hit CTRL+SHIFT+U and it SSMS immediately changes the case of the selected word. Similar way if one want to convert cases to lower case, another short cut CTRL+SHIFT+L is also available. The Self Join – Inner Join and Outer Join Self Join has always been a noteworthy case. It is interesting to ask questions about self join in a room full of developers. I often ask – if there are three kinds of joins, i.e.- Inner Join, Outer Join and Cross Join; what type of join is Self Join? The usual answer is that it is an Inner Join. However, the reality is very different. Parallelism – Row per Processor – Row per Thread – Thread 0  If you look carefully in the Properties window or XML Plan, there is “Thread 0?. What does this “Thread 0” indicate? Well find out from the blog post. How do I Learn and How do I Teach The blog post has raised three very interesting questions. How do you learn? How do you teach? What are you learning or teaching? Let me try to answer the same. 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 7 of 31 What are Different Types of Locks? What are Pessimistic Lock and Optimistic Lock? When is the use of UPDATE_STATISTICS command? What is the Difference between a HAVING clause and a WHERE clause? What is Connection Pooling and why it is Used? What are the Properties and Different Types of Sub-Queries? What are the Authentication Modes in SQL Server? How can it be Changed? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 8 of 31 Which Command using Query Analyzer will give you the Version of SQL Server and Operating System? What is an SQL Server Agent? Can a Stored Procedure call itself or a Recursive Stored Procedure? How many levels of SP nesting is possible? What is Log Shipping? Name 3 ways to get an Accurate Count of the Number of Records in a Table? What does it mean to have QUOTED_IDENTIFIER ON? What are the Implications of having it OFF? What is the Difference between a Local and a Global Temporary Table? What is the STUFF Function and How Does it Differ from the REPLACE Function? What is PRIMARY KEY? What is UNIQUE KEY Constraint? What is FOREIGN KEY? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 9 of 31 What is CHECK Constraint? What is NOT NULL Constraint? What is the difference between UNION and UNION ALL? What is B-Tree? How to get @@ERROR and @@ROWCOUNT at the Same Time? What is a Scheduled Job or What is a Scheduled Task? What are the Advantages of Using Stored Procedures? What is a Table Called, if it has neither Cluster nor Non-cluster Index? What is it Used for? Can SQL Servers Linked to other Servers like Oracle? What is BCP? When is it Used? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 10 of 31 What Command do we Use to Rename a db, a Table and a Column? What are sp_configure Commands and SET Commands? How to Implement One-to-One, One-to-Many and Many-to-Many Relationships while Designing Tables? What is Difference between Commit and Rollback when Used in Transactions? What is an Execution Plan? When would you Use it? How would you View the Execution Plan? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 11 of 31 What is Difference between Table Aliases and Column Aliases? Do they Affect Performance? What is the difference between CHAR and VARCHAR Datatypes? What is the Difference between VARCHAR and VARCHAR(MAX) Datatypes? What is the Difference between VARCHAR and NVARCHAR datatypes? Which are the Important Points to Note when Multilanguage Data is Stored in a Table? How to Optimize Stored Procedure Optimization? What is SQL Injection? How to Protect Against SQL Injection Attack? How to Find Out the List Schema Name and Table Name for the Database? What is CHECKPOINT Process in the SQL Server? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 12 of 31 How does Using a Separate Hard Drive for Several Database Objects Improves Performance Right Away? How to Find the List of Fixed Hard Drive and Free Space on Server? Why can there be only one Clustered Index and not more than one? What is Difference between Line Feed (\n) and Carriage Return (\r)? Is It Possible to have Clustered Index on Separate Drive From Original Table Location? What is a Hint? How to Delete Duplicate Rows? Why the Trigger Fires Multiple Times in Single Login? 2012 CTRL+SHIFT+] Shortcut to Select Code Between Two Parenthesis Shortcut key is CTRL+SHIFT+]. This key can be very useful when dealing with multiple subqueries, CTE or query with multiple parentheses. When exercised this shortcut key it selects T-SQL code between two parentheses. Monday Morning Puzzle – Query Returns Results Sometimes but Not Always I am beginner with SQL Server. I have one query, it sometime returns a result and sometime it does not return me the result. Where should I start looking for a solution and what kind of information I should send to you so you can help me with solving. I have no clue, please guide me. Remove Debug Button in SSMS – SQL in Sixty Seconds #020 – Video Effect of Case Sensitive Collation on Resultset Collation is a very interesting concept but I quite often see it is heavily neglected. I have seen developer and DBA looking for a workaround to fix collation error rather than understanding if the side effect of the workaround. Switch Between Two Parenthesis using Shortcut CTRL+] Earlier this week I wrote a blog post about CTRL+SHIFT+] Shortcut to Select Code Between Two Parenthesis, I received quite a lot of positive feedback from readers. If you are a regular reader of the blog post, you must be aware that I appreciate the learning shared by readers. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part II

    - by dbayard
    Part II – Solving Big Problems with Oracle R Enterprise In the first post in this series (see https://blogs.oracle.com/R/entry/solving_big_problems_with_oracle), we showed how you can use R to perform historical rate of return calculations against investment data sourced from a spreadsheet.  We demonstrated the calculations against sample data for a small set of accounts.  While this worked fine, in the real-world the problem is much bigger because the amount of data is much bigger.  So much bigger that our approach in the previous post won’t scale to meet the real-world needs. From our previous post, here are the challenges we need to conquer: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In this post, we will show how we moved from sample data environment to working with full-scale data.  This post is based on actual work we did for a financial services customer during a recent proof-of-concept. Getting started with the Database At this point, we have some sample data and our IRR function.  We were at a similar point in our customer proof-of-concept exercise- we had sample data but we did not have the full customer data yet.  So our database was empty.  But, this was easily rectified by leveraging the transparency features of Oracle R Enterprise (see https://blogs.oracle.com/R/entry/analyzing_big_data_using_the).  The following code shows how we took our sample data SimpleMWRRData and easily turned it into a new Oracle database table called IRR_DATA via ore.create().  The code also shows how we can access the database table IRR_DATA as if it was a normal R data.frame named IRR_DATA. If we go to sql*plus, we can also check out our new IRR_DATA table: At this point, we now have our sample data loaded in the database as a normal Oracle table called IRR_DATA.  So, we now proceeded to test our R function working with database data. As our first test, we retrieved the data from a single account from the IRR_DATA table, pull it into local R memory, then call our IRR function.  This worked.  No SQL coding required! Going from Crawling to Walking Now that we have shown using our R code with database-resident data for a single account, we wanted to experiment with doing this for multiple accounts.  In other words, we wanted to implement the split-apply-combine technique we discussed in our first post in this series.  Fortunately, Oracle R Enterprise provides a very scalable way to do this with a function called ore.groupApply().  You can read more about ore.groupApply() here: https://blogs.oracle.com/R/entry/analyzing_big_data_using_the1 Here is an example of how we ask ORE to take our IRR_DATA table in the database, split it by the ACCOUNT column, apply a function that calls our SimpleMWRR() calculation, and then combine the results. (If you are following along at home, be sure to have installed our myIRR package on your database server via  “R CMD INSTALL myIRR”). The interesting thing about ore.groupApply is that the calculation is not actually performed in my desktop R environment from which I am running.  What actually happens is that ore.groupApply uses the Oracle database to perform the work.  And the Oracle database is what actually splits the IRR_DATA table by ACCOUNT.  Then the Oracle database takes the data for each account and sends it to an embedded R engine running on the database server to apply our R function.  Then the Oracle database combines all the individual results from the calls to the R function. This is significant because now the embedded R engine only needs to deal with the data for a single account at a time.  Regardless of whether we have 20 accounts or 1 million accounts or more, the R engine that performs the calculation does not care.  Given that normal R has a finite amount of memory to hold data, the ore.groupApply approach overcomes the R memory scalability problem since we only need to fit the data from a single account in R memory (not all of the data for all of the accounts). Additionally, the IRR_DATA does not need to be sent from the database to my desktop R program.  Even though I am invoking ore.groupApply from my desktop R program, because the actual SimpleMWRR calculation is run by the embedded R engine on the database server, the IRR_DATA does not need to leave the database server- this is both a performance benefit because network transmission of large amounts of data take time and a security benefit because it is harder to protect private data once you start shipping around your intranet. Another benefit, which we will discuss in a few paragraphs, is the ability to leverage Oracle database parallelism to run these calculations for dozens of accounts at once. From Walking to Running ore.groupApply is rather nice, but it still has the drawback that I run this from a desktop R instance.  This is not ideal for integrating into typical operational processes like nightly data warehouse refreshes or monthly statement generation.  But, this is not an issue for ORE.  Oracle R Enterprise lets us run this from the database using regular SQL, which is easily integrated into standard operations.  That is extremely exciting and the way we actually did these calculations in the customer proof. As part of Oracle R Enterprise, it provides a SQL equivalent to ore.groupApply which it refers to as “rqGroupEval”.  To use rqGroupEval via SQL, there is a bit of simple setup needed.  Basically, the Oracle Database needs to know the structure of the input table and the grouping column, which we are able to define using the database’s pipeline table function mechanisms. Here is the setup script: At this point, our initial setup of rqGroupEval is done for the IRR_DATA table.  The next step is to define our R function to the database.  We do that via a call to ORE’s rqScriptCreate. Now we can test it.  The SQL you use to run rqGroupEval uses the Oracle database pipeline table function syntax.  The first argument to irr_dataGroupEval is a cursor defining our input.  You can add additional where clauses and subqueries to this cursor as appropriate.  The second argument is any additional inputs to the R function.  The third argument is the text of a dummy select statement.  The dummy select statement is used by the database to identify the columns and datatypes to expect the R function to return.  The fourth argument is the column of the input table to split/group by.  The final argument is the name of the R function as you defined it when you called rqScriptCreate(). The Real-World Results In our real customer proof-of-concept, we had more sophisticated calculation requirements than shown in this simplified blog example.  For instance, we had to perform the rate of return calculations for 5 separate time periods, so the R code was enhanced to do so.  In addition, some accounts needed a time-weighted rate of return to be calculated, so we extended our approach and added an R function to do that.  And finally, there were also a few more real-world data irregularities that we needed to account for, so we added logic to our R functions to deal with those exceptions.  For the full-scale customer test, we loaded the customer data onto a Half-Rack Exadata X2-2 Database Machine.  As our half-rack had 48 physical cores (and 96 threads if you consider hyperthreading), we wanted to take advantage of that CPU horsepower to speed up our calculations.  To do so with ORE, it is as simple as leveraging the Oracle Database Parallel Query features.  Let’s look at the SQL used in the customer proof: Notice that we use a parallel hint on the cursor that is the input to our rqGroupEval function.  That is all we need to do to enable Oracle to use parallel R engines. Here are a few screenshots of what this SQL looked like in the Real-Time SQL Monitor when we ran this during the proof of concept (hint: you might need to right-click on these images to be able to view the images full-screen to see the entire image): From the above, you can notice a few things (numbers 1 thru 5 below correspond with highlighted numbers on the images above.  You may need to right click on the above images and view the images full-screen to see the entire image): The SQL completed in 110 seconds (1.8minutes) We calculated rate of returns for 5 time periods for each of 911k accounts (the number of actual rows returned by the IRRSTAGEGROUPEVAL operation) We accessed 103m rows of detailed cash flow/market value data (the number of actual rows returned by the IRR_STAGE2 operation) We ran with 72 degrees of parallelism spread across 4 database servers Most of our 110seconds was spent in the “External Procedure call” event On average, we performed 8,200 executions of our R function per second (110s/911k accounts) On average, each execution was passed 110 rows of data (103m detail rows/911k accounts) On average, we did 41,000 single time period rate of return calculations per second (each of the 8,200 executions of our R function did rate of return calculations for 5 time periods) On average, we processed over 900,000 rows of database data in R per second (103m detail rows/110s) R + Oracle R Enterprise: Best of R + Best of Oracle Database This blog post series started by describing a real customer problem: how to perform a lot of calculations on a lot of data in a short period of time.  While standard R proved to be a very good fit for writing the necessary calculations, the challenge of working with a lot of data in a short period of time remained. This blog post series showed how Oracle R Enterprise enables R to be used in conjunction with the Oracle Database to overcome the data volume and performance issues (as well as simplifying the operations and security issues).  It also showed that we could calculate 5 time periods of rate of returns for almost a million individual accounts in less than 2 minutes. In a future post, we will take the same R function and show how Oracle R Connector for Hadoop can be used in the Hadoop world.  In that next post, instead of having our data in an Oracle database, our data will live in Hadoop and we will how to use the Oracle R Connector for Hadoop and other Oracle Big Data Connectors to move data between Hadoop, R, and the Oracle Database easily.

    Read the article

  • Building Queries Systematically

    - by Jeremy Smyth
    The SQL language is a bit like a toolkit for data. It consists of lots of little fiddly bits of syntax that, taken together, allow you to build complex edifices and return powerful results. For the uninitiated, the many tools can be quite confusing, and it's sometimes difficult to decide how to go about the process of building non-trivial queries, that is, queries that are more than a simple SELECT a, b FROM c; A System for Building Queries When you're building queries, you could use a system like the following:  Decide which fields contain the values you want to use in our output, and how you wish to alias those fields Values you want to see in your output Values you want to use in calculations . For example, to calculate margin on a product, you could calculate price - cost and give it the alias margin. Values you want to filter with. For example, you might only want to see products that weigh more than 2Kg or that are blue. The weight or colour columns could contain that information. Values you want to order by. For example you might want the most expensive products first, and the least last. You could use the price column in descending order to achieve that. Assuming the fields you've picked in point 1 are in multiple tables, find the connections between those tables Look for relationships between tables and identify the columns that implement those relationships. For example, The Orders table could have a CustomerID field referencing the same column in the Customers table. Sometimes the problem doesn't use relationships but rests on a different field; sometimes the query is looking for a coincidence of fact rather than a foreign key constraint. For example you might have sales representatives who live in the same state as a customer; this information is normally not used in relationships, but if your query is for organizing events where sales representatives meet customers, it's useful in that query. In such a case you would record the names of columns at either end of such a connection. Sometimes relationships require a bridge, a junction table that wasn't identified in point 1 above but is needed to connect tables you need; these are used in "many-to-many relationships". In these cases you need to record the columns in each table that connect to similar columns in other tables. Construct a join or series of joins using the fields and tables identified in point 2 above. This becomes your FROM clause. Filter using some of the fields in point 1 above. This becomes your WHERE clause. Construct an ORDER BY clause using values from point 1 above that are relevant to the desired order of the output rows. Project the result using the remainder of the fields in point 1 above. This becomes your SELECT clause. A Worked Example   Let's say you want to query the world database to find a list of countries (with their capitals) and the change in GNP, using the difference between the GNP and GNPOld columns, and that you only want to see results for countries with a population greater than 100,000,000. Using the system described above, we could do the following:  The Country.Name and City.Name columns contain the name of the country and city respectively.  The change in GNP comes from the calculation GNP - GNPOld. Both those columns are in the Country table. This calculation is also used to order the output, in descending order To see only countries with a population greater than 100,000,000, you need the Population field of the Country table. There is also a Population field in the City table, so you'll need to specify the table name to disambiguate. You can also represent a number like 100 million as 100e6 instead of 100000000 to make it easier to read. Because the fields come from the Country and City tables, you'll need to join them. There are two relationships between these tables: Each city is hosted within a country, and the city's CountryCode column identifies that country. Also, each country has a capital city, whose ID is contained within the country's Capital column. This latter relationship is the one to use, so the relevant columns and the condition that uses them is represented by the following FROM clause:  FROM Country JOIN City ON Country.Capital = City.ID The statement should only return countries with a population greater than 100,000,000. Country.Population is the relevant column, so the WHERE clause becomes:  WHERE Country.Population > 100e6  To sort the result set in reverse order of difference in GNP, you could use either the calculation, or the position in the output (it's the third column): ORDER BY GNP - GNPOld or ORDER BY 3 Finally, project the columns you wish to see by constructing the SELECT clause: SELECT Country.Name AS Country, City.Name AS Capital,        GNP - GNPOld AS `Difference in GNP`  The whole statement ends up looking like this:  mysql> SELECT Country.Name AS Country, City.Name AS Capital, -> GNP - GNPOld AS `Difference in GNP` -> FROM Country JOIN City ON Country.Capital = City.ID -> WHERE Country.Population > 100e6 -> ORDER BY 3 DESC; +--------------------+------------+-------------------+ | Country            | Capital    | Difference in GNP | +--------------------+------------+-------------------+ | United States | Washington | 399800.00 | | China | Peking | 64549.00 | | India | New Delhi | 16542.00 | | Nigeria | Abuja | 7084.00 | | Pakistan | Islamabad | 2740.00 | | Bangladesh | Dhaka | 886.00 | | Brazil | Brasília | -27369.00 | | Indonesia | Jakarta | -130020.00 | | Russian Federation | Moscow | -166381.00 | | Japan | Tokyo | -405596.00 | +--------------------+------------+-------------------+ 10 rows in set (0.00 sec) Queries with Aggregates and GROUP BY While this system might work well for many queries, it doesn't cater for situations where you have complex summaries and aggregation. For aggregation, you'd start with choosing which columns to view in the output, but this time you'd construct them as aggregate expressions. For example, you could look at the average population, or the count of distinct regions.You could also perform more complex aggregations, such as the average of GNP per head of population calculated as AVG(GNP/Population). Having chosen the values to appear in the output, you must choose how to aggregate those values. A useful way to think about this is that every aggregate query is of the form X, Y per Z. The SELECT clause contains the expressions for X and Y, as already described, and Z becomes your GROUP BY clause. Ordinarily you would also include Z in the query so you see how you are grouping, so the output becomes Z, X, Y per Z.  As an example, consider the following, which shows a count of  countries and the average population per continent:  mysql> SELECT Continent, COUNT(Name), AVG(Population)     -> FROM Country     -> GROUP BY Continent; +---------------+-------------+-----------------+ | Continent     | COUNT(Name) | AVG(Population) | +---------------+-------------+-----------------+ | Asia          |          51 |   72647562.7451 | | Europe        |          46 |   15871186.9565 | | North America |          37 |   13053864.8649 | | Africa        |          58 |   13525431.0345 | | Oceania       |          28 |    1085755.3571 | | Antarctica    |           5 |          0.0000 | | South America |          14 |   24698571.4286 | +---------------+-------------+-----------------+ 7 rows in set (0.00 sec) In this case, X is the number of countries, Y is the average population, and Z is the continent. Of course, you could have more fields in the SELECT clause, and  more fields in the GROUP BY clause as you require. You would also normally alias columns to make the output more suited to your requirements. More Complex Queries  Queries can get considerably more interesting than this. You could also add joins and other expressions to your aggregate query, as in the earlier part of this post. You could have more complex conditions in the WHERE clause. Similarly, you could use queries such as these in subqueries of yet more complex super-queries. Each technique becomes another tool in your toolbox, until before you know it you're writing queries across 15 tables that take two pages to write out. But that's for another day...

    Read the article

  • Generated LinqtoSql Sql 5x slower than SAME EXACT hand-written sql

    - by JasonM
    I have a sql statement which is hardcoded in an existing VB6 app. I'm upgrading a new version in C# and using Linq To Sql. I was able to get LinqToSql to generate the same sql (before I start refactoring), but for some reason the Sql generated by LinqToSql is 5x slower than the original sql. This is running the generated Sql Directly in LinqPad. The only real difference my meager sql eyes can spot is the WITH (NOLOCK), which if I add into the LinqToSql generated sql, makes no difference. Can someone point out what I'm doing wrong here? Thanks! Existing Hard Coded Sql (5.0 Seconds) SELECT DISTINCT CH.ClaimNum, CH.AcnProvID, CH.AcnPatID, CH.TinNum, CH.Diag1, CH.GroupNum, CH.AllowedTotal FROM Claims.dbo.T_ClaimsHeader AS CH WITH (NOLOCK) WHERE CH.ContractID IN ('123A','123B','123C','123D','123E','123F','123G','123H') AND ( ( (CH.Transmited Is Null or CH.Transmited = '') AND CH.DateTransmit Is Null AND CH.EobDate Is Null AND CH.ProcessFlag IN ('Y','E') AND CH.DataSource NOT IN ('A','EC','EU') AND CH.AllowedTotal > 0 ) ) ORDER BY CH.AcnPatID, CH.ClaimNum Generated Sql from LinqToSql (27.6 Seconds) -- Region Parameters DECLARE @p0 NVarChar(4) SET @p0 = '123A' DECLARE @p1 NVarChar(4) SET @p1 = '123B' DECLARE @p2 NVarChar(4) SET @p2 = '123C' DECLARE @p3 NVarChar(4) SET @p3 = '123D' DECLARE @p4 NVarChar(4) SET @p4 = '123E' DECLARE @p5 NVarChar(4) SET @p5 = '123F' DECLARE @p6 NVarChar(4) SET @p6 = '123G' DECLARE @p7 NVarChar(4) SET @p7 = '123H' DECLARE @p8 VarChar(1) SET @p8 = '' DECLARE @p9 NVarChar(1) SET @p9 = 'Y' DECLARE @p10 NVarChar(1) SET @p10 = 'E' DECLARE @p11 NVarChar(1) SET @p11 = 'A' DECLARE @p12 NVarChar(2) SET @p12 = 'EC' DECLARE @p13 NVarChar(2) SET @p13 = 'EU' DECLARE @p14 Decimal(5,4) SET @p14 = 0 -- EndRegion SELECT DISTINCT [t0].[ClaimNum], [t0].[acnprovid] AS [AcnProvID], [t0].[acnpatid] AS [AcnPatID], [t0].[tinnum] AS [TinNum], [t0].[diag1] AS [Diag1], [t0].[GroupNum], [t0].[allowedtotal] AS [AllowedTotal] FROM [Claims].[dbo].[T_ClaimsHeader] AS [t0] WHERE ([t0].[contractid] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)) AND (([t0].[Transmited] IS NULL) OR ([t0].[Transmited] = @p8)) AND ([t0].[DATETRANSMIT] IS NULL) AND ([t0].[EOBDATE] IS NULL) AND ([t0].[PROCESSFLAG] IN (@p9, @p10)) AND (NOT ([t0].[DataSource] IN (@p11, @p12, @p13))) AND ([t0].[allowedtotal] > @p14) ORDER BY [t0].[acnpatid], [t0].[ClaimNum] New LinqToSql Code (30+ seconds... Times out ) var contractIds = T_ContractDatas.Where(x => x.EdiSubmissionGroupID == "123-01").Select(x => x.CONTRACTID).ToList(); var processFlags = new List<string> {"Y","E"}; var dataSource = new List<string> {"A","EC","EU"}; var results = (from claims in T_ClaimsHeaders where contractIds.Contains(claims.contractid) && (claims.Transmited == null || claims.Transmited == string.Empty ) && claims.DATETRANSMIT == null && claims.EOBDATE == null && processFlags.Contains(claims.PROCESSFLAG) && !dataSource.Contains(claims.DataSource) && claims.allowedtotal > 0 select new { ClaimNum = claims.ClaimNum, AcnProvID = claims.acnprovid, AcnPatID = claims.acnpatid, TinNum = claims.tinnum, Diag1 = claims.diag1, GroupNum = claims.GroupNum, AllowedTotal = claims.allowedtotal }).OrderBy(x => x.ClaimNum).OrderBy(x => x.AcnPatID).Distinct(); I'm using the list of constants above to make LinqToSql Generate IN ('xxx','xxx',etc) Otherwise it uses subqueries which are just as slow...

    Read the article

  • How to do left joins with least-n-per-group query?

    - by Nate
    I'm trying to get a somewhat complicated query working and am not having any luck whatsoever. Suppose I have the following tables: cart_items: +--------------------------------------------+ | item_id | cart_id | movie_name | quantity | +--------------------------------------------+ | 0 | 0 | braveheart | 4 | | 1 | 0 | braveheart | 9 | | . | . | . | . | | . | . | . | . | | . | . | . | . | | . | . | . | . | +--------------------------------------------+ movies: +------------------------------+ | movie_id | movie_name | ... | +------------------------------+ | 0 | braveheart | . | | . | . | . | | . | . | . | | . | . | . | | . | . | . | +------------------------------+ pricing: +-----------------------------------------+ | id | movie_name | quantity | price_per | +-----------------------------------------+ | 0 | braveheart | 1 | 1.99 | | 1 | braveheart | 2 | 1.50 | | 2 | braveheart | 4 | 1.25 | | 3 | braveheart | 8 | 1.00 | | . | . | . | . | | . | . | . | . | | . | . | . | . | | . | . | . | . | | . | . | . | . | +-----------------------------------------+ I need to join the data from the tables, but with the added complexity that I need to get appropriate price_per from the pricing table. Only one price should be returned for each cart_item, and that should be the lowest price from the pricing table where the quantity for the cart item is at least the quantity in the pricing table. So, the query should return for each item in cart_items the following: +---------------------------------------------+ | item_id | movie_name | quantity | price_per | +---------------------------------------------+ Example 1: Variable passed to the query: cart_id = 0. Return: +---------------------------------------------+ | item_id | movie_name | quantity | price_per | +---------------------------------------------+ | 0 | braveheart | 4 | 1.25 | | 1 | braveheart | 9 | 1.00 | +---------------------------------------------+ Note that this is a minimalist example and that additional data will be pulled from the tables mentioned (particularly the movies table). How could this query be composed? I have tried using left joins and subqueries, but the difficult part is getting the price and nothing I have tried has worked. Thanks for your help. EDIT: I think this is similar to what I have working with my "real" tables: SELECT t1.item_id, t2.movie_name, t1.quantity FROM cart_items t1 LEFT JOIN movies t2 ON t2.movie_name = t1.movie_name WHERE t1.cart_id = 0 Assuming I wrote that correctly (I quickly tried to "port over" my real query), then the output would currently be: +---------------------------------+ | item_id | movie_name | quantity | +---------------------------------+ | 0 | braveheart | 4 | | 1 | braveheart | 9 | +---------------------------------+ The trouble I'm having is joining the price at a certain quantity for a movie. I simply cannot figure out how to do it.

    Read the article

  • Why is SQLite3 using covering indices instead of the indices I created?

    - by Geoff
    I have an extremely large database (contacts has ~3 billion entries, people has ~280 million entries, and the other tables have a negligible number of entries). Most other queries I've run are really fast. However, I've encountered a more complicated query that's really slow. I'm wondering if there's any way to speed this up. First of all, here is my schema: CREATE TABLE activities (id INTEGER PRIMARY KEY, name TEXT NOT NULL); CREATE TABLE contacts ( id INTEGER PRIMARY KEY, person1_id INTEGER NOT NULL, person2_id INTEGER NOT NULL, duration REAL NOT NULL, -- hours activity_id INTEGER NOT NULL -- FOREIGN_KEY(person1_id) REFERENCES people(id), -- FOREIGN_KEY(person2_id) REFERENCES people(id) ); CREATE TABLE people ( id INTEGER PRIMARY KEY, state_id INTEGER NOT NULL, county_id INTEGER NOT NULL, age INTEGER NOT NULL, gender TEXT NOT NULL, -- M or F income INTEGER NOT NULL -- FOREIGN_KEY(state_id) REFERENCES states(id) ); CREATE TABLE states ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, abbreviation TEXT NOT NULL ); CREATE INDEX activities_name_index on activities(name); CREATE INDEX contacts_activity_id_index on contacts(activity_id); CREATE INDEX contacts_duration_index on contacts(duration); CREATE INDEX contacts_person1_id_index on contacts(person1_id); CREATE INDEX contacts_person2_id_index on contacts(person2_id); CREATE INDEX people_age_index on people(age); CREATE INDEX people_county_id_index on people(county_id); CREATE INDEX people_gender_index on people(gender); CREATE INDEX people_income_index on people(income); CREATE INDEX people_state_id_index on people(state_id); CREATE INDEX states_abbreviation_index on states(abbreviation); CREATE INDEX states_name_index on states(name); Note that I've created an index on every column in the database. I don't care about the size of the database; speed is all I care about. Here's an example of a query that, as expected, runs almost instantly: SELECT count(*) FROM people, states WHERE people.state_id=states.id and states.abbreviation='IA'; Here's the troublesome query: SELECT * FROM contacts WHERE rowid IN (SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Kansas' INTERSECT SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person2_id=people.id AND people.state_id=states.id AND states.name='Missouri'); Now, what I think would happen is that each subquery would use each relevant index I've created to speed this up. However, when I show the query plan, I see this: sqlite> EXPLAIN QUERY PLAN SELECT * FROM contacts WHERE rowid IN (SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Kansas' INTERSECT SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person2_id=people.id AND people.state_id=states.id AND states.name='Missouri'); 0|0|0|SEARCH TABLE contacts USING INTEGER PRIMARY KEY (rowid=?) (~25 rows) 0|0|0|EXECUTE LIST SUBQUERY 1 2|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows) 2|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows) 2|2|0|SEARCH TABLE contacts USING COVERING INDEX contacts_person1_id_index (person1_id=?) (~12 rows) 3|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows) 3|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows) 3|2|0|SEARCH TABLE contacts USING COVERING INDEX contacts_person2_id_index (person2_id=?) (~12 rows) 1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (INTERSECT) In fact, if I show the query plan for the first query I posted, I get this: sqlite> EXPLAIN QUERY PLAN SELECT count(*) FROM people, states WHERE people.state_id=states.id and states.abbreviation='IA'; 0|0|1|SEARCH TABLE states USING COVERING INDEX states_abbreviation_index (abbreviation=?) (~1 rows) 0|1|0|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows) Why is SQLite using covering indices instead of the indices I created? Shouldn't the search in the people table be able to happen in log(n) time given state_id which in turn is found in log(n) time?

    Read the article

  • XML parsing and transforming (XSLT or otherwise)

    - by observ
    I have several xml files that are formated this way: <ROOT> <OBJECT> <identity> <id>123</id> </identity> <child2 attr = "aa">32</child2> <child3> <childOfChild3 att1="aaa" att2="bbb" att3="CCC">LN</childOfChild3> </child3> <child4> <child5> <child6>3ddf</child6> <child7> <childOfChild7 att31="RR">1231</childOfChild7> </child7> </child5> </child4> </OBJECT> <OBJECT> <identity> <id>124</id> </identity> <child2 attr = "bb">212</child2> <child3> <childOfChild3 att1="ee" att2="ccc" att3="EREA">OP</childOfChild3> </child3> <child4> <child5> <child6>213r</child6> <child7> <childOfChild7 att31="EE">1233</childOfChild7> </child7> </child5> </child4> </OBJECT> </ROOT> How can i format it this way?: <ROOT> <OBJECT> <id>123</id> <child2>32</child2> <attr>aa</attr> <child3></child3> <childOfChild3>LN</childOfChild3> <att1>aaa</att1> <att2>bbb</att2> <att3>CCC</att3> <child4></child4> <child5></child5> <child6>3ddf</child6> <child7></child7> <childOfChild7>1231</childOfChild7> <att31>RR</att31> </OBJECT> <OBJECT> <id>124</id> <child2>212</child2> <attr>bb</attr> <child3></child3> <childOfChild3>LN</childOfChild3> <att1>ee</att1> <att2>ccc</att2> <att3>EREA</att3> <child4></child4> <child5></child5> <child6>213r</child6> <child7></child7> <childOfChild7>1233</childOfChild7> <att31>EE</att31> </OBJECT> </ROOT> I know some C# so maybe a parser there? or some generic xslt? The xml files are some data received from a client, so i can't control the way they are sending it to me. L.E. Basically when i am trying to test this data in excel (for example i want to make sure that the attribute of childOfChild7 corresponds to the correct identity id) i am getting a lot of blank spaces. If i am importing in access to get only the data i want out, i have to do a thousands subqueries to get them all in a nice table. Basically i just want to see for one Object all its data (one object - One row) and then just delete/hide the columns i don't need.

    Read the article

  • Same SELECT used in an INSERT has different execution plan

    - by amacias
    A customer complained that a query and its INSERT counterpart had different execution plans, and of course, the INSERT was slower. First lets look at the SELECT : SELECT ua_tr_rundatetime,        ua_ch_treatmentcode,        ua_tr_treatmentcode,        ua_ch_cellid,        ua_tr_cellid FROM   (SELECT DISTINCT CH.treatmentcode AS UA_CH_TREATMENTCODE,                         CH.cellid        AS UA_CH_CELLID         FROM    CH,                 DL         WHERE  CH.contactdatetime > SYSDATE - 5                AND CH.treatmentcode = DL.treatmentcode) CH_CELLS,        (SELECT DISTINCT T.treatmentcode AS UA_TR_TREATMENTCODE,                         T.cellid        AS UA_TR_CELLID,                         T.rundatetime   AS UA_TR_RUNDATETIME         FROM    T,                 DL         WHERE  T.treatmentcode = DL.treatmentcode) TRT_CELLS WHERE  CH_CELLS.ua_ch_treatmentcode(+) = TRT_CELLS.ua_tr_treatmentcode;  The query has 2 DISTINCT subqueries.  The execution plan shows one with DISTICT Placement transformation applied and not the other. The view in Step 5 has the prefix VW_DTP which means DISTINCT Placement. -------------------------------------------------------------------- | Id  | Operation                    | Name            | Cost (%CPU) -------------------------------------------------------------------- |   0 | SELECT STATEMENT             |                 |   272K(100) |*  1 |  HASH JOIN OUTER             |                 |   272K  (1) |   2 |   VIEW                       |                 |  4408   (1) |   3 |    HASH UNIQUE               |                 |  4408   (1) |*  4 |     HASH JOIN                |                 |  4407   (1) |   5 |      VIEW                    | VW_DTP_48BAF62C |  1660   (2) |   6 |       HASH UNIQUE            |                 |  1660   (2) |   7 |        TABLE ACCESS FULL     | DL              |  1644   (1) |   8 |      TABLE ACCESS FULL       | T               |  2744   (1) |   9 |   VIEW                       |                 |   267K  (1) |  10 |    HASH UNIQUE               |                 |   267K  (1) |* 11 |     HASH JOIN                |                 |   267K  (1) |  12 |      PARTITION RANGE ITERATOR|                 |   266K  (1) |* 13 |       TABLE ACCESS FULL      | CH              |   266K  (1) |  14 |      TABLE ACCESS FULL       | DL              |  1644   (1) -------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): -------------------------------------------------------------    1 - SEL$1    2 - SEL$AF418D5F / TRT_CELLS@SEL$1    3 - SEL$AF418D5F    5 - SEL$F6AECEDE / VW_DTP_48BAF62C@SEL$48BAF62C    6 - SEL$F6AECEDE    7 - SEL$F6AECEDE / DL@SEL$3    8 - SEL$AF418D5F / T@SEL$3    9 - SEL$2        / CH_CELLS@SEL$1   10 - SEL$2   13 - SEL$2        / CH@SEL$2   14 - SEL$2        / DL@SEL$2 Predicate Information (identified by operation id): ---------------------------------------------------    1 - access("CH_CELLS"."UA_CH_TREATMENTCODE"="TRT_CELLS"."UA_TR_TREATMENTCODE")    4 - access("T"."TREATMENTCODE"="ITEM_1")   11 - access("CH"."TREATMENTCODE"="DL"."TREATMENTCODE")   13 - filter("CH"."CONTACTDATETIME">SYSDATE@!-5) The outline shows PLACE_DISTINCT(@"SEL$3" "DL"@"SEL$3") indicating that the QB3 is the one that got the transformation. Outline Data -------------   /*+       BEGIN_OUTLINE_DATA       IGNORE_OPTIM_EMBEDDED_HINTS       OPTIMIZER_FEATURES_ENABLE('11.2.0.3')       DB_VERSION('11.2.0.3')       ALL_ROWS       OUTLINE_LEAF(@"SEL$2")       OUTLINE_LEAF(@"SEL$F6AECEDE")       OUTLINE_LEAF(@"SEL$AF418D5F") PLACE_DISTINCT(@"SEL$3" "DL"@"SEL$3")       OUTLINE_LEAF(@"SEL$1")       OUTLINE(@"SEL$48BAF62C")       OUTLINE(@"SEL$3")       NO_ACCESS(@"SEL$1" "TRT_CELLS"@"SEL$1")       NO_ACCESS(@"SEL$1" "CH_CELLS"@"SEL$1")       LEADING(@"SEL$1" "TRT_CELLS"@"SEL$1" "CH_CELLS"@"SEL$1")       USE_HASH(@"SEL$1" "CH_CELLS"@"SEL$1")       FULL(@"SEL$2" "CH"@"SEL$2")       FULL(@"SEL$2" "DL"@"SEL$2")       LEADING(@"SEL$2" "CH"@"SEL$2" "DL"@"SEL$2")       USE_HASH(@"SEL$2" "DL"@"SEL$2")       USE_HASH_AGGREGATION(@"SEL$2")       NO_ACCESS(@"SEL$AF418D5F" "VW_DTP_48BAF62C"@"SEL$48BAF62C")       FULL(@"SEL$AF418D5F" "T"@"SEL$3")       LEADING(@"SEL$AF418D5F" "VW_DTP_48BAF62C"@"SEL$48BAF62C" "T"@"SEL$3")       USE_HASH(@"SEL$AF418D5F" "T"@"SEL$3")       USE_HASH_AGGREGATION(@"SEL$AF418D5F")       FULL(@"SEL$F6AECEDE" "DL"@"SEL$3")       USE_HASH_AGGREGATION(@"SEL$F6AECEDE")       END_OUTLINE_DATA   */ The 10053 shows there is a comparative of cost with and without the transformation. This means the transformation belongs to Cost-Based Query Transformations (CBQT). In SEL$3 the optimization of the query block without the transformation is 6659.73 and with the transformation is 4408.41 so the transformation is kept. GBP/DP: Checking validity of GBP/DP for query block SEL$3 (#3) DP: Checking validity of distinct placement for query block SEL$3 (#3) DP: Using search type: linear DP: Considering distinct placement on query block SEL$3 (#3) DP: Starting iteration 1, state space = (5) : (0) DP: Original query DP: Costing query block. DP: Updated best state, Cost = 6659.73 DP: Starting iteration 2, state space = (5) : (1) DP: Using DP transformation in this iteration. DP: Transformed query DP: Costing query block. DP: Updated best state, Cost = 4408.41 DP: Doing DP on the original QB. DP: Doing DP on the preserved QB. In SEL$2 the cost without the transformation is less than with it so it is not kept. GBP/DP: Checking validity of GBP/DP for query block SEL$2 (#2) DP: Checking validity of distinct placement for query block SEL$2 (#2) DP: Using search type: linear DP: Considering distinct placement on query block SEL$2 (#2) DP: Starting iteration 1, state space = (3) : (0) DP: Original query DP: Costing query block. DP: Updated best state, Cost = 267936.93 DP: Starting iteration 2, state space = (3) : (1) DP: Using DP transformation in this iteration. DP: Transformed query DP: Costing query block. DP: Not update best state, Cost = 267951.66 To the same query an INSERT INTO is added and the result is a very different execution plan. INSERT  INTO cc               (ua_tr_rundatetime,                ua_ch_treatmentcode,                ua_tr_treatmentcode,                ua_ch_cellid,                ua_tr_cellid)SELECT ua_tr_rundatetime,       ua_ch_treatmentcode,       ua_tr_treatmentcode,       ua_ch_cellid,       ua_tr_cellidFROM   (SELECT DISTINCT CH.treatmentcode AS UA_CH_TREATMENTCODE,                        CH.cellid        AS UA_CH_CELLID        FROM    CH,                DL        WHERE  CH.contactdatetime > SYSDATE - 5               AND CH.treatmentcode = DL.treatmentcode) CH_CELLS,       (SELECT DISTINCT T.treatmentcode AS UA_TR_TREATMENTCODE,                        T.cellid        AS UA_TR_CELLID,                        T.rundatetime   AS UA_TR_RUNDATETIME        FROM    T,                DL        WHERE  T.treatmentcode = DL.treatmentcode) TRT_CELLSWHERE  CH_CELLS.ua_ch_treatmentcode(+) = TRT_CELLS.ua_tr_treatmentcode;----------------------------------------------------------| Id  | Operation                     | Name | Cost (%CPU)----------------------------------------------------------|   0 | INSERT STATEMENT              |      |   274K(100)|   1 |  LOAD TABLE CONVENTIONAL      |      |            |*  2 |   HASH JOIN OUTER             |      |   274K  (1)|   3 |    VIEW                       |      |  6660   (1)|   4 |     SORT UNIQUE               |      |  6660   (1)|*  5 |      HASH JOIN                |      |  6659   (1)|   6 |       TABLE ACCESS FULL       | DL   |  1644   (1)|   7 |       TABLE ACCESS FULL       | T    |  2744   (1)|   8 |    VIEW                       |      |   267K  (1)|   9 |     SORT UNIQUE               |      |   267K  (1)|* 10 |      HASH JOIN                |      |   267K  (1)|  11 |       PARTITION RANGE ITERATOR|      |   266K  (1)|* 12 |        TABLE ACCESS FULL      | CH   |   266K  (1)|  13 |       TABLE ACCESS FULL       | DL   |  1644   (1)----------------------------------------------------------Query Block Name / Object Alias (identified by operation id):-------------------------------------------------------------   1 - SEL$1   3 - SEL$3 / TRT_CELLS@SEL$1   4 - SEL$3   6 - SEL$3 / DL@SEL$3   7 - SEL$3 / T@SEL$3   8 - SEL$2 / CH_CELLS@SEL$1   9 - SEL$2  12 - SEL$2 / CH@SEL$2  13 - SEL$2 / DL@SEL$2Predicate Information (identified by operation id):---------------------------------------------------   2 - access("CH_CELLS"."UA_CH_TREATMENTCODE"="TRT_CELLS"."UA_TR_TREATMENTCODE")   5 - access("T"."TREATMENTCODE"="DL"."TREATMENTCODE")  10 - access("CH"."TREATMENTCODE"="DL"."TREATMENTCODE")  12 - filter("CH"."CONTACTDATETIME">SYSDATE@!-5)Outline Data-------------  /*+      BEGIN_OUTLINE_DATA      IGNORE_OPTIM_EMBEDDED_HINTS      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')      DB_VERSION('11.2.0.3')      ALL_ROWS      OUTLINE_LEAF(@"SEL$2")      OUTLINE_LEAF(@"SEL$3")      OUTLINE_LEAF(@"SEL$1")      OUTLINE_LEAF(@"INS$1")      FULL(@"INS$1" "CC"@"INS$1")      NO_ACCESS(@"SEL$1" "TRT_CELLS"@"SEL$1")      NO_ACCESS(@"SEL$1" "CH_CELLS"@"SEL$1")      LEADING(@"SEL$1" "TRT_CELLS"@"SEL$1" "CH_CELLS"@"SEL$1")      USE_HASH(@"SEL$1" "CH_CELLS"@"SEL$1")      FULL(@"SEL$2" "CH"@"SEL$2")      FULL(@"SEL$2" "DL"@"SEL$2")      LEADING(@"SEL$2" "CH"@"SEL$2" "DL"@"SEL$2")      USE_HASH(@"SEL$2" "DL"@"SEL$2")      USE_HASH_AGGREGATION(@"SEL$2")      FULL(@"SEL$3" "DL"@"SEL$3")      FULL(@"SEL$3" "T"@"SEL$3")      LEADING(@"SEL$3" "DL"@"SEL$3" "T"@"SEL$3")      USE_HASH(@"SEL$3" "T"@"SEL$3")      USE_HASH_AGGREGATION(@"SEL$3")      END_OUTLINE_DATA  */ There is no DISTINCT Placement view and no hint.The 10053 trace shows a new legend "DP: Bypassed: Not SELECT"implying that this is a transformation that it is possible only for SELECTs. GBP/DP: Checking validity of GBP/DP for query block SEL$3 (#4) DP: Checking validity of distinct placement for query block SEL$3 (#4) DP: Bypassed: Not SELECT. GBP/DP: Checking validity of GBP/DP for query block SEL$2 (#3) DP: Checking validity of distinct placement for query block SEL$2 (#3) DP: Bypassed: Not SELECT. In 12.1 (and hopefully in 11.2.0.4 when released) the restriction on applying CBQT to some DMLs and DDLs (like CTAS) is lifted.This is documented in BugTag Note:10013899.8 Allow CBQT for some DML / DDLAnd interestingly enough, it is possible to have a one-off patch in 11.2.0.3. SQL> select DESCRIPTION,OPTIMIZER_FEATURE_ENABLE,IS_DEFAULT     2  from v$system_fix_control where BUGNO='10013899'; DESCRIPTION ---------------------------------------------------------------- OPTIMIZER_FEATURE_ENABLE  IS_DEFAULT ------------------------- ---------- enable some transformations for DDL and DML statements 11.2.0.4                           1

    Read the article

< Previous Page | 2 3 4 5 6