Search Results

Search found 1421 results on 57 pages for 'distinct'.

Page 11/57 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >

Sorting array of 1000 distinct integers in the range [1, 5000], accessing each element at most once

- by Cronydevil

Suppose you have an array of 1000 integers. The integers are in random order, but you know each of the integers is between 1 and 5000 (inclusive). In addition, each number appears only once in the array. Assume that you can access each element of the array only once. Describe an algorithm to sort it. How i can sorting? If you used auxiliary storage in your algorithm, can you find an algorithm that remains O(n) space complexity?

Read the article
Selecting only the entries that have a distinct combination of values?

- by Theodore E O'Neal

I have a table, links1, that has the columns headers CardID and AbilityID, that looks like this: CardID | AbilityID 1001 | 1 1001 | 2 1001 | 3 1002 | 2 1002 | 3 1002 | 4 1003 | 3 1003 | 4 1003 | 5 What I want is to be able to return all the CardID that that have two specific AbilityID. For example: If I choose 1 and 2, it returns 1001. If I choose 3 and 4, it returns 1002 and 1003. Is it possible to do this with only one table, or will I need to create an identical table and do an INNER JOIN on those?

Read the article
Fetching Partition Information

- by Mike Femenella

For a recent SSIS package at work I needed to determine the distinct values in a partition, the number of rows in each partition and the file group name on which each partition resided in order to come up with a grouping mechanism. Of course sys.partitions comes to mind for some of that but there are a few other tables you need to link to in order to grab the information required. The table I’m working on contains 8.8 billion rows. Finding the distinct partition keys from this table was not a fast operation. My original solution was to create a temporary table, grab the distinct values for the partitioned column, then update via sys.partitions for the rows and the $partition function for the partitionid and finally look back to the sys.filegroups table for the filegroup names. It wasn’t pretty, it could take up to 15 minutes to return the results. The primary issue is pulling distinct values from the table. Queries for distinct against 8.8 billion rows don’t go quickly. A few beers into a conversation with a friend and we ended up talking about work which led to a conversation about the task described above. The solution was already built in SQL Server, just needed to pull it together. The first table I needed was sys.partition_range_values. This contains one row for each range boundary value for a partition function. In my case I have a partition function which uses dayid values. For example July 4th would be represented as an int, 20130704. This table lists out all of the dayid values which were defined in the function. This eliminated the need to query my source table for distinct dayid values, everything I needed was already built in here for me. The only caveat was that in my SSIS package I needed to create a bucket for any dayid values that were out of bounds for my function. For example if my function handled 20130501 through 20130704 and I had day values of 20130401 or 20130705 in my table, these would not be listed in sys.partition_range_values. I just created an “everything else” bucket in my ssis package just in case I had any dayid values unaccounted for. To get the number of rows for a partition is very easy. The sys.partitions table contains values for each partition. Easy enough to achieve by querying for the object_id and index value of 1 (the clustered index) The final piece of information was the filegroup name. There are 2 options available to get the filegroup name, sys.data_spaces or sys.filegroups. For my query I chose sys.filegroups but really it’s a matter of preference and data needs. In order to bridge between sys.partitions table and either sys.data_spaces or sys.filegroups you need to get the container_id. This can be done by joining sys.allocation_units.container_id to the sys.partitions.hobt_id. sys.allocation_units contains the field data_space_id which then lets you join in either sys.data_spaces or sys.file_groups. The end result is the query below, which typically executes for me in under 1 second. I’ve included the join to sys.filegroups and to sys.dataspaces, and I’ve just commented out the join sys.filegroups. As I mentioned above, this shaves a good 10-15 minutes off of my original ssis package and is a really easy tweak to get a boost in my ETL time. Enjoy.

Read the article
Analyzing data from same tables in diferent db instances.

- by Oscar Reyes

Short version: How can I map two columns from table A and B if they both have a common identifier which in turn may have two values in column C Lets say: A --- 1 , 2 B --- ? , 3 C ----- 45, 2 45, 3 Using table C I know that id 2 and 3 belong to the same item ( 45 ) and thus "?" in table B should be 1. What query could do something like that? EDIT Long version ommited. It was really boring/confusing EDIT I'm posting some output here. From this query: select distinct( rolein) , activityin from taskperformance@dm_prod where activityin in ( select activityin from activities@dm_prod where activityid in ( select activityid from activities@dm_prod where activityin in ( select distinct( activityin ) from taskperformance where rolein = 0 ) ) ) I have the following parts: select distinct( activityin ) from taskperformance where rolein = 0 Output: http://question1337216.pastebin.com/f5039557 select activityin from activities@dm_prod where activityid in ( select activityid from activities@dm_prod where activityin in ( select distinct( activityin ) from taskperformance where rolein = 0 ) ) Output: http://question1337216.pastebin.com/f6cef9393 And finally: select distinct( rolein) , activityin from taskperformance@dm_prod where activityin in ( select activityin from activities@dm_prod where activityid in ( select activityid from activities@dm_prod where activityin in ( select distinct( activityin ) from taskperformance where rolein = 0 ) ) ) Output: http://question1337216.pastebin.com/f346057bd Take for instace activityin 335 from first query ( from taskperformance B) . It is present in actvities from A. But is not in taskperformace in A ( but a the related activities: 92, 208, 335, 595 ) Are present in the result. The corresponding role in is: 1

Read the article
How can i get rid of 'ORA-01489: result of string concatenation is too long' in this query?

- by core_pro

this query gets the dominating sets in a network. so for example given a network A<----->B B<----->C B<----->D C<----->E D<----->C D<----->E F<----->E it returns B,E B,F A,E but it doesn't work for large data because i'm using string methods in my result. i have been trying to remove the string methods and return a view or something but to no avail With t as (select 'A' as per1, 'B' as per2 from dual union all select 'B','C' from dual union all select 'B','D' from dual union all select 'C','B' from dual union all select 'C','E' from dual union all select 'D','C' from dual union all select 'D','E' from dual union all select 'E','C' from dual union all select 'E','D' from dual union all select 'F','E' from dual) ,t2 as (select distinct least(per1, per2) as per1, greatest(per1, per2) as per2 from t union select distinct greatest(per1, per2) as per1, least(per1, per2) as per1 from t) ,t3 as (select per1, per2, row_number() over (partition by per1 order by per2) as rn from t2) ,people as (select per, row_number() over (order by per) rn from (select distinct per1 as per from t union select distinct per2 from t) ) ,comb as (select sys_connect_by_path(per,',')||',' as p from people connect by rn > prior rn ) ,find as (select p, per2, count(*) over (partition by p) as cnt from ( select distinct comb.p, t3.per2 from comb, t3 where instr(comb.p, ','||t3.per1||',') > 0 or instr(comb.p, ','||t3.per2||',') > 0 ) ) ,rnk as (select p, rank() over (order by length(p)) as rnk from find where cnt = (select count(*) from people) order by rnk ) select distinct trim(',' from p) as p from rnk where rnk.rnk = 1`

Read the article
SQL SERVER – Difference Between DATETIME and DATETIME2

- by pinaldave

Yesterday I have written a very quick blog post on SQL SERVER – Difference Between GETDATE and SYSDATETIME and I got tremendous response for the same. I suggest you read that blog post before continuing this blog post today. I had asked people to honestly take part and share their view about above two system function. There are few emails as well few comments on the blog post asking question how did I come to know the difference between the same. The answer is real world issues. I was called in for performance tuning consultancy where I was asked very strange question by one developer. Here is the situation he was facing. System had a single table with two different column of datetime. One column was datelastmodified and second column was datefirstmodified. One of the column was DATETIME and another was DATETIME2. Developer was populating them with SYSDATETIME respectively. He was always thinking that the value inserted in the table will be the same. This table was only accessed by INSERT statement and there was no updates done over it in application.One fine day he ran distinct on both of this column and was in for surprise. He always thought that both of the table will have same data, but in fact they had very different data. He presented this scenario to me. I said this can not be possible but when looked at the resultset, I had to agree with him. Here is the simple script generated to demonstrate the problem he was facing. This is just a sample of original table. DECLARE @Intveral INT SET @Intveral = 10000 CREATE TABLE #TimeTable (FirstDate DATETIME, LastDate DATETIME2) WHILE (@Intveral > 0) BEGIN INSERT #TimeTable (FirstDate, LastDate) VALUES (SYSDATETIME(), SYSDATETIME()) SET @Intveral = @Intveral - 1 END GO SELECT COUNT(DISTINCT FirstDate) D_GETDATE, COUNT(DISTINCT LastDate) D_SYSGETDATE FROM #TimeTable GO SELECT DISTINCT a.FirstDate, b.LastDate FROM #TimeTable a INNER JOIN #TimeTable b ON a.FirstDate = b.LastDate GO SELECT * FROM #TimeTable GO DROP TABLE #TimeTable GO Let us see the resultset. You can clearly see from result that SYSDATETIME() does not populate the same value in the both of the field. In fact the value is either rounded down or rounded up in the field which is DATETIME. Event though we are populating the same value, the values are totally different in both the column resulting the SELF JOIN fail and display different DISTINCT values. The best policy is if you are using DATETIME use GETDATE() and if you are suing DATETIME2 use SYSDATETIME() to populate them with current date and time to accurately address the precision. As DATETIME2 is introduced in SQL Server 2008, above script will only work with SQL SErver 2008 and later versions. I hope I have answered few questions asked yesterday. Reference: Pinal Dave (http://www.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL DateTime, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Using set operation in LINQ

- by vik20000in

There are many set operation that are required to be performed while working with any kind of data. This can be done very easily with the help of LINQ methods available for this functionality. Below are some of the examples of the set operation with LINQ. Finding distinct values in the set of data. We can use the distinct method to find out distinct values in a given list. int[] factorsOf300 = { 2, 2, 3, 5, 5 }; var uniqueFactors = factorsOf300.Distinct(); We can also use the set operation of UNION with the help of UNION method in the LINQ. The Union method takes another collection as a parameter and returns the distinct union values in both the list. Below is an example. int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 }; int[] numbersB = { 1, 3, 5, 7, 8 }; var uniqueNumbers = numbersA.Union(numbersB); We can also get the set operation of INTERSECT with the help of the INTERSECT method. Below is an example. int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 }; int[] numbersB = { 1, 3, 5, 7, 8 }; var commonNumbers = numbersA.Intersect(numbersB); We can also find the difference between the 2 sets of data with the help of except method. int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 }; int[] numbersB = { 1, 3, 5, 7, 8 }; IEnumerable<int> aOnlyNumbers = numbersA.Except(numbersB); Vikram

Read the article
Repeat Customers Each Year (Retention)

- by spazzie

I've been working on this and I don't think I'm doing it right. |D Our database doesn't keep track of how many customers we retain so we looked for an alternate method. It's outlined in this article. It suggests you have this table to fill in: Year Number of Customers Number of customers Retained in 2009 Percent (%) Retained in 2009 Number of customers Retained in 2010 Percent (%) Retained in 2010 .... 2008 2009 2010 2011 2012 Total The table would go out to 2012 in the headers. I'm just saving space. It tells you to find the total number of customers you had in your starting year. To do this, I used this query since our starting year is 2008: select YEAR(OrderDate) as 'Year', COUNT(distinct(billemail)) as Customers from dbo.tblOrder where OrderDate >= '2008-01-01' and OrderDate <= '2008-12-31' group by YEAR(OrderDate) At the moment we just differentiate our customers by email address. Then you have to search for the same names of customers who purchased again in later years (ours are 2009, 10, 11, and 12). I came up with this. It should find people who purchased in both 2008 and 2009. SELECT YEAR(OrderDate) as 'Year',COUNT(distinct(billemail)) as Customers FROM dbo.tblOrder o with (nolock) WHERE o.BillEmail IN (SELECT DISTINCT o1.BillEmail FROM dbo.tblOrder o1 with (nolock) WHERE o1.OrderDate BETWEEN '2008-1-1' AND '2009-1-1') AND o.BillEmail IN (SELECT DISTINCT o2.BillEmail FROM dbo.tblOrder o2 with (nolock) WHERE o2.OrderDate BETWEEN '2009-1-1' AND '2010-1-1') --AND o.OrderDate BETWEEN '2008-1-1' AND '2013-1-1' AND o.BillEmail NOT LIKE '%@halloweencostumes.com' AND o.BillEmail NOT LIKE '' GROUP BY YEAR(OrderDate) So I'm just finding the customers who purchased in both those years. And then I'm doing an independent query to find those who purchased in 2008 and 2010, then 08 and 11, and then 08 and 12. This one finds 2008 and 2010 purchasers: SELECT YEAR(OrderDate) as 'Year',COUNT(distinct(billemail)) as Customers FROM dbo.tblOrder o with (nolock) WHERE o.BillEmail IN (SELECT DISTINCT o1.BillEmail FROM dbo.tblOrder o1 with (nolock) WHERE o1.OrderDate BETWEEN '2008-1-1' AND '2009-1-1') AND o.BillEmail IN (SELECT DISTINCT o2.BillEmail FROM dbo.tblOrder o2 with (nolock) WHERE o2.OrderDate BETWEEN '2010-1-1' AND '2011-1-1') --AND o.OrderDate BETWEEN '2008-1-1' AND '2013-1-1' AND o.BillEmail NOT LIKE '%@halloweencostumes.com' AND o.BillEmail NOT LIKE '' GROUP BY YEAR(OrderDate) So you see I have a different query for each year comparison. They're all unrelated. So in the end I'm just finding people who bought in 2008 and 2009, and then a potentially different group that bought in 2008 and 2010, and so on. For this to be accurate, do I have to use the same grouping of 2008 buyers each time? So they bought in 2009 and 2010 and 2011, and 2012? This is where I'm worried and not sure how to proceed or even find such data. Any advice would be appreciated! Thanks!

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
Hash Function Added To The PredicateEqualityComparer

- by Paulo Morgado

Sometime ago I wrote a predicate equality comparer to be used with LINQ’s Distinct operator. The Distinct operator uses an instance of an internal Set class to maintain the collection of distinct elements in the source collection which in turn checks the hash code of each element (by calling the GetHashCode method of the equality comparer) and only if there’s already an element with the same hash code in the collection calls the Equals method of the comparer to disambiguate. At the time I provided only the possibility to specify the comparison predicate, but, in some cases, comparing a hash code instead of calling the provided comparer predicate can be a significant performance improvement, I’ve added the possibility to had a hash function to the predicate equality comparer. You can get the updated code from the PauloMorgado.Linq project on CodePlex,

Read the article
Oracle SQL Developer Data Modeler: What Tables Aren’t In At Least One SubView?

- by thatjeffsmith

Organizing your data model makes the information easier to consume. One of the organizational tools provided by Oracle SQL Developer Data Modeler is the ‘SubView.’ In a nutshell, a SubView is a subset of your model. The Challenge: I’ve just created a model which represents my entire ____________ application. We’ll call it ‘residential lending.’ Instead of having all 100+ tables in a single model diagram, I want to break out the tables by module, e.g. appraisals, credit reports, work histories, customers, etc. I’ve spent several hours breaking out the tables to one or more SubViews, but I think i may have missed a few. Is there an easy way to see what tables aren’t in at least ONE subview? The Answer Yes, mostly. The mostly comes about from the way I’m going to accomplish this task. It involves querying the SQL Developer Data Modeler Reporting Schema. So if you don’t have the Reporting Schema setup, you’ll need to do so. Got it? Good, let’s proceed. Before you start querying your Reporting Schema, you might need a data model for the actual reporting schema…meta-meta data! You could reverse engineer the data modeler reporting schema to a new data model, or you could just reference the PDFs in \datamodeler\reports\Reporting Schema diagrams directory. Here’s a hint, it’s THIS one The Query Well, it’s actually going to be at least 2 queries. We need to get a list of distinct designs stored in your repository. For giggles, I’m going to get a listing including each version of the model. So I can query based on design and version, or in this case, timestamp of when it was added to the repository. We’ll get that from the DMRS_DESIGNS table: SELECT DISTINCT design_name, design_ovid, date_published FROM DMRS_designs Then I’m going to feed the design_ovid, down to a subquery for my child report. select name, count(distinct diagram_id) from DMRS_DIAGRAM_ELEMENTS where design_ovid = :dESIGN_OVID and type = 'Table' group by name having count(distinct diagram_id) < 2 order by count(distinct diagram_id) desc Each diagram element has an entry in this table, so I need to filter on type=’Table.’ Each design has AT LEAST one diagram, the master diagram. So any relational table in this table, only having one listing means it’s not in any SubViews. If you have overloaded object names, which is VERY possible, you’ll want to do the report off of ‘OBJECT_ID’, but then you’ll need to correlate that to the NAME, as I doubt you’re so intimate with your designs that you recognize the GUIDs So I’m going to cheat and just stick with names, but I think you get the gist. My Model Of my almost 90 tables, how many of those have I not added to at least one SubView? Now let’s run my report! Voila! My ‘BEER2′ table isn’t in any SubView! It says ’1′ because the main model diagram counts as a view. So if the count came back as ’2′, that would mean the table was in the main model diagram and in 1 SubView diagram. And I know what you’re thinking, what kind of residential lending program would have a table called ‘BEER2?’ Let’s just say, that my business model has some kinks to work out!

Read the article
PHP & MYSQL: How can i neglect empty variables from select

- by cash-cash

hello all; if i have 4 variables and i want to select DISTINCT values form data base <?php $var1 = ""; //this variable can be blank $var2 = ""; //this variable can be blank $var3 = ""; //this variable can be blank $var4 = ""; //this variable can be blank $result = mysql_query("SELECT DISTINCT title,description FROM table WHERE **keywords ='$var1' OR author='$var2' OR date='$var3' OR forums='$var4'** "); ?> note: some or all variables ($var1,$var2,$var3,$var4) can be empty what i want: i want to neglect empty fields lets say that $var1 (keywords) is empty it will select all empty fileds, but i want if $var1 is empty the result will be like $result = mysql_query("SELECT DISTINCT title,description FROM table WHERE author='$var2' OR date='$var3' OR forums='$var4' "); if $var2 is empty the result will be like $result = mysql_query("SELECT DISTINCT title,description FROM table WHERE keywords ='$var1' OR date='$var3' OR forums='$var4' "); if $var1 and $var2 are empty the result will be like $result = mysql_query("SELECT DISTINCT title,description FROM table WHERE date='$var3' OR forums='$var4' "); and so on

Read the article
How can get unique values from data table using dql?

- by piemesons

I am having a table in which there is a column in which various values are stored.i want to retrieve unique values from that table using dql. Doctrine_Query::create() ->select('rec.school') ->from('Records rec') ->where("rec.city='$city' ") ->execute(); Now i want only unique values. Can anybody tell me how to do that... Edit Table Structure: CREATE TABLE IF NOT EXISTS `records` ( `id` int(11) NOT NULL AUTO_INCREMENT, `state` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `school` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=16334 ; This is the Query I am using: Doctrine_Query::create() ->select('DISTINCT rec.city') ->from('Records rec') ->where("rec.state = '$state'") // ->getSql(); ->execute(); Generting Sql for this gives me: SELECT DISTINCT r.id AS r__id, r.city AS r__city FROM records r WHERE r.state = 'AR' Now check the sql generated:::: DISTINCT is on 'id' column where as i want Distinct on city column. Anybody know how to fix this. EDIT2 Id is unique cause its an auto incremental value.Ya i have some real duplicates in city column like: Delhi and Delhi. Right.. Now when i am trying to fetch data from it, I am getting Delhi two times. How can i make query like this: select DISTINCT rec.city where state="xyz"; Cause this will give me the proper output. EDIT3: Anybody who can tell me how to figure out this query..???

Read the article
SQL Select for multiple where clause

- by Tony

Hi, I am trying to create SQL Select that returns counts of a certain field based on a field. So, here is what I am trying to do. Select count(distinct id) as TotalCount, -- this will be the total of id count(distinct id where type='A') as TotalA, -- this will be total when type='A' count(distinct id where type='B') as TotalB -- This will be total when type = 'B' from MyTable Basically, TotalCount = TotalA + TotalB. How can I achieve this in SQL Select Statement? Thanks.

Read the article
SQL aggregate query question

- by Phil

Hi, Can anyone help me with a SQL query in Apache Derby SQL to get a "simple" count. Given a table ABC that looks like this... id a b c 1 1 1 1 2 1 1 2 3 2 1 3 4 2 1 1 ** 5 2 1 2 ** ** 6 2 2 1 ** 7 3 1 2 8 3 1 3 9 3 1 1 How can I write a query to get a count of how may distinct values of 'a' have both (b=1 and c=2) AND (b=2 and c=1) to get the correct result of 1. (the two rows marked match the criteria and both have a value of a=2, there is only 1 distinct value of a in this table that match the criteria) The tricky bit is that (b=1 and c=2) AND (b=2 and c=1) are obviously mutually exclusive when applied to a single row. .. so how do I apply that expression across multiple rows of distinct values for a? These queries are wrong but to illustrate what I'm trying to do... "SELECT DISTINCT COUNT(a) WHERE b=1 AND c=2 AND b=2 AND c=1 ..." .. (0) no go as mutually exclusive "SELECT DISTINCT COUNT(a) WHERE b=1 AND c=2 OR b=2 AND c=1 ..." .. (3) gets me the wrong result. SELECT COUNT(a) (CASE WHEN b=1 AND c=10 THEN 1 END) FROM ABC WHERE b=2 AND c=1 .. (0) no go as mutually exclusive Cheers, Phil.

Read the article
When to use CTEs to encapsulate sub-results, and when to let the RDBMS worry about massive joins.

- by IanC

This is a SQL theory question. I can provide an example, but I don't think it's needed to make my point. Anyone experienced with SQL will immediately know what I'm talking about. Usually we use joins to minimize the number of records due to matching the left and right rows. However, under certain conditions, joining tables cause a multiplication of results where the result is all permutations of the left and right records. I have a database which has 3 or 4 such joins. This turns what would be a few records into a multitude. My concern is that the tables will be large in production, so the number of these joined rows will be immense. Further, heavy math is performed on each row, and the idea of performing math on duplicate rows is enough to make anyone shudder. I have two questions. The first is, is this something I should care about, or will SQL Server intelligently realize these rows are all duplicates and optimize all processing accordingly? The second is, is there any advantage to grouping each part of the query so as to get only the distinct values going into the next part of the query, using something like: WITH t1 AS ( SELECT DISTINCT... [or GROUP BY] ), t2 AS ( SELECT DISTINCT... ), t3 AS ( SELECT DISTINCT... ) SELECT... I have often seen the use of DISTINCT applied to subqueries. There is obviously a reason for doing this. However, I'm talking about something a little different and perhaps more subtle and tricky.

Read the article
Multiple contacts with shared information

- by Keith Thompson

Background: I currently have several hundred contacts, synchronized between a Microsoft Exchange server and several mobile devices. I also save exported copies of the contacts in .vcf format. Is there a good way (application, file format, whatever) to maintain contacts with shared information? A very common scenario is that I have contacts for two or more people who live in the same house, for example: John Doe 123 Main Street, Anytown USA Home: 555-555-1111 Work: 555-555-2222 Mobile: 555-555-3333 E-mail: [email protected] Jane Doe 123 Main Street, Anytown USA Home: 555-555-1111 Work: 555-555-4444 Mobile: 555-555-5555 E-mail: [email protected] As you can see, both contacts have the same home address and phone number, but distinct names and work and mobile phone numbers. (Other information might also be either shared or distinct.) The applications and file formats I'm familiar with don't seem to have a good way to deal with this. If I use a single "John & Jane Doe" contact for both, it's difficult to distinguish the distinct information (if I want to call Jane's mobile phone rather than John's). If I use a separate contact for each, I have to remember to update both of them (or all of them for N 2) when they move or change their home phone number. An ideal solution would let me create a record containing information for their household, and have each of their contact records contain a reference to the household record, so that when I view John's contact record I see both shared and distinct information. Is there anything out there that has good support this kind of thing? (I would think there would be, since it's a very common scenario.) (I suppose I could roll my own system that generates merged .vcf files from some extended format, but that wouldn't play well with synchronizing across multiple devices.)

Read the article
SQL SERVER – Difference Between DATETIME and DATETIME2 – WITH GETDATE

- by pinaldave

Earlier I wrote blog post SQL SERVER – Difference Between GETDATE and SYSDATETIME which inspired me to write SQL SERVER – Difference Between DATETIME and DATETIME2. Now earlier two blog post inspired me to write this blog post (and 4 emails and 3 reads from readers). I previously populated DATETIME and DATETIME2 field with SYSDATETIME, which gave me very different behavior as SYSDATETIME was rounded up/down for the DATETIME datatype. I just ran the same experiment but instead of populating SYSDATETIME in this script I will be using GETDATE function. DECLARE @Intveral INT SET @Intveral = 10000 CREATE TABLE #TimeTable (FirstDate DATETIME, LastDate DATETIME2) WHILE (@Intveral > 0) BEGIN INSERT #TimeTable (FirstDate, LastDate) VALUES (GETDATE(), GETDATE()) SET @Intveral = @Intveral - 1 END GO SELECT COUNT(DISTINCT FirstDate) D_FirstDate, COUNT(DISTINCT LastDate) D_LastDate FROM #TimeTable GO SELECT DISTINCT a.FirstDate, b.LastDate FROM #TimeTable a INNER JOIN #TimeTable b ON a.FirstDate = b.LastDate GO SELECT * FROM #TimeTable GO DROP TABLE #TimeTable GO Let us run above script and observe the results. You will find that the values of GETDATE which is populated in both the columns FirstDate and LastDate are very much same. This is because GETDATE is of datatype DATETIME and the precision of the GETDATE is smaller than DATETIME2 there is no rounding happening. In other word, this experiment is pointless. I have included this as I got 4 emails and 3 twitter questions on this subject. If your datatype of variable is smaller than column datatype there is no manipulation of data, if data type of variable is larger than column datatype the data is rounded. Reference: Pinal Dave (http://www.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL DateTime, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Disk Space Monitoring – Detecting Low Disk Space on Server

- by Pinal Dave

A very common question I often receive is how to detect if the disk space is running low on SQL Server. There are two different ways to do the same. I personally prefer method 2 as that is very easy to use and I can use it creatively along with database name. Method 1: EXEC MASTER..xp_fixeddrives GO Above query will return us two columns, drive name and MB free. If we want to use this data in our query, we will have to create a temporary table and insert the data from this stored procedure into the temporary table and use it. Method 2: SELECT DISTINCT dovs.logical_volume_name AS LogicalName, dovs.volume_mount_point AS Drive, CONVERT(INT,dovs.available_bytes/1048576.0) AS FreeSpaceInMB FROM sys.master_files mf CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.FILE_ID) dovs ORDER BY FreeSpaceInMB ASC GO The above query will give us three columns: drive logical name, drive letter and free space in MB. We can further modify above query to also include database name in the query as well. SELECT DISTINCT DB_NAME(dovs.database_id) DBName, dovs.logical_volume_name AS LogicalName, dovs.volume_mount_point AS Drive, CONVERT(INT,dovs.available_bytes/1048576.0) AS FreeSpaceInMB FROM sys.master_files mf CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.FILE_ID) dovs ORDER BY FreeSpaceInMB ASC GO This will give us additional data about which database is placed on which drive. If you see a database name multiple times, it is because your database has multiple files and they are on different drives. You can modify above query one more time to even include the details of actual file location. SELECT DISTINCT DB_NAME(dovs.database_id) DBName, mf.physical_name PhysicalFileLocation, dovs.logical_volume_name AS LogicalName, dovs.volume_mount_point AS Drive, CONVERT(INT,dovs.available_bytes/1048576.0) AS FreeSpaceInMB FROM sys.master_files mf CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.FILE_ID) dovs ORDER BY FreeSpaceInMB ASC GO The above query will now additionally include the physical file location as well. As I mentioned earlier, I prefer method 2 as I can creatively use it as per the business need. Let me know which method are you using in your production server. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Not your usual "The multi-part identifier could not be bound" error

- by Eugene Niemand

I have the following query, now the strange thing is if I run this query on my development and pre-prod server it runs fine. If I run it on production it fails. I have figured out that if I run just the Select statement its happy but as soon as I try insert into the table variable it complains. DECLARE @RESULTS TABLE ( [Parent] VARCHAR(255) ,[client] VARCHAR(255) ,[ComponentName] VARCHAR(255) ,[DealName] VARCHAR(255) ,[Purchase Date] DATETIME ,[Start Date] DATETIME ,[End Date] DATETIME ,[Value] INT ,[Currency] VARCHAR(255) ,[Brand] VARCHAR(255) ,[Business Unit] VARCHAR(255) ,[Region] VARCHAR(255) ,[DealID] INT ) INSERT INTO @RESULTS SELECT DISTINCT ClientName 'Parent' ,F.ClientID 'client' ,ComponentName ,A.DealName ,CONVERT(SMALLDATETIME , ISNULL(PurchaseDate , '1900-01-01')) 'Purchase Date' ,CONVERT(SMALLDATETIME , ISNULL(StartDate , '1900-01-01')) 'Start Date' ,CONVERT(SMALLDATETIME , ISNULL(EndDate , '1900-01-01')) 'End Date' ,DealValue 'Value' ,D.Currency 'Currency' ,ShortBrand 'Brand' ,G.BU 'Business Unit' ,C.DMRegion 'Region' ,DealID FROM LTCDB_admin_tbl_Deals A INNER JOIN dbo_DM_Brand B ON A.BrandID = B.ID INNER JOIN LTCDB_admin_tbl_DM_Region C ON A.Region = C.ID INNER JOIN LTCDB_admin_tbl_Currency D ON A.Currency = D.ID INNER JOIN LTCDB_admin_tbl_Deal_Clients E ON A.DealID = E.Deal_ID INNER JOIN LTCDB_admin_tbl_Clients F ON E.Client_ID = F.ClientID INNER JOIN LTCDB_admin_tbl_DM_BU G ON G.ID = A.BU INNER JOIN LTCDB_admin_tbl_Deal_Components H ON A.DealID = H.Deal_ID INNER JOIN LTCDB_admin_tbl_Components I ON I.ComponentID = H.Component_ID WHERE EndDate != '1899-12-30T00:00:00.000' AND StartDate < EndDate AND B.ID IN ( 1 , 2 , 5 , 6 , 7 , 8 , 10 , 12 ) AND C.SalesRegionID IN ( 1 , 3 , 4 , 11 , 16 ) AND A.BU IN ( 1 , 2 , 3 , 4 , 5 , 6 , 8 , 9 , 11 , 12 , 15 , 16 , 19 , 20 , 22 , 23 , 24 , 26 , 28 , 30 ) AND ClientID = 16128 SELECT ... FROM @Results I get the following error Msg 8180, Level 16, State 1, Line 1 Statement(s) could not be prepared. Msg 4104, Level 16, State 1, Line 1 The multi-part identifier "Tbl1021.ComponentName" could not be bound. Msg 4104, Level 16, State 1, Line 1 The multi-part identifier "Tbl1011.Currency" could not be bound. Msg 207, Level 16, State 1, Line 1 Invalid column name 'Col2454'. Msg 207, Level 16, State 1, Line 1 Invalid column name 'Col2461'. Msg 207, Level 16, State 1, Line 1 Invalid column name 'Col2491'. Msg 207, Level 16, State 1, Line 1 Invalid column name 'Col2490'. Msg 207, Level 16, State 1, Line 1 Invalid column name 'Col2482'. Msg 207, Level 16, State 1, Line 1 Invalid column name 'Col2478'. Msg 207, Level 16, State 1, Line 1 Invalid column name 'Col2477'. Msg 207, Level 16, State 1, Line 1 Invalid column name 'Col2475'. EDIT - EDIT - EDIT - EDIT - EDIT - EDIT through a process of elimination I have found that following and wondered if anyone can shed some light on this. If I remove only the DISTINCT the query runs fine, add the DISTINCT and I get strange errors. Also I have found that if I comment the following lines then the query runs with the DISTINCT what is strange is that none of the columns in the table LTCDB_admin_tbl_Deal_Components is referenced so I don't see how the distinct will affect that. INNER JOIN LTCDB_admin_tbl_Deal_Components H ON A.DealID = H.Deal_ID

Read the article
whats wrong in this LINQ synatx?

- by Saurabh Kumar

Hi, I am trying to convert a SQL query to LINQ. Somehow my count(distinct(x)) logic does not seem to be working correctly. The original SQL is quite efficient(or so i think), but the generated SQL is not even returning the correct result. I am trying to fix this LINQ to do what the original SQL is doing, AND in an efficient way as the original query is doing. Help here would be really apreciated as I am stuck here :( SQL which is working and I need to make a comparable LINQ of: SELECT [t1].[PersonID] AS [personid] FROM [dbo].[Code] AS [t0] INNER JOIN [dbo].[phonenumbers] AS [t1] ON [t1].[PhoneCode] = [t0].[Code] INNER JOIN [dbo].[person] ON [t1].[PersonID]= [dbo].[Person].PersonID WHERE ([t0].[codetype] = 'phone') AND ( ([t0].[CodeDescription] = 'Home') AND ([t1].[PhoneNum] = '111') OR ([t0].[CodeDescription] = 'Work') AND ([t1].[PhoneNum] = '222') ) GROUP BY [t1].[PersonID] HAVING COUNT(DISTINCT([t1].[PhoneNum]))=2 The LINQ which I made is approximately as below: var ids = context.Code.Where(predicate); var rs = from r in ids group r by new { r.phonenumbers.person.PersonID} into g let matchcount=g.Select(p => p.phonenumbers.PhoneNum).Distinct().Count() where matchcount ==2 select new { personid = g.Key }; Unfortunately, the above LINQ is NOT generating the correct result, and is actually internally getting generated to the SQL shown below. By the way, this generated query is also reading ALL the rows(about 19592040) around 2 times due to the COUNTS :( Wich is a big performance issue too. Please help/point me to the right direction. Declare @p0 VarChar(10)='phone' Declare @p1 VarChar(10)='Home' Declare @p2 VarChar(10)='111' Declare @p3 VarChar(10)='Work' Declare @p4 VarChar(10)='222' Declare @p5 VarChar(10)='2' SELECT [t9].[PersonID], ( SELECT COUNT(*) FROM ( SELECT DISTINCT [t13].[PhoneNum] FROM [dbo].[Code] AS [t10] INNER JOIN [dbo].[phonenumbers] AS [t11] ON [t11].[PhoneType] = [t10].[Code] INNER JOIN [dbo].[Person] AS [t12] ON [t12].[PersonID] = [t11].[PersonID] INNER JOIN [dbo].[phonenumbers] AS [t13] ON [t13].[PhoneType] = [t10].[Code] WHERE ([t9].[PersonID] = [t12].[PersonID]) AND ([t10].[codetype] = @p0) AND ((([t10].[codetype] = @p1) AND ([t11].[PhoneNum] = @p2)) OR (([t10].[codetype] = @p3) AND ([t11].[PhoneNum] = @p4))) ) AS [t14] ) AS [cnt] FROM ( SELECT [t3].[PersonID], ( SELECT COUNT(*) FROM ( SELECT DISTINCT [t7].[PhoneNum] FROM [dbo].[Code] AS [t4] INNER JOIN [dbo].[phonenumbers] AS [t5] ON [t5].[PhoneType] = [t4].[Code] INNER JOIN [dbo].[Person] AS [t6] ON [t6].[PersonID] = [t5].[PersonID] INNER JOIN [dbo].[phonenumbers] AS [t7] ON [t7].[PhoneType] = [t4].[Code] WHERE ([t3].[PersonID] = [t6].[PersonID]) AND ([t4].[codetype] = @p0) AND ((([t4].[codetype] = @p1) AND ([t5].[PhoneNum] = @p2)) OR (([t4].[codetype] = @p3) AND ([t5].[PhoneNum] = @p4))) ) AS [t8] ) AS [value] FROM ( SELECT [t2].[PersonID] FROM [dbo].[Code] AS [t0] INNER JOIN [dbo].[phonenumbers] AS [t1] ON [t1].[PhoneType] = [t0].[Code] INNER JOIN [dbo].[Person] AS [t2] ON [t2].[PersonID] = [t1].[PersonID] WHERE ([t0].[codetype] = @p0) AND ((([t0].[codetype] = @p1) AND ([t1].[PhoneNum] = @p2)) OR (([t0].[codetype] = @p3) AND ([t1].[PhoneNum] = @p4))) GROUP BY [t2].[PersonID] ) AS [t3] ) AS [t9] WHERE [t9].[value] = @p5 Thanks!

Read the article
Entity Framework query not returning correctly enumerated results.

- by SkippyFire

I have this really strange problem where my entity framework query isn't enumerating correctly. The SQL Server table I'm using has a table with a Sku field, and the column is "distinct". It isn't a key, but it doesn't contain any duplicate values. Using actual SQL with where, distinct and group by cluases I have confirmed this. However, when I do this: // Not good foreach(var product in dc.Products) or // Not good foreach(var product in dc.Products.ToList()) or // Not good foreach(var product in dc.Products.OrderBy(p => p.Sku)) the first two objects that are returned ARE THE SAME!!! The third item was technically the second item in the table, but then the fourth item was the first row from the table again!!! The only solution I have found is to use the Distinct extension method, which shouldn't really do anything in this situation: // Good foreach(var product in dc.Products.ToList().Distinct()) Another weird thing about this is that the count of the resulting queries is the same!!! So whether or not the resulting enumerable has the correct results or duplicates, I always get the number of rows in the actual table! (No I don't have a limit clause anywhere). What could possibly cause this!?!?!?

Read the article
mySQL need to merge fields and get unique rows

- by jiudev

i have a database with +1 million rows and the stuktur looks like: CREATE TABLE IF NOT EXISTS `Performance` ( `id` int(11) NOT NULL AUTO_INCREMENT, `CIDs` varchar(100) DEFAULT NULL, `COLOR` varchar(100) DEFAULT NULL, `Name` varchar(255) DEFAULT NULL, `XT` bigint(16) DEFAULT NULL, `MP` varchar(100) DEFAULT NULL, PRIMARY KEY (`id`), KEY `CIDs` (`CIDs`), KEY `COLOR` (`COLOR`), KEY `Name` (`Name`), KEY `XT` (`XT`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=0 ; insert into `Performance` (`id`, `CIDs`, `COLOR`, `Name`, `XT`, `MP`) VALUES (1, '1253374160', 'test test test test test', 'Load1', '89421331221', ''), (2, '1271672029', NULL, 'Load1', '19421331221', NULL), (3, '1188959688', NULL, 'Load2', '39421331221', NULL), (4, '1271672029', NULL, 'Load3', '49421341221', 'Description'), (5, '1271888888', NULL, 'Load4', '59421331221', 'Description'); The Output should look like: +----+------------+--------------------------+-------------+-------------+-------+-----------+---------+ | id | CIDs | COLOR | XT | MP | Name | PIDs | unqName | +----+------------+--------------------------+-------------+-------------+-------+-----------+---------+ | 1 | 1253374160 | test test test test test | 89421331221 | | Load1 | 1,2 | Load1 | | 3 | 1188959688 | NULL | 39421331221 | NULL | Load2 | 3 | Load2 | | 4 | 1271672029 | NULL | 49421341221 | Description | Load3 | 4,5 | Load3 | +----+------------+--------------------------+-------------+-------------+-------+-----------+---------+ any ideas, how i could do this as fast as possible? I have tried with some group by, but it takes some Minutes :/ Thanks Advance //edit: for the solution with the group by, i needed 4 subquerys :/ //edit2: as requested: select id, CIDs, COLOR, XT, MP, Name, concat(PIDs,",",GROUP_CONCAT(DISTINCT id)) as PIDs, IFNULL(Name,id) as unqName from ( select id, CIDs, COLOR, XT, MP, Name, concat(PIDs,",",GROUP_CONCAT(DISTINCT id)) as PIDs, IFNULL(MP,id) as unqMP from ( select id, CIDs, COLOR, XT, MP, Name, concat(PIDs,",",GROUP_CONCAT(DISTINCT id)) as PIDs, IFNULL(XT,id) as unqXT from ( select id, CIDs, COLOR, XT, MP, Name, GROUP_CONCAT(DISTINCT id) as PIDs, IFNULL(COLOR,id) as unqCOLOR from Performance group by unqCOLOR ) m group by unqXT ) x group by unqMP ) y group by unqName

Read the article
Null Reference Exception In LINQ DataContext

- by Frank

I have a Null Reference Exception Caused by this code: var recentOrderers = (from p in db.CMS where p.ODR_DATE > DateTime.Today - new TimeSpan(60, 0, 0, 0) select p.SOLDNUM).Distinct(); result = (from p in db.CMS where p.ORDER_ST2 == "SH" && p.ODR_DATE > DateTime.Today - new TimeSpan(365, 0, 0, 0) && p.ODR_DATE < DateTime.Today - new TimeSpan(60, 0, 0, 0) && !(recentOrderers.Contains(p.SOLDNUM))/**/ select p.SOLDNUM).Distinct().Count(); result is of double type. When I comment out: !(recentOrderers.Contains(p.SOLDNUM)) The code runs fine. I have verified that recentOrderers is not null, and when I run: if(recentOrderes.Contains(0)) return; Execution follows this path and returns. Not sure what is going on, since I use similar code above it: var m = (from p in db.CMS where p.ORDER_ST2 == "SH" select p.SOLDNUM).Distinct(); double result = (from p in db.CUST join r in db.DEMGRAPH on p.CUSTNUM equals r.CUSTNUM where p.CTYPE3 == "cmh" && !(m.Contains(p.CUSTNUM)) && r.ColNEWMEMBERDAT.Value.Year > 1900 select p.CUSTNUM).Distinct().Count(); which also runs flawlessly. After noting the similarity, can anyone help? Thanks in advance. -Frank GTP, Inc.

Read the article

Search Results

Search found 1421 results on 57 pages for 'distinct'.

Page 11/57 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >

- by Cronydevil

- by Theodore E O'Neal

- by Mike Femenella

- by Oscar Reyes

- by core_pro

- by pinaldave

- by vik20000in

- by spazzie

- by Rob Farley

- by Rob Farley

- by Paulo Morgado

- by thatjeffsmith

- by cash-cash

- by piemesons

- by Tony

- by Phil

- by IanC

- by Keith Thompson

- by pinaldave

- by Pinal Dave

- by Eugene Niemand

- by Saurabh Kumar

- by SkippyFire

- by jiudev

- by Frank

< Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >