Search Results

Search found 763 results on 31 pages for 'union'.

Page 31/31 | < Previous Page | 27 28 29 30 31

Access Qry Questions

- by kralco626

It was suggested that I repost this questions as I didn't do a very good job discribing my issue the first time. (http://stackoverflow.com/questions/2921286/access-question) THE SITUATION: I have inspections from many months of many years. Sometimes there is more than one inspection in a month, sometimes there is no inspection. However, the report that is desired by the clients requires that I have EXACTLY ONE record per month for the time frame they request the report. They understand the data issues and have stated that if there is more than one inspection in a month to take the latest one. If the is not an inspection for that month, go back in time untill you find one and use that one. So a sample of the data is as follows: (I am including many records because I was told I did not include enough data on my last try) equip_id month year runtime date 1 5 2008 400 5/10/2008 12:34 PM 1 7 2008 500 7/12/2008 1:45 PM 1 8 2008 600 8/20/2008 1:12 PM 1 8 2008 605 8/30/2008 8:00 AM 1 1 2010 2000 1/12/2010 2:00 PM 1 3 2010 2200 3/24/2010 10:00 AM 2 7 2009 1000 7/20/2009 8:00 AM 2 10 2009 1400 10/14/2009 9:00 AM 2 1 2010 1600 1/15/2010 1:00 PM 2 1 2010 1610 1/30/2010 4:00 PM 2 3 2010 1800 3/15/2010 1:00PM After all the transformations to the data are done, it should look like this: equip_id month year runtime date 1 5 2008 400 5/10/2008 12:34 PM 1 6 2008 400 5/10/2008 12:34 PM 1 7 2008 500 7/12/2008 1:45 PM 1 8 2008 605 8/30/2008 8:00 AM 1 9 2008 605 8/30/2008 8:00 AM 1 10 2008 605 8/30/2008 8:00 AM 1 11 2008 605 8/30/2008 8:00 AM 1 12 2008 605 8/30/2008 8:00 AM 1 1 2009 605 8/30/2008 8:00 AM 1 2 2009 605 8/30/2008 8:00 AM 1 3 2009 605 8/30/2008 8:00 AM 1 4 2009 605 8/30/2008 8:00 AM 1 5 2009 605 8/30/2008 8:00 AM 1 6 2009 605 8/30/2008 8:00 AM 1 7 2009 605 8/30/2008 8:00 AM 1 8 2009 605 8/30/2008 8:00 AM 1 9 2009 605 8/30/2008 8:00 AM 1 10 2009 605 8/30/2008 8:00 AM 1 11 2009 605 8/30/2008 8:00 AM 1 12 2009 605 8/30/2008 8:00 AM 1 1 2010 2000 1/12/2010 2:00 PM 1 2 2010 2000 1/12/2010 2:00 PM 1 3 2010 2200 3/24/2010 10:00 AM 2 7 2009 1000 7/20/2009 8:00 AM 2 8 2009 1000 7/20/2009 8:00 AM 2 9 2009 1000 7/20/2009 8:00 AM 2 10 2009 1400 10/14/2009 9:00 AM 2 11 2009 1400 10/14/2009 9:00 AM 2 12 2009 1400 10/14/2009 9:00 AM 2 1 2010 1610 1/30/2010 4:00 PM 2 2 2010 1610 1/30/2010 4:00 PM 2 3 2010 1800 3/15/2010 1:00PM I think that this is the most accurate dipiction of the problem that I can give. I will now say what I have tried. Although if someone else has a better approach, I am perfectly willing to throw away what I have done and do it differently... STEP 1: create a query that removes the duplicates from the data. Ie. only one record per equip_id for each month/year, keeping the latest one. (done successfully) STEP 2: create a table of the date ranges the client wants the report for. (This is done dynamically at runtime) This table two field, Month and Year. So if the client wants a report from FEb 2008 to March 2010 the table would look like: Month Year 2 2008 3 2008 . . . 12 2008 1 2009 . . . 12 2009 1 2010 2 2010 3 2010 I then left joined this table with my query from step 1. So now I have a record for every month and every year that they want the report for, with nulls(or blanks) or sometimes 0s (not sure why, access is weird, but sometiems they are nulls and sumtimes they are 0s...) for the runtimes that are not avaiable. I don't particurally like this solution, but ill do it if i have to. (this is also done successfully) STEP 3: Fill in the missing runtime values. This I HAVE NOT done successfully. Note that if the request range for the report is feb 2008 to march 2010 and the oldest record for a particular equip_id is say june 2008, it is O.K. for the runtimes to be null (or zeros) for feb - may 2008. I am working with the following query for this step: SELECT equip_id as e_id,year,month, (select top 1 runhours from qry_1_c_One_Record_per_Month a where a.equip_id = e_id order by year,month) FROM qry_1_c_One_Record_per_Month where runhours is null or runhours = 0; UNION SELECT equip_id, year, month, runhours FROM qry_1_c_One_Record_per_Month WHERE .runhours Is Not Null And runhours <> 0 However I clearly can't check the a.equip_id = e_id ... so i don't have anyway to make sure i'm looking at the correct equip_id SUMMARY: So like i said i'm willing to throw away any part, or all of what I tried. Just trying to give everyone a complete picture. I REALLY apreciate ANY help! Thanks so much in advance!

Read the article
Does this language feature already exist?

- by Pindatjuh

I'm currently developing a new language for programming in a continuous environment (compare it to electrical engineering), and I've got some ideas on a certain language construction. Let me explain the feature by explanation and then by definition: x = a U b; Where x is a variable and a and b are other variables (or static values). This works like a union between a and b; no duplicates and no specific order. with(x) { // regular 'with' usage; using the global interpretation of "x" x = 5; // will replace the original definition of "x = a U b;" } with(x = a) { // this code block is executed when the "x" variable // has the "a" variable assigned. All references in // this code-block to "x" are references to "a". So saying: x = 5; // would only change the variable "a". If the variable "a" // later on changes, x still equals to 5, in this fashion: // 'x = a U b U 5;' // '[currentscope] = 5;' // thus, 'a = 5;' } with(x = b) { // same but with "b" } with(x != a) { // here the "x" variable refers to any variable // but "a"; thus saying x = 5; // is equal to the rewriting of // 'x = a U b U 5;' // 'b = 5;' (since it was the scope of this block) } with(x = (a U b)) { // guaranteed that "x" is 'a U b'; interacting with "x" // will interact with both "a" and "b". x = 5; // makes both "a" and "b" equal to 5; also the "x" variable // is updated to contain: // 'x = a U b U 5;' // '[currentscope] = 5;' // 'a U b = 5;' // and thus: 'a = 5; b = 5;'. } // etc. In the above, all code-blocks are executed, but the "scope" changes in each block how x is interpreted. In the first block, x is guaranteed to be a: thus interacting with x inside that block will interact on a. The second and the third code-block are only equal in this situation (because not a: then there only remains b). The last block guarantees that x is at least a or b. Further more; U is not the "bitwise or operator", but I've called it the "and/or"-operator. Its definition is: "U" = "and" U "or" (On my blog, http://cplang.wordpress.com/2009/12/19/binop-and-or/, there is more (mathematical) background information on this operator. It's loosely based on sets. Using different syntax, changed it in this question.) Update: more examples. print = "Hello world!" U "How are you?"; // this will print // both values, but the // order doesn't matter. // 'userkey' is a variable containing a key. with(userkey = "a") { print = userkey; // will only print "a". } with(userkey = ("shift" U "a")) { // pressed both "shift" and the "a" key. print = userkey; // will "print" shift and "a", even // if the user also pressed "ctrl": // the interpretation of "userkey" is changed, // such that it only contains the matched cases. } with((userkey = "shift") U (userkey = "a")) { // same as if-statement above this one, showing the distributivity. } x = 5 U 6 U 7; y = x + x; // will be: // y = (5 U 6 U 7) + (5 U 6 U 7) // = 10 U 11 U 12 U 13 U 14 somewantedkey = "ctrl" U "alt" U "space" with(userkey = somewantedkey) { // must match all elements of "somewantedkey" // (distributed the Boolean equals operated) // thus only executed when all the defined keys are pressed } with(somewantedkey = userkey) { // matches only one of the provided "somewantedkey" // thus when only "space" is pressed, this block is executed. } Update2: more examples and some more context. with(x = (a U b)) { // this } // can be written as with((x = a) U (x = b)) { // this: changing the variable like x = 5; // will be rewritten as: // a = 5 and b = 5 } Some background information: I'm building a language which is "time-independent", like Java is "platform-independant". Everything stated in the language is "as is", and is continuously actively executed. This means; the programmer does not know in which order (unless explicitly stated using constructions) elements are, nor when statements are executed. The language is completely separated from the "time"-concept, i.e. it's continuously executed: with(a < 5) { a++; } // this is a loop-structure; // how and when it's executed isn't known however. with(a) { // everytime the "a" variable changes, this code-block is executed. b = 4; with(b < 3) { // runs only three times. } with(b > 0) { b = b - 1; // runs four times } } Update 3: After pondering on the type of this language feature; it closely resemblances Netbeans Platform's Lookup, where each "with"-statement a synchronized agent is, working on it's specific "filter" of objects. Instead of type-based, this is variable-based (fundamentally quite the same; just a different way of identifiying objects). I greatly thank all of you for providing me with very insightful information and links/hints to great topics I can research. Thanks. I do not know if this construction already exists, so that's my question: does this language feature already exist?

Read the article
Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
Plan Caching and Query Memory Part I – When not to use stored procedure or other plan caching mechanisms like sp_executesql or prepared statement

- by sqlworkshops

The most common performance mistake SQL Server developers make: SQL Server estimates memory requirement for queries at compilation time. This mechanism is fine for dynamic queries that need memory, but not for queries that cache the plan. With dynamic queries the plan is not reused for different set of parameters values / predicates and hence different amount of memory can be estimated based on different set of parameter values / predicates. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union. This article covers Sort with examples. It is recommended to read Plan Caching and Query Memory Part II after this article which covers Hash Match operations. When the plan is cached by using stored procedure or other plan caching mechanisms like sp_executesql or prepared statement, SQL Server estimates memory requirement based on first set of execution parameters. Later when the same stored procedure is called with different set of parameter values, the same amount of memory is used to execute the stored procedure. This might lead to underestimation / overestimation of memory on plan reuse, overestimation of memory might not be a noticeable issue for Sort operations, but underestimation of memory will lead to spill over tempdb resulting in poor performance. This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance. To read additional articles I wrote click here. In most cases it is cheaper to pay for the compilation cost of dynamic queries than huge cost for spill over tempdb, unless memory requirement for a stored procedure does not change significantly based on predicates. The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts Enough theory, let’s see an example where we sort initially 1 month of data and then use the stored procedure to sort 6 months of data. Let’s create a stored procedure that sorts customers by name within certain date range. --Example provided by www.sqlworkshops.com create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as begin declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c where c.CreationDate between @CreationDateFrom and @CreationDateTo order by c.CustomerName option (maxdop 1) end go Let’s execute the stored procedure initially with 1 month date range. set statistics time on go --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-31' go The stored procedure took 48 ms to complete. The stored procedure was granted 6656 KB based on 43199.9 rows being estimated. The estimated number of rows, 43199.9 is similar to actual number of rows 43200 and hence the memory estimation should be ok. There was no Sort Warnings in SQL Profiler. Now let’s execute the stored procedure with 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-06-30' go The stored procedure took 679 ms to complete. The stored procedure was granted 6656 KB based on 43199.9 rows being estimated. The estimated number of rows, 43199.9 is way different from the actual number of rows 259200 because the estimation is based on the first set of parameter value supplied to the stored procedure which is 1 month in our case. This underestimation will lead to sort spill over tempdb, resulting in poor performance. There was Sort Warnings in SQL Profiler. To monitor the amount of data written and read from tempdb, one can execute select num_of_bytes_written, num_of_bytes_read from sys.dm_io_virtual_file_stats(2, NULL) before and after the stored procedure execution, for additional information refer to the webcast: www.sqlworkshops.com/webcasts. Let’s recompile the stored procedure and then let’s first execute the stored procedure with 6 month date range. In a production instance it is not advisable to use sp_recompile instead one should use DBCC FREEPROCCACHE (plan_handle). This is due to locking issues involved with sp_recompile, refer to our webcasts for further details. exec sp_recompile CustomersByCreationDate go --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-06-30' go Now the stored procedure took only 294 ms instead of 679 ms. The stored procedure was granted 26832 KB of memory. The estimated number of rows, 259200 is similar to actual number of rows of 259200. Better performance of this stored procedure is due to better estimation of memory and avoiding sort spill over tempdb. There was no Sort Warnings in SQL Profiler. Now let’s execute the stored procedure with 1 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-31' go The stored procedure took 49 ms to complete, similar to our very first stored procedure execution. This stored procedure was granted more memory (26832 KB) than necessary memory (6656 KB) based on 6 months of data estimation (259200 rows) instead of 1 month of data estimation (43199.9 rows). This is because the estimation is based on the first set of parameter value supplied to the stored procedure which is 6 months in this case. This overestimation did not affect performance, but it might affect performance of other concurrent queries requiring memory and hence overestimation is not recommended. This overestimation might affect performance Hash Match operations, refer to article Plan Caching and Query Memory Part II for further details. Let’s recompile the stored procedure and then let’s first execute the stored procedure with 2 day date range. exec sp_recompile CustomersByCreationDate go --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-02' go The stored procedure took 1 ms. The stored procedure was granted 1024 KB based on 1440 rows being estimated. There was no Sort Warnings in SQL Profiler. Now let’s execute the stored procedure with 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-06-30' go The stored procedure took 955 ms to complete, way higher than 679 ms or 294ms we noticed before. The stored procedure was granted 1024 KB based on 1440 rows being estimated. But we noticed in the past this stored procedure with 6 month date range needed 26832 KB of memory to execute optimally without spill over tempdb. This is clear underestimation of memory and the reason for the very poor performance. There was Sort Warnings in SQL Profiler. Unlike before this was a Multiple pass sort instead of Single pass sort. This occurs when granted memory is too low. Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Other possibility is to use recompile hint or optimize for hint to allocate memory for predefined date range. Let’s recreate the stored procedure with recompile hint. --Example provided by www.sqlworkshops.com drop proc CustomersByCreationDate go create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as begin declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c where c.CreationDate between @CreationDateFrom and @CreationDateTo order by c.CustomerName option (maxdop 1, recompile) end go Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-30' exec CustomersByCreationDate '2001-01-01', '2001-06-30' go The stored procedure took 48ms and 291 ms in line with previous optimal execution times. The stored procedure with 1 month date range has good estimation like before. The stored procedure with 6 month date range also has good estimation and memory grant like before because the query was recompiled with current set of parameter values. The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit. Let’s recreate the stored procedure with optimize for hint of 6 month date range. --Example provided by www.sqlworkshops.com drop proc CustomersByCreationDate go create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as begin declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c where c.CreationDate between @CreationDateFrom and @CreationDateTo order by c.CustomerName option (maxdop 1, optimize for (@CreationDateFrom = '2001-01-01', @CreationDateTo ='2001-06-30')) end go Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-01-30' exec CustomersByCreationDate '2001-01-01', '2001-06-30' go The stored procedure took 48ms and 291 ms in line with previous optimal execution times. The stored procedure with 1 month date range has overestimation of rows and memory. This is because we provided hint to optimize for 6 months of data. The stored procedure with 6 month date range has good estimation and memory grant because we provided hint to optimize for 6 months of data. Let’s execute the stored procedure with 12 month date range using the currently cashed plan for 6 month date range. --Example provided by www.sqlworkshops.com exec CustomersByCreationDate '2001-01-01', '2001-12-31' go The stored procedure took 1138 ms to complete. 2592000 rows were estimated based on optimize for hint value for 6 month date range. Actual number of rows is 524160 due to 12 month date range. The stored procedure was granted enough memory to sort 6 month date range and not 12 month date range, so there will be spill over tempdb. There was Sort Warnings in SQL Profiler. As we see above, optimize for hint cannot guarantee enough memory and optimal performance compared to recompile hint. This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance. Summary: Cached plan might lead to underestimation or overestimation of memory because the memory is estimated based on first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using recompile hint, but that will lead to compilation overhead. However, in most cases it might be ok to pay for compilation rather than spilling sort over tempdb which could be very expensive compared to compilation cost. The other possibility is to use optimize for hint, but in case one sorts more data than hinted by optimize for hint, this will still lead to spill. On the other side there is also the possibility of overestimation leading to unnecessary memory issues for other concurrently executing queries. In case of Hash Match operations, this overestimation of memory might lead to poor performance. When the values used in optimize for hint are archived from the database, the estimation will be wrong leading to worst performance, so one has to exercise caution before using optimize for hint, recompile hint is better in this case. I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts. Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here. Disclaimer and copyright information:This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. R Meyyappan [email protected] LinkedIn: http://at.linkedin.com/in/rmeyyappan

Read the article
Sniffing out SQL Code Smells: Inconsistent use of Symbolic names and Datatypes

- by Phil Factor

It is an awkward feeling. You’ve just delivered a database application that seems to be working fine in production, and you just run a few checks on it. You discover that there is a potential bug that, out of sheer good chance, hasn’t kicked in to produce an error; but it lurks, like a smoking bomb. Worse, maybe you find that the bug has started its evil work of corrupting the data, but in ways that nobody has, so far detected. You investigate, and find the damage. You are somehow going to have to repair it. Yes, it still very occasionally happens to me. It is not a nice feeling, and I do anything I can to prevent it happening. That’s why I’m interested in SQL code smells. SQL Code Smells aren’t necessarily bad practices, but just show you where to focus your attention when checking an application. Sometimes with databases the bugs can be subtle. SQL is rather like HTML: the language does its best to try to carry out your wishes, rather than to be picky about your bugs. Most of the time, this is a great benefit, but not always. One particular place where this can be detrimental is where you have implicit conversion between different data types. Most of the time it is completely harmless but we’re concerned about the occasional time it isn’t. Let’s give an example: String truncation. Let’s give another even more frightening one, rounding errors on assignment to a number of different precision. Each requires a blog-post to explain in detail and I’m not now going to try. Just remember that it is not always a good idea to assign data to variables, parameters or even columns when they aren’t the same datatype, especially if you are relying on implicit conversion to work its magic.For details of the problem and the consequences, see here: SR0014: Data loss might occur when casting from {Type1} to {Type2} . For any experienced Database Developer, this is a more frightening read than a Vampire Story. This is why one of the SQL Code Smells that makes me edgy, in my own or other peoples’ code, is to see parameters, variables and columns that have the same names and different datatypes. Whereas quite a lot of this is perfectly normal and natural, you need to check in case one of two things have gone wrong. Either sloppy naming, or mixed datatypes. Sure it is hard to remember whether you decided that the length of a log entry was 80 or 100 characters long, or the precision of a number. That is why a little check like this I’m going to show you is excellent for tidying up your code before you check it back into source Control! 1/ Checking Parameters only If you were just going to check parameters, you might just do this. It simply groups all the parameters, either input or output, of all the routines (e.g. stored procedures or functions) by their name and checks to see, in the HAVING clause, whether their data types are all the same. If not, it lists all the examples and their origin (the routine) Even this little check can occasionally be scarily revealing. ;WITH userParameter AS ( SELECT c.NAME AS ParameterName, OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName, t.name + ' ' + CASE --we may have to put in the length WHEN t.name IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN c.max_length = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN t.name IN ('nchar', 'nvarchar') THEN c.max_length / 2 ELSE c.max_length END) END + ')' WHEN t.name IN ('decimal', 'numeric') THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')' ELSE '' END --we've done with putting in the length + CASE WHEN XML_collection_ID <> 0 THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE( (SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = c.XML_collection_ID),'NULL') + ')' ELSE '' END AS [DataType] FROM sys.parameters c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys' AND parameter_id>0)SELECT CONVERT(CHAR(80),objectName+'.'+ParameterName),DataType FROM UserParameterWHERE ParameterName IN (SELECT ParameterName FROM UserParameter GROUP BY ParameterName HAVING MIN(Datatype)<>MAX(DataType))ORDER BY ParameterName so, in a very small example here, we have a @ClosingDelimiter variable that is only CHAR(1) when, by the looks of it, it should be up to ten characters long, or even worse, a function that should be a char(1) and seems to let in a string of ten characters. Worth investigating. Then we have a @Comment variable that can't decide whether it is a VARCHAR(2000) or a VARCHAR(MAX) 2/ Columns and Parameters Actually, once we’ve cleared up the mess we’ve made of our parameter-naming in the database we’re inspecting, we’re going to be more interested in listing both columns and parameters. We can do this by modifying the routine to list columns as well as parameters. Because of the slight complexity of creating the string version of the datatypes, we will create a fake table of both columns and parameters so that they can both be processed the same way. After all, we want the datatypes to match Unfortunately, parameters do not expose all the attributes we are interested in, such as whether they are nullable (oh yes, subtle bugs happen if this isn’t consistent for a datatype). We’ll have to leave them out for this check. Voila! A slight modification of the first routine ;WITH userObject AS ( SELECT Name AS DataName,--the actual name of the parameter or column ('@' removed) --and the qualified object name of the routine OBJECT_SCHEMA_NAME(ObjectID) + '.' + OBJECT_NAME(ObjectID) AS ObjectName, --now the harder bit: the definition of the datatype. TypeName + ' ' + CASE --we may have to put in the length. e.g. CHAR (10) WHEN TypeName IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN MaxLength = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN TypeName IN ('nchar', 'nvarchar') THEN MaxLength / 2 ELSE MaxLength END) END + ')' WHEN TypeName IN ('decimal', 'numeric')--a BCD number! THEN '(' + CONVERT(VARCHAR(4), Precision) + ',' + CONVERT(VARCHAR(4), Scale) + ')' ELSE '' END --we've done with putting in the length + CASE WHEN XML_collection_ID <> 0 --tush tush. XML THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE( (SELECT TOP 1 QUOTENAME(ss.name) + '.' + QUOTENAME(sc.Name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = XML_collection_ID),'NULL') + ')' ELSE '' END AS [DataType], DataObjectType FROM (Select t.name AS TypeName, REPLACE(c.name,'@','') AS Name, c.max_length AS MaxLength, c.precision AS [Precision], c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID, is_XML_Document,'P' AS DataobjectType FROM sys.parameters c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID AND parameter_id>0 UNION all Select t.name AS TypeName, c.name AS Name, c.max_length AS MaxLength, c.precision AS [Precision], c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID,is_XML_Document, 'C' AS DataobjectType FROM sys.columns c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys' )f)SELECT CONVERT(CHAR(80),objectName+'.' + CASE WHEN DataobjectType ='P' THEN '@' ELSE '' END + DataName),DataType FROM UserObjectWHERE DataName IN (SELECT DataName FROM UserObject GROUP BY DataName HAVING MIN(Datatype)<>MAX(DataType))ORDER BY DataName Hmm. I can tell you I found quite a few minor issues with the various tabases I tested this on, and found some potential bugs that really leap out at you from the results. Here is the start of the result for AdventureWorks. Yes, AccountNumber is, for some reason, a Varchar(10) in the Customer table. Hmm. odd. Why is a city fifty characters long in that view? The idea of the description of a colour being 256 characters long seems over-ambitious. Go down the list and you'll spot other mistakes. There are no bugs, but just mess. We started out with a listing to examine parameters, then we mixed parameters and columns. Our last listing is for a slightly more in-depth look at table columns. You’ll notice that we’ve delibarately removed the indication of whether a column is persisted, or is an identity column because that gives us false positives for our code smells. If you just want to browse your metadata for other reasons (and it can quite help in some circumstances) then uncomment them! ;WITH userColumns AS ( SELECT c.NAME AS columnName, OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName, REPLACE(t.name + ' ' + CASE WHEN is_computed = 1 THEN ' AS ' + --do DDL for a computed column (SELECT definition FROM sys.computed_columns cc WHERE cc.object_id = c.object_id AND cc.column_ID = c.column_ID) --we may have to put in the length WHEN t.Name IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN c.Max_Length = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN t.Name IN ('nchar', 'nvarchar') THEN c.Max_Length / 2 ELSE c.Max_Length END) END + ')' WHEN t.name IN ('decimal', 'numeric') THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')' ELSE '' END + CASE WHEN c.is_rowguidcol = 1 THEN ' ROWGUIDCOL' ELSE '' END + CASE WHEN XML_collection_ID <> 0 THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE((SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = c.XML_collection_ID), 'NULL') + ')' ELSE '' END + CASE WHEN is_identity = 1 THEN CASE WHEN OBJECTPROPERTY(object_id, 'IsUserTable') = 1 AND COLUMNPROPERTY(object_id, c.name, 'IsIDNotForRepl') = 0 AND OBJECTPROPERTY(object_id, 'IsMSShipped') = 0 THEN '' ELSE ' NOT FOR REPLICATION ' END ELSE '' END + CASE WHEN c.is_nullable = 0 THEN ' NOT NULL' ELSE ' NULL' END + CASE WHEN c.default_object_id <> 0 THEN ' DEFAULT ' + object_Definition(c.default_object_id) ELSE '' END + CASE WHEN c.collation_name IS NULL THEN '' WHEN c.collation_name <> (SELECT collation_name FROM sys.databases WHERE name = DB_NAME()) COLLATE Latin1_General_CI_AS THEN COALESCE(' COLLATE ' + c.collation_name, '') ELSE '' END,' ',' ') AS [DataType]FROM sys.columns c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys')SELECT CONVERT(CHAR(80),objectName+'.'+columnName),DataType FROM UserColumnsWHERE columnName IN (SELECT columnName FROM UserColumns GROUP BY columnName HAVING MIN(Datatype)<>MAX(DataType))ORDER BY columnName If you take a look down the results against Adventureworks, you'll see once again that there are things to investigate, mostly, in the illustration, discrepancies between null and non-null datatypes So I here you ask, what about temporary variables within routines? If ever there was a source of elusive bugs, you'll find it there. Sadly, these temporary variables are not stored in the metadata so we'll have to find a more subtle way of flushing these out, and that will, I'm afraid, have to wait!

Read the article
26 Days: Countdown to Oracle OpenWorld 2012

- by Michael Snow

Welcome to our countdown to Oracle OpenWorld! Oracle OpenWorld 2012 is just around the corner. In less than 26 days, San Francisco will be invaded by an expected 50,000 people from all over the world. Here on the Oracle WebCenter team, we’ve all been working to help make the experience a great one for all our WebCenter customers. For a sneak peak – we’ll be spending this week giving you a teaser of what to look forward to if you are joining us in San Francisco from September 30th through October 4th. We have Oracle WebCenter sessions covering all topics imaginable. Take a look and use the tools we provide to build out your schedule in advance and reserve your seats in your favorite sessions. That gives you plenty of time to plan for your week with us in San Francisco. If unfortunately, your boss denied your request to attend - there are still some ways that you can join in the experience virtually On-Demand. This year - we are expanding even more up North of Market Street and will be taking over Union Square as well. Check out this map of San Francisco to get a sense of how much of a footprint Oracle OpenWorld has grown to this year. With so much to see and so many sessions to learn from - its no wonder that people get excited. Add to that a good mix of fun and all of the possible WebCenter sessions you could attend - you won't want to sleep at all to take full advantage of such an opportunity. We'll also have our annual WebCenter Customer Appreciation reception - stay tuned this week for some more info on registration to make sure you'll be able to join us. If you've been following the America's Cup at all and believe in EXTREME PERFORMANCE you'll definitely want to take a look at this video from last year's OpenWorld Keynote. 12.00 Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Calibri","sans-serif"; mso-ascii- mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi- mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Important OpenWorld Links: Attendee / Presenters Toolkit Oracle Schedule Builder WebCenter Sessions (listed in the catalog under Fusion Middleware as "Portals, Sites, Content, and Collaboration" ) Oracle Music Festival - AMAZING Line up!! Oracle Customer Appreciation Night -LOOK HERE!! Oracle OpenWorld LIVE On-Demand Here are all the WebCenter sessions broken down by day for your viewing pleasure. Monday, October 1st CON8885 - Simplify CRM Engagement with Contextual Collaboration Are your sales teams disconnected and disengaged? Do you want a tool for easily connecting expertise across your organization and providing visibility into the complete sales process? Do you want a way to enhance and retain organization knowledge? Oracle Social Network is the answer. Attend this session to learn how to make CRM easy, effective, and efficient for use across virtual sales teams. Also learn how Oracle Social Network can drive sales force collaboration with natural conversations throughout the sales cycle, promote sales team productivity through purposeful social networking without the noise, and build cross-team knowledge by integrating conversations with CRM and other business applications. CON8268 - Oracle WebCenter Strategy: Engaging Your Customers. Empowering Your Business Oracle WebCenter is a user engagement platform for social business, connecting people and information. Attend this session to learn about the Oracle WebCenter strategy, and understand where Oracle is taking the platform to help companies engage customers, empower employees, and enable partners. Business success starts with ensuring that everyone is engaged with the right people and the right information and can access what they need through the channel of their choice—Web, mobile, or social. Are you giving customers, employees, and partners the best-possible experience? Come learn how you can! ¶ HOL10208 - Add Social Capabilities to Your Enterprise Applications Oracle Social Network enables you to add real-time collaboration capabilities into your enterprise applications, so that conversations can happen directly within your business systems. In this hands-on lab, you will try out the Oracle Social Network product to collaborate with other attendees, using real-time conversations with document sharing capabilities. Next you will embed social capabilities into a sample Web-based enterprise application, using embedded UI components. Experts will also write simple REST-based integrations, using the Oracle Social Network API to programmatically create social interactions. ¶ CON8893 - Improve Employee Productivity with Intuitive and Social Work Environments Social technologies have already transformed the ways customers, employees, partners, and suppliers communicate and stay informed. Forward-thinking organizations today need technologies and infrastructures to help them advance to the next level and integrate social activities with business applications to deliver a user experience that simplifies business processes and enterprise application engagement. Attend this session to hear from an innovative Oracle Social Network customer and learn how you can improve productivity with intuitive and social work environments and empower your employees with innovative social tools to enable contextual access to content and dynamic personalization of solutions. ¶ CON8270 - Oracle WebCenter Content Strategy and Vision Oracle WebCenter provides a strategic content infrastructure for managing documents, images, e-mails, and rich media files. With a single repository, organizations can address any content use case, such as accounts payable, HR onboarding, document management, compliance, records management, digital asset management, or Website management. In this session, learn about future plans for how Oracle WebCenter will address new use cases as well as new integrations with Oracle Fusion Middleware and Oracle Applications, leveraging your investments by making your users more productive and error-free. ¶ CON8269 - Oracle WebCenter Sites Strategy and Vision Oracle’s Web experience management solution, Oracle WebCenter Sites, enables organizations to use the online channel to drive customer acquisition and brand loyalty. It helps marketers and business users easily create and manage contextually relevant, social, interactive online experiences across multiple channels on a global scale. In this session, learn about future plans for how Oracle WebCenter Sites will provide you with the tools, capabilities, and integrations you need in order to continue to address your customers’ evolving requirements for engaging online experiences and keep moving your business forward. ¶ CON8896 - Living with SharePoint SharePoint is a popular platform, but it’s not always the best fit for Oracle customers. In this session, you’ll discover the technical and nontechnical limitations and pitfalls of SharePoint and learn about Oracle alternatives for collaboration, portals, enterprise and Web content management, social computing, and application integration. The presentation shows you how to integrate with SharePoint when business or IT requirements dictate and covers cloud-based (Office 365) and on-premises versions of SharePoint. Presented by a former Microsoft director of SharePoint product management and backed by independent customer research, this session will prepare you to answer the question “Why don’t we just use SharePoint for that?’ the next time it comes up in your organization. ¶ CON7843 - Content-Enabling Enterprise Processes with Oracle WebCenter Organizations today continually strive to automate business processes, reduce costs, and improve efficiency. Many business processes are content-intensive and unstructured, requiring ad hoc collaboration, and distributed in nature, requiring many approvals and generating huge volumes of paper. In this session, learn how Oracle and SYSTIME have partnered to help a customer content-enable its enterprise with Oracle WebCenter Content and Oracle WebCenter Imaging 11g and integrate them with Oracle Applications. ¶ CON6114 - Tape Robotics’ Newest Superhero: Now Fueled by Oracle Software For small, midsize, and rapidly growing businesses that want the most energy-efficient, scalable storage infrastructure to meet their rapidly growing data demands, Oracle’s most recent addition to its award-winning tape portfolio leverages several pieces of Oracle software. With Oracle Linux, Oracle WebLogic, and Oracle Fusion Middleware tools, the library achieves a higher level of usability than previous products while offering customers a familiar interface for management, plus ease of use. This session examines the competitive advantages of the tape library and how Oracle software raises customer satisfaction. Learn how the combination of Oracle engineered systems, Oracle Secure Backup, and Oracle’s StorageTek tape libraries provide end-to-end coverage of your data. ¶ CON9437 - Mobile Access Management With more than five billion mobile devices on the planet and an increasing number of users using their own devices to access corporate data and applications, securely extending identity management to mobile devices has become a hot topic. This session focuses on how to extend your existing identity management infrastructure and policies to securely and seamlessly enable mobile user access. CON7815 - Customer Experience Online in Cloud: Oracle WebCenter Sites, Oracle ATG Apps, Oracle Exalogic Oracle WebCenter Sites and Oracle’s ATG product line together can provide a compelling marketing and e-commerce experience. When you couple them with the extreme performance of Oracle Exalogic, you’ll see unmatched scalability that provides you with a true cloud-based solution. In this session, you’ll learn how running Oracle WebCenter Sites and ATG applications on Oracle Exalogic delivers both a private and a public cloud experience. Find out what it takes to get these systems working together and delivering engaging Web experiences. Even if you aren’t considering Oracle Exalogic today, the rich Web experience of Oracle WebCenter, paired with the depth of the ATG product line, can provide your business full support, from merchandising through sale completion. ¶ CON8271 - Oracle WebCenter Portal Strategy and Vision To innovate and keep a competitive edge, organizations need to leverage the power of agile and responsive Web applications. Oracle WebCenter Portal enables you to do just that, by delivering intuitive user experiences for enterprise applications to drive innovation with composite applications and mashups. Attend this session to learn firsthand from customers how Oracle WebCenter Portal extends the value of existing enterprise applications, business processes, and content; delivers a superior business user experience; and maximizes limited IT resources. ¶ CON8880 - The Connected Customer Experience Begins with the Online Channel There’s a lot of talk these days about how to connect the customer journey across various touchpoints—from Websites and e-commerce to call centers and in-store—to provide experiences that are more relevant and engaging and ultimately gain competitive edge. Doing it all at once isn’t a realistic objective, so where do you start? Come to this session, and hear about three steps you can take that can help you begin your journey toward delivering the connected customer experience. You’ll hear how Oracle now has an integrated digital marketing platform for your corporate Website, your e-commerce site, your self-service portal, and your marketing and loyalty campaigns, and you’ll learn what you can do today to begin executing on your customer experience initiatives. ¶ GEN11451 - General Session: Building Mobile Applications with Oracle Cloud With the prevalence of smart mobile devices, companies are facing an increased demand to provide access to data and applications from new channels. However, developing applications for mobile devices poses some unique challenges. Come to this session to learn how Oracle addresses these challenges, offering a simpler way to develop and deploy cross-device mobile applications. See how Oracle Cloud enables you to access applications, data, and services from mobile channels in an easier way. CON8272 - Oracle Social Network Strategy and Vision One key way of increasing employee productivity is by bringing people, processes, and information together—providing new social capabilities to enable business users to quickly correspond and collaborate on business activities. Oracle WebCenter provides a user engagement platform with social and collaborative technologies to empower business users to focus on their key business processes, applications, and content in the context of their role and process. Attend this session to hear how the latest social capabilities in Oracle Social Network are enabling organizations to transform themselves into social businesses. --- Tuesday, October 2nd HOL10194 - Enterprise Content Management Simplified: Oracle WebCenter Content’s Next-Generation UI Regardless of the nature of your business, unstructured content underpins many of its daily functions. Whether you are working with traditional presentations, spreadsheets, or text documents—or even with digital assets such as images and multimedia files—your content needs to be accessible and manageable in convenient and intuitive ways to make working with the content easier. Additionally, you need the ability to easily share documents with coworkers to facilitate a collaborative working environment. Come to this session to see how Oracle WebCenter Content’s next-generation user interface helps modern knowledge workers easily manage personal and enterprise documents in a collaborative environment.¶ CON8877 - Develop a Mobile Strategy with Oracle WebCenter: Engage Customers, Employees, and Partners Mobile technology has gone from nice-to-have to a cornerstone of user engagement. Mobile access enables users to have information available at their fingertips, enabling them to take action the moment they make a decision, interact in the moment of convenience, and take advantage of new service offerings in their preferred channels. All your employees have your mobile applications in their pocket; now what are you going to do? It is a critical step for companies to think through what their employees, customers, and partners really need on their devices. Attend this session to see how Oracle WebCenter enables you to better engage your customers, employees, and partners by providing a unified experience across multiple channels. ¶ CON9447 - Enabling Access for Hundreds of Millions of Users How do you grow your business by identifying, authenticating, authorizing, and federating users on the Web, leveraging social identity and the open source OAuth protocol? How do you scale your access management solution to support hundreds of millions of users? With social identity support out of the box, Oracle’s access management solution is also benchmarked for 250-million-user deployment according to real-world customer scenarios. In this session, you will learn about the social identity capability and the 250-million-user benchmark testing of Oracle Access Manager and Oracle Adaptive Access Manager running on Oracle Exalogic and Oracle Exadata. ¶ HOL10207 - Build an Intranet Portal with Oracle WebCenter In this hands-on lab, you’ll work with Oracle WebCenter Portal and Oracle WebCenter Content to build out an enterprise portal that maximizes the productivity of teams and individual contributors. Using browser-based tools, you’ll manage site resources such as page styles, templates, and navigation. You’ll edit content stored in Oracle WebCenter Content directly from your portal. You’ll also experience the latest features that promote collaboration, social networking, and personal productivity. ¶ CON2906 - Get Proactive: Best Practices for Maintaining Oracle Fusion Middleware You chose Oracle Fusion Middleware products to help your organization deliver superior business results. Now learn how to take full advantage of your software with all the great tools, resources, and product updates you’re entitled to through Oracle Support. In this session, Oracle product experts provide proven best practices to help you work more efficiently, plan and prepare for upgrades and patching more effectively, and manage risk. Topics include configuration management tools, remote diagnostics, My Oracle Support Community, and My Oracle Support Lifecycle Advisors. New users and Oracle Fusion Middleware experts alike are guaranteed to leave with fresh ideas and practical, easy-to-implement next steps. ¶ CON8878 - Oracle WebCenter’s Cloud Strategy: From Social and Platform Services to Mashups Cloud computing represents a paradigm shift in how we build applications, automate processes, collaborate, and share and in how we secure our enterprise. Additionally, as you adopt cloud-based services in your organization, it’s likely that you will still have many critical on-premises applications running. With these mixed environments, multiple user interfaces, different security, and multiple datasources and content sources, how do you start evolving your strategy to account for these challenges? Oracle WebCenter offers a complete array of technologies enabling you to solve these challenges and prepare you for the cloud. Attend this session to learn how you can use Oracle WebCenter in the cloud as well as create on-premises and cloud application mash-ups. ¶ CON8901 - Optimize Enterprise Business Processes with Oracle WebCenter and Oracle BPM Do you have business processes that span multiple applications? Are you grappling with how to have visibility across these business processes; how to manage content that is associated with these processes; and, most importantly, how to model and optimize these business processes? Attend this session to hear how Oracle WebCenter and Oracle Business Process Management provide a unique set of integrated solutions to provide a composite application dashboard across these business processes and offer a solution for content-centric business processes. ¶ CON8883 - Deliver Engaging Interfaces to Oracle Applications with Oracle WebCenter Critical business processes live within enterprise applications, and application users need to manage and execute these processes as effectively as possible. Oracle provides a comprehensive user engagement platform to increase user productivity and optimize overall processes within Oracle Applications—Oracle E-Business Suite and Oracle’s Siebel, PeopleSoft, and JD Edwards product families—and third-party applications. Attend this session to learn how you can integrate these applications with Oracle WebCenter to deliver composite application dashboards to your end users—whether they are your customers, partners, or employees—for enhanced usability and Web 2.0–enabled enterprise portals.¶ Wednesday, October 3rd CON8895 - Future-Ready Intranets: How Aramark Re-engineered the Application Landscape There are essential techniques and technologies you can use to deliver employee portals that garner higher productivity, improve business efficiency, and increase user engagement. Attend this session to learn how you can leverage Oracle WebCenter Portal as a user engagement platform for bringing together business process management, enterprise content management, and business intelligence into a highly relevant and integrated experience. Hear how Aramark has leveraged Oracle WebCenter Portal and Oracle WebCenter Content to deliver a unified workspace providing simpler navigation and processing, consolidation of tools, easy access to information, integrated search, and single sign-on. ¶ CON8886 - Content Consolidation: Save Money, Increase Efficiency, and Eliminate Silos Organizations are looking for ways to save money and be more efficient. With content in many different places, it’s difficult to know where to look for a document and whether the document is the most current version. With Oracle WebCenter, content can be consolidated into one best-of-breed repository that is secure, scalable, and integrated with your business processes and applications. Users can find the content they need, where they need it, and ensure that it is the right content. This session covers content challenges that affect your business; content consolidation that can lead to savings in storage and administration costs and can lower risks; and how companies are realizing savings. ¶ CON8911 - Improve Online Experiences for Customers and Partners with Self-Service Portals Are you able to provide your customers and partners an easy-to-use online self-service experience? Are you processing high-volume transactions and struggling with call center bottlenecks or back-end systems that won’t integrate, causing order delays and customer frustration? Are you looking to target content such as product and service offerings to your end users? This session shares approaches to providing targeted delivery as well as strategies and best practices for transforming your business by providing an intuitive user experience for your customers and partners. ¶ CON6156 - Top 10 Ways to Integrate Oracle WebCenter Content This session covers 10 common ways to integrate Oracle WebCenter Content with other enterprise applications and middleware. It discusses out-of-the-box modules that provide expanded features in Oracle WebCenter Content—such as enterprise search, SOA, and BPEL—as well as developer tools you can use to create custom integrations. The presentation also gives guidance on which integration option may work best in your environment. ¶ HOL10207 - Build an Intranet Portal with Oracle WebCenter In this hands-on lab, you’ll work with Oracle WebCenter Portal and Oracle WebCenter Content to build out an enterprise portal that maximizes the productivity of teams and individual contributors. Using browser-based tools, you’ll manage site resources such as page styles, templates, and navigation. You’ll edit content stored in Oracle WebCenter Content directly from your portal. You’ll also experience the latest features that promote collaboration, social networking, and personal productivity. ¶ CON7817 - Migration to Oracle WebCenter Imaging 11g Customers today continually strive to automate business processes, reduce costs, and improve efficiency. The accounts payable process—which is often distributed in nature, requires many approvals, and generates huge volumes of paper invoices—is automated by many customers. In this session, learn how Oracle and SYSTIME have partnered to help a customer migrate its existing Oracle Imaging and Process Management Release 7.6 to the latest Oracle WebCenter Imaging 11g and integrate it with Oracle’s JD Edwards family of products. ¶ CON8910 - How to Engage Customers Across Web, Mobile, and Social Channels Whether on desktops at the office, on tablets at home, or on mobile phones when on the go, today’s customers are always connected. To engage today’s customers, you need to make the online customer experience connected and consistent across a host of devices and multiple channels, including Web, mobile, and social networks. Managing this multichannel environment can result in lots of headaches without the right tools. Attend this session to learn how Oracle WebCenter Sites solves the challenge of multichannel customer engagement. ¶ HOL10206 - Oracle WebCenter Sites 11g: Transforming the Content Contributor Experience Oracle WebCenter Sites 11g makes it easy for marketers and business users to contribute to and manage Websites with the new visual, contextual, and intuitive Web authoring interface. In this hands-on lab, you will create and manage content for a sports-themed Website, using many of the new and enhanced features of the 11g release. ¶ CON8900 - Building Next-Generation Portals: An Interactive Customer Panel Discussion Social and collaborative technologies have changed how people interact, learn, and collaborate, and providing a modern, social Web presence is imperative to remain competitive in today’s market. Can your business benefit from a more collaborative and interactive portal environment for employees, customers, and partners? Attend this session to hear from Oracle WebCenter Portal customers as they share their strategies and best practices for providing users with a modern experience that adapts to their needs and includes personalized access to content in context. The panel also addresses how customers have benefited from creating next-generation portals by migrating from older portal technologies to Oracle WebCenter Portal. ¶ CON9625 - Taking Control of Oracle WebCenter Security Organizations are increasingly looking to extend their Oracle WebCenter portal for social business, to serve external users and provide seamless access to the right information. In particular, many organizations are extending Oracle WebCenter in a business-to-business scenario requiring secure identification and authorization of business partners and their users. This session focuses on how customers are leveraging, securing, and providing access control to Oracle WebCenter portal and mobile solutions. You will learn best practices and hear real-world examples of how to provide flexible and granular access control for Oracle WebCenter deployments, using Oracle Platform Security Services and Oracle Access Management Suite product offerings. ¶ CON8891 - Extending Social into Enterprise Applications and Business Processes Oracle Social Network is an extensible social platform that enables contextual collaboration within enterprise applications and business processes, providing relevant data from across various enterprise systems in one place. Attend this session to see how an Oracle Social Network customer is integrating multiple applications—such as CRM, HCM, and business processes—into Oracle Social Network and Oracle WebCenter to enable individuals and teams to solve complex cross-organizational business problems more effectively by utilizing the social enterprise. ¶ Thursday, October 4th CON8899 - Becoming a Social Business: Stories from the Front Lines of Change What does it really mean to be a social business? How can you change our organization to embrace social approaches? What pitfalls do you need to avoid? In this lively panel discussion, customer and industry thought leaders in social business explore these topics and more as they share their stories of the good, the bad, and the ugly that can happen when embracing social methods and technologies to improve business success. Using moderated questions and open Q&A from the audience, the panel discusses vital topics such as the critical factors for success, the major issues to avoid, how to gain senior executive support for social efforts, how to handle undesired behavior, and how to measure business impact. It takes a thought-provoking look at becoming a social business from the inside. ¶ CON6851 - Oracle WebCenter and Oracle Business Intelligence Enterprise Edition to Create Vendor Portals Large manufacturers of grocery items routinely find themselves depending on the inventory management expertise of their wholesalers and distributors. Inventory costs can be managed more efficiently by the manufacturers if they have better insight into the inventory levels of items carried by their distributors. This creates a unique opportunity for distributors and wholesalers to leverage this knowledge into a revenue-generating subscription service. Oracle Business Intelligence Enterprise Edition and Oracle WebCenter Portal play a key part in enabling creation of business-managed business intelligence portals for vendors. This session discusses one customer that implemented this by leveraging Oracle WebCenter and Oracle Business Intelligence Enterprise Edition. ¶ CON8879 - Provide a Personalized and Consistent Customer Experience in Your Websites and Portals Your customers engage with your company online in different ways throughout their journey—from prospecting by acquiring information on your corporate Website to transacting through self-service applications on your customer portal—and then the cycle begins again when they look for new products and services. Ensuring that the customer experience is consistent and personalized across online properties—from branding and content to interactions and transactions—can be a daunting task. Oracle WebCenter enables you to speak and interact with your customers with one voice across your Websites and portals by providing an integrated platform for delivery of self-service and engagement that unifies and personalizes the online experience. Learn more in this session. ¶ CON8898 - Land Mines, Potholes, and Dirt Roads: Navigating the Way to ECM Nirvana Ten years ago, people were predicting that by this time in history, we’d be some kind of utopian paperless society. As we all know, we’re not there yet, but are we getting closer? What is keeping companies from driving down the road to enterprise content management bliss? Most people understand that using ECM as a central platform enables organizations to expedite document-centric processes, but most business processes in organizations are still heavily paper-based. Many of these processes could be automated and improved with an ECM platform infrastructure. In this panel discussion, you’ll hear from Oracle WebCenter customers that have already solved some of these challenges as they share their strategies for success and roads to avoid along your journey. ¶ CON8908 - Oracle WebCenter Portal: Creating and Using Content Presenter Templates Oracle WebCenter Portal applications use task flows to display and integrate content stored in the Oracle WebCenter Content server. Among the most flexible task flows is Content Presenter, which renders various types of content on an Oracle WebCenter Portal page. Although Oracle WebCenter Portal comes with a set of predefined Content Presenter templates, developers can create their own templates for specific rendering needs. This session shows the lifecycle of developing Content Presenter task flows, including how to create, package, import, modify at runtime, and use such templates. In addition to simple examples with Oracle Application Development Framework (Oracle ADF) UI elements to render the content, it shows how to use other UI technologies, CSS files, and JavaScript libraries. ¶ CON8897 - Using Web Experience Management to Drive Online Marketing Success Every year, the online channel becomes more imperative for driving organizational top-line revenue, but for many companies, mastering how to best market their products and services in a fast-evolving online world with high customer expectations for personalized experiences can be a complex proposition. Come to this panel discussion, and hear directly from online marketers how they are succeeding today by using Web experience management to drive marketing success, using capabilities such as targeting and optimization, user-generated content, mobile site publishing, and site visitor personalization to deliver engaging online experiences. ¶ CON8892 - Oracle’s Journey to Social Business Social business is a revolution, one that is causing rapidly accelerating change in how companies and customers engage with one another and how employees work together. Oracle’s goal in becoming a social business is to create a socially connected organization in which working collaboratively across geographical locations, lines of business, and management chains is second nature, enabling innovative solutions to business challenges. We can achieve this by connecting the right people, finding the right content, communicating with the right people, collaborating at the right time, and building the right communities in the right context—all ready in the CLOUD. Attend this session to see how Oracle is transforming itself into a social business. ¶ ------------ If you've read all the way to the end here - we are REALLY looking forward to seeing you in San Francisco.

Read the article
New features of C# 4.0

This article covers New features of C# 4.0. Article has been divided into below sections. Introduction. Dynamic Lookup. Named and Optional Arguments. Features for COM interop. Variance. Relationship with Visual Basic. Resources. Other interested readings… 22 New Features of Visual Studio 2008 for .NET Professionals 50 New Features of SQL Server 2008 IIS 7.0 New features Introduction It is now close to a year since Microsoft Visual C# 3.0 shipped as part of Visual Studio 2008. In the VS Managed Languages team we are hard at work on creating the next version of the language (with the unsurprising working title of C# 4.0), and this document is a first public description of the planned language features as we currently see them. Please be advised that all this is in early stages of production and is subject to change. Part of the reason for sharing our plans in public so early is precisely to get the kind of feedback that will cause us to improve the final product before it rolls out. Simultaneously with the publication of this whitepaper, a first public CTP (community technology preview) of Visual Studio 2010 is going out as a Virtual PC image for everyone to try. Please use it to play and experiment with the features, and let us know of any thoughts you have. We ask for your understanding and patience working with very early bits, where especially new or newly implemented features do not have the quality or stability of a final product. The aim of the CTP is not to give you a productive work environment but to give you the best possible impression of what we are working on for the next release. The CTP contains a number of walkthroughs, some of which highlight the new language features of C# 4.0. Those are excellent for getting a hands-on guided tour through the details of some common scenarios for the features. You may consider this whitepaper a companion document to these walkthroughs, complementing them with a focus on the overall language features and how they work, as opposed to the specifics of the concrete scenarios. C# 4.0 The major theme for C# 4.0 is dynamic programming. Increasingly, objects are “dynamic” in the sense that their structure and behavior is not captured by a static type, or at least not one that the compiler knows about when compiling your program. Some examples include a. objects from dynamic programming languages, such as Python or Ruby b. COM objects accessed through IDispatch c. ordinary .NET types accessed through reflection d. objects with changing structure, such as HTML DOM objects While C# remains a statically typed language, we aim to vastly improve the interaction with such objects. A secondary theme is co-evolution with Visual Basic. Going forward we will aim to maintain the individual character of each language, but at the same time important new features should be introduced in both languages at the same time. They should be differentiated more by style and feel than by feature set. The new features in C# 4.0 fall into four groups: Dynamic lookup Dynamic lookup allows you to write method, operator and indexer calls, property and field accesses, and even object invocations which bypass the C# static type checking and instead gets resolved at runtime. Named and optional parameters Parameters in C# can now be specified as optional by providing a default value for them in a member declaration. When the member is invoked, optional arguments can be omitted. Furthermore, any argument can be passed by parameter name instead of position. COM specific interop features Dynamic lookup as well as named and optional parameters both help making programming against COM less painful than today. On top of that, however, we are adding a number of other small features that further improve the interop experience. Variance It used to be that an IEnumerable<string> wasn’t an IEnumerable<object>. Now it is – C# embraces type safe “co-and contravariance” and common BCL types are updated to take advantage of that. Dynamic Lookup Dynamic lookup allows you a unified approach to invoking things dynamically. With dynamic lookup, when you have an object in your hand you do not need to worry about whether it comes from COM, IronPython, the HTML DOM or reflection; you just apply operations to it and leave it to the runtime to figure out what exactly those operations mean for that particular object. This affords you enormous flexibility, and can greatly simplify your code, but it does come with a significant drawback: Static typing is not maintained for these operations. A dynamic object is assumed at compile time to support any operation, and only at runtime will you get an error if it wasn’t so. Oftentimes this will be no loss, because the object wouldn’t have a static type anyway, in other cases it is a tradeoff between brevity and safety. In order to facilitate this tradeoff, it is a design goal of C# to allow you to opt in or opt out of dynamic behavior on every single call. The dynamic type C# 4.0 introduces a new static type called dynamic. When you have an object of type dynamic you can “do things to it” that are resolved only at runtime: dynamic d = GetDynamicObject(…); d.M(7); The C# compiler allows you to call a method with any name and any arguments on d because it is of type dynamic. At runtime the actual object that d refers to will be examined to determine what it means to “call M with an int” on it. The type dynamic can be thought of as a special version of the type object, which signals that the object can be used dynamically. It is easy to opt in or out of dynamic behavior: any object can be implicitly converted to dynamic, “suspending belief” until runtime. Conversely, there is an “assignment conversion” from dynamic to any other type, which allows implicit conversion in assignment-like constructs: dynamic d = 7; // implicit conversion int i = d; // assignment conversion Dynamic operations Not only method calls, but also field and property accesses, indexer and operator calls and even delegate invocations can be dispatched dynamically: dynamic d = GetDynamicObject(…); d.M(7); // calling methods d.f = d.P; // getting and settings fields and properties d[“one”] = d[“two”]; // getting and setting thorugh indexers int i = d + 3; // calling operators string s = d(5,7); // invoking as a delegate The role of the C# compiler here is simply to package up the necessary information about “what is being done to d”, so that the runtime can pick it up and determine what the exact meaning of it is given an actual object d. Think of it as deferring part of the compiler’s job to runtime. The result of any dynamic operation is itself of type dynamic. Runtime lookup At runtime a dynamic operation is dispatched according to the nature of its target object d: COM objects If d is a COM object, the operation is dispatched dynamically through COM IDispatch. This allows calling to COM types that don’t have a Primary Interop Assembly (PIA), and relying on COM features that don’t have a counterpart in C#, such as indexed properties and default properties. Dynamic objects If d implements the interface IDynamicObject d itself is asked to perform the operation. Thus by implementing IDynamicObject a type can completely redefine the meaning of dynamic operations. This is used intensively by dynamic languages such as IronPython and IronRuby to implement their own dynamic object models. It will also be used by APIs, e.g. by the HTML DOM to allow direct access to the object’s properties using property syntax. Plain objects Otherwise d is a standard .NET object, and the operation will be dispatched using reflection on its type and a C# “runtime binder” which implements C#’s lookup and overload resolution semantics at runtime. This is essentially a part of the C# compiler running as a runtime component to “finish the work” on dynamic operations that was deferred by the static compiler. Example Assume the following code: dynamic d1 = new Foo(); dynamic d2 = new Bar(); string s; d1.M(s, d2, 3, null); Because the receiver of the call to M is dynamic, the C# compiler does not try to resolve the meaning of the call. Instead it stashes away information for the runtime about the call. This information (often referred to as the “payload”) is essentially equivalent to: “Perform an instance method call of M with the following arguments: 1. a string 2. a dynamic 3. a literal int 3 4. a literal object null” At runtime, assume that the actual type Foo of d1 is not a COM type and does not implement IDynamicObject. In this case the C# runtime binder picks up to finish the overload resolution job based on runtime type information, proceeding as follows: 1. Reflection is used to obtain the actual runtime types of the two objects, d1 and d2, that did not have a static type (or rather had the static type dynamic). The result is Foo for d1 and Bar for d2. 2. Method lookup and overload resolution is performed on the type Foo with the call M(string,Bar,3,null) using ordinary C# semantics. 3. If the method is found it is invoked; otherwise a runtime exception is thrown. Overload resolution with dynamic arguments Even if the receiver of a method call is of a static type, overload resolution can still happen at runtime. This can happen if one or more of the arguments have the type dynamic: Foo foo = new Foo(); dynamic d = new Bar(); var result = foo.M(d); The C# runtime binder will choose between the statically known overloads of M on Foo, based on the runtime type of d, namely Bar. The result is again of type dynamic. The Dynamic Language Runtime An important component in the underlying implementation of dynamic lookup is the Dynamic Language Runtime (DLR), which is a new API in .NET 4.0. The DLR provides most of the infrastructure behind not only C# dynamic lookup but also the implementation of several dynamic programming languages on .NET, such as IronPython and IronRuby. Through this common infrastructure a high degree of interoperability is ensured, but just as importantly the DLR provides excellent caching mechanisms which serve to greatly enhance the efficiency of runtime dispatch. To the user of dynamic lookup in C#, the DLR is invisible except for the improved efficiency. However, if you want to implement your own dynamically dispatched objects, the IDynamicObject interface allows you to interoperate with the DLR and plug in your own behavior. This is a rather advanced task, which requires you to understand a good deal more about the inner workings of the DLR. For API writers, however, it can definitely be worth the trouble in order to vastly improve the usability of e.g. a library representing an inherently dynamic domain. Open issues There are a few limitations and things that might work differently than you would expect. · The DLR allows objects to be created from objects that represent classes. However, the current implementation of C# doesn’t have syntax to support this. · Dynamic lookup will not be able to find extension methods. Whether extension methods apply or not depends on the static context of the call (i.e. which using clauses occur), and this context information is not currently kept as part of the payload. · Anonymous functions (i.e. lambda expressions) cannot appear as arguments to a dynamic method call. The compiler cannot bind (i.e. “understand”) an anonymous function without knowing what type it is converted to. One consequence of these limitations is that you cannot easily use LINQ queries over dynamic objects: dynamic collection = …; var result = collection.Select(e => e + 5); If the Select method is an extension method, dynamic lookup will not find it. Even if it is an instance method, the above does not compile, because a lambda expression cannot be passed as an argument to a dynamic operation. There are no plans to address these limitations in C# 4.0. Named and Optional Arguments Named and optional parameters are really two distinct features, but are often useful together. Optional parameters allow you to omit arguments to member invocations, whereas named arguments is a way to provide an argument using the name of the corresponding parameter instead of relying on its position in the parameter list. Some APIs, most notably COM interfaces such as the Office automation APIs, are written specifically with named and optional parameters in mind. Up until now it has been very painful to call into these APIs from C#, with sometimes as many as thirty arguments having to be explicitly passed, most of which have reasonable default values and could be omitted. Even in APIs for .NET however you sometimes find yourself compelled to write many overloads of a method with different combinations of parameters, in order to provide maximum usability to the callers. Optional parameters are a useful alternative for these situations. Optional parameters A parameter is declared optional simply by providing a default value for it: public void M(int x, int y = 5, int z = 7); Here y and z are optional parameters and can be omitted in calls: M(1, 2, 3); // ordinary call of M M(1, 2); // omitting z – equivalent to M(1, 2, 7) M(1); // omitting both y and z – equivalent to M(1, 5, 7) Named and optional arguments C# 4.0 does not permit you to omit arguments between commas as in M(1,,3). This could lead to highly unreadable comma-counting code. Instead any argument can be passed by name. Thus if you want to omit only y from a call of M you can write: M(1, z: 3); // passing z by name or M(x: 1, z: 3); // passing both x and z by name or even M(z: 3, x: 1); // reversing the order of arguments All forms are equivalent, except that arguments are always evaluated in the order they appear, so in the last example the 3 is evaluated before the 1. Optional and named arguments can be used not only with methods but also with indexers and constructors. Overload resolution Named and optional arguments affect overload resolution, but the changes are relatively simple: A signature is applicable if all its parameters are either optional or have exactly one corresponding argument (by name or position) in the call which is convertible to the parameter type. Betterness rules on conversions are only applied for arguments that are explicitly given – omitted optional arguments are ignored for betterness purposes. If two signatures are equally good, one that does not omit optional parameters is preferred. M(string s, int i = 1); M(object o); M(int i, string s = “Hello”); M(int i); M(5); Given these overloads, we can see the working of the rules above. M(string,int) is not applicable because 5 doesn’t convert to string. M(int,string) is applicable because its second parameter is optional, and so, obviously are M(object) and M(int). M(int,string) and M(int) are both better than M(object) because the conversion from 5 to int is better than the conversion from 5 to object. Finally M(int) is better than M(int,string) because no optional arguments are omitted. Thus the method that gets called is M(int). Features for COM interop Dynamic lookup as well as named and optional parameters greatly improve the experience of interoperating with COM APIs such as the Office Automation APIs. In order to remove even more of the speed bumps, a couple of small COM-specific features are also added to C# 4.0. Dynamic import Many COM methods accept and return variant types, which are represented in the PIAs as object. In the vast majority of cases, a programmer calling these methods already knows the static type of a returned object from context, but explicitly has to perform a cast on the returned value to make use of that knowledge. These casts are so common that they constitute a major nuisance. In order to facilitate a smoother experience, you can now choose to import these COM APIs in such a way that variants are instead represented using the type dynamic. In other words, from your point of view, COM signatures now have occurrences of dynamic instead of object in them. This means that you can easily access members directly off a returned object, or you can assign it to a strongly typed local variable without having to cast. To illustrate, you can now say excel.Cells[1, 1].Value = "Hello"; instead of ((Excel.Range)excel.Cells[1, 1]).Value2 = "Hello"; and Excel.Range range = excel.Cells[1, 1]; instead of Excel.Range range = (Excel.Range)excel.Cells[1, 1]; Compiling without PIAs Primary Interop Assemblies are large .NET assemblies generated from COM interfaces to facilitate strongly typed interoperability. They provide great support at design time, where your experience of the interop is as good as if the types where really defined in .NET. However, at runtime these large assemblies can easily bloat your program, and also cause versioning issues because they are distributed independently of your application. The no-PIA feature allows you to continue to use PIAs at design time without having them around at runtime. Instead, the C# compiler will bake the small part of the PIA that a program actually uses directly into its assembly. At runtime the PIA does not have to be loaded. Omitting ref Because of a different programming model, many COM APIs contain a lot of reference parameters. Contrary to refs in C#, these are typically not meant to mutate a passed-in argument for the subsequent benefit of the caller, but are simply another way of passing value parameters. It therefore seems unreasonable that a C# programmer should have to create temporary variables for all such ref parameters and pass these by reference. Instead, specifically for COM methods, the C# compiler will allow you to pass arguments by value to such a method, and will automatically generate temporary variables to hold the passed-in values, subsequently discarding these when the call returns. In this way the caller sees value semantics, and will not experience any side effects, but the called method still gets a reference. Open issues A few COM interface features still are not surfaced in C#. Most notably these include indexed properties and default properties. As mentioned above these will be respected if you access COM dynamically, but statically typed C# code will still not recognize them. There are currently no plans to address these remaining speed bumps in C# 4.0. Variance An aspect of generics that often comes across as surprising is that the following is illegal: IList<string> strings = new List<string>(); IList<object> objects = strings; The second assignment is disallowed because strings does not have the same element type as objects. There is a perfectly good reason for this. If it were allowed you could write: objects[0] = 5; string s = strings[0]; Allowing an int to be inserted into a list of strings and subsequently extracted as a string. This would be a breach of type safety. However, there are certain interfaces where the above cannot occur, notably where there is no way to insert an object into the collection. Such an interface is IEnumerable<T>. If instead you say: IEnumerable<object> objects = strings; There is no way we can put the wrong kind of thing into strings through objects, because objects doesn’t have a method that takes an element in. Variance is about allowing assignments such as this in cases where it is safe. The result is that a lot of situations that were previously surprising now just work. Covariance In .NET 4.0 the IEnumerable<T> interface will be declared in the following way: public interface IEnumerable<out T> : IEnumerable { IEnumerator<T> GetEnumerator(); } public interface IEnumerator<out T> : IEnumerator { bool MoveNext(); T Current { get; } } The “out” in these declarations signifies that the T can only occur in output position in the interface – the compiler will complain otherwise. In return for this restriction, the interface becomes “covariant” in T, which means that an IEnumerable<A> is considered an IEnumerable<B> if A has a reference conversion to B. As a result, any sequence of strings is also e.g. a sequence of objects. This is useful e.g. in many LINQ methods. Using the declarations above: var result = strings.Union(objects); // succeeds with an IEnumerable<object> This would previously have been disallowed, and you would have had to to some cumbersome wrapping to get the two sequences to have the same element type. Contravariance Type parameters can also have an “in” modifier, restricting them to occur only in input positions. An example is IComparer<T>: public interface IComparer<in T> { public int Compare(T left, T right); } The somewhat baffling result is that an IComparer<object> can in fact be considered an IComparer<string>! It makes sense when you think about it: If a comparer can compare any two objects, it can certainly also compare two strings. This property is referred to as contravariance. A generic type can have both in and out modifiers on its type parameters, as is the case with the Func<…> delegate types: public delegate TResult Func<in TArg, out TResult>(TArg arg); Obviously the argument only ever comes in, and the result only ever comes out. Therefore a Func<object,string> can in fact be used as a Func<string,object>. Limitations Variant type parameters can only be declared on interfaces and delegate types, due to a restriction in the CLR. Variance only applies when there is a reference conversion between the type arguments. For instance, an IEnumerable<int> is not an IEnumerable<object> because the conversion from int to object is a boxing conversion, not a reference conversion. Also please note that the CTP does not contain the new versions of the .NET types mentioned above. In order to experiment with variance you have to declare your own variant interfaces and delegate types. COM Example Here is a larger Office automation example that shows many of the new C# features in action. using System; using System.Diagnostics; using System.Linq; using Excel = Microsoft.Office.Interop.Excel; using Word = Microsoft.Office.Interop.Word; class Program { static void Main(string[] args) { var excel = new Excel.Application(); excel.Visible = true; excel.Workbooks.Add(); // optional arguments omitted excel.Cells[1, 1].Value = "Process Name"; // no casts; Value dynamically excel.Cells[1, 2].Value = "Memory Usage"; // accessed var processes = Process.GetProcesses() .OrderByDescending(p => p.WorkingSet) .Take(10); int i = 2; foreach (var p in processes) { excel.Cells[i, 1].Value = p.ProcessName; // no casts excel.Cells[i, 2].Value = p.WorkingSet; // no casts i++; } Excel.Range range = excel.Cells[1, 1]; // no casts Excel.Chart chart = excel.ActiveWorkbook.Charts. Add(After: excel.ActiveSheet); // named and optional arguments chart.ChartWizard( Source: range.CurrentRegion, Title: "Memory Usage in " + Environment.MachineName); //named+optional chart.ChartStyle = 45; chart.CopyPicture(Excel.XlPictureAppearance.xlScreen, Excel.XlCopyPictureFormat.xlBitmap, Excel.XlPictureAppearance.xlScreen); var word = new Word.Application(); word.Visible = true; word.Documents.Add(); // optional arguments word.Selection.Paste(); } } The code is much more terse and readable than the C# 3.0 counterpart. Note especially how the Value property is accessed dynamically. This is actually an indexed property, i.e. a property that takes an argument; something which C# does not understand. However the argument is optional. Since the access is dynamic, it goes through the runtime COM binder which knows to substitute the default value and call the indexed property. Thus, dynamic COM allows you to avoid accesses to the puzzling Value2 property of Excel ranges. Relationship with Visual Basic A number of the features introduced to C# 4.0 already exist or will be introduced in some form or other in Visual Basic: · Late binding in VB is similar in many ways to dynamic lookup in C#, and can be expected to make more use of the DLR in the future, leading to further parity with C#. · Named and optional arguments have been part of Visual Basic for a long time, and the C# version of the feature is explicitly engineered with maximal VB interoperability in mind. · NoPIA and variance are both being introduced to VB and C# at the same time. VB in turn is adding a number of features that have hitherto been a mainstay of C#. As a result future versions of C# and VB will have much better feature parity, for the benefit of everyone. Resources All available resources concerning C# 4.0 can be accessed through the C# Dev Center. Specifically, this white paper and other resources can be found at the Code Gallery site. Enjoy! span.fullpost {display:none;}

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
Binary Cosine Cofficient

- by hairyyak

I was given the following forumulae for calculating this sim=|QnD| / v|Q|v|D| I went ahed and implemented a class to compare strings consisting of a series of words #pragma once #include <vector> #include <string> #include <iostream> #include <vector> using namespace std; class StringSet { public: StringSet(void); StringSet( const string the_strings[], const int no_of_strings); ~StringSet(void); StringSet( const vector<string> the_strings); void add_string( const string the_string); bool remove_string( const string the_string); void clear_set(void); int no_of_strings(void) const; friend ostream& operator <<(ostream& outs, StringSet& the_strings); friend StringSet operator *(const StringSet& first, const StringSet& second); friend StringSet operator +(const StringSet& first, const StringSet& second); double binary_coefficient( const StringSet& the_second_set); private: vector<string> set; }; #include "StdAfx.h" #include "StringSet.h" #include <iterator> #include <algorithm> #include <stdexcept> #include <iostream> #include <cmath> StringSet::StringSet(void) { } StringSet::~StringSet(void) { } StringSet::StringSet( const vector<string> the_strings) { set = the_strings; } StringSet::StringSet( const string the_strings[], const int no_of_strings) { copy( the_strings, &the_strings[no_of_strings], back_inserter(set)); } void StringSet::add_string( const string the_string) { try { if( find( set.begin(), set.end(), the_string) == set.end()) { set.push_back(the_string); } else { //String is already in the set. throw domain_error("String is already in the set"); } } catch( domain_error e) { cout << e.what(); exit(1); } } bool StringSet::remove_string( const string the_string) { //Found the occurrence of the string. return it an iterator pointing to it. vector<string>::iterator iter; if( ( iter = find( set.begin(), set.end(), the_string) ) != set.end()) { set.erase(iter); return true; } return false; } void StringSet::clear_set(void) { set.clear(); } int StringSet::no_of_strings(void) const { return set.size(); } ostream& operator <<(ostream& outs, StringSet& the_strings) { vector<string>::const_iterator const_iter = the_strings.set.begin(); for( ; const_iter != the_strings.set.end(); const_iter++) { cout << *const_iter << " "; } cout << endl; return outs; } //This function returns the union of the two string sets. StringSet operator *(const StringSet& first, const StringSet& second) { vector<string> new_string_set; new_string_set = first.set; for( unsigned int i = 0; i < second.set.size(); i++) { vector<string>::const_iterator const_iter = find(new_string_set.begin(), new_string_set.end(), second.set[i]); //String is new - include it. if( const_iter == new_string_set.end() ) { new_string_set.push_back(second.set[i]); } } StringSet the_set(new_string_set); return the_set; } //This method returns the intersection of the two string sets. StringSet operator +(const StringSet& first, const StringSet& second) { //For each string in the first string look though the second and see if //there is a matching pair, in which case include the string in the set. vector<string> new_string_set; vector<string>::const_iterator const_iter = first.set.begin(); for ( ; const_iter != first.set.end(); ++const_iter) { //Then search through the entire second string to see if //there is a duplicate. vector<string>::const_iterator const_iter2 = second.set.begin(); for( ; const_iter2 != second.set.end(); const_iter2++) { if( *const_iter == *const_iter2 ) { new_string_set.push_back(*const_iter); } } } StringSet new_set(new_string_set); return new_set; } double StringSet::binary_coefficient( const StringSet& the_second_set) { double coefficient; StringSet intersection = the_second_set + set; coefficient = intersection.no_of_strings() / sqrt((double) no_of_strings()) * sqrt((double)the_second_set.no_of_strings()); return coefficient; } However when I try and calculate the coefficient using the following main function: // Exercise13.cpp : main project file. #include "stdafx.h" #include <boost/regex.hpp> #include "StringSet.h" using namespace System; using namespace System::Runtime::InteropServices; using namespace boost; //This function takes as input a string, which //is then broken down into a series of words //where the punctuaction is ignored. StringSet break_string( const string the_string) { regex re; cmatch matches; StringSet words; string search_pattern = "\\b(\\w)+\\b"; try { // Assign the regular expression for parsing. re = search_pattern; } catch( regex_error& e) { cout << search_pattern << " is not a valid regular expression: \"" << e.what() << "\"" << endl; exit(1); } sregex_token_iterator p(the_string.begin(), the_string.end(), re, 0); sregex_token_iterator end; for( ; p != end; ++p) { string new_string(p->first, p->second); String^ copy_han = gcnew String(new_string.c_str()); String^ copy_han2 = copy_han->ToLower(); char* str2 = (char*)(void*)Marshal::StringToHGlobalAnsi(copy_han2); string new_string2(str2); words.add_string(new_string2); } return words; } int main(array<System::String ^> ^args) { StringSet words = break_string("Here is a string, with some; words"); StringSet words2 = break_string("There is another string,"); cout << words.binary_coefficient(words2); return 0; } I get an index which is 1.5116 rather than a value from 0 to 1. Does anybody have a clue why this is the case? Any help would be appreciated.

Read the article
Mysql - help me optimize this query

- by sandeepan-nath

About the system: -The system has a total of 8 tables - Users - Tutor_Details (Tutors are a type of User,Tutor_Details table is linked to Users) - learning_packs, (stores packs created by tutors) - learning_packs_tag_relations, (holds tag relations meant for search) - tutors_tag_relations and tags and orders (containing purchase details of tutor's packs), order_details linked to orders and tutor_details. For a more clear idea about the tables involved please check the The tables section in the end. -A tags based search approach is being followed.Tag relations are created when new tutors register and when tutors create packs (this makes tutors and packs searcheable). For details please check the section How tags work in this system? below. Following is a simpler representation (not the actual) of the more complex query which I am trying to optimize:- I have used statements like explanation of parts in the query select SUM(DISTINCT( t.tag LIKE "%Dictatorship%" )) as key_1_total_matches, SUM(DISTINCT( t.tag LIKE "%democracy%" )) as key_2_total_matches, td., u., count(distinct(od.id_od)), if (lp.id_lp > 0) then some conditional logic on lp fields else 0 as tutor_popularity from Tutor_Details AS td JOIN Users as u on u.id_user = td.id_user LEFT JOIN Learning_Packs_Tag_Relations AS lptagrels ON td.id_tutor = lptagrels.id_tutor LEFT JOIN Learning_Packs AS lp ON lptagrels.id_lp = lp.id_lp LEFT JOIN `some other tables on lp.id_lp - let's call learning pack tables set (including Learning_Packs table)` LEFT JOIN Order_Details as od on td.id_tutor = od.id_author LEFT JOIN Orders as o on od.id_order = o.id_order LEFT JOIN Tutors_Tag_Relations as ttagrels ON td.id_tutor = ttagrels.id_tutor JOIN Tags as t on (t.id_tag = ttagrels.id_tag) OR (t.id_tag = lptagrels.id_tag) where some condition on Users table's fields AND CASE WHEN ((t.id_tag = lptagrels.id_tag) AND (lp.id_lp 0)) THEN `some conditions on learning pack tables set` ELSE 1 END AND CASE WHEN ((t.id_tag = wtagrels.id_tag) AND (wc.id_wc 0)) THEN `some conditions on webclasses tables set` ELSE 1 END AND CASE WHEN (od.id_od0) THEN od.id_author = td.id_tutor and some conditions on Orders table's fields ELSE 1 END AND ( t.tag LIKE "%Dictatorship%" OR t.tag LIKE "%democracy%") group by td.id_tutor HAVING key_1_total_matches = 1 AND key_2_total_matches = 1 order by tutor_popularity desc, u.surname asc, u.name asc limit 0,20 ===================================================================== What does the above query do? Does AND logic search on the search keywords (2 in this example - "Democracy" and "Dictatorship"). Returns only those tutors for which both the keywords are present in the union of the two sets - tutors details and details of all the packs created by a tutor. To make things clear - Suppose a Tutor name "Sandeepan Nath" has created a pack "My first pack", then:- Searching "Sandeepan Nath" returns Sandeepan Nath. Searching "Sandeepan first" returns Sandeepan Nath. Searching "Sandeepan second" does not return Sandeepan Nath. ====================================================================================== The problem The results returned by the above query are correct (AND logic working as per expectation), but the time taken by the query on heavily loaded databases is like 25 seconds as against normal query timings of the order of 0.005 - 0.0002 seconds, which makes it totally unusable. It is possible that some of the delay is being caused because all the possible fields have not yet been indexed, but I would appreciate a better query as a solution, optimized as much as possible, displaying the same results ========================================================================================== How tags work in this system? When a tutor registers, tags are entered and tag relations are created with respect to tutor's details like name, surname etc. When a Tutors create packs, again tags are entered and tag relations are created with respect to pack's details like pack name, description etc. tag relations for tutors stored in tutors_tag_relations and those for packs stored in learning_packs_tag_relations. All individual tags are stored in tags table. ==================================================================== The tables Most of the following tables contain many other fields which I have omitted here. CREATE TABLE IF NOT EXISTS users ( id_user int(10) unsigned NOT NULL AUTO_INCREMENT, name varchar(100) NOT NULL DEFAULT '', surname varchar(155) NOT NULL DEFAULT '', PRIMARY KEY (id_user) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=636 ; CREATE TABLE IF NOT EXISTS tutor_details ( id_tutor int(10) NOT NULL AUTO_INCREMENT, id_user int(10) NOT NULL DEFAULT '0', PRIMARY KEY (id_tutor), KEY Users_FKIndex1 (id_user) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=51 ; CREATE TABLE IF NOT EXISTS orders ( id_order int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (id_order), KEY Orders_FKIndex1 (id_user), ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=275 ; ALTER TABLE orders ADD CONSTRAINT Orders_ibfk_1 FOREIGN KEY (id_user) REFERENCES users (id_user) ON DELETE NO ACTION ON UPDATE NO ACTION; CREATE TABLE IF NOT EXISTS order_details ( id_od int(10) unsigned NOT NULL AUTO_INCREMENT, id_order int(10) unsigned NOT NULL DEFAULT '0', id_author int(10) NOT NULL DEFAULT '0', PRIMARY KEY (id_od), KEY Order_Details_FKIndex1 (id_order) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=284 ; ALTER TABLE order_details ADD CONSTRAINT Order_Details_ibfk_1 FOREIGN KEY (id_order) REFERENCES orders (id_order) ON DELETE NO ACTION ON UPDATE NO ACTION; CREATE TABLE IF NOT EXISTS learning_packs ( id_lp int(10) unsigned NOT NULL AUTO_INCREMENT, id_author int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (id_lp), KEY Learning_Packs_FKIndex2 (id_author), KEY id_lp (id_lp) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=23 ; CREATE TABLE IF NOT EXISTS tags ( id_tag int(10) unsigned NOT NULL AUTO_INCREMENT, tag varchar(255) DEFAULT NULL, PRIMARY KEY (id_tag), UNIQUE KEY tag (tag), KEY id_tag (id_tag), KEY tag_2 (tag), KEY tag_3 (tag) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=3419 ; CREATE TABLE IF NOT EXISTS tutors_tag_relations ( id_tag int(10) unsigned NOT NULL DEFAULT '0', id_tutor int(10) DEFAULT NULL, KEY Tutors_Tag_Relations (id_tag), KEY id_tutor (id_tutor), KEY id_tag (id_tag) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; ALTER TABLE tutors_tag_relations ADD CONSTRAINT Tutors_Tag_Relations_ibfk_1 FOREIGN KEY (id_tag) REFERENCES tags (id_tag) ON DELETE NO ACTION ON UPDATE NO ACTION; CREATE TABLE IF NOT EXISTS learning_packs_tag_relations ( id_tag int(10) unsigned NOT NULL DEFAULT '0', id_tutor int(10) DEFAULT NULL, id_lp int(10) unsigned DEFAULT NULL, KEY Learning_Packs_Tag_Relations_FKIndex1 (id_tag), KEY id_lp (id_lp), KEY id_tag (id_tag) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; ALTER TABLE learning_packs_tag_relations ADD CONSTRAINT Learning_Packs_Tag_Relations_ibfk_1 FOREIGN KEY (id_tag) REFERENCES tags (id_tag) ON DELETE NO ACTION ON UPDATE NO ACTION; =================================================================================== Following is the exact query (this includes classes also - tutors can create classes and search terms are matched with classes created by tutors):- select count(distinct(od.id_od)) as tutor_popularity, CASE WHEN (IF((wc.id_wc 0), ( wc.wc_api_status = 1 AND wc.wc_type = 0 AND wc.class_date '2010-06-01 22:00:56' AND wccp.status = 1 AND (wccp.country_code='IE' or wccp.country_code IN ('INT'))), 0)) THEN 1 ELSE 0 END as 'classes_published', CASE WHEN (IF((lp.id_lp 0), (lp.id_status = 1 AND lp.published = 1 AND lpcp.status = 1 AND (lpcp.country_code='IE' or lpcp.country_code IN ('INT'))),0)) THEN 1 ELSE 0 END as 'packs_published', td . * , u . * from Tutor_Details AS td JOIN Users as u on u.id_user = td.id_user LEFT JOIN Learning_Packs_Tag_Relations AS lptagrels ON td.id_tutor = lptagrels.id_tutor LEFT JOIN Learning_Packs AS lp ON lptagrels.id_lp = lp.id_lp LEFT JOIN Learning_Packs_Categories AS lpc ON lpc.id_lp_cat = lp.id_lp_cat LEFT JOIN Learning_Packs_Categories AS lpcp ON lpcp.id_lp_cat = lpc.id_parent LEFT JOIN Learning_Pack_Content as lpct on (lp.id_lp = lpct.id_lp) LEFT JOIN Webclasses_Tag_Relations AS wtagrels ON td.id_tutor = wtagrels.id_tutor LEFT JOIN WebClasses AS wc ON wtagrels.id_wc = wc.id_wc LEFT JOIN Learning_Packs_Categories AS wcc ON wcc.id_lp_cat = wc.id_wp_cat LEFT JOIN Learning_Packs_Categories AS wccp ON wccp.id_lp_cat = wcc.id_parent LEFT JOIN Order_Details as od on td.id_tutor = od.id_author LEFT JOIN Orders as o on od.id_order = o.id_order LEFT JOIN Tutors_Tag_Relations as ttagrels ON td.id_tutor = ttagrels.id_tutor JOIN Tags as t on (t.id_tag = ttagrels.id_tag) OR (t.id_tag = lptagrels.id_tag) OR (t.id_tag = wtagrels.id_tag) where (u.country='IE' or u.country IN ('INT')) AND CASE WHEN ((t.id_tag = lptagrels.id_tag) AND (lp.id_lp 0)) THEN lp.id_status = 1 AND lp.published = 1 AND lpcp.status = 1 AND (lpcp.country_code='IE' or lpcp.country_code IN ('INT')) ELSE 1 END AND CASE WHEN ((t.id_tag = wtagrels.id_tag) AND (wc.id_wc 0)) THEN wc.wc_api_status = 1 AND wc.wc_type = 0 AND wc.class_date '2010-06-01 22:00:56' AND wccp.status = 1 AND (wccp.country_code='IE' or wccp.country_code IN ('INT')) ELSE 1 END AND CASE WHEN (od.id_od0) THEN od.id_author = td.id_tutor and o.order_status = 'paid' and CASE WHEN (od.id_wc 0) THEN od.can_attend_class=1 ELSE 1 END ELSE 1 END AND 1 group by td.id_tutor order by tutor_popularity desc, u.surname asc, u.name asc limit 0,20 Please note - The provided database structure does not show all the fields and tables as in this query

Read the article
WordPress not resizing images with Nginx + php-fpm and other issues

- by Julian Fernandes

Recently i setup a Ubuntu 12.04 VPS with 512mb/1ghz CPU, Nginx + php-fpm + Varnish + APC + Percona's MySQL server + CloudFlare Pro for our Ubuntu LoCo Team's WordPress blog. The blog get about 3~4k daily hits, use about 180MB and 8~20% CPU. Everything seems to be working insanely fast... page load is really good and is about 16x faster than any of our competitors... but there is one problem. When we upload a image, WordPress don't resize it, so all we can do it insert the full image in the post. If the imagem have, let's say, 30kb, it resize fine... but if the image have 100kb+, it won't... In nginx error logs i see this: upstream timed out (110: Connection timed out) while reading response header from upstream, client: 150.162.216.64, server: www.ubuntubrsc.com, request: "POST /wp-admin/async-upload.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "www.ubuntubrsc.com", referrer: "http://www.ubuntubrsc.com/wp-admin/media-upload.php?post_id=2668&" It seems to be related with the issue, but i dunno. When that timeout happens, i started to get it when i'm trying to view a post too: upstream timed out (110: Connection timed out) while reading response header from upstream, client: 150.162.216.64, server: www.ubuntubrsc.com, request: "GET /tutoriais-gimp-6-adicionando-aplicando-novos-pinceis.html HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "www.ubuntubrsc.com", referrer: "http://www.ubuntubrsc.com/" And only a restart of php5-fpm fix it. I tryed increasing some timeouts and stuffs but it did not worked, so i guess it's some kind of limitation i did not figured yet. Could someone help me with it, please? /etc/nginx/nginx.conf: user www-data; worker_processes 1; pid /var/run/nginx.pid; events { worker_connections 1024; use epoll; multi_accept on; } http { ## # Basic Settings ## sendfile on; tcp_nopush on; tcp_nodelay off; keepalive_timeout 15; keepalive_requests 2000; types_hash_max_size 2048; server_tokens off; server_name_in_redirect off; open_file_cache max=1000 inactive=300s; open_file_cache_valid 360s; open_file_cache_min_uses 2; open_file_cache_errors off; server_names_hash_bucket_size 64; # server_name_in_redirect off; client_body_buffer_size 128K; client_header_buffer_size 1k; client_max_body_size 2m; large_client_header_buffers 4 8k; client_body_timeout 10m; client_header_timeout 10m; send_timeout 10m; include /etc/nginx/mime.types; default_type application/octet-stream; ## # Logging Settings ## error_log /var/log/nginx/error.log; access_log off; ## # CloudFlare's IPs (uncomment when site goes live) ## set_real_ip_from 204.93.240.0/24; set_real_ip_from 204.93.177.0/24; set_real_ip_from 199.27.128.0/21; set_real_ip_from 173.245.48.0/20; set_real_ip_from 103.22.200.0/22; set_real_ip_from 141.101.64.0/18; set_real_ip_from 108.162.192.0/18; set_real_ip_from 190.93.240.0/20; real_ip_header CF-Connecting-IP; set_real_ip_from 127.0.0.1/32; ## # Gzip Settings ## gzip on; gzip_disable "msie6"; gzip_vary on; gzip_proxied any; gzip_comp_level 9; gzip_min_length 1000; gzip_proxied expired no-cache no-store private auth; gzip_buffers 32 8k; # gzip_http_version 1.1; gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript; ## # nginx-naxsi config ## # Uncomment it if you installed nginx-naxsi ## #include /etc/nginx/naxsi_core.rules; ## # nginx-passenger config ## # Uncomment it if you installed nginx-passenger ## #passenger_root /usr; #passenger_ruby /usr/bin/ruby; ## # Virtual Host Configs ## include /etc/nginx/conf.d/*.conf; include /etc/nginx/sites-enabled/*; } /etc/nginx/fastcgi_params: fastcgi_param QUERY_STRING $query_string; fastcgi_param REQUEST_METHOD $request_method; fastcgi_param CONTENT_TYPE $content_type; fastcgi_param CONTENT_LENGTH $content_length; fastcgi_param SCRIPT_FILENAME $request_filename; fastcgi_param SCRIPT_NAME $fastcgi_script_name; fastcgi_param REQUEST_URI $request_uri; fastcgi_param DOCUMENT_URI $document_uri; fastcgi_param DOCUMENT_ROOT $document_root; fastcgi_param SERVER_PROTOCOL $server_protocol; fastcgi_param GATEWAY_INTERFACE CGI/1.1; fastcgi_param SERVER_SOFTWARE nginx/$nginx_version; fastcgi_param REMOTE_ADDR $remote_addr; fastcgi_param REMOTE_PORT $remote_port; fastcgi_param SERVER_ADDR $server_addr; fastcgi_param SERVER_PORT $server_port; fastcgi_param SERVER_NAME $server_name; fastcgi_param HTTPS $https; fastcgi_send_timeout 180; fastcgi_read_timeout 180; fastcgi_buffer_size 128k; fastcgi_buffers 256 4k; # PHP only, required if PHP was built with --enable-force-cgi-redirect fastcgi_param REDIRECT_STATUS 200; /etc/nginx/sites-avaiable/default: ## # DEFAULT HANDLER # ubuntubrsc.com ## server { listen 8080; # Make site available from main domain server_name www.ubuntubrsc.com; # Root directory root /var/www; index index.php index.html index.htm; include /var/www/nginx.conf; access_log off; location / { try_files $uri $uri/ /index.php?q=$uri&$args; } location = /favicon.ico { log_not_found off; access_log off; } location = /robots.txt { allow all; log_not_found off; access_log off; } location ~ /\. { deny all; access_log off; log_not_found off; } location ~* ^/wp-content/uploads/.*.php$ { deny all; access_log off; log_not_found off; } rewrite /wp-admin$ $scheme://$host$uri/ permanent; error_page 404 = @wordpress; log_not_found off; location @wordpress { include /etc/nginx/fastcgi_params; fastcgi_pass unix:/var/run/php5-fpm.sock; fastcgi_param SCRIPT_NAME /index.php; fastcgi_param SCRIPT_FILENAME $document_root/index.php; } location ~ \.php$ { try_files $uri =404; include /etc/nginx/fastcgi_params; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; if (-f $request_filename) { fastcgi_pass unix:/var/run/php5-fpm.sock; } } } server { listen 8080; server_name ubuntubrsc.* www.ubuntubrsc.net www.ubuntubrsc.org www.ubuntubrsc.com.br www.ubuntubrsc.info www.ubuntubrsc.in; return 301 $scheme://www.ubuntubrsc.com$request_uri; } /var/www/nginx.conf: # BEGIN W3TC Minify cache location ~ /wp-content/w3tc/min.*\.js$ { types {} default_type application/x-javascript; expires modified 31536000s; add_header X-Powered-By "W3 Total Cache/0.9.2.5b"; add_header Vary "Accept-Encoding"; add_header Pragma "public"; add_header Cache-Control "max-age=31536000, public, must-revalidate, proxy-revalidate"; } location ~ /wp-content/w3tc/min.*\.css$ { types {} default_type text/css; expires modified 31536000s; add_header X-Powered-By "W3 Total Cache/0.9.2.5b"; add_header Vary "Accept-Encoding"; add_header Pragma "public"; add_header Cache-Control "max-age=31536000, public, must-revalidate, proxy-revalidate"; } location ~ /wp-content/w3tc/min.*js\.gzip$ { gzip off; types {} default_type application/x-javascript; expires modified 31536000s; add_header X-Powered-By "W3 Total Cache/0.9.2.5b"; add_header Vary "Accept-Encoding"; add_header Pragma "public"; add_header Cache-Control "max-age=31536000, public, must-revalidate, proxy-revalidate"; add_header Content-Encoding gzip; } location ~ /wp-content/w3tc/min.*css\.gzip$ { gzip off; types {} default_type text/css; expires modified 31536000s; add_header X-Powered-By "W3 Total Cache/0.9.2.5b"; add_header Vary "Accept-Encoding"; add_header Pragma "public"; add_header Cache-Control "max-age=31536000, public, must-revalidate, proxy-revalidate"; add_header Content-Encoding gzip; } # END W3TC Minify cache # BEGIN W3TC Browser Cache gzip on; gzip_types text/css application/x-javascript text/x-component text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon; location ~ \.(css|js|htc)$ { expires 31536000s; add_header Pragma "public"; add_header Cache-Control "max-age=31536000, public, must-revalidate, proxy-revalidate"; add_header X-Powered-By "W3 Total Cache/0.9.2.5b"; } location ~ \.(html|htm|rtf|rtx|svg|svgz|txt|xsd|xsl|xml)$ { expires 3600s; add_header Pragma "public"; add_header Cache-Control "max-age=3600, public, must-revalidate, proxy-revalidate"; add_header X-Powered-By "W3 Total Cache/0.9.2.5b"; try_files $uri $uri/ $uri.html /index.php?$args; } location ~ \.(asf|asx|wax|wmv|wmx|avi|bmp|class|divx|doc|docx|eot|exe|gif|gz|gzip|ico|jpg|jpeg|jpe|mdb|mid|midi|mov|qt|mp3|m4a|mp4|m4v|mpeg|mpg|mpe|mpp|otf|odb|odc|odf|odg|odp|ods|odt|ogg|pdf|png|pot|pps|ppt|pptx|ra|ram|svg|svgz|swf|tar|tif|tiff|ttf|ttc|wav|wma|wri|xla|xls|xlsx|xlt|xlw|zip)$ { expires 31536000s; add_header Pragma "public"; add_header Cache-Control "max-age=31536000, public, must-revalidate, proxy-revalidate"; add_header X-Powered-By "W3 Total Cache/0.9.2.5b"; } # END W3TC Browser Cache # BEGIN W3TC Minify core rewrite ^/wp-content/w3tc/min/w3tc_rewrite_test$ /wp-content/w3tc/min/index.php?w3tc_rewrite_test=1 last; set $w3tc_enc ""; if ($http_accept_encoding ~ gzip) { set $w3tc_enc .gzip; } if (-f $request_filename$w3tc_enc) { rewrite (.*) $1$w3tc_enc break; } rewrite ^/wp-content/w3tc/min/(.+\.(css|js))$ /wp-content/w3tc/min/index.php?file=$1 last; # END W3TC Minify core # BEGIN W3TC Skip 404 error handling by WordPress for static files if (-f $request_filename) { break; } if (-d $request_filename) { break; } if ($request_uri ~ "(robots\.txt|sitemap(_index)?\.xml(\.gz)?|[a-z0-9_\-]+-sitemap([0-9]+)?\.xml(\.gz)?)") { break; } if ($request_uri ~* \.(css|js|htc|htm|rtf|rtx|svg|svgz|txt|xsd|xsl|xml|asf|asx|wax|wmv|wmx|avi|bmp|class|divx|doc|docx|eot|exe|gif|gz|gzip|ico|jpg|jpeg|jpe|mdb|mid|midi|mov|qt|mp3|m4a|mp4|m4v|mpeg|mpg|mpe|mpp|otf|odb|odc|odf|odg|odp|ods|odt|ogg|pdf|png|pot|pps|ppt|pptx|ra|ram|svg|svgz|swf|tar|tif|tiff|ttf|ttc|wav|wma|wri|xla|xls|xlsx|xlt|xlw|zip)$) { return 404; } # END W3TC Skip 404 error handling by WordPress for static files # BEGIN Better WP Security location ~ /\.ht { deny all; } location ~ wp-config.php { deny all; } location ~ readme.html { deny all; } location ~ readme.txt { deny all; } location ~ /install.php { deny all; } set $susquery 0; set $rule_2 0; set $rule_3 0; rewrite ^wp-includes/(.*).php /not_found last; rewrite ^/wp-admin/includes(.*)$ /not_found last; if ($request_method ~* "^(TRACE|DELETE|TRACK)"){ return 403; } set $rule_0 0; if ($request_method ~ "POST"){ set $rule_0 1; } if ($uri ~ "^(.*)wp-comments-post.php*"){ set $rule_0 2$rule_0; } if ($http_user_agent ~ "^$"){ set $rule_0 4$rule_0; } if ($rule_0 = "421"){ return 403; } if ($args ~* "\.\./") { set $susquery 1; } if ($args ~* "boot.ini") { set $susquery 1; } if ($args ~* "tag=") { set $susquery 1; } if ($args ~* "ftp:") { set $susquery 1; } if ($args ~* "http:") { set $susquery 1; } if ($args ~* "https:") { set $susquery 1; } if ($args ~* "(<|%3C).*script.*(>|%3E)") { set $susquery 1; } if ($args ~* "mosConfig_[a-zA-Z_]{1,21}(=|%3D)") { set $susquery 1; } if ($args ~* "base64_encode") { set $susquery 1; } if ($args ~* "(%24&x)") { set $susquery 1; } if ($args ~* "(\[|\]|$|$|<|>|ê|\"|;|\?|\*|=$)"){ set $susquery 1; } if ($args ~* "("|'|<|>|\|{|||%24&x)"){ set $susquery 1; } if ($args ~* "(%0|%A|%B|%C|%D|%E|%F|127.0)") { set $susquery 1; } if ($args ~* "(globals|encode|localhost|loopback)") { set $susquery 1; } if ($args ~* "(request|select|insert|concat|union|declare)") { set $susquery 1; } if ($http_cookie !~* "wordpress_logged_in_" ) { set $susquery "${susquery}2"; set $rule_2 1; set $rule_3 1; } if ($susquery = 12) { return 403; } # END Better WP Security /etc/php5/fpm/php-fpm.conf: pid = /var/run/php5-fpm.pid error_log = /var/log/php5-fpm.log emergency_restart_threshold = 3 emergency_restart_interval = 1m process_control_timeout = 10s events.mechanism = epoll /etc/php5/fpm/php.ini (only options i changed): open_basedir ="/var/www/" disable_functions = pcntl_alarm,pcntl_fork,pcntl_waitpid,pcntl_wait,pcntl_wifexited,pcntl_wifstopped,pcntl_wifsignaled,pcntl_wexitstatus,pcntl_wtermsig,pcntl_wstopsig,pcntl_signal,pcntl_signal_dispatch,pcntl_get_last_error,pcntl_strerror,pcntl_sigprocmask,pcntl_sigwaitinfo,pcntl_sigtimedwait,pcntl_exec,pcntl_getpriority,pcntl_setpriority,dl,system,shell_exec,fsockopen,parse_ini_file,passthru,popen,proc_open,proc_close,shell_exec,show_source,symlink,proc_close,proc_get_status,proc_nice,proc_open,proc_terminate,shell_exec ,highlight_file,escapeshellcmd,define_syslog_variables,posix_uname,posix_getpwuid,apache_child_terminate,posix_kill,posix_mkfifo,posix_setpgid,posix_setsid,posix_setuid,escapeshellarg,posix_uname,ftp_exec,ftp_connect,ftp_login,ftp_get,ftp_put,ftp_nb_fput,ftp_raw,ftp_rawlist,ini_alter,ini_restore,inject_code,syslog,openlog,define_syslog_variables,apache_setenv,mysql_pconnect,eval,phpAds_XmlRpc,phpA ds_remoteInfo,phpAds_xmlrpcEncode,phpAds_xmlrpcDecode,xmlrpc_entity_decode,fp,fput,virtual,show_source,pclose,readfile,wget expose_php = off max_execution_time = 30 max_input_time = 60 memory_limit = 128M display_errors = Off post_max_size = 2M allow_url_fopen = off default_socket_timeout = 60 APC settings: [APC] apc.enabled = 1 apc.shm_segments = 1 apc.shm_size = 64M apc.optimization = 0 apc.num_files_hint = 4096 apc.ttl = 60 apc.user_ttl = 7200 apc.gc_ttl = 0 apc.cache_by_default = 1 apc.filters = "" apc.mmap_file_mask = "/tmp/apc.XXXXXX" apc.slam_defense = 0 apc.file_update_protection = 2 apc.enable_cli = 0 apc.max_file_size = 10M apc.stat = 1 apc.write_lock = 1 apc.report_autofilter = 0 apc.include_once_override = 0 apc.localcache = 0 apc.localcache.size = 512 apc.coredump_unmap = 0 apc.stat_ctime = 0 /etc/php5/fpm/pool.d/www.conf user = www-data group = www-data listen = /var/run/php5-fpm.sock listen.owner = www-data listen.group = www-data listen.mode = 0666 pm = ondemand pm.max_children = 5 pm.process_idle_timeout = 3s; pm.max_requests = 50 I also started to get 404 errors in front page if i use W3 Total Cache's Page Cache (Disk Enhanced). It worked fine untill somedays ago, and then, out of nowhere, it started to happen. Tonight i will disable my mobile plugin and activate only W3 Total Cache to see if it's a conflict with them... And to finish all this, i have been getting this error: PHP Warning: apc_store(): Unable to allocate memory for pool. in /var/www/wp-content/plugins/w3-total-cache/lib/W3/Cache/Apc.php on line 41 I already modifed my APC settings, but no sucess. So... could anyone help me with those issuees, please? Ooohh... if it helps, i instaled PHP like this: sudo apt-get install php5-fpm php5-suhosin php-apc php5-gd php5-imagick php5-curl And Nginx from the official PPA. Sorry for my bad english and thanks for your time people! (:

Read the article
Installing vim7.2 on Solaris Sparc 10 as non-root

- by Tobbe

I'm trying to install vim to $HOME/bin by compiling the sources. ./configure --prefix=$home/bin seems to work, but when running make I get: > make Starting make in the src directory. If there are problems, cd to the src directory and run make there cd src && make first gcc -c -I. -Iproto -DHAVE_CONFIG_H -DFEAT_GUI_GTK -I/usr/include/gtk-2.0 -I/usr/lib/gtk-2.0/include -I/usr/include/atk-1.0 -I/usr/include/pango-1.0 -I/usr/openwin/include -I/usr/sfw/include -I/usr/sfw/include/freetype2 -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -g -O2 -I/usr/openwin/include -o objects/buffer.o buffer.c In file included from buffer.c:28: vim.h:41: error: syntax error before ':' token In file included from os_unix.h:29, from vim.h:245, from buffer.c:28: /usr/include/sys/stat.h:251: error: syntax error before "blksize_t" /usr/include/sys/stat.h:255: error: syntax error before '}' token /usr/include/sys/stat.h:309: error: syntax error before "blksize_t" /usr/include/sys/stat.h:310: error: conflicting types for 'st_blocks' /usr/include/sys/stat.h:252: error: previous declaration of 'st_blocks' was here /usr/include/sys/stat.h:313: error: syntax error before '}' token In file included from /opt/local/bin/../lib/gcc/sparc-sun-solaris2.6/3.4.6/include/sys/signal.h:132, from /usr/include/signal.h:26, from os_unix.h:163, from vim.h:245, from buffer.c:28: /usr/include/sys/siginfo.h:259: error: syntax error before "ctid_t" /usr/include/sys/siginfo.h:292: error: syntax error before '}' token /usr/include/sys/siginfo.h:294: error: syntax error before '}' token /usr/include/sys/siginfo.h:390: error: syntax error before "ctid_t" /usr/include/sys/siginfo.h:398: error: conflicting types for '__fault' /usr/include/sys/siginfo.h:267: error: previous declaration of '__fault' was here /usr/include/sys/siginfo.h:404: error: conflicting types for '__file' /usr/include/sys/siginfo.h:273: error: previous declaration of '__file' was here /usr/include/sys/siginfo.h:420: error: conflicting types for '__prof' /usr/include/sys/siginfo.h:287: error: previous declaration of '__prof' was here /usr/include/sys/siginfo.h:424: error: conflicting types for '__rctl' /usr/include/sys/siginfo.h:291: error: previous declaration of '__rctl' was here /usr/include/sys/siginfo.h:426: error: syntax error before '}' token /usr/include/sys/siginfo.h:428: error: syntax error before '}' token /usr/include/sys/siginfo.h:432: error: syntax error before "k_siginfo_t" /usr/include/sys/siginfo.h:437: error: syntax error before '}' token In file included from /usr/include/signal.h:26, from os_unix.h:163, from vim.h:245, from buffer.c:28: /opt/local/bin/../lib/gcc/sparc-sun-solaris2.6/3.4.6/include/sys/signal.h:173: error: syntax error before "siginfo_t" In file included from os_unix.h:163, from vim.h:245, from buffer.c:28: /usr/include/signal.h:111: error: syntax error before "siginfo_t" /usr/include/signal.h:113: error: syntax error before "siginfo_t" buffer.c: In function `buflist_new': buffer.c:1502: error: storage size of 'st' isn't known buffer.c: In function `buflist_findname': buffer.c:1989: error: storage size of 'st' isn't known buffer.c: In function `setfname': buffer.c:2578: error: storage size of 'st' isn't known buffer.c: In function `otherfile_buf': buffer.c:2836: error: storage size of 'st' isn't known buffer.c: In function `buf_setino': buffer.c:2874: error: storage size of 'st' isn't known buffer.c: In function `buf_same_ino': buffer.c:2894: error: dereferencing pointer to incomplete type buffer.c:2895: error: dereferencing pointer to incomplete type *** Error code 1 make: Fatal error: Command failed for target `objects/buffer.o' Current working directory /home/xluntor/vim72/src *** Error code 1 make: Fatal error: Command failed for target `first' How do I fix the make errors? Or is there another way to install vim as non-root? Thanks in advance EDIT: I took a look at the google groups link Sarah posted. The "Compiling Vim" page linked from there was for Linux, so the commands doesn't even work on Solars. But it did hint at logging the output of ./configure to a file, so I did that. Here it is: ./configure output removed. New version further down. Does anyone spot anything critical missing? EDIT 2: So I downloaded the vim package from sunfreeware. I couldn't just install it, since I don't have root privileges, but I was able to extract the package file. This was the file structure in it: `-- SMCvim `-- reloc |-- bin |-- doc | `-- vim `-- share |-- man | `-- man1 `-- vim `-- vim72 |-- autoload | `-- xml |-- colors |-- compiler |-- doc |-- ftplugin |-- indent |-- keymap |-- lang |-- macros | |-- hanoi | |-- life | |-- maze | `-- urm |-- plugin |-- print |-- spell |-- syntax |-- tools `-- tutor I moved the three files (vim, vimtutor, xdd) in SMCvim/reloc/bin to $HOME/bin, so now I can finally run $HOME/bin/vim! But where do I put the "share" directory and its content? EDIT 3: It might also be worth noting that there already exists an install of vim on the system, but it is broken. When I try to run it I get: ld.so.1: vim: fatal: libgtk-1.2.so.0: open failed: No such file or directory "which vim" outputs /opt/local/bin/vim EDIT 4: Trying to compile this on Solaris 10. uname -a SunOS ws005-22 5.10 Generic_141414-10 sun4u sparc SUNW,SPARC-Enterprise New ./configure output: ./configure --prefix=$home/bin ac_cv_sizeof_int=8 --enable-rubyinterp configure: loading cache auto/config.cache checking whether make sets $(MAKE)... yes checking for gcc... gcc checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ISO C89... unsupported checking how to run the C preprocessor... gcc -E checking for grep that handles long lines and -e... /usr/sfw/bin/ggrep checking for egrep... /usr/sfw/bin/ggrep -E checking for library containing strerror... none required checking for gawk... gawk checking for strip... strip checking for ANSI C header files... yes checking for sys/wait.h that is POSIX.1 compatible... no configure: checking for buggy tools... checking for BeOS... no checking for QNX... no checking for Darwin (Mac OS X)... no checking --with-local-dir argument... Defaulting to /usr/local checking --with-vim-name argument... Defaulting to vim checking --with-ex-name argument... Defaulting to ex checking --with-view-name argument... Defaulting to view checking --with-global-runtime argument... no checking --with-modified-by argument... no checking if character set is EBCDIC... no checking --disable-selinux argument... no checking for is_selinux_enabled in -lselinux... no checking --with-features argument... Defaulting to normal checking --with-compiledby argument... no checking --disable-xsmp argument... no checking --disable-xsmp-interact argument... no checking --enable-mzschemeinterp argument... no checking --enable-perlinterp argument... no checking --enable-pythoninterp argument... no checking --enable-tclinterp argument... no checking --enable-rubyinterp argument... yes checking for ruby... /opt/sfw/bin/ruby checking Ruby version... OK checking Ruby header files... /opt/sfw/lib/ruby/1.6/sparc-solaris2.10 checking --enable-cscope argument... no checking --enable-workshop argument... no checking --disable-netbeans argument... no checking for socket in -lsocket... yes checking for gethostbyname in -lnsl... yes checking whether compiling netbeans integration is possible... no checking --enable-sniff argument... no checking --enable-multibyte argument... no checking --enable-hangulinput argument... no checking --enable-xim argument... defaulting to auto checking --enable-fontset argument... no checking for xmkmf... /usr/openwin/bin/xmkmf checking for X... libraries /usr/openwin/lib, headers /usr/openwin/include checking whether -R must be followed by a space... no checking for gethostbyname... yes checking for connect... yes checking for remove... yes checking for shmat... yes checking for IceConnectionNumber in -lICE... yes checking if X11 header files can be found... yes checking for _XdmcpAuthDoIt in -lXdmcp... no checking for IceOpenConnection in -lICE... yes checking for XpmCreatePixmapFromData in -lXpm... yes checking if X11 header files implicitly declare return values... no checking --enable-gui argument... yes/auto - automatic GUI support checking whether or not to look for GTK... yes checking whether or not to look for GTK+ 2... yes checking whether or not to look for GNOME... no checking whether or not to look for Motif... yes checking whether or not to look for Athena... yes checking whether or not to look for neXtaw... yes checking whether or not to look for Carbon... yes checking --with-gtk-prefix argument... no checking --with-gtk-exec-prefix argument... no checking --disable-gtktest argument... gtk test enabled checking for gtk-config... /opt/local/bin/gtk-config checking for pkg-config... /usr/bin/pkg-config checking for GTK - version = 2.2.0... yes; found version 2.4.9 checking X11/SM/SMlib.h usability... yes checking X11/SM/SMlib.h presence... yes checking for X11/SM/SMlib.h... yes checking X11/xpm.h usability... yes checking X11/xpm.h presence... yes checking for X11/xpm.h... yes checking X11/Sunkeysym.h usability... yes checking X11/Sunkeysym.h presence... yes checking for X11/Sunkeysym.h... yes checking for XIMText in X11/Xlib.h... yes X GUI selected; xim has been enabled checking whether toupper is broken... no checking whether __DATE__ and __TIME__ work... yes checking elf.h usability... yes checking elf.h presence... yes checking for elf.h... yes checking for main in -lelf... yes checking for dirent.h that defines DIR... yes checking for library containing opendir... none required checking for sys/wait.h that defines union wait... no checking stdarg.h usability... yes checking stdarg.h presence... yes checking for stdarg.h... yes checking stdlib.h usability... yes checking stdlib.h presence... yes checking for stdlib.h... yes checking string.h usability... yes checking string.h presence... yes checking for string.h... yes checking sys/select.h usability... yes checking sys/select.h presence... yes checking for sys/select.h... yes checking sys/utsname.h usability... yes checking sys/utsname.h presence... yes checking for sys/utsname.h... yes checking termcap.h usability... yes checking termcap.h presence... yes checking for termcap.h... yes checking fcntl.h usability... yes checking fcntl.h presence... yes checking for fcntl.h... yes checking sgtty.h usability... yes checking sgtty.h presence... yes checking for sgtty.h... yes checking sys/ioctl.h usability... yes checking sys/ioctl.h presence... yes checking for sys/ioctl.h... yes checking sys/time.h usability... yes checking sys/time.h presence... yes checking for sys/time.h... yes checking sys/types.h usability... yes checking sys/types.h presence... yes checking for sys/types.h... yes checking termio.h usability... yes checking termio.h presence... yes checking for termio.h... yes checking iconv.h usability... yes checking iconv.h presence... yes checking for iconv.h... yes checking langinfo.h usability... yes checking langinfo.h presence... yes checking for langinfo.h... yes checking math.h usability... yes checking math.h presence... yes checking for math.h... yes checking unistd.h usability... yes checking unistd.h presence... yes checking for unistd.h... yes checking stropts.h usability... no checking stropts.h presence... yes configure: WARNING: stropts.h: present but cannot be compiled configure: WARNING: stropts.h: check for missing prerequisite headers? configure: WARNING: stropts.h: see the Autoconf documentation configure: WARNING: stropts.h: section "Present But Cannot Be Compiled" configure: WARNING: stropts.h: proceeding with the preprocessor's result configure: WARNING: stropts.h: in the future, the compiler will take precedence checking for stropts.h... yes checking errno.h usability... yes checking errno.h presence... yes checking for errno.h... yes checking sys/resource.h usability... yes checking sys/resource.h presence... yes checking for sys/resource.h... yes checking sys/systeminfo.h usability... yes checking sys/systeminfo.h presence... yes checking for sys/systeminfo.h... yes checking locale.h usability... yes checking locale.h presence... yes checking for locale.h... yes checking sys/stream.h usability... no checking sys/stream.h presence... yes configure: WARNING: sys/stream.h: present but cannot be compiled configure: WARNING: sys/stream.h: check for missing prerequisite headers? configure: WARNING: sys/stream.h: see the Autoconf documentation configure: WARNING: sys/stream.h: section "Present But Cannot Be Compiled" configure: WARNING: sys/stream.h: proceeding with the preprocessor's result configure: WARNING: sys/stream.h: in the future, the compiler will take precedence checking for sys/stream.h... yes checking termios.h usability... yes checking termios.h presence... yes checking for termios.h... yes checking libc.h usability... no checking libc.h presence... no checking for libc.h... no checking sys/statfs.h usability... yes checking sys/statfs.h presence... yes checking for sys/statfs.h... yes checking poll.h usability... yes checking poll.h presence... yes checking for poll.h... yes checking sys/poll.h usability... yes checking sys/poll.h presence... yes checking for sys/poll.h... yes checking pwd.h usability... yes checking pwd.h presence... yes checking for pwd.h... yes checking utime.h usability... yes checking utime.h presence... yes checking for utime.h... yes checking sys/param.h usability... yes checking sys/param.h presence... yes checking for sys/param.h... yes checking libintl.h usability... yes checking libintl.h presence... yes checking for libintl.h... yes checking libgen.h usability... yes checking libgen.h presence... yes checking for libgen.h... yes checking util/debug.h usability... no checking util/debug.h presence... no checking for util/debug.h... no checking util/msg18n.h usability... no checking util/msg18n.h presence... no checking for util/msg18n.h... no checking frame.h usability... no checking frame.h presence... no checking for frame.h... no checking sys/acl.h usability... yes checking sys/acl.h presence... yes checking for sys/acl.h... yes checking sys/access.h usability... no checking sys/access.h presence... no checking for sys/access.h... no checking sys/sysctl.h usability... no checking sys/sysctl.h presence... no checking for sys/sysctl.h... no checking sys/sysinfo.h usability... yes checking sys/sysinfo.h presence... yes checking for sys/sysinfo.h... yes checking wchar.h usability... yes checking wchar.h presence... yes checking for wchar.h... yes checking wctype.h usability... yes checking wctype.h presence... yes checking for wctype.h... yes checking for sys/ptem.h... no checking for pthread_np.h... no checking strings.h usability... yes checking strings.h presence... yes checking for strings.h... yes checking if strings.h can be included after string.h... yes checking whether gcc needs -traditional... no checking for an ANSI C-conforming const... yes checking for mode_t... yes checking for off_t... yes checking for pid_t... yes checking for size_t... yes checking for uid_t in sys/types.h... yes checking whether time.h and sys/time.h may both be included... yes checking for ino_t... yes checking for dev_t... yes checking for rlim_t... yes checking for stack_t... yes checking whether stack_t has an ss_base field... no checking --with-tlib argument... empty: automatic terminal library selection checking for tgetent in -lncurses... yes checking whether we talk terminfo... yes checking what tgetent() returns for an unknown terminal... zero checking whether termcap.h contains ospeed... yes checking whether termcap.h contains UP, BC and PC... yes checking whether tputs() uses outfuntype... no checking whether sys/select.h and sys/time.h may both be included... yes checking for /dev/ptc... no checking for SVR4 ptys... yes checking for ptyranges... don't know checking default tty permissions/group... can't determine - assume ptys are world accessable world checking return type of signal handlers... void checking for struct sigcontext... no checking getcwd implementation is broken... no checking for bcmp... yes checking for fchdir... yes checking for fchown... yes checking for fseeko... yes checking for fsync... yes checking for ftello... yes checking for getcwd... yes checking for getpseudotty... no checking for getpwnam... yes checking for getpwuid... yes checking for getrlimit... yes checking for gettimeofday... yes checking for getwd... yes checking for lstat... yes checking for memcmp... yes checking for memset... yes checking for nanosleep... no checking for opendir... yes checking for putenv... yes checking for qsort... yes checking for readlink... yes checking for select... yes checking for setenv... yes checking for setpgid... yes checking for setsid... yes checking for sigaltstack... yes checking for sigstack... yes checking for sigset... yes checking for sigsetjmp... yes checking for sigaction... yes checking for sigvec... no checking for strcasecmp... yes checking for strerror... yes checking for strftime... yes checking for stricmp... no checking for strncasecmp... yes checking for strnicmp... no checking for strpbrk... yes checking for strtol... yes checking for tgetent... yes checking for towlower... yes checking for towupper... yes checking for iswupper... yes checking for usleep... yes checking for utime... yes checking for utimes... yes checking for st_blksize... no checking whether stat() ignores a trailing slash... no checking for iconv_open()... yes; with -liconv checking for nl_langinfo(CODESET)... yes checking for strtod in -lm... yes checking for strtod() and other floating point functions... yes checking --disable-acl argument... no checking for acl_get_file in -lposix1e... no checking for acl_get_file in -lacl... no checking for POSIX ACL support... no checking for Solaris ACL support... yes checking for AIX ACL support... no checking --disable-gpm argument... no checking for gpm... no checking --disable-sysmouse argument... no checking for sysmouse... no checking for rename... yes checking for sysctl... not usable checking for sysinfo... not usable checking for sysinfo.mem_unit... no checking for sysconf... yes checking size of int... (cached) 8 checking whether memmove handles overlaps... yes checking for _xpg4_setrunelocale in -lxpg4... no checking how to create tags... ctags -t checking how to run man with a section nr... man -s checking --disable-nls argument... no checking for msgfmt... msgfmt checking for NLS... no "po/Makefile" - disabled checking dlfcn.h usability... yes checking dlfcn.h presence... yes checking for dlfcn.h... yes checking for dlopen()... yes checking for dlsym()... yes checking setjmp.h usability... yes checking setjmp.h presence... yes checking for setjmp.h... yes checking for GCC 3 or later... yes configure: updating cache auto/config.cache configure: creating auto/config.status config.status: creating auto/config.mk config.status: creating auto/config.h Make: make Starting make in the src directory. If there are problems, cd to the src directory and run make there cd src && make first mkdir objects CC="gcc -Iproto -DHAVE_CONFIG_H -DFEAT_GUI_GTK -I/usr/include/gtk-2.0 -I/usr/lib/gtk-2.0/include -I/usr/include/atk-1.0 -I/usr/include/pango-1.0 -I/usr/openwin/include -I/usr/sfw/include -I/usr/sfw/include/freetype2 -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/openwin/include -I/opt/sfw/lib/ruby/1.6/sparc-solaris2.10 " srcdir=. sh ./osdef.sh gcc -c -I. -Iproto -DHAVE_CONFIG_H -DFEAT_GUI_GTK -I/usr/include/gtk-2.0 -I/usr/lib/gtk-2.0/include -I/usr/include/atk-1.0 -I/usr/include/pango-1.0 -I/usr/openwin/include -I/usr/sfw/include -I/usr/sfw/include/freetype2 -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -g -O2 -I/usr/openwin/include -I/opt/sfw/lib/ruby/1.6/sparc-solaris2.10 -o objects/buffer.o buffer.c In file included from os_unix.h:29, from vim.h:245, from buffer.c:28: /usr/include/sys/stat.h:251: error: syntax error before "blksize_t" /usr/include/sys/stat.h:255: error: syntax error before '}' token /usr/include/sys/stat.h:309: error: syntax error before "blksize_t" /usr/include/sys/stat.h:310: error: conflicting types for 'st_blocks' /usr/include/sys/stat.h:252: error: previous declaration of 'st_blocks' was here /usr/include/sys/stat.h:313: error: syntax error before '}' token In file included from /opt/local/bin/../lib/gcc/sparc-sun-solaris2.6/3.4.6/include/sys/signal.h:132, from /usr/include/signal.h:26, from os_unix.h:163, from vim.h:245, from buffer.c:28: /usr/include/sys/siginfo.h:259: error: syntax error before "ctid_t" /usr/include/sys/siginfo.h:292: error: syntax error before '}' token /usr/include/sys/siginfo.h:294: error: syntax error before '}' token /usr/include/sys/siginfo.h:390: error: syntax error before "ctid_t" /usr/include/sys/siginfo.h:398: error: conflicting types for '__fault' /usr/include/sys/siginfo.h:267: error: previous declaration of '__fault' was here /usr/include/sys/siginfo.h:404: error: conflicting types for '__file' /usr/include/sys/siginfo.h:273: error: previous declaration of '__file' was here /usr/include/sys/siginfo.h:420: error: conflicting types for '__prof' /usr/include/sys/siginfo.h:287: error: previous declaration of '__prof' was here /usr/include/sys/siginfo.h:424: error: conflicting types for '__rctl' /usr/include/sys/siginfo.h:291: error: previous declaration of '__rctl' was here /usr/include/sys/siginfo.h:426: error: syntax error before '}' token /usr/include/sys/siginfo.h:428: error: syntax error before '}' token /usr/include/sys/siginfo.h:432: error: syntax error before "k_siginfo_t" /usr/include/sys/siginfo.h:437: error: syntax error before '}' token In file included from /usr/include/signal.h:26, from os_unix.h:163, from vim.h:245, from buffer.c:28: /opt/local/bin/../lib/gcc/sparc-sun-solaris2.6/3.4.6/include/sys/signal.h:173: error: syntax error before "siginfo_t" In file included from os_unix.h:163, from vim.h:245, from buffer.c:28: /usr/include/signal.h:111: error: syntax error before "siginfo_t" /usr/include/signal.h:113: error: syntax error before "siginfo_t" buffer.c: In function `buflist_new': buffer.c:1502: error: storage size of 'st' isn't known buffer.c: In function `buflist_findname': buffer.c:1989: error: storage size of 'st' isn't known buffer.c: In function `setfname': buffer.c:2578: error: storage size of 'st' isn't known buffer.c: In function `otherfile_buf': buffer.c:2836: error: storage size of 'st' isn't known buffer.c: In function `buf_setino': buffer.c:2874: error: storage size of 'st' isn't known buffer.c: In function `buf_same_ino': buffer.c:2894: error: dereferencing pointer to incomplete type buffer.c:2895: error: dereferencing pointer to incomplete type *** Error code 1 make: Fatal error: Command failed for target `objects/buffer.o' Current working directory /home/xluntor/vim72/src *** Error code 1 make: Fatal error: Command failed for target `first'

Read the article
What are good design practices when working with Entity Framework

- by AD

This will apply mostly for an asp.net application where the data is not accessed via soa. Meaning that you get access to the objects loaded from the framework, not Transfer Objects, although some recommendation still apply. This is a community post, so please add to it as you see fit. Applies to: Entity Framework 1.0 shipped with Visual Studio 2008 sp1. Why pick EF in the first place? Considering it is a young technology with plenty of problems (see below), it may be a hard sell to get on the EF bandwagon for your project. However, it is the technology Microsoft is pushing (at the expense of Linq2Sql, which is a subset of EF). In addition, you may not be satisfied with NHibernate or other solutions out there. Whatever the reasons, there are people out there (including me) working with EF and life is not bad.make you think. EF and inheritance The first big subject is inheritance. EF does support mapping for inherited classes that are persisted in 2 ways: table per class and table the hierarchy. The modeling is easy and there are no programming issues with that part. (The following applies to table per class model as I don't have experience with table per hierarchy, which is, anyway, limited.) The real problem comes when you are trying to run queries that include one or many objects that are part of an inheritance tree: the generated sql is incredibly awful, takes a long time to get parsed by the EF and takes a long time to execute as well. This is a real show stopper. Enough that EF should probably not be used with inheritance or as little as possible. Here is an example of how bad it was. My EF model had ~30 classes, ~10 of which were part of an inheritance tree. On running a query to get one item from the Base class, something as simple as Base.Get(id), the generated SQL was over 50,000 characters. Then when you are trying to return some Associations, it degenerates even more, going as far as throwing SQL exceptions about not being able to query more than 256 tables at once. Ok, this is bad, EF concept is to allow you to create your object structure without (or with as little as possible) consideration on the actual database implementation of your table. It completely fails at this. So, recommendations? Avoid inheritance if you can, the performance will be so much better. Use it sparingly where you have to. In my opinion, this makes EF a glorified sql-generation tool for querying, but there are still advantages to using it. And ways to implement mechanism that are similar to inheritance. Bypassing inheritance with Interfaces First thing to know with trying to get some kind of inheritance going with EF is that you cannot assign a non-EF-modeled class a base class. Don't even try it, it will get overwritten by the modeler. So what to do? You can use interfaces to enforce that classes implement some functionality. For example here is a IEntity interface that allow you to define Associations between EF entities where you don't know at design time what the type of the entity would be. public enum EntityTypes{ Unknown = -1, Dog = 0, Cat } public interface IEntity { int EntityID { get; } string Name { get; } Type EntityType { get; } } public partial class Dog : IEntity { // implement EntityID and Name which could actually be fields // from your EF model Type EntityType{ get{ return EntityTypes.Dog; } } } Using this IEntity, you can then work with undefined associations in other classes // lets take a class that you defined in your model. // that class has a mapping to the columns: PetID, PetType public partial class Person { public IEntity GetPet() { return IEntityController.Get(PetID,PetType); } } which makes use of some extension functions: public class IEntityController { static public IEntity Get(int id, EntityTypes type) { switch (type) { case EntityTypes.Dog: return Dog.Get(id); case EntityTypes.Cat: return Cat.Get(id); default: throw new Exception("Invalid EntityType"); } } } Not as neat as having plain inheritance, particularly considering you have to store the PetType in an extra database field, but considering the performance gains, I would not look back. It also cannot model one-to-many, many-to-many relationship, but with creative uses of 'Union' it could be made to work. Finally, it creates the side effet of loading data in a property/function of the object, which you need to be careful about. Using a clear naming convention like GetXYZ() helps in that regards. Compiled Queries Entity Framework performance is not as good as direct database access with ADO (obviously) or Linq2SQL. There are ways to improve it however, one of which is compiling your queries. The performance of a compiled query is similar to Linq2Sql. What is a compiled query? It is simply a query for which you tell the framework to keep the parsed tree in memory so it doesn't need to be regenerated the next time you run it. So the next run, you will save the time it takes to parse the tree. Do not discount that as it is a very costly operation that gets even worse with more complex queries. There are 2 ways to compile a query: creating an ObjectQuery with EntitySQL and using CompiledQuery.Compile() function. (Note that by using an EntityDataSource in your page, you will in fact be using ObjectQuery with EntitySQL, so that gets compiled and cached). An aside here in case you don't know what EntitySQL is. It is a string-based way of writing queries against the EF. Here is an example: "select value dog from Entities.DogSet as dog where dog.ID = @ID". The syntax is pretty similar to SQL syntax. You can also do pretty complex object manipulation, which is well explained [here][1]. Ok, so here is how to do it using ObjectQuery< string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); The first time you run this query, the framework will generate the expression tree and keep it in memory. So the next time it gets executed, you will save on that costly step. In that example EnablePlanCaching = true, which is unnecessary since that is the default option. The other way to compile a query for later use is the CompiledQuery.Compile method. This uses a delegate: static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => ctx.DogSet.FirstOrDefault(it => it.ID == id)); or using linq static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => (from dog in ctx.DogSet where dog.ID == id select dog).FirstOrDefault()); to call the query: query_GetDog.Invoke( YourContext, id ); The advantage of CompiledQuery is that the syntax of your query is checked at compile time, where as EntitySQL is not. However, there are other consideration... Includes Lets say you want to have the data for the dog owner to be returned by the query to avoid making 2 calls to the database. Easy to do, right? EntitySQL string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)).Include("Owner"); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); CompiledQuery static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => (from dog in ctx.DogSet.Include("Owner") where dog.ID == id select dog).FirstOrDefault()); Now, what if you want to have the Include parametrized? What I mean is that you want to have a single Get() function that is called from different pages that care about different relationships for the dog. One cares about the Owner, another about his FavoriteFood, another about his FavotireToy and so on. Basicly, you want to tell the query which associations to load. It is easy to do with EntitySQL public Dog Get(int id, string include) { string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)) .IncludeMany(include); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); } The include simply uses the passed string. Easy enough. Note that it is possible to improve on the Include(string) function (that accepts only a single path) with an IncludeMany(string) that will let you pass a string of comma-separated associations to load. Look further in the extension section for this function. If we try to do it with CompiledQuery however, we run into numerous problems: The obvious static readonly Func<Entities, int, string, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) => (from dog in ctx.DogSet.Include(include) where dog.ID == id select dog).FirstOrDefault()); will choke when called with: query_GetDog.Invoke( YourContext, id, "Owner,FavoriteFood" ); Because, as mentionned above, Include() only wants to see a single path in the string and here we are giving it 2: "Owner" and "FavoriteFood" (which is not to be confused with "Owner.FavoriteFood"!). Then, let's use IncludeMany(), which is an extension function static readonly Func<Entities, int, string, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) => (from dog in ctx.DogSet.IncludeMany(include) where dog.ID == id select dog).FirstOrDefault()); Wrong again, this time it is because the EF cannot parse IncludeMany because it is not part of the functions that is recognizes: it is an extension. Ok, so you want to pass an arbitrary number of paths to your function and Includes() only takes a single one. What to do? You could decide that you will never ever need more than, say 20 Includes, and pass each separated strings in a struct to CompiledQuery. But now the query looks like this: from dog in ctx.DogSet.Include(include1).Include(include2).Include(include3) .Include(include4).Include(include5).Include(include6) .[...].Include(include19).Include(include20) where dog.ID == id select dog which is awful as well. Ok, then, but wait a minute. Can't we return an ObjectQuery< with CompiledQuery? Then set the includes on that? Well, that what I would have thought so as well: static readonly Func<Entities, int, ObjectQuery<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, ObjectQuery<Dog>>((ctx, id) => (ObjectQuery<Dog>)(from dog in ctx.DogSet where dog.ID == id select dog)); public Dog GetDog( int id, string include ) { ObjectQuery<Dog> oQuery = query_GetDog(id); oQuery = oQuery.IncludeMany(include); return oQuery.FirstOrDefault; } That should have worked, except that when you call IncludeMany (or Include, Where, OrderBy...) you invalidate the cached compiled query because it is an entirely new one now! So, the expression tree needs to be reparsed and you get that performance hit again. So what is the solution? You simply cannot use CompiledQueries with parametrized Includes. Use EntitySQL instead. This doesn't mean that there aren't uses for CompiledQueries. It is great for localized queries that will always be called in the same context. Ideally CompiledQuery should always be used because the syntax is checked at compile time, but due to limitation, that's not possible. An example of use would be: you may want to have a page that queries which two dogs have the same favorite food, which is a bit narrow for a BusinessLayer function, so you put it in your page and know exactly what type of includes are required. Passing more than 3 parameters to a CompiledQuery Func is limited to 5 parameters, of which the last one is the return type and the first one is your Entities object from the model. So that leaves you with 3 parameters. A pitance, but it can be improved on very easily. public struct MyParams { public string param1; public int param2; public DateTime param3; } static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) => from dog in ctx.DogSet where dog.Age == myParams.param2 && dog.Name == myParams.param1 and dog.BirthDate > myParams.param3 select dog); public List<Dog> GetSomeDogs( int age, string Name, DateTime birthDate ) { MyParams myParams = new MyParams(); myParams.param1 = name; myParams.param2 = age; myParams.param3 = birthDate; return query_GetDog(YourContext,myParams).ToList(); } Return Types (this does not apply to EntitySQL queries as they aren't compiled at the same time during execution as the CompiledQuery method) Working with Linq, you usually don't force the execution of the query until the very last moment, in case some other functions downstream wants to change the query in some way: static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) => from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog); public IEnumerable<Dog> GetSomeDogs( int age, string name ) { return query_GetDog(YourContext,age,name); } public void DataBindStuff() { IEnumerable<Dog> dogs = GetSomeDogs(4,"Bud"); // but I want the dogs ordered by BirthDate gridView.DataSource = dogs.OrderBy( it => it.BirthDate ); } What is going to happen here? By still playing with the original ObjectQuery (that is the actual return type of the Linq statement, which implements IEnumerable), it will invalidate the compiled query and be force to re-parse. So, the rule of thumb is to return a List< of objects instead. static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) => from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog); public List<Dog> GetSomeDogs( int age, string name ) { return query_GetDog(YourContext,age,name).ToList(); //<== change here } public void DataBindStuff() { List<Dog> dogs = GetSomeDogs(4,"Bud"); // but I want the dogs ordered by BirthDate gridView.DataSource = dogs.OrderBy( it => it.BirthDate ); } When you call ToList(), the query gets executed as per the compiled query and then, later, the OrderBy is executed against the objects in memory. It may be a little bit slower, but I'm not even sure. One sure thing is that you have no worries about mis-handling the ObjectQuery and invalidating the compiled query plan. Once again, that is not a blanket statement. ToList() is a defensive programming trick, but if you have a valid reason not to use ToList(), go ahead. There are many cases in which you would want to refine the query before executing it. Performance What is the performance impact of compiling a query? It can actually be fairly large. A rule of thumb is that compiling and caching the query for reuse takes at least double the time of simply executing it without caching. For complex queries (read inherirante), I have seen upwards to 10 seconds. So, the first time a pre-compiled query gets called, you get a performance hit. After that first hit, performance is noticeably better than the same non-pre-compiled query. Practically the same as Linq2Sql When you load a page with pre-compiled queries the first time you will get a hit. It will load in maybe 5-15 seconds (obviously more than one pre-compiled queries will end up being called), while subsequent loads will take less than 300ms. Dramatic difference, and it is up to you to decide if it is ok for your first user to take a hit or you want a script to call your pages to force a compilation of the queries. Can this query be cached? { Dog dog = from dog in YourContext.DogSet where dog.ID == id select dog; } No, ad-hoc Linq queries are not cached and you will incur the cost of generating the tree every single time you call it. Parametrized Queries Most search capabilities involve heavily parametrized queries. There are even libraries available that will let you build a parametrized query out of lamba expressions. The problem is that you cannot use pre-compiled queries with those. One way around that is to map out all the possible criteria in the query and flag which one you want to use: public struct MyParams { public string name; public bool checkName; public int age; public bool checkAge; } static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) => from dog in ctx.DogSet where (myParams.checkAge == true && dog.Age == myParams.age) && (myParams.checkName == true && dog.Name == myParams.name ) select dog); protected List<Dog> GetSomeDogs() { MyParams myParams = new MyParams(); myParams.name = "Bud"; myParams.checkName = true; myParams.age = 0; myParams.checkAge = false; return query_GetDog(YourContext,myParams).ToList(); } The advantage here is that you get all the benifits of a pre-compiled quert. The disadvantages are that you most likely will end up with a where clause that is pretty difficult to maintain, that you will incur a bigger penalty for pre-compiling the query and that each query you run is not as efficient as it could be (particularly with joins thrown in). Another way is to build an EntitySQL query piece by piece, like we all did with SQL. protected List<Dod> GetSomeDogs( string name, int age) { string query = "select value dog from Entities.DogSet where 1 = 1 "; if( !String.IsNullOrEmpty(name) ) query = query + " and dog.Name == @Name "; if( age > 0 ) query = query + " and dog.Age == @Age "; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext ); if( !String.IsNullOrEmpty(name) ) oQuery.Parameters.Add( new ObjectParameter( "Name", name ) ); if( age > 0 ) oQuery.Parameters.Add( new ObjectParameter( "Age", age ) ); return oQuery.ToList(); } Here the problems are: - there is no syntax checking during compilation - each different combination of parameters generate a different query which will need to be pre-compiled when it is first run. In this case, there are only 4 different possible queries (no params, age-only, name-only and both params), but you can see that there can be way more with a normal world search. - Noone likes to concatenate strings! Another option is to query a large subset of the data and then narrow it down in memory. This is particularly useful if you are working with a definite subset of the data, like all the dogs in a city. You know there are a lot but you also know there aren't that many... so your CityDog search page can load all the dogs for the city in memory, which is a single pre-compiled query and then refine the results protected List<Dod> GetSomeDogs( string name, int age, string city) { string query = "select value dog from Entities.DogSet where dog.Owner.Address.City == @City "; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext ); oQuery.Parameters.Add( new ObjectParameter( "City", city ) ); List<Dog> dogs = oQuery.ToList(); if( !String.IsNullOrEmpty(name) ) dogs = dogs.Where( it => it.Name == name ); if( age > 0 ) dogs = dogs.Where( it => it.Age == age ); return dogs; } It is particularly useful when you start displaying all the data then allow for filtering. Problems: - Could lead to serious data transfer if you are not careful about your subset. - You can only filter on the data that you returned. It means that if you don't return the Dog.Owner association, you will not be able to filter on the Dog.Owner.Name So what is the best solution? There isn't any. You need to pick the solution that works best for you and your problem: - Use lambda-based query building when you don't care about pre-compiling your queries. - Use fully-defined pre-compiled Linq query when your object structure is not too complex. - Use EntitySQL/string concatenation when the structure could be complex and when the possible number of different resulting queries are small (which means fewer pre-compilation hits). - Use in-memory filtering when you are working with a smallish subset of the data or when you had to fetch all of the data on the data at first anyway (if the performance is fine with all the data, then filtering in memory will not cause any time to be spent in the db). Singleton access The best way to deal with your context and entities accross all your pages is to use the singleton pattern: public sealed class YourContext { private const string instanceKey = "On3GoModelKey"; YourContext(){} public static YourEntities Instance { get { HttpContext context = HttpContext.Current; if( context == null ) return Nested.instance; if (context.Items[instanceKey] == null) { On3GoEntities entity = new On3GoEntities(); context.Items[instanceKey] = entity; } return (YourEntities)context.Items[instanceKey]; } } class Nested { // Explicit static constructor to tell C# compiler // not to mark type as beforefieldinit static Nested() { } internal static readonly YourEntities instance = new YourEntities(); } } NoTracking, is it worth it? When executing a query, you can tell the framework to track the objects it will return or not. What does it mean? With tracking enabled (the default option), the framework will track what is going on with the object (has it been modified? Created? Deleted?) and will also link objects together, when further queries are made from the database, which is what is of interest here. For example, lets assume that Dog with ID == 2 has an owner which ID == 10. Dog dog = (from dog in YourContext.DogSet where dog.ID == 2 select dog).FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; Person owner = (from o in YourContext.PersonSet where o.ID == 10 select dog).FirstOrDefault(); //dog.OwnerReference.IsLoaded == true; If we were to do the same with no tracking, the result would be different. ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>) (from dog in YourContext.DogSet where dog.ID == 2 select dog); oDogQuery.MergeOption = MergeOption.NoTracking; Dog dog = oDogQuery.FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>) (from o in YourContext.PersonSet where o.ID == 10 select o); oPersonQuery.MergeOption = MergeOption.NoTracking; Owner owner = oPersonQuery.FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; Tracking is very useful and in a perfect world without performance issue, it would always be on. But in this world, there is a price for it, in terms of performance. So, should you use NoTracking to speed things up? It depends on what you are planning to use the data for. Is there any chance that the data your query with NoTracking can be used to make update/insert/delete in the database? If so, don't use NoTracking because associations are not tracked and will causes exceptions to be thrown. In a page where there are absolutly no updates to the database, you can use NoTracking. Mixing tracking and NoTracking is possible, but it requires you to be extra careful with updates/inserts/deletes. The problem is that if you mix then you risk having the framework trying to Attach() a NoTracking object to the context where another copy of the same object exist with tracking on. Basicly, what I am saying is that Dog dog1 = (from dog in YourContext.DogSet where dog.ID == 2).FirstOrDefault(); ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>) (from dog in YourContext.DogSet where dog.ID == 2 select dog); oDogQuery.MergeOption = MergeOption.NoTracking; Dog dog2 = oDogQuery.FirstOrDefault(); dog1 and dog2 are 2 different objects, one tracked and one not. Using the detached object in an update/insert will force an Attach() that will say "Wait a minute, I do already have an object here with the same database key. Fail". And when you Attach() one object, all of its hierarchy gets attached as well, causing problems everywhere. Be extra careful. How much faster is it with NoTracking It depends on the queries. Some are much more succeptible to tracking than other. I don't have a fast an easy rule for it, but it helps. So I should use NoTracking everywhere then? Not exactly. There are some advantages to tracking object. The first one is that the object is cached, so subsequent call for that object will not hit the database. That cache is only valid for the lifetime of the YourEntities object, which, if you use the singleton code above, is the same as the page lifetime. One page request == one YourEntity object. So for multiple calls for the same object, it will load only once per page request. (Other caching mechanism could extend that). What happens when you are using NoTracking and try to load the same object multiple times? The database will be queried each time, so there is an impact there. How often do/should you call for the same object during a single page request? As little as possible of course, but it does happens. Also remember the piece above about having the associations connected automatically for your? You don't have that with NoTracking, so if you load your data in multiple batches, you will not have a link to between them: ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)(from dog in YourContext.DogSet select dog); oDogQuery.MergeOption = MergeOption.NoTracking; List<Dog> dogs = oDogQuery.ToList(); ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)(from o in YourContext.PersonSet select o); oPersonQuery.MergeOption = MergeOption.NoTracking; List<Person> owners = oPersonQuery.ToList(); In this case, no dog will have its .Owner property set. Some things to keep in mind when you are trying to optimize the performance. No lazy loading, what am I to do? This can be seen as a blessing in disguise. Of course it is annoying to load everything manually. However, it decreases the number of calls to the db and forces you to think about when you should load data. The more you can load in one database call the better. That was always true, but it is enforced now with this 'feature' of EF. Of course, you can call if( !ObjectReference.IsLoaded ) ObjectReference.Load(); if you want to, but a better practice is to force the framework to load the objects you know you will need in one shot. This is where the discussion about parametrized Includes begins to make sense. Lets say you have you Dog object public class Dog { public Dog Get(int id) { return YourContext.DogSet.FirstOrDefault(it => it.ID == id ); } } This is the type of function you work with all the time. It gets called from all over the place and once you have that Dog object, you will do very different things to it in different functions. First, it should be pre-compiled, because you will call that very often. Second, each different pages will want to have access to a different subset of the Dog data. Some will want the Owner, some the FavoriteToy, etc. Of course, you could call Load() for each reference you need anytime you need one. But that will generate a call to the database each time. Bad idea. So instead, each page will ask for the data it wants to see when it first request for the Dog object: static public Dog Get(int id) { return GetDog(entity,"");} static public Dog Get(int id, string includePath) { string query = "select value o " + " from YourEntities.DogSet as o " +

Read the article

Search Results

Search found 763 results on 31 pages for 'union'.

Page 31/31 | < Previous Page | 27 28 29 30 31

- by kralco626

- by Pindatjuh

- by Paul White

- by sqlworkshops

- by Phil Factor

- by Michael Snow

- by user12620111

- by hairyyak

- by sandeepan-nath

- by Julian Fernandes

- by Tobbe

- by AD

< Previous Page | 27 28 29 30 31