Search Results

Search found 21436 results on 858 pages for 'draw order'.

Page 141/858 | < Previous Page | 137 138 139 140 141 142 143 144 145 146 147 148  | Next Page >

  • Splitting a UL into three even lists

    - by Andy
    I am printing a menu using UL, the trouble is the order that is generated by my script is ignored because im printing the LI one after the other and they're spanning three across. So the order is 1 , 2 , 3 as opposed to 1 2 3 To counteract this i wanted to split my single UL into three that way the order would be maintained. Here is my code currently which works perfectly to print a single UL. //Category Drop Down Menu $this->CategoryDropDownMenu = '<ul id="subcatmenu">'; foreach($sitemap->CategoryMenu as $val) $this->CategoryDropDownMenu .= '<li><a href="'.$val[host].$val[link].'"><span>'.htmlspecialchars($val[title]).'</span></a></li>'; $this->CategoryDropDownMenu .= '</ul>';

    Read the article

  • mysql result for pagination

    - by Reteras Remus
    The query is: SELECT * FROM `news` ORDER BY `id` LIMIT ($curr_page * 5), ( ($curr_page * 5) + 5 ) Where $curr_page is a php variable which is getting a value from $_GET['page'] I want to make a pagination (5 news on each page), but I don't know why the mysql is returning me extra values. On the first page the result ok: $curr_page = 0 The query would be: SELECT * FROM `news` ORDER BY `id` LIMIT 0, 5 But on the second page, the result from the query is adding extra news, 10 instead of 5. The query on the second page: SELECT * FROM `news` ORDER BY `id` LIMIT 5, 10 Whats wrong? Why the result has 10 values instead of 5? Thank you!

    Read the article

  • Help in restructuring a project

    - by mrblah
    I have a commerce application, asp.net mvc. I want it to be extensible in the sense others can create other payment providers, as long as they adhere to the interfaces. /blah.core /blah.web /blah.Authorize.net (Implementation of a payment provider using interfaces Ipaymentconfig and paymentdata class) Now the problem is this: /blah.core - PaymentData /blah.core.interfaces - IPaymentConfig where Payment Data looks like: using blah.core; public class PaymentData { public Order Order {get;set;} } IPayment data contains classes from blah.core like the Order class. Now I want to use the actual Authorize.net implementation, so when I tried to reference it in the blah.core project I got a circular dependency error. How could I solve this problem? Many have said to break out the interfaces into their own project, but the problem is PaymentData references entities that are found in blah.core also, so there doesn't seem to be a way around this (in my head anyhow). How can I redesign this?

    Read the article

  • Sending mail with Gmail Account using System.Net.Mail in ASP.NET

    - by Jalpesh P. Vadgama
    Any web application is in complete without mail functionality you should have to write send mail functionality. Like if there is shopping cart application for example then when a order created on the shopping cart you need to send an email to administrator of website for Order notification and for customer you need to send an email of receipt of order. So any web application is not complete without sending email. This post is also all about sending email. In post I will explain that how we can send emails from our Gmail Account without purchasing any smtp server etc. There are some limitations for sending email from Gmail Account. Please note following things. Gmail will have fixed number of quota for sending emails per day. So you can not send more then that emails for the day. Your from email address always will be your account email address which you are using for sending email. You can not send an email to unlimited numbers of people. Gmail ant spamming policy will restrict this. Gmail provide both Popup and SMTP settings both should be active in your account where you testing. You can enable that via clicking on setting link in gmail account and go to Forwarding and POP/Imap. So if you are using mail functionality for limited emails then Gmail is Best option. But if you are sending thousand of email daily then it will not be Good Idea. Here is the code for sending mail from Gmail Account. using System.Net.Mail; namespace Experiement { public partial class WebForm1 : System.Web.UI.Page { protected void Page_Load(object sender,System.EventArgs e) { MailMessage mailMessage = new MailMessage(new MailAddress("[email protected]") ,new MailAddress("[email protected]")); mailMessage.Subject = "Sending mail through gmail account"; mailMessage.IsBodyHtml = true; mailMessage.Body = "<B>Sending mail thorugh gmail from asp.net</B>"; System.Net.NetworkCredential networkCredentials = new System.Net.NetworkCredential("[email protected]", "yourpassword"); SmtpClient smtpClient = new SmtpClient(); smtpClient.EnableSsl = true; smtpClient.UseDefaultCredentials = false; smtpClient.Credentials = networkCredentials; smtpClient.Host = "smtp.gmail.com"; smtpClient.Port = 587; smtpClient.Send(mailMessage); Response.Write("Mail Successfully sent"); } } } That’s run this application and you will get like below in your account. Technorati Tags: Gmail,System.NET.Mail,ASP.NET

    Read the article

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • Getting the innermost .NET Exception

    - by Rick Strahl
    Here's a trivial but quite useful function that I frequently need in dynamic execution of code: Finding the innermost exception when an exception occurs, because for many operations (for example Reflection invocations or Web Service calls) the top level errors returned can be rather generic. A good example - common with errors in Reflection making a method invocation - is this generic error: Exception has been thrown by the target of an invocation In the debugger it looks like this: In this case this is an AJAX callback, which dynamically executes a method (ExecuteMethod code) which in turn calls into an Amazon Web Service using the old Amazon WSE101 Web service extensions for .NET. An error occurs in the Web Service call and the innermost exception holds the useful error information which in this case points at an invalid web.config key value related to the System.Net connection APIs. The "Exception has been thrown by the target of an invocation" error is the Reflection APIs generic error message that gets fired when you execute a method dynamically and that method fails internally. The messages basically says: "Your code blew up in my face when I tried to run it!". Which of course is not very useful to tell you what actually happened. If you drill down the InnerExceptions eventually you'll get a more detailed exception that points at the original error and code that caused the exception. In the code above the actually useful exception is two innerExceptions down. In most (but not all) cases when inner exceptions are returned, it's the innermost exception that has the information that is really useful. It's of course a fairly trivial task to do this in code, but I do it so frequently that I use a small helper method for this: /// <summary> /// Returns the innermost Exception for an object /// </summary> /// <param name="ex"></param> /// <returns></returns> public static Exception GetInnerMostException(Exception ex) { Exception currentEx = ex; while (currentEx.InnerException != null) { currentEx = currentEx.InnerException; } return currentEx; } This code just loops through all the inner exceptions (if any) and assigns them to a temporary variable until there are no more inner exceptions. The end result is that you get the innermost exception returned from the original exception. It's easy to use this code then in a try/catch handler like this (from the example above) to retrieve the more important innermost exception: object result = null; string stringResult = null; try { if (parameterList != null) // use the supplied parameter list result = helper.ExecuteMethod(methodToCall,target, parameterList.ToArray(), CallbackMethodParameterType.Json,ref attr); else // grab the info out of QueryString Values or POST buffer during parameter parsing // for optimization result = helper.ExecuteMethod(methodToCall, target, null, CallbackMethodParameterType.Json, ref attr); } catch (Exception ex) { Exception activeException = DebugUtils.GetInnerMostException(ex); WriteErrorResponse(activeException.Message, ( HttpContext.Current.IsDebuggingEnabled ? ex.StackTrace : null ) ); return; } Another function that is useful to me from time to time is one that returns all inner exceptions and the original exception as an array: /// <summary> /// Returns an array of the entire exception list in reverse order /// (innermost to outermost exception) /// </summary> /// <param name="ex">The original exception to work off</param> /// <returns>Array of Exceptions from innermost to outermost</returns> public static Exception[] GetInnerExceptions(Exception ex) {     List<Exception> exceptions = new List<Exception>();     exceptions.Add(ex);       Exception currentEx = ex;     while (currentEx.InnerException != null)     {         exceptions.Add(ex);     }       // Reverse the order to the innermost is first     exceptions.Reverse();       return exceptions.ToArray(); } This function loops through all the InnerExceptions and returns them and then reverses the order of the array returning the innermost exception first. This can be useful in certain error scenarios where exceptions stack and you need to display information from more than one of the exceptions in order to create a useful error message. This is rare but certain database exceptions bury their exception info in mutliple inner exceptions and it's easier to parse through them in an array then to manually walk the exception stack. It's also useful if you need to log errors and want to see the all of the error detail from all exceptions. None of this is rocket science, but it's useful to have some helpers that make retrieval of the critical exception info trivial. Resources DebugUtils.cs utility class in the West Wind Web Toolkit© Rick Strahl, West Wind Technologies, 2005-2011Posted in CSharp  .NET  

    Read the article

  • SQL SERVER – SSMS: Backup and Restore Events Report

    - by Pinal Dave
    A DBA wears multiple hats and in fact does more than what an eye can see. One of the core task of a DBA is to take backups. This looks so trivial that most developers shrug this off as the only activity a DBA might be doing. I have huge respect for DBA’s all around the world because even if they seem cool with all the scripting, automation, maintenance works round the clock to keep the business working almost 365 days 24×7, their worth is knowing that one day when the systems / HDD crashes and you have an important delivery to make. So these backup tasks / maintenance jobs that have been done come handy and are no more trivial as they might seem to be as considered by many. So the important question like: “When was the last backup taken?”, “How much time did the last backup take?”, “What type of backup was taken last?” etc are tricky questions and this report lands answers to the same in a jiffy. So the SSMS report, we are talking can be used to find backups and restore operation done for the selected database. Whenever we perform any backup or restore operation, the information is stored in the msdb database. This report can utilize that information and provide information about the size, time taken and also the file location for those operations. Here is how this report can be launched.   Once we launch this report, we can see 4 major sections shown as listed below. Average Time Taken For Backup Operations Successful Backup Operations Backup Operation Errors Successful Restore Operations Let us look at each section next. Average Time Taken For Backup Operations Information shown in “Average Time Taken For Backup Operations” section is taken from a backupset table in the msdb database. Here is the query and the expanded version of that particular section USE msdb; SELECT (ROW_NUMBER() OVER (ORDER BY t1.TYPE))%2 AS l1 ,       1 AS l2 ,       1 AS l3 ,       t1.TYPE AS [type] ,       (AVG(DATEDIFF(ss,backup_start_date, backup_finish_date)))/60.0 AS AverageBackupDuration FROM backupset t1 INNER JOIN sys.databases t3 ON ( t1.database_name = t3.name) WHERE t3.name = N'AdventureWorks2014' GROUP BY t1.TYPE ORDER BY t1.TYPE On my small database the time taken for differential backup was less than a minute, hence the value of zero is displayed. This is an important piece of backup operation which might help you in planning maintenance windows. Successful Backup Operations Here is the expanded version of this section.   This information is derived from various backup tracking tables from msdb database.  Here is the simplified version of the query which can be used separately as well. SELECT * FROM sys.databases t1 INNER JOIN backupset t3 ON (t3.database_name = t1.name) LEFT OUTER JOIN backupmediaset t5 ON ( t3.media_set_id = t5.media_set_id) LEFT OUTER JOIN backupmediafamily t6 ON ( t6.media_set_id = t5.media_set_id) WHERE (t1.name = N'AdventureWorks2014') ORDER BY backup_start_date DESC,t3.backup_set_id,t6.physical_device_name; The report does some calculations to show the data in a more readable format. For example, the backup size is shown in KB, MB or GB. I have expanded first row by clicking on (+) on “Device type” column. That has shown me the path of the physical backup file. Personally looking at this section, the Backup Size, Device Type and Backup Name are critical and are worth a note. As mentioned in the previous section, this section also has the Duration embedded inside it. Backup Operation Errors This section of the report gets data from default trace. You might wonder how. One of the event which is tracked by default trace is “ErrorLog”. This means that whatever message is written to errorlog gets written to default trace file as well. Interestingly, whenever there is a backup failure, an error message is written to ERRORLOG and hence default trace. This section takes advantage of that and shows the information. We can read below message under this section, which confirms above logic. No backup operations errors occurred for (AdventureWorks2014) database in the recent past or default trace is not enabled. Successful Restore Operations This section may not be very useful in production server (do you perform a restore of database?) but might be useful in the development and log shipping secondary environment, where we might be interested to see restore operations for a particular database. Here is the expanded version of the section. To fill this section of the report, I have restored the same backups which were taken to populate earlier sections. Here is the simplified version of the query used to populate this output. USE msdb; SELECT * FROM restorehistory t1 LEFT OUTER JOIN restorefile t2 ON ( t1.restore_history_id = t2.restore_history_id) LEFT OUTER JOIN backupset t3 ON ( t1.backup_set_id = t3.backup_set_id) WHERE t1.destination_database_name = N'AdventureWorks2014' ORDER BY restore_date DESC,  t1.restore_history_id,t2.destination_phys_name Have you ever looked at the backup strategy of your key databases? Are they in sync and do we have scope for improvements? Then this is the report to analyze after a week or month of maintenance plans running in your database. Do chime in with what are the strategies you are using in your environments. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Backup and Restore, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

    Read the article

  • [GEEK SCHOOL] Network Security 1: Securing User Accounts and Passwords in Windows

    - by Matt Klein
    This How-To Geek School class is intended for people who want to learn more about security when using Windows operating systems. You will learn many principles that will help you have a more secure computing experience and will get the chance to use all the important security tools and features that are bundled with Windows. Obviously, we will share everything you need to know about using them effectively. In this first lesson, we will talk about password security; the different ways of logging into Windows and how secure they are. In the proceeding lesson, we will explain where Windows stores all the user names and passwords you enter while working in this operating systems, how safe they are, and how to manage this data. Moving on in the series, we will talk about User Account Control, its role in improving the security of your system, and how to use Windows Defender in order to protect your system from malware. Then, we will talk about the Windows Firewall, how to use it in order to manage the apps that get access to the network and the Internet, and how to create your own filtering rules. After that, we will discuss the SmartScreen Filter – a security feature that gets more and more attention from Microsoft and is now widely used in its Windows 8.x operating systems. Moving on, we will discuss ways to keep your software and apps up-to-date, why this is important and which tools you can use to automate this process as much as possible. Last but not least, we will discuss the Action Center and its role in keeping you informed about what’s going on with your system and share several tips and tricks about how to stay safe when using your computer and the Internet. Let’s get started by discussing everyone’s favorite subject: passwords. The Types of Passwords Found in Windows In Windows 7, you have only local user accounts, which may or may not have a password. For example, you can easily set a blank password for any user account, even if that one is an administrator. The only exception to this rule are business networks where domain policies force all user accounts to use a non-blank password. In Windows 8.x, you have both local accounts and Microsoft accounts. If you would like to learn more about them, don’t hesitate to read the lesson on User Accounts, Groups, Permissions & Their Role in Sharing, in our Windows Networking series. Microsoft accounts are obliged to use a non-blank password due to the fact that a Microsoft account gives you access to Microsoft services. Using a blank password would mean exposing yourself to lots of problems. Local accounts in Windows 8.1 however, can use a blank password. On top of traditional passwords, any user account can create and use a 4-digit PIN or a picture password. These concepts were introduced by Microsoft to speed up the sign in process for the Windows 8.x operating system. However, they do not replace the use of a traditional password and can be used only in conjunction with a traditional user account password. Another type of password that you encounter in Windows operating systems is the Homegroup password. In a typical home network, users can use the Homegroup to easily share resources. A Homegroup can be joined by a Windows device only by using the Homegroup password. If you would like to learn more about the Homegroup and how to use it for network sharing, don’t hesitate to read our Windows Networking series. What to Keep in Mind When Creating Passwords, PINs and Picture Passwords When creating passwords, a PIN, or a picture password for your user account, we would like you keep in mind the following recommendations: Do not use blank passwords, even on the desktop computers in your home. You never know who may gain unwanted access to them. Also, malware can run more easily as administrator because you do not have a password. Trading your security for convenience when logging in is never a good idea. When creating a password, make it at least eight characters long. Make sure that it includes a random mix of upper and lowercase letters, numbers, and symbols. Ideally, it should not be related in any way to your name, username, or company name. Make sure that your passwords do not include complete words from any dictionary. Dictionaries are the first thing crackers use to hack passwords. Do not use the same password for more than one account. All of your passwords should be unique and you should use a system like LastPass, KeePass, Roboform or something similar to keep track of them. When creating a PIN use four different digits to make things slightly harder to crack. When creating a picture password, pick a photo that has at least 10 “points of interests”. Points of interests are areas that serve as a landmark for your gestures. Use a random mixture of gesture types and sequence and make sure that you do not repeat the same gesture twice. Be aware that smudges on the screen could potentially reveal your gestures to others. The Security of Your Password vs. the PIN and the Picture Password Any kind of password can be cracked with enough effort and the appropriate tools. There is no such thing as a completely secure password. However, passwords created using only a few security principles are much harder to crack than others. If you respect the recommendations shared in the previous section of this lesson, you will end up having reasonably secure passwords. Out of all the log in methods in Windows 8.x, the PIN is the easiest to brute force because PINs are restricted to four digits and there are only 10,000 possible unique combinations available. The picture password is more secure than the PIN because it provides many more opportunities for creating unique combinations of gestures. Microsoft have compared the two login options from a security perspective in this post: Signing in with a picture password. In order to discourage brute force attacks against picture passwords and PINs, Windows defaults to your traditional text password after five failed attempts. The PIN and the picture password function only as alternative login methods to Windows 8.x. Therefore, if someone cracks them, he or she doesn’t have access to your user account password. However, that person can use all the apps installed on your Windows 8.x device, access your files, data, and so on. How to Create a PIN in Windows 8.x If you log in to a Windows 8.x device with a user account that has a non-blank password, then you can create a 4-digit PIN for it, to use it as a complementary login method. In order to create one, you need to go to “PC Settings”. If you don’t know how, then press Windows + C on your keyboard or flick from the right edge of the screen, on a touch-enabled device, then press “Settings”. The Settings charm is now open. Click or tap the link that says “Change PC settings”, on the bottom of the charm. In PC settings, go to Accounts and then to “Sign-in options”. Here you will find all the necessary options for changing your existing password, creating a PIN, or a picture password. To create a PIN, press the “Add” button in the PIN section. The “Create a PIN” wizard is started and you are asked to enter the password of your user account. Type it and press “OK”. Now you are asked to enter a 4-digit pin in the “Enter PIN” and “Confirm PIN” fields. The PIN has been created and you can now use it to log in to Windows. How to Create a Picture Password in Windows 8.x If you log in to a Windows 8.x device with a user account that has a non-blank password, then you can also create a picture password and use it as a complementary login method. In order to create one, you need to go to “PC settings”. In PC Settings, go to Accounts and then to “Sign-in options”. Here you will find all the necessary options for changing your existing password, creating a PIN, or a picture password. To create a picture password, press the “Add” button in the “Picture password” section. The “Create a picture password” wizard is started and you are asked to enter the password of your user account. You are shown a guide on how the picture password works. Take a few seconds to watch it and learn the gestures that can be used for your picture password. You will learn that you can create a combination of circles, straight lines, and taps. When ready, press “Choose picture”. Browse your Windows 8.x device and select the picture you want to use for your password and press “Open”. Now you can drag the picture to position it the way you want. When you like how the picture is positioned, press “Use this picture” on the left. If you are not happy with the picture, press “Choose new picture” and select a new one, as shown during the previous step. After you have confirmed that you want to use this picture, you are asked to set up your gestures for the picture password. Draw three gestures on the picture, any combination you wish. Please remember that you can use only three gestures: circles, straight lines, and taps. Once you have drawn those three gestures, you are asked to confirm. Draw the same gestures one more time. If everything goes well, you are informed that you have created your picture password and that you can use it the next time you sign in to Windows. If you don’t confirm the gestures correctly, you will be asked to try again, until you draw the same gestures twice. To close the picture password wizard, press “Finish”. Where Does Windows Store Your Passwords? Are They Safe? All the passwords that you enter in Windows and save for future use are stored in the Credential Manager. This tool is a vault with the usernames and passwords that you use to log on to your computer, to other computers on the network, to apps from the Windows Store, or to websites using Internet Explorer. By storing these credentials, Windows can automatically log you the next time you access the same app, network share, or website. Everything that is stored in the Credential Manager is encrypted for your protection.

    Read the article

  • SQL SERVER – Concurrency Basics – Guest Post by Vinod Kumar

    - by pinaldave
    This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Working on various versions from SQL Server 7.0, Oracle 7.3 and other database technologies – he now works with the Microsoft Technology Center (MTC) as a Technology Architect. Let us read the blog post in Vinod’s own voice. Learning is always fun when it comes to SQL Server and learning the basics again can be more fun. I did write about Transaction Logs and recovery over my blogs and the concept of simplifying the basics is a challenge. In the real world we always see checks and queues for a process – say railway reservation, banks, customer supports etc there is a process of line and queue to facilitate everyone. Shorter the queue higher is the efficiency of system (a.k.a higher is the concurrency). Every database does implement this using checks like locking, blocking mechanisms and they implement the standards in a way to facilitate higher concurrency. In this post, let us talk about the topic of Concurrency and what are the various aspects that one needs to know about concurrency inside SQL Server. Let us learn the concepts as one-liners: Concurrency can be defined as the ability of multiple processes to access or change shared data at the same time. The greater the number of concurrent user processes that can be active without interfering with each other, the greater the concurrency of the database system. Concurrency is reduced when a process that is changing data prevents other processes from reading that data or when a process that is reading data prevents other processes from changing that data. Concurrency is also affected when multiple processes are attempting to change the same data simultaneously. Two approaches to managing concurrent data access: Optimistic Concurrency Model Pessimistic Concurrency Model Concurrency Models Pessimistic Concurrency Default behavior: acquire locks to block access to data that another process is using. Assumes that enough data modification operations are in the system that any given read operation is likely affected by a data modification made by another user (assumes conflicts will occur). Avoids conflicts by acquiring a lock on data being read so no other processes can modify that data. Also acquires locks on data being modified so no other processes can access the data for either reading or modifying. Readers block writer, writers block readers and writers. Optimistic Concurrency Assumes that there are sufficiently few conflicting data modification operations in the system that any single transaction is unlikely to modify data that another transaction is modifying. Default behavior of optimistic concurrency is to use row versioning to allow data readers to see the state of the data before the modification occurs. Older versions of the data are saved so a process reading data can see the data as it was when the process started reading and not affected by any changes being made to that data. Processes modifying the data is unaffected by processes reading the data because the reader is accessing a saved version of the data rows. Readers do not block writers and writers do not block readers, but, writers can and will block writers. Transaction Processing A transaction is the basic unit of work in SQL Server. Transaction consists of SQL commands that read and update the database but the update is not considered final until a COMMIT command is issued (at least for an explicit transaction: marked with a BEGIN TRAN and the end is marked by a COMMIT TRAN or ROLLBACK TRAN). Transactions must exhibit all the ACID properties of a transaction. ACID Properties Transaction processing must guarantee the consistency and recoverability of SQL Server databases. Ensures all transactions are performed as a single unit of work regardless of hardware or system failure. A – Atomicity C – Consistency I – Isolation D- Durability Atomicity: Each transaction is treated as all or nothing – it either commits or aborts. Consistency: ensures that a transaction won’t allow the system to arrive at an incorrect logical state – the data must always be logically correct.  Consistency is honored even in the event of a system failure. Isolation: separates concurrent transactions from the updates of other incomplete transactions. SQL Server accomplishes isolation among transactions by locking data or creating row versions. Durability: After a transaction commits, the durability property ensures that the effects of the transaction persist even if a system failure occurs. If a system failure occurs while a transaction is in progress, the transaction is completely undone, leaving no partial effects on data. Transaction Dependencies In addition to supporting all four ACID properties, a transaction might exhibit few other behaviors (known as dependency problems or consistency problems). Lost Updates: Occur when two processes read the same data and both manipulate the data, changing its value and then both try to update the original data to the new value. The second process might overwrite the first update completely. Dirty Reads: Occurs when a process reads uncommitted data. If one process has changed data but not yet committed the change, another process reading the data will read it in an inconsistent state. Non-repeatable Reads: A read is non-repeatable if a process might get different values when reading the same data in two reads within the same transaction. This can happen when another process changes the data in between the reads that the first process is doing. Phantoms: Occurs when membership in a set changes. It occurs if two SELECT operations using the same predicate in the same transaction return a different number of rows. Isolation Levels SQL Server supports 5 isolation levels that control the behavior of read operations. Read Uncommitted All behaviors except for lost updates are possible. Implemented by allowing the read operations to not take any locks, and because of this, it won’t be blocked by conflicting locks acquired by other processes. The process can read data that another process has modified but not yet committed. When using the read uncommitted isolation level and scanning an entire table, SQL Server can decide to do an allocation order scan (in page-number order) instead of a logical order scan (following page pointers). If another process doing concurrent operations changes data and move rows to a new location in the table, the allocation order scan can end up reading the same row twice. Also can happen if you have read a row before it is updated and then an update moves the row to a higher page number than your scan encounters later. Performing an allocation order scan under Read Uncommitted can cause you to miss a row completely – can happen when a row on a high page number that hasn’t been read yet is updated and moved to a lower page number that has already been read. Read Committed Two varieties of read committed isolation: optimistic and pessimistic (default). Ensures that a read never reads data that another application hasn’t committed. If another transaction is updating data and has exclusive locks on data, your transaction will have to wait for the locks to be released. Your transaction must put share locks on data that are visited, which means that data might be unavailable for others to use. A share lock doesn’t prevent others from reading but prevents them from updating. Read committed (snapshot) ensures that an operation never reads uncommitted data, but not by forcing other processes to wait. SQL Server generates a version of the changed row with its previous committed values. Data being changed is still locked but other processes can see the previous versions of the data as it was before the update operation began. Repeatable Read This is a Pessimistic isolation level. Ensures that if a transaction revisits data or a query is reissued the data doesn’t change. That is, issuing the same query twice within a transaction cannot pickup any changes to data values made by another user’s transaction because no changes can be made by other transactions. However, this does allow phantom rows to appear. Preventing non-repeatable read is a desirable safeguard but cost is that all shared locks in a transaction must be held until the completion of the transaction. Snapshot Snapshot Isolation (SI) is an optimistic isolation level. Allows for processes to read older versions of committed data if the current version is locked. Difference between snapshot and read committed has to do with how old the older versions have to be. It’s possible to have two transactions executing simultaneously that give us a result that is not possible in any serial execution. Serializable This is the strongest of the pessimistic isolation level. Adds to repeatable read isolation level by ensuring that if a query is reissued rows were not added in the interim, i.e, phantoms do not appear. Preventing phantoms is another desirable safeguard, but cost of this extra safeguard is similar to that of repeatable read – all shared locks in a transaction must be held until the transaction completes. In addition serializable isolation level requires that you lock data that has been read but also data that doesn’t exist. Ex: if a SELECT returned no rows, you want it to return no. rows when the query is reissued. This is implemented in SQL Server by a special kind of lock called the key-range lock. Key-range locks require that there be an index on the column that defines the range of values. If there is no index on the column, serializable isolation requires a table lock. Gets its name from the fact that running multiple serializable transactions at the same time is equivalent of running them one at a time. Now that we understand the basics of what concurrency is, the subsequent blog posts will try to bring out the basics around locking, blocking, deadlocks because they are the fundamental blocks that make concurrency possible. Now if you are with me – let us continue learning for SQL Server Locking Basics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Concurrency

    Read the article

  • Internet Protocol Suite: Transition Control Protocol (TCP) vs. User Datagram Protocol (UDP)

    How do we communicate over the Internet?  How is data transferred from one machine to another? These types of act ivies can only be done by using one of two Internet protocols currently. The collection of Internet Protocol consists of the Transition Control Protocol (TCP) and the User Datagram Protocol (UDP).  Both protocols are used to send data between two network end points, however they both have very distinct ways of transporting data from one endpoint to another. If transmission speed and reliability is the primary concern when trying to transfer data between two network endpoints then TCP is the proper choice. When a device attempts to send data to another endpoint using TCP it creates a direct connection between both devices until the transmission has completed. The direct connection between both devices ensures the reliability of the transmission due to the fact that no intermediate devices are needed to transfer the data. Due to the fact that both devices have to continuously poll the connection until transmission has completed increases the resources needed to perform the transmission. An example of this type of direct communication can be seen when a teacher tells a students to do their homework. The teacher is talking directly to the students in order to communicate that the homework needs to be done.  Students can then ask questions about the assignment to ensure that they have received the proper instructions for the assignment. UDP is a less resource intensive approach to sending data between to network endpoints. When a device uses UDP to send data across a network, the data is broken up and repackaged with the destination address. The sending device then releases the data packages to the network, but cannot ensure when or if the receiving device will actually get the data.  The sending device depends on other devices on the network to forward the data packages to the destination devices in order to complete the transmission. As you can tell this type of transmission is less resource intensive because not connection polling is needed,  but should not be used for transmitting data with speed or reliability requirements. This is due to the fact that the sending device can not ensure that the transmission is received.  An example of this type of communication can be seen when a teacher tells a student that they would like to speak with their parents. The teacher is relying on the student to complete the transmission to the parents, and the teacher has no guarantee that the student will actually inform the parents about the request. Both TCP and UPD are invaluable when attempting to send data across a network, but depending on the situation one protocol may be better than the other. Before deciding on which protocol to use an evaluation for transmission speed, reliability, latency, and overhead must be completed in order to define the best protocol for the situation.  

    Read the article

  • Asserting with JustMock

    In this post, i will be digging in a bit deep on Mock.Assert. This is the continuation from previous post and covers up the ways you can use assert for your mock expectations. I have used another traditional sample of Talisker that has a warehouse [Collaborator] and an order class [SUT] that will call upon the warehouse to see the stock and fill it up with items. Our sample, interface of warehouse and order looks similar to : public interface IWarehouse { bool HasInventory(string productName,...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

    Read the article

  • OWB 11gR2 - Early Arriving Facts

    - by Dawei Sun
    A common challenge when building ETL components for a data warehouse is how to handle early arriving facts. OWB 11gR2 introduced a new feature to address this for dimensional objects entitled Orphan Management. An orphan record is one that does not have a corresponding existing parent record. Orphan management automates the process of handling source rows that do not meet the requirements necessary to form a valid dimension or cube record. In this article, a simple example will be provided to show you how to use Orphan Management in OWB. We first import a sample MDL file that contains all the objects we need. Then we take some time to examine all the objects. After that, we prepare the source data, deploy the target table and dimension/cube loading map. Finally, we run the loading maps, and check the data in target dimension/cube tables. OK, let’s start… 1. Import MDL file and examine sample project First, download zip file from here, which includes a MDL file and three source data files. Then we open OWB design center, import orphan_management.mdl by using the menu File->Import->Warehouse Builder Metadata. Now we have several objects in BI_DEMO project as below: Mapping LOAD_CHANNELS_OM: The mapping for dimension loading. Mapping LOAD_SALES_OM: The mapping for cube loading. Dimension CHANNELS_OM: The dimension that contains channels data. Cube SALES_OM: The cube that contains sales data. Table CHANNELS_OM: The star implementation table of dimension CHANNELS_OM. Table SALES_OM: The star implementation table of cube SALES_OM. Table SRC_CHANNELS: The source table of channels data, that will be loaded into dimension CHANNELS_OM. Table SRC_ORDERS and SRC_ORDER_ITEMS: The source tables of sales data that will be loaded into cube SALES_OM. Sequence CLASS_OM_DIM_SEQ: The sequence used for loading dimension CHANNELS_OM. Dimension CHANNELS_OM This dimension has a hierarchy with three levels: TOTAL, CLASS and CHANNEL. Each level has three attributes: ID (surrogate key), NAME and SOURCE_ID (business key). It has a standard star implementation. The orphan management policy and the default parent setting are shown in the following screenshots: The orphan management policy options that you can set for loading are: Reject Orphan: The record is not inserted. Default Parent: You can specify a default parent record. This default record is used as the parent record for any record that does not have an existing parent record. If the default parent record does not exist, Warehouse Builder creates the default parent record. You specify the attribute values of the default parent record at the time of defining the dimensional object. If any ancestor of the default parent does not exist, Warehouse Builder also creates this record. No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records. While removing data from a dimension, you can select one of the following orphan management policies: Reject Removal: Warehouse Builder does not allow you to delete the record if it has existing child records. No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records. (More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#insertedID1) Cube SALES_OM This cube is references to dimension CHANNELS_OM. It has three measures: AMOUNT, QUANTITY and COST. The orphan management policy setting are shown as following screenshot: The orphan management policy options that you can set for loading are: No Maintenance: Warehouse Builder does not actively detect, reject, or fix orphan rows. Default Dimension Record: Warehouse Builder assigns a default dimension record for any row that has an invalid or null dimension key value. Use the Settings button to define the default parent row. Reject Orphan: Warehouse Builder does not insert the row if it does not have an existing dimension record. (More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#BABEACDG) Mapping LOAD_CHANNELS_OM This mapping loads source data from table SRC_CHANNELS to dimension CHANNELS_OM. The operator CHANNELS_IN is bound to table SRC_CHANNELS; CHANNELS_OUT is bound to dimension CHANNELS_OM. The TOTALS operator is used for generating a constant value for the top level in the dimension. The CLASS_FILTER operator is used to filter out the “invalid” class name, so then we can see what will happen when those channel records with an “invalid” parent are loading into dimension. Some properties of the dimension operator in this mapping are important to orphan management. See the screenshot below: Create Default Level Records: If YES, then default level records will be created. This property must be set to YES for dimensions and cubes if one of their orphan management policies is “Default Parent” or “Default Dimension Record”. This property is set to NO by default, so the user may need to set this to YES manually. LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the dimension editor. The values are set to the same as the dimension value when user drops the dimension into the mapping. The user does not need to modify these properties. Record Error Rows: If YES, error rows will be inserted into error table when loading the dimension. REMOVE Orphan Policy: This property is used when removing data from a dimension. Since the dimension loading type is set to LOAD in this example, this property is disabled. Mapping LOAD_SALES_OM This mapping loads source data from table SRC_ORDERS and SRC_ORDER_ITEMS to cube SALES_OM. This mapping seems a little bit complicated, but operators in the red rectangle are used to filter out and generate the records with “invalid” or “null” dimension keys. Some properties of the cube operator in a mapping are important to orphan management. See the screenshot below: Enable Source Aggregation: Should be checked in this example. If the default dimension record orphan policy is set for the cube operator, then it is recommended that source aggregation also be enabled. Otherwise, the orphan management processing may produce multiple fact rows with the same default dimension references, which will cause an “unstable rowset” execution error in the database, since the dimension refs are used as update match attributes for updating the fact table. LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the cube editor. The values are set to the same as in the cube editor when the user drops the cube into the mapping. The user does not need to modify these properties. Record Error Rows: If YES, error rows will be inserted into error table when loading the cube. 2. Deploy objects and mappings We now can deploy the objects. First, make sure location SALES_WH_LOCAL has been correctly configured. Then open Control Center Manager by using the menu Tools->Control Center Manager. Expand BI_DEMO->SALES_WH_LOCAL, click SALES_WH node on the project tree. We can see the following objects: Deploy all the objects in the following order: Sequence CLASS_OM_DIM_SEQ Table CHANNELS_OM, SALES_OM, SRC_CHANNELS, SRC_ORDERS, SRC_ORDER_ITEMS Dimension CHANNELS_OM Cube SALES_OM Mapping LOAD_CHANNELS_OM, LOAD_SALES_OM Note that we deployed source tables as well. Normally, we import source table from database instead of deploying them to target schema. However, in this example, we designed the source tables in OWB and deployed them to database for the purpose of this demonstration. 3. Prepare and examine source data Before running the mappings, we need to populate and examine the source data first. Run SRC_CHANNELS.sql, SRC_ORDERS.sql and SRC_ORDER_ITEMS.sql as target user. Then we check the data in these three tables. Table SRC_CHANNELS SQL> select rownum, id, class, name from src_channels; Records 1~5 are correct; they should be loaded into dimension without error. Records 6,7 and 8 have null parents; they should be loaded into dimension with a default parent value, and should be inserted into error table at the same time. Records 9, 10 and 11 have “invalid” parents; they should be rejected by dimension, and inserted into error table. Table SRC_ORDERS and SRC_ORDER_ITEMS SQL> select rownum, a.id, a.channel, b.amount, b.quantity, b.cost from src_orders a, src_order_items b where a.id = b.order_id; Record 178 has null dimension reference; it should be loaded into cube with a default dimension reference, and should be inserted into error table at the same time. Record 179 has “invalid” dimension reference; it should be rejected by cube, and inserted into error table. Other records should be aggregated and loaded into cube correctly. 4. Run the mappings and examine the target data In the Control Center Manager, expand BI_DEMO-> SALES_WH_LOCAL-> SALES_WH-> Mappings, right click on LOAD_CHANNELS_OM node, click Start. Use the same way to run mapping LOAD_SALES_OM. When they successfully finished, we can check the data in target tables. Table CHANNELS_OM SQL> select rownum, total_id, total_name, total_source_id, class_id,class_name, class_source_id, channel_id, channel_name,channel_source_id from channels_om order by abs(dimension_key); Records 1,2 and 3 are the default dimension records for the three levels. Records 8, 10 and 15 are the loaded records that originally have null parents. We see their parents name (class_name) is set to DEF_CLASS_NAME. Those records whose CHANNEL_NAME are Special_4, Special_5 and Special_6 are not loaded to this table because of the invalid parent. Error Table CHANNELS_OM_ERR SQL> select rownum, class_source_id, channel_id, channel_name,channel_source_id, err$$$_error_reason from channels_om_err order by channel_name; We can see all the record with null parent or invalid parent are inserted into this error table. Error reason is “Default parent used for record” for the first three records, and “No parent found for record” for the last three. Table SALES_OM SQL> select a.*, b.channel_name from sales_om a, channels_om b where a.channels=b.channel_id; We can see the order record with null channel_name has been loaded into target table with a default channel_name. The one with “invalid” channel_name are not loaded. Error Table SALES_OM_ERR SQL> select a.amount, a.cost, a.quantity, a.channels, b.channel_name, a.err$$$_error_reason from sales_om_err a, channels_om b where a.channels=b.channel_id(+); We can see the order records with null or invalid channel_name are inserted into error table. If the dimension reference column is null, the error reason is “Default dimension record used for fact”. If it is invalid, the error reason is “Dimension record not found for fact”. Summary In summary, this article illustrated the Orphan Management feature in OWB 11gR2. Automated orphan management policies improve ETL developer and administrator productivity by addressing an important cause of cube and dimension load failures, without requiring developers to explicitly build logic to handle these orphan rows.

    Read the article

  • rsnapshot intervals in configuration file…

    - by Patrick
    A simple question about rsnapshot. In order to perform daily backups I'm going to add lines to cron in my Ubuntu. Then, why do I have also these lines in the rsnapshot.conf ? ######################################### # BACKUP INTERVALS # # Must be unique and in ascending order # # i.e. hourly, daily, weekly, etc. # ######################################### interval hourly 6 interval daily 7 interval weekly 4 #interval monthly 3 If I use cron, should I disable them ? thanks ps. I've just realized that in the crontab I still have "hourly" and "daily". Should I then uncomment only the one I use in the crontab ? And what's the point to specify hourly if it is already specified in cron ? I'm a bit confused. # crontab -e 0 */4 * * * /usr/local/bin/rsnapshot hourly 30 23 * * * /usr/local/bin/rsnapshot daily

    Read the article

  • Specifying a file name for the FTP and File based transports in OSB

    - by [email protected]
    A common question I receive is how to incorporate a variable value into a file name when using the FTP, SFTP, or File transports in Oracle Service Bus.  For example, if one of the fields in a message being put down to a file by the File transport is an order number variable, then how can you make the order number become part of the file name?  Another example might be if you want to specify the date in the file name.  The transport configuration wizard in OSB does not have an option to allow for this, other than allowing you to specify a static prefix of suffix variable.

    Read the article

  • How do I use with LTSP with a Dell FX170 thin client?

    - by v4169sgr
    Just got a reconditioned DELL Optiplex FX170 thin client delivered, without an image. On power-up, I see the message: DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTER I am very sure all the wires etc are connected properly. I would like it to PXE boot and find my LTSP installation. I have an HP T5525 with the same connectivity that works fine. On F12 there's a BIOS boot order menu. The only options available though are LS120, the HDD, CD / DVD [via USB presumably], and various USB options. I do not see 'PXE' or any network card. And I don't know how to change the boot order :( Any help appreciated!

    Read the article

  • CodeStock 2012 Review: Michael Eaton( @mjeaton ) - 3 Simple Things for Increased Productivity

    3 Simple Things for Increased ProductivitySpeaker: Michael EatonTwitter: @mjeatonBlog: http://mjeaton.net/blog This was the first time I had seen Michael Eaton speak but have hear a lot of really good things about his speaking abilities. Needless to say I was really looking forward to his session. He basically addressed the topic of distractions and how they can decrease or increase your productivity as a developer. He makes the case that in order to become more productive you must block/limit all distractions. For example, he covered his top distractions as a developer. Top Distractions Social Media(Twitter, Reddit, Facebook) Wiki sites Phone Email Video Games Coworkers, Friends, Family Michael stated that he uses various types of music to help him block out these distractions in order for him to get into his coding zone. While he states that music works for him, he also notes that he knows of others that cannot really work with music. I have to say I am in the latter group because I require a quiet environment in order to work. A few session attendees also recommended listening to really loud white noise or music in another language other than your own. This allows for less focus to be placed on words being sung compared to the rhythmic beats being played. I have to say that I have not tried these suggestions yet but will in the near future. However, distractions can be very beneficial to productivity in that they give your mind a chance to relax and not think about the issues at hand. He spoke highly of taking vacations, and setting boundaries at work so that develops prevent the problem of burnout. One way he suggested that developer’s combat distractions is to use the Pomodoro technique. In his example he selects one task to do for 20 minutes and he can only do that task during that time. He ignores all other distractions until this task or time limit is complete. After it is completed he allows himself to relax and distract himself for another 5- 10 minutes before his next Pomodoro. This allows him to stay completely focused on a task and when the time is up he can then focus on other things.

    Read the article

  • How to bill a client for frequently-interrupted time

    - by Greg
    I find that when I'm working on hourly-billable projects (in particular, those that are research/design/architecture-oriented as opposed to straight coding) that I'm easily distracted by any number of things (email, grab a drink (loss of focus, but nature happens), link off the webpage I was reading, wandering mind (easy when the job calls for a lot of thinking), etc.) This results in very fragmented time, far too incremental IMO to accurately track with a timeclock, and some time very gray. I frequently end up billing for only some fraction of the elapsed time I spent in order to feel fair, but sometimes it takes a really long time to put in an 8-hour day. By contrast, when I've worked for salary I've not worried about whether I'm actively working at any given minute, I just get the job done, and I've never had anything but stellar reviews/feedback from past salaried employers, so I think I get the job done well. I personally believe in an 80/20 cycle: I get 80% of my work done during an inspired 20% of my time. But I have to screw around the other 80% of the time in order to get that first 20%. So the question: what billing/time-tracking policy can I adopt in order to be fair to my hourly customers without having to write off my own less-productive 80% that a salaried employer is willing to overlook in light of the complete package? Note: This question is not about how to be more productive or focused. It's about how to work around whatever salient limitations that I have in a way that's both fair to me and to my customers. Update: A little clarification (to pre-emptively stop some righteous indignation): I currently have a half dozen different project/client groups. It's not a great situation and I'm working at reducing it down to two, but that's my current reality. It's very easy to get off on a thread related to a different project than the one I'm clocking, and I'm not always conscious of it at the time. [I did not intend the question to mean that I was off playing games or making personal calls, etc., and have adjusted wording above to be clearer. Most of the time. I am only human, and sometimes the mind does force you to take a break! :-)]

    Read the article

  • Not so long ago in a city not so far away by Carlos Martin

    - by Maria Sandu
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 This is the story of how the EMEA Presales Center turned an Oracle intern into a trusted technology advisor for both Oracle’s Sales and customers. It was the summer of 2011 when I was finishing my Computer Engineering studies as well as my internship at Oracle when I was offered what could possibly be THE dream job for any young European Computer Engineer. Apart from that, it also seemed like the role was particularly tailored to me as I could leverage almost everything I learned at University and during the internship. And all of it in one of the best cities to live in, not only from my home country but arguably from Europe: Malaga! A day at EPC As part of the EPC Technology pillar, and later on completely focused on WebCenter, there was no way to describe a normal day on the job as each day had something unique. Some days I was researching documentation in order to elaborate accurate answers for a customer’s question within a Request for Information or Proposal (RFI/RFP), other days I was doing heavy programming in order to bring a Proof of Concept (PoC) for a customer to life and last not but least, some days I presented to the customer via webconference the demo I built for them the past weeks. So as you can see, the role has research, development and presentation, could you ask for more? Well, don’t worry because there IS more! Internationality As the organization’s name suggests, EMEA Presales Center, it is the Center of Presales within Europe, Middle East and Africa so I got the chance to work with great professionals from all this regions, expanding my network and learning things from one country to apply them to others. In addition to that, the teams based in the Malaga office are comprised of many young professionals hailing mainly from Western and Central European countries (although there are a couple of exceptions!) with very different backgrounds and personalities which guaranteed many laughs and stories during lunch or coffee breaks (or even while working on projects!). Furthermore, having EPC offices in Bucharest and Bangalore and thanks to today’s tele-presence technologies, I was working every day with people from India or Romania as if they were sitting right next to me and the bonding with them got stronger day by day. Career development Apart from the research and self-study I’ve earlier mentioned, one of the EPC’s Key Performance Indicators (KPI) is that 15% of your time is spent on training so you get lots and lots of trainings in order to develop both your technical product knowledge and your presentation, negotiation and other soft skills. Sometimes the training is via webcast, sometimes the trainer comes to the office and sometimes, the best times, you get to travel abroad in order to attend a training, which also helps you to further develop your network by meeting face to face with many people you only know from some email or instant messaging interaction. And as the months go by, your skills improving at a very fast pace, your relevance increasing with each new project you successfully deliver, it’s only a matter of time (and a bit of self-promoting!) that you get the attention of the manager of a more senior team and are offered the opportunity to take a new step in your professional career. For me it took 2 years to move to my current position, Technology Sales Consultant at the Oracle Direct organization. During those 2 years I had built a good relationship with the Oracle Direct Spanish sales and sales managers, who are also based in the Malaga office. I supported their former Sales Consultant in a couple of presentations and demos and were very happy with my overall performance and attitude so even before the position got eventually vacant, I got a heads-up from then in advance that their current Sales Consultant was going to move to a different position. To me it felt like a natural step, same as when I joined EPC, I had at least a 50% of the “homework” already done but wanted to experience that extra 50% to add new product and soft skills to my arsenal. The rest is history, I’ve been in the role for more than half a year as I’m writing this, achieved already some important wins, gained a lot of trust and confidence in front of customers and broadened my view of Oracle’s Fusion Middleware portfolio. I look back at the 2 years I spent in EPC and think: “boy, I’d recommend that experience to absolutely anyone with the slightest interest in IT, there are so many different things you can do as there are different kind of roles you can end up taking thanks to the experience gained at EPC” /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

    Read the article

  • WCF Error when using “Match Data” function in MDS Excel AddIn

    - by Davide Mauri
    If you’re using MDS and DQS with the Excel Integration you may get an error when trying to use the “Match Data” feature that uses DQS in order to help to identify duplicate data in your data set. The error is quite obscure and you have to enable WCF error reporting in order to have the error details and you’ll discover that they are related to some missing permission in MDS and DQS_STAGING_DATA database. To fix the problem you just have to give the needed permession, as the following script does: use MDS go GRANT SELECT ON mdm.tblDataQualityOperationsState TO [VMSRV02\mdsweb] GRANT INSERT ON mdm.tblDataQualityOperationsState TO [VMSRV02\mdsweb] GRANT DELETE ON mdm.tblDataQualityOperationsState TO [VMSRV02\mdsweb] GRANT UPDATE ON mdm.tblDataQualityOperationsState TO [VMSRV02\mdsweb] USE [DQS_STAGING_DATA] GO ALTER AUTHORIZATION ON SCHEMA::[db_datareader] TO [VMSRV02\mdsweb] ALTER AUTHORIZATION ON SCHEMA::[db_datawriter] TO [VMSRV02\mdsweb] ALTER AUTHORIZATION ON SCHEMA::[db_ddladmin] TO [VMSRV02\mdsweb] GO Where “VMSRV02\mdsweb” is the user you configured for MDS Service execution. If you don’t remember it, you can just check which account has been assigned to the IIS application pool that your MDS website is using:

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Translate jQuery UI Datepicker format to .Net Date format

    - by Michael Freidgeim
    I needed to use the same date format in client jQuery UI Datepicker and server ASP.NET code. The actual format can be different for different localization cultures.I decided to translate Datepicker format to .Net Date format similar as it was asked to do opposite operation in http://stackoverflow.com/questions/8531247/jquery-datepickers-dateformat-how-to-integrate-with-net-current-culture-date Note that replace command need to replace whole words and order of calls is importantFunction that does opposite operation (translate  .Net Date format toDatepicker format) is described in http://www.codeproject.com/Articles/62031/JQueryUI-Datepicker-in-ASP-NET-MVC /// <summary> /// Uses regex '\b' as suggested in //http://stackoverflow.com/questions/6143642/way-to-have-string-replace-only-hit-whole-words /// </summary> /// <param name="original"></param> /// <param name="wordToFind"></param> /// <param name="replacement"></param> /// <param name="regexOptions"></param> /// <returns></returns> static public string ReplaceWholeWord(this string original, string wordToFind, string replacement, RegexOptions regexOptions = RegexOptions.None) { string pattern = String.Format(@"\b{0}\b", wordToFind); string ret=Regex.Replace(original, pattern, replacement, regexOptions); return ret; } /// <summary> /// E.g "DD, d MM, yy" to ,"dddd, d MMMM, yyyy" /// </summary> /// <param name="datePickerFormat"></param> /// <returns></returns> /// <remarks> /// Idea to replace from http://stackoverflow.com/questions/8531247/jquery-datepickers-dateformat-how-to-integrate-with-net-current-culture-date ///From http://docs.jquery.com/UI/Datepicker/$.datepicker.formatDate to http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx ///Format a date into a string value with a specified format. ///d - day of month (no leading zero) ---.Net the same ///dd - day of month (two digit) ---.Net the same ///D - day name short ---.Net "ddd" ///DD - day name long ---.Net "dddd" ///m - month of year (no leading zero) ---.Net "M" ///mm - month of year (two digit) ---.Net "MM" ///M - month name short ---.Net "MMM" ///MM - month name long ---.Net "MMMM" ///y - year (two digit) ---.Net "yy" ///yy - year (four digit) ---.Net "yyyy" /// </remarks> public static string JQueryDatePickerFormatToDotNetDateFormat(string datePickerFormat) { string sRet = datePickerFormat.ReplaceWholeWord("DD", "dddd").ReplaceWholeWord("D", "ddd"); sRet = sRet.ReplaceWholeWord("M", "MMM").ReplaceWholeWord("MM", "MMMM").ReplaceWholeWord("m", "M").ReplaceWholeWord("mm", "MM");//order is important sRet = sRet.ReplaceWholeWord("yy", "yyyy").ReplaceWholeWord("y", "yy");//order is important return sRet; }

    Read the article

  • CodeStock 2012 Review: Michael Eaton( @mjeaton ) - 3 Simple Things for Increased Productivity

    3 Simple Things for Increased ProductivitySpeaker: Michael EatonTwitter: @mjeatonBlog: http://mjeaton.net/blog This was the first time I had seen Michael Eaton speak but have hear a lot of really good things about his speaking abilities. Needless to say I was really looking forward to his session. He basically addressed the topic of distractions and how they can decrease or increase your productivity as a developer. He makes the case that in order to become more productive you must block/limit all distractions. For example, he covered his top distractions as a developer. Top Distractions Social Media(Twitter, Reddit, Facebook) Wiki sites Phone Email Video Games Coworkers, Friends, Family Michael stated that he uses various types of music to help him block out these distractions in order for him to get into his coding zone. While he states that music works for him, he also notes that he knows of others that cannot really work with music. I have to say I am in the latter group because I require a quiet environment in order to work. A few session attendees also recommended listening to really loud white noise or music in another language other than your own. This allows for less focus to be placed on words being sung compared to the rhythmic beats being played. I have to say that I have not tried these suggestions yet but will in the near future. However, distractions can be very beneficial to productivity in that they give your mind a chance to relax and not think about the issues at hand. He spoke highly of taking vacations, and setting boundaries at work so that develops prevent the problem of burnout. One way he suggested that developer’s combat distractions is to use the Pomodoro technique. In his example he selects one task to do for 20 minutes and he can only do that task during that time. He ignores all other distractions until this task or time limit is complete. After it is completed he allows himself to relax and distract himself for another 5- 10 minutes before his next Pomodoro. This allows him to stay completely focused on a task and when the time is up he can then focus on other things.

    Read the article

  • Detecting 404 errors after a new site design

    - by James Crowley
    We recently re-designed Developer Fusion and as part of that we needed to ensure that any external links were not broken in the process. In order to monitor this, we used the awesome LogParser tool. All you need to do is open up a command prompt, navigate to the directory with your web site's log files in, and run a query like this: "c:\program files (x86)\log parser 2.2\logparser" "SELECT top 500 cs-uri-stem,count(*) FROM u_ex*.log WHERE sc-status=404 GROUP BY cs-uri-stem order by count(*) desc" -rtp:-1 topMissingUrls.txt And you've got a text file with the top 500 requested URLs that are returning 404. Simple!

    Read the article

< Previous Page | 137 138 139 140 141 142 143 144 145 146 147 148  | Next Page >