Search Results

Search found 23323 results on 933 pages for 'worst is better'.

Page 371/933 | < Previous Page | 367 368 369 370 371 372 373 374 375 376 377 378 | Next Page >

Prioritizing Product Features

- by Robert May

A very common task in Agile Environments is prioritization. Teams that are functioning well will prioritize new features, old features, the backlog, and any other source of stories for the team, and they’ll do it regularly. Not all teams are good at prioritizing according to the real return on investment that building stories will yield to the company. This is unfortunate. Too often, teams end up building features that are less valuable, and everyone seems to know it except perhaps the product owner! Most features built into software are never even used. Clearly, not much return for features that go unused. So how does a company avoid building features that add little value to the company? This is a tough question to answer, but usually, this prioritization starts at the top with the executives of the company. After all, they’re responsible for the overall vision of the company. Here’s what I recommend: Know your market. Know your customers and users. Know where you’re going and what you want to achieve. Implement the Vision Know Your Market We often see companies that don’t know their market. Personally, I’m surprised by this. These companies don’t know who their competitors are, don’t know what features make their product desirable in the market, and in many cases, get by with saying, “I’ve been doing this for XX years. I know what the market wants!” In many cases, they equate “marketing” with “advertising” and don’t understand the difference. This is almost never true. Good companies will spend significant amounts of time and money finding out who they’re competing against and what makes their competitors successful in the marketplace. Good companies understand that marketing involves more than just advertising. Often, marketing is mostly research and analysis, not sales. Until you understand your market, you cannot know what features will give you the best return on your investment dollar. Good companies have a marketing department and can answer the next important step which is to know your customers and your users. Know your Customers and Users First, note that I included both customers and users. They’re often not the same thing. Users use the product that you build. Customers buy the product that you build. It’s a subtle difference, but too often, I’ve seen companies that focus exclusively on one or the other and are not successful simply because they ignore an important part of the group. If your company is doing appropriate marketing, you know that these are two different aspects of your product and that both deserve attention to have a product that is successful in your target market. Your marketing department should be spending a lot of time understanding these personas and then conveying that information to the company. I’m always surprised when development teams think that they can build a product that people want to use without understanding the users of that product. Developers think differently than most people in the world. They know what the computer is doing. The computer isn’t magic to them. So when they assume that they know how to build something, they bring with them quite a bit of baggage. Never assume that you know your customer unless you’re regularly having interaction with them. Also, don’t just leave this to Marketing or Product Management. Take them time to get your developers out with the customers as well. Developers are very smart people, and often, seeing how someone uses their software inspires them to make a much better product. Very often, because the users and customers aren’t know, teams will spend a significant amount of time building apps that are super flexible and configurable so that any possible combination of feature can be used. This demonstrates a clear lack of understanding of the customer. Most configuration questions can quickly be answered by talking to the customer. In most cases, if your software requires significant setup and configuration before its usable, you probably don’t know your customers and users very well. Until you know your customers, you cannot know what features will be most valuable to your customers and you cannot build those features in a way that your customers can use. Know Where You’re Going and What You Want to Achieve Many companies suffer from not having a plan. Executives will tell the team to make them a plan. The team, not knowing their market and customers and users, will come up with a plan that doesn’t reflect reality and doesn’t consider ROI. Management then wonders why the product is doing poorly in the market place. Instead of leaving this up to the teams, as executives, work with Marketing to understand what broad categories of features will sell the most product in the marketplace. Then, once you’ve determined that, give this vision to the team and let them run with it. Revise the vision as needed, but avoid changing streams frequently. Sure, sometimes you need to, but often, executives will change priorities many times a month, leading to nothing more than confusion. If the team has a vision, they’ll be able to execute that vision far better than they could otherwise. By knowing what products are most important, you can set budgetary goals and guidelines that will help you achieve the vision that was created. Implement the Vision Creating the vision is often where the general executives stop participating in the plan. The team is responsible for implementing that vision. Executives should attend showcases and and should remain aware of the progress that the team is making towards meeting the vision, however. Once a broad vision has been created, the team should break that vision down into minimal market features (MMF). These MMFs should be sized using story points so that, using the team’s velocity, an estimated cost can be determined for each feature. The product management team should then try to quantify the relative value of the MMFs based on customer feedback and interviews. Once the value and cost of creating the feature is understood, a return on investment can be calculated. The features should then be prioritized with the MMF’s that have the highest value and lowest cost rising to the top of features to implement. Don’t let politics get in the way! Once the MMF’s have been prioritized, they should go through release planning to schedule them for implementation. Conclusion By having a good grasp on the strategy of the company, your Agile teams can be much more effective. Each and every story the team is implementing will roll up into features that matter to the company and provide ROI to them. The steps outlined in this post should be repeated on a regular basis. I recommend reviewing them at least once per quarter to make sure that the vision hasn’t shifted and that the teams are still working on what matters most to the company. Technorati Tags: Agile,Product Owner,ROI

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
ASP.NET MVC 2 throws exception for ‘favicon.ico’

- by nmarun

I must be on fire or something – third blog in 2 days… awesome! Before I begin, in case you’re wondering, favicon.ico is the small image that appears to the left of your web address, once the page loads. In order to learn more about MVC or any thing for that matter, it’s better to look at the source itself. Since MVC is open source (at least some part of it is), I started looking at the source code that’s available for download. While doing so, I hit Steve Sanderson’s blog site where he explains in great detail the way to debug your app using ASP.NET MVC source code. For those who are not aware, Steve Sanderson’s book - Pro ASP.NET MVC Framework, is one of the best books to learn about MVC. Alrighty, I followed the article and I hit F5 to debug the default / unchanged MVC project. I put a breakpoint in the DefaultControllerFactory.cs, CreateController() method. To know a little more about this class and the method, read this. Sure enough, the control stopped at the breakpoint and I hit F5 again and the page rendered just fine. But then what’s this? The breakpoint was hit again, as if something else was being requested. I now hovered my mouse over the ‘controllerName’ parameter and it says – favicon.ico. This by itself was more than enough for me to raise my eye-brows, but what happened next just took the ground below my feet. Oh, oh, I’m sorry I’m just typing, no code, no image, so here are a couple of screen captures. The first one shows the request for the Home controller; I get ‘Home’ when I hover over the parameter: And here’s the one that shows the same for call for ‘favicon.ico’. So, I step through the code and when the control reaches line 91 – GetControllerInstance() method, I step in. This is when I had the ‘ground-losing’ experience. Wow, an exception is being thrown for this file and that too in RTM. For some reason MVC thinks, this as a controller and tries to run it through the MvcHandler and it hits this snag. So it seems like this will happen for any MVC 2 site and this did not happen for me in the previous version of MVC. Before I get to how to resolve it, here’s another way of reproducing this exception. Revert back all your changes that you did as mentioned in Steve’s blog above. Now, add a class to your MVC project and call it say, MyControllerFactory and let this inherit from DefaultControllerFactory class. (Read this for details on the DefaultControllerFactory class is and how it is used in a different context). Add an override for the CreateController() method and for the sake of this blog, just copy the same content from the DefaultControllerFactory class. The last step is to tell your MVC app to use the MyControllerFactory class instead of the default one. To do this, go to your Global.asax.cs file and add line 6 of the snippet below: 1: protected void Application_Start() 2: { 3: AreaRegistration.RegisterAllAreas(); 4: 5: RegisterRoutes(RouteTable.Routes); 6: ControllerBuilder.Current.SetControllerFactory(new MyControllerFactory()); 7: } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Now, you’re ready to reproduce the issue. Just F5 the project and when you hit the overridden CreateController() method for the second time, this is what it looks like for me: And continuing further gives me the same exception. I believe this is something that MS should fix, as not having ‘favicon.ico’ file will be common for most of the applications. So I think the when you create an MVC project, line 6 should be added by default by Visual Studio itself: 1: public class MvcApplication : System.Web.HttpApplication 2: { 3: public static void RegisterRoutes(RouteCollection routes) 4: { 5: routes.IgnoreRoute("{resource}.axd/{*pathInfo}"); 6: routes.IgnoreRoute("favicon.ico"); 7: 8: routes.MapRoute( 9: "Default", // Route name 10: "{controller}/{action}/{id}", // URL with parameters 11: new { controller = "Home", action = "Index", id = UrlParameter.Optional } // Parameter defaults 12: ); 13: } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } There it is, that’s the solution to avoid the exception altogether. I tried this both IE8 and Firefox browsers and was able to successfully reproduce the error. Hope someone will look at this issue and find a fix. Just before I finish up, I found another ‘bug’, if you want to call it, with Visual Studio 2008. Remember how you could change what browser you want your application to run in by just right clicking on the .aspx file and choosing ‘Browse with…’? Seems like that’s missing when you’re working with an MVC project. In order to test the above bug in the other browser, I had to load a classic ASP.NET project, change the settings and then run my MVC project. Felt kinda ‘icky’, for lack of a better word.

Read the article
Of C# Iterators and Performance

- by James Michael Hare

Some of you reading this will be wondering, "what is an iterator" and think I'm locked in the world of C++. Nope, I'm talking C# iterators. No, not enumerators, iterators. So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well. Iterators have been around for a bit now, and there are still a bunch of people who don't know what they are or what they do. I don't know how many times at work I've had a code review on my code and have someone ask me, "what's that yield word do?" Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> -- I'll post some of the fun ones in a later post. Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration? So, to begin, let's look at a couple of methods. This is a new (albeit contrived) method called Every(...). The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first). So Every(2) would return items 0, 2, 4, 6, etc. Now, if you wanted to write this in the traditional way, you may come up with something like this: public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval) { List<T> newList = new List<T>(); int count = 0; foreach (var i in list) { if ((count++ % interval) == 0) { newList.Add(i); } } return newList; } So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item. Pretty straight forward. The problem? Well, Every<T>(...) will construct a list containing every nth item whether or not you care. What happens if you were searching this result for a certain item and find that item after five tries? You would have generated the rest of the list for nothing. Enter iterators. This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for. This can be very handy if the evaluation itself is expensive or if there's a fair chance you'll never want to fully evaluate a list. We see this all the time in Linq, where many expressions are chained together to do complex processing on a list. This would be very expensive if each of these expressions evaluated their entire possible result set on call. Let's look at the same example function, this time using an iterator: public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval) { int count = 0; foreach (var i in list) { if ((count++ % interval) == 0) { yield return i; } } } Notice it does not create a new return value explicitly, the only evidence of a return is the "yield return" statement. What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends. Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time. It doesn't seem like a big deal, does it? But keep in mind the key point here: it only returns items as they are requested. Thus if there's a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit. This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained. For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10. We could write that as: List<int> someList = new List<int>(); // fill list here someList.Every(10).TakeWhile(i => i <= 10); Now is the difference more apparent? If we use the first form of Every that makes a copy of the list. It's going to copy the entire list whether we will need those items or not, that can be costly! With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated. So, sounds neat eh? But what's the cost is what you're probably wondering. So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things. Now, iteration isn't free. If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead: Copy vs Iterator on 100% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 5 Copy 5 5 5 Iterator 5 50 50 Copy 28 50 50 Iterator 27 500 500 Copy 227 500 500 Iterator 247 5000 5000 Copy 2266 5000 5000 Iterator 2444 50,000 50,000 Copy 24,443 50,000 50,000 Iterator 24,719 500,000 500,000 Copy 250,024 500,000 500,000 Iterator 251,521 Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists. In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price. However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect. However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead. Let's look at what happens if you stop looking after 80% of the list: Copy vs Iterator on 80% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 4 Copy 5 5 4 Iterator 5 50 40 Copy 27 50 40 Iterator 23 500 400 Copy 215 500 400 Iterator 200 5000 4000 Copy 2099 5000 4000 Iterator 1962 50,000 40,000 Copy 22,385 50,000 40,000 Iterator 19,599 500,000 400,000 Copy 236,427 500,000 400,000 Iterator 196,010 Notice that the iterator form is now operating quite a bit faster. But the savings really add up if you stop on average at 50% (which most searches would typically do): Copy vs Iterator on 50% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 2 Copy 5 5 2 Iterator 4 50 25 Copy 25 50 25 Iterator 16 500 250 Copy 188 500 250 Iterator 126 5000 2500 Copy 1854 5000 2500 Iterator 1226 50,000 25,000 Copy 19,839 50,000 25,000 Iterator 12,233 500,000 250,000 Copy 208,667 500,000 250,000 Iterator 122,336 Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time. And this is only for one level deep. If we are using this in a chain of query expressions it only adds to the savings. So my recommendation? If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator. The cost if they iterate the whole thing does not add much at all -- and if they consume only partially, you reap some really good performance gains. Next time I'll discuss some of my favorite extensions I've created to make development life a little easier and maintainability a little better.

Read the article
SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

- by Pinal Dave

Update: This blog post is written based on the SafePeak, which is available for free download. Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. Safepeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications. I’d like to look more closely at some of these options, as some of these capabilities could help you address lagging database and performance on your systems. To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”. Basic Caching Basic Caching (or the stale and static cache) is an ability to put the results from a query into cache for a certain period of time. It is based on TTL, or Time-to-live, and is designed to stay in cache no matter what happens to the data. For example, although the actual data can be modified due to DML commands (update/insert/delete), the cache will still hold the same obsolete query data. Meaning that with the Basic Caching is really static / stale cache. As you can tell, this approach has its limitations. Dynamic Caching Dynamic Caching (or the non-stale cache) is an ability to put the results from a query into cache while maintaining the cache transaction awareness looking for possible data modifications. The modifications can come as a result of: DML commands (update/insert/delete), indirect modifications due to triggers on other tables, executions of stored procedures with internal DML commands complex cases of stored procedures with multiple levels of internal stored procedures logic. When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) become the actual data updates commands. Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper. SafePeak: A comprehensive and versatile caching platform SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and Data managers to reach excellent performance acceleration and application scalability for a wide range of business cases and applications. Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any types of data, including Blobs, XMLs, Texts as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, identifying directly and indirectly accessed tables, views, functions and stored procedures; Automated caching of Stored Procedures: Fully or semi-automated caching of all read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s Metadata-Learning process, their SQL schemas are parsed – resulting with a full understanding of the underlying code, objects dependencies (tables, views, functions, sub-procedures) enabling automated or semi-automated (manually review and activate by a mouse-click) cache activation, with full understanding of the transaction logic for cache real-time invalidation; Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs); Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases; Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching; Semi-Automated Caching: SQL Patterns categorized as “Read and Non deterministic” are patterns of SQL queries and stored procedures that contain reference to non-deterministic functions, like getdate(). Such SQL Patterns are reviewed by the SafePeak administrator and in usually most of them are activated manually for caching (point and click activation); Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting; Semi Dynamic Caching: A manual cache configuration option enabling reducing the sensitivity of specific SQL Patterns to “write” commands to certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archive order details), or when the application sensitivity to fresh data is not critical and can be stale for short period of time (gaining better performance and reduced load); Scheduled Cache Eviction: A manual cache configuration option enabling scheduling SQL Pattern cache eviction based on certain time(s) during a day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: “select customers that today is their birthday”, an SQL with getdate() function, which can and should be cached, but the data stays relevant only until 00:00 (midnight); Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to too complex dynamic SQL or unfamiliar syntax), are signed as “Dynamic Objects” with highest transaction safety settings (such as: Full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness, provides easy configuration interface, allowing you to improve cache hits and reduce cache global evictions. Usually this is the first configuration in a deployment; Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure; Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedure (with lists of parameters) for periodically automated pre-fetching and caching. An advanced tool allowing you to handle more rare but very performance sensitive queries pre-fetch them into cache allowing high performance for users’ data access; Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with database objects and SQL Patterns heat-map. The performance driven configuration helps you to focus on the most important settings that bring you the highest performance gains. Use of SafePeak SQL Analytics allows continuous performance monitoring and analysis, easy identification of bottlenecks of both real-time and historical data; Cloud Ready: Available for instant deployment on Amazon Web Services (AWS). As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started. You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Windows Azure Virtual Machine Readiness and Capacity Assessment for SQL Server

- by SQLOS Team

Windows Azure Virtual Machine Readiness and Capacity Assessment for Windows Server Machine Running SQL Server With the release of MAP Toolkit 8.0 Beta, we have added a new scenario to assess your Windows Azure Virtual Machine Readiness. The MAP 8.0 Beta performs a comprehensive assessment of Windows Servers running SQL Server to determine you level of readiness to migrate an on-premise physical or virtual machine to Windows Azure Virtual Machines. The MAP Toolkit then offers suggested changes to prepare the machines for migration, such as upgrading the operating system or SQL Server. MAP Toolkit 8.0 Beta is available for download here Your participation and feedback is very important to make the MAP Toolkit work better for you. We encourage you to participate in the beta program and provide your feedback at [email protected] or through one of our surveys. Now, let’s walk through the MAP Toolkit task for completing the Windows Azure Virtual Machine assessment and capacity planning. The tasks include the following: Perform an inventory View the Windows Azure VM Readiness results and report Collect performance data for determine VM sizing View the Windows Azure Capacity results and report Perform an inventory: 1. To perform an inventory against a single machine or across a complete environment, choose Perform an Inventory to launch the Inventory and Assessment Wizard as shown below: 2. After the Inventory and Assessment Wizard launches, select either the Windows computers or SQL Server scenario to inventory Windows machines. HINT: If you don’t care about completely inventorying a machine, just select the SQL Server scenario. Click Next to Continue. 3. On the Discovery Methods page, select how you want to discover computers and then click Next to continue. Description of Discovery Methods: Use Active Directory Domain Services -- This method allows you to query a domain controller via the Lightweight Directory Access Protocol (LDAP) and select computers in all or specific domains, containers, or OUs. Use this method if all computers and devices are in AD DS. Windows networking protocols -- This method uses the WIN32 LAN Manager application programming interfaces to query the Computer Browser service for computers in workgroups and Windows NT 4.0–based domains. If the computers on the network are not joined to an Active Directory domain, use only the Windows networking protocols option to find computers. System Center Configuration Manager (SCCM) -- This method enables you to inventory computers managed by System Center Configuration Manager (SCCM). You need to provide credentials to the System Center Configuration Manager server in order to inventory the managed computers. When you select this option, the MAP Toolkit will query SCCM for a list of computers and then MAP will connect to these computers. Scan an IP address range -- This method allows you to specify the starting address and ending address of an IP address range. The wizard will then scan all IP addresses in the range and inventory only those computers. Note: This option can perform poorly, if many IP addresses aren’t being used within the range. Manually enter computer names and credentials -- Use this method if you want to inventory a small number of specific computers. Import computer names from a files -- Using this method, you can create a text file with a list of computer names that will be inventoried. 4. On the All Computers Credentials page, enter the accounts that have administrator rights to connect to the discovered machines. This does not need to a domain account, but needs to be a local administrator. I have entered my domain account that is an administrator on my local machine. Click Next after one or more accounts have been added. NOTE: The MAP Toolkit primarily uses Windows Management Instrumentation (WMI) to collect hardware, device, and software information from the remote computers. In order for the MAP Toolkit to successfully connect and inventory computers in your environment, you have to configure your machines to inventory through WMI and also allow your firewall to enable remote access through WMI. The MAP Toolkit also requires remote registry access for certain assessments. In addition to enabling WMI, you need accounts with administrative privileges to access desktops and servers in your environment. 5. On the Credentials Order page, select the order in which want the MAP Toolkit to connect to the machine and SQL Server. Generally just accept the defaults and click Next. 6. On the Enter Computers Manually page, click Create to pull up at dialog to enter one or more computer names. 7. On the Summary page confirm your settings and then click Finish. After clicking Finish the inventory process will start, as shown below: Windows Azure Readiness results and report After the inventory progress has completed, you can review the results under the Database scenario. On the tile, you will see the number of Windows Server machine with SQL Server that were analyzed, the number of machines that are ready to move without changes and the number of machines that require further changes. If you click this Azure VM Readiness tile, you will see additional details and can generate the Windows Azure VM Readiness Report. After the report is generated, select View | Saved Reports and Proposals to view the location of the report. Open up WindowsAzureVMReadiness* report in Excel. On the Windows tab, you can see the results of the assessment. This report has a column for the Operating System and SQL Server assessment and provides a recommendation on how to resolve, if there a component is not supported. Collect Performance Data Launch the Performance Wizard to collect performance information for the Windows Server machines that you would like the MAP Toolkit to suggest a Windows Azure VM size for. Windows Azure Capacity results and report After the performance metrics are collected, the Azure VM Capacity title will display the number of Virtual Machine sizes that are suggested for the Windows Server and Linux machines that were analyzed. You can then click on the Azure VM Capacity tile to see the capacity details and generate the Windows Azure VM Capacity Report. Within this report, you can view the performance data that was collected and the Virtual Machine sizes. MAP Toolkit 8.0 Beta is available for download here Your participation and feedback is very important to make the MAP Toolkit work better for you. We encourage you to participate in the beta program and provide your feedback at [email protected] or through one of our surveys. Useful References: Windows Azure Homepage How to guides for Windows Azure Virtual Machines Provisioning a SQL Server Virtual Machine on Windows Azure Windows Azure Pricing Peter Saddow Senior Program Manager – MAP Toolkit Team

Read the article
Charms and the App Bar

- by Dennis Vroegop

Ok. I admit. I made a mistake in the last post about our planespotter app. I have dedicated a full part of the hub to Social. I also had a section called Friends but that made sense since I said that “Friends” is a special group of people that connect to each other through our app and only our app. Social however is sharing our spots with Twitter, Facebook and so on. Now, we could write that functionality in our app in a different section but there is one small problem with that: users don’t expect that. Ok, I admit. The mistake was quite deliberate to give me an excuse to write this part. But still: the mistake is one I see a lot. People are trying to do stuff in their application that they shouldn’t be doing. This always strike me as slightly odd: why do some work when others have already done it for you and you can just use it? After all: good developers are lazy (lazy people will always try to find the easiest way to do something and in development land this usually means the cleanest and best to support way…) So. What is that part that Microsoft has done for us and we don’t have to do ourselves? The answer lies on the right hand of your Win8 screen: This is a screenshot of my tablet (as you can see I am writing this right now….) When I swipe my finger from out of the screen on the right inside the screen (or move the mouse to the upper right corner) this menu will appear. Next to settings and the start menu button we’ll find the Search and the Share charms. These are two ways that your app can share the information it contains with the rest of the world, or at least: the rest of your system. So don’t write a Search feature in your app. Don’t write a Share feature in your app. It’s here already. Users, once they are used to Windows 8, will use that feature and expect it to work. If it doesn’t, they won’t like your app and you can kiss you dreams of everlasting fame goodbye. So use these two. What are they? Well, simply they are parts of a contract. In your app you say somewhere in code that you are supporting Search and Share. So when the user selects Share the system will interrogate the current app in the foreground if it supports this feature. Your app will say “But why, yes, I do!” Then the system will ask the app “Ok then, wisecrack, then share!” and you will have to provide the system with some information about the format. Other applications have subscribed to be at the receiving end of the Share contract. They have told the system that they support Sharing (receiving) and which formats they understand. If one or more of them support the formats you specify, the user will see them. The user clicks / taps on the app of their choice and data is moved from your app to the new one. So if you say you support Facebook and Twitter users can post data from your app to these networks by selecting Share. The same applies to Search. Don’t make a “search” button in your app but use the contract to tell the system that you support search and use that instead. Users will be grateful (remember that bar with men/women/creatures that are waiting for you?) The more and more people get to know Windows 8, the more they will use this. And if you are one of the people who wrote an app that helped them learn the system, well, that’s even better. So. We don’t have a Share or a Search button. We do have other buttons. Most important: we probably need a “New Spot” button. And a “Filter” might be useful. Or someway to open the camera so you can add a picture to the spot. Where will be put those? The answer is the “Appbar” . This is a application / context aware menu that slides up from the bottom of the screen when you move your finger / mouse from below the screen into it. From above downwards works just as well. Here you see an example of the appbar from the People app. (click on it for a larger version). This appears whenever you slide your finger up from below of down from above. This is where you put your commands. Remember, this is context aware so this menu will change when you are in different parts of your app or when you have selected different items. There are a few conventions when you create this appbar. First, the items on the right are “General” items, meaning they have little to do with what is on the screen right now. I think this would be a great place to add our “New Spot” icon. On the far left are items associated with the current selected item or screen. So if you have a spot selected, the button for Add Photo should be visible here and on the left hand side. Not everything is as clear as this, but this is what you should strive for. Group items together. And please note: this is the only place in Metro design where we are allowed to use lines as separators. So when you want to separate a group of icons from another group, add a line. Also note the simplicity of the buttons. No colors, no lights or shadows, no 3D. After a couple of years of fancy almost realistic looking icons people have finally decided that hey, this is a virtual world: it’s ok to look virtual as well. So make things as readable and clear as possible and don’t try to duplicate nature. It’s all about the information, remember? (If you don’t remember I’d like to point you to a older blog post of mine about the what and why of Metro). So.. think about the buttons a bit and think about Share and Search. What will you put there? Remember: this is the way the users interact with your apps and while you shouldn’t judge a book by its covers when it comes to people, this isn’t entirely so when it comes to apps. People DO judge an app by its looks and the way it feels. Take advantage of that. History has learned that a crappy app with a GREAT user interface gets better reviews than a GREAT app with a lousy UI… I know: developers will find this extremely unfair but that’s the world we live in (No, I am not saying you should deliver rubbish apps). Next time: we’ll start by building the darn thing!

Read the article
Optimizing AES modes on Solaris for Intel Westmere

- by danx

Optimizing AES modes on Solaris for Intel Westmere Review AES is a strong method of symmetric (secret-key) encryption. It is a U.S. FIPS-approved cryptographic algorithm (FIPS 197) that operates on 16-byte blocks. AES has been available since 2001 and is widely used. However, AES by itself has a weakness. AES encryption isn't usually used by itself because identical blocks of plaintext are always encrypted into identical blocks of ciphertext. This encryption can be easily attacked with "dictionaries" of common blocks of text and allows one to more-easily discern the content of the unknown cryptotext. This mode of encryption is called "Electronic Code Book" (ECB), because one in theory can keep a "code book" of all known cryptotext and plaintext results to cipher and decipher AES. In practice, a complete "code book" is not practical, even in electronic form, but large dictionaries of common plaintext blocks is still possible. Here's a diagram of encrypting input data using AES ECB mode: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 What's the solution to the same cleartext input producing the same ciphertext output? The solution is to further process the encrypted or decrypted text in such a way that the same text produces different output. This usually involves an Initialization Vector (IV) and XORing the decrypted or encrypted text. As an example, I'll illustrate CBC mode encryption: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ IV >----->(XOR) +------------->(XOR) +---> . . . . | | | | | | | | \/ | \/ | AESKey-->(AES Encryption) | AESKey-->(AES Encryption) | | | | | | | | | \/ | \/ | CipherTextOutput ------+ CipherTextOutput -------+ Block 1 Block 2 The steps for CBC encryption are: Start with a 16-byte Initialization Vector (IV), choosen randomly. XOR the IV with the first block of input plaintext Encrypt the result with AES using a user-provided key. The result is the first 16-bytes of output cryptotext. Use the cryptotext (instead of the IV) of the previous block to XOR with the next input block of plaintext Another mode besides CBC is Counter Mode (CTR). As with CBC mode, it also starts with a 16-byte IV. However, for subsequent blocks, the IV is just incremented by one. Also, the IV ix XORed with the AES encryption result (not the plain text input). Here's an illustration: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ IV >----->(XOR) IV + 1 >---->(XOR) IV + 2 ---> . . . . | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 Optimization Which of these modes can be parallelized? ECB encryption/decryption can be parallelized because it does more than plain AES encryption and decryption, as mentioned above. CBC encryption can't be parallelized because it depends on the output of the previous block. However, CBC decryption can be parallelized because all the encrypted blocks are known at the beginning. CTR encryption and decryption can be parallelized because the input to each block is known--it's just the IV incremented by one for each subsequent block. So, in summary, for ECB, CBC, and CTR modes, encryption and decryption can be parallelized with the exception of CBC encryption. How do we parallelize encryption? By interleaving. Usually when reading and writing data there are pipeline "stalls" (idle processor cycles) that result from waiting for memory to be loaded or stored to or from CPU registers. Since the software is written to encrypt/decrypt the next data block where pipeline stalls usually occurs, we can avoid stalls and crypt with fewer cycles. This software processes 4 blocks at a time, which ensures virtually no waiting ("stalling") for reading or writing data in memory. Other Optimizations Besides interleaving, other optimizations performed are Loading the entire key schedule into the 128-bit %xmm registers. This is done once for per 4-block of data (since 4 blocks of data is processed, when present). The following is loaded: the entire "key schedule" (user input key preprocessed for encryption and decryption). This takes 11, 13, or 15 registers, for AES-128, AES-192, and AES-256, respectively The input data is loaded into another %xmm register The same register contains the output result after encrypting/decrypting Using SSSE 4 instructions (AESNI). Besides the aesenc, aesenclast, aesdec, aesdeclast, aeskeygenassist, and aesimc AESNI instructions, Intel has several other instructions that operate on the 128-bit %xmm registers. Some common instructions for encryption are: pxor exclusive or (very useful), movdqu load/store a %xmm register from/to memory, pshufb shuffle bytes for byte swapping, pclmulqdq carry-less multiply for GCM mode Combining AES encryption/decryption with CBC or CTR modes processing. Instead of loading input data twice (once for AES encryption/decryption, and again for modes (CTR or CBC, for example) processing, the input data is loaded once as both AES and modes operations occur at in the same function Performance Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 running on a Piketon Platform system with a 4-core Intel Clarkdale processor @3.20GHz. Clarkdale which is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case already has been optimized with hand-coded Intel AESNI assembly. The "after" case has combined AES-NI and mode instructions, interleaved 4 blocks at-a-time. « For the first table, lower is better (milliseconds). The first table shows the performance improvement using the Solaris encrypt(1) and decrypt(1) CLI commands. I encrypted and decrypted a 1/2 GByte file on /tmp (swap tmpfs). Encryption improved by about 40% and decryption improved by about 80%. AES-128 is slighty faster than AES-256, as expected. The second table shows more detail timings for CBC, CTR, and ECB modes for the 3 AES key sizes and different data lengths. » The results shown are the percentage improvement as shown by an internal PKCS#11 microbenchmark. And keep in mind the previous baseline code already had optimized AESNI assembly! The keysize (AES-128, 192, or 256) makes little difference in relative percentage improvement (although, of course, AES-128 is faster than AES-256). Larger data sizes show better improvement than 128-byte data. Availability This software is in Solaris 11 FCS. It is available in the 64-bit libcrypto library and the "aes" Solaris kernel module. You must be running hardware that supports AESNI (for example, Intel Westmere and Sandy Bridge, microprocessor architectures). The easiest way to determine if AES-NI is available is with the isainfo(1) command. For example, $ isainfo -v 64-bit amd64 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this software. Solaris libraries and kernel automatically determine if it's running on AESNI-capable machines and execute the correctly-tuned software for the current microprocessor. Summary Maximum throughput of AES cipher modes can be achieved by combining AES encryption with modes processing, interleaving encryption of 4 blocks at a time, and using Intel's wide 128-bit %xmm registers and instructions. References "Block cipher modes of operation", Wikipedia Good overview of AES modes (ECB, CBC, CTR, etc.) "Advanced Encryption Standard", Wikipedia "Current Modes" describes NIST-approved block cipher modes (ECB,CBC, CFB, OFB, CCM, GCM)

Read the article
Is RTD Stateless or Stateful?

- by [email protected]

Yes. A stateless service is one where each request is an independent transaction that can be processed by any of the servers in a cluster. A stateful service is one where state is kept in a server's memory from transaction to transaction, thus necessitating the proper routing of requests to the right server. The main advantage of stateless systems is simplicity of design. The main advantage of stateful systems is performance. I'm often asked whether RTD is a stateless or stateful service, so I wanted to clarify this issue in depth so that RTD's architecture will be properly understood. The short answer is: "RTD can be configured as a stateless or stateful service." The performance difference between stateless and stateful systems can be very significant, and while in a call center implementation it may be reasonable to use a pure stateless configuration, a web implementation that produces thousands of requests per second is practically impossible with a stateless configuration. RTD's performance is orders of magnitude better than most competing systems. RTD was architected from the ground up to achieve this performance. Features like automatic and dynamic compression of prediction models, automatic translation of metadata to machine code, lack of interpreted languages, and separation of model building from decisioning contribute to achieving this performance level. Because of this focus on performance we decided to have RTD's default configuration work in a stateful manner. By being stateful RTD requests are typically handled in a few milliseconds when repeated requests come to the same session. Now, those readers that have participated in implementations of RTD know that RTD's architecture is also focused on reducing Total Cost of Ownership (TCO) with features like automatic model building, automatic time windows, automatic maintenance of database tables, automatic evaluation of data mining models, automatic management of models partitioned by channel, geography, etcetera, and hot swapping of configurations. How do you reconcile the need for a low TCO and the need for performance? How do you get the performance of a stateful system with the simplicity of a stateless system? The answer is that you make the system behave like a stateless system to the exterior, but you let it automatically take advantage of situations where being stateful is better. For example, one of the advantages of stateless systems is that you can route a message to any server in a cluster, without worrying about sending it to the same server that was handling the session in previous messages. With an RTD stateful configuration you can still route the message to any server in the cluster, so from the point of view of the configuration of other systems, it is the same as a stateless service. The difference though comes in performance, because if the message arrives to the right server, RTD can serve it without any external access to the session's state, thus tremendously reducing processing time. In typical implementations it is not rare to have high percentages of messages routed directly to the right server, while those that are not, are easily handled by forwarding the messages to the right server. This architecture usually provides the best of both worlds with performance and simplicity of configuration. Configuring RTD as a pure stateless service A pure stateless configuration requires session data to be persisted at the end of handling each and every message and reloading that data at the beginning of handling any new message. This is of course, the root of the inefficiency of these configurations. This is also the reason why many "stateless" implementations actually do keep state to take advantage of a request coming back to the same server. Nevertheless, if the implementation requires a pure stateless decision service, this is easy to configure in RTD. The way to do it is: Mark every Integration Point to Close the session at the end of processing the message In the Session entity persist the session data on closing the session In the session entity check if a persisted version exists and load it An excellent solution for persisting the session data is Oracle Coherence, which provides a high performance, distributed cache that minimizes the performance impact of persisting and reloading the session. Alternatively, the session can be persisted to a local database. An interesting feature of the RTD stateless configuration is that it can cope with serializing concurrent requests for the same session. For example, if a web page produces two requests to the decision service, these requests could come concurrently to the decision services and be handled by different servers. Most stateless implementation would have the two requests step onto each other when saving the state, or fail one of the messages. When properly configured, RTD will make one message wait for the other before processing. A Word on Context Using the context of a customer interaction typically significantly increases lift. For example, offer success in a call center could double if the context of the call is taken into account. For this reason, it is important to utilize the contextual information in decision making. To make the contextual information available throughout a session it needs to be persisted. When there is a well defined owner for the information then there is no problem because in case of a session restart, the information can be easily retrieved. If there is no official owner of the information, then RTD can be configured to persist this information. Once again, RTD provides flexibility to ensure high performance when it is adequate to allow for some loss of state in the rare cases of server failure. For example, in a heavy use web site that serves 1000 pages per second the navigation history may be stored in the in memory session. In such sites it is typical that there is no OLTP that stores all the navigation events, therefore if an RTD server were to fail, it would be possible for the navigation to that point to be lost (note that a new session would be immediately established in one of the other servers). In most cases the loss of this navigation information would be acceptable as it would happen rarely. If it is desired to save this information, RTD would persist it every time the visitor navigates to a new page. Note that this practice is preferred whether RTD is configured in a stateless or stateful manner.

Read the article
SQL SERVER – Weekly Series – Memory Lane – #032

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Complete Series of Database Coding Standards and Guidelines SQL SERVER Database Coding Standards and Guidelines – Introduction SQL SERVER – Database Coding Standards and Guidelines – Part 1 SQL SERVER – Database Coding Standards and Guidelines – Part 2 SQL SERVER Database Coding Standards and Guidelines Complete List Download Explanation and Example – SELF JOIN When all of the data you require is contained within a single table, but data needed to extract is related to each other in the table itself. Examples of this type of data relate to Employee information, where the table may have both an Employee’s ID number for each record and also a field that displays the ID number of an Employee’s supervisor or manager. To retrieve the data tables are required to relate/join to itself. Insert Multiple Records Using One Insert Statement – Use of UNION ALL This is very interesting question I have received from new developer. How can I insert multiple values in table using only one insert? Now this is interesting question. When there are multiple records are to be inserted in the table following is the common way using T-SQL. Function to Display Current Week Date and Day – Weekly Calendar Straight blog post with script to find current week date and day based on the parameters passed in the function. 2008 In my beginning years, I have almost same confusion as many of the developer had in their earlier years. Here are two of the interesting question which I have attempted to answer in my early year. Even if you are experienced developer may be you will still like to read following two questions: Order Of Column In Index Order of Conditions in WHERE Clauses Example of DISTINCT in Aggregate Functions Have you ever used DISTINCT with the Aggregation Function? Here is a simple example about how users can do it. Create a Comma Delimited List Using SELECT Clause From Table Column Straight to script example where I explained how to do something easy and quickly. Compound Assignment Operators SQL SERVER 2008 has introduced new concept of Compound Assignment Operators. Compound Assignment Operators are available in many other programming languages for quite some time. Compound Assignment Operators is operator where variables are operated upon and assigned on the same line. PIVOT and UNPIVOT Table Examples Here is a very interesting question – the answer to the question can be YES or NO both. “If we PIVOT any table and UNPIVOT that table do we get our original table?” Read the blog post to get the explanation of the question above. 2009 What is Interim Table – Simple Definition of Interim Table The interim table is a table that is generated by joining two tables and not the final result table. In other words, when two tables are joined they create an interim table as resultset but the resultset is not final yet. It may be possible that more tables are about to join on the interim table, and more operations are still to be applied on that table (e.g. Order By, Having etc). Besides, it may be possible that there is no interim table; sometimes final table is what is generated when the query is run. 2010 Stored Procedure and Transactions If Stored Procedure is transactional then, it should roll back complete transactions when it encounters any errors. Well, that does not happen in this case, which proves that Stored Procedure does not only provide just the transactional feature to a batch of T-SQL. Generate Database Script for SQL Azure When talking about SQL Azure the most common complaint I hear is that the script generated from stand-along SQL Server database is not compatible with SQL Azure. This was true for some time for sure but not any more. If you have SQL Server 2008 R2 installed you can follow the guideline below to generate a script which is compatible with SQL Azure. Convert IN to EXISTS – Performance Talk It is NOT necessary that every time when IN is replaced by EXISTS it gives better performance. However, in our case listed above it does for sure give better performance. You can read about this subject in the associated blog post. Subquery or Join – Various Options – SQL Server Engine Knows the Best Every single time whenever there is a performance tuning exercise, I hear the conversation from developer where some prefer subquery and some prefer join. In this two part blog post, I explain the same in the detail with examples. Part 1 | Part 2 Merge Operations – Insert, Update, Delete in Single Execution MERGE is a new feature that provides an efficient way to do multiple DML operations. In earlier versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions; however, at present, by using the MERGE statement, we can include the logic of such data changes in one statement that even checks when the data is matched and then just update it, and similarly, when the data is unmatched, it is inserted. 2011 Puzzle – Statistics are not updated but are Created Once Here is the quick scenario about my setup. Create Table Insert 1000 Records Check the Statistics Now insert 10 times more 10,000 indexes Check the Statistics – it will be NOT updated – WHY? Question to You – When to use Function and When to use Stored Procedure Personally, I believe that they are both different things - they cannot be compared. I can say, it will be like comparing apples and oranges. Each has its own unique use. However, they can be used interchangeably at many times and in real life (i.e., production environment). I have personally seen both of these being used interchangeably many times. This is the precise reason for asking this question. 2012 In year 2012 I had two interesting series ran on the blog. If there is no fun in learning, the learning becomes a burden. For the same reason, I had decided to build a three part quiz around SEQUENCE. The quiz was to identify the next value of the sequence. I encourage all of you to take part in this fun quiz. Guess the Next Value – Puzzle 1 Guess the Next Value – Puzzle 2 Guess the Next Value – Puzzle 3 Guess the Next Value – Puzzle 4 Simple Example to Configure Resource Governor – Introduction to Resource Governor Resource Governor is a feature which can manage SQL Server Workload and System Resource Consumption. We can limit the amount of CPU and memory consumption by limiting /governing /throttling on the SQL Server. If there are different workloads running on SQL Server and each of the workload needs different resources or when workloads are competing for resources with each other and affecting the performance of the whole server resource governor is a very important task. Tricks to Replace SELECT * with Column Names – SQL in Sixty Seconds #017 – Video Retrieves unnecessary columns and increases network traffic When a new columns are added views needs to be refreshed manually Leads to usage of sub-optimal execution plan Uses clustered index in most of the cases instead of using optimal index It is difficult to debug SQL SERVER – Load Generator – Free Tool From CodePlex The best part of this SQL Server Load Generator is that users can run multiple simultaneous queries again SQL Server using different login account and different application name. The interface of the tool is extremely easy to use and very intuitive as well. A Puzzle – Swap Value of Column Without Case Statement Let us assume there is a single column in the table called Gender. The challenge is to write a single update statement which will flip or swap the value in the column. For example if the value in the gender column is ‘male’ swap it with ‘female’ and if the value is ‘female’ swap it with ‘male’. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Monitoring Html Element CSS Changes in JavaScript

- by Rick Strahl

[ updated Feb 15, 2011: Added event unbinding to avoid unintended recursion ] Here's a scenario I've run into on a few occasions: I need to be able to monitor certain CSS properties on an HTML element and know when that CSS element changes. For example, I have a some HTML element behavior plugins like a drop shadow that attaches to any HTML element, but I then need to be able to automatically keep the shadow in sync with the window if the element dragged around the window or moved via code. Unfortunately there's no move event for HTML elements so you can't tell when it's location changes. So I've been looking around for some way to keep track of the element and a specific CSS property, but no luck. I suspect there's nothing native to do this so the only way I could think of is to use a timer and poll rather frequently for the property. I ended up with a generic jQuery plugin that looks like this: (function($){ $.fn.watch = function (props, func, interval, id) { /// <summary> /// Allows you to monitor changes in a specific /// CSS property of an element by polling the value. /// when the value changes a function is called. /// The function called is called in the context /// of the selected element (ie. this) /// </summary> /// <param name="prop" type="String">CSS Properties to watch sep. by commas</param> /// <param name="func" type="Function"> /// Function called when the value has changed. /// </param> /// <param name="interval" type="Number"> /// Optional interval for browsers that don't support DOMAttrModified or propertychange events. /// Determines the interval used for setInterval calls. /// </param> /// <param name="id" type="String">A unique ID that identifies this watch instance on this element</param> /// <returns type="jQuery" /> if (!interval) interval = 200; if (!id) id = "_watcher"; return this.each(function () { var _t = this; var el$ = $(this); var fnc = function () { __watcher.call(_t, id) }; var itId = null; var data = { id: id, props: props.split(","), func: func, vals: [props.split(",").length], fnc: fnc, origProps: props, interval: interval }; $.each(data.props, function (i) { data.vals[i] = el$.css(data.props[i]); }); el$.data(id, data); hookChange(el$, id, data.fnc); }); function hookChange(el$, id, fnc) { el$.each(function () { var el = $(this); if (typeof (el.get(0).onpropertychange) == "object") el.bind("propertychange." + id, fnc); else if ($.browser.mozilla) el.bind("DOMAttrModified." + id, fnc); else itId = setInterval(fnc, interval); }); } function __watcher(id) { var el$ = $(this); var w = el$.data(id); if (!w) return; var _t = this; if (!w.func) return; // must unbind or else unwanted recursion may occur el$.unwatch(id); var changed = false; var i = 0; for (i; i < w.props.length; i++) { var newVal = el$.css(w.props[i]); if (w.vals[i] != newVal) { w.vals[i] = newVal; changed = true; break; } } if (changed) w.func.call(_t, w, i); // rebind event hookChange(el$, id, w.fnc); } } $.fn.unwatch = function (id) { this.each(function () { var el = $(this); var fnc = el.data(id).fnc; try { if (typeof (this.onpropertychange) == "object") el.unbind("propertychange." + id, fnc); else if ($.browser.mozilla) el.unbind("DOMAttrModified." + id, fnc); else clearInterval(id); } // ignore if element was already unbound catch (e) { } }); return this; } })(jQuery); With this I can now monitor movement by monitoring say the top CSS property of the element. The following code creates a box and uses the draggable (jquery.ui) plugin and a couple of custom plugins that center and create a shadow. Here's how I can set this up with the watcher: $("#box") .draggable() .centerInClient() .shadow() .watch("top", function() { $(this).shadow(); },70,"_shadow"); ... $("#box") .unwatch("_shadow") .shadow("remove"); This code basically sets up the window to be draggable and initially centered and then a shadow is added. The .watch() call then assigns a CSS property to monitor (top in this case) and a function to call in response. The component now sets up a setInterval call and keeps on pinging this property every time. When the top value changes the supplied function is called. While this works and I can now drag my window around with the shadow following suit it's not perfect by a long shot. The shadow move is delayed and so drags behind the window, but using a higher timer value is not appropriate either as the UI starts getting jumpy if the timer's set with too small of an increment. This sort of monitor can be useful for other things as well where operations are maybe not quite as time critical as a UI operation taking place. Can anybody see a better a better way of capturing movement of an element on the page?© Rick Strahl, West Wind Technologies, 2005-2011Posted in ASP.NET JavaScript jQuery

Read the article
SQL SERVER – Weekly Series – Memory Lane – #050

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Executing Remote Stored Procedure – Calling Stored Procedure on Linked Server In this example we see two different methods of how to call Stored Procedures remotely. Connection Property of SQL Server Management Studio SSMS A very simple example of the how to build connection properties for SQL Server with the help of SSMS. Sample Example of RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE SQL Server has a total of 4 ranking functions. Ranking functions return a ranking value for each row in a partition. All the ranking functions are non-deterministic. T-SQL Script to Add Clustered Primary Key Jr. DBA asked me three times in a day, how to create Clustered Primary Key. I gave him following sample example. That was the last time he asked “How to create Clustered Primary Key to table?” 2008 2008 – TRIM() Function – User Defined Function SQL Server does not have functions which can trim leading or trailing spaces of any string at the same time. SQL does have LTRIM() and RTRIM() which can trim leading and trailing spaces respectively. SQL Server 2008 also does not have TRIM() function. User can easily use LTRIM() and RTRIM() together and simulate TRIM() functionality. http://www.youtube.com/watch?v=1-hhApy6MHM 2009 Earlier I have written two different articles on the subject Remove Bookmark Lookup. This article is as part 3 of original article. Please read the first two articles here before continuing reading this article. Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup – Part 2 Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup – Part 3 Interesting Observation – Query Hint – FORCE ORDER SQL Server never stops to amaze me. As regular readers of this blog already know that besides conducting corporate training, I work on large-scale projects on query optimizations and server tuning projects. In one of the recent projects, I have noticed that a Junior Database Developer used the query hint Force Order; when I asked for details, I found out that the basic concept was not properly understood by him. Queries Waiting for Memory Allocation to Execute In one of the recent projects, I was asked to create a report of queries that are waiting for memory allocation. The reason was that we were doubtful regarding whether the memory was sufficient for the application. The following query can be useful in similar cases. Queries that do not have to wait on a memory grant will not appear in the result set of following query. 2010 Quickest Way to Identify Blocking Query and Resolution – Dirty Solution As the title suggests, this is quite a dirty solution; it’s not as elegant as you expect. However, it works totally fine. Simple Explanation of Data Type Precedence While I was working on creating a question for SQL SERVER – SQL Quiz – The View, The Table and The Clustered Index Confusion, I had actually created yet another question along with this question. However, I felt that the one which is posted on the SQL Quiz is much better than this one because what makes that more challenging question is that it has a multiple answer. Encrypted Stored Procedure and Activity Monitor I recently had received questionable if any stored procedure is encrypted can we see its definition in Activity Monitor.Answer is - No. Let us do a quick test. Let us create following Stored Procedure and then launch the Activity Monitor and check the text. Indexed View always Use Index on Table A single table can have maximum 249 non clustered indexes and 1 clustered index. In SQL Server 2008, a single table can have maximum 999 non clustered indexes and 1 clustered index. It is widely believed that a table can have only 1 clustered index, and this belief is true. I have some questions for all of you. Let us assume that I am creating view from the table itself and then create a clustered index on it. In my view, I am selecting the complete table itself. 2011 Detecting Database Case Sensitive Property using fn_helpcollations() I received a question on how to determine the case sensitivity of the database. The quick answer to this is to identify the collation of the database and check the properties of the collation. I have previously written how one can identify database collation. Once you have figured out the collation of the database, you can put that in the WHERE condition of the following T-SQL and then check the case sensitivity from the description. Server Side Paging in SQL Server CE (Compact Edition) SQL Server Denali is coming up with new T-SQL of Paging. I have written about the same earlier.SQL SERVER – Server Side Paging in SQL Server Denali – A Better Alternative, SQL SERVER – Server Side Paging in SQL Server Denali Performance Comparison, SQL SERVER – Server Side Paging in SQL Server Denali – Part2 What is very interesting is that SQL Server CE 4.0 have the same feature introduced. Here is the quick example of the same. To run the script in the example, you will have to do installWebmatrix 4.0 and download sample database. Once done you can run following script. Why I am Going to Attend PASS Summit Unite 2011 The four-day event will be marked by a lot of learning, sharing, and networking, which will help me increase both my knowledge and contacts. Every year, PASS Summit provides me a golden opportunity to build my network as well as to identify and meet potential customers or employees. 2012 Manage Help Settings – CTRL + ALT + F1 This is very interesting read as my daughter once accidently came across a screen in SQL Server Management Studio. It took me 2-3 minutes to figure out how she has created the same screen. Recover the Accidentally Renamed Table “I accidentally renamed table in my SSMS. I was scrolling very fast and I made mistakes. It was either because I double clicked or clicked on F2 (shortcut key for renaming). However, I have made the mistake and now I have no idea how to fix this. If you have renamed the table, I think you pretty much is out of luck. Here are few things which you can do which can give you an idea about what your table name can be if you are lucky. Identify Numbers of Non Clustered Index on Tables for Entire Database Here is the script which will give you numbers of non clustered indexes on any table in entire database. Identify Most Resource Intensive Queries – SQL in Sixty Seconds #029 – Video Here is the complete complete script which I have used in the SQL in Sixty Seconds Video. Thanks Harsh for important Tip in the comment. http://www.youtube.com/watch?v=3kDHC_Tjrns Advanced Data Quality Services with Melissa Data – Azure Data Market For the purposes of the review, I used a database I had in an Excel spreadsheet with name and address information. Upon a cursory inspection, there are miscellaneous problems with these records; some addresses are missing ZIP codes, others missing a city, and some records are slightly misspelled or have unparsed suites. With DQS, I can easily add a knowledge base to help standardize my values, such as for state abbreviations. But how do I know that my address is correct? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Converting projects to use Automatic NuGet restore

- by terje

Originally posted on: http://geekswithblogs.net/terje/archive/2014/06/11/converting-projects-to-use-automatic-nuget-restore.aspxDownload tool In version 2.7 of NuGet automatic nuget restore was introduced, meaning you no longer need to distort your msbuild project files with nuget target information. Visual Studio and TFS 2013 build have this enabled by default. However, if your project was created before this was introduced, and/or if you have used the “Enable NuGet Package Restore” afterwards, you now have a series of unwanted things in your projects, and a series of project files that have been modified – and – you no longer neither want nor need this ! You might also get into some unwanted issues due to these modifications. This is a MSBuild modification that was needed only before NuGet 2.7 ! So: DON’T USE THIS FUNCTION !!! There is an issue https://nuget.codeplex.com/workitem/4019 on this on the NuGet project site to get this function removed, renamed or at least moved farther away from the top level (please help vote it up!). The response seems to be that it WILL BE removed, around version 3.0. This function does nothing you need after the introduction of NuGet 2.7. What is also unfortunate is the naming of it – it implies that it is needed, it is not, and what is worse, there is no corresponding function to remove what it does ! So to fix this use the tool named IFix, that will fix this issue for you - all free of course, and the code is open source. Also report issues there: https://github.com/OsirisTerje/IFix IFix information DOWNLOAD HERE This command line tool installs using an MSI, and add itself to the system path. If you work in a team, you will probably need to use the tool multiple times. Anyone in the team may at any time use the “Enable NuGet Package Restore” function and mess up your project again. The IFix program can be run either in a check modus, where it does not write anything back – it only checks if you have any issues, or in a Fix mode, where it will also perform the necessary fixes for you. The IFix program is used like this: IFix <command> [-c/--check] [-f/--fix] [-v/--verbose] The command in this case is “nugetrestore”. It will do a check from the location where it is being called, and run through all subfolders from that location. So “IFix nugetrestore --check” , will do the check , and “IFix nugetrestore --fix” will perform the changes, for all files and folders below the current working directory. (Note that --check can be replaced with only –c, and --fix with –f, and so on. ) BEWARE: When you run the fix option, all solutions to be affected must be closed in Visual Studio ! So, if you just want to DO it, then: IFix nugetrestore --check to see if you have issues then IFix nugetrestore --fix to fix them. How does it work IFix nugetrestore checks and optionally fixes four issues that the older enabling of nuget restore did. The issues are related to the MSBuild projess, and are: Deleting the nuget.targets file. Deleting the nuget.exe that is located under the .nuget folder Removing all references to nuget.targets in the solution file Removing all properties and target imports of nuget.targets inside the csproj files. IFix fixes these issues in the same sequence. The first step, removing the nuget.targets file is the most critical one, and all instances of the nuget.targets file within the scope of a solution has to be removed, and in addition it has to be done with the solution closed in Visual Studio. If Visual Studio finds a nuget.targets file, the csproj files will be automatically messed up again. This means the removal process above might need to be done multiple times, specially when you’re working with a team, and that solution context menu still has the “Enable NuGet Package Restore” function. Someone on the team might inadvertently do this at any time. It can be a good idea to add this check to a checkin policy – if you run TFS standard version control, but that will have no effect if you use TFS Git version control of course. So, better be prepared to run the IFix check from time to time. Or, even better, install IFix on your build servers, and add a call to IFix nugetrestore --check in the TFS Build script. How does it look As a first example I have run the IFix program from the top of a set of git repositories, so it spans multiple repositories with multiple solutions. The result from the check option is as follows: We see the four red lines, there is one for each of the four checks we talked about in the previous section. The fact that they are red, means we have that particular issue. The first section (above the first red text line) is the nuget targets section. Notice No.1, it says it has found no paths to copy. What IFix does here is to check if there are any defined paths to other nuget galleries. If there are, then those are copied over to the nuget.config file, where is where it should be in version 2.7 and above. No.2 says it has found the particular nuget.targets file, No.3 states it HAS found some other nuget galleries defines in the targets file, which then it would like to copy to the config.file. No.4 is the section for nuget.exe files, and list those it has found, and which it would like to delete. No 5 states it has found a reference to nuget.targets in the solution file. This reference comes from the fact that the .nuget folder is a solution folder, and the items within are described in the solution file. It then checks the csproj files, and as can be seen from the last red line, it ha found issues in 96 out of 198 csproj files. There are two possible issues in a csproj files. No.6 is the first one, and the most common and most important one, an “Import project” section. This is the section that calls the nuget.targets files. No.7 is another issue, which seems to sometimes be there, sometimes not, it is a RestorePackages property, which also should go away. Now, if we run the IFix nugetrestore –fix command, and then the check again after that, the result is: All green !

Read the article
Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

Read the article
Summit Time!

- by Ajarn Mark Caldwell

Boy, how time flies! I can hardly believe that the 2011 PASS Summit is just one week away. Maybe it snuck up on me because it’s a few weeks earlier than last year. Whatever the cause, I am really looking forward to next week. The PASS Summit is the largest SQL Server conference in the world and a fantastic networking opportunity thrown in for no additional charge. Here are a few thoughts to help you maximize the week. Networking As Karen Lopez (blog | @DataChick) mentioned in her presentation for the Professional Development Virtual Chapter just a couple of weeks ago, “Don’t wait until you need a new job to start networking.” You should always be working on your professional network. Some people, especially technical-minded people, get confused by the term networking. The first image that used to pop into my head was the image of some guy standing, awkwardly, off to the side of a cocktail party, trying to shmooze those around him. That’s not what I’m talking about. If you’re good at that sort of thing, and you can strike up a conversation with some stranger and learn all about them in 5 minutes, and walk away with your next business deal all but approved by the lawyers, then congratulations. But if you’re not, and most of us are not, I have two suggestions for you. First, register for Don Gabor’s 2-hour session on Tuesday at the Summit called Networking to Build Business Contacts. Don is a master at small talk, and at teaching others, and in just those two short hours will help you with important tips about breaking the ice, remembering names, and smooth transitions into and out of conversations. Then go put that great training to work right away at the Tuesday night Welcome Reception and meet some new people; which is really my second suggestion…just meet a few new people. You see, “networking” is about meeting new people and being friendly without trying to “work it” to get something out of the relationship at this point. In fact, Don will tell you that a better way to build the connection with someone is to look for some way that you can help them, not how they can help you. There are a ton of opportunities as long as you follow this one key point: Don’t stay in your hotel! At the least, get out and go to the free events such as the Tuesday night Welcome Reception, the Wednesday night Exhibitor Reception, and the Thursday night Community Appreciation Party. All three of these are perfect opportunities to meet other professionals with a similar job or interest as you, and you never know how that may help you out in the future. Maybe you just meet someone to say HI to at breakfast the next day instead of eating alone. Or maybe you cross paths several times throughout the Summit and compare notes on different sessions you attended. And you just might make new friends that you look forward to seeing year after year at the Summit. Who knows, it might even turn out that you have some specific experience that will help out that other person a few months’ from now when they run into the same challenge that you just overcame, or vice-versa. But the point is, if you don’t get out and meet people, you’ll never have the chance for anything else to happen in the future. One more tip for shy attendees of the Summit…if you can’t bring yourself to strike up conversation with strangers at these events, then at the least, after you sit through a good session that helps you out, go up to the speaker and introduce yourself and thank them for taking the time and effort to put together their presentation. Ideally, when you do this, tell them WHY it was beneficial to you (e.g. “Now I have a new idea of how to tackle a problem back at the office.”) I know you think the speakers are all full of confidence and are always receiving a ton of accolades and applause, but you’re wrong. Most of them will be very happy to hear first-hand that all the work they put into getting ready for their presentation is paying off for somebody. Training With over 170 technical sessions at the Summit, training is what it’s all about, and the training is fantastic! Of course there are the big-name trainers like Paul Randall, Kimberly Tripp, Kalen Delaney, Itzik Ben-Gan and several others, but I am always impressed by the quality of the training put on by so many other “regular” members of the SQL Server community. It is amazing how you don’t have to be a published author or otherwise recognized as an “expert” in an area in order to make a big impact on others just by sharing your personal experience and lessons learned. I would rather hear the story of, and lessons learned from, “some guy or gal” who has actually been through an issue and came out the other side, than I would a trained professor who is speaking just from theory or an intellectual understanding of a topic. In addition to the three full days of regular sessions, there are also two days of pre-conference intensive training available. There is an extra cost to this, but it is a fantastic opportunity. Think about it…you’re already coming to this area for training, so why not extend your stay a little bit and get some in-depth training on a particular topic or two? I did this for the first time last year. I attended one day of extra training and it was well worth the time and money. One of the best reasons for it is that I am extremely busy at home with my regular job and family, that it was hard to carve out the time to learn about the topic on my own. It worked out so well last year that I am doubling up and doing two days or “pre-cons” this year. And then there are the DVDs. I think these are another great option. I used the online schedule builder to get ready and have an idea of which sessions I want to attend and when they are (much better than trying to figure this out at the last minute every day). But the problem that I have run into (seems this happens every year) is that nearly every session block has two different sessions that I would like to attend. And some of them have three! ACK! That won’t work! What is a guy supposed to do? Well, one option is to purchase the DVDs which are recordings of the audio and projected images from each session so you can continue to attend sessions long after the Summit is officially over. Yes, many (possibly all) of these also get posted online and attendees can access those for no extra charge, but those are not necessarily all available as quickly as the DVD recording are, and the DVDs are often more convenient than downloading, especially if you want to share the training with someone who was not able to attend in person. Remember, I don’t make any money or get any other benefit if you buy the DVDs or from anything else that I have recommended here. These are just my own thoughts, trying to help out based on my experiences from the 8 or so Summits I have attended. There is nothing like the Summit. It is an awesome experience, fantastic training, and a whole lot of fun which is just compounded if you’ll take advantage of the first part of this article and make some new friends along the way.

Read the article
The SPARC SuperCluster

- by Karoly Vegh

Oracle has been providing a lead in the Engineered Systems business for quite a while now, in accordance with the motto "Hardware and Software Engineered to Work Together." Indeed it is hard to find a better definition of these systems. Allow me to summarize the idea. It is: Build a compute platform optimized to run your technologies Develop application aware, intelligently caching storage components Take an impressively fast network technology interconnecting it with the compute nodes Tune the application to scale with the nodes to yet unseen performance Reduce the amount of data moving via compression Provide this all in a pre-integrated single product with a single-pane management interface All these ideas have been around in IT for quite some time now. The real Oracle advantage is adding the last one to put these all together. Oracle has built quite a portfolio of Engineered Systems, to run its technologies - and run those like they never ran before. In this post I'll focus on one of them that serves as a consolidation demigod, a multi-purpose engineered system. As you probably have guessed, I am talking about the SPARC SuperCluster. It has many great features inherited from its predecessors, and it adds several new ones. Allow me to pick out and elaborate about some of the most interesting ones from a technological point of view. I. It is the SPARC SuperCluster T4-4. That is, as compute nodes, it includes SPARC T4-4 servers that we learned to appreciate and respect for their features: The SPARC T4 CPUs: Each CPU has 8 cores, each core runs 8 threads. The SPARC T4-4 servers have 4 sockets. That is, a single compute node can in parallel, simultaneously execute 256 threads. Now, a full-rack SPARC SuperCluster has 4 of these servers on board. Remember the keyword demigod. While retaining the forerunner SPARC T3's exceptional throughput, the SPARC T4 CPUs raise the bar with single performance too - a humble 5x better one than their ancestors. actually, the SPARC T4 CPU cores run in both single-threaded and multi-threaded mode, and switch between these two on-the-fly, fulfilling not only single-threaded OR multi-threaded applications' needs, but even mixed requirements (like in database workloads!). Data security, anyone? Every SPARC T4 CPU core has a built-in encryption engine, that is, encryption algorithms cast into silicon. A PCI controller right on the chip for customers who need I/O performance. Built-in, no-cost Virtualization: Oracle VM for SPARC (the former LDoms or Logical Domains) is not a server-emulation virtualization technology but rather a serverpartitioning one, the hypervisor runs in the server firmware, and all the VMs' HW resources (I/O, CPU, memory) are accessed natively, without performance overhead. This enables customers to run a number of Solaris 10 and Solaris 11 VMs separated, independent of each other within a physical server II. For Database performance, it includes Exadata Storage Cells - one of the main reasons why the Exadata Database Machine performs at diabolic speed. What makes them important? They provide DB backend storage for your Oracle Databases to run on the SPARC SuperCluster, that is what they are built and tuned for DB performance. These storage cells are SQL-aware. That is, if a SPARC T4 database compute node executes a query, it doesn't simply request tons of raw datablocks from the storage, filters the received data, and throws away most of it where the statement doesn't apply, but provides the SQL query to the storage node too. The storage cell software speaks SQL, that is, it is able to prefilter and through that transfer only the relevant data. With this, the traffic between database nodes and storage cells is reduced immensely. Less I/O is a good thing - as they say, all the CPUs of the world do one thing just as fast as any other - and that is waiting for I/O. They don't only pre-filter, but also provide data preprocessing features - e.g. if a DB-node requests an aggregate of data, they can calculate it, and handover only the results, not the whole set. Again, less data to transfer. They support the magical HCC, (Hybrid Columnar Compression). That is, data can be stored in a precompressed form on the storage. Less data to transfer. Of course one can't simply rely on disks for performance, there is Flash Storage included there for caching. III. The low latency, high-speed backbone network: InfiniBand, that interconnects all the members with: Real High Speed: 40 Gbit/s. Full Duplex, of course. Oh, and a really low latency. RDMA. Remote Direct Memory Access. This technology allows the DB nodes to do exactly that. Remotely, directly placing SQL commands into the Memory of the storage cells. Dodging all the network-stack bottlenecks, avoiding overhead, placing requests directly into the process queue. You can also run IP over InfiniBand if you please - that's the way the compute nodes can communicate with each other. IV. Including a general-purpose storage too: the ZFSSA, which is a unified storage, providing NAS and SAN access too, with the following features: NFS over RDMA over InfiniBand. Nothing is faster network-filesystem-wise. All the ZFS features onboard, hybrid storage pools, compression, deduplication, snapshot, replication, NFS and CIFS shares Storageheads in a HA-Cluster configuration providing availability of the data DTrace Live Analytics in a web-based Administration UI Being a general purpose application data storage for your non-database applications running on the SPARC SuperCluster over whichever protocol they prefer, easily replicating, snapshotting, cloning data for them. There's a lot of great technology included in Oracle's SPARC SuperCluster, we have talked its interior through. As for external scalability: you can start with a half- of full- rack SPARC SuperCluster, and scale out to several racks - that is, stacking not separate full-rack SPARC SuperClusters, but extending always one large instance of the size of several full-racks. Yes, over InfiniBand network. Add racks as you grow. What technologies shall run on it? SPARC SuperCluster is a general purpose scaleout consolidation/cloud environment. You can run Oracle Databases with RAC scaling, or Oracle Weblogic (end enjoy the SPARC T4's advantages to run Java). Remember, Oracle technologies have been integrated with the Oracle Engineered Systems - this is the Oracle on Oracle advantage. But you can run other software environments such as SAP if you please too. Run any application that runs on Oracle Solaris 10 or Solaris 11. Separate them in Virtual Machines, or even Oracle Solaris Zones, monitor and manage those from a central UI. Here the key takeaways once again: The SPARC SuperCluster: Is a pre-integrated Engineered System Contains SPARC T4-4 servers with built-in virtualization, cryptography, dynamic threading Contains the Exadata storage cells that intelligently offload the burden of the DB-nodes Contains a highly available ZFS Storage Appliance, that provides SAN/NAS storage in a unified way Combines all these elements over a high-speed, low-latency backbone network implemented with InfiniBand Can grow from a single half-rack to several full-rack size Supports the consolidation of hundreds of applications To summarize: All these technologies are great by themselves, but the real value is like in every other Oracle Engineered System: Integration. All these technologies are tuned to perform together. Together they are way more than the sum of all - and a careful and actually very time consuming integration process is necessary to orchestrate all these for performance. The SPARC SuperCluster's goal is to enable infrastructure operations and offer a pre-integrated solution that can be architected and delivered in hours instead of months of evaluations and tests. The tedious and most importantly time and resource consuming part of the work - testing and evaluating - has been done. Now go, provide services. -- charlie

Read the article
The Great Divorce

- by BlackRabbitCoder

I have a confession to make: I've been in an abusive relationship for more than 17 years now. Yes, I am not ashamed to admit it, but I'm finally doing something about it. I met her in college, she was new and sexy and amazingly fast -- and I'd never met anything like her before. Her style and her power captivated me and I couldn't wait to learn more about her. I took a chance on her, and though I learned a lot from her -- and will always be grateful for my time with her -- I think it's time to move on. Her name was C++, and she so outshone my previous love, C, that any thoughts of going back evaporated in the heat of this new romance. She promised me she'd be gentle and not hurt me the way C did. She promised me she'd clean-up after herself better than C did. She promised me she'd be less enigmatic and easier to keep happy than C was. But I was deceived. Oh sure, as far as truth goes, it wasn't a complete lie. To some extent she was more fun, more powerful, safer, and easier to maintain. But it just wasn't good enough -- or at least it's not good enough now. I loved C++, some part of me still does, it's my first-love of programming languages and I recognize its raw power, its blazing speed, and its improvements over its predecessor. But with today's hardware, at speeds we could only dream to conceive of twenty years ago, that need for speed -- at the cost of all else -- has died, and that has left my feelings for C++ moribund. If I ever need to write an operating system or a device driver, then I might need that speed. But 99% of the time I don't. I'm a business-type programmer and chances are 90% of you are too, and even the ones who need speed at all costs may be surprised by how much you sacrifice for that. That's not to say that I don't want my software to perform, and it's not to say that in the business world we don't care about speed or that our job is somehow less difficult or technical. There's many times we write programs to handle millions of real-time updates or handle thousands of financial transactions or tracking trading algorithms where every second counts. But if I choose to write my code in C++ purely for speed chances are I'll never notice the speed increase -- and equally true chances are it will be far more prone to crash and far less easy to maintain. Nearly without fail, it's the macro-optimizations you need, not the micro-optimizations. If I choose to write a O(n2) algorithm when I could have used a O(n) algorithm -- that can kill me. If I choose to go to the database to load a piece of unchanging data every time instead of caching it on first load -- that too can kill me. And if I cross the network multiple times for pieces of data instead of getting it all at once -- yes that can also kill me. But choosing an overly powerful and dangerous mid-level language to squeeze out every last drop of performance will realistically not make stock orders process any faster, and more likely than not open up the system to more risk of crashes and resource leaks. And that's when my love for C++ began to die. When I noticed that I didn't need that speed anymore. That that speed was really kind of a lie. Sure, I can be super efficient and pack bits in a byte instead of using separate boolean values. Sure, I can use an unsigned char instead of an int. But in the grand scheme of things it doesn't matter as much as you think it does. The key is maintainability, and that's where C++ failed me. I like to tell the other developers I work with that there's two levels of correctness in coding: Is it immediately correct? Will it stay correct? That is, you can hack together any piece of code and make it correct to satisfy a task at hand, but if a new developer can't come in tomorrow and make a fairly significant change to it without jeopardizing that correctness, it won't stay correct. Some people laugh at me when I say I now prefer maintainability over speed. But that is exactly the point. If you focus solely on speed you tend to produce code that is much harder to maintain over the long hall, and that's a load of technical debt most shops can't afford to carry and end up completely scrapping code before it's time. When good code is written well for maintainability, though, it can be correct both now and in the future. And you know the best part is? My new love is nearly as fast as C++, and in some cases even faster -- and better than that, I know C# will treat me right. Her creators have poured hundreds of thousands of hours of time into making her the sexy beast she is today. They made her easy to understand and not an enigmatic mess. They made her consistent and not moody and amorphous. And they made her perform as fast as I care to go by optimizing her both at compile time and a run-time. Her code is so elegant and easy on the eyes that I'm not worried where she will run to or what she'll pull behind my back. She is powerful enough to handle all my tasks, fast enough to execute them with blazing speed, maintainable enough so that I can rely on even fairly new peers to modify my work, and rich enough to allow me to satisfy any need. C# doesn't ask me to clean up her messes! She cleans up after herself and she tries to make my life easier for me by taking on most of those optimization tasks C++ asked me to take upon myself. Now, there are many of you who would say that I am the cause of my own grief, that it was my fault C++ didn't behave because I didn't pay enough attention to her. That I alone caused the pain she inflicted on me. And to some extent, you have a point. But she was so high maintenance, requiring me to know every twist and turn of her vast and unrestrained power that any wrong term or bout of forgetfulness was met with painful reminders that she wasn't going to watch my back when I made a mistake. But C#, she loves me when I'm good, and she loves me when I'm bad, and together we make beautiful code that is both fast and safe. So that's why I'm leaving C++ behind. She says she's changing for me, but I have no interest in what C++0x may bring. Oh, I'll still keep in touch, and maybe I'll see her now and again when she brings her problems to my door and asks for some attention -- for I always have a soft spot for her, you see. But she's out of my house now. I have three kids and a dog and a cat, and all require me to clean up after them, why should I have to clean up after my programming language as well?

Read the article
Day 3 - XNA: Hacking around with images

- by dapostolov

Yay! Today I'm going to get into some code! My mind has been on this all day! I find it amusing how I practice, daily, to be "in the moment" or "present" and the excitement and anticipation of this project seems to snatch it away from me frequently. WELL!!! (Shakes Excitedly) Let's do this =)! Let's code! For these next few days it is my intention to better understand image rendering using XNA; after said prototypes are complete I should (fingers crossed) be able to dive into my game code using the design document I hammered out the other night. On a personal note, I think the toughest thing right now is finding the time to do this project. Each night, after my little ones go to bed I can only really afford a couple hours of work on this project. However, I hope to utilise this time as best as I can because this is the first time in a while I've found a project that I've been passionate about. A friend recently asked me if I intend to go 3D or extend the game design. Yes. For now I'm keeping it simple. Lastly, just as a note, as I was doing some further research into image rendering this morning I came across some other XNA content and lessons learned. I believe this content could have probably been posted in the first couple of posts, however, I will share the new content as I learn it at the end of each day. Maybe I'll take some time later to fix the posts but for now Installation and Deployment - Lessons Learned I had installed the XNA studio (Day 1) and the site instructions were pretty easy to follow. However, I had a small difficulty with my development environment. You see, I run a virtual desktop development environment. Even though I was able to code and compile all the tutorials the game failed to run...because I lacked a 3D capable card; it was not detected on the virtual box... First Lesson: The XNA runtime needs to "see" the 3D card! No sweat, Il copied the files over to my parent box and executed the program. ERROR. Hmm... Second Lesson (which I should have probably known but I let the excitement get the better of me): you need the XNA runtime on the client PC to run the game, oh, and don't forget the .Net Runtime! Sprite, it ain't just a Soft Drink... With these prototypes I intend to understand and perform the following tasks. learn game development terminology how to place and position (rotate) a static image on the screen how to layer static images on the screen understand image scaling can we reuse images? understand how framerate is handled in XNA how to display text , basic shapes, and colors on the screen how to interact with an image (collision of user input?) how to animate an image and understand basic animation techniques how to detect colliding images or screen edges how to manipulate the image, lets say colors, stretching how to focus on a segment of an image...like only displaying a frame on a film reel what's the best way to manage images (compression, storage, location, prevent artwork theft, etc.) Well, let's start with this "prototype" task list for now...Today, let's get an image on the screen and maybe I can mark a few of the tasks as completed... C# Prototype1 New Visual Studio Project Select the XNA Game Studio 3.1 Project Type Select the Windows Game 3.1 Template Type Prototype1 in the Name textbox provided Press OK. At this point code has auto-magically been created. Feel free to press the F5 key to run your first XNA program. You should have a blue screen infront of you. Without getting into the nitty gritty right, the code that was generated basically creates some basic code to clear the window content with the lovely CornFlowerBlue color. Something to notice, when you move your mouse into the window...nothing. ooooo spoooky. Let's put an image on that screen! Step A - Get an Image into the solution Under "Content" in your Solution Explorer, right click and add a new folder and name it "Sprites". Copy a small image in there; I copied a "Royalty Free" wizard hat from a quick google search and named it wizards_hat.jpg (rightfully so!) Step B - Add the sprite and position fields Now, open/edit Game1.cs Locate the following line: SpriteBatch spriteBatch; Under this line type the following: SpriteBatch spriteBatch; // the line you are looking for... Texture2D sprite; Vector2 position; Step C - Load the image asset Locate the "Load Content" Method and duplicate the following: protected override void LoadContent() { spriteBatch = new SpriteBatch(GraphicsDevice); // your image name goes here... sprite = Content.Load<Texture2D>("Sprites\\wizards_hat"); position = new Vector2(200, 100); base.LoadContent(); } Step D - Draw the image Locate the "Draw" Method and duplicate the following: protected override void Draw(GameTime gameTime) { GraphicsDevice.Clear(Color.CornflowerBlue); spriteBatch.Begin(SpriteBlendMode.AlphaBlend); spriteBatch.Draw(sprite, position, Color.White); spriteBatch.End(); base.Draw(gameTime); } Step E - Compile and Run Engage! (F5) - Debug! Your image should now display on a cornflowerblue window about 200 pixels from the left and 100 pixels from the top. Awesome! =) Pretty cool how we only coded a few lines to display an image, but believe me, there is plenty going on behind the scenes. However, for now, I'm going to call it a night here. Blogging all this progress certainly takes time... However, tomorrow night I'm going to detail what we just did, plus start checking off points on that list! I'm wondering right now if I should add pictures / code to this post...let me know if you want them =) Best Regards, D.

Read the article
What's new in Solaris 11.1?

- by Karoly Vegh

Solaris 11.1 is released. This is the first release update since Solaris 11 11/11, the versioning has been changed from MM/YY style to 11.1 highlighting that this is Solaris 11 Update 1. Solaris 11 itself has been great. What's new in Solaris 11.1? Allow me to pick some new features from the What's New PDF that can be found in the official Oracle Solaris 11.1 Documentation. The updates are very numerous, I really can't include all. I. New AI Automated Installer RBAC profiles have been introduced to enable delegation of installation tasks. II. The interactive installer now supports installing the OS to iSCSI targets. III. ASR (Auto Service Request) and OCM (Oracle Configuration Manager) have been enabled by default to proactively provide support information and create service requests to speed up support processes. This is optional and can be disabled but helps a lot in supportcases. For further information, see: http://oracle.com/goto/solarisautoreg IV. The new command svcbundle helps you to create SMF manifests without having to struggle with XML editing. (btw, do you know the interactive editprop subcommand in svccfg? The listprop/setprop subcommands are great for scripting and automating, but for an interactive property editing session try, for example, this: svccfg -s svc:/application/pkg/system-repository:default editprop ) V. pfedit: Ever wondered how to delegate editing permissions to certain files? It is well known "sudo /usr/bin/vi /etc/hosts" is not the right way, for sudo elevates the complete vi process to admin levels, and the user can "break" out of the session as root with simply starting a shell from that vi. Now, the new pfedit command provides a solution exactly to this challenge - an auditable, secure, per-user configurable editing possibility. See the pfedit man page for examples. VI. rsyslog, the popular logging daemon (filters, SSL, formattable output, SQL collect...) has been included in Solaris 11.1 as an alternative to syslog. VII: Zones: Solaris Zones - as a major Solaris differentiator - got lots of love in terms of new features: ZOSS - Zones on Shared Storage: Placing your zones to shared storage (FC, iSCSI) has never been this easy - via zonecfg. parallell updates - with S11's bootenvironments updating zones was no problem and meant no downtime anyway, but still, now you can update them parallelly, a way faster update action if you are running a large number of zones. This is like parallell patching in Solaris 10, but with all the IPS/ZFS/S11 goodness. per-zone fstype statistics: Running zones on a shared filesystems complicate the I/O debugging, since ZFS collects all the random writes and delivers them sequentially to boost performance. Now, over kstat you can find out which zone's I/O has an impact on the other ones, see the examples in the documentation: http://docs.oracle.com/cd/E26502_01/html/E29024/gmheh.html#scrolltoc Zones got RDSv3 protocol support for InfiniBand, and IPoIB support with Crossbow's anet (automatic vnic creation) feature. NUMA I/O support for Zones: customers can now determine the NUMA I/O topology of the system from within zones. VIII: Security got a lot of attention too: Automated security/audit reporting, with builtin reporting templates e.g. for PCI (payment card industry) audits. PAM is now configureable on a per-user basis instead of system wide, allowing different authentication requirements for different users SSH in Solaris 11.1 now supports running in FIPS 140-2 mode, that is, in a U.S. government security accredited fashion. SHA512/224 and SHA512/256 cryptographic hash functions are implemented in a FIPS-compliant way - and on a T4 implemented in silicon! That is, goverment-approved cryptography at HW-speed. Generally, Solaris is currently under evaluation to be both FIPS and Common Criteria certified. IX. Networking, as one of the core strengths of Solaris 11, has been extended with: Data Center Bridging (DCB) - not only setups where network and storage share the same fabric (FCoE, anyone?) can have Quality-of-Service requirements. DCB enables peers to distinguish traffic based on priorities. Your NICs have to support DCB, see the documentation, and additional information on Wikipedia. DataLink MultiPathing, DLMP, enables link aggregation to span across multiple switches, even between those of different vendors. But there are essential differences to the good old bandwidth-aggregating LACP, see the documentation: http://docs.oracle.com/cd/E26502_01/html/E28993/gmdlu.html#scrolltoc VNIC live migration is now supported from one physical NIC to another on-the-fly X. Data management: FedFS, (Federated FileSystem) is new, it relies on Solaris 11's NFS referring mechanism to join separate shares of different NFS servers into a single filesystem namespace. The referring system has been there since S11 11/11, in Solaris 11.1 FedFS uses a LDAP - as the one global nameservice to bind them all. The iSCSI initiator now uses the T4 CPU's HW-implemented CRC32 algorithm - thus improving iSCSI throughput while reducing CPU utilization on a T4 Storage locking improvements are now RAC aware, speeding up throughput with better locking-communication between nodes up to 20%! XI: Kernel performance optimizations: The new Virtual Memory subsystem ("VM2") scales now to 100+ TB Memory ranges. The memory predictor monitors large memory page usage, and adjust memory page sizes to applications' needs OSM, the Optimized Shared Memory allows Oracle DBs' SGA to be resized online XII: The Power Aware Dispatcher in now by default enabled, reducing power consumption of idle CPUs. Also, the LDoms' Power Management policies and the poweradm settings in Solaris 11 OS will cooperate. XIII: x86 boot: upgrade to the (Grand Unified Bootloader) GRUB2. Because grub2 differs in the configuration syntactically from grub1, one shall not edit the new grub configuration (grub.cfg) but use the new bootadm features to update it. GRUB2 adds UEFI support and also support for disks over 2TB. XIV: Improved viewing of per-CPU statistics of mpstat. This one might seem of less importance at first, but nowadays having better sorting/filtering possibilities on a periodically updated mpstat output of 256+ vCPUs can be a blessing. XV: Support for Solaris Cluster 4.1: The What's New document doesn't actually mention this one, since OSC 4.1 has not been released at the time 11.1 was. But since then it is available, and it requires Solaris 11.1. And it's only a "pkg update" away. ...aand I seriously need to stop here. There's a lot I missed, Edge Virtual Bridging, lofi tuning, ZFS sharing and crypto enhancements, USB3.0, pulseaudio, trusted extensions updates, etc - but if I mention all those then I effectively copy the What's New document. Which I recommend reading now anyway, it is a great extract of the 300+ new projects and RFE-followups in S11.1. And this blogpost is a summary of that extract. For closing words, allow me to come back to Request For Enhancements, RFEs. Any customer can request features. Open up a Support Request, explain that this is an RFE, describe the feature you/your company desires to have in S11 implemented. The more SRs are collected for an RFE, the more chance it's got to get implemented. Feel free to provide feedback about the product, as well as about the Solaris 11.1 Documentation using the "Feedback" button there. Both the Solaris engineers and the documentation writers are eager to hear your input.Feel free to comment about this post too. Except that it's too long ;) wbr,charlie

Read the article
Agile Testing Days 2012 – Day 3 – Agile or agile?

- by Chris George

Another early start for my last Lean Coffee of the conference, and again it was not wasted. We had some really interesting discussions around how to determine what test automation is useful, if agile is not faster, why do it? and a rather existential discussion on whether unicorns exist! First keynote of the day was entitled “Fast Feedback Teams” by Ola Ellnestam. Again this relates nicely to the releasing faster talk on day 2, and something that we are looking at and some teams are actively trying. Introducing the notion of feedback, Ola describes a game he wrote for his eldest child. It was a simple game where every time he clicked a button, it displayed “You’ve Won!”. He then changed it to be a Win-Lose-Win-Lose pattern and watched the feedback from his son who then twigged the pattern and got his younger brother to play, alternating turns… genius! (must do that with my children). The idea behind this was that you need that feedback loop to learn and progress. If you are not getting the feedback you need to close that loop. An interesting point Ola made was to solve problems BEFORE writing software. It may be that you don’t have to write anything at all, perhaps it’s a communication/training issue? Perhaps the problem can be solved another way. Writing software, although it’s the business we are in, is expensive, and this should be taken into account. He again mentions frequent releases, and how they should be made as soon as stuff is ready to be released, don’t leave stuff on the shelf cause it’s not earning you anything, money or data. I totally agree with this and it’s something that we will be aiming for moving forwards. “Exceptions, Assumptions and Ambiguity: Finding the truth behind the story” by David Evans started off very promising by making references to ‘Grim up North’ referring to the north of England. Not sure it was appreciated by most of the audience, but it made me laugh! David explained how there are always risks associated with exceptions, giving the example of a one-way road near where he lives, with an exception sign giving rights to coaches to go the wrong way. Therefore you could merrily swing around the corner of the one way road straight into a coach! David showed the danger in making assumptions with lyrical quotes from Lola by The Kinks “I’m glad I’m a man, and so is Lola” and with a picture of a toilet flush that needed instructions to operate the full and half flush. With this particular flush, you pulled the handle all the way down to half flush, and half way down to full flush! hmmm, a bit of a crappy user experience methinks! Then through a clever use of a passage from the Jabberwocky, David then went onto show how mis-translation/ambiguity is the can completely distort the original meaning of something, and this is a real enemy of software development. This was all helping to demonstrate that the term Story is often heavily overloaded in the Agile world, and should really be stripped back to what it is really for, stating a business problem, and offering a technical solution. Therefore a story could be worded as “In order to {make some improvement}, we will { do something}”. The first ‘in order to’ statement is stakeholder neutral, and states the problem through requesting an improvement to the software/process etc. The second part of the story is the verb, the doing bit. So to achieve the ‘improvement’ which is not currently true, we will do something to make this true in the future. My PM is very interested in this, and he’s observed some of the problems of overloading stories so I’m hoping between us we can use some of David’s suggestions to help clarify our stories better. The second keynote of the day (and our last) proved to be the most entertaining and exhausting of the conference for me. “The ongoing evolution of testing in agile development” by Scott Barber. I’ve never had the pleasure of seeing Scott before… OMG I would love to have even half of the energy he has! What struck me during this presentation was Scott’s explanation of how testing has become the role/job that it is (largely) today, and how this has led to the need for ‘methodologies’ to make dev and test work! The argument that we should be trying to converge the roles again is a very valid one, and one that a couple of the teams at work are actively doing with great results. Making developers as responsible for quality as testers is something that has been lost over the years, but something that we are now striving to achieve. The idea that we (testers) should be testing experts/specialists, not testing ‘union members’, supports this idea so the entire team works on all aspects of a feature/product, with the ‘specialists’ taking the lead and advising/coaching the others. This leads to better propagation of information around the team, a greater holistic understanding of the project and it allows the team to continue functioning if some of it’s members are off sick, for example. Feeling somewhat drained from Scott’s keynote (but at the same time excited that alot of the points he raised supported actions we are taking at work), I headed into my last presentation for Agile Testing Days 2012 before having to make my way to Tegel to catch the flight home. “Thinking and working agile in an unbending world” with Pete Walen was a talk I was not going to miss! Having spoken to Pete several times during the past few days, I was looking forward to hearing what he was going to say, and I was not disappointed. Pete started off by trying to separate the definitions of ‘Agile’ as in the methodology, and ‘agile’ as in the adjective by pronouncing them the ‘english’ and ‘american’ ways. So Agile pronounced (Ajyle) and agile pronounced (ajul). There was much confusion around what the hell he was talking about, although I thought it was quite clear. Agile – Software development methodology agile – Marked by ready ability to move with quick easy grace; Having a quick resourceful and adaptable character. Anyway, that aside (although it provided a few laughs during the presentation), the point was that many teams that claim to be ‘Agile’ but are not, in fact, ‘agile’ by nature. Implementing ‘Agile’ methodologies that are so prescriptive actually goes against the very nature of Agile development where a team should anticipate, adapt and explore. Pete made a valid point that very few companies intentionally put up roadblocks to impede work, so if work is being blocked/delayed, why? This is where being agile as a team pays off because the team can inspect what’s going on, explore options and adapt their processes. It is through experimentation (and that means trying and failing as well as trying and succeeding) that a team will improve and grow leading to focussing on what really needs to be done to achieve X. So, that was it, the last talk of our conference. I was gutted that we had to miss the closing keynote from Matt Heusser, as Matt was another person I had spoken too a few times during the conference, but the flight would not wait, and just as well we left when we did because the traffic was a nightmare! My Takeaway Triple from Day 3: Release often and release small – don’t leave stuff on the shelf Keep the meaning of the word ‘agile’ in mind when working in ‘Agile Look at testing as more of a skill than a role

Read the article
Thou shalt not put code on a piedestal - Code is a tool, no more, no less

- by Ralf Westphal

“Write great code and everything else becomes easier” is what Paul Pagel believes in. That´s his version of an adage by Brian Marick he cites: “treat code as an end, not just a means.” And he concludes: “My post-Agile world is software craftsmanship.” I wonder, if that´s really the way to go. Will “simply” writing great code lead the software industry into the light? He´s alluding to the philosopher Kant who proposed, a human beings should never be treated as a means, but always as an end. But should we transfer this ethical statement into the world of software? I doubt it. Reason #1: Human beings are categorially different from code. They are autonomous entities who need to find a way of living happily together. To Kant it seemed this goal could only be reached if nobody (ab)used a human being for his/her purposes. Because using a human being, i.e. treating it as a means, would contradict the fundamental autonomy and freedom of human beings. People should hold up a symmetric view of their relationships: Since nobody wants to be (ab)used, nobody should (ab)use anybody else. If you want to be treated decently, with respect, in accordance with your own free will - which means as an end - then do the same to other people. Code is dead, it´s a product, it´s a tool for people to reach their goals. No company spends any money on code other than to save money or earn money in the long run. Code is not a puppy. Enterprises do not commission software development to just feel good in its company. Code is not a buddy. Code is a slave, if you will. A mechanical slave, a non-tangible robot. Code is a tool, is a tool. And if we start to treat it differently, if we elevate its status unduely… I guess that will contort our relationship in a contraproductive way. Please get me right: Just because something is “just a tool”, “just a product” does not mean we should not be careful while designing, building, using it. Right to the contrary. We should be very careful when writing code – but not for the code´s sake! We should be careful because we respect our customers who are fellow human beings who should be treated as an end. If we are careless, neglectful, ignorant when producing code on their behalf, then we´re using them. Being sloppy means you´re caring more for yourself that for your customer. You´re then treating the customer as a means to fulfill some of your own needs. That´s plain unethical behavior. Reason #2: The focus should always be on your purpose, not on any tool. But if code is treated as an end, then the focus is on the code. That might sound right, because where else should be your focus as a software developer? But, well, I´d say, your focus should be on delivering value to your customer. Because in the end your customer does not care if you write a single line of code. She just wants her problem to be solved. Solving problems is the purpose of any contractor. Code must be treated just as a means, a tool we know how to handle very well. But if we´re really trying to be craftsmen then we should be conscious about exactly that and act ethically. That means we must never be so focused on our tool as to be unable to suggest better solutions to the problems of our customers than code. I´m all with Paul when he urges us to “Write great code”. Sure, if you need to write code, then by all means do so. Write the best code you can think of – and then try to improve it. Paul has all the best intentions when he signs Brians “treat code as an end” - but as we all know: “The road to hell is paved with best intentions” ;-) Yes, I can imagine a “hell of code focus”. In fact, I don´t need to imagine it, I´m seeing it quite often. Because code hell is whereever two developers stand together and are so immersed in talking about all sorts of coding tricks, design patterns, code smells, technologies, platforms, tools that they lose sight of the big picture. Talking about TDD or SOLID or refactoring is a sign of consciousness – relative to the “cowboy coders” view of the world. But from yet another point of view TDD, SOLID, and refactoring are just cures for ailments within a system. And I fear, if “Writing great code” is the only focus or the main focus of software development, then we as an industry lose the ability to see that. Focus draws a line around something, it defines a horizon for perceptions and thinking. So if we focus on code our horizon ends where “the land of code” ends. I don´t think that should be our professional attitude. So what about Software Craftsmanship as the next big thing after Agility? I think Software Craftsmanship has an important message for all software developers and beyond. But to make it the successor of the Agility movement seems to miss a point. Agility never claimed to solve all software development problems, I´d say. So to blame it for having missed out on certain aspects of it is wrong. If I had to summarize Agility in one word I´d say “Value”. Agility put value for the customer back in software development. Focus on delivering value early and often – that´s Agility´s mantra. All else follows from that. And I ask you: Is that obsolete? Is delivering value not hip anymore? No, sure not. That´s our very purpose as software developers. So how can Agility become obsolete and need to be replaced? We need to do away with this “either/or”-thinking. It´s either Agility or Lean or Software Craftsmanship or whatnot. Instead we should start integrating concepts and movements. Think “both/and”. Think Agility plus Software Craftsmanship plus Lean plus whatnot. We don´t neet to tear down anything from a piedestal and replace it with a new idol. Instead we should do away with piedestals and arrange whatever is helpful is a circle. Then we can turn to concepts, movements for whatever they are best. After 10 years of Agility we should be able to identify what it was good at – and keep that. Keep Agility around and add whatever Agility was lacking or never concerned with. Add whatever is at the core of Software Craftsmanship. Add whatever is at the core of Lean etc. But don´t call out the age of Post-Agility. Because it better never will end. Because once we start to lose Agility´s core we´re losing focus of the customer.

Read the article
SQL Sentry First Impressions

- by AjarnMark

After struggling to defend my SQL Servers from a political attack recently, I realized that I needed better tools to back me up, and SQL Sentry is the leading candidate. A couple of weeks ago, seemingly from out of nowhere, complaints from the business users started coming in that one of the core internal applications was running dramatically slower than normal, and fingers were being pointed at the SQL Server. Unfortunately, we don’t have a production DBA whose entire job is to monitor and maintain our SQL Servers. The responsibility falls to me to do the best I can, investing only a small portion of my time, because there are so many other responsibilities to take care of, and our industry is still deep in recession. I inherited these SQL Servers and have made significant improvements in process and procedure, but I had not yet made the time to take real baseline measurements or keep a really close eye on the performance. Like many DBAs, I wrote several of my own tools and used the “built-in tools” like Profiler, PerfMon, and sp_who2 (did I mention most of our instances are SQL Server 2000?). These have all served me well for in-the-moment troubleshooting and maintenance, but they really fell down on the job when I was called upon to “prove” that SQL Server performance was acceptable and more importantly had not degraded recently (i.e. historical comparisons). I really didn’t have anything from a historical comparison perspective, but I was able to show that current performance was acceptable, and deflect attention back onto other components (which in fact turned out to be the real culprit). That experience dramatically illustrated the need for better monitoring tools. Coincidentally, I had been talking recently to my boss about the mini nightmare of monitoring several critical and interdependent overnight jobs that operate on separate instances of SQL Server. Among other tools, I had been using Idera’s SQL Job Manager which is a free tool and did a nice job of showing me job schedules and histories in a nice calendar view. This worked fairly well, and for the money (did I mention it was free?) it couldn’t be beat. But it is based on the stored job history in MSDB, and there were other performance problems that we ran into when we started changing the settings for how much job history to retain, in order to be able to look back a month or more in the calendar view. Another coincidence (if you believe in such things) was that when we had some of those performance challenges, I posted a couple of questions to the #sqlhelp hashtag on Twitter and Greg Gonzalez (@SQLSensei) suggested I check out SQL Sentry’s Event Manager. At the time, I just thought he worked there, but later found out that he founded the company. When I took a quick look at the features & benefits, the one that really jumped out at me is Chaining and Queueing which sounded like it would really help with our “interdependent jobs on different servers” issue. I know that is a lot of background story and coincidences, but hopefully you have stuck with me so far, and now we have arrived at the point where last week I downloaded and installed the 30-day trial of the SQL Sentry Power Suite, which is Event Manager plus Performance Advisor. And I must say that I really like what I see so far. Here are a few highlights: Great Support. I had two issues getting the trial setup and monitoring a handful of our servers. One of which was entirely my fault (missed a security setting in SQL 2008) and the other was mostly my fault (late change to some config settings that were apparently cached and did not get refreshed properly). In both cases, the support staff at SQL Sentry were very responsive and rather quickly figured out what the cause and fix was for each of them. This left me with a great impression of the company. Kudos to them! Chaining and Queueing. While I have not yet activated this feature, I am very excited about the possibilities. We have jobs on three different instances of SQL Server that have to be run in a certain order, and each has to finish before the next can successfully begin, and I believe this feature will ensure just that. It has been a real pain in the backside when one of those jobs runs just a little too long and does not finish before the job on another instance starts, thus triggering a chain reaction of either outright job failures, or worse, successful completion of completely invalid processing. Calendar View. I really, really like the Event Manager calendar view where I can see all jobs and events across all instances and identify potential resource contention as well as windows of opportunity for maintenance activity. Very well done, and based on Event Manager’s own database of accumulated historical information rather than querying the source instances every time. Performance Advisor Dashboard History View. This view let’s me quickly select a date and time range and it displays graphs of key SQL Server and Windows metrics. This is exactly the thing I needed to answer the “has performance changed recently” question at the beginning of this post. Reporting Services Subscription Jobs with Report Name. This was a big and VERY pleasant surprise. If you have ever looked at the list of SQL Server jobs that SQL Server Reporting Services creates when you make a Subscription, you will notice that they all have some sort of GUID as the name of the job. This is really ugly, and really annoying because when you are just looking at the SQL Agent and Job Activity Monitor, if you see that Job X failed, you really do not have any indication in the name or the properties of the Job itself, as to what Report that was for. But with SQL Sentry Event Manager you do. The Jobs list in the Navigator pane in SQL Sentry, amazingly, displays the name of the Report that the Subscription Job is for. And when you open it to see more details, it shows you the full Reporting Services path to that Report, so you can immediately track it down in the Report Manager in case you want to identify/notify the owner or edit the Subscription information. I did not expect this at all, but I sure do like it. HOORAY! That is just my first impressions from using the tools for a few days. And I haven’t even gotten into how it showed me where I was completely mistaken about one aspect of my SQL Server disk configurations. I’ll share that lesson in another blog entry. But I have to say it again, the combination of Event Manager and Performance Advisor working together have really made me a fan.

Read the article
3D Ball Physics Theory: collision response on ground and against walls?

- by David

I'm really struggling to get a strong grasp on how I should be handling collision response in a game engine I'm building around a 3D ball physics concept. Think Monkey Ball as an example of the type of gameplay. I am currently using sphere-to-sphere broad phase, then AABB to OBB testing (the final test I am using right now is one that checks if one of the 8 OBB points crosses the planes of the object it is testing against). This seems to work pretty well, and I am getting back: Plane that object is colliding against (with a point on the plane, the plane's normal, and the exact point of intersection. I've tried what feels like dozens of different high-level strategies for handling these collisions, without any real success. I think my biggest problem is understanding how to handle collisions against walls in the x-y axes (left/right, front/back), which I want to have elasticity, and the ground (z-axis) where I want an elastic reaction if the ball drops down, but then for it to eventually normalize and be kept "on the ground" (not go into the ground, but also not continue bouncing). Without kluging something together, I'm positive there is a good way to handle this, my theories just aren't getting me all the way there. For physics modeling and movement, I am trying to use a Euler based setup with each object maintaining a position (and destination position prior to collision detection), a velocity (which is added onto the position to determine the destination position), and an acceleration (which I use to store any player input being put on the ball, as well as gravity in the z coord). Starting from when I detect a collision, what is a good way to approach the response to get the expected behavior in all cases? Thanks in advance to anyone taking the time to assist... I am grateful for any pointers, and happy to post any additional info or code if it is useful. UPDATE Based on Steve H's and eBusiness' responses below, I have adapted my collision response to what makes a lot more sense now. It was close to right before, but I didn't have all the right pieces together at the right time! I have one problem left to solve, and that is what is causing the floor collision to hit every frame. Here's the collision response code I have now for the ball, then I'll describe the last bit I'm still struggling to understand. // if we are moving in the direction of the plane (against the normal)... if (m_velocity.dot(intersection.plane.normal) <= 0.0f) { float dampeningForce = 1.8f; // eventually create this value based on mass and acceleration // Calculate the projection velocity PVRTVec3 actingVelocity = m_velocity.project(intersection.plane.normal); m_velocity -= actingVelocity * dampeningForce; } // Clamp z-velocity to zero if we are within a certain threshold // -- NOTE: this was an experimental idea I had to solve the "jitter" bug I'll describe below float diff = 0.2f - abs(m_velocity.z); if (diff > 0.0f && diff <= 0.2f) { m_velocity.z = 0.0f; } // Take this object to its new destination position based on... // -- our pre-collision position + vector to the collision point + our new velocity after collision * time // -- remaining after the collision to finish the movement m_destPosition = m_position + intersection.diff + (m_velocity * intersection.tRemaining * GAMESTATE->dt); The above snippet is run after a collision is detected on the ball (collider) with a collidee (floor in this case). With a dampening force of 1.8f, the ball's reflected "upward" velocity will eventually be overcome by gravity, so the ball will essentially be stuck on the floor. THIS is the problem I have now... the collision code is running every frame (since the ball's z-velocity is constantly pushing it a collision with the floor below it). The ball is not technically stuck, I can move it around still, but the movement is really goofy because the velocity and position keep getting affected adversely by the above snippet. I was experimenting with an idea to clamp the z-velocity to zero if it was "close to zero", but this didn't do what I think... probably because the very next frame the ball gets a new gravity acceleration applied to its velocity regardless (which I think is good, right?). Collisions with walls are as they used to be and work very well. It's just this last bit of "stickiness" to deal with. The camera is constantly jittering up and down by extremely small fractions too when the ball is "at rest". I'll keep playing with it... I like puzzles like this, especially when I think I'm close. Any final ideas on what I could be doing wrong here? UPDATE 2 Good news - I discovered I should be subtracting the intersection.diff from the m_position (position prior to collision). The intersection.diff is my calculation of the difference in the vector of position to destPosition from the intersection point to the position. In this case, adding it was causing my ball to always go "up" just a little bit, causing the jitter. By subtracting it, and moving that clamper for the velocity.z when close to zero to being above the dot product (and changing the test from <= 0 to < 0), I now have the following: // Clamp z-velocity to zero if we are within a certain threshold float diff = 0.2f - abs(m_velocity.z); if (diff > 0.0f && diff <= 0.2f) { m_velocity.z = 0.0f; } // if we are moving in the direction of the plane (against the normal)... float dotprod = m_velocity.dot(intersection.plane.normal); if (dotprod < 0.0f) { float dampeningForce = 1.8f; // eventually create this value based on mass and acceleration? // Calculate the projection velocity PVRTVec3 actingVelocity = m_velocity.project(intersection.plane.normal); m_velocity -= actingVelocity * dampeningForce; } // Take this object to its new destination position based on... // -- our pre-collision position + vector to the collision point + our new velocity after collision * time // -- remaining after the collision to finish the movement m_destPosition = m_position - intersection.diff + (m_velocity * intersection.tRemaining * GAMESTATE->dt); UpdateWorldMatrix(m_destWorldMatrix, m_destOBB, m_destPosition, false); This is MUCH better. No jitter, and the ball now "rests" at the floor, while still bouncing off the floor and walls. The ONLY thing left is that the ball is now virtually "stuck". He can move but at a much slower rate, likely because the else of my dot product test is only letting the ball move at a rate multiplied against the tRemaining... I think this is a better solution than I had previously, but still somehow not the right idea. BTW, I'm trying to journal my progress through this problem for anyone else with a similar situation - hopefully it will serve as some help, as many similar posts have for me over the years.

Read the article
Moving from Tortoise to TFS

- by MarkPearl

The Past A few years ago my small software company made the jump from storing code on a shared folder to source code control. At the time we had evaluated a few of the options and settled on Tortoise SVN. The main motivation for going the SVN route was that we found a great plugin for Visual Studio that allowed us to avoid the command prompt for uploading changes (like I said we are windows programmers… command prompt bad!! ) and it was free. Up to now we have been pretty happy with SVN as it removed many of the worries that I had about how safe my code was on a shared folder and also gave us the opportunity to safely have several developers work on the same project at the same time. The only times when we have been unhappy has been when we have had SVN hell days – which pretty much occur when you are doing something out of the norm and suddenly SVN just won’t resolve conflicts or something along those lines. This happens once every 4 or 5 months and is not necessarily a problem caused directly by SVN – but a problem augmented by SVN. When you have SVN hell days you want to curse SVN! With that in mind I recently have been relooking at our source code control. I have explored using GIT and was very impressed by it and have also looked at TFS. From a source code control perspective I don’t want to get into a heated discussion on which one is better – but I do want to mention that I wear two hats in my organization – software developer & manager, and with the manager hat on I tend to sway the TFS route. So when I was given a coupon to test DiscountASP.Net Team Foundation Server Service for a year, I thought it was the perfect opportunity to try TFS in a distributed environment and also make the first step towards having an integrated development management system. Some of the things that appeal to me about DiscountASP’s offering are the following… Basic management / planning facilities like to do lists inside Visual Studio Daily backup of data on the server – we are developers, not IT managers and so the more of this I could outsource the better Distributed solution – all of us work remotely and so this was a big one as well. Registering and Setting Up with DiscountASP.NET The whole registration process was simple and intuitive. The web interface is not the most visually impressive one, but it is functional and a few seconds after I clicked the last submit button a email was sitting in my inbox giving me my control panel username and suggesting that I read the “Getting Started” article. The getting started article was easy to read and understand so no complaints there either. Next to set my dev environment to work. With a few references to the getting started article I had completed the whole setup process in a matter of minutes. Ten minutes after initiating the whole thing I was logged into VS2010 and creating my first TFS project. With the service that I signed up for, I have access for 5 users – which is sufficient for my internal needs. So from what I can tell, to set the rest of us up on the system I just need to supply them with their user credentials and url. My Concerns Resolved 1) Security So, a few concerns I had about the service. First and foremost – is it secure? I would hate for someone to get access to our code and the whole idea of putting it up on the internet is a concern for me. Turning to the Knowledge Base on the DiscountASP website this is one of the first question I can see answered. According to them it is secure. I have extracted their comment below regarding this. Our TFS hosting service is secure. We only accept HTTPS connections ensuring that any client-server data transmission is encrypted. At the network level, all of our systems are protected by multiple Juniper firewalls, Tipping Point's Intrusion Detection System (see Tipping Point's case study of our use here), and we also employ DDoS mitigation to add extra layers of security. Additionally, physical access to the servers is tightly restricted. Please see the security section of this Knowledge Base article for further details. 2) Web Portal Access The other big concern I have is regarding web portal access. In the ideal world I would like to be able to give my end users access to a web portal for reporting bugs etc. When I initially read through the FAQ of the site it mentioned that there was web portal access – but from what I can see this is just for “users”. Since I am limited to 5 users for the account, it would not be practical to set up external users that we could get feedback from on bugs etc. I would be interested if this is possible – and if so if someone could post it in the comments it would be much appreciated. If this isn’t possible, it is a slight let down as we rely heavily on end user feedback to get feedback and it would have been ideal to have gotten this within the service. Other than those two items, I didn’t have any real concerns that were unresolved. So where do I go from here? So time passed by from the initial writing of this post and as work whirred in and out of my inbox I have still not had a proper opportunity to give the service a test run. Recently though things have began to slow down and then surprise surprise I had another SVN Hell day. With that experience I had a new found resolve to get our team on TFS and so today we are going to start to use the service as a team. I am hoping that I do not have TFS hell days – but if I do, I will be sure to write about them. In short - the verdict is still out on whether this service is going to be invaluable to my business or whether it will create more headaches than it is worth BUT I am hopping it will be an invaluable service. I will only really be able to determine that in a few months… till then!

Read the article
Refactoring Part 1 : Intuitive Investments

- by Wes McClure

Fear, it’s what turns maintaining applications into a nightmare. Technology moves on, teams move on, someone is left to operate the application, what was green is now perceived brown. Eventually the business will evolve and changes will need to be made. The approach to those changes often dictates the long term viability of the application. Fear of change, lack of passion and a lack of interest in understanding the domain often leads to a paranoia to do anything that doesn’t involve duct tape and bailing twine. Don’t get me wrong, those have a place in the short term viability of a project but they don’t have a place in the long term. Add to it “us versus them” in regards to the original team and those that maintain it, internal politics and other factors and you have a recipe for disaster. This results in code that quickly becomes unmanageable. Even the most clever of designs will eventually become sub optimal and debt will amount that exponentially makes changes difficult. This is where refactoring comes in, and it’s something I’m very passionate about. Refactoring is about improving the process whereby we make change, it’s an exponential investment in the process of change. Without it we will incur exponential complexity that halts productivity. Investments, especially in the long term, require intuition and reflection. How can we tackle new development effectively via evolving the original design and paying off debt that has been incurred? The longer we wait to ask and answer this question, the more it will cost us. Small requests don’t warrant big changes, but realizing when changes now will pay off in the long term, and especially in the short term, is valuable. I have done my fair share of maintaining applications and continuously refactoring as needed, but recently I’ve begun work on a project that hasn’t had much debt, if any, paid down in years. This is the first in a series of blog posts to try to capture the process which is largely driven by intuition of smaller refactorings from other projects. Signs that refactoring could help: Testability How can decreasing test time not pay dividends? One of the first things I found was that a very important piece often takes 30+ minutes to test. I can only imagine how much time this has cost historically, but more importantly the time it might cost in the coming weeks: I estimate at least 10-20 hours per person! This is simply unacceptable for almost any situation. As it turns out, about 6 hours of working with this part of the application and I was able to cut the time down to under 30 seconds! In less than the lost time of one week, I was able to fix the problem for all future weeks! If we can’t test fast then we can’t change fast, nor with confidence. Code is used by end users and it’s also used by developers, consider your own needs in terms of the code base. Adding logic to enable/disable features during testing can help decouple parts of an application and lead to massive improvements. What exactly is so wrong about test code in real code? Often, these become features for operators and sometimes end users. If you cannot run an integration test within a test runner in your IDE, it’s time to refactor. Readability Are variables named meaningfully via a ubiquitous language? Is the code segmented functionally or behaviorally so as to minimize the complexity of any one area? Are aspects properly segmented to avoid confusion (security, logging, transactions, translations, dependency management etc) Is the code declarative (what) or imperative (how)? What matters, not how. LINQ is a great abstraction of the what, not how, of collection manipulation. The Reactive framework is a great example of the what, not how, of managing streams of data. Are constants abstracted and named, or are they just inline? Do people constantly bitch about the code/design? If the code is hard to understand, it will be hard to change with confidence. It’s a large undertaking if the original designers didn’t pay much attention to readability and as such will never be done to “completion.” Make sure not to go over board, instead use this as you change an application, not in lieu of changes (like with testability). Complexity Simplicity will never be achieved, it’s highly subjective. That said, a lot of code can be significantly simplified, tidy it up as you go. Refactoring will often converge upon a simplification step after enough time, keep an eye out for this. Understandability In the process of changing code, one often gains a better understanding of it. Refactoring code is a good way to learn how it works. However, it’s usually best in combination with other reasons, in effect killing two birds with one stone. Often this is done when readability is poor, in which case understandability is usually poor as well. In the large undertaking we are making with this legacy application, we will be replacing it. Therefore, understanding all of its features is important and this refactoring technique will come in very handy. Unused code How can deleting things not help? This is a freebie in refactoring, it’s very easy to detect with modern tools, especially in statically typed languages. We have VCS for a reason, if in doubt, delete it out (ok that was cheesy)! If you don’t know where to start when refactoring, this is an excellent starting point! Duplication Do not pray and sacrifice to the anti-duplication gods, there are excellent examples where consolidated code is a horrible idea, usually with divergent domains. That said, mediocre developers live by copy/paste. Other times features converge and aren’t combined. Tools for finding similar code are great in the example of copy/paste problems. Knowledge of the domain helps identify convergent concepts that often lead to convergent solutions and will give intuition for where to look for conceptual repetition. 80/20 and the Boy Scouts It’s often said that 80% of the time 20% of the application is used most. These tend to be the parts that are changed. There are also parts of the code where 80% of the time is spent changing 20% (probably for all the refactoring smells above). I focus on these areas any time I make a change and follow the philosophy of the Boy Scout in cleaning up more than I messed up. If I spend 2 hours changing an application, in the 20%, I’ll always spend at least 15 minutes cleaning it or nearby areas. This gives a huge productivity edge on developers that don’t. Ironically after a short period of time the 20% shrinks enough that we don’t have to spend 80% of our time there and can move on to other areas. Refactoring is highly subjective, never attempt to refactor to completion! Learn to be comfortable with leaving one part of the application in a better state than others. It’s an evolution, not a revolution. These are some simple areas to look into when making changes and can help get one started in the process. I’ve often found that refactoring is a convergent process towards simplicity that sometimes spans a few hours but often can lead to massive simplifications over the timespan of weeks and months of regular development.

Read the article

Search Results

Search found 23323 results on 933 pages for 'worst is better'.

Page 371/933 | < Previous Page | 367 368 369 370 371 372 373 374 375 376 377 378 | Next Page >

- by Robert May

- by rchrd

- by nmarun

- by James Michael Hare

- by Pinal Dave

- by SQLOS Team

- by Dennis Vroegop

- by danx

- by [email protected]

- by Pinal Dave

- by Rick Strahl

- by Pinal Dave

- by terje

- by Dejan Sarka

- by Ajarn Mark Caldwell

- by Karoly Vegh

- by BlackRabbitCoder

- by dapostolov

- by Karoly Vegh

- by Chris George

- by Ralf Westphal

- by AjarnMark

- by David

- by MarkPearl

- by Wes McClure

< Previous Page | 367 368 369 370 371 372 373 374 375 376 377 378 | Next Page >