Search Results

Search found 454 results on 19 pages for 'optimizations'.

Page 14/19 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 | Next Page >

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
SQLAuthority News – Community Service and Public Speaking Engagements

- by pinaldave

Today is the last day of the year and I was going over my memories for year 2010. Almost all of them are good and I feel for sure better person in terms of knowledge, nature and overall human being. Looking back at the year, it is very satisfying as I was able to go out in public and help community out at various capacity. Thought, most of the time my contribution was as speaker, many times, I have reached out and helped organized event and worked at any capacity to get the event out. I have taken parts in many TechEds, PASS events, Virtual Tech Days, Various Community Events around the Globe and my contribution is not limited to my country only. Overall – I feel good to be part of this wonderful and supportive community. SQLAuthority News – A Successful Community TechDays at Ahmedabad – December 11, 2010 SQLAuthority News – A Successful Performance Tuning Seminar at Pune – Dec 4-5, 2010 SQL SERVER – A Successful Performance Tuning Seminar – Hyderabad – Nov 27-28, 2010 – Next Pune SQLAuthority News – SQLPASS Nov 8-11, 2010-Seattle – An Alternative Look at Experience SQLAuthority News – Statistics and Best Practices – Virtual Tech Days – Nov 22, 2010 SQLAuthority News – SQL Server Performance Optimizations Seminar – Grand Success – Colombo, Sri Lanka – Oct 4 – 5, 2010 SQL SERVER – Visiting Alma Mater – Delivering Session on Database Performance and Career – Nirma Institute of Technology SQLAuthority News – Feedback Received for Virtual Tech Days Sessions on Spatial Database SQLAuthority News – Community Tech Days, Ahmedabad – July 24, 2010 SQLAuthority News – SQL Data Camp, Chennai, July 17, 2010 – A Huge Success SQLAuthority News – 2 Sessions at TechInsight 2010 – June 29 – July 1, 2010 SQLAuthority News – Author Visit – SQL Server 2008 R2 Launch SQLAuthority News – Professional Development and Community SQLAuthority News – TechEd India – April 12-14, 2010 Bangalore – An Unforgettable Experience – An Opportunity of A Lifetime SQLAuthority News – Speaking Sessions at TechEd India – 3 Sessions – 1 Panel Discussion SQLAuthority News – Meeting with Allen Bailochan Tuladhar – An Unlimited Experience SQLAuthority News – Author Visit Review – TechMela Nepal – March 29-30, 2010 SQLAuthority News – Excellent Event – TechEd Sri Lanka – Feb 8, 2010 SQLAuthority News – Hyderabad Techies February Fever Feb 11, 2010 – Indexing for Performance SQLAuthority News – MUGH – Microsoft User Group Hyderabad – Feb 2, 2010 Session Review SQLAuthority News – Ahmedabad Community Tech Days – Jan 30, 2010 – Huge Success For earlier year’s contribution you can check my webpage over here. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: About Me, Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Book Review, SQLAuthority News, T SQL, Technology

Read the article
Fastest pathfinding for static node matrix

- by Sean Martin

I'm programming a route finding routine in VB.NET for an online game I play, and I'm searching for the fastest route finding algorithm for my map type. The game takes place in space, with thousands of solar systems connected by jump gates. The game devs have provided a DB dump containing a list of every system and the systems it can jump to. The map isn't quite a node tree, since some branches can jump to other branches - more of a matrix. What I need is a fast pathfinding algorithm. I have already implemented an A* routine and a Dijkstra's, both find the best path but are too slow for my purposes - a search that considers about 5000 nodes takes over 20 seconds to compute. A similar program on a website can do the same search in less than a second. This website claims to use D*, which I have looked into. That algorithm seems more appropriate for dynamic maps rather than one that does not change - unless I misunderstand it's premise. So is there something faster I can use for a map that is not your typical tile/polygon base? GBFS? Perhaps a DFS? Or have I likely got some problem with my A* - maybe poorly chosen heuristics or movement cost? Currently my movement cost is the length of the jump (the DB dump has solar system coordinates as well), and the heuristic is a quick euclidean calculation from the node to the goal. In case anyone has some optimizations for my A*, here is the routine that consumes about 60% of my processing time, according to my profiler. The coordinateData table contains a list of every system's coordinates, and neighborNode.distance is the distance of the jump. Private Function findDistance(ByVal startSystem As Integer, ByVal endSystem As Integer) As Integer 'hCount += 1 'If hCount Mod 0 = 0 Then 'Return hCache 'End If 'Initialize variables to be filled Dim x1, x2, y1, y2, z1, z2 As Integer 'LINQ queries for solar system data Dim systemFromData = From result In jumpDataDB.coordinateDatas Where result.systemId = startSystem Select result.x, result.y, result.z Dim systemToData = From result In jumpDataDB.coordinateDatas Where result.systemId = endSystem Select result.x, result.y, result.z 'LINQ execute 'Fill variables with solar system data for from and to system For Each solarSystem In systemFromData x1 = (solarSystem.x) y1 = (solarSystem.y) z1 = (solarSystem.z) Next For Each solarSystem In systemToData x2 = (solarSystem.x) y2 = (solarSystem.y) z2 = (solarSystem.z) Next Dim x3 = Math.Abs(x1 - x2) Dim y3 = Math.Abs(y1 - y2) Dim z3 = Math.Abs(z1 - z2) 'Calculate distance and round 'Dim distance = Math.Round(Math.Sqrt(Math.Abs((x1 - x2) ^ 2) + Math.Abs((y1 - y2) ^ 2) + Math.Abs((z1 - z2) ^ 2))) Dim distance = firstConstant * Math.Min(secondConstant * (x3 + y3 + z3), Math.Max(x3, Math.Max(y3, z3))) 'Dim distance = Math.Abs(x1 - x2) + Math.Abs(z1 - z2) + Math.Abs(y1 - y2) 'hCache = distance Return distance End Function And the main loop, the other 30% 'Begin search While openList.Count() != 0 'Set current system and move node to closed currentNode = lowestF() move(currentNode.id) For Each neighborNode In neighborNodes If Not onList(neighborNode.toSystem, 0) Then If Not onList(neighborNode.toSystem, 1) Then Dim newNode As New nodeData() newNode.id = neighborNode.toSystem newNode.parent = currentNode.id newNode.g = currentNode.g + neighborNode.distance newNode.h = findDistance(newNode.id, endSystem) newNode.f = newNode.g + newNode.h newNode.security = neighborNode.security openList.Add(newNode) shortOpenList(OLindex) = newNode.id OLindex += 1 Else Dim proposedG As Integer = currentNode.g + neighborNode.distance If proposedG < gValue(neighborNode.toSystem) Then changeParent(neighborNode.toSystem, currentNode.id, proposedG) End If End If End If Next 'Check to see if done If currentNode.id = endSystem Then Exit While End If End While If clarification is needed on my spaghetti code, I'll try to explain.

Read the article
Low level programming - what's in it for me?

- by back2dos

For years I have considered digging into what I consider "low level" languages. For me this means C and assembly. However I had no time for this yet, nor has it EVER been neccessary. Now because I don't see any neccessity arising, I feel like I should either just schedule some point in time when I will study the subject or drop the plan forever. My Position For the past 4 years I have focused on "web technologies", which may change, and I am an application developer, which is unlikely to change. In application development, I think usability is the most important thing. You write applications to be "consumed" by users. The more usable those applications are, the more value you have produced. In order to achieve good usability, I believe the following things are viable Good design: Well-thought-out features accessible through a well-thought-out user interface. Correctness: The best design isn't worth anything, if not implemented correctly. Flexibility: An application A should constantly evolve, so that its users need not switch to a different application B, that has new features, that A could implement. Applications addressing the same problem should not differ in features but in philosophy. Performance: Performance contributes to a good user experience. An application is ideally always responsive and performs its tasks reasonably fast (based on their frequency). The value of performance optimization beyond the point where it is noticeable by the user is questionable. I think low level programming is not going to help me with that, except for performance. But writing a whole app in a low level language for the sake of performance is premature optimization to me. My Question What could low level programming teach me, what other languages wouldn't teach me? Am I missing something, or is it just a skill, that is of very little use for application development? Please understand, that I am not questioning the value of C and assembly. It's just that in my everyday life, I am quite happy that all the intricacies of that world are abstracted away and managed for me (mostly by layers written in C/C++ and assembly themselves). I just don't see any concepts, that could be new to me, only details I would have to stuff my head with. So what's in it for me? My Conclusion Thanks to everyone for their answers. I must say, nobody really surprised me, but at least now I am quite sure I will drop this area of interest until any need for it arises. To my understanding, writing assembly these days for processors as they are in use in today's CPUs is not only unneccesarily complicated, but risks to result in poorer runtime performance than a C counterpart. Optimizing by hand is nearly impossible due to OOE, while you do not get all kinds of optimizations a compiler can do automatically. Also, the code is either portable, because it uses a small subset of available commands, or it is optimized, but then it probably works on one architecture only. Writing C is not nearly as neccessary anymore, as it was in the past. If I were to write an application in C, I would just as much use tested and established libraries and frameworks, that would spare me implementing string copy routines, sorting algorithms and other kind of stuff serving as exercise at university. My own code would execute faster at the cost of type safety. I am neither keen on reeinventing the wheel in the course of normal app development, nor trying to debug by looking at core dumps :D I am currently experimenting with languages and interpreters, so if there is anything I would like to publish, I suppose I'd port a working concept to C, although C++ might just as well do the trick. Again, thanks to everyone for your answers and your insight.

Read the article
#OOW 2012: Big Data and The Social Revolution

- by Eric Bezille

As what was saying Cognizant CSO Malcolm Frank about the "Futur of Work", and how the Business should prepare in the face of the new generation not only of devices and "internet of things" but also due to their users ("The Millennials"), moving from "consumers" to "prosumers" : we are at a turning point today which is bringing us to the next IT Architecture Wave. So this is no more just about putting Big Data, Social Networks and Customer Experience (CxM) on top of old existing processes, it is about embracing the next curve, by identifying what processes need to be improve, but also and more importantly what processes are obsolete and need to be get ride of, and new processes put in place. It is about managing both the hierarchical and structured Enterprise and its social connections and influencers inside and outside of the Enterprise. And this does apply everywhere, up to the Utilities and Smart Grids, where it is no more just about delivering (faster) the same old 300 reports that have grown over time with those new technologies but to understand what need to be looked at, in real-time, down to an hand full relevant reports with the KPI relevant to the business. It is about how IT can anticipate the next wave, and is able to answers Business questions, and give those capabilities in real-time right at the hand of the decision makers... This is the turning curve, where IT is really moving from the past decade "Cost Center" to "Value for the Business", as Corporate Stakeholders will be able to touch the value directly at the tip of their fingers. It is all about making Data Driven Strategic decisions, encompassed and enriched by ALL the Data, and connected to customers/prosumers influencers. This brings to stakeholders the ability to make informed decisions on question like : “What would be the best Olympic Gold winner to represent my automotive brand ?”... in a few clicks and in real-time, based on social media analysis (twitter, Facebook, Google+...) and connections link to my Enterprise data. A true example demonstrated by Larry Ellison in real-time during his yesterday’s key notes, where “Hardware and Software Engineered to Work Together” is not only about extreme performances but also solutions that Business can touch thanks to well integrated Customer eXperience Management and Social Networking : bringing the capabilities to IT to move to the IT Architecture Next wave. An example, illustrated also todays in 2 others sessions, that I had the opportunity to attend. The first session bringing the “Internet of Things” in Oil&Gaz into actionable decisions thanks to Complex Event Processing capturing sensors data with the ready to run IT infrastructure leveraging Exalogic for the CEP side, Exadata for the enrich datasets and Exalytics to provide the informed decision interface up to end-user. The second session showing Real Time Decision engine in action for ACCOR hotels, with Eric Wyttynck, VP eCommerce, and his Technical Director Pascal Massenet. I have to close my post here, as I have to go to run our practical hands-on lab, cooked with Olivier Canonge, Christophe Pauliat and Simon Coter, illustrating in practice the Oracle Infrastructure Private Cloud recently announced last Sunday by Larry, and developed through many examples this morning by John Folwer. John also announced today Solaris 11.1 with a range of network innovation and virtualization at the OS level, as well as many optimizations for applications, like for Oracle RAC, with the introduction of the lock manager inside Solaris Kernel. Last but not least, he introduced Xsigo Datacenter Fabric for highly simplified networks and storage virtualization for your Cloud Infrastructure. Hoping you will get ready to jump on the next wave, we are here to help...

Read the article
Building ATLAS (and later Octave w/ ATLAS)

- by David Parks

I'm trying to set up ATLAS (so I can later compile octave with ATLAS support). If I'm correct, I still need to build this manually due to the environment specific optimizations. I do see a package for ATLAS, but it looks like it's using the cross platform generic build options (e.g. "it'll be slow"). So, running the configure script as described in the docs seems to go poorly. As a java developer I never do well at making heads or tails of errors in these build processes. Am I missing dependencies (if so is there any documentation on what I need)? allusers@vbubuntu:~/Downloads/atlas3.10.1/build_vbubuntu$ ../configure -b 64 -D c -DPentiumCPS=3000 --with-netlib-lapack-tarfile=/home/allusers/Downloads/lapack-3.5.0.tgz make: `xconfig' is up to date. ./xconfig -d s /home/allusers/Downloads/atlas3.10.1/build_vbubuntu/../ -d b /home/allusers/Downloads/atlas3.10.1/build_vbubuntu -b 64 -D c -DPentiumCPS=3000 -Si lapackref 1 OS configured as Linux (1) Assembly configured as GAS_x8664 (2) Vector ISA Extension configured as SSE3 (6,448) ERROR: enum fam=3, chip=2, mach=0 make[3]: *** [atlas_run] Error 44 make[2]: *** [IRunArchInfo_x86] Error 2 Architecture configured as Corei1 (25) ERROR: enum fam=3, chip=2, mach=0 make[3]: *** [atlas_run] Error 44 make[2]: *** [IRunArchInfo_x86] Error 2 Clock rate configured as 2350Mhz ERROR: enum fam=3, chip=2, mach=0 make[3]: *** [atlas_run] Error 44 make[2]: *** [IRunArchInfo_x86] Error 2 Maximum number of threads configured as 4 Parallel make command configured as '$(MAKE) -j 4' ERROR: enum fam=3, chip=2, mach=0 make[3]: *** [atlas_run] Error 44 make[2]: *** [IRunArchInfo_x86] Error 2 Cannot detect CPU throttling. rm -f config1.out make atlas_run atldir=/home/allusers/Downloads/atlas3.10.1/build_vbubuntu exe=xprobe_comp redir=config1.out \ args="-v 0 -o atlconf.txt -O 1 -A 25 -Si nof77 0 -V 448 -b 64 -d b /home/allusers/Downloads/atlas3.10.1/build_vbubuntu" make[1]: Entering directory `/home/allusers/Downloads/atlas3.10.1/build_vbubuntu' cd /home/allusers/Downloads/atlas3.10.1/build_vbubuntu ; ./xprobe_comp -v 0 -o atlconf.txt -O 1 -A 25 -Si nof77 0 -V 448 -b 64 -d b /home/allusers/Downloads/atlas3.10.1/build_vbubuntu > config1.out make[2]: gfortran: Command not found make[2]: *** [IRunF77Comp] Error 127 make[2]: g77: Command not found make[2]: *** [IRunF77Comp] Error 127 make[2]: f77: Command not found make[2]: *** [IRunF77Comp] Error 127 Unable to find usable compiler for F77; abortingMake sure compilers are in your path, and specify good compilers to configure (see INSTALL.txt or 'configure --help' for details)make[1]: *** [atlas_run] Error 8 make[1]: Leaving directory `/home/allusers/Downloads/atlas3.10.1/build_vbubuntu' make: *** [IRun_comp] Error 2 ERROR 512 IN SYSCMND: 'make IRun_comp args="-v 0 -o atlconf.txt -O 1 -A 25 -Si nof77 0 -V 448 -b 64"' mkdir src bin tune interfaces mkdir: cannot create directory ‘src’: File exists mkdir: cannot create directory ‘bin’: File exists mkdir: cannot create directory ‘tune’: File exists mkdir: cannot create directory ‘interfaces’: File exists make: *** [make_subdirs] Error 1 make -f Make.top startup make[1]: Entering directory `/home/allusers/Downloads/atlas3.10.1/build_vbubuntu' Make.top:1: Make.inc: No such file or directory Make.top:325: warning: overriding commands for target `/AtlasTest' Make.top:76: warning: ignoring old commands for target `/AtlasTest' make[1]: *** No rule to make target `Make.inc'. Stop. make[1]: Leaving directory `/home/allusers/Downloads/atlas3.10.1/build_vbubuntu' make: *** [startup] Error 2 mv: cannot move ‘lapack-3.5.0’ to ‘../reference/lapack-3.5.0’: Directory not empty mv: cannot stat ‘lib/Makefile’: No such file or directory ../configure: 450: ../configure: cannot create lib/Makefile: Directory nonexistent ../configure: 451: ../configure: cannot create lib/Makefile: Directory nonexistent ../configure: 452: ../configure: cannot create lib/Makefile: Directory nonexistent ../configure: 453: ../configure: cannot create lib/Makefile: Directory nonexistent ../configure: 509: ../configure: cannot create lib/Makefile: Directory nonexistent DONE configure

Read the article
Developer Training – 6 Online Courses to Learn SQL Server, MySQL and Technology

- by Pinal Dave

Video courses are the next big thing and I am so happy that I have so far authored 6 different video courses with Pluralsight. Here is the list of the courses. I have listed all of my video courses over here. Note: If you click on the courses and it does not open, you need to login to Pluralsight with a valid username and password or sign up for a FREE trial. Please leave a comment with your favorite course in the comment section. Random 10 winners will get surprise gift via email. Bonus: If you list your favorite module from the course site. SQL Server Performance: Introduction to Query Tuning SQL Server performance tuning is an in-depth topic, and an art to master. A key component of overall application performance tuning is query tuning. Writing queries in an efficient manner, and making sure they execute in the most optimal way possible, is always a challenge. The basics revolve around the details of how SQL Server carries out query execution, so the optimizations explored in this course follow along the same lines. Click to View Course SQL Server Performance: Indexing Basics Indexes are the most crucial objects of the database. They are the first stop for any DBA and Developer when it is about performance tuning. There is a good side as well evil side of the indexes. To master the art of performance tuning one has to understand the fundamentals of the indexes and the best practices associated with the same. This course is for every DBA and Developer who deals with performance tuning and wants to use indexes to improve the performance of the server. Click to View Course SQL Server Questions and Answers This course is designed to help you better understand how to use SQL Server effectively. The course presents many of the common misconceptions about SQL Server, and then carefully debunks those misconceptions with clear explanations and short but compelling demos, showing you how SQL Server really works. This course is for anyone working with SQL Server databases who wants to improve her knowledge and understanding of this complex platform. Click to View Course MySQL Fundamentals MySQL is a popular choice of database for use in web applications, and is a central component of the widely used LAMP open source web application software stack. This course covers the fundamentals of MySQL, including how to install MySQL as well as written basic data retrieval and data modification queries. Click to View Course Building a Successful Blog Expressing yourself is the most common behavior of humans. Blogging has made easy to express yourself. Just like a letter or book has a structure and formula, blogging also has structure and formula. In this introductory course on blogging we will go over a few of the basics of blogging and show the way to get started with blogging immediately. If you already have a blog, this course will be even more relevant as this will discuss many of the common questions and issue you face in your blogging routine. Click to View Course Introduction to ColdFusion ColdFusion is rapid web application development platform. In this course you will learn the basics of how to use ColdFusion platform and rapidly develop web sites. The course begins with learning basics of ColdFusion Markup Language and moves to common development language practices. From there we move to frequent database operations and advanced concepts of Forms, Sessions and Cookies. The last module sums up all the concepts covered in the course with sample application. Click to View Course Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, T SQL, Technology

Read the article
Exalytics and Oracle Business Intelligence Enterprise Edition (OBIEE) Partner Workshop

- by mseika

Workshop Description Oracle Fusion Middleware 11g is the #1 application infrastructure foundation. It enables enterprises to create and run agile and intelligent business applications and maximize IT efficiency by exploiting modern hardware and software architectures. Oracle Exalytics Business Intelligence Machine is the world’s first engineered system specifically designed to deliver high performance analysis, modeling and planning. Built using industry-standard hardware, market-leading business intelligence software and in-memory database technology, Oracle Exalytics is an optimized system that delivers unmatched speed, visualizations and scalability for Business Intelligence and Enterprise Performance Management applications. This FREE hands-on, partner workshop highlights both the hardware and software components that are engineered to work together to deliver Oracle Exalytics - an optimized version of the industry-leading Oracle TimesTen In-Memory Database with analytic extensions, a highly scalable Oracle server designed specifically for in-memory business intelligence, and Oracle’s proven Business Intelligence Foundation with enhanced visualization capabilities and performance optimizations. This workshop will provide hands-on experience with Oracle's latest engineered system. Topics covered will include TimesTen In-Memory Database and the new Summary Advisor for Exalytics, the technical details (including mobile features) of the latest release of visualization enhancements for OBI-EE, and technical updates on Essbase. After taking this course, you will be well prepared to architect, build, demo, and implement an end-to-end Exalytics solution. You will also be able to extend your current analytical and enterprise performance management application implementations with numerous Oracle technologies specifically enhanced to take advantage of the compute capacity and in-memory capabilities of Oracle Exalytics.If you are a BI or Data Warehouse Architect, developer or consultant, you don’t want to miss this 3-day workshop. Register Now! Presentations Exalytics Architectural Overview Upgrade and Lifecycle Management Times Ten for Exalytics Summary Advisor Utility Essbase and EPM System on Exalytics Dashboard and Analysis Interactions OBIEE 11.1.1.6 Features and Advanced Topics Lab OutlineThe labs showcase Oracle Exalytics core components and functionality and provide expertise of Oracle Business Intelligence 11.1.1.6 new features and updates from prior releases. The hands-on activities are based on an Oracle VirtualBox image with software and training samples pre-installed. Lab Environment Setup Creating and Working with Oracle TimesTen In-Memory Database Running Summary Advisor Utility Working with Exalytics Visualization Features – Dashboard and Analysis Interactions Audience Oracle Partners BI and EPM Application Developers and Implementers System Integrators and Solution Consultants Data Warehouse Developers Enterprise Architects Prerequisites Experience and understanding of OBIEE 11g is required Previous attendance of Oracle Business Intelligence Foundation Suite Workshop or BIEE 11gIntroduction Workshop is highly recommended Good understanding of data warehousing and data modeling for reporting and analysis purpose Strong experience with database technologies preferred Equipment RequirementsThis workshop requires attendees to provide their own laptops for this class.Attendee laptops must meet the following minimum hardware/software requirements: Hardware Minimum 8GB RAM 60 GB free space (includes staging) USB 2.0 port (at least one available) It is strongly recommended that you bring a mouse. You will be working in a development environment and using the mouse heavily. Software One of the following operating systems: 64-bit Windows host/laptop OS 64-bit host/laptop OS with a Windows VM (XP, Server, or Win 7, BIC2g, etc.) Internet Explorer 7.x/8.x or Firefox 3.5.x WINRAR or 7ziputility to unzip workshop files: Download-able from http://www.win-rar.com/download.html Download-able from http://www.7zip.com/ Oracle VirtualBox 4.0.2 or higher Downloadable from http://www.virtualbox.org/wiki/Downloads CPU virtualization mode needs to be enabled. We will provide guidance on the day of the workshop. Attendees will be given a VirtualBox image containing a pre-installed Oracle Exalytics environment. Schedule This workshop is 3 days. - Times vary by country!9:00am: Sign-in and technical setup 9:30am: Workshop starts 5:00pm: Workshop ends Oracle Exalytics and Business Intelligence (OBIEE) Workshop December 11-13, 2012: Oracle BVP, Birmingham, UK Register Here. Questions? Send email to: [email protected] Oracle Platform Technologies Enablement Services

Read the article
Clever memory usage through the years

- by Ben Emmett

A friend and I were recently talking about the really clever tricks people have used to get the most out of memory. I thought I’d share my favorites, and would love to hear yours too! Interleaving on drum memory Back in the ye olde days before I’d been born (we’re talking the 50s / 60s here), working memory commonly took the form of rotating magnetic drums. These would spin at a constant speed, and a fixed head would read from memory when the correct part of the drum passed it by, a bit like a primitive platter disk. Because each revolution took a few milliseconds, programmers took to manually arranging information non-sequentially on the drum, timing when an instruction or memory address would need to be accessed, then spacing information accordingly around the edge of the drum, thus reducing the access delay. Similar techniques were still used on hard disks and floppy disks into the 90s, but have become irrelevant with modern disk technologies. The Hashlife algorithm Conway’s Game of Life has attracted numerous implementations over the years, but Bill Gosper’s Hashlife algorithm is particularly impressive. Taking advantage of the repetitive nature of many cellular automata, it uses a quadtree structure to store the hashes of pieces of the overall grid. Over time there are fewer and fewer new structures which need to be evaluated, so it starts to run faster with larger grids, drastically outperforming other algorithms both in terms of speed and the size of grid which can be simulated. The actual amount of memory used is huge, but it’s used in a clever way, so makes the list . Elite’s procedural generation Ok, so this isn’t exactly a memory optimization – more a storage optimization – but it gets an honorable mention anyway. When writing Elite, David Braben and Ian Bell wanted to build a rich world which gamers could explore, but their 22K memory was something of a limitation (for comparison that’s about the size of my avatar picture at the top of this page). They procedurally generated all the characteristics of the 2048 planets in their virtual universe, including the names, which were stitched together using a lookup table of parts of names. In fact the original plans were for 2^52 planets, but it was decided that that was probably too many. Oh, and they did that all in assembly language. Other games of the time used similar techniques too – The Sentinel’s landscape generation algorithm being another example. Modern Garbage Collectors Garbage collection in managed languages like Java and .NET ensures that most of the time, developers stop needing to care about how they use and clean up memory as the garbage collector handles it automatically. Achieving this without killing performance is a near-miraculous feet of software engineering. Much like when learning chemistry, you find that every time you think you understand how the garbage collector works, it turns out to be a mere simplification; that there are yet more complexities and heuristics to help it run efficiently. Of course introducing memory problems is still possible (and there are tools like our memory profiler to help if that happens to you) but they’re much, much rarer. A cautionary note In the examples above, there were good and well understood reasons for the optimizations, but cunningly optimized code has usually had to trade away readability and maintainability to achieve its gains. Trying to optimize memory usage without being pretty confident that there’s actually a problem is doing it wrong. So what have I missed? Tell me about the ingenious (or stupid) tricks you’ve seen people use. Ben

Read the article
When row estimation goes wrong

- by Dave Ballantyne

Whilst working at a client site, I hit upon one of those issues that you are not sure if that this is something entirely new or a bug or a gap in your knowledge. The client had a large query that needed optimizing. The query itself looked pretty good, no udfs, UNION ALL were used rather than UNION, most of the predicates were sargable other than one or two minor ones. There were a few extra joins that could be eradicated and having fixed up the query I then started to dive into the plan. I could see all manor of spills in the hash joins and the sort operations, these are caused when SQL Server has not reserved enough memory and has to write to tempdb. A VERY expensive operation that is generally avoidable. These, however, are a symptom of a bad row estimation somewhere else, and when that bad estimation is combined with other estimation errors, chaos can ensue. Working my way back down the plan, I found the cause, and the more I thought about it the more i came convinced that the optimizer could be making a much more intelligent choice. First step is to reproduce and I was able to simplify the query down a single join between two tables, Product and ProductStatus, from a business point of view, quite fundamental, find the status of particular products to show if ‘active’ ,’inactive’ or whatever. The query itself couldn’t be any simpler The estimated plan looked like this: Ignore the “!” warning which is a missing index, but notice that Products has 27,984 rows and the join outputs 14,000. The actual plan shows how bad that estimation of 14,000 is : So every row in Products has a corresponding row in ProductStatus. This is unsurprising, in fact it is guaranteed, there is a trusted FK relationship between the two columns. There is no way that the actual output of the join can be different from the input. The optimizer is already partly aware of the foreign key meta data, and that can be seen in the simplifiction stage. If we drop the Description column from the query: the join to ProductStatus is optimized out. It serves no purpose to the query, there is no data required from the table and the optimizer knows that the FK will guarantee that a matching row will exist so it has been removed. Surely the same should be applied to the row estimations in the initial example, right ? If you think so, please upvote this connect item. So what are our options in fixing this error ? Simply changing the join to a left join will cause the optimizer to think that we could allow the rows not to exist. or a subselect would also work However, this is a client site, Im not able to change each and every query where this join takes place but there is a more global switch that will fix this error, TraceFlag 2301. This is described as, perhaps loosely, “Enable advanced decision support optimizations”. We can test this on the original query in isolation by using the “QueryTraceOn” option and lo and behold our estimated plan now has the ‘correct’ estimation. Many thanks goes to Paul White (b|t) for his help and keeping me sane through this

Read the article
Welcome To The Nashorn Blog

- by jlaskey

Welcome to all. Time to break the ice and instantiate The Nashorn Blog. I hope to contribute routinely, but we are very busy, at this point, preparing for the next development milestone and, of course, getting ready for open source. So, if there are long gaps between postings please forgive. We're just coming back from JavaOne and are stoked by the positive response to all the Nashorn sessions. It was great for the team to have the front and centre slide from Georges Saab early in the keynote. It seems we have support coming from all directions. Most of the session videos are posted. Check out the links. Nashorn: Optimizing JavaScript and Dynamic Language Execution on the JVM. Unfortunately, Marcus - the code generation juggernaut, got saddled with the first session of the first day. Still, he had a decent turnout. The talk focused on issues relating to optimizations we did to get good performance from the JVM. Much yet to be done but looking good. Nashorn: JavaScript on the JVM. This was the main talk about Nashorn. I delivered the little bit of this and a little bit of that session with an overview, a follow up on the open source announcement, a run through a few of the Nashorn features and some demos. The room was SRO, about 250±. High points: Sam Pullara, from Twitter, came forward to describe how painless it was to get Mustache.js up and running (20x over Rhino), and, John Ceccarelli, from NetBeans came forward to describe how Nashorn has become an integral part of Netbeans. A healthy Q & A at the end was very encouraging. Meet the Nashorn JavaScript Team. Michel, Attila, Marcus and myself hosted a Q & A. There was only a handful of people in the room (we assume it was because of a conflicting session ;-) .) Most of the questions centred around Node.jar, which leads me to believe, Nashorn + Node.jar is what has the most interest. Akhil, Mr. Node.jar, sitting in the audience, fielded the Node.jar questions. Nashorn, Node, and Java Persistence. Doug Clarke, Akhil and myself, discussed the title topics, followed by a lengthy Q & A (security had to hustle us out.) 80 or so in the room. Lots of questions about Node.jar. It was great to see Doug's use of Nashorn + JPA. Nashorn in action, with such elegance and grace. Putting the Metaobject Protocol to Work: Nashorn’s Java Bindings. Attila discussed how he applied Dynalink to Nashorn. Good turn out for this session as well. I have a feeling that once people discover and embrace this hidden gem, great things will happen for all languages running on the JVM. Finally, there were quite a few JavaOne sessions that focused on non-Java languages and their impact on the JVM. I've always believed that one's tool belt should carry a variety of programming languages, not just for domain/task applicability, but also to enhance your thinking and approaches to problem solving. For the most part, future blog entries will focus on 'how to' in Nashorn, but if you have any suggestions for topics you want discussed, please drop a line. Cheers.

Read the article
F# Objects – Integration with the other .Net Languages – Part 2

- by MarkPearl

So in part one of my posting I covered the real basics of object creation. Today I will hopefully dig a little deeper… My expert F# book brings up an interesting point – properties in F# are just syntactic sugar for method calls. This makes sense… for instance assume I had the following object with the property exposed called Firstname. type Person(Firstname : string, Lastname : string) = member v.Firstname = Firstname I could extend the Firstname property with the following code and everything would be hunky dory… type Person(Firstname : string, Lastname : string) = member v.Firstname = Console.WriteLine("Side Effect") Firstname All that this would do is each time I use the property Firstname, I would see the side effect printed to the screen saying “Side Effect”. Member methods have a very similar look & feel to properties, in fact the only difference really is that you declare that parameters are being passed in. type Person(Firstname : string, Lastname : string) = member v.FullName(middleName) = Firstname + " " + middleName + " " + Lastname In the code above, FullName requires the parameter middleName, and if viewed from another project in C# would show as a method and not a property. Precomputation Optimizations Okay, so something that is obvious once you think of it but that poses an interesting side effect of mutable value holders is pre-computation of results. All it is, is a slight difference in code but can result in quite a huge saving in performance. Basically pre-computation means you would not need to compute a value every time a method is called – but could perform the computation at the creation of the object (I hope I have got it right). In a way I battle to differentiate this from lazy evaluation but I will show an example to explain the principle. Let me try and show an example to illustrate the principle… assume the following F# module namespace myNamespace open System module myMod = let Add val1 val2 = Console.WriteLine("Compute") val1 + val2 type MathPrecompute(val1 : int, val2 : int) = let precomputedsum = Add val1 val2 member v.Sum = precomputedsum type MathNormalCompute(val1 : int, val2 : int) = member v.Sum = Add val1 val2 Now assume you have a C# console app that makes use of the objects with code similar to the following… using System; using myNamespace; namespace CSharpTest { class Program { static void Main(string[] args) { Console.WriteLine("Constructing Objects"); var myObj1 = new myMod.MathNormalCompute(10, 11); var myObj2 = new myMod.MathPrecompute(10, 11); Console.WriteLine(""); Console.WriteLine("Normal Compute Sum..."); Console.WriteLine(myObj1.Sum); Console.WriteLine(myObj1.Sum); Console.WriteLine(myObj1.Sum); Console.WriteLine(""); Console.WriteLine("Pre Compute Sum..."); Console.WriteLine(myObj2.Sum); Console.WriteLine(myObj2.Sum); Console.WriteLine(myObj2.Sum); Console.ReadKey(); } } } The output when running the console application would be as follows…. You will notice with the normal compute object that the system would call the Add function every time the method was called. With the Precompute object it only called the compute method when the object was created. Subtle, but something that could lead to major performance benefits. So… this post has gone off in a slight tangent but still related to F# objects.

Read the article
PulseAudio on Cygwin: Failed to create secure directory: Unknown error 13

- by Nithin

I am unable to run PulseAudio on Cygwin. Operating System: Windows 8 Pro 64 bit Cygwin Setup.exe Version: 2.831 (64 bit) PulseAudio Version: 2.1-1 When I run: pulseaudio -vv this is the output: D: [(null)] core-util.c: setpriority() worked. I: [(null)] core-util.c: Successfully gained nice level -11. I: [(null)] main.c: This is PulseAudio 2.1 D: [(null)] main.c: Compilation host: x86_64-unknown-cygwin D: [(null)] main.c: Compilation CFLAGS: -ggdb -O2 -pipe -fdebug-prefix-map=/usr/src/ports/pulseaudio/pulseaudio-2.1-1/build=/usr/src/debug/pulseaudio-2.1-1 -fdebug-prefix-map=/usr/src/ports/pulseaudio/pulseaudio-2.1-1/src/pulseaudio-2.1=/usr/src/debug/pulseaudio-2.1-1 -Wall -W -Wextra -Wno-long-long -Wvla -Wno-overlength-strings -Wunsafe-loop-optimizations -Wundef -Wformat=2 -Wlogical-op -Wsign-compare -Wformat-security -Wmissing-include-dirs -Wformat-nonliteral -Wpointer-arith -Winit-self -Wdeclaration-after-statement -Wfloat-equal -Wmissing-prototypes -Wredundant-decls -Wmissing-declarations -Wmissing-noreturn -Wshadow -Wendif-labels -Wcast-align -Wstrict-aliasing -Wwrite-strings -Wno-unused-parameter -ffast-math -Wp,-D_FORTIFY_SOURCE=2 -fno-common -fdiagnostics-show-option D: [(null)] main.c: Running on host: CYGWIN_NT-6.2 x86_64 1.7.25(0.270/5/3) 2013-08-31 20:37 D: [(null)] main.c: Found 4 CPUs. I: [(null)] main.c: Page size is 65536 bytes D: [(null)] main.c: Compiled with Valgrind support: no D: [(null)] main.c: Running in valgrind mode: no D: [(null)] main.c: Running in VM: no D: [(null)] main.c: Optimized build: yes D: [(null)] main.c: FASTPATH defined, only fast path asserts disabled. I: [(null)] main.c: Machine ID is 5d8bd07cb924c67197184e42527f2603. E: [(null)] core-util.c: Failed to create secure directory: Unknown error 13 When I instead run pulseaudio -vv --start the output is this: E: [autospawn] core-util.c: Failed to create secure directory: Unknown error 13 W: [autospawn] lock-autospawn.c: Cannot access autospawn lock. E: [(null)] main.c: Failed to acquire autospawn lock When I ran strace pulseaudio -vv, the red-colored lines in the output were: 28 1637050 [main] pulseaudio 5104 fhandler_pty_slave::write: (669): pty output_mutex(0xBC) released 26 1637076 [main] pulseaudio 5104 write: 7 = write(2, 0x3FE171079, 7) 42 1637118 [main] pulseaudio 5104 fhandler_pty_slave::write: pty0, write(0x60003BB40, 51) 27 1637145 [main] pulseaudio 5104 fhandler_pty_slave::write: (654): pty output_mutex (0xBC): waiting -1 ms 23 1637168 [main] pulseaudio 5104 fhandler_pty_slave::write: (654): pty output_mutex: acquired Failed to create secure directory: Unknown error 13 21 1637189 [main] pulseaudio 5104 fhandler_pty_slave::write: (669): pty output_mutex(0xBC) released 29 1637218 [main] pulseaudio 5104 write: 51 = write(2, 0x60003BB40, 51) 46 1637264 [main] pulseaudio 5104 fhandler_pty_slave::write: pty0, write(0x3FE17106F, 4) 24 1637288 [main] pulseaudio 5104 fhandler_pty_slave::write: (654): pty output_mutex (0xBC): waiting -1 ms 24 1637312 [main] pulseaudio 5104 fhandler_pty_slave::write: (654): pty output_mutex: acquired Please can someone help me?

Read the article
PHP at the root directory using Ngnix on Linode and Ubuntu 12.04

- by Steve Kinney

I originally set up my Linode to use it with the Sinatra applications using Phusion Passenger that I was developing and I have it working great for that. However, as time goes on, I find myself needing just a wee bit of PHP to do a server-side thing here or there. My basic set up was based off of this Linode recipe (I copied and pasted the parts that I needed—I did not install Redis and Node). If I go to http://scholarsnyc.com/index.php everything works great. If I just go the base URL however, I get a 403 Forbidden error (I have a vanilla HTML page there for now). I've played with file permissions and the same file will work if I call it directly. I've done my homework and nothing I try seems to work. I'm sure there is an obvious error. I'm also sure that there are some rookie mistakes in my Nginx configuration (some of those mistakes are the artifacts of trying different fixes from my research. user www-data www-data; worker_processes 1; events { worker_connections 1024; } upstream php { server 127.0.0.1:9001; } http { passenger_root /usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.12; passenger_ruby /usr/local/bin/ruby; include mime.types; default_type application/octet-stream; index index.php index.html index.htm; sendfile on; keepalive_timeout 65; server { server_name localhost scholarsnyc.com www.scholarsnyc.com; root /srv/www/scholarsnyc.com/public; location / { index index.php; } location ~ \.php$ { fastcgi_pass 127.0.0.1:9000; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; include fastcgi_params; } } server { server_name data.scholarsnyc.com; root /srv/www/data.scholarsnyc.com/public; passenger_enabled on; } server { server_name tech.scholarsnyc.com; root /srv/www/tech.scholarsnyc.com/public; location / { root /srv/www/tech.scholarsnyc.com/public; index index.php index.html index.htm; } } } Any other optimizations are also appreciated. I literally don't know what to do at this point.

Read the article
How to get the best LINPACK result and conquer the Top500?

- by knweiss

Given a large Linux HPC cluster with hundreds/thousands of nodes. What are your best practices to get the best possible LINPACK benchmark (HPL) result to submit for the Top500 supercomputer list? To give you an idea what kind of answers I would appreciate here are some sub-questions (with links): How to you tune the parameters (N, NB, P, Q, memory-alignment, etc) for the HPL.dat file (without spending too much time trying each possible permutation - esp with large problem sizes N)? Are there any Top500 submission rules to be aware of? What is allowed, what isn't? Which MPI product, which version? Does it make a difference? Any special host order in your MPI machine file? Do you use CPU pinning? How to you configure your interconnect? Which interconnect? Which BLAS package do you use for which CPU model? (Intel MKL, AMD ACML, GotoBLAS2, etc.) How do you prepare for the big run (on all nodes)? Start with small runs on a subset of nodes and then scale up? Is it really necessary to run LINPACK with a big run on all of the nodes (or is extrapolation allowed)? How do you optimize for the latest Intel/AMD CPUs? Hyperthreading? NUMA? Is it worth it to recompile the software stack or do you use precompiled binaries? Which settings? Which compiler optimizations, which compiler? (What about profile-based compilation?) How to get the best result given only a limited amount of time to do the benchmark run? (You can block a huge cluster forever) How do you prepare the individual nodes (stopping system daemons, freeing memory, etc)? How do you deal with hardware faults (ruining a huge run)? Are there any must-read documents or websites about this topic? E.g. I would love to hear about some background stories of some of the current Top500 systems and how they did their LINPACK benchmark. I deliberately don't want to mention concrete hardware details or discuss hardware recommendations because I don't want to limit the answers. However, feel free to mention hints e.g. for specific CPU models.

Read the article
How to tune system settings for mongoDB on Linux?

- by jsh

Trying to squeeze a lot out of one question here -- please bear with me. Although the MongoDB man pages make several useful recommendations about system settings like ulimit (http://docs.mongodb.org/manual/reference/ulimit/), and other production factors (http://docs.mongodb.org/manual/administration/production-notes/) they seem mysteriously silent on things like virtual memory and swap settings. The closest we get to a hint is that "...the operating system’s virtual memory subsystem manages MongoDB’s memory..." (http://docs.mongodb.org/manual/faq/fundamentals/#does-mongodb-require-a-lot-of-ram). Running the same job - high writes and high reads on about 10,000,000 records in a single collection -- on my 4-processor, 4GB RAM macbook and an 8-core ubuntu box with 64GB RAM I saw dramatically WORSE read performance on the linux box with factory settings, and could hear the disk constantly spinning, indicating high I/O and presumably swapping. Yes, other things were happening on the box, but there was plenty of free RAM, disk space, etc.; furthermore, I did not see evidence that Mongo was expanding to take advantage of all that free RAM as it is touted to do. Linux box default settings were as follows: vm.swappiness =60 vm.dirty_background_ratio = 10 vm.dirty_ratio = 20 vm.dirty_expire_centisecs =3000 vm.dirty_writeback_centisecs=500 I hazarded some guesses looking at docs and blogs for other types of databases (Oracle, MYSQL, etc.), experimented, and adjusted as below. vm.swappiness=10 vm.dirty_background_ratio=5 vm.dirty_ratio=5 vm.dirty_writeback_centisecs=250 vm.dirty_expire_centisecs=500 I saw some immediate apparent improvements in read time. However, when I ran my test jobs again, read performance continued to be painfully sluggish during heavy writes. Then, I REBUILT the collection from an available data source - and suddenly I can read at 1ms or less per record WHILE doing the write job! So the question is really two-fold: 1) What are appropriate VM settings for MongoDB on Linux? 2) (bonus) Does Mongo do some checking or optimization with the OS while data is being built? In other words, if I have built a large data set with suboptimal VM or I/O settings, does Mongo make assumptions during the memory-mapping process that will fail to take advantage of optimizations down the road? Obviously I don't fully grok memory mapping under the hood (I was hoping I wouldn't have to). Any help appreciated...thanks! -j

Read the article
Apache's htcacheclean doesn't scale: How to tame a huge Apache disk_cache?

- by flight

We have an Apache setup with a huge disk_cache (500.000 entries, 50 GB disk space used). The cache grows by 16 GB every day. My problem is that the cache seems to be growing nearly as fast as it's possible to remove files and directories from the cache filesystem! The cache partition is an ext3 filesystem (100GB, "-t news") on an iSCSI storage. The Apache server (which acts as a caching proxy) is a VM. The disk_cache is configured with CacheDirLevels=2 and CacheDirLength=1, and includes variants. A typical file path is "/htcache/B/x/i_iGfmmHhxJRheg8NHcQ.header.vary/A/W/oGX3MAV3q0bWl30YmA_A.header". When I try to call htcacheclean to tame the cache (non-daemon mode, "htcacheclean-t -p/htcache -l15G"), IOwait is going through the roof for several hours. Without any visible action. Only after hours, htcacheclean starts to delete files from the cache partition, which takes a couple more hours. (A similar problem was brought up in the Apache mailing list in 2009, without a solution: http://www.mail-archive.com/[email protected]/msg42683.html) The high IOwait leads to problems with the stability of the web server (the bridge to the Tomcat backend server sometimes stalls). I came up with my own prune script, which removes files and directories from random subdirectories of the cache. Only to find that the deletion rate of the script is just slightly higher than the cache growth rate. The script takes ~10 seconds to read the a subdirectory (e.g. /htcache/B/x) and frees some 5 MB of disk space. In this 10 seconds, the cache has grown by another 2 MB. As with htcacheclean, IOwait goes up to 25% when running the prune script continuously. Any idea? Is this a problem specific to the (rather slow) iSCSI storage? Should I choose a different file system for a huge disk_cache? ext2? ext4? Are there any kernel parameter optimizations for this kind of scenario? (I already tried the deadline scheduler and a smaller read_ahead_kb, without effect).

Read the article
Dec 5th Links: ASP.NET, ASP.NET MVC, jQuery, Silverlight, Visual Studio

- by ScottGu

Here is the latest in my link-listing series. Also check out my VS 2010 and .NET 4 series for another on-going blog series I’m working on. [In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu] ASP.NET ASP.NET Code Samples Collection: J.D. Meier has a great post that provides a detailed round-up of ASP.NET code samples and tutorials from a wide variety of sources. Lots of useful pointers. Slash your ASP.NET compile/load time without any hard work: Nice article that details a bunch of optimizations you can make to speed up ASP.NET project load and compile times. You might also want to read my previous blog post on this topic here. 10 Essential Tools for Building ASP.NET Websites: Great article by Stephen Walther on 10 great (and free) tools that enable you to more easily build great ASP.NET Websites. Highly recommended reading. Optimize Images using the ASP.NET Sprite and Image Optimization Framework: A nice article by 4GuysFromRolla that discusses how to use the open-source ASP.NET Sprite and Image Optimization Framework (one of the tools recommended by Stephen in the previous article). You can use this to significantly improve the load-time of your pages on the client. Formatting Dates, Times and Numbers in ASP.NET: Scott Mitchell has a great article that discusses formatting dates, times and numbers in ASP.NET. A very useful link to bookmark. Also check out James Michael’s DateTime is Packed with Goodies blog post for other DateTime tips. Examining ASP.NET’s Membership, Roles and Profile APIs (Part 18): Everything you could possibly want to known about ASP.NET’s built-in Membership, Roles and Profile APIs must surely be in this tutorial series. Part 18 covers how to store additional user info with Membership. ASP.NET with jQuery An Introduction to jQuery Templates: Stephen Walther has written an outstanding introduction and tutorial on the new jQuery Template plugin that the ASP.NET team has contributed to the jQuery project. Composition with jQuery Templates and jQuery Templates, Composite Rendering, and Remote Loading: Dave Ward has written two nice posts that talk about composition scenarios with jQuery Templates and some cool scenarios you can enable with them. Using jQuery and ASP.NET to Build a News Ticker: Scott Mitchell has a nice tutorial that demonstrates how to build a dynamically updated “news ticker” style UI with ASP.NET and jQuery. Checking All Checkboxes in a GridView using jQuery: Scott Mitchell has a nice post that covers how to use jQuery to enable a checkbox within a GridView’s header to automatically check/uncheck all checkboxes contained within rows of it. Using jQuery to POST Form Data to an ASP.NET AJAX Web Service: Rick Strahl has a nice post that discusses how to capture form variables and post them to an ASP.NET AJAX Web Service (.asmx). ASP.NET MVC ASP.NET MVC Diagnostics Using NuGet: Phil Haack has a nice post that demonstrates how to easily install a diagnostics page (using NuGet) that can help identify and diagnose common configuration issues within your apps. ASP.NET MVC 3 JsonValueProviderFactory: James Hughes has a nice post that discusses how to take advantage of the new JsonValueProviderFactory support built into ASP.NET MVC 3. This makes it easy to post JSON payloads to MVC action methods. Practical jQuery Mobile with ASP.NET MVC: James Hughes has another nice post that discusses how to use the new jQuery Mobile library with ASP.NET MVC to build great mobile web applications. Credit Card Validator for ASP.NET MVC 3: Benjii Me has a nice post that demonstrates how to build a [CreditCard] validator attribute that can be used to easily validate credit card numbers are in the correct format with ASP.NET MVC. Silverlight Silverlight FireStarter Keynote and Sessions: A great blog post from John Papa that contains pointers and descriptions of all the great Silverlight content we published last week at the Silverlight FireStarter. You can watch all of the talks online. More details on my keynote and Silverlight 5 announcements can be found here. 31 Days of Windows Phone 7: 31 great tutorials on how to build Windows Phone 7 applications (using Silverlight). Silverlight for Windows Phone Toolkit Update: David Anson has a nice post that discusses some of the additional controls provided with the Silverlight for Windows Phone Toolkit. Visual Studio JavaScript Editor Extensions: A nice (and free) Visual Studio plugin built by the web tools team that significantly improves the JavaScript intellisense support within Visual Studio. HTML5 Intellisense for Visual Studio: Gil has a blog post that discusses a new extension my team has posted to the Visual Studio Extension Gallery that adds HTML5 schema support to Visual Studio 2008 and 2010. Team Build + Web Deployment + Web Deploy + VS 2010 = Goodness: Visual blogs about how to enable a continuous deployment system with VS 2010, TFS 2010 and the Microsoft Web Deploy framework. Visual Studio 2010 Emacs Emulation Extension and VIM Emulation Extension: Check out these two extensions if you are fond of Emacs and VIM key bindings and want to enable them within Visual Studio 2010. Hope this helps, Scott

Read the article
Silverlight 5 Hosting :: Features in Silverlight 5 and Release Date

- by mbridge

Silverlight 5 is finally announced in the Silverlight FireStarter Event on the 2nd December, 2010. This new version of Silverlight which was earlier labeled as 'Future of Microsoft Silverlight' has now come much closer to go live as the first Silverlight 5 Beta version is expected to be shipped during the early months of 2011. However for the full fledged and the final release of Silverlight 5, we have to wait many more months as the same is likely to be made available within the Q3 2011. As would have been usually expected, this latest edition would feature many new capabilities thereby extending the developer productivity to a whole new dimension of premium media experience and feature-rich business applications. It comes along with many new feature updates as well as the inclusion of new technologies to improve the standard of the Silverlight applications which are now fine-tuned to produce next generation business and media solutions that is capable to meet the requirements of the advanced web-based app development. The Silverlight 5 is all set to replace the previous fourth version which now includes more than forty new features while also dropping various deprecated elements that was prevalent earlier. It has brought around some major performance enhancements and also included better support for various other tools and technologies. Following are some of the changes that are registered to be available under the Silverlight 5 Beta edition which is scheduled to be launched during the Q1 2011. Silverlight 5 : Premium Media Experiences The media features of Silverlight 5 has seen some major enhancements with a lot of optimizations being made to deliver richer solutions. It's capability has now been extended to make things easier, faster and capable of performing the desired tasks in the most efficient manner. The Silverlight media solutions has already been a part of many companies in the recent days where various on-demand Silverlight services were featured but with the arrival of the next generation premium media solution of Silverlight 5, it is expected to register new heights of success and global user acclamation for using it with many esteemed web-based projects and media solutions. - The most happening element in the new Silverlight 5 will be its support for utilizing the GPU based hardware acceleration which is intended to lower down the CPU load to a significant extent and thereby allowing faster rendering of media contents without consuming much resources. This feature is believed to be particularly helpful for low configured machines to run full HD media content without any lagging caused due to processor load. It will hence be one great feature to revolutionize the new generation high quality media contents to be available within the web in a more efficient manner with its hardware decoded video playback capabilities. - With the inclusion of hardware video decoding to minimize the processor load, the Silverlight 5 also comes with another optimization enhancement to also reduce the power consumption level by making new methods to deal with the power-saver settings. With this optimization in effect, the computer would be automatically allowed to switch to sleep mode while no video playback is in progress and also to prevent any screensavers to popup and cause annoyances during any video playback. There would also be other power saver options which will be made available to best suit the users requirements and purpose. - The Silverlight trickplay feature is another great way to tweak any silverlight powered media content as is used for many video tutorial sites or for dealing with any sort of presentations. This feature enables the user to modify the playback speed to either slowdown or speedup during the playback durations based on the requirements without compromising on the quality of output. Normally such manipulations always makes the content's audio to go off-pitch, but the same will not be the case with TrickPlay and the audio would seamlessly progress with the video without skipping any of its part. - In addition to all of the above, the new Silverlight 5 will be featuring wireless control of all the media contents by making use of remote controllers. With the use of such remote devices, it will be easier to handle the various media playback controls thereby providing more freedom while experiencing the premium media services. Silverlight 5 : Business Application Development The application development standard has been extended with more possibilities by bringing forth new and useful technologies and also reviving the existing methods to work better than what it was used to. From the UI improvements to advanced technical aspects, the Silverlight 5 scores high on all grounds to produce great next generation business delivered applications by putting in more creativity and resourceful touch to all the apps being produced with it. - The WPF feature of Silverlight is made more effective by introducing new standards of Databinding which is intended to improve the productivity standards of the Silverlight application developer. It brings in a lot of convenience in debugging the databinding components or expressions and hence making things work in a flawless manner. Some additional features related to databinding includes that of Ancestor RelativeSource, Implicit DataTemplates and Model View ViewModel (MVVM) support with DataContextChanged event and many other new features relating it. - It now comes with a refined text and printing service which facilitates better clarity of the text rendering and also many positive changes which are being applied to the layout pattern. New supports has been added to include OpenType font, multi-column text, linked-text containers and character leading support to name a few among the available features.This also includes some important printing aspects like that of Postscript Vector Printing API which allows to program our printing tasks in a user defined way and Pivot functionality for visualization concerns of informations. - The Graphics support is the key improvements being incorporated which now enables to utilize three dimensional graphics pattern using GPU acceleration. It can manage to provide some really cool visualizations being curved to provide media contents within the business apps with also the support for full HD contents at 1080p quality. - Silverlight 5 includes the support for 64-bit operating systems and relevant browsers and is also optimized to provide better performance. It can support the background thread for the networking which can reduce the latency of the network to a considerable extent. The Out-of-Browser functionality adds the support for utilizing various libraries and also the Win32 API. It also comes with testing support with VS 2010 which is mostly an automated procedure and has also enabled increased security aspects of all the Silverlight 5 developed applications by using the improved version of group policy support.

Read the article
SQLAuthority News – Training and Consultancy and Travel – Story of 30 Last 30 Days

- by pinaldave

Today’s blog post is not technical as usual. Here, I present a real story, and I also invite you all to share your thoughts or opinions on this post. I am a professional SQL Server Trainer; I also do consultation in the area of the Performance Tuning and Query Optimizations. In any month, I like the mix of both in my schedule. I prefer to do training for one week, and then commit the next week for some consultation work. Due to the advancement in technology, for most of the consultation works, there is no client location visit or first time visit for understanding the project. Usually, I conduct high-end training sessions or 400 level training, and these training sessions are very intensive most of the time. Always after completing the training for 5 days straight away at 400 level, I make sure to take out some time to cool down and relax. During this time, I prefer to work on optimization projects. Consultancy is great as it keeps me updated regarding what is going on in the real world. As we all know, all those trainers who have real world experience are always considered to be the best trainers. My learning is immense during my consultations with the real client and while resolving real problems. I share the same with my students the very next week when I go for training sessions. For the same reason, every class is different from the previous ones. An experience trainer would tell you that the class is best if it is driven by Students the way instructor wants! The best scenario is as described above; but you won’t get the best scenario all the time. I was on road for nearly 25 days out of the last 30 days and involved in doing various SQL Server-related trainings. Here is what I have done in the last 30 days. I have gathered the following details from my expenditure reports, which are maintained by my wife. There are few points related to my personal expenses and few other related to business. I maintain a separate list for each of these expenses, but here I have aggregated them. Last 30 days - Training 23 days - 4 – two days training classes – 8 days of training 3 – five days training classes – 15 days of training 1 – one day training classes – 2 days of training Flights 18 flights - 8 – Kingfisher 6 – Spicejet 2 – Jet light 2 – Jet connect Stay in different cities Hyderabad – 16 days Chennai – 6 days Bangalore – 2 days Ahmedabad – 6 days (Hometown) Meals – 54 (Averaging less than 2 per day) Room Services – 16 times Training Campuses – 20 times Restaurants – 6 times Home – 12 times Taxi/Cabs – 64 times (Averaging more than 2 per day) Hotel Cab – 34 times Meru Cab – 8 times Easy Cabs – 10 times Auto Rickshaw – 2 times Looking at the above statistics, I can see that I have eaten less than what I should have, which is not good, and traveled in taxi more than what I should have. Also the temperatures in different cities were very different, not to mention the humidity as well. I missed my family, especially my little girl (9 months). When I was at home, I used to have a proper healthy meal every single day; however, when I was traveling, the food was something I had to compromise on. I have previously written about my travel experience with different airlines, my opinion is still same about them. Well, I have question to all of you road warriors, how do you manage your health and enthusiasm during situations I am going through. I have couple of time stomach upset as well sour throat. I drink lots of water and do my best to keep up. Any idea? Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: About Me, Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL Authority News – Play by Play with Pinal Dave – A Birthday Gift

- by Pinal Dave

Today is my birthday. Personal Note When I was young, I was always looking forward to my birthday as on this day, I used to get gifts from everybody. Now when I am getting old on each of my birthday, I have almost same feeling but the direction is different. Now on each of my birthday, I feel like giving gifts to everybody. I have received lots of support, love and respect from everybody; and now I must return it back.Well, on this birthday, I have very unique gifts for everybody – my latest course on SQL Server. How I Tune Performance I often get questions where I am asked how do I work on a normal day. I am often asked that how do I work when I have performance tuning project is assigned to me. Lots of people have expressed their desire that they want me to explain and demonstrate my own method of solving performance problem when I am facing real world problem. It is a pretty difficult task as in the real world, nothing goes as planned and usually planned demonstrations have no place there. The real world, demands real solutions and in a timely fashion. If a consultant goes to industry and does not demonstrate his/her capabilities in very first few minutes, it does not matter how much fame he/she is, the door is shown to them eventually. It is true and in my early career, I have faced it quite commonly. I have learned the trick to be honest from the start and request absolutely transparent communication from the organization where I am to consult. Play by Play Play by Play is a very unique setup. It is not planned and it is a step by step course. It is like a reality show – a very real encounter to the problem and real problem solving approach. I had a great time doing this course. Geoffrey Grosenbach (VP of Pluralsight) sits down with me to see what a SQL Server Admin does in the real world. This Play-by-Play focuses on SQL Server performance tuning and I go over optimizing queries and fine-tuning the server. The table of content of this course is very simple. Introduction In the introduction I explained my basic strategies when I am approached by a customer for performance tuning. Basic Information Gathering In this module I explain how I do gather various information for performance tuning project. It is very crucial to demonstrate to customers for consultant his capability of solving problem. I attempt to resolve a small problem which gives a big positive impact on performance, consultant have to gather proper information from the start. I demonstrate in this module, how one can collect all the important performance tuning metrics. Removing Performance Bottleneck In this module, I build upon the previous module’s statistics collected. I analysis various performance tuning measures and immediately start implementing various tweaks on the performance, which will start improving the performance of my server. This is a very effective method and it gives immediate return of efforts. Index Optimization Indexes are considered as a silver bullet for performance tuning. However, it is not true always there are plenty of examples where indexes even performs worst after implemented. The key is to understand a few of the basic properties of the index and implement the right things at the right time. In this module, I describe in detail how to do index optimizations and what are right and wrong with Index. If you are a DBA or developer, and if your application is running slow – this is must attend module for you. I have some really interesting stories to tell as well. Optimize Query with Rewrite Every problem has more than one solution, in this module we will see another very famous, but hard to master skills for performance tuning – Query Rewrite. There are few do’s and don’ts for any query rewrites. I take a very simple example and demonstrate how query rewrite can improve the performance of the query at many folds. I also share some real world funny stories in this module. This course is hosted at Pluralsight. You will need a valid login for Pluralsight to watch Play by Play: Pinal Dave course. You can also sign up for FREE Trial of Pluralsight to watch this course. As today is my birthday – I will give 10 people (randomly) who will express their desire to learn this course, a free code. Please leave your comment and I will send you free code to watch this course for free. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, SQLAuthority News, T SQL, Video

Read the article
XNA Notes 005

- by George Clingerman

Another week and another crazy amount of activity going on in the XNA community. I’m fairly certain I missed over half of it. In fact, if I am missing things, make sure to email me and I’ll try and make sure I catch it next week! ([email protected]). Also, if you’ve got any advice, things you like/don’t like about the way these XNA Notes are going let me know. I always appreciate feedback (currently spammers are leaving me the nicest comments so you guys have work to do!) Without further ado, here’s this week’s notes! Time Critical XNA News The XNA Team Blob reminds us that February 7th is the last day to submit XNA 3.1 games to peer review! http://blogs.msdn.com/b/xna/archive/2011/01/31/7-days-left-to-submit-xna-gs-3-1-games-on-app-hub.aspx XNA MVPS Chris Williams kicks off the marketing campaign for our book http://geekswithblogs.net/cwilliams/archive/2011/01/28/143680.aspx Catalin Zima posts the comparison cheat sheet for why Angry Birds is different than Chickens Can’t Fly http://www.amusedsloth.com/2011/02/comparison-cheat-sheet-for-chickens-cant-fly-and-angry-birds/ Jim Perry congratulates the developers selected by Game Developer Magazine for Best Xbox LIVE Indie Games of 2010 http://machxgames.com/blog/?p=24 @NemoKrad posts his XNAKUUG talks for all to enjoy http://twitter.com/NemoKrad/statuses/33142362502864896 http://xna-uk.net/blogs/randomchaos/archive/2011/02/03/xblig-uk-2011-january-amp-february-talk.aspx George (that’s me!) preps for his XNA talk coming up on the 8th http://twitter.com/clingermangw/statuses/32669550554124288 http://www.portlandsilverlight.net/Meetings/Details/15 XNA Developers FireFly posts the last tutorial in his XNA Tower Defense tutorial series http://forums.create.msdn.com/forums/p/26442/451460.aspx#451460 http://xnatd.blogspot.com/2011/01/tutorial-14-polishing-game.html @fredericmy posts the main difference when porting a game from Windows Phone 7 to Xbox 360 http://fairyengine.blogspot.com/2011/01/main-differences-when-porting-game-from.html @ElementCy creates a pretty rad video of a MineCraft type terrain created using XNA http://www.youtube.com/watch?v=Waw1f7wnl9I Andrew Russel gets the first XNA badge on gamedev.stackexchange http://twitter.com/_AndrewRussell/statuses/32322877004972032 http://gamedev.stackexchange.com/badges?tab=tags And his funding for ExEn has passed $7000 only $3000 to go http://twitter.com/_AndrewRussell/statuses/33042412804771840 Subodh Pushpak blogs about his Windows Phone 7 XNA talk http://geekswithblogs.net/subodhnpushpak/archive/2011/02/01/windows-phone-7-silverlight--xna-development-talk.aspx Slyprid releases the latest version of Transmute and needs more people to test http://twitter.com/slyprid/statuses/32452488418299904 http://forgottenstarstudios.com/ SpynDoctorGames celebrates the 1 year anniversary of Your Doodles Are Bugged! Congrats! http://twitter.com/SpynDoctorGames/statuses/32511689068908544 Noogy (creator of Dust the Elysian Tail) prepares his conversion to XNA 4.0 http://twitter.com/NoogyTweet/statuses/32522008449253376 @philippedasilva posts about the Indiefreaks Game Framework v0.2.0.0 Input management system http://twitter.com/philippedasilva/statuses/32763393957957632 http://indiefreaks.com/2011/02/02/behind-smart-input-system-feature/ Mommy’s Best Games debates what to do about High Scores with their new update http://mommysbest.blogspot.com/2011/02/high-score-shake-up.html @BinaryTweedDeej want to know if there’s anything the community needs to make XNA games for the PC. Give him some feedback! http://twitter.com/BinaryTweedDeej/status/32895453863354368 @mikebmcl promises to write us all a book (I can’t wait to read it!) http://twitter.com/mikebmcl/statuses/33206499102687233 @werezompire is going to live, LIVE, thanks to all the generosity and support from the community! http://twitter.com/werezompire/statuses/32840147644977153 Xbox LIVE Indie Games (XBLIG) Making money in Xbox 360 indie game development. Is it possible? http://www.bitmob.com/articles/making-money-in-xbox-360-indie-game-development-is-it-possible @AlejandroDaJ posts some thoughts abut the bitmob article http://twitter.com/AlejandroDaJ/statuses/31068552165330944 http://www.apathyworks.com/blog/view.php?id=00215 Kobun gets my respect as an XBLIG champion. I’m not sure who Kobun is, but if you’ve every read through the comment sections any time Kotaku writes about XBLIGs you’ll see a lot of confusion, disinformation in there. Kobun has been waging a secret war battling that lack of knowledge and he does it well. Also he’s running a pretty kick ass site for Xbox LIVE Indie Game reviews http://xboxindies.teamkobun.com/ @radiangames releases his last Xbox LIVE Indie Game...for now http://bit.ly/gMK6lE Playing Avaglide with the Kinect controller http://www.youtube.com/watch?v=UqAYbHww53o http://www.joystiq.com/2011/01/30/kinect-hacks-take-to-the-skies-with-avaglide/ Luke Schneider of Radiangames interviewed in Edge magazine http://www.next-gen.biz/features/radiangames-venture Digital Quarters posts thoughts on why XBLIG’s online requirement kills certain genres http://digitalquarters.blogspot.com/2011/02/thoughts-why-xbligs-online-requirement.html Mommy’s Best Games shares the news that several XBLIGs were featured in the March 2011 issue of Famitsu 360 http://forums.create.msdn.com/forums/p/33455/451487.aspx#451487 NaviFairy continues with his Indie-Game-A-Day http://gaygamer.net/2011/02/indie_game_a_day_epic_dungeon.html http://gaygamer.net/2011/02/indie_game_a_day_break_limit_r.html and more every day...that’s kind of the point! Keep your eye on this series! VVGTV continues with it’s awesome reviews/promotions for XBLIGs http://vvgtv.com/ http://vvgtv.com/2011/02/03/iredia-atrams-secret-xblig-review-2/ http://vvgtv.com/2011/02/02/poopocalypse-coming-soon-to-xblig/ ….and even more, you get the point. Magicka is an Indie Game doing really well on Steam AND it’s made using XNA http://www.magickagame.com/ http://twitter.com/Magickagame/statuses/32712762580799488 GameMarx reviews Antipole http://www.gamemarx.com/reviews/73/antipole-is-vvvvvvery-good.aspx Armless Octopus review Alpha Squad http://www.armlessoctopus.com/2011/01/28/xbox-indie-review-alpha-squad/ An interesting article about Kodu that Jim Perry found http://twitter.com/MachXGames/statuses/32848044105924608 http://www.develop-online.net/news/36915/10-year-old-Jordan-makes-games-The-UK-needs-more-like-her XNA Game Development Sgt. Conker posts about the Natur beta, a new book and whether you can make money with XBLIG http://www.sgtconker.com/ http://www.sgtconker.com/2011/01/a-new-book-on-the-block-and-a-new-natur-beta/ http://www.sgtconker.com/2011/01/making-money-in-xbox-360-indie-game-development-is-it-possible/ Tips for setting up SVN http://bit.ly/fKxgFh @bsimser found tons of royalty free music and soundfx for your XNA Games http://twitter.com/bsimser/statuses/31426632933711872 Post on the new features in the next Sunburn Editor http://www.synapsegaming.com/blogs/fivesidedbarrel/archive/2011/01/28/new-editor-features-prefabs-components-and-more.aspx @jasons_novaleaf posts source code for light pre-pass optimizations for #xna http://twitter.com/jasons_novaleaf/statuses/33348855403642880 http://jcoluna.wordpress.com/2011/02/01/xna-4-0-light-pre-pass-optimization-round-one/ I’ve been learning about doing an A.I. for turn based games and this article was a great resource. http://www.gamasutra.com/view/feature/1535/designing_ai_algorithms_for_.php?print=1

Read the article
What's New In 11.1.2.1 (Talleyrand SP1)

- by russ.bishop

This release is primarily about bug fixes and that's what we spent the most time on, but we also addressed a number of other things: 1. Performance improvements We've done a lot of work to improve the performance of page load and execution times. For example, the View Compare page is about half the size it was previously! We've also done a lot of work on the server to improve performance of queries, exports, action scripts, etc. We implemented some finer-grained locking so fewer operations will block other users while they are in progress. We made some optimizations to improve performance when you have a lot of network or database latency as well. Just a few examples: An Import that previously took 8 GB of memory and hours to complete now runs in about 30 minutes and never takes more than 1 GB of RAM. Searching by exact Node Name now completes within 2 seconds even for a hierarchy with millions of nodes. Another search that was taking 30 seconds to run now completes in less than 5 seconds. 2. Upgrade support This release supports automatic upgrade from previous releases, built right into the console. 3. Console Improvements The Console has been reorganized and made easier to use. It is also much more multi-threaded so it responds quicker without freezing up when you save changes or when it needs to get status. 4. Property Namespaces Properties now have a concept called a Namespace. This is tied into the Application Templates to prevent conflicts with duplicate property names. Right now, if you have an AccountType and you pull in the HFM template, it also has AccountType so you end up creating properties with decorations on the name like "Account Type (HFM)". This is no longer necessary. In addition, properties within a namespace must have unique labels but they can be duplicated across namespaces. So in the Property Grid when you click on the HFM category, you just see "AccountType". When you click on MyCategory, you see "AccountType", but they are different properties with different values. Within formulas, the names are still unique (eg: Custom.AccountType vs HFM.AccountType). I'll write more about this one later. 5. Single Sign On DRM now supports Single Sign-On via HSS. For example, if you are using Oracle's OAM as your SSO solution then you configure HSS to use OAM just like you would before. You also configure DRM to use HSS, again just like before. Then you configure OAM to protect the DRM web app, like you would any other website. However once you do those things, users are no longer prompted to enter their username/password. They simply get redirected to OAM if they don't already have a login token, otherwise they pick their application and sail right into DRM. You can also avoid having to pick an application (see the next item) 6. URL-based navigation You can now specify the application you want to log into via the URL. Combined with SSO and your Intranet, it becomes easy to provide links on our intranet portal that take users directly into a specific DRM application. We also support specifying the Version, Hierarchy, and Node. Again, this can be used on your internal portal, but the scenarios get even more interesting when you are using workflow like Oracle BPEL you can automatically generate links within emails that will take users directly to a specific node in the UI. 7. Job status and cancellation A lot of the jobs now report their status and support true cancellation. Action Scripts also report a progress complete percentage since the amount of work is known ahead of time. 8. Action Script Options Action scripts support Option declarations at the top of the file so a script can self-describe (when specified in the file, the corresponding item in the file is ignored). For example: Option|DetectDelimiter Option|UsePropertyNames|true This will tell DRM to automatically detect the delimiter (a pipe symbol in this case) and that all references to properties are by Name, not by Label. Note that when you load a script in the UI, if you use Labels we automatically try to match them up if they are unique. Any duplicates are indicated and you are presented with a choice to pick which property you actually referred to. This is somewhat similar to Version substitution, but tailored for properties. There are other more minor changes and like I said earlier a lot of bug fixes and performance improvements. Hopefully I will get a chance to dig into some of these things in future blog posts.

Read the article
Running a Mongo Replica Set on Azure VM Roles

- by Elton Stoneman

Originally posted on: http://geekswithblogs.net/EltonStoneman/archive/2013/10/15/running-a-mongo-replica-set-on-azure-vm-roles.aspxSetting up a MongoDB Replica Set with a bunch of Azure VMs is straightforward stuff. Here’s a step-by-step which gets you from 0 to fully-redundant 3-node document database in about 30 minutes (most of which will be spent waiting for VMs to fire up). First, create yourself 3 VM roles, which is the minimum number of nodes you need for high availability. You can use any OS that Mongo supports. This guide uses Windows but the only difference will be the mechanism for starting the Mongo service when the VM starts (Windows Service, daemon etc.) While the VMs are provisioning, download and install Mongo locally, so you can set up the replica set with the Mongo shell. We’ll create our replica set from scratch, doing one machine at a time (if you have a single node you want to upgrade to a replica set, it’s the same from step 3 onwards): 1. Setup Mongo Log into the first node, download mongo and unzip it to C:. Rename the folder to remove the version – so you have c:\MongoDB\bin etc. – and create a new folder for the logs, c:\MongoDB\logs. 2. Setup your data disk When you initialize a node in a replica set, Mongo pre-allocates a whole chunk of storage to use for data replication. It will use up to 5% of your data disk, so if you use a Windows VM image with a defsault 120Gb disk and host your data on C:, then Mongo will allocate 6Gb for replication. And that takes a while. Instead you can create yourself a new partition by shrinking down the C: drive in Computer Management, by say 10Gb, and then creating a new logical disk for your data from that spare 10Gb, which will be allocated as E:. Create a new folder, e:\data. 3. Start Mongo When that’s done, start a command line, point to the mongo binaries folder, install Mongo as a Windows Service, running in replica set mode, and start the service: cd c:\mongodb\bin mongod -logpath c:\mongodb\logs\mongod.log -dbpath e:\data -replSet TheReplicaSet –install net start mongodb 4. Open the ports Mongo uses port 27017 by default, so you need to allow access in the machine and in Azure. In the VM, open Windows Firewall and create a new inbound rule to allow access via port 27017. Then in the Azure Management Console for the VM role, under the Configure tab add a new rule, again to allow port 27017. 5. Initialise the replica set Start up your local mongo shell, connecting to your Azure VM, and initiate the replica set: c:\mongodb\bin\mongo sc-xyz-db1.cloudapp.net rs.initiate() This is the bit where the new node (at this point the only node) allocates its replication files, so if your data disk is large, this can take a long time (if you’re using the default C: drive with 120Gb, it may take so long that rs.initiate() never responds. If you’re sat waiting more than 20 minutes, start another instance of the mongo shell pointing to the same machine to check on it). Run rs.conf() and you should see one node configured. 6. Fix the host name for the primary – *don’t miss this one* For the first node in the replica set, Mongo on Windows doesn’t populate the full machine name. Run rs.conf() and the name of the primary is sc-xyz-db1, which isn’t accessible to the outside world. The replica set configuration needs the full DNS name of every node, so you need to manually rename it in your shell, which you can do like this: cfg = rs.conf() cfg.members[0].host = ‘sc-xyz-db1.cloudapp.net:27017’ rs.reconfig(cfg) When that returns, rs.conf() will have your full DNS name for the primary, and the other nodes will be able to connect. At this point you have a working database, so you can start adding documents, but there’s no replication yet. 7. Add more nodes For the next two VMs, follow steps 1 through to 4, which will give you a working Mongo database on each node, which you can add to the replica set from the shell with rs.add(), using the full DNS name of the new node and the port you’re using: rs.add(‘sc-xyz-db2.cloudapp.net:27017’) Run rs.status() and you’ll see your new node in STARTUP2 state, which means its initializing and replicating from the PRIMARY. Repeat for your third node: rs.add(‘sc-xyz-db3.cloudapp.net:27017’) When all nodes are finished initializing, you will have a PRIMARY and two SECONDARY nodes showing in rs.status(). Now you have high availability, so you can happily stop db1, and one of the other nodes will become the PRIMARY with no loss of data or service. Note – the process for AWS EC2 is exactly the same, but with one important difference. On the Azure Windows Server 2012 base image, the MongoDB release for 64-bit 2008R2+ works fine, but on the base 2012 AMI that release keeps failing with a UAC permission error. The standard 64-bit release is fine, but it lacks some optimizations that are in the 2008R2+ version.

Read the article
Take Two: Comparing JVMs on ARM/Linux

- by user12608080

Although the intent of the previous article, entitled Comparing JVMs on ARM/Linux, was to introduce and highlight the availability of the HotSpot server compiler (referred to as c2) for Java SE-Embedded ARM v7, it seems, based on feedback, that everyone was more interested in the OpenJDK comparisons to Java SE-E. In fact there were two main concerns: The fact that the previous article compared Java SE-E 7 against OpenJDK 6 might be construed as an unlevel playing field because version 7 is newer and therefore potentially more optimized. That the generic compiler settings chosen to build the OpenJDK implementations did not put those versions in a particularly favorable light. With those considerations in mind, we'll institute the following changes to this version of the benchmarking: In order to help alleviate an additional concern that there is some sort of benchmark bias, we'll use a different suite, called DaCapo. Funded and supported by many prestigious organizations, DaCapo's aim is to benchmark real world applications. Further information about DaCapo can be found at http://dacapobench.org. At the suggestion of Xerxes Ranby, who has been a great help through this entire exercise, a newer Linux distribution will be used to assure that the OpenJDK implementations were built with more optimal compiler settings. The Linux distribution in this instance is Ubuntu 11.10 Oneiric Ocelot. Having experienced difficulties getting Ubuntu 11.10 to run on the original D2Plug ARMv7 platform, for these benchmarks, we'll switch to an embedded system that has a supported Ubuntu 11.10 release. That platform is the Freescale i.MX53 Quick Start Board. It has an ARMv7 Coretex-A8 processor running at 1GHz with 1GB RAM. We'll limit comparisons to 4 JVM implementations: Java SE-E 7 Update 2 c1 compiler (default) Java SE-E 6 Update 30 (c1 compiler is the only option) OpenJDK 6 IcedTea6 1.11pre 6b23~pre11-0ubuntu1.11.10.2 CACAO build 1.1.0pre2 OpenJDK 6 IcedTea6 1.11pre 6b23~pre11-0ubuntu1.11.10.2 JamVM build-1.6.0-devel Certain OpenJDK implementations were eliminated from this round of testing for the simple reason that their performance was not competitive. The Java SE 7u2 c2 compiler was also removed because although quite respectable, it did not perform as well as the c1 compilers. Recall that c2 works optimally in long-lived situations. Many of these benchmarks completed in a relatively short period of time. To get a feel for where c2 shines, take a look at the first chart in this blog. The first chart that follows includes performance of all benchmark runs on all platforms. Later on we'll look more at individual tests. In all runs, smaller means faster. The DaCapo aficionado may notice that only 10 of the 14 DaCapo tests for this version were executed. The reason for this is that these 10 tests represent the only ones successfully completed by all 4 JVMs. Only the Java SE-E 6u30 could successfully run all of the tests. Both OpenJDK instances not only failed to complete certain tests, but also experienced VM aborts too. One of the first observations that can be made between Java SE-E 6 and 7 is that, for all intents and purposes, they are on par with regards to performance. While it is a fact that successive Java SE releases add additional optimizations, it is also true that Java SE 7 introduces additional complexity to the Java platform thus balancing out any potential performance gains at this point. We are still early into Java SE 7. We would expect further performance enhancements for Java SE-E 7 in future updates. In comparing Java SE-E to OpenJDK performance, among both OpenJDK VMs, Cacao results are respectable in 4 of the 10 tests. The charts that follow show the individual results of those four tests. Both Java SE-E versions do win every test and outperform Cacao in the range of 9% to 55%. For the remaining 6 tests, Java SE-E significantly outperforms Cacao in the range of 114% to 311% So it looks like OpenJDK results are mixed for this round of benchmarks. In some cases, performance looks to have improved. But in a majority of instances, OpenJDK still lags behind Java SE-Embedded considerably. Time to put on my asbestos suit. Let the flames begin...

Read the article

Search Results

Search found 454 results on 19 pages for 'optimizations'.

Page 14/19 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 | Next Page >

- by rchrd

- by pinaldave

- by Sean Martin

- by back2dos

- by Eric Bezille

- by David Parks

- by Pinal Dave

- by mseika

- by Ben Emmett

- by Dave Ballantyne

- by jlaskey

- by MarkPearl

- by Nithin

- by Steve Kinney

- by knweiss

- by jsh

- by flight

- by ScottGu

- by mbridge

- by pinaldave

- by Pinal Dave

- by George Clingerman

- by russ.bishop

- by Elton Stoneman

- by user12608080

< Previous Page | 10 11 12 13 14 15 16 17 18 19 | Next Page >