Search Results

Search found 16316 results on 653 pages for 'force download'.

Page 195/653 | < Previous Page | 191 192 193 194 195 196 197 198 199 200 201 202  | Next Page >

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Add a Flight Full of Color to Your Desktop with the Beautiful Birds Theme for Windows 7

    - by Asian Angel
    Do you enjoy looking at and collecting pictures of beautifully colored birds? Then brighten up your desktop with the grace and gorgeous plumage of swans, flamingoes, peacocks, and other exotic birds with this wonderful theme for Windows 7. Note: The theme comes with seventeen awesome wallpapers full of brightly colored avian goodness. Download the Beautiful Birds Theme [Windows 7 Personalization Gallery] How To Encrypt Your Cloud-Based Drive with BoxcryptorHTG Explains: Photography with Film-Based CamerasHow to Clean Your Dirty Smartphone (Without Breaking Something)

    Read the article

  • MIX 2010 recap podcast

    - by Chris Williams
    from www.slickthought.net: Spaghetti Code Podcast: Recapping the MIX Conference 4/2/2010 11:06:10 AM Spaghetti Code (Jeff Brand) is joined by Mike Hodnic, Jason Bock, Adam Grocholski and Chris Williams to share their thoughts and impressions from the Microsoft MIX Conference and their thoughts on Windows Phone, Silverlight 4, and more. Direct Download - click here Subscribe - click here iTunes - click here

    Read the article

  • Use Autoruns to Manually Clean an Infected PC

    - by Mark Virtue
    There are many anti-malware programs out there that will clean your system of nasties, but what happens if you’re not able to use such a program?  Autoruns, from SysInternals (recently acquired by Microsoft), is indispensable when removing malware manually. There are a few reasons why you may need to remove viruses and spyware manually: Perhaps you can’t abide running resource-hungry and invasive anti-malware programs on your PC You might need to clean your mom’s computer (or someone else who doesn’t understand that a big flashing sign on a website that says “Your computer is infected with a virus – click HERE to remove it” is not a message that can necessarily be trusted) The malware is so aggressive that it resists all attempts to automatically remove it, or won’t even allow you to install anti-malware software Part of your geek credo is the belief that anti-spyware utilities are for wimps Autoruns is an invaluable addition to any geek’s software toolkit.  It allows you to track and control all programs (and program components) that start automatically with Windows (or with Internet Explorer).  Virtually all malware is designed to start automatically, so there’s a very strong chance that it can be detected and removed with the help of Autoruns. We have covered how to use Autoruns in an earlier article, which you should read if you need to first familiarize yourself with the program. Autoruns is a standalone utility that does not need to be installed on your computer.  It can be simply downloaded, unzipped and run (link below).  This makes is ideally suited for adding to your portable utility collection on your flash drive. When you start Autoruns for the first time on a computer, you are presented with the license agreement: After agreeing to the terms, the main Autoruns window opens, showing you the complete list of all software that will run when your computer starts, when you log in, or when you open Internet Explorer: To temporarily disable a program from launching, uncheck the box next to it’s entry.  Note:  This does not terminate the program if it is running at the time – it merely prevents it from starting next time.  To permanently prevent a program from launching, delete the entry altogether (use the Delete key, or right-click and choose Delete from the context-menu)).  Note:  This does not remove the program from your computer – to remove it completely you need to uninstall the program (or otherwise delete it from your hard disk). Suspicious Software It can take a fair bit of experience (read “trial and error”) to become adept at identifying what is malware and what is not.  Most of the entries presented in Autoruns are legitimate programs, even if their names are unfamiliar to you.  Here are some tips to help you differentiate the malware from the legitimate software: If an entry is digitally signed by a software publisher (i.e. there’s an entry in the Publisher column) or has a “Description”, then there’s a good chance that it’s legitimate If you recognize the software’s name, then it’s usually okay.  Note that occasionally malware will “impersonate” legitimate software, but adopting a name that’s identical or similar to software you’re familiar with (e.g. “AcrobatLauncher” or “PhotoshopBrowser”).  Also, be aware that many malware programs adopt generic or innocuous-sounding names, such as “Diskfix” or “SearchHelper” (both mentioned below). Malware entries usually appear on the Logon tab of Autoruns (but not always!) If you open up the folder that contains the EXE or DLL file (more on this below), an examine the “last modified” date, the dates are often from the last few days (assuming that your infection is fairly recent) Malware is often located in the C:\Windows folder or the C:\Windows\System32 folder Malware often only has a generic icon (to the left of the name of the entry) If in doubt, right-click the entry and select Search Online… The list below shows two suspicious looking entries:  Diskfix and SearchHelper These entries, highlighted above, are fairly typical of malware infections: They have neither descriptions nor publishers They have generic names The files are located in C:\Windows\System32 They have generic icons The filenames are random strings of characters If you look in the C:\Windows\System32 folder and locate the files, you’ll see that they are some of the most recently modified files in the folder (see below) Double-clicking on the items will take you to their corresponding registry keys: Removing the Malware Once you’ve identified the entries you believe to be suspicious, you now need to decide what you want to do with them.  Your choices include: Temporarily disable the Autorun entry Permanently delete the Autorun entry Locate the running process (using Task Manager or similar) and terminating it Delete the EXE or DLL file from your disk (or at least move it to a folder where it won’t be automatically started) or all of the above, depending upon how certain you are that the program is malware. To see if your changes succeeded, you will need to reboot your machine, and check any or all of the following: Autoruns – to see if the entry has returned Task Manager (or similar) – to see if the program was started again after the reboot Check the behavior that led you to believe that your PC was infected in the first place.  If it’s no longer happening, chances are that your PC is now clean Conclusion This solution isn’t for everyone and is most likely geared to advanced users. Usually using a quality Antivirus application does the trick, but if not Autoruns is a valuable tool in your Anti-Malware kit. Keep in mind that some malware is harder to remove than others.  Sometimes you need several iterations of the steps above, with each iteration requiring you to look more carefully at each Autorun entry.  Sometimes the instant that you remove the Autorun entry, the malware that is running replaces the entry.  When this happens, we need to become more aggressive in our assassination of the malware, including terminating programs (even legitimate programs like Explorer.exe) that are infected with malware DLLs. Shortly we will be publishing an article on how to identify, locate and terminate processes that represent legitimate programs but are running infected DLLs, in order that those DLLs can be deleted from the system. Download Autoruns from SysInternals Similar Articles Productive Geek Tips Using Autoruns Tool to Track Startup Applications and Add-onsHow To Get Detailed Information About Your PCSUPERAntiSpyware Portable is the Must-Have Spyware Removal Tool You NeedQuick Tip: Windows Vista Temp Files DirectoryClear Recent Commands From the Run Dialog in Windows XP TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips Revo Uninstaller Pro Registry Mechanic 9 for Windows PC Tools Internet Security Suite 2010 PCmover Professional 15 Great Illustrations by Chow Hon Lam Easily Sync Files & Folders with Friends & Family Amazon Free Kindle for PC Download Stretch popurls.com with a Stylish Script (Firefox) OldTvShows.org – Find episodes of Hitchcock, Soaps, Game Shows and more Download Microsoft Office Help tab

    Read the article

  • 11 Ubuntu One Features You May Not Be Aware Of

    - by Chris Hoffman
    While Ubuntu One might seem like a Ubuntu-only file synchronization service, it’s more than that – you can use Ubuntu One on Windows, Android, iOS, and from the web. Ubuntu One offers 5GB of free storage space to everyone. Ubuntu One includes features for sharing files or folders online, streaming music to your smartphone, synchronizing installed applications across all your devices, and more. How to Use an Xbox 360 Controller On Your Windows PC Download the Official How-To Geek Trivia App for Windows 8 How to Banish Duplicate Photos with VisiPic

    Read the article

  • Videos of my MonoTouch and Mono and Mobile sessions from NDC 2011

    - by Chris Hardy (ChrisNTR)
    Two weeks ago, I was in Oslo, Norway getting ready to present a few talks at the Norwegian Developer's Conference 2011 and now two weeks later, it's about time I point you to my MonoTouch and Mono and Mobile talks from the conference! First I would like to thanks for everyone involved with the conference, the hosts, the staff, the speakers and the attendees. There was so many great talks going on that you're forced to download the videos afterwards! All the videos from the conference are up on the...(read more)

    Read the article

  • ASP.NET MVC 2 RTM Available

    - by Shaun
    Shiju Varghese posted an article on his(her) blog and said that the RTM of the ASP.NET MVC 2 had been released and available to download. You can get the installation packeage and the release note here. And based on the release note there’s no breaking changes from RC2 to RTM. Let’s play with the new ASP.NET MVC and look forward the Visual Studio 2010 RTM.

    Read the article

  • Understanding Column Properties for a SQL Server Table

    Designing a table can be a little complicated if you don’t have the correct knowledge of data types, relationships, and even column properties. In this tip, Brady Upton goes over the column properties and provides examples. "It really helped us isolate where we were experiencing a bottleneck"- John Q Martin, SQL Server DBA. Get started with SQL Monitor today to solve tricky performance problems - download a free trial

    Read the article

  • VS.NET 2010 SP1, Win 7, Parallels, and a MBP&ndash;Hell, my friends&hellip;HELL!

    - by D'Arcy Lussier
    LightSwitch Beta 2 is out. That’s how all this started. All I wanted was to install it on my MBP’s Win7 Parallels VM. But as I’m finding with running a Win7 VM on a MBP, nothing is as easy as it should be. First my MBP froze during the SP1 installation. Not my VM crashing, the entire machine freezing…no mouse, nothing. Had to do a hard reset. BLECH. Then we’re back and I try to re-install SP1 (since the first try obviously failed). I get met with a dialog asking me where silverlight_sdk.msi was. It was *nowhere*! So I hit the net and download it from Microsoft’s site. Unfortunately, it only downloads an exe and not the individual files which would include the msi. Here’s what I did: - Download the tools for Silverlight 4 (http://www.microsoft.com/downloads/en/details.aspx?FamilyID=b3deb194-ca86-4fb6-a716-b67c2604a139&displaylang=en) - Run it, but don’t hit the install or next button when the dialog comes up - Look in your file structure for a folder with a weird name…bunch of numbers and letters. This is a temp folder that the exe creates and dumps all the necessary setup files into, and clears away after its done. - Inside this folder you’ll find the silverlight_sdk.msi (hooray!). Just copy it to a different location on the C drive. You can then cancel installation. Ok, so that takes care of that…but then running the SP1 installer I get hit with *another* dialog asking for the WCF RIA Services SP1 msi. Now it looks like this MSI is part of the Silverlight Tools package because you’ll see the MSI, but the VS.NET 2010 SP1 installer will thumb its nose at this unworthy msi…for whatever reason. So instead, go here: http://www.silverlight.net/getstarted/riaservices/ …and click on the “Install WCF Ria Services Sp1…” option. This downloads the msi, which you should save to your C drive and direct the VS.NET 2010 SP1 installer to. Then, if you’ve done all that, been good all year, and not made any little children cry, you *might* just be able to install VS.NET 2010 SP1 on your Parallels VM. If you were playing that “Take a shot every time he writes VS.NET 2010 Sp1” drinking game, then you’re drunk…which is a better place to be than where I am right now: watching the installation progress bar slowly creep to completion, hoping there’s no more surprises in store. D

    Read the article

  • SQL SERVER – Expanding Views – Contest Win Joes 2 Pros Combo (USD 198) – Day 4 of 5

    - by pinaldave
    August 2011 we ran a contest where every day we give away one book for an entire month. The contest had extreme success. Lots of people participated and lots of give away. I have received lots of questions if we are doing something similar this month. Absolutely, instead of running a contest a month long we are doing something more interesting. We are giving away USD 198 worth gift every day for this week. We are giving away Joes 2 Pros 5 Volumes (BOOK) SQL 2008 Development Certification Training Kit every day. One copy in India and One in USA. Total 2 of the giveaway (worth USD 198). All the gifts are sponsored from the Koenig Training Solution and Joes 2 Pros. The books are available here Amazon | Flipkart | Indiaplaza How to Win: Read the Question Read the Hints Answer the Quiz in Contact Form in following format Question Answer Name of the country (The contest is open for USA and India residents only) 2 Winners will be randomly selected announced on August 20th. Question of the Day: Which of the following key word will force the query to use indexes created on views? a) ENCRYPTION b) SCHEMABINDING c) NOEXPAND d) CHECK OPTION Query Hints: BIG HINT POST Usually, the assumption is that Index on the table will use Index on the table and Index on view will be used by view. However, that is the misconception. It does not happen this way. In fact, if you notice the image, you will find the both of them (table and view) use both the index created on the table. The index created on the view is not used. The reason for the same as listed in BOL. The cost of using the indexed view may exceed the cost of getting the data from the base tables, or the query is so simple that a query against the base tables is fast and easy to find. This often happens when the indexed view is defined on small tables. You can use the NOEXPAND hint if you want to force the query processor to use the indexed view. This may require you to rewrite your query if you don’t initially reference the view explicitly. You can get the actual cost of the query with NOEXPAND and compare it to the actual cost of the query plan that doesn’t reference the view. If they are close, this may give you the confidence that the decision of whether or not to use the indexed view doesn’t matter. Additional Hints: I have previously discussed various concepts from SQL Server Joes 2 Pros Volume 4. SQL Joes 2 Pros Development Series – Structured Error Handling SQL Joes 2 Pros Development Series – SQL Server Error Messages SQL Joes 2 Pros Development Series – Table-Valued Functions SQL Joes 2 Pros Development Series – Table-Valued Store Procedure Parameters SQL Joes 2 Pros Development Series – Easy Introduction to CHECK Options SQL Joes 2 Pros Development Series – Introduction to Views SQL Joes 2 Pros Development Series – All about SQL Constraints Next Step: Answer the Quiz in Contact Form in following format Question Answer Name of the country (The contest is open for USA and India) Bonus Winner Leave a comment with your favorite article from the “additional hints” section and you may be eligible for surprise gift. There is no country restriction for this Bonus Contest. Do mention why you liked it any particular blog post and I will announce the winner of the same along with the main contest. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Joes 2 Pros, PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Downloading error "Could not retrieve the required disk image "

    - by Freed Ahmad
    When I try to download Ubuntu 12.04 using the Windows installer (Wubi) through a proxy server which requires proxy authentication, after I choose the Ubuntu installation size, username and password, when I click the Install button, I get this error: An error occurred: Could not retrieve the required disk image files For more information, please see the log file: c:\users\(myusername)\appdata\local\temp\wubi-12.04-rev266.log The log file says: proxy authentication error How can I solve this problem?

    Read the article

  • Printer Review: HP LaserJet Pro 1606dn

    Looking for a black-and-white laser printer for your small office or workgroup? HP's $199 entry offers Ethernet, duplex printing, and fast performance -- and can install itself with no CD to insert or driver to download.

    Read the article

  • Printer Review: HP LaserJet Pro 1606dn

    Looking for a black-and-white laser printer for your small office or workgroup? HP's $199 entry offers Ethernet, duplex printing, and fast performance -- and can install itself with no CD to insert or driver to download.

    Read the article

  • Problem with WebUpd8 PPA: Hash Sum mismatch

    - by jiewmeng
    I keep getting W: Failed to fetch bzip2:/var/lib/apt/lists/partial/ppa.launchpad.net_webupd8team_gnome3_ubuntu_dists_oneiric_main_binary-i386_Packages Hash Sum mismatch W: Failed to fetch http://ppa.launchpad.net/webupd8team/gnome3/ubuntu/dists/oneiric/main/i18n/Index No Hash entry in Release file /var/lib/apt/lists/partial/ppa.launchpad.net_webupd8team_gnome3_ubuntu_dists_oneiric_main_i18n_Index E: Some index files failed to download. They have been ignored, or old ones used instead. How might I fix this? I tried deleting the files in /var/lib/apt/lists/partial already ... still doesnt work ...

    Read the article

  • Updating Nvidia drivers 11.10 (nvidia 290.xxx.xxx)

    - by Jesse
    Hi I installed NVIDIA drivers using terminal. I added the newest repositories, to ensure getting the newest driver, once that was complete, I ran the following command: sudo apt-get install nvidia-current This started the download and successfully installed the (290.xx.xxx) version. The issue I have is when opening X-server application it still shows NVIDIA driver version 173.14.30. Can someone walk me through correcting this so it corresponds to the driver installed? Thanks

    Read the article

  • broadcom BCM4318 firmware missing

    - by Lucas
    I tried sudo apt-get update && apt-cache search b43 sudo apt-get install firmware-b43-installer after that i get this error Can't update, Some index files failed to download i tried cd /var/lib/apt/lists/partial sudo rm gb.archive.ubuntu.com_ubuntu_dists_natty_multivers e_source_Sources OR sudo rm /var/lib/apt/lists/partial/gb.archive.ubuntu.com_ubuntu_dists_natty_multivers e_source_Sources another error cannot remove e_source_Sources

    Read the article

  • [MINI HOW-TO] Redeem Pre-paid Zune Card Points for Zune Marketplace Media

    - by Mysticgeek
    If you don’t want to pay the monthly fee for a Zune Pass, one option is buying a pre-paid Zune card. Here we take a look at how to redeem the Zune card points so you can get music for your Zune or Zune HD. Of course the first thing you will need to do is buy a Zune card. You can find them for different amounts at most retail locations that sell Zune’s like Walmart, Best Buy…etc. When you purchase the card make sure the cashier activates it.   Now open up your Zune desktop software and sign in if you aren’t already. Go into Settings \ Account and under Microsoft Points click on Redeem Code. Now enter the code from the back of the card that you scratch off and hit Next. After entering in your code successfully it asks for your contact information, which seems odd considering you’re using a prepaid card. You may want to enter in a fictitious address and phone number if concerned about privacy…then click Next. The only thing you might want to enter in legitimately is your email address to get a confirmation email. You’re given a Thank you message… And back in your Account Settings you’ll see the points have been added. Now you can go shopping for music, videos, TV shows, and more at the Zune Marketplace. If you don’t want to give up your credit card info and pay the monthly fee for the Zune Pass, using prepaid card to purchase music as you go is a good alternative. Similar Articles Productive Geek Tips Update Your Zune Player SoftwareUnofficial Windows XP Themes Created by MicrosoftSweet Black Theme for Windows XPMake Windows XP Use a Custom Theme for the Classic Logon ScreenListen to Local FM Radio in Windows 7 Media Center TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips Revo Uninstaller Pro Registry Mechanic 9 for Windows PC Tools Internet Security Suite 2010 PCmover Professional Easily Sync Files & Folders with Friends & Family Amazon Free Kindle for PC Download Stretch popurls.com with a Stylish Script (Firefox) OldTvShows.org – Find episodes of Hitchcock, Soaps, Game Shows and more Download Microsoft Office Help tab The Growth of Citibank

    Read the article

  • BizTalk ESB Toolkit: Core Components and Examples

    - by Rajesh Charagandla
    The BizTalk ESB Toolkit 2.0 provides a stable and powerful platform for services that can change as fast as your business needs. The main purpose of an enterprise service bus (ESB) to is to provide a common mediation layer (the “bus”) through which all services connect. By doing so, not only can many of the problems of point-to-point service connectivity be resolved, but a new level of agile service delivery can be achieved. Author: Jon Flanders This Document can be download from here.

    Read the article

  • 2 Birds, 1 Stone: Enabling M2M and Mobility in Healthcare

    - by Eric Jensen
    Jim Connors has created a video showcase of a comprehensive healthcare solution, connecting a mobile application directly to an embedded patient monitoring system. In the demo, Jim illustrates how you can easily build solutions on top of the Java embedded platform, using Oracle products like Berkeley DB and Database Mobile Server. Jim is running Apache Tomcat on an embedded device, using Berkeley DB as the data store. BDB is transparently linked to an Oracle Database backend using  Database Mobile Server. Information protection is important in healthcare, so it is worth pointing out that these products offer strong data encryption, for storage as well as transit. In his video, Jim does a great job of demystifying M2M. What's compelling about this demo is that uses a solution architecture that enterprise developers are already comfortable and familiar with: a Java apps server with a database backend. The additional pieces used to embed this solution are Oracle Berkeley DB and Database Mobile Server. It functions transparently, from the perspective of Java apps developers. This means that organizations who understand Java apps (basically everyone) can use this technology to develop embedded M2M products. The potential uses for this technology in healthcare alone are immense; any device that measures and records some aspect of the patient could be linked, securely and directly, to the medical records database. Breathing, circulation, other vitals, sensory perception, blood tests, x-rats or CAT scans. The list goes on and on. In this demo case, it's a testament to the power of the Java embedded platform that they are able to easily interface the device, called a Pulse Oximeter, with the web application. If Jim had stopped there, it would've been a cool demo. But he didn't; he actually saved the most awesome part for the end! At 9:52 Jim drops a bombshell: He's also created an Android app, something a doctor would use to view patient health data from his mobile device. The mobile app is seamlessly integrated into the rest of the system, using the device agent from Oracle's Database Mobile Server. In doing so, Jim has really showcased the full power of this solution: the ability to build M2M solutions that integrate seamlessly with mobile applications. In closing, I want to point out that this is not a hypothetical demo using beta or even v1.0 products. Everything in Jim's demo is available today. What's more, every product shown is mature, and already in production at many customer sites, albeit not in the innovative combination Jim has come up with. If your customers are in the market for these type of solutions (and they almost certainly are) I encourage you to download the components and try it out yourself! All the Oracle products showcased in this video are available for evaluation download via Oracle Technology Network.

    Read the article

  • Windows Server AppFabric Beta 2 Refresh for Visual Studio 2010/.NET 4 RTM

    - by The Official Microsoft IIS Site
    Today we are pleased to announce a Beta 2 Refresh for Windows Server AppFabric. This build supports the recently released .NET Framework 4 and Visual Studio 2010 RTM versions—a request we’ve had from a number of you. Organizations wanting to use Windows Server AppFabric with the final RTM versions of .NET 4 and Visual Studio 2010 are encouraged to download the Beta 2 Refresh today. Please click here for an installation guide on installing the Beta 2 Refresh. We encourage developers and IT professionals...(read more)

    Read the article

  • Update of SAE Benchmark Presentation to M6/T5/ZFS

    - by uwes
    Strategic Applications Engineering (SAE) published in March an updated Benchmark presentation showing the performance of Oracle systems, software and Virtualization. SPARC M6/T5/ZFS Benchmarks March 2014 The presentation is available via our eSTEP portal.  You will need to provide your email address and the pin below to access the downloads. Link to the portal is shown below. URL: http://launch.oracle.com/ PIN: eSTEP_2011 The material can be found under tab eSTEP Download Located under: Recent Updates and Miscellaneous

    Read the article

  • Goodbye Perth :-)

    - by Mike Dietrich
    We had a great day with +50 customers in Perth - so thanks to everybody! We hope you'd enjoy the day as well. And thanks to Tim for the excellent organization!!! And - again as always - please download the most recent version of the slides (and we've changed a few of them during the Perth workshop) from here: http://apex.oracle.com/folien Use the keyword (Schluesselwort): upgrade112 So hope to see you all next time again when we'll visit this amazing country - and sorry in advance if we'll beat Australia (and maybe Serbia as well) in World Cup 2010 :-))))

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #048

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Order of Result Set of SELECT Statement on Clustered Indexed Table When ORDER BY is Not Used Above theory is true in most of the cases. However SQL Server does not use that logic when returning the resultset. SQL Server always returns the resultset which it can return fastest.In most of the cases the resultset which can be returned fastest is the resultset which is returned using clustered index. Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT One of the Jr. Developer asked me this question (What will be the Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT?) while I was rushing to an important meeting. I was getting late so I asked him to talk with his Application Tech Lead. When I came back from meeting both of them were looking for me. They said they are confused. I quickly wrote down following example for them. 2008 SQL SERVER – Guidelines and Coding Standards Complete List Download Coding standards and guidelines are very important for any developer on the path of a successful career. A coding standard is a set of guidelines, rules and regulations on how to write code. Coding standards should be flexible enough or should take care of the situation where they should not prevent best practices for coding. They are basically the guidelines that one should follow for better understanding. Download Guidelines and Coding Standards complete List Download Get Answer in Float When Dividing of Two Integer Many times we have requirements of some calculations amongst different fields in Tables. One of the software developers here was trying to calculate some fields having integer values and divide it which gave incorrect results in integer where accurate results including decimals was expected. Puzzle – Computed Columns Datatype Explanation SQL Server automatically does a cast to the data type having the highest precedence. So the result of INT and INT will be INT, but INT and FLOAT will be FLOAT because FLOAT has a higher precedence. If you want a different data type, you need to do an EXPLICIT cast. Renaming SP is Not Good Idea – Renaming Stored Procedure Does Not Update sys.procedures I have written many articles about renaming a tables, columns and procedures SQL SERVER – How to Rename a Column Name or Table Name, here I found something interesting about renaming the stored procedures and felt like sharing it with you all. The interesting fact is that when we rename a stored procedure using SP_Rename command, the Stored Procedure is successfully renamed. But when we try to test the procedure using sp_helptext, the procedure will be having the old name instead of new names. 2009 Insert Values of Stored Procedure in Table – Use Table Valued Function It is clear from the result set that , where I have converted stored procedure logic into the table valued function, is much better in terms of logic as it saves a large number of operations. However, this option should be used carefully. The performance of the stored procedure is “usually” better than that of functions. Interesting Observation – Index on Index View Used in Similar Query Recently, I was working on an optimization project for one of the largest organizations. While working on one of the queries, we came across a very interesting observation. We found that there was a query on the base table and when the query was run, it used the index, which did not exist in the base table. On careful examination, we found that the query was using the index that was on another view. This was very interesting as I have personally never experienced a scenario like this. In simple words, “Query on the base table can use the index created on the indexed view of the same base table.” Interesting Observation – Execution Plan and Results of Aggregate Concatenation Queries Working with SQL Server has never seemed to be monotonous – no matter how long one has worked with it. Quite often, I come across some excellent comments that I feel like acknowledging them as blog posts. Recently, I wrote an article on SQL SERVER – Execution Plan and Results of Aggregate Concatenation Queries Depend Upon Expression Location, which is well received in the community. 2010 I encourage all of you to go through complete series and write your own on the subject. If you write an article and send it to me, I will publish it on this blog with due credit to you. If you write on your own blog, I will update this blog post pointing to your blog post. SQL SERVER – ORDER BY Does Not Work – Limitation of the View 1 SQL SERVER – Adding Column is Expensive by Joining Table Outside View – Limitation of the View 2 SQL SERVER – Index Created on View not Used Often – Limitation of the View 3 SQL SERVER – SELECT * and Adding Column Issue in View – Limitation of the View 4 SQL SERVER – COUNT(*) Not Allowed but COUNT_BIG(*) Allowed – Limitation of the View 5 SQL SERVER – UNION Not Allowed but OR Allowed in Index View – Limitation of the View 6 SQL SERVER – Cross Database Queries Not Allowed in Indexed View – Limitation of the View 7 SQL SERVER – Outer Join Not Allowed in Indexed Views – Limitation of the View 8 SQL SERVER – SELF JOIN Not Allowed in Indexed View – Limitation of the View 9 SQL SERVER – Keywords View Definition Must Not Contain for Indexed View – Limitation of the View 10 SQL SERVER – View Over the View Not Possible with Index View – Limitations of the View 11 2011 Startup Parameters Easy to Configure If you are a regular reader of this blog, you must be aware that I have written about SQL Server Denali recently. Here is the quickest way to reach into the screen where we can change the startup parameters. Go to SQL Server Configuration Manager >> SQL Server Services >> Right Click on the Server >> Properties >> Startup Parameters 2012 Validating Unique Columnname Across Whole Database I sometimes come across very strange requirements and often I do not receive a proper explanation of the same. Here is the one of those examples. For example “Our business requirement is when we add new column we want it unique across current database.” Read the solution to this strange request in this blog post. Excel Losing Decimal Values When Value Pasted from SSMS ResultSet It is very common when users are coping the resultset to Excel, the floating point or decimals are missed. The solution is very much simple and it requires a small adjustment in the Excel. By default Excel is very smart and when it detects the value which is getting pasted is numeric it changes the column format to accommodate that. Basic Calculation and PEMDAS Order of Operation Read this interesting blog post for fantastic conversation about the subject. Copy Column Headers from Resultset – SQL in Sixty Seconds #027 – Video http://www.youtube.com/watch?v=x_-3tLqTRv0 Delete From Multiple Table – Update Multiple Table in Single Statement There are two questions which I get every single day multiple times. In my gmail, I have created standard canned reply for them. Let us see the questions here. I want to delete from multiple table in a single statement how will I do it? I want to update multiple table in a single statement how will I do it? Read the answer in the blog post. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Macbook Pro 13" Retina (10,2): Keyboard and Touchpad don't work correcltly

    - by Dirk
    I'm dealing with Ubuntu since about 5 years and installed it on several laptops. Now I'm stuck when trying to install Ubuntu 12.04.1 on a brand new Macbook Pro 13" Retina (10,2). I sucessfully can start Ubuntu from an USB stick, the Ubuntu desktop is visible, a mouse cursor is visible. But there is no respond to keyboard or touchpad input. So I cannot really install Ubuntu on the Macbook. The details of my approach: Prepare an empty USB stick Download "ISO 2 USB EFI Booter for Mac" and copy the file bootX64.efi to the USB drive as /efi/boot/bootX64.efi. Download Ubuntu 12.04.1 Desktop for Mac from http://cdimage.ubuntu.com/releases/1-amd64+mac.iso and copy the iso the USB drive as /efi/boot/boot.iso Put the USB stick into the Macbook Press and hold the "alt" button while switching the Macbook on Select "EFI Boot" from the boot menu that appears and press the Return / Enter key Immediately a black terminal screen appears with the headline "Welcome to the Ubuntu ISO << - EFI booter". 30 seconds later the familiar Ubuntu startup graphics screen is showing. Further 20 seconds later Ubuntu has started and the desktop is visible - in wonderfully fine resolution Now the computer does not respond to any actions on the touchpad nor the keyboard Who did install Ubuntu on this Macbook Pro 13" Retina (10,2) successfully? On this site https://help.ubuntu.com/community/MacBookPro this unit is not listed yet, anyway. Any help would be greatly appreciated! Dirk PS: I could now install ubuntu with an external USB Keyboard/Mouse Set. But now, after showing the grub menu, a kernel panic error appears and booting stops :-/ Seems that the ubuntu images fit not to a macbook pro retina 13" (10,2) yet. PPS: Ok, there are new facts: If I edit the boot options and enter " nomodeset noapic" ubuntu starts and Keyboard and Touchpad work! Now I have to enable WiFi... PPPS: After installing Broadcom firmware from USB Live stick as described in other posts, WiFi was enabled. Then I could update ubuntu normally to 12.10. After this, I must not enter "nomodeset noapic" in the grub menu anymore. Last Thing now is the Touchpad. The driver seems not to be there. The touch pad is only showing as mouse. t.b.c.

    Read the article

< Previous Page | 191 192 193 194 195 196 197 198 199 200 201 202  | Next Page >