Search Results

Search found 9824 results on 393 pages for 'space partitioning'.

Page 96/393 | < Previous Page | 92 93 94 95 96 97 98 99 100 101 102 103  | Next Page >

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • OCS 2007 R2 User Properties Error Message

    - by BWCA
    When I attempted to configure one of our user’s Meeting settings using the Microsoft Office Communications Server 2007 R2 Administration Tool   I received an Validation failed – Validation failed with HRESULT = 0XC3EC7E02 dialog box error message. I received the same error message when I tried to configure the user’s Telephony and Other settings. Using ADSI Edit, I compared the settings of an user that I had no problems configuring and the user that I had problems configuring.  For the user I had problems configuring, I noticed a trailing space after the last phone number digit for the user’s msRTCSIP-Line attribute. After I removed the trailing space for the attribute and waited for Active Directory replication to complete, I was able to configure the user’s Meeting settings (and Telephony/Other settings) without any problems. If you get the error message, check your user’s msRTCSIP-xxxxx attributes in Active Directory using ADSI Edit for any trailing spaces, typos, or any other mistakes.

    Read the article

  • Temporarily share/deploy a python (flask) application

    - by Jeff
    Goal Temporarily (1 month?) deploy/share a python (flask) web app without expensive/complex hosting. More info I've developed a basic mobile web app for the non-profit I work for. It's written in python and uses flask as its framework. I'd like to share this with other employees and beta testers (<25 people). Ideally, I could get some sort of simple hosting space/service and push regular updates to it while we test and iterate on this app. Think something along the lines of dropbox, which of course would not work for this purpose. We do have a website, and hosting services for it, but I'm concerned about using this resource as our website is mission critical and this app is very much pre-alpha at this point. Options I've researched / considered Self host from local machine/network (slow, unreliable) Purchase hosting space (with limited non-profit resources, I'm concerned this is overkill) Using our current web server / hosting (not appropriate for testing) Thanks very much for your time!

    Read the article

  • Why Google skips page title

    - by Bob
    I have no idea why this is happening. An example http://www.londonofficespace.com/ofdj17062004934429t.htm Title tag is: Unfurnished Office Space Wimbledon – Serviced Office on Lombard Road SW19 But is indexed as: Lombard Road – SW19 - London Office Space If you look in the source code and search for this portion ‘Lombard Road – SW19’ You then find that it's next to an office image alt=’Lombard Road – SW19’. The only thing I could think of is that the spider somehow ‘skips’ our title tag and grabs this bit, and then inserts the name of the site (but WHY?) Is there anything I can do with this? or is this a Google behaviour?

    Read the article

  • Problem with Numlock light and remapping keyboard

    - by ansidev
    I upgraded to Ubuntu 13.10. And there are two problem: Numlock is on. But after I press Ctrl+Space (hotkey for IBus, (default in 13.10 is Super+Space)), Numlock is off and I must press Numlock button twice to enable Numlock. About remapping keyboard, because my keyboard have some broken key, so I am using xmodmap to remap my keyboard (config file for xmodmap is $HOME/.xmodmap). But when I switch keyboard layout (I mentioned above), everything changed to default, and I must run xmodmap .xmodmap to remap keyboard again. When using Ubuntu 13.04, everything is good. How to solve my problems? UPDATE: 1. I am using two keyboard layout: English (US) and Unikey (from ibus-unikey package). 2. If I change key board layout using menu on Unity panel, no first problem.

    Read the article

  • How can I change my cursor behavior?

    - by Doug Clement
    When I am typing, my mouse cursor, if left on the text, will eventually auto-click in whatever space in the the text box I happen to be, causing me to type in the middle of a sentence. Also, the cursor in the text box will frequently stop mid-word and the screen will scroll down all of a sudden when pressing the space bar while typing. My question is, how do I change this behavior because it is driving me absolutely bat crap crazy. I have an Acer Aspire One D257 Netbook. I am not sure if it's a Xubuntu problem because it does this while I am using Windows 7 too. Any help would be nice, thanks!

    Read the article

  • New Computer

    - by Matt Christian
    Last night I received my computer that was ordered with my tax return money.  Here are the specs of my old computer: - Pentium 4 Processor - 3-4 GB RAM - ~256 GB HDD space (2 drives) - nVidia card (AGP 8x) Sorry I can't be more specific, my memory is gone :p  Here are the new computer specs (mostly): - 2.8ghz Pentium i7 quadcore - 6 GB RAM - 1 TB HDD space (1 drive) - 1 GB Radeon card (PCI-X) I also got a new monitor (22" Asus with HDMI) so will be using my 19" widescreen as a secondary monitor. If I remember I'll hop on here and post the specifics later on...

    Read the article

  • Solaris: What comes next?

    - by alanc
    As you probably know by now, a few months ago, we released Solaris 11 after years of development. That of course means we now need to figure out what comes next - if Solaris 11 is “The First Cloud OS”, then what do we need to make future releases of Solaris be, to be modern and competitive when they're released? So we've been having planning and brainstorming meetings, and I've captured some notes here from just one of those we held a couple weeks ago with a number of the Silicon Valley based engineers. Now before someone sees an idea here and calls their product rep wanting to know what's up, please be warned what follows are rough ideas, and as I'll discuss later, none of them have any committment, schedule, working code, or even plan for integration in any possible future product at this time. (Please don't make me force you to read the full Oracle future product disclaimer here, you should know it by heart already from the front of every Oracle product slide deck.) To start with, we did some background research, looking at ideas from other Oracle groups, and competitive OS'es. We examined what was hot in the technology arena and where the interesting startups were heading. We then looked at Solaris to see where we could apply those ideas. Making Network Admins into Socially Networking Admins We all know an admin who has grumbled about being the only one stuck late at work to fix a problem on the server, or having to work the weekend alone to do scheduled maintenance. But admins are humans (at least most are), and crave companionship and community with their fellow humans. And even when they're alone in the server room, they're never far from a network connection, allowing access to the wide world of wonders on the Internet. Our solution here is not building a new social network - there's enough of those already, and Oracle even has its own Oracle Mix social network already. What we proposed is integrating Solaris features to help engage our system admins with these social networks, building community and bringing them recognition in the workplace, using achievement recognition systems as found in many popular gaming platforms. For instance, if you had a Facebook account, and a group of admin friends there, you could register it with our Social Network Utility For Facebook, and then your friends might see: Alan earned the achievement Critically Patched (April 2012) for patching all his servers. Matt is only at 50% - encourage him to complete this achievement today! To avoid any undue risk of advertising who has unpatched servers that are easier targets for hackers to break into, this information would be tightly protected via Facebook's world-renowned privacy settings to avoid it falling into the wrong hands. A related form of gamification we considered was replacing simple certfications with role-playing-game-style Experience Levels. Instead of just knowing an admin passed a test establishing a given level of competency, these would provide recruiters with a more detailed level of how much real-world experience an admin has. Achievements such as the one above would feed into it, but larger numbers of experience points would be gained by tougher or more critical tasks - such as recovering a down system, or migrating a service to a new platform. (As long as it was an Oracle platform of course - migrating to an HP or IBM platform would cause the admin to lose points with us.) Unfortunately, we couldn't figure out a good way to prevent (if you will) “gaming” the system. For instance, a disgruntled admin might decide to start ignoring warnings from FMA that a part is beginning to fail or skip preventative maintenance, in the hopes that they'd cause a catastrophic failure to earn more points for bolstering their resume as they look for a job elsewhere, and not worrying about the effect on your business of a mission critical server going down. More Z's for ZFS Our suggested new feature for ZFS was inspired by the worlds most successful Z-startup of all time: Zynga. Using the Social Network Utility For Facebook described above, we'd tie it in with ZFS monitoring to help you out when you find yourself in a jam needing more disk space than you have, and can't wait a month to get a purchase order through channels to buy more. Instead with the click of a button you could post to your group: Alan can't find any space in his server farm! Can you help? Friends could loan you some space on their connected servers for a few weeks, knowing that you'd return the favor when needed. ZFS would create a new filesystem for your use on their system, and securely share it with your system using Kerberized NFS. If none of your friends have space, then you could buy temporary use space in small increments at affordable rates right there in Facebook, using your Facebook credits, and then file an expense report later, after the urgent need has passed. Universal Single Sign On One thing all the engineers agreed on was that we still had far too many "Single" sign ons to deal with in our daily work. On the web, every web site used to have its own password database, forcing us to hope we could remember what login name was still available on each site when we signed up, and which unique password we came up with to avoid having to disclose our other passwords to a new site. In recent years, the web services world has finally been reducing the number of logins we have to manage, with many services allowing you to login using your identity from Google, Twitter or Facebook. So we proposed following their lead, introducing PAM modules for web services - no more would you have to type in whatever login name IT assigned and try to remember the password you chose the last time password aging forced you to change it - you'd simply choose which web service you wanted to authenticate against, and would login to your Solaris account upon reciept of a cookie from their identity service. Pinning notes to the cloud We also all noted that we all have our own pile of notes we keep in our daily work - in text files in our home directory, in notebooks we carry around, on white boards in offices and common areas, on sticky notes on our monitors, or on scraps of paper pinned to our bulletin boards. The contents of the notes vary, some are things just for us, some are useful for our groups, some we would share with the world. For instance, when our group moved to a new building a couple years ago, we had a white board in the hallway listing all the NIS & DNS servers, subnets, and other network configuration information we needed to set up our Solaris machines after the move. Similarly, as Solaris 11 was finishing and we were all learning the new network configuration commands, we shared notes in wikis and e-mails with our fellow engineers. Users may also remember one of the popular features of Sun's old BigAdmin site was a section for sharing scripts and tips such as these. Meanwhile, the online "pin board" at Pinterest is taking the web by storm. So we thought, why not mash those up to solve this problem? We proposed a new BigAddPin site where users could “pin” notes, command snippets, configuration information, and so on. For instance, once they had worked out the ideal Automated Installation manifest for their app server, they could pin it up to share with the rest of their group, or choose to make it public as an example for the world. Localized data, such as our group's notes on the servers for our subnet, could be shared only to users connecting from that subnet. And notes that they didn't want others to see at all could be marked private, such as the list of phone numbers to call for late night pizza delivery to the machine room, the birthdays and anniversaries they can never remember but would be sleeping on the couch if they forgot, or the list of automatically generated completely random, impossible to remember root passwords to all their servers. For greater integration with Solaris, we'd put support right into the command shells — redirect output to a pinned note, set your path to include pinned notes as scripts you can run, or bring up your recent shell history and pin a set of commands to save for the next time you need to remember how to do that operation. Location service for Solaris servers A longer term plan would involve convincing the hardware design groups to put GPS locators with wireless transmitters in future server designs. This would help both admins and service personnel trying to find servers in todays massive data centers, and could feed into location presence apps to help show potential customers that while they may not see many Solaris machines on the desktop any more, they are all around. For instance, while walking down Wall Street it might show “There are over 2000 Solaris computers in this block.” [Note: this proposal was made before the recent media coverage of a location service aggregrator app with less noble intentions, and in hindsight, we failed to consider what happens when such data similarly falls into the wrong hands. We certainly wouldn't want our app to be misinterpreted as “There are over $20 million dollars of SPARC servers in this building, waiting for you to steal them.” so it's probably best it was rejected.] Harnessing the power of the GPU for Security Most modern OS'es make use of the widespread availability of high powered GPU hardware in today's computers, with desktop environments requiring 3-D graphics acceleration, whether in Ubuntu Unity, GNOME Shell on Fedora, or Aero Glass on Windows, but we haven't yet made Solaris fully take advantage of this, beyond our basic offering of Compiz on the desktop. Meanwhile, more businesses are interested in increasing security by using biometric authentication, but must also comply with laws in many countries preventing discrimination against employees with physical limations such as missing eyes or fingers, not to mention the lost productivity when employees can't login due to tinted contacts throwing off a retina scan or a paper cut changing their fingerprint appearance until it heals. Fortunately, the two groups considering these problems put their heads together and found a common solution, using 3D technology to enable authentication using the one body part all users are guaranteed to have - pam_phrenology.so, a new PAM module that uses an array USB attached web cams (or just one if the user is willing to spin their chair during login) to take pictures of the users head from all angles, create a 3D model and compare it to the one in the authentication database. While Mythbusters has shown how easy it can be to fool common fingerprint scanners, we have not yet seen any evidence that people can impersonate the shape of another user's cranium, no matter how long they spend beating their head against the wall to reshape it. This could possibly be extended to group users, using modern versions of some of the older phrenological studies, such as giving all users with long grey beards access to the System Architect role, or automatically placing users with pointy spikes in their hair into an easy use mode. Unfortunately, there are still some unsolved technical challenges we haven't figured out how to overcome. Currently, a visit to the hair salon causes your existing authentication to expire, and some users have found that shaving their heads is the only way to avoid bad hair days becoming bad login days. Reaction to these ideas After gathering all our notes on these ideas from the engineering brainstorming meeting, we took them in to present to our management. Unfortunately, most of their reaction cannot be printed here, and they chose not to accept any of these ideas as they were, but they did have some feedback for us to consider as they sent us back to the drawing board. They strongly suggested our ideas would be better presented if we weren't trying to decipher ink blotches that had been smeared by the condensation when we put our pint glasses on the napkins we were taking notes on, and to that end let us know they would not be approving any more engineering offsites in Irish themed pubs on the Friday of a Saint Patrick's Day weekend. (Hopefully they mean that situation specifically and aren't going to deny the funding for travel to this year's X.Org Developer's Conference just because it happens to be in Bavaria and ending on the Friday of the weekend Oktoberfest starts.) They recommended our research techniques could be improved over just sitting around reading blogs and checking our Facebook, Twitter, and Pinterest accounts, such as considering input from alternate viewpoints on topics such as gamification. They also mentioned that Oracle hadn't fully adopted some of Sun's common practices and we might have to try harder to get those to be accepted now that we are one unified company. So as I said at the beginning, don't pester your sales rep just yet for any of these, since they didn't get approved, but if you have better ideas, pass them on and maybe they'll get into our next batch of planning.

    Read the article

  • Is there any difference between storing textures and baked lighting for environment meshes?

    - by Ben Hymers
    I assume that when texturing environments, one or several textures will be used, and the UVs of the environment geometry will likely overlap on these textures, so that e.g. a tiling brick texture can be used by many parts of the environment, rather than UV unwrapping the entire thing, and having several areas of the texture be identical. If my assumption is wrong, please let me know! Now, when thinking about baking lighting, clearly this can't be done the same way - lighting in general will be unique to every face so the environment must be UV unwrapped without overlap, and lighting must be baked onto unique areas of one or several textures, to give each surface its own texture space to store its lighting. My questions are: Have I got this wrong? If so, how? Isn't baking lighting going to use a lot of texture space? Will the geometry need two UV sets, one used for the colour/normal texture and one for the lighting texture? Anything else you'd like to add? :)

    Read the article

  • How do I list installed software with the installed size?

    - by Lewis Goddard
    I would like to have a list the installed software on my machine, with the disk space consumed by them alongside. I would prefer to be able to order by largest/smallest, but that is not a necessity. I am the sort of person who will install software to try it, and never clean up after myself. As a result, my 7GB (Windows and my Data are on separate partitions, as well as a swap area) Ubuntu 11.04 partition is suffering, and has started regularly showing warning messages. I have cleaned my browser cache, as well as everything under Package Cleaner in Ubuntu Tweak, and am left with 149.81 MB off free space.

    Read the article

  • This Week in Geek History: Zelda Turns 25, Birth of the Printing Press, and the Unveiling of ENIAC

    - by Jason Fitzpatrick
    Every week we bring you interesting highlights from the history of geekdom. This week we take a look at The Legend of Zelda’s 25th anniversary, the Gutenberg press, and the unveiling of primitive super computer ENIAC. Latest Features How-To Geek ETC Should You Delete Windows 7 Service Pack Backup Files to Save Space? What Can Super Mario Teach Us About Graphics Technology? Windows 7 Service Pack 1 is Released: But Should You Install It? How To Make Hundreds of Complex Photo Edits in Seconds With Photoshop Actions How to Enable User-Specific Wireless Networks in Windows 7 How to Use Google Chrome as Your Default PDF Reader (the Easy Way) Reclaim Vertical UI Space by Moving Your Tabs to the Side in Firefox Wind and Water: Puzzle Battles – An Awesome Game for Linux and Windows How Star Wars Changed the World [Infographic] Tabs Visual Manager Adds Thumbnailed Tab Switching to Chrome Daisies and Rye Swaying in the Summer Wind Wallpaper Read On Phone Pushes Data from Your Desktop to the Appropriate Android App

    Read the article

  • Error java.lang.OutOfMemoryError: getNewTla using Oracle EPM products

    - by Marc Schumacher
    Running into a Java out of memory error, it is very common behaviour in the field that the Java heap size will be increased. While this might help to solve a heap space out of memory error, it might not help to fix an out of memory error for the Thread Local Area (TLA). Increasing the available heap space from 1 GB to 16 GB might not even help in this situation. The Thread Local Area (TLA) is part of the Java heap, but as the name already indicates, this memory area is local to a specific thread so there is no need to synchronize with other threads using this memory area. For optimization purposes the TLA size is configurable using the Java command line option “-XXtlasize”. Depending on the JRockit version and the available Java heap, the default values vary. Using Oracle EPM System (mainly 11.1.2.x) the following setting was tested successfully: -XXtlasize:min=8k,preferred=128k More information about the “-XXtlasize” parameter can be found in the JRockit documentation: http://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionXX.html

    Read the article

  • Time/duration handling in strategic game

    - by borg
    I'm considering developing a space opera game, having already done some game design. Technically, though, I'm coming from a business applications background. Hence I don't really know how I should handle time and duration. Let's state the matter clearly: what if something is bound to happen in 5 hours and on which other events depend. For example the arrival of some space ship in some system where some defense systems are present, hence a fight would start. Should I use some kind of scheduler (like Quartz in my java land) to trigger the corresponding event when due (I plan to use events for communication)? Something else?

    Read the article

  • Compute directional light frustum from view furstum points and light direction

    - by Fabian
    I'm working on a friends engine project and my task is to construct a new frustum from the light direction that overlaps the view frustum and possible shadow casters. The project already has a function that creates a frustum for this but its way to big and includes way to many casters (shadows) which can't be seen in the view frustum. Now the only parameter of this function are the normalized light direction vector and a view class which lets me extract the 8 view frustum points in world space. I don't have any additional infos about the scene. I have read some of the related Questions here but non seem to fit very well to my problem as they often just point to cascaded shadow maps. Sadly i can't use DX or openGl functions directly because this engine has a dedicated math library. From what i've read so far the steps are: Transform view frustum points into light space and find min/max x and y values (or sometimes minima and maxima of all three axis) and create a AABB using the min/max vectors. But what comes after this step? How do i transform this new AABB back to world space? What i've done so far: CVector3 Points[8], MinLight = CVector3(FLT_MAX), MaxLight = CVector3(FLT_MAX); for(int i = 0; i<8;++i){ Points[i] = Points[i] * WorldToShadowMapMatrix; MinLight = Math::Min(Points[i],MinLight); MaxLight = Math::Max(Points[i],MaxLight); } AABox box(MinLight,MaxLight); I don't think this is the right way to do it. The near plain probably has to extend into the direction of the light source to include potentional shadow casters. I've read the Microsoft article about cascaded shadow maps http://msdn.microsoft.com/en-us/library/windows/desktop/ee416307%28v=vs.85%29.aspx which also includes some sample code. But they seem to use the scenes AABB to determine the near and far plane which I can't since i cant access this information from the funtion I'm working in. Could you guys please link some example code which shows the calculation of such frustum? Thanks in advance! Additional questio: is there a way to construct a WorldToFrustum matrix that represents the above transformation?

    Read the article

  • Live Meeting error: malformed email address... or IS IT???

    - by PeterBrunone
    During a remote SharePoint training session this morning, Live Meeting presented one of our instructors with the following gem:  "An attendee email address is malformed".  This was particularly troubling since a wizard took care of adding all the entries, and they looked correct (even after being sifted through my character analysis tool).As it turns out, the addresses were indeed correct.  As sometimes happens, though, at the line breaks, it looked like there was no space between the semicolon and the following email address.  Since I'm a member in good standing of the "I wonder what this button does" school of thought, I added an extra space after each of these cramped little semicolons -- and the invitation executed flawlessly.Coincidence?  Maybe... but you can bet I'm going to keep trying dumb stuff like that when the error message doesn't make sense.  Think of it as the tech support equivalent of "Ask a silly question..."

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #037

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Convert Text to Numbers (Integer) – CAST and CONVERT If table column is VARCHAR and has all the numeric values in it, it can be retrieved as Integer using CAST or CONVERT function. List All Stored Procedure Modified in Last N Days If SQL Server suddenly start behaving in un-expectable behavior and if stored procedure were changed recently, following script can be used to check recently modified stored procedure. If a stored procedure was created but never modified afterwards modified date and create a date for that stored procedure are same. Count Duplicate Records – Rows Validate Field For DATE datatype using function ISDATE() We always checked DATETIME field for incorrect data type. One of the user input date as 30/2/2007. The date was sucessfully inserted in the temp table but while inserting from temp table to final table it crashed with error. We had now task to validate incorrect date value before we insert in final table. Jr. Developer asked me how can he do that? We check for incorrect data type (varchar, int, NULL) but this is incorrect date value. Regular expression works fine with them because of mm/dd/yyyy format. 2008 Find Space Used For Any Particular Table It is very simple to find out the space used by any table in the database. Two Convenient Features Inline Assignment – Inline Operations Here is the script which does both – Inline Assignment and Inline Operation DECLARE @idx INT = 0 SET @idx+=1 SELECT @idx Introduction to SPARSE Columns SPARSE column are better at managing NULL and ZERO values in SQL Server. It does not take any space in database at all. If column is created with SPARSE clause with it and it contains ZERO or NULL it will be take lesser space then regular column (without SPARSE clause). SP_CONFIGURE – Displays or Changes Global Configuration Settings If advanced settings are not enabled at configuration level SQL Server will not let user change the advanced features on server. Authorized user can turn on or turn off advance settings. 2009 Standby Servers and Types of Standby Servers Standby Server is a type of server that can be brought online in a situation when Primary Server goes offline and application needs continuous (high) availability of the server. There is always a need to set up a mechanism where data and objects from primary server are moved to secondary (standby) server. BLOB – Pointer to Image, Image in Database, FILESTREAM Storage When it comes to storing images in database there are two common methods. I had previously blogged about the same subject on my visit to Toronto. With SQL Server 2008, we have a new method of FILESTREAM storage. However, the answer on when to use FILESTREAM and when to use other methods is still vague in community. 2010 Upper Case Shortcut SQL Server Management Studio I select the word and hit CTRL+SHIFT+U and it SSMS immediately changes the case of the selected word. Similar way if one want to convert cases to lower case, another short cut CTRL+SHIFT+L is also available. The Self Join – Inner Join and Outer Join Self Join has always been a noteworthy case. It is interesting to ask questions about self join in a room full of developers. I often ask – if there are three kinds of joins, i.e.- Inner Join, Outer Join and Cross Join; what type of join is Self Join? The usual answer is that it is an Inner Join. However, the reality is very different. Parallelism – Row per Processor – Row per Thread – Thread 0  If you look carefully in the Properties window or XML Plan, there is “Thread 0?. What does this “Thread 0” indicate? Well find out from the blog post. How do I Learn and How do I Teach The blog post has raised three very interesting questions. How do you learn? How do you teach? What are you learning or teaching? Let me try to answer the same. 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 7 of 31 What are Different Types of Locks? What are Pessimistic Lock and Optimistic Lock? When is the use of UPDATE_STATISTICS command? What is the Difference between a HAVING clause and a WHERE clause? What is Connection Pooling and why it is Used? What are the Properties and Different Types of Sub-Queries? What are the Authentication Modes in SQL Server? How can it be Changed? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 8 of 31 Which Command using Query Analyzer will give you the Version of SQL Server and Operating System? What is an SQL Server Agent? Can a Stored Procedure call itself or a Recursive Stored Procedure? How many levels of SP nesting is possible? What is Log Shipping? Name 3 ways to get an Accurate Count of the Number of Records in a Table? What does it mean to have QUOTED_IDENTIFIER ON? What are the Implications of having it OFF? What is the Difference between a Local and a Global Temporary Table? What is the STUFF Function and How Does it Differ from the REPLACE Function? What is PRIMARY KEY? What is UNIQUE KEY Constraint? What is FOREIGN KEY? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 9 of 31 What is CHECK Constraint? What is NOT NULL Constraint? What is the difference between UNION and UNION ALL? What is B-Tree? How to get @@ERROR and @@ROWCOUNT at the Same Time? What is a Scheduled Job or What is a Scheduled Task? What are the Advantages of Using Stored Procedures? What is a Table Called, if it has neither Cluster nor Non-cluster Index? What is it Used for? Can SQL Servers Linked to other Servers like Oracle? What is BCP? When is it Used? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 10 of 31 What Command do we Use to Rename a db, a Table and a Column? What are sp_configure Commands and SET Commands? How to Implement One-to-One, One-to-Many and Many-to-Many Relationships while Designing Tables? What is Difference between Commit and Rollback when Used in Transactions? What is an Execution Plan? When would you Use it? How would you View the Execution Plan? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 11 of 31 What is Difference between Table Aliases and Column Aliases? Do they Affect Performance? What is the difference between CHAR and VARCHAR Datatypes? What is the Difference between VARCHAR and VARCHAR(MAX) Datatypes? What is the Difference between VARCHAR and NVARCHAR datatypes? Which are the Important Points to Note when Multilanguage Data is Stored in a Table? How to Optimize Stored Procedure Optimization? What is SQL Injection? How to Protect Against SQL Injection Attack? How to Find Out the List Schema Name and Table Name for the Database? What is CHECKPOINT Process in the SQL Server? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 12 of 31 How does Using a Separate Hard Drive for Several Database Objects Improves Performance Right Away? How to Find the List of Fixed Hard Drive and Free Space on Server? Why can there be only one Clustered Index and not more than one? What is Difference between Line Feed (\n) and Carriage Return (\r)? Is It Possible to have Clustered Index on Separate Drive From Original Table Location? What is a Hint? How to Delete Duplicate Rows? Why the Trigger Fires Multiple Times in Single Login? 2012 CTRL+SHIFT+] Shortcut to Select Code Between Two Parenthesis Shortcut key is CTRL+SHIFT+]. This key can be very useful when dealing with multiple subqueries, CTE or query with multiple parentheses. When exercised this shortcut key it selects T-SQL code between two parentheses. Monday Morning Puzzle – Query Returns Results Sometimes but Not Always I am beginner with SQL Server. I have one query, it sometime returns a result and sometime it does not return me the result. Where should I start looking for a solution and what kind of information I should send to you so you can help me with solving. I have no clue, please guide me. Remove Debug Button in SSMS – SQL in Sixty Seconds #020 – Video Effect of Case Sensitive Collation on Resultset Collation is a very interesting concept but I quite often see it is heavily neglected. I have seen developer and DBA looking for a workaround to fix collation error rather than understanding if the side effect of the workaround. Switch Between Two Parenthesis using Shortcut CTRL+] Earlier this week I wrote a blog post about CTRL+SHIFT+] Shortcut to Select Code Between Two Parenthesis, I received quite a lot of positive feedback from readers. If you are a regular reader of the blog post, you must be aware that I appreciate the learning shared by readers. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • How can I force X to start in a computer without a monitor?

    - by Javier Rivera
    I have I computer that have no monitor attached. When I boot it X fails to start because there is no monitor detected. If I boot it with a monitor attached and after X is started I remove the monitor everything works fine. Details and Background: This computer is a kind of hardware consolidation server. It's only purpose is to run two Virtual Box VM's that run Windows XP and some important but seldom used (once or twice a month) programs. For a couple of time it has been lying in a corner with an old monitor attached to it and working great. But space in the office was getting scarce and I moved the computer to the server room. There is no monitor attached to it there (no space), and sometimes the computer is rebooted. When it boots without monitor X is not started, the vms don't start and I get called to solve the problem.

    Read the article

  • Week in Geek: Rogue Antivirus Caught Using AVG Logo Edition

    - by Asian Angel
    This week we learned how to quickly cut a clip from a video file with Avidemux, “tile windows, remote control a desktop using an iOS device, taking advantage of Windows 7 libraries”, turn a home Ubuntu PC into a LAMP web server, enable desktop notifications for Gmail in Chrome, “what image channels are and what they mean”, and more Latest Features How-To Geek ETC How to Integrate Dropbox with Pages, Keynote, and Numbers on iPad RGB? CMYK? Alpha? What Are Image Channels and What Do They Mean? How to Recover that Photo, Picture or File You Deleted Accidentally How To Colorize Black and White Vintage Photographs in Photoshop How To Get SSH Command-Line Access to Windows 7 Using Cygwin The How-To Geek Video Guide to Using Windows 7 Speech Recognition Android Notifier Pushes Android Notices to Your Desktop Dead Space 2 Theme for Chrome and Iron Carl Sagan and Halo Reach Mashup – We Humans are Capable of Greatness [Video] Battle the Necromorphs Once Again on Your Desktop with the Dead Space 2 Theme for Windows 7 HTC Home Brings HTC’s Weather Widget to Your Windows Desktop Apps Uninstall Batch Removes Android Applications

    Read the article

  • HTG Explains: How Do Noise Reducing Headphones Work?

    - by YatriTrivedi
    Passive noise reduction, active noise cancellation, sound isolation… The world of headphones has become quite advanced in giving you your own private sound bubble. Here’s how these different technologies work. Latest Features How-To Geek ETC Should You Delete Windows 7 Service Pack Backup Files to Save Space? What Can Super Mario Teach Us About Graphics Technology? Windows 7 Service Pack 1 is Released: But Should You Install It? How To Make Hundreds of Complex Photo Edits in Seconds With Photoshop Actions How to Enable User-Specific Wireless Networks in Windows 7 How to Use Google Chrome as Your Default PDF Reader (the Easy Way) WizMouse Enables Mouse Over Scrolling on Any Window Enhance GIMP’s Image Editing Power with Gimp Paint Studio Reclaim Vertical UI Space by Moving Your Tabs to the Side in Firefox Wind and Water: Puzzle Battles – An Awesome Game for Linux and Windows How Star Wars Changed the World [Infographic] Tabs Visual Manager Adds Thumbnailed Tab Switching to Chrome

    Read the article

  • Determine percentage of screen covered by an object without using frustum culling

    - by Meltac
    On the CPU-side of an 3D first-person / ego perspective game I need to check whether what the players currently sees on screen is the inside of a box object defined by world space coordinates (the player might be outside of that box but on screen sees only/mostly the inside of the box, or vice-versa, looks from within the box to the outside). The "casual" way of performing such a check would incorporate frustum culling but such an approach would be hard to achieve with my given set of engine parameters which I'd like to avoid if there is a simpler way. What I actually have at the point where I would like to do the check (high-level script on CPU, not GPU side): Camera world position Camera direction Camera FOV Two Box corner world coordinates (left-bottom-front, right-top-back) What I do not have right away: View frustrum definition (near/far plane or say 6 planes defining frustum) Any specific pixel information (uv, view space position, depth or the like) What I would like to calculate: Percentage of screen "covered" by box. Any hints on how to perform such calculation?

    Read the article

  • Folder missing in external hard drive

    - by Hans
    I have been backing up my folders, I am using Seagate Expansion portable Drive. I had created a folder called "|o|o" in the root folder of the portable drive... I copied my latest folders into "|o|o" folder to re-install ubuntu. When I open the portable drive the folder |o|o is not visible, when I ctrl+a and check properties the space used is 122GB, however when I click on the drive to view properties of the drive used space is 260GB. It looks as if the folder is there in the portable drive but I cannot access it... I have tried to view all the hidden files and "|o|o" is still not there. I am using 12.04.Can you please help me to retrieve this folder.

    Read the article

  • Cocos2d and Body with few collision shapes using chipmunk

    - by Eimantas
    I'm using Cocos2d (0.99.5) with chipmunk physics engine. Currently I'm trying to place a body into space which is combined from few circle shapes. Let's say I have a corresponding sprite image with displays atom (nucleus + 3 electrons around it. Something like this without orbit lines). In it's simplest form - only one circle shape at the center should be enough which would detect collisions from other objects with nucleus. Now I'd like to add other circle shapes for each electron. How can I do that? Now when I add those shapes to the body and add the body into chipmunk space - the shapes (together with the body/sprite) start flickering and spinning with no recognizable pattern (or reason for that matter).

    Read the article

  • Using machine learning to aim mirrors in a solar array?

    - by Buttons840
    I've been thinking about solar collectors where several independent mirrors to focus the light on a solar collector, similar to the following design from Energy Innovations. Because there will be flaws in the assembly of this solar array, I am proceeding with the following assumptions (or lack thereof): The software knows the "position" of each mirror, but doesn't know how this position relates to the real world or to other mirrors. This will account for poor mirror calibration or other environmental factors which may effect one mirror but not the others. If a mirror moves 10 units in one direction, and then 10 units in the opposite direction, it will end up where it originally started. I would like to use machine learning to position the mirrors correctly and focus the light on the collector. I expect I would approach this as an optimization problem, optimizing the mirror positions to maximize the heat inside the collector and the power output. The problem is finding a small target in a noisy high-dimensional space (considering each mirror has 2 axis of rotation). Some of the problems I anticipate are: cloudy days, even if you stumble upon the perfect mirror alignment, it might be cloudy at the time noisy sensor data the sun is a moving target, it moves along a path, and follows a different path every day - although you could calculate the exact position of the sun at any time, you wouldn't know how that position relates to your mirrors My question isn't about the solar array, but possible machine learning techniques that would help in this "small target in a noisy high dimensional-space" problem. I mentioned the solar array because it was the catalyst for this question and a good example. What machine learning techniques can find such a small target in a noisy high-dimensional space? EDIT: A few additional thoughts: Yes, you can calculate the suns position in the real world, but you don't know how the mirrors position is related to the real world (unless you've learned it somehow). You might know the suns azimuth is 220 degrees, and the suns elevation is 60 degrees, and you might know a mirror is at position (-20, 42); now tell me, is that mirror correctly aligned with the sun? You don't know. Lets assume you have some very sophisticated heat measurements, and you know "with this heat level, there must be 2 mirrors correctly aligned". Now the question is, which two mirrors (out of 25 or more) are correctly aligned? One solution I considered was to approximate the correct "alignment function" using a neural network which would take the suns azimuth and elevation as input and output a large array with 2 values for each mirror which correspond to the 2 axis of each mirror. I'm not sure what the best training method is though.

    Read the article

  • Broken package after update: linux-headers, error brokencount >0

    - by escozul
    Ubuntu 12.04. After an update, I get a red warning icon in the system tray, warning about an error: broken count 0 Opening Update manager, I see that the broken package is linux-headers-3.2.0-33-generic-pae (new install) Specificaly I have my ubuntu on an AspireOne with 8gb internal storage. I tried apt-get clean as suggested in another question on this site, and tried reinstalling the package in Synaptic. I have tried to reboot but to no avail. I have also tried apt-get install --fix-broken and I get the following: sudo apt-get install --fix-broken [sudo] password for elina: Reading package lists... Done Building dependency tree Reading state information... Done Correcting dependencies... Done The following extra packages will be installed: linux-headers-3.2.0-33-generic-pae The following NEW packages will be installed: linux-headers-3.2.0-33-generic-pae 0 upgraded, 1 newly installed, 0 to remove and 38 not upgraded. 1 not fully installed or removed. Need to get 0 B/977 kB of archives. After this operation 11,3 MB of additional disk space will be used. Do you want to continue [Y/n]; y (Reading database ... 437051 files and directories currently installed.) Unpacking linux-headers-3.2.0-33-generic-pae (from .../linux-headers-3.2.0-33-generic-pae_3.2.0-33.52_i386.deb) ... dpkg: error processing /var/cache/apt/archives/linux-headers-3.2.0-33-generic-pae_3.2.0-33.52_i386.deb (--unpack): unable to create `/usr/src/linux-headers-3.2.0-33-generic-pae/include/config/usb/gspca/sonixb.h.dpkg-new' (while processing `./usr/src/linux-headers-3.2.0-33-generic-pae/include/config/usb/gspca/sonixb.h'): No space left on device No apport report written because the error message indicates a disk full error dpkg-deb: error: subprocess paste was killed by signal (Broken pipe) Errors were encountered while processing: /var/cache/apt/archives/linux-headers-3.2.0-33-generic-pae_3.2.0-33.52_i386.deb E: Sub-process /usr/bin/dpkg returned an error code (1) I've tried all suggestions I could find: sudo apt-get clean sudo apt-get autoclean sudo apt-get autoremove sudo apt-get update sudo apt-get upgrade sudo apt-get -f install sudo apt-get install --fix-broken Then I saw that on the error there was a mention about free space. So I did a df -h and the result was: Filesystem Size Used Avail Use% Mounted on /dev/sda1 7,0G 5,5G 1,1G 84% / udev 235M 4,0K 235M 1% /dev tmpfs 97M 816K 96M 1% /run none 5,0M 0 5,0M 0% /run/lock none 242M 352K 242M 1% /run/shm I see that on my root folder I have 1.1Gb free. The broken package is linux-headers-3.2.0-33-generic-pae_3.2.0-33.52_i386.deb which only takes up 11.3Mb on my hard drive. I'm soooo lost. I really hope there is something I'm missing here. I don't want to go about reformatting this bucket. It's really not worth the time. Any help for fixing this would be hot.

    Read the article

< Previous Page | 92 93 94 95 96 97 98 99 100 101 102 103  | Next Page >