Search Results

Search found 1783 results on 72 pages for 'computation theory'.

Page 15/72 | < Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22  | Next Page >

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Gracefully terminate a request based service on server

    - by Jatin
    In our web application, for each http-request there is a lot of computation that happens on back end. Output can vary from 10 sec - 1 Hour. In the mean time when it is computed, "Waiting.." is shown on the website for the respective user. But it so happens, that a user might cut down the service in between. So what all can be done on the back end so that the computation can be stopped in between to save resources? What different tactics can be applied here? And if better (instead of killing the thread directly), then a graceful termination policy should make wonders.

    Read the article

  • Result class dependency

    - by Stefano Borini
    I have an object containing the results of a computation. This computation is performed in a function which accepts an input object and returns the result object. The result object has a print method. This print method must print out the results, but in order to perform this operation I need the original input object. I cannot pass the input object at printing because it would violate the signature of the print function. One solution I am using right now is to have the result object hold a pointer to the original input object, but I don't like this dependency between the two, because the input object is mutable. How would you design for such case ?

    Read the article

  • Is there such a concept as "pseudo implementation" in software development?

    - by MachuPichu
    I'm looking for a label to describe the practice of using human-based computation methods or other means of "faking" an algorithm for the sake of getting a product or demo off the ground quickly without spending the time to develop an technical/scalable/analytical solution? Eg: using Amazon Turk to count the number of empty tables in a restaurant. I'm also looking to learn more about this subject, but not sure what to search for. Human-based computation is only one method, I'm interested in the general idea of pseudo-implementation. Any ideas, recommended reading? Thanks

    Read the article

  • LinearLayout - How to get text to be on the right of an icon?

    - by RED_
    Hi there, Bit of a newbie when it comes to android, only been working on it properly for a few days but even after all the searching I've done im stumped and nobody seems to know how to help me. I have this so far: http://img263.imageshack.us/i/sellscreen.jpg How can I move the text to be besides each icon rather than underneath it? Hoping the gallery won't be moved either. Here is the code i have: <?xml version="1.0" encoding="utf-8"?> <ScrollView xmlns:android="http://schemas.android.com/apk/res/android" android:id="@+id/scroller" android:layout_width="fill_parent" android:layout_height="fill_parent" android:fillViewport="true" > <LinearLayout xmlns:android="http://schemas.android.com/apk/res/android" android:orientation="vertical" android:layout_width="fill_parent" android:layout_height="fill_parent" > <Gallery xmlns:android="http://schemas.android.com/apk/res/android" android:id="@+id/gallery" android:layout_width="fill_parent" android:layout_height="wrap_content"/> <ImageView android:id="@+id/test_image" android:src="@drawable/icon" android:layout_width="wrap_content" android:layout_height="wrap_content"/> <TextView android:layout_width="fill_parent" android:layout_height="wrap_content" android:text="The offcial UK driving theory test application. Over 190 questions." /> <ImageView android:id="@+id/test_image" android:src="@drawable/icon" android:layout_width="wrap_content" android:layout_height="wrap_content"/> <TextView android:layout_width="fill_parent" android:layout_height="wrap_content" android:text="The offcial UK driving theory test application. Over 190 questions."/> <ImageView android:id="@+id/test_image" android:src="@drawable/icon" android:layout_width="wrap_content" android:layout_height="wrap_content"/> <TextView android:layout_width="fill_parent" android:layout_height="wrap_content" android:text="The offcial UK driving theory test application. Over 190 questions."/> <ImageView android:id="@+id/test_image" android:src="@drawable/icon" android:layout_width="wrap_content" android:layout_height="wrap_content"/> <TextView android:layout_width="fill_parent" android:layout_height="wrap_content" android:text="The offcial UK driving theory test application. Over 190 questions."/> <ImageView android:id="@+id/test_image" android:src="@drawable/icon" android:layout_width="wrap_content" android:layout_height="wrap_content" /> <TextView android:layout_width="fill_parent" android:layout_height="wrap_content" android:text="The offcial UK driving theory test application. Over 190 questions." /> </LinearLayout> </ScrollView> Top half of my code doesn't seem to be showing for some reason but it's just the opening of the linear layout. I will be forever grateful to anyone that can help, i've been racking my brains for days and getting nowhere. Really getting stressed out by it. Thanks in advance!!

    Read the article

  • Perl: Printing without a newline

    - by synapz
    I have a computationally expensive task in perl, and would like to inform the user that computation is ongoing by printing out a period after each portion of the computation is completed. Unfortunately, until I print a "\n", none of my periods are printed. How can I address this?

    Read the article

  • Do you think functional language is good for applications that have a lot of business rules but very

    - by StackUnderflow
    I am convinced that functional programming is an excellent choice when it comes to applications that require a lot of computation (data mining, AI, nlp etc). But is it wise to use functional programming for a typical enterprise application where there are a lot of business rules but not much in terms of computation? Please disregard the fact that there are very few people using functional programming and that it's kind of tough. Thanks

    Read the article

  • How do I unit test the methods in a method object?

    - by Sancho
    I've performed the "Replace Method with Method Object" refactoring described by Beck. Now, I have a class with a "run()" method and a bunch of member functions that decompose the computation into smaller units. How do I test those member functions? My first idea is that my unit tests be basically copies of the "run()" method (with different initializations), but with assertions between each call to the member functions to check the state of the computation. (I'm using Python and the unittest module.)

    Read the article

  • LaTeX printing only first two pages of a document

    - by Peter Flom
    I am working in LaTeX, and when I create a pdf file (using LaTeX button or pdfLaTeX button or using yap) the pdf has only the first two pages. No errors. It just stops. If I make the first page longer by adding text, it still stops at end of 2nd page. Any ideas? OK, responding to first comment, here is the code \documentclass{article} \title{Outline of Book} \author{Peter L. Flom} \begin{document} \maketitle \section*{Preface} \subsection*{Audience} \subsection*{What makes this book different?} \subsection*{Necessary background} \subsection*{How to read this book} \section{Introduction} \subsection{The purpose of logistic regression} \subsection{The need for logistic regression} \subsection{Types of logistic regression} \section{General issues in logistic regression} \subsection{Transforming independent and dependent variables} \subsection{Interactions} \subsection{Model selection} \subsection{Parameter estimates, confidence intervals, p values} \subsection{Summary and further reading} \section{Dichotomous logistic regression} \subsection{Introduction, theory, examples} \subsection{Exploratory plots and analysis} \subsection{Basic model fitting} \subsection{Advanced and special issues in model fitting} \subsection{Diagnostic and descriptive plots and analysis} \subsection{Traps and gotchas} \subsection{Power analysis} \subsection{Summary and further reading} \subsection{Exercises} \section{Ordinal logistic regression} \subsection{Introduction, theory, examples} \subsubsection{Introduction - what are ordinal variables?} \subsubsection{Theory of the model} \subsubsection{Examples for this chapter} \subsection{Exploratory plots and analysis} \subsection{Basic model fitting} \subsection{Advanced and special issues in model fitting} \subsection{Diagnostic and descriptive plots and analysis} \subsection{Traps and gotchas} \subsection{Power analysis} \subsection{Summary and further reading} \subsection{Exercises} \section{Multinomial logistic regression} \subsection{Introduction, theory, examples} \subsection{Exploratory plots and analysis} \subsection{Basic model fitting} \subsection{Advanced and special issues in model fitting} \subsection{Diagnostic and descriptive plots and analysis} \subsection{Traps and gotchas} \subsection{Power analysis} \subsection{Summary and further reading} \subsection{Exercises} \section{Choosing a model} \subsection{NOIR and its problems} \subsection{Linear vs. ordinal} \subsection{Ordinal vs. multinomial} \subsection{Summary and further reading} \subsection{Exercises} \section{Extensions and related models} \subsection{Other logistic models} \subsection{Multilevel models - PROC NLMIXED and GLIMMIX} \subsection{Loglinear models - PROC CATMOD} \section{Summary} \end{document} thanks Peter

    Read the article

  • Specification, modeling and programming are principially the same, right?

    - by Gabriel Šcerbák
    In formal specifications based on abstract algebraic types and equational theory you use formulas of equational theory to specify theory. System which will satisfy those constraints is called in formal logic a model. Modeling is process of creating a model, which abstracts of some aspects, which are unnecessary details for a specific case. So concrete system has to adhere to created model in observed aspects. Programming is a process of creating a program which will have specific behaviour - will perform specific algorithms - and programming languages through different paradigms enable us to think in a certain specific way, which abstracts of some details, usually machine specific ones. So could we be doing all those things at the same time, because they are principially the same? Is declarative programming the nearest attempt to do that? Could we use some sort f programming languages which will be good for programming as well as for modeling and specification?

    Read the article

  • F# Objects &ndash; Integration with the other .Net Languages &ndash; Part 2

    - by MarkPearl
    So in part one of my posting I covered the real basics of object creation. Today I will hopefully dig a little deeper… My expert F# book brings up an interesting point – properties in F# are just syntactic sugar for method calls. This makes sense… for instance assume I had the following object with the property exposed called Firstname. type Person(Firstname : string, Lastname : string) = member v.Firstname = Firstname I could extend the Firstname property with the following code and everything would be hunky dory… type Person(Firstname : string, Lastname : string) = member v.Firstname = Console.WriteLine("Side Effect") Firstname   All that this would do is each time I use the property Firstname, I would see the side effect printed to the screen saying “Side Effect”. Member methods have a very similar look & feel to properties, in fact the only difference really is that you declare that parameters are being passed in. type Person(Firstname : string, Lastname : string) = member v.FullName(middleName) = Firstname + " " + middleName + " " + Lastname   In the code above, FullName requires the parameter middleName, and if viewed from another project in C# would show as a method and not a property. Precomputation Optimizations Okay, so something that is obvious once you think of it but that poses an interesting side effect of mutable value holders is pre-computation of results. All it is, is a slight difference in code but can result in quite a huge saving in performance. Basically pre-computation means you would not need to compute a value every time a method is called – but could perform the computation at the creation of the object (I hope I have got it right). In a way I battle to differentiate this from lazy evaluation but I will show an example to explain the principle. Let me try and show an example to illustrate the principle… assume the following F# module namespace myNamespace open System module myMod = let Add val1 val2 = Console.WriteLine("Compute") val1 + val2 type MathPrecompute(val1 : int, val2 : int) = let precomputedsum = Add val1 val2 member v.Sum = precomputedsum type MathNormalCompute(val1 : int, val2 : int) = member v.Sum = Add val1 val2 Now assume you have a C# console app that makes use of the objects with code similar to the following… using System; using myNamespace; namespace CSharpTest { class Program { static void Main(string[] args) { Console.WriteLine("Constructing Objects"); var myObj1 = new myMod.MathNormalCompute(10, 11); var myObj2 = new myMod.MathPrecompute(10, 11); Console.WriteLine(""); Console.WriteLine("Normal Compute Sum..."); Console.WriteLine(myObj1.Sum); Console.WriteLine(myObj1.Sum); Console.WriteLine(myObj1.Sum); Console.WriteLine(""); Console.WriteLine("Pre Compute Sum..."); Console.WriteLine(myObj2.Sum); Console.WriteLine(myObj2.Sum); Console.WriteLine(myObj2.Sum); Console.ReadKey(); } } } The output when running the console application would be as follows…. You will notice with the normal compute object that the system would call the Add function every time the method was called. With the Precompute object it only called the compute method when the object was created. Subtle, but something that could lead to major performance benefits. So… this post has gone off in a slight tangent but still related to F# objects.

    Read the article

  • Program structure in long running data processing python script

    - by fmark
    For my current job I am writing some long-running (think hours to days) scripts that do CPU intensive data-processing. The program flow is very simple - it proceeds into the main loop, completes the main loop, saves output and terminates: The basic structure of my programs tends to be like so: <import statements> <constant declarations> <misc function declarations> def main(): for blah in blahs(): <lots of local variables> <lots of tightly coupled computation> for something in somethings(): <lots more local variables> <lots more computation> <etc., etc.> <save results> if __name__ == "__main__": main() This gets unmanageable quickly, so I want to refactor it into something more manageable. I want to make this more maintainable, without sacrificing execution speed. Each chuck of code relies on a large number of variables however, so refactoring parts of the computation out to functions would make parameters list grow out of hand very quickly. Should I put this sort of code into a python class, and change the local variables into class variables? It doesn't make a great deal of sense tp me conceptually to turn the program into a class, as the class would never be reused, and only one instance would ever be created per instance. What is the best practice structure for this kind of program? I am using python but the question is relatively language-agnostic, assuming a modern object-oriented language features.

    Read the article

  • NTFS Issues in Windows 7 and 2008 R2 - 'Is it a Bug?'

    - by renewieldraaijer
    I have been using the various versions of the Microsoft Windows product line since NT4 and I really thought I knew the ins and outs about the NTFS filesystem by now. There were always a few rules of thumb to understand what happens if you move data around. These rules were: "If you copy data, the copied data will inherit the permissions of the location it is being copied to. The same goes for moving data between disk partitions. Only when you move data within the same partition, the permissions are kept."  Recently I was asked to assist in troubleshooting some NTFS related issues. This forced me to have another good look at this theory. To my surprise I found out that this theory does not completely stand anymore. Apparently some things have changed since the release of Windows Vista / Windows 2008. Since the release of these Operating Systems, a move within the same disk partition results in the data inheriting the permissions of the location it is being copied into. A major change in the NTFS filesystem you would think!  Not quite! The above only counts when the move operation is being performed by using Windows Explorer. A move by using the 'move' command from within a cmd prompt for example, retains the NTFS permissions, just like before in Windows XP and older systems. Conclusion: The Windows Explorer is responsible for changing the ACL's of the moved data. This is a remarkable change, but if you follow this theory, the resulting ACL after a move operation is still predictable.  We could say that since Windows Vista and Windows 2008, a new rule set applies: "If you copy data, the copied data will inherit the permissions of the location it is being copied to. Same goes for moving data between disk partitions and within disk partitions. Only when you move data within the same partition by using something else than the Windows Explorer, the permissions are kept." The above behavior should be unchanged in Windows 7 / Windows 2008 R2, compared to Windows Vista / 2008. But somehow the NTFS permissions are not so predictable in Windows 7 and Windows 2008 R2. Moving data within the same disk partition the one time results in the permissions being kept and the next time results in inherited permissions from the destination location. I will try to demonstrate this in a few examples: Example 1 (Incorrect behavior): Consider two folders, 'Folder A' and 'Folder B' with the following permissions configured.                    Now we create the test file 'test file 1.txt' in 'Folder A' and afterwards move this file to 'Folder B' using Windows Explorer.                       According to the new theory, the file should inherit the permissions of 'Folder B' and therefore 'Group B' should appear in the ACL of 'test file 1.txt'. In the screenshot below the resulting permissions are displayed. The permissions from the originating location are kept, while the permissions of 'Folder B' should be inherited.                   Example 2 (Correct behavior): Again, consider the same two folders. This time we make a small modification to the ACL of 'Folder A'. We add 'Group C' to the ACL and again we create a file in 'Folder A' which we name 'test file 2.txt'.                    Next, we move 'test file 2.txt' to 'Folder B'.                       Again, we check the permissions of 'test file 2.txt' at the target location. We can now see that the permissions are inherited. This is what should be happening, and can be considered 'correct behavior' for Windows Vista / 2008 / 7 / 2008 R2. It remains uncertain why this behavior is so inconsistent. At this time, this is under investigation with Microsoft Support. The investigation has been going for the last two weeks and it is beginning to look like there is no rational reason for this, other than a bug in the Windows Explorer in Windows 7 and 2008 R2. As soon as there is any certainty on this, I will note it here in this blog.                   The examples above are harmless tests, by using my own laptop. If you would create the same set of folders and groups, and configure exactly the same permissions, you will see exactly the same behavior. Be sure to use Windows 7 or Windows 2008 R2.   Initially the problem arose at a customer site where move operations on data on the fileserver by users would result in unpredictable results. This resulted in the wrong set of people having àccess permissions on data that they should not have permissions to. Off course this is something we want to prevent at all costs.   I have also done several tests with move operations by using the move command in a cmd prompt. This way the behavior is always consistent. The inconsistent behavior is only exposed when using the Windows Explorer to initiate the move operation, and only when using Windows 7 or Windows 2008 R2 systems. It is evident that this behavior changes when the ACL of a folder has been changed, for example by adding an extra entry. The reason for this remains uncertain though. To be continued…. A dutch version of this post can be found at: http://blogs.platani.nl/?p=612

    Read the article

  • Transitioning from Oracle based CMS to MySQL based CMS

    - by KM01
    We're looking at a replacement for our CMS which runs on Oracle. The new CMSes that we've looked at can in theory run on Oracle, but most of the vendor's installs run off of MySQL vendor supports install of their CMS on MySQL, and a "theoretical" install on Oracle the vendor's dev shops use MySQL none of them develop/test against Oracle Our DBA team works exclusively with Oracle, and doesn't have the bandwidth to provide additional support for a highly available and performing MySQL setup. They could in theory go to training and get ramped up, but our time line is also short (surprise!). So ... I guess my question(s) are: If you've seen a situation like this, how have you dealt with it? What tipped the balance either way? What type of effort did it take? If you're to do it over, what would you do differently ... ? Thanks! KM

    Read the article

  • Data Modeling Resources

    - by Dejan Sarka
    You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to. Books Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly. Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders. Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data. Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends. Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014. Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model. Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data. Dejan Sarka, Matija Lah, Grega Jerkic: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation. Courses Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight. Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries. Forthcoming presentations SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

    Read the article

  • Relationship between TDD and Software Architecture/Design

    - by Christopher Francisco
    I'm new to TDD and have been reading the theory since applying it is more complicated than it sounds when you're learning by yourself. As far as I know, the objective is to write test cases for each requirement and run the test so it fails (to prevent a false positive). Afterward, you should write the minimum amount of code that can pass the test and move to the next one. That being said, is it true that you get a fast development, but what about the code itself? this theory makes me think you are not considering things like abstraction, delegation of responsibilities, design patterns, architecture and others since you're just writing "the minimum amount of code that can pass the test". I know I'm probably wrong because if this were true, we'd have a lot of crappy developers with poor software architecture and documentation so I'm asking for a guide here, what's the relationship between TDD and Software Architecture/Design?

    Read the article

  • Will high reputation in Programmers help to get a good job?

    - by Lorenzo
    In reference to this question, do you think that having a high reputation on this site will help to get a good job? Aside silly and humorous questions, on Programmers we can see a lot of high quality theory questions. I think that, if Stack Overflow will eventually evolve in "strictly programming related" (which usually is "strictly coding related"), the questions on Programmers will be much more interesting and meaningful ("Stack Overflow" = "I have this specific coding/implementation issue"; "Programmers" = "Best practices, team shaping, paradigms, CS theory"). So could high reputation on this site help (or at least be a good reference)? And then, more o less than Stack Overflow?

    Read the article

  • The Joy Of Hex

    - by Jim Giercyk
    While working on a mainframe integration project, it occurred to me that some basic computer concepts are slipping into obscurity. For example, just about anyone can tell you that a 64-bit processor is faster than a 32-bit processer. A grade school child could tell you that a computer “speaks” in ‘1’s and ‘0’s. Some people can even tell you that there are 8 bits in a byte. However, I have found that even the most seasoned developers often can’t explain the theory behind those statements. That is not a knock on programmers; in the age of IntelliSense, what reason do we have to work with data at the bit level? Many computer theory classes treat bit-level programming as a thing of the past, no longer necessary now that storage space is plentiful. The trouble with that mindset is that the world is full of legacy systems that run programs written in the 1970’s.  Today our jobs require us to extract data from those systems, regardless of the format, and that often involves low-level programming. Because it seems knowledge of the low-level concepts is waning in recent times, I thought a review would be in order.       CHARACTER: See Spot Run HEX: 53 65 65 20 53 70 6F 74 20 52 75 6E DECIMAL: 83 101 101 32 83 112 111 116 32 82 117 110 BINARY: 01010011 01100101 01100101 00100000 01010011 01110000 01101111 01110100 00100000 01010010 01110101 01101110 In this example, I have broken down the words “See Spot Run” to a level computers can understand – machine language.     CHARACTER:  The character level is what is rendered by the computer.  A “Character Set” or “Code Page” contains 256 characters, both printable and unprintable.  Each character represents 1 BYTE of data.  For example, the character string “See Spot Run” is 12 Bytes long, exclusive of the quotation marks.  Remember, a SPACE is an unprintable character, but it still requires a byte.  In the example I have used the default Windows character set, ASCII, which you can see here:  http://www.asciitable.com/ HEX:  Hex is short for hexadecimal, or Base 16.  Humans are comfortable thinking in base ten, perhaps because they have 10 fingers and 10 toes; fingers and toes are called digits, so it’s not much of a stretch.  Computers think in Base 16, with numeric values ranging from zero to fifteen, or 0 – F.  Each decimal place has a possible 16 values as opposed to a possible 10 values in base 10.  Therefore, the number 10 in Hex is equal to the number 16 in Decimal.  DECIMAL:  The Decimal conversion is strictly for us humans to use for calculations and conversions.  It is much easier for us humans to calculate that [30 – 10 = 20] in decimal than it is for us to calculate [1E – A = 14] in Hex.  In the old days, an error in a program could be found by determining the displacement from the entry point of a module.  Since those values were dumped from the computers head, they were in hex. A programmer needed to convert them to decimal, do the equation and convert back to hex.  This gets into relative and absolute addressing, a topic for another day.  BINARY:  Binary, or machine code, is where any value can be expressed in 1s and 0s.  It is really Base 2, because each decimal place can have a possibility of only 2 characters, a 1 or a 0.  In Binary, the number 10 is equal to the number 2 in decimal. Why only 1s and 0s?  Very simply, computers are made up of lots and lots of transistors which at any given moment can be ON ( 1 ) or OFF ( 0 ).  Each transistor is a bit, and the order that the transistors fire (or not fire) is what distinguishes one value from  another in the computers head (or CPU).  Consider 32 bit vs 64 bit processing…..a 64 bit processor has the capability to read 64 transistors at a time.  A 32 bit processor can only read half as many at a time, so in theory the 64 bit processor should be much faster.  There are many more factors involved in CPU performance, but that is the fundamental difference.    DECIMAL HEX BINARY 0 0 0000 1 1 0001 2 2 0010 3 3 0011 4 4 0100 5 5 0101 6 6 0110 7 7 0111 8 8 1000 9 9 1001 10 A 1010 11 B 1011 12 C 1100 13 D 1101 14 E 1110 15 F 1111   Remember that each character is a BYTE, there are 2 HEX characters in a byte (called nibbles) and 8 BITS in a byte.  I hope you enjoyed reading about the theory of data processing.  This is just a high-level explanation, and there is much more to be learned.  It is safe to say that, no matter how advanced our programming languages and visual studios become, they are nothing more than a way to interpret bits and bytes.  There is nothing like the joy of hex to get the mind racing.

    Read the article

  • CS subjects that an undergraduate must know.

    - by Karl
    In college, I was never interested in theory. I never read it. No matter how much I tried, I was unable to read stuff and not know what was actually happening practically. Like for example, in my course on automata theory, my professor told me everything possibly related to the mathematical aspect of it, but not even once did he mention where it would be used practically. This is just an example. I managed to pass my college and interned with a company also, where I did a project and thankfully they didn't bother about my grades, as they were above average. Now, I am interested in knowing what subjects should a CS student must absolutely and positively be aware of? Subjects that can have relevance in the industry. This is because I have some free time on my hands and it would help me better to have a good understanding of them. What are your suggestions? Like for one, algorithms is one subject.

    Read the article

  • Getting to math applications gradually

    - by den-javamaniac
    I'm currently getting a formal degree related to computation, in particular my current focus is numerical programming, scientific computing and machine learning. I'd love to apply that knowledge in game dev and expand it with statistics, probability theory, and graph theory (probably even linear algebra). The question is: which spheres of gamedev are filled with such math stuff, is it possible to advance in those without being a part of a group of people and how to get to it gradually? P.S.: I've got experience with commercial java dev and am getting my hands on C/C++ at the moment, however, I'm opened to go ahead and try Unity3D and etc.

    Read the article

  • Crappy school, what to do? [closed]

    - by zhenka
    I started programming fairly late. I am 24 years old and about to graduate from a local public university with a really poorly designed curriculum and teachers. Most of the work felt like busy work, and no matter how much I try, it all feels like a waste. I know what a good curriculum looks like. I know what books I should read, but alas it's not so in my university. There is no way at this point that I can catch up to those graduating from places like MIT. My question and this is a serious one: what do I do? Do I just postpone learning the theory I would have learned until later and focus on software engineering skills? How important is the theory in terms of landing a job in New York? Any particular things I should focus on to land a software engineer job? I am very motivated and I just wish someone would give me the time and a chance.

    Read the article

  • How can I repair a corrupt kernel if no others are installed?

    - by Willi Ballenthin
    I've been running Ubuntu 10.04 on my laptop for quite some time. When I went to boot it up this morning, BAM! kernel panic (which immediately lead to human panic) when loading the kernel. So I've spent much of the day troubleshooting, and my current theory is that the FS is fine, but that the kernel image may be corrupt. Let's go with this current theory for the sake of this question, as I am interested how it is done. How can I replace the kernel image if I have no bootable kernels? Can I boot to a 10.04 live CD, copy the the vmlinuz-2.6.3x... to the HD and go from there? Wouldn't I want to copy the initramfs as well, but configured for the desktop system? Can I generate this from the live CD?

    Read the article

< Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22  | Next Page >