Search Results

Search found 14044 results on 562 pages for 'trusted root ca'.

Page 86/562 | < Previous Page | 82 83 84 85 86 87 88 89 90 91 92 93  | Next Page >

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Cannot run logwatch due to Date::Manip issue

    - by Quintin Par
    I tried to run logwatch at follows [root@machine cron.daily]# ./0logwatch ERROR: Date::Manip unable to determine TimeZone. Execute the following command in a shell prompt: perldoc Date::Manip The section titled TIMEZONES describes valid TimeZones and where they can be defined. My date is as follows root@machine cron.daily]# date Thu Aug 23 06:25:21 GMT 2012 Now based on details in various forums I tried to fix this by setting /etc/timezone to “+0800” but it didn’t work My /etc/localtime points to /usr/share/zoneinfo/GMT and is managed by puppet How do I go about fixing this? I still want all my machines to be in GMT timezone. EDIT: Sadly, Both the changes are not working: [root@machine cron.daily]# cat /etc/TIMEZONE UTC Quanta’s [root@machine cron.daily]# cat ~/.bash_profile # .bash_profile # Get the aliases and functions if [ -f ~/.bashrc ]; then . ~/.bashrc fi # User specific environment and startup programs PATH=$PATH:$HOME/bin export TZ=GMT export PATH [root@machine cron.daily]# source ~/.bash_profile [root@machine cron.daily]# ./0logwatch ERROR: Date::Manip unable to determine TimeZone. Execute the following command in a shell prompt: perldoc Date::Manip The section titled TIMEZONES describes valid TimeZones and where they can be defined.

    Read the article

  • where is memory gone (no, not buffers or cache)

    - by Marki
    can anyone tell me where the memory is gone: (no, this time neither buffers nor cache) # free total used free shared buffers cached Mem: 3928200 3868560 59640 0 2888 92924 -/+ buffers/cache: 3772748 155452 Swap: 4192956 226352 3966604 top, sorted by memory, descending: top - 13:42:06 up 1 day, 3:47, 2 users, load average: 0.08, 0.12, 0.36 Tasks: 228 total, 1 running, 227 sleeping, 0 stopped, 0 zombie Cpu0 : 2.0%us, 4.0%sy, 0.0%ni, 90.1%id, 0.0%wa, 0.0%hi, 4.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3928200k total, 3868020k used, 60180k free, 2896k buffers Swap: 4192956k total, 226048k used, 3966908k free, 82068k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3863 root 20 0 902m 199m 3296 S 7 5.2 99:08.77 ndsd 21906 root 20 0 138m 9076 2988 S 0 0.2 0:00.02 sfcbd 2332 root 20 0 126m 4660 1332 S 0 0.1 0:17.72 mono 4243 wwwrun 20 0 683m 4468 668 S 0 0.1 0:07.38 java 2994 root 20 0 202m 2288 1660 S 0 0.1 6:10.02 httpstkd 4338 root 20 0 184m 2240 1112 S 0 0.1 0:00.52 namcd 21898 root 20 0 32368 1832 1256 R 1 0.0 0:00.08 top In fact, some time ago oom kicked in and crashed the system (kernel panic), and I'm afraid we're again not far from that point.... UPDATE # cat /proc/meminfo MemTotal: 3928200 kB MemFree: 51336 kB Buffers: 2964 kB Cached: 72876 kB SwapCached: 29128 kB Active: 233440 kB Inactive: 88040 kB Active(anon): 188920 kB Inactive(anon): 56752 kB Active(file): 44520 kB Inactive(file): 31288 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 4192956 kB SwapFree: 3966824 kB Dirty: 32 kB Writeback: 0 kB AnonPages: 225112 kB Mapped: 11356 kB Shmem: 32 kB Slab: 1624080 kB SReclaimable: 13740 kB SUnreclaim: 1610340 kB KernelStack: 4176 kB PageTables: 10500 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 6157056 kB Committed_AS: 2397684 kB VmallocTotal: 34359738367 kB VmallocUsed: 441372 kB VmallocChunk: 34359246755 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 10240 kB DirectMap2M: 4184064 kB slabtop Active / Total Objects (% used) : 9041019 / 9207548 (98.2%) Active / Total Slabs (% used) : 401132 / 401156 (100.0%) Active / Total Caches (% used) : 91 / 159 (57.2%) Active / Total Size (% used) : 1491537.88K / 1519791.56K (98.1%) Minimum / Average / Maximum Object : 0.02K / 0.17K / 4096.00K OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 4240470 4240319 99% 0.12K 141349 30 565396K pid 2245140 2219675 98% 0.25K 149676 15 598704K size-256 2238090 2210087 98% 0.12K 74603 30 298412K size-128 ...

    Read the article

  • How to restrict zone transfers to specific authorized servers only

    - by JonoB
    I recently failed a PCI compliance scan because of the following: This DNS server allows unrestricted zone transfers. Attackers may be able to use this information to gain knowledge on the structure of your networks to aid in device discovery prior to an actual attack. And the suggested solution is as follows: Reconfigure this DNS server to restrict zone transfers to specific authorized servers only. I am running a dedicated Linux Centos server. My understanding is that I have to edit the /etc/named.conf file, which I have done and the the relevant part is as follows: options { acl "trusted" { 127.0.0.1; xxx.xxx.xxx.001; //this is one of the server's ip's xxx.xxx.xxx.002; //this is another server's ip }; allow-recursion { trusted; }; allow-notify { trusted; }; allow-transfer { trusted; }; }; I then restarted the named service /etc/rc.d/init.d/named restart and requested a re-scan, which failed again for the same reason. Am I missing something obvious here?

    Read the article

  • NFSv3 + ACL: mask is gone on clients

    - by Jorge Suárez de Lis
    I'm sharing a NFS folder among a user group. The default umask on the clients is 0700, and this is a problem because newly created files won't be readable/writable by another users. So, I'm using ACLs to force the umask 0770 on the shared folder, and this works OK on the server, but not on the clients. server # getfacl /export/proyectos getfacl: Eliminando «/» inicial en nombres de ruta absolutos # file: export/proyectos # owner: root # group: root user::rwx group::rwx other::r-x default:user::rwx default:group::rwx default:mask::rwx default:other::r-x server # getfacl /export/proyectos/innovacion getfacl: Eliminando «/» inicial en nombres de ruta absolutos # file: export/proyectos/innovacion # owner: root # group: proyecto-innovacion # flags: ss- user::rwx group::rwx mask::rwx other::--- default:user::rwx default:group::rwx default:mask::rwx default:other::--- As you see, the default (and also a specific on the second directory) mask ACLs are being applied. I mount the whole share on the client: 172.16.54.56:/export/proyectos on /proyectos type nfs (rw,noatime,rsize=131072,wsize=131072,acregmin=10,acl,nfsvers=3,addr=172.16.54.56) But the mask and default:mask ACLs are gone. client $ getfacl /proyectos/ getfacl: Eliminando «/» inicial en nombres de ruta absolutos # file: proyectos/ # owner: root # group: root user::rwx group::rwx other::r-x default:user::rwx default:group::rwx default:other::r-x client $ getfacl /proyectos/innovacion getfacl: Eliminando «/» inicial en nombres de ruta absolutos # file: proyectos/innovacion # owner: root # group: proyecto-innovacion # flags: ss- user::rwx group::rwx other::--- default:user::rwx default:group::rwx default:other::--- It lacks the default:mask and mask ACLs, the only ones that I've setted. So the proposed solution to enforce umask won't work for me. Why is happening this?

    Read the article

  • Install self-signed certificate on local server (iis)

    - by ile
    On this page there are instructions on how to create self-signed cert (on apache) and how to install this certificate on server. I found this page (http://www.visualwin.com/SelfSSL/) with instructions on how to create self-signed certificate on windows (iis). I followed instructions and when I type https://myip/myapp (this leads to localhost because I set my router's port forwarding to go to localhost on my pc) this part works. From the first link, the most important part is this: What needs to be installed in IE is actually the Root CA Certificate. In the how-to above, the Root CA Certificate is called ca.crt. Copy this file to the server that is running QuickBooks. The following is for IE6: - Open IE - Tools - Internet Options - Content - Certificates - Trusted Root Certification Authorities Tab - Import, Next, Browse to 'ca.crt' - Next, Next, Finish, Close, OK The part that is missing in second link is that there is no instruction on how to get .crt file, so I tried to get it myself. What I did was following: I opened https://myip/myapp in Firefox and then "This Connection is Untrusted" screen appeared. Then I clicked on "Add Exception" and then below "Certificate Status" I clicked "View". Under the Details tab I clicked on Export and choosed Save as type: "X 509 Certificate (PEM)" and file was saved with .crt extension. Then I opened IE8 and followed above instructions. After opening https://myip/myapp in IE8 I always get warning screen. Does anyone knows what am I doing wrong? Thanks, Ile

    Read the article

  • Apache 2.4, Ubuntu 12.04 Forbidden Errors

    - by tubaguy50035
    I just installed Apache 2.4 today, and I'm having some issues getting vhost configuration to work correctly. Below is the vhost conf <VirtualHost *:80> ServerAdmin [email protected] DocumentRoot /hosting/Client/site.com/www ServerName site.com ServerAlias www.site.com <Directory "/hosting/Client/site.com/www"> Options +Indexes +FollowSymLinks Order allow,deny Allow from all </Directory> DirectoryIndex index.html </VirtualHost> There is an index.html file in /hosting/Client/site.com/www. When I go to the site, I receive a 403 forbidden error. The www-data group is the group on the www folder, which I've already given all permissions (r/w/x). I'm really at a loss as to why this is happening. Any thoughts? If I remove the vhost and go straight to the IP address, I get the default, "It works!" page. So I know that it's working. The error log says "client denied by server configuration". apache2ctl -S dump: nick@server:~$ apache2ctl -S /usr/sbin/apache2ctl: 87: ulimit: error setting limit (Operation not permitted) VirtualHost configuration: *:80 is a NameVirtualHost default server site.com (/etc/apache2/sites-enabled/site.com.conf:1) port 80 namevhost site.com (/etc/apache2/sites-enabled/site.com.conf:1) alias www.site.com port 80 namevhost site.com (/etc/apache2/sites-enabled/site.com.conf:1) alias www.site.com ServerRoot: "/etc/apache2" Main DocumentRoot: "/var/www" Main ErrorLog: "/var/log/apache2/error.log" Mutex watchdog-callback: using_defaults Mutex default: dir="/var/lock/apache2" mechanism=fcntl Mutex mpm-accept: using_defaults PidFile: "/var/run/apache2.pid" Define: DUMP_VHOSTS Define: DUMP_RUN_CFG Define: ENALBLE_USR_LIB_CGI_BIN User: name="www-data" id=33 not_used Group: name="www-data" id=33 not_used Ouput of namei -mo /hosting/Client/site/www/index.html f: /hosting/Client/site.com/www/index.html drwxr-xr-x root root / drwxr-xr-x root root hosting drwxr-xr-x root root Client drwxr-xr-x nick www-data site.com drwxr-xr-x nick www-data www -rw-rwxr-x nick www-data index.html

    Read the article

  • Why is this file hidden when you run ls?

    - by luckytaxi
    For a few weeks now I couldn't figure out why I wasn't able to delete this one particular file. As root I can, but my shell script runs as a different user. So I go run ls -la and it's not there. However, if I call it as a parameter, it shows up! Sure enough, the owner is root, hence I'm not able to delete. Notice, 6535 is missing ... [root@server]# ls -la 653* -rw-rw-r-- 1 svn svn 24002 Mar 26 01:00 653 -rw-rw-r-- 1 svn svn 7114 Mar 26 01:01 6530 -rw-rw-r-- 1 svn svn 8653 Mar 26 01:01 6531 -rw-rw-r-- 1 svn svn 6836 Mar 26 01:01 6532 -rw-rw-r-- 1 svn svn 3308 Mar 26 01:01 6533 -rw-rw-r-- 1 svn svn 3918 Mar 26 01:01 6534 -rw-rw-r-- 1 svn svn 3237 Mar 26 01:01 6536 -rw-rw-r-- 1 svn svn 3195 Mar 26 01:01 6537 -rw-rw-r-- 1 svn svn 27725 Mar 26 01:01 6538 -rw-rw-r-- 1 svn svn 263473 Mar 26 01:01 6539 Now it shows up if you call it directly. [root@server]# ls -la 6535 -rw-rw-r-- 1 root root 3486 Mar 26 01:01 6535

    Read the article

  • OpenVPN connected but not internet access on the client

    - by Stefan
    I've setup OpenVPN following this tutorial, and everything works fine except that I don't have an internet connection on the client while connected to VPN. http://www.howtoforge.com/internet-and-lan-over-vpn-using-openvpn-linux-server-windows-linux-clients-works-for-gaming-and-through-firewalls My VPS server config is as follows (Ubuntu): dev tun proto udp port 1194 ca /etc/openvpn/easy-rsa/keys/ca.crt cert /etc/openvpn/easy-rsa/keys/server.crt key /etc/openvpn/easy-rsa/keys/server.key dh /etc/openvpn/easy-rsa/keys/dh1024.pem user nobody group nogroup server 10.8.0.0 255.255.255.0 persist-key persist-tun status /var/log/openvpn-status.log verb 3 client-to-client push "redirect-gateway local def1" #set the dns servers push "dhcp-option DNS 8.8.8.8" push "dhcp-option DNS 8.8.4.4" log-append /var/log/openvpn comp-lzo plugin /usr/lib/openvpn/openvpn-auth-pam.so common-auth My client config is as follows (Windows 7): dev tun client proto udp remote XXX.XXX.XXX.XXX 1194 resolv-retry infinite nobind persist-key persist-tun ca ca.crt cert stefan.crt key stefan.key comp-lzo verb 3 auth-user-pass redirect-gateway local def1 I've turned off the firewall on the server for testing purposes (it doesn't help), and tried both wired and wireless connecting on the client. I've tried many Google results... but nothing seems to help. Can you help me? Thanks so far...

    Read the article

  • Basic OpenVPN setup

    - by WalterJ89
    I am attempting to connect 2 win7 (x64+ x32) computers (there will be 4 in total) using OpenVPN. Right now they are on the same network but the intention is to be able to access the client remotely regardless of its location. The Problem I am having is I am unable to ping or tracert between the two computers. They seem to be on different subnets even though I have the mask set to 255.255.255.0. The server ends up as 10.8.0.1 255.255.255.252 and the client 10.8.0.6 255.255.255.252. And a third ends up as 10.8.0.10. I don't know if this a Windows 7 problem or something I have wrong in my config. Its a very simple set up, I'm not connecting two LANs. this is the server config (removed all the extra lines because it was too ugly) port 1194 proto udp dev tun ca keys/ca.crt cert keys/server.crt key keys/server.key # This file should be kept secret dh keys/dh1024.pem server 10.8.0.0 255.255.255.0 ifconfig-pool-persist ipp.txt client-to-client duplicate-cn keepalive 10 120 comp-lzo persist-key persist-tun status openvpn-status.log verb 6 this is the client config client dev tun proto udp remote thisdomainis.random.com 1194 resolv-retry infinite nobind persist-key persist-tun ca keys/ca.crt cert keys/client.crt key keys/client.key ns-cert-type server comp-lzo verb 6 Is there anything I missed in this? keys are all correct and the vpn's connect fine, its just the subnet or route issue. Thank You

    Read the article

  • Refresh file access time under Linux / Discard disk read cache

    - by calandoa
    I am making use of the access time to analyse some build process, but it is not working the way I want: the access time is updated the first time I read the file, then it stays the same for a long while, or until the next reboot. For instance: $ ll -u some_file -rw-r--r-- 1 root root 1.3M 2010-04-07 10:03 some_file $ grep abcdef some_file $ ll -u some_file -rw-r--r-- 1 root root 1.3M 2010-04-07 11:24 some_file # The access time is updated # waiting a few minutes... $ grep abcdef some_file $ ll -u some_file -rw-r--r-- 1 root root 1.3M 2010-04-07 11:24 some_file # The access time has not been updated :( I suppose that the file is buffered by Linux in the free memory, the only this copy is accessed the subsequent times for speed reasons. A solution would be to discard the buffers in memory. After searching some forums, I found: sync echo 1 > /proc/sys/vm/drop_caches echo 2 > /proc/sys/vm/drop_caches echo 3 > /proc/sys/vm/drop_caches But it is not working, it seems that it only sync up the write buffers, not the read ones. May be it is due to some custom kernel configuration on my distro (fedora 9)? Or I am missing something here? Is there a way to achieve this access time refresh? Note also that I do not want to simulate some writes on my entire file tree. Because I am using some makefile based build system, this will cause the entire project to be build again.

    Read the article

  • Exchange 2007 OWA returns blank page with url xxxxx&reason=0

    - by Dayton Brown
    Hi All: I've just run into an issue with my exchange OWA. It returns a blank page with the url string https://www.xxxxxxxx/&reason=0. Nothing in the logs gives me any good reasons. Here's what I've done so far; 1) reinstall Exchange roll-up 7. 2) recreate virtual directories. 3) reboot. (this was mostly a shot in the dark, but what the hell) Exchange via rpc/https is still working great. Anyone run into this before? EDIT Here is the last snippet from the OWASetupLog. doesn't look like anything blew up. [09:45:36] ******************************************* [09:45:36] * UpdateOwa.ps1: 5/27/2009 9:45:36 AM [09:45:40] Updating OWA on server HOMER [09:45:40] Finding OWA install path on the filesystem [09:45:40] Updating OWA to version 8.1.375.2 [09:45:40] Copying files from 'C:\Program Files\Microsoft\Exchange Server\ClientAccess\owa\Current' to 'C:\Program Files\Microsoft\Exchange Server\ClientAccess\owa\8.1.375.2' [09:45:41] Getting all Exchange 2007 OWA virtual directories [09:45:42] Found 1 OWA virtual directories. [09:45:42] Updating OWA virtual directories [09:45:42] Processing virtual directory with metabase path 'IIS://HOMER.DG.LOCAL/W3SVC/1/ROOT/owa'. [09:45:42] Metabase entry 'IIS://HOMER.DG.LOCAL/W3SVC/1/ROOT/owa/8.1.375.2' exists. Removing it. [09:45:42] Creating metabase entry IIS://HOMER.DG.LOCAL/W3SVC/1/ROOT/owa/8.1.375.2. [09:45:42] Configuring metabase entry 'IIS://HOMER.DG.LOCAL/W3SVC/1/ROOT/owa/8.1.375.2'. [09:45:43] Saving changes to 'IIS://HOMER.DG.LOCAL/W3SVC/1/ROOT/owa/8.1.375.2' [09:45:43] Saving changes to 'IIS://HOMER.DG.LOCAL/W3SVC/1/ROOT/owa'

    Read the article

  • CentOS OpenVZ fail to boot after kernel update

    - by SkechBoy
    After upgrading to latest OpenVZ kernel CentOS server won't boot. When i try go boot the latest kernel server is stuck at this point: (note that images are taken from virtual kvm) http://i.stack.imgur.com/4lusz.jpg Then i try to start the server on some old kernels and than i get this error message: kernel panic - not syncing - attempted to kill init better shown on this image: http://i.stack.imgur.com/2SReF.jpg Here is some useful information fdisk -l WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted. Disk /dev/sda: 2995.7 GB, 2995739688960 bytes 255 heads, 63 sectors/track, 364211 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0004c4e4 Device Boot Start End Blocks Id System /dev/sda1 1 523 4199044+ 82 Linux swap / Solaris /dev/sda2 524 785 2104515 83 Linux /dev/sda3 786 261869 2097157230 83 Linux /dev/sda4 261870 364211 822062115 83 Linux /etc/fstab proc /proc proc defaults 0 0 none /dev/pts devpts gid=5,mode=620 0 0 /dev/sda1 none swap sw 0 0 /dev/sda2 /boot ext3 defaults 0 0 /dev/sda3 / ext3 defaults 0 0 /dev/sda4 /home ext3 defaults 0 0 and grub config file: title OpenVZ (2.6.18-274.18.1.el5.028stab098.1) root (hd0,1) kernel /vmlinuz-2.6.18-274.18.1.el5.028stab098.1 ro root=/dev/sda3 vga=0x317 selinux=0 initrd /initrd-2.6.18-274.18.1.el5.028stab098.1.img title OpenVZ (2.6.18-274.7.1.el5.028stab095.1) root (hd0,1) kernel /vmlinuz-2.6.18-274.7.1.el5.028stab095.1 ro root=/dev/sda3 vga=0x317 selinux=0 initrd /initrd-2.6.18-274.7.1.el5.028stab095.1.img title OpenVZ (2.6.18-194.8.1.el5.028stab070.4) root (hd0,1) kernel /vmlinuz-2.6.18-194.8.1.el5.028stab070.4 ro root=/dev/sda3 vga=0x317 initrd /initrd-2.6.18-194.8.1.el5.028stab070.4.img Any help is greatly appreciated Thanks.

    Read the article

  • Securing SSH/SFTP and best practices on security

    - by MultiformeIngegno
    I'm on a fresh VPS with Ubuntu Server 12.04. I wanted to ask you the good practices to apply to enhance security over a stock Ubuntu-server. This is what I did up to now: I added Google Authenticator to SSH, then I created a new user (whom I'll use instead of 'root' for SSH & SFTP access) which I added to my /etc/sudoers list below 'root', so now it's: # User privilege specification root ALL=(ALL:ALL) ALL new_user ALL=(ALL:ALL) ALL Then I edited sshd_config and set PermitRootLogin to 'no'. Then restarted the ssh service. Is this ok? There are a few things I'd like to ask you though: 1) What's the sense of adding a new (sudoer) user whilst the root user still exist (ok it can't access with root privilege but it's still there..)? 2) System files are owned by 'root'.. I want to use my new_user to access via SFTP but with it I can't edit those files!! Should I mass-CHMOD 'em so that new_user has write perms too? What's the good practice on this? Thanks in advance, I hope you'll tell me if I did something wrong and/or other ways to secure the system. :)

    Read the article

  • Have only read access to Samba

    - by Tahir Malik
    Hi I've been struggling a lot with Samba on Centos 5.5 lately. I develop in Windows 7 and send files through scp (ant task), but it's to slow and wanted to setup thoroughly samba. After installing and following some guides I've done the following: Disable firewall (iptables) Disable SelLinux (didn't do that at the start, but didn't help either) setup my smbusers file to map my windows user to root (root = "Tahir Malik" -- works) added a current user mitco to the sambapassdb with the command smbpasswd -a mitco , because the windows user had only read access So both the users have read access to my share. Here is my smb.conf snippit: [global] workgroup = MITCO server string = Samba Server Version %v netbios name = centos ; interfaces = lo eth0 192.168.12.2/24 192.168.13.2/24 ; hosts allow = 127. 192.168.12. 192.168.13. [alf4] comment = Alfresco 4 path = /opt read only = no valid users = mitco, mitco force user = root force group = root admin users = mitco , mitco writeable = yes ; browseable = yes What also maybe important is that the /opt is only writable by root, but that shouldn't matter because I use the force user and group or admin users. The log file : [2012/09/29 07:43:44, 0] smbd/server.c:main(958) smbd version 3.0.33-3.39.el5_8 started. Copyright Andrew Tridgell and the Samba Team 1992-2008 [2012/09/29 07:43:59, 1] smbd/service.c:make_connection_snum(1085) mitco-tahir (192.168.13.1) connect to service alf4 initially as user root (uid=0, gid=0) (pid 5228)

    Read the article

  • Linux commands shows different results

    - by ClydeFrog
    I'm really having a hard time to process these results on my Ubuntu server. I have a major problem with my JBoss server where I get FileNotFoundExceptions along with "No space left on device" errors. And I thought "maybe I'm out of disk space", and used df command to figure out how much I have left: root@ubuntu1:/# df -h Filsystem Storlek Anvnt Tillg Anv% Monterat på /dev/mapper/ubuntu1-root 36G 13G 21G 38% / none 2,0G 192K 2,0G 1% /dev none 2,0G 0 2,0G 0% /dev/shm none 2,0G 64K 2,0G 1% /var/run none 2,0G 0 2,0G 0% /var/lock /dev/sda1 228M 23M 193M 11% /boot /dev/mapper/vgdata-lvdata 79G 9,2G 66G 13% /data And as you can see, I have plenty of space left. And I also checked if I'm out of i-nodes: root@ubuntu1:/# df -i Filsystem Inoder IAnv IFria IAnv% Monterat på /dev/mapper/ubuntu1-root 2346512 61992 2284520 3% / none 505380 773 504607 1% /dev none 507383 1 507382 1% /dev/shm none 507383 30 507353 1% /var/run none 507383 2 507381 1% /var/lock /dev/sda1 124496 230 124266 1% /boot /dev/mapper/vgdata-lvdata 10486784 233945 10252839 3% /data But then i used du: root@ubuntu1:/# du -s -h /* 7,5M /bin 23M /boot 19G /data 192K /dev 11G /eniro 5,3M /etc 112K /home 0 /initrd.img 183M /lib 0 /lib64 16K /lost+found 12K /media 4,0K /mnt 4,0K /opt du: kan inte komma åt "/proc/20452/task/20452/fd/3": Filen eller katalogen finns inte du: kan inte komma åt "/proc/20452/task/20452/fdinfo/3": Filen eller katalogen finns inte du: kan inte komma åt "/proc/20452/fd/3": Filen eller katalogen finns inte du: kan inte komma åt "/proc/20452/fdinfo/3": Filen eller katalogen finns inte 0 /proc 18M /root 8,2M /sbin 4,0K /selinux 8,0K /srv 0 /sys 40K /tmp 691M /usr 1,2G /var 0 /vmlinuz Notice that /data and /eniro are 30G combined! How is it possible? Do I have a memory leak somewhere? Or is it something else? ----- EDIT 1 ----- Ok, I figured out that /data has its own mount so it's not possible to combine /data and /eniro because they aren't on the same mount. But how come it says 9,2G on the first command when it says 19G on the third on directory /data?

    Read the article

  • Linux authentication via ADS -- allowing only specific groups in PAM

    - by Kenaniah
    I'm taking the samba / winbind / PAM route to authenticate users on our linux servers from our Active Directory domain. Everything works, but I want to limit what AD groups are allowed to authenticate. Winbind / PAM currently allows any enabled user account in the active directory, and pam_winbind.so doesn't seem to heed the require_membership_of=MYDOMAIN\\mygroup parameter. Doesn't matter if I set it in the /etc/pam.d/system-auth or /etc/security/pam_winbind.conf files. How can I force winbind to honor the require_membership_of setting? Using CentOS 5.5 with up-to-date packages. Update: turns out that PAM always allows root to pass through auth, by virtue of the fact that it's root. So as long as the account exists, root will pass auth. Any other account is subjected to the auth constraints. Update 2: require_membership_of seems to be working, except for when the requesting user has the root uid. In that case, the login succeeds regardless of the require_membership_of setting. This is not an issue for any other account. How can I configure PAM to force the require_membership_of check even when the current user is root? Current PAM config is below: auth sufficient pam_winbind.so auth sufficient pam_unix.so nullok try_first_pass auth requisite pam_succeed_if.so uid >= 500 quiet auth required pam_deny.so account sufficient pam_winbind.so account sufficient pam_localuser.so account required pam_unix.so broken_shadow password ..... (excluded for brevity) session required pam_winbind.so session required pam_mkhomedir.so skel=/etc/skel umask=0077 session required pam_limits.so session required pam_unix.so require_memebership_of is currently set in the /etc/security/pam_winbind.conf file, and is working (except for the root case outlined above).

    Read the article

  • DNS issue on Fedora 12? wget wordpress.org fails where wget www.google.com works

    - by Tom Auger
    I'm administering a Fedora 12 box, but am quite new to networking specifics. Recently one of our WordPress apps hosted on our server has stopped being able to perform its auto-update or auto-download of plugins. Investigating further, I have tried the following: $ wget wordpress.org --2010-12-17 11:26:50-- http://wordpress.org/ Resolving wordpress.org... failed: Temporary failure in name resolution. wget: unable to resolve host address âwordpress.orgâ Whereas: $ wget www.google.com --2010-12-17 11:27:26-- http://www.google.com/ Resolving www.google.com... 74.125.226.82, 74.125.226.84, 74.125.226.80, ... Connecting to www.google.com|74.125.226.82|:80... connected. HTTP request sent, awaiting response... 302 Found Location: http://www.google.ca/ [following] --2010-12-17 11:27:26-- http://www.google.ca/ Resolving www.google.ca... 173.194.32.104 Connecting to www.google.ca|173.194.32.104|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: âindex.html.4â [ <=> ] 9,079 --.-K/s in 0.02s 2010-12-17 11:27:26 (462 KB/s) - âindex.html.4â Interestingly: $ ping wordpress.org PING wordpress.org (72.233.56.138) 56(84) bytes of data. 64 bytes from wordpress.org (72.233.56.138): icmp_seq=1 ttl=50 time=81.5 ms 64 bytes from wordpress.org (72.233.56.138): icmp_seq=2 ttl=50 time=67.3 ms ^C --- wordpress.org ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1783ms rtt min/avg/max/mdev = 67.361/74.448/81.536/7.092 ms and $ nslookup wordpress.org Server: 192.168.2.1 Address: 192.168.2.1#53 Non-authoritative answer: Name: wordpress.org Address: 72.233.56.138 Name: wordpress.org Address: 72.233.56.139 nscd has been stopped and flushed. iptables appear to be clean. At this point I have exhausted my limited abilities to diagnose the issue. Can anyone suggest a resolution path?

    Read the article

  • lighttpd with multiple IPs, each with a UCC certificate and many hostnames

    - by Dave
    I'd like to get lighttpd working with UCC certificates, but I can't seem to figure out the correct syntax. Essentially, for each IP address, I have one UCC certificate and a bunch of hostnames. $SERVER["socket"] == "10.0.0.1:443" { ssl.engine = "enable" ssl.ca-file = "/etc/ssl/certs/the.ca.cert.pem" ssl.pemfile = "/etc/ssl/private/websitegroup1.com.pem" $HTTP["host"] =~ "mywebsite.com" { server.document-root = /var/www/mywebsite.com/htdocs" } The above code works fine for one hostname, but as soon as I try to set up another hostname (note the same SSL cert): $SERVER["socket"] == "10.0.0.1:443" { ssl.engine = "enable" ssl.ca-file = "/etc/ssl/certs/the.ca.cert.pem" ssl.pemfile = "/etc/ssl/private/websitegroup1.com.pem" $HTTP["host"] =~ "anotherwebsite.com" { server.document-root = /var/www/anotherwebsite.com/htdocs" } ...I get this error: Duplicate config variable in conditional 6 global/SERVERsocket==10.0.0.1:443: ssl.engine Is there any way I can put a conditional so that only if ssl.engine is not already enabled, enable it? Or do I have to put all my $HTTP["host"]s inside the same $SERVER["socket"] (which will make config file management more difficult for me) or is there some entirely different way to do it? This has to be repeated for multiple IPs too (so I'll have a bunch of SERVER["socket"] == 10.0.0.2:443" etc), each with one UCC cert and many hostnames. Am I going about this the wrong way entirely? My goal is to conserve IP addresses when I have many websites that are related and can share an SSL certificate, but still need their own SSL-accessible version from the appropriate hostname (instead of a single secure.mywebsite.com).

    Read the article

  • How to find process that's using 100% of CPU

    - by Gabriel
    As i'm looking at htop and top i see that my processor usage is 100% allways. But i can not see any process that is using that much CPU. Htop shows me only 1-2 processes that use around 5% cpu time. Is there a way to find the processes that use that much cpu time? Here is the output of ps -eo pcpu,pid,user,args | sort -r -k1 | less %CPU PID USER COMMAND 0.8 20413 root jsvc.exec -user tomcat -cp ./bootstrap.jar -Djava.endorsed.dirs=../common/endorsed -outfile ../logs/catalina.out -errfile ../logs/catalina.err -verbose org.apache.catalina.startup.Bootstrap -security 0.3 631 mysql /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/mysql.pid --skip-external-locking 0.2 3380 root /usr/local/apache/bin/httpd -k restart -DSSL 0.2 24698 root tailwatchd 0.2 22472 root /usr/local/jdk/bin/java -Djava.util.logging.config.file=/usr/local/jakarta/tomcat/conf/logging.properties -Dfile.encoding=UTF8 -XX:MaxPermSize=128m -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/local/jakarta/tomcat/common/endorsed -classpath /usr/local/jakarta/tomcat/bin/bootstrap.jar -Dcatalina.base=/usr/local/jakarta/tomcat -Dcatalina.home=/usr/local/jakarta/tomcat -Djava.io.tmpdir=/usr/local/jakarta/tomcat/temp org.apache.catalina.startup.Bootstrap start 0.1 32095 root cpanellogd - processing bandwidth 0.0 9733 root sleep 1m

    Read the article

  • su not giving proper message for restricted LDAP groups

    - by user1743881
    I have configured PAM authentication on Linux box to restrict particular group only to login. I have enabled pam and ldap through authconfig and modified access.conf like below, [root@test root]# tail -1 /etc/security/access.conf - : ALL EXCEPT root test-auth : ALL Also modified sudoers file, to get su for this group <code> [root@test ~]# tail -1 /etc/sudoers %test-auth ALL=/bin/su</code> Now, only this ldap group members can login to system. However when from any of this authorized user, I tried for su, it asks for password and then though I enter correct password it gives message like Incorrect password and login failed. /var/log/secure shows that user is not having permission to get the access, but then it should print message like Access denied.The way it prints for console login. My functionality is working but its no giving proper messages. Could anyone please help on this. My /etc/pam.d/su file, [root@test root]# cat /etc/pam.d/su #%PAM-1.0 auth sufficient pam_rootok.so # Uncomment the following line to implicitly trust users in the "wheel" group. #auth sufficient pam_wheel.so trust use_uid # Uncomment the following line to require a user to be in the "wheel" group. #auth required pam_wheel.so use_uid auth include system-auth account sufficient pam_succeed_if.so uid = 0 use_uid quiet account include system-auth password include system-auth session include system-auth session optional pam_xauth.so

    Read the article

  • big cpu load on vmware server / linux

    - by dezfafara
    Hi, I currently using a server 2.x hosting 4 virtual machines on a linux system Today, on my physical server, I saw an enormous load average: this is the "top" of the server, illustrating my 4 virtual guests. top - 11:02:02 up 194 days, 23:09, 5 users, load average: 18.78, 12.05, 13.55 Tasks: 113 total, 4 running, 109 sleeping, 0 stopped, 0 zombie Cpu0 : 71.6%us, 19.0%sy, 0.0%ni, 8.8%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st Cpu1 : 74.3%us, 10.4%sy, 0.0%ni, 15.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 72.5%us, 17.6%sy, 0.0%ni, 9.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 79.5%us, 4.6%sy, 0.0%ni, 16.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8178884k total, 8129980k used, 48904k free, 134904k buffers Swap: 10490436k total, 148k used, 10490288k free, 6129728k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7312 root 6 -10 1149m 921m 559m R 97 11.5 107947:09 vmware-vmx 6995 root 6 -10 779m 687m 317m R 92 8.6 107374:31 vmware-vmx 6693 root 6 -10 880m 659m 409m S 85 8.3 76947:33 vmware-vmx 12937 root 6 -10 960m 719m 523m S 75 9.0 67219:49 vmware-vmx In bold are the cpu usage for my 4 virtuals guests These guests are running on a linux system, and the appropriate process are usually 5% - 15% of cpu I don't understang why , since a few days I have this big problem. This is the "top" on a virtual guest which is at 95% of cpu load top - 11:23:15 up 194 days, 23:13, 4 users, load average: 0.25, 0.47, 0.59 Tasks: 92 total, 2 running, 90 sleeping, 0 stopped, 0 zombie Cpu(s): 1.4%us, 7.7%sy, 0.0%ni, 90.5%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 382296k total, 369732k used, 12564k free, 145156k buffers Swap: 979924k total, 13956k used, 965968k free, 86988k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3691 root 20 0 23948 1148 960 S 13.0 0.3 15339:23 vmware-guestd 3840 root 20 0 19880 584 512 S 7.7 0.2 1729:17 hald-addon-stor This virtual guest state is ok ... If anyone has any ideas .. Thanks

    Read the article

  • Node.js Build failed: -> task failed (error#2)?

    - by Richard Hedges
    I'm trying to install Node.js on my CentOS server. I run ./configure and it runs perfectly fine. I then run the 'make' command and it produces the following: [5/38] libv8.a: deps/v8/SConstruct - out/Release/libv8.a /usr/local/bin/python "/root/node/tools/scons/scons.py" -j 1 -C "/root/node/out/Release/" -Y "/root/node/deps/v8" visibility=default mode=release arch=ia32 toolchain=gcc library=static snapshot=on scons: Reading SConscript files ... ImportError: No module named bz2: File "/root/node/deps/v8/SConstruct", line 37: import js2c, utils File "/root/node/deps/v8/tools/js2c.py", line 36: import bz2 Waf: Leaving directory `/root/node/out' Build failed: - task failed (err #2): {task: libv8.a SConstruct - libv8.a} make: * [program] Error 1 I've done some searching on Google but I can't seem to find anything to help. Most of what I've found is for Cygwin anyway, and I'm on CentOS 4.9. Like I said, the ./configure went through perfectly fine with no errors, so there's nothing there that I can see. EDIT I've got a little further. Now I just need to upgrade G++ to version 4 (or higher). I tried yum update gcc but no luck, so I tried yum install gcc44, which resulted in no luck either. Has anyone got any ideas as to how I can update G++?

    Read the article

  • grub2 error : out of disk

    - by Nostradamnit
    Hi everyone, I recently install ArchBang on a machine with Ubuntu and XP. I ran update-grub from Ubuntu and it found the new install and created an entry. However, when I try to boot it, I get: error: out of disk error: you need to load kernel first I've tried several things, including adding a new entry in 40_custom, but nothing changes. Here are the entries I have: default found by update-grub ### BEGIN /etc/grub.d/30_os-prober ### menuentry "ArchBang Linux (on /dev/sda4)" { insmod part_msdos insmod ext2 set root='(hd0,msdos4)' search --no-floppy --fs-uuid --set 75f96b44-3a8f-4727-9959-d669b9244f2a linux /boot/vmlinuz26 root=/dev/sda4 rootfstype=ext4 ro xorg=vesa quiet nomodeset swapon initrd /boot/kernel26.img } menuentry "ArchBang Linux Fallback (on /dev/sda4)" { insmod part_msdos insmod ext2 set root='(hd0,msdos4)' search --no-floppy --fs-uuid --set 75f96b44-3a8f-4727-9959-d669b9244f2a linux /boot/vmlinuz26 root=/dev/sda4 rootfstype=ext4 ro xorg=vesa quiet nomodeset swapon initrd /boot/kernel26-fallback.img } ### END /etc/grub.d/30_os-prober ### custom entry in 40_custom based on various ideas found on the internets menuentry "ArchBang Linux (on /dev/sda4)" { insmod part_msdos insmod ext2 set root='(hd0,msdos4)' search --no-floppy --fs-uuid --set 75f96b44-3a8f-4727-9959-d669b9244f2a linux /boot/vmlinuz26 root=/dev/disk/by-uuid/75f96b44-3a8f-4727-9959-d669b9244f2a rootfstype=ext4 ro xorg=vesa quiet nomodeset swapon initrd /boot/kernel26.img } I think the problem has something to do with the sda4 not being mounted at boot-time... Thanks in advance for you help, Sam

    Read the article

  • SSLVerifyClient optional with location-based exceptions

    - by Ian Dunn
    I have a site that requires authentication in order to access certain directories, but not others. (The "directories" are really just rewrite rules that all pass through /index.php) In order to authenticate, the user can either login with a standard username/password, or submit a client-side x509 certificate. So, Apache's vhost conf looks something like this: SSLCACertificateFile /etc/pki/CA/certs/redacted-ca.crt SSLOptions +ExportCertData +StdEnvVars SSLVerifyClient none SSLVerifyDepth 1 <LocationMatch "/(foo-one|foo-two|foo-three)"> SSLVerifyClient optional </LocationMatch> That works fine, but then large file uploads fail because of the behavior documented in bug 12355. The workaround for that is to set SSLVerifyClient require (or optional) as the default, so now the conf looks like this SSLCACertificateFile /etc/pki/CA/certs/redacted-ca.crt SSLOptions +ExportCertData +StdEnvVars SSLVerifyClient optional SSLVerifyDepth 1 <LocationMatch "/(bar-one|bar-two|bar-three)"> SSLVerifyClient none </LocationMatch> That fixes the upload problem, but the SSLVerifyClient none doesn't work for bar-one, bar-two, etc. Those directories are still prompted to present a certificate. Additionally, I also need the root URL to accessible without the user being prompted for a certificate. I'm afraid that will cancel out the workaround, though.

    Read the article

< Previous Page | 82 83 84 85 86 87 88 89 90 91 92 93  | Next Page >