vectors - Page 22 - Developer IT

3D coordinate of 2D point given camera and view plane

- by Myx

I wish to generate rays from the camera through the viewing plane. In order to do this, I need my camera position ("eye"), the up, right, and towards vectors (where towards is the vector from the camera in the direction of the object that the camera is looking at) and P, the point on the viewing plane. Once I have these, the ray that's generated is: ray = camera_eye + t*(P-camera_eye); where t is the distance along the ray (assume t = 1 for now). My question is, how do I obtain the 3D coordinates of point P given that it is located at position (i,j) on the viewing plane? Assume that the upper left and lower right corners of the viewing plane are given. NOTE: The viewing plane is not actually a plane in the sense that it doesn't extend infinitely in all directions. Rather, one may think of this plane as a widthxheight image. In the x direction, the range is 0--width and in the y direction the range is 0--height. I wish to find the 3D coordinate of the (i,j)th element, 0

Read the article

How to make a multi vector in C++

- by Bob Dealio

I was wondering why can't I have a multi vectors in C ++ /please take a look at this example, it's not working though. there are only two parts to the code, foo function to manipulate the vector and the main function to echo them. typedef vector< vector<double> > MyVec; MyVec foo() { MyVec v; for (int index=0; index < 2; index ++) { for (int j=0; j<5; j++) { v[index][j]; } } return v; } int main () { MyVec z = foo(); for (int i = 0; i < z.size(); i++) { cout << z[i][1]; } return 0; }

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

Triangle Line-Segment Intersection - detecting near misses

- by Will

A ray is a very poor approximation of a player! I think approximating a player with a sphere traveling a straight line each game tick will solve my problems of the player intersecting edges of scenery because their line segment missed it yet their own model is not infinitely thin... I have a 3D triangle and a line segment. I have the normal triangle-line-segment intersection code which I admit I have only a woolly grasp of. To model movement and compute collisions of the player I have to determine if a line passes within sphere-radius of a triangle. But I can find no convenient line near-miss intersection code! Here's the classic triangle intersection ### commented ### code with my starting assumptions: function triangle_ray_intersection(a,b,c,ray_origin,ray_dir,ray_radius) { // http://softsurfer.com/Archive/algorithm_0105/algorithm_0105.htm#intersect_RayTriangle%28%29 // get triangle edge vectors and plane normal var u = vec3_sub(b,a); var v = vec3_sub(c,a); var n = vec3_cross(u,v); if(n[0]==0 && n[1]==0 && n[2]==0) return null; // triangle is degenerate var w0 = vec3_sub(ray_origin,a); var j = vec3_dot(n,ray_dir); if(Math.abs(j) < 0.00000001) { //### if parallel, might still pass within ray_radius of it return null; // parallel, disjoint or on plane } var i = -vec3_dot(n,w0); // get intersect point of ray with triangle plane var k = i / j; if(k < 0.0) return null; // ray goes away from triangle //### as its a line segment, k > 1+ray_radius means no intersect var hit = vec3_add(ray_origin,vec3_scale(ray_dir,k)); // intersect point of ray and plane // is I inside T? //### here I'm a bit lost; this is presumably computing barycentric coordinates? var uu = vec3_dot(u,u); var uv = vec3_dot(u,v); var vv = vec3_dot(v,v); var w = vec3_sub(hit,a); var wu = vec3_dot(w,u); var wv = vec3_dot(w,v); var D = uv * uv - uu * vv; var s = (uv * wv - vv * wu) / D; //### therefore, compute if its within ray_radius scaled to the 0..1 of barycentric coordinates? if(s<0.0 || s>1.0) return null; // I is outside T var t = (uv * wu - uu * wv) / D; if(t<0.0 || (s+t)>1.0) return null; // I is outside T //### finally, if it passses a barycentric test it might still be too far //### to a point; must check that its distance from a corner is within ray_radius too if more than one barycentric coord is >1 //### so we have rounded corners... return [hit,n]; // I is in T } Given the distance between the point of plane intersection and each corner, I ought to be able to determine distance at world scale of how far beyond the edge - beyond 1.0 in barycentric coordinates for each axis - that point is... At this point my head explodes! Is this the right track? What's the actual code? UPDATE: you can earn 100 pts on SO if you answer this question there...! How can you determine if a line segment passes within some distance of a triangle?

Read the article

Surviving MATLAB and R as a Hardcore Programmer

- by dsimcha

I love programming in languages that seem geared towards hardcore programmers. (My favorites are Python and D.) MATLAB is geared towards engineers and R is geared towards statisticians, and it seems like these languages were designed by people who aren't hardcore programmers and don't think like hardcore programmers. I always find them somewhat awkward to use, and to some extent I can't put my finger on why. Here are some issues I have managed to identify: (Both): The extreme emphasis on vectors and matrices to the extent that there are no true primitives. (Both): The difficulty of basic string manipulation. (Both): Lack of or awkwardness in support for basic data structures like hash tables and "real", i.e. type-parametric and nestable, arrays. (Both): They're really, really slow even by interpreted language standards, unless you bend over backwards to vectorize your code. (Both): They seem to not be designed to interact with the outside world. For example, both are fairly bulky programs that take a while to launch and seem to not be designed to make simple text filter programs easy to write. Furthermore, the lack of good string processing makes file I/O in anything but very standard forms near impossible. (Both): Object orientation seems to have a very bolted-on feel. Yes, you can do it, but it doesn't feel much more idiomatic than OO in C. (Both): No obvious, simple way to get a reference type. No pointers or class references. For example, I have no idea how you roll your own linked list in either of these languages. (MATLAB): You can't put multiple top level functions in a single file, encouraging very long functions and cut-and-paste coding. (MATLAB): Integers apparently don't exist as a first class type. (R): The basic builtin data structures seem way too high level and poorly documented, and never seem to do quite what I expect given my experience with similar but lower level data structures. (R): The documentation is spread all over the place and virtually impossible to browse or search. Even D, which is often knocked for bad documentation and is still fairly alpha-ish, is substantially better as far as I can tell. (R): At least as far as I'm aware, there's no good IDE for it. Again, even D, a fairly alpha-ish language with a small community, does better. In general, I also feel like MATLAB and R could be easily replaced by plain old libraries in more general-purpose langauges, if sufficiently comprehensive libraries existed. This is especially true in newer general purpose languages that include lots of features for library writers. Why do R and MATLAB seem so weird to me? Are there any other major issues that you've noticed that may make these languages come off as strange to hardcore programmers? When their use is necessary, what are some good survival tips? Edit: I'm seeing one issue from some of the answers I've gotten. I have a strong personal preference, when I analyze data, to have one script that incorporates the whole pipeline. This implies that a general purpose language needs to be used. I hate having to write a script to "clean up" the data and spit it out, then another to read it back in a completely different environment, etc. I find the friction of using MATLAB/R for some of my work and a completely different language with a completely different address space and way of thinking for the rest to be a huge source of friction. Furthermore, I know there are glue layers that exist, but they always seem to be horribly complicated and a source of friction.

Read the article

Circle-Line Collision Detection Problem

- by jazzdawg

I am currently developing a breakout clone and I have hit a roadblock in getting collision detection between a ball (circle) and a brick (convex polygon) working correctly. I am using a Circle-Line collision detection test where each line represents and edge on the convex polygon brick. For the majority of the time the Circle-Line test works properly and the points of collision are resolved correctly. Collision detection working correctly. However, occasionally my collision detection code returns false due to a negative discriminant when the ball is actually intersecting the brick. Collision detection failing. I am aware of the inefficiency with this method and I am using axis aligned bounding boxes to cut down on the number of bricks tested. My main concern is if there are any mathematical bugs in my code below. /* * from and to are points at the start and end of the convex polygons edge. * This function is called for every edge in the convex polygon until a * collision is detected. */ bool circleLineCollision(Vec2f from, Vec2f to) { Vec2f lFrom, lTo, lLine; Vec2f line, normal; Vec2f intersectPt1, intersectPt2; float a, b, c, disc, sqrt_disc, u, v, nn, vn; bool one = false, two = false; // set line vectors lFrom = from - ball.circle.centre; // localised lTo = to - ball.circle.centre; // localised lLine = lFrom - lTo; // localised line = from - to; // calculate a, b & c values a = lLine.dot(lLine); b = 2 * (lLine.dot(lFrom)); c = (lFrom.dot(lFrom)) - (ball.circle.radius * ball.circle.radius); // discriminant disc = (b * b) - (4 * a * c); if (disc < 0.0f) { // no intersections return false; } else if (disc == 0.0f) { // one intersection u = -b / (2 * a); intersectPt1 = from + (lLine.scale(u)); one = pointOnLine(intersectPt1, from, to); if (!one) return false; return true; } else { // two intersections sqrt_disc = sqrt(disc); u = (-b + sqrt_disc) / (2 * a); v = (-b - sqrt_disc) / (2 * a); intersectPt1 = from + (lLine.scale(u)); intersectPt2 = from + (lLine.scale(v)); one = pointOnLine(intersectPt1, from, to); two = pointOnLine(intersectPt2, from, to); if (!one && !two) return false; return true; } } bool pointOnLine(Vec2f p, Vec2f from, Vec2f to) { if (p.x >= min(from.x, to.x) && p.x <= max(from.x, to.x) && p.y >= min(from.y, to.y) && p.y <= max(from.y, to.y)) return true; return false; }

Read the article

Arcball Problems with UDK

- by opdude

I'm trying to re-create an arcball example from a Nehe, where an object can be rotated in a more realistic way while floating in the air (in my game the object is attached to the player at a distance like for example the Physics Gun) however I'm having trouble getting this to work with UDK. I have created an LGArcBall which follows the example from Nehe and I've compared outputs from this with the example code. I think where my problem lies is what I do to the Quaternion that is returned from the LGArcBall. Currently I am taking the returned Quaternion converting it to a rotation matrix. Getting the product of the last rotation (set when the object is first clicked) and then returning that into a Rotator and setting that to the objects rotation. If you could point me in the right direction that would be great, my code can be found below. class LGArcBall extends Object; var Quat StartRotation; var Vector StartVector; var float AdjustWidth, AdjustHeight, Epsilon; function SetBounds(float NewWidth, float NewHeight) { AdjustWidth = 1.0f / ((NewWidth - 1.0f) * 0.5f); AdjustHeight = 1.0f / ((NewHeight - 1.0f) * 0.5f); } function StartDrag(Vector2D startPoint, Quat rotation) { StartVector = MapToSphere(startPoint); } function Quat Update(Vector2D currentPoint) { local Vector currentVector, perp; local Quat newRot; //Map the new point to the sphere currentVector = MapToSphere(currentPoint); //Compute the vector perpendicular to the start and current perp = startVector cross currentVector; //Make sure our length is larger than Epsilon if (VSize(perp) > Epsilon) { //Return the perpendicular vector as the transform newRot.X = perp.X; newRot.Y = perp.Y; newRot.Z = perp.Z; //In the quaternion values, w is cosine (theta / 2), where //theta is the rotation angle newRot.W = startVector dot currentVector; } else { //The two vectors coincide, so return an identity transform newRot.X = 0.0f; newRot.Y = 0.0f; newRot.Z = 0.0f; newRot.W = 0.0f; } return newRot; } function Vector MapToSphere(Vector2D point) { local float x, y, length, norm; local Vector result; //Transform the mouse coords to [-1..1] //and inverse the Y coord x = (point.X * AdjustWidth) - 1.0f; y = 1.0f - (point.Y * AdjustHeight); length = (x * x) + (y * y); //If the point is mapped outside of the sphere //( length > radius squared) if (length > 1.0f) { norm = 1.0f / Sqrt(length); //Return the "normalized" vector, a point on the sphere result.X = x * norm; result.Y = y * norm; result.Z = 0.0f; } else //It's inside of the sphere { //Return a vector to the point mapped inside the sphere //sqrt(radius squared - length) result.X = x; result.Y = y; result.Z = Sqrt(1.0f - length); } return result; } DefaultProperties { Epsilon = 0.000001f } I'm then attempting to rotate that object when the mouse is dragged, with the following update code in my PlayerController. //Get Mouse Position MousePosition.X = LGMouseInterfacePlayerInput(PlayerInput).MousePosition.X; MousePosition.Y = LGMouseInterfacePlayerInput(PlayerInput).MousePosition.Y; newQuat = ArcBall.Update(MousePosition); rotMatrix = MakeRotationMatrix(QuatToRotator(newQuat)); rotMatrix = rotMatrix * LastRot; LGMoveableActor(movingPawn.CurrentUseableObject).SetPhysics(EPhysics.PHYS_Rotating); LGMoveableActor(movingPawn.CurrentUseableObject).SetRotation(MatrixGetRotator(rotMatrix));

Read the article

HLSL 5 interpolation issues

- by metredigm

I'm having issues with the depth components of my shadowmapping shaders. The shadow map rendering shader is fine, and works very well. The world rendering shader is more problematic. The only value which seems to definitely be off is the pixel's position from the light's perspective, which I pass in parallel to the position. struct Pixel { float4 position : SV_Position; float4 light_pos : TEXCOORD2; float3 normal : NORMAL; float2 texcoord : TEXCOORD; }; The reason that I used the semantic 'TEXCOORD2' on the light's pixel position is because I believe that the problem lies with Direct3D's interpolation of values between shaders, and I started trying random semantics and also forcing linear and noperspective interpolations. In the world rendering shader, I observed in the pixel shader that the Z value of light_pos was always extremely close to, but less than the W value. This resulted in a depth result of 0.999 or similar for every pixel. Here is the vertex shader code : struct Vertex { float3 position : POSITION; float3 normal : NORMAL; float2 texcoord : TEXCOORD; }; struct Pixel { float4 position : SV_Position; float4 light_pos : TEXCOORD2; float3 normal : NORMAL; float2 texcoord : TEXCOORD; }; cbuffer Camera : register (b0) { matrix world; matrix view; matrix projection; }; cbuffer Light : register (b1) { matrix light_world; matrix light_view; matrix light_projection; }; Pixel RenderVertexShader(Vertex input) { Pixel output; output.position = mul(float4(input.position, 1.0f), world); output.position = mul(output.position, view); output.position = mul(output.position, projection); output.world_pos = mul(float4(input.position, 1.0f), world); output.world_pos = mul(output.world_pos, light_view); output.world_pos = mul(output.world_pos, light_projection); output.texcoord = input.texcoord; output.normal = input.normal; return output; } I suspect interpolation to be the culprit, as I used the camera matrices in place of the light matrices in the vertex shader, and had the same problem. The problem is evident as both of the same vectors were passed to a pixel from the VS, but only one of them showed a change in the PS. I have already thoroughly debugged the matrices' validity, the cbuffers' validity, and the multiplicative validity. I'm very stumped and have been trying to solve this for quite some time. Misc info : The light projection matrix and the camera projection matrix are the same, generated from D3DXMatrixPerspectiveFovLH(), with an FOV of 60.0f * 3.141f / 180.0f, a near clipping plane of 0.1f, and a far clipping plane of 1000.0f. Any ideas on what is happening? (This is a repost from my question on Stack Overflow)

Read the article

Is my class structure good enough?

- by Rivten

So I wanted to try out this challenge on reddit which is mostly about how you structure your data the best you can. I decided to challenge my C++ skills. Here's how I planned this. First, there's the Game class. It deals with time and is the only class main has access to. A game has a Forest. For now, this class does not have a lot of things, only a size and a Factory. Will be put in better use when it will come to SDL-stuff I guess A Factory is the thing that deals with the Game Objects (a.k.a. Trees, Lumberjack and Bears). It has a vector of all GameObjects and a queue of Events which will be managed at the end of one month. A GameObject is an abstract class which can be updated and which can notify the Event Listener The EventListener is a class which handles all the Events of a simulation. It can recieve events from a Game Object and notify the Factory if needed, the latter will manage correctly the event. So, the Tree, Lumberjack and Bear classes all inherits from GameObject. And Sapling and Elder Tree inherits from Tree. Finally, an Event is defined by an event_type enumeration (LUMBERJACK_MAWED, SAPPLING_EVOLUTION, ...) and an event_protagonists union (a GameObject or a pair of GameObject (who killed who ?)). I was quite happy at first with this because it seems quite logic and flexible. But I ended up questionning this structure. Here's why : I dislike the fact that a GameObject need to know about the Factory. Indeed, when a Bear moves somewhere, it needs to know if there's a Lumberjack ! Or it is the Factory which handles places and objects. It would be great if a GameObject could only interact with the EventListener... or maybe it's not that much of a big deal. Wouldn't it be better if I separate the Factory in three vectors ? One for each kind of GameObject. The idea would be to optimize research. If I'm looking do delete a dead lumberjack, I would only have to look in one shorter vector rather than a very long vector. Another problem arises when I want to know if there is any particular object in a given case because I have to look for all the gameObjects and see if they are at the given case. I would tend to think that the other idea would be to use a matrix but then the issue would be that I would have empty cases (and therefore unused space). I don't really know if Sapling and Elder Tree should inherit from Tree. Indeed, a Sapling is a Tree but what about its evolution ? Should I just delete the sapling and say to the factory to create a new Tree at the exact same place ? It doesn't seem natural to me to do so. How could I improve this ? Is the design of an Event quite good ? I've never used unions before in C++ but I didn't have any other ideas about what to use. Well, I hope I have been clear enough. Thank you for taking the time to help me !

Read the article

Is the guideline: don't open email attachments or execute downloads or run plug-ins (Flash, Java) from untrusted sites enough to avert infection?

- by therobyouknow

I'd like to know if the following is enough to avert malware as I feel that the press and other advisory resources aren't always precisely clear on all the methods as to how PCs get infected. To my mind, the key step to getting infected is a conscious choice by the user to run an executable attachment from an email or download, but also viewing content that requires a plug-in (Flash, Java or something else). This conscious step breaks down into the following possibilities: don't open email attachments: certainly agree with this. But lets try to be clear: email comes in 2 parts -the text and the attachment. Just reading the email should not be risky, right? But opening (i.e. running) email attachments IS risky (malware can be present in the attachment) don't execute downloads (e.g. from sites linked from in suspect emails or otherwise): again certainly agree with this (malware can be present in the executable). Usually the user has to voluntary click to download, or at least click to run the executable. Question: has there ever been a case where a user has visited a site and a download has completed on its own and run on its own? don't run content requiring plug-ins: certainly agree: malware can be present in the executable. I vaguely recall cases with Flash but know of the Java-based vulnerabilities much better. Now, is the above enough? Note that I'm much more cautious than this. What I'm concerned about is that the media is not always very clear about how the malware infection occurs. They talk of "booby-trapped sites", "browser attacks" - HOW exactly? I'd presume the other threat would be malevolent use of Javascript to make an executable run on the user's machine. Would I be right and are there details I can read up on about this. Generally I like Javascript as a developer, please note. An accepted answer would fill in any holes I've missed here so we have a complete general view of what the threats are (even though the actual specific details of new threats vary, but the general vectors are known).

Read the article

NET Math Libraries

- by JoshReuben

NET Mathematical Libraries .NET Builder for Matlab The MathWorks Inc. - http://www.mathworks.com/products/netbuilder/ MATLAB Builder NE generates MATLAB based .NET and COM components royalty-free deployment creates the components by encrypting MATLAB functions and generating either a .NET or COM wrapper around them. .NET/Link for Mathematica www.wolfram.com a product that 2-way integrates Mathematica and Microsoft's .NET platform call .NET from Mathematica - use arbitrary .NET types directly from the Mathematica language. use and control the Mathematica kernel from a .NET program. turns Mathematica into a scripting shell to leverage the computational services of Mathematica. write custom front ends for Mathematica or use Mathematica as a computational engine for another program comes with full source code. Leverages MathLink - a Wolfram Research's protocol for sending data and commands back and forth between Mathematica and other programs. .NET/Link abstracts the low-level details of the MathLink C API. Extreme Optimization http://www.extremeoptimization.com/ a collection of general-purpose mathematical and statistical classes built for the.NET framework. It combines a math library, a vector and matrix library, and a statistics library in one package. download the trial of version 4.0 to try it out. Multi-core ready - Full support for Task Parallel Library features including cancellation. Broad base of algorithms covering a wide range of numerical techniques, including: linear algebra (BLAS and LAPACK routines), numerical analysis (integration and differentiation), equation solvers. Mathematics leverages parallelism using .NET 4.0's Task Parallel Library. Basic math: Complex numbers, 'special functions' like Gamma and Bessel functions, numerical differentiation. Solving equations: Solve equations in one variable, or solve systems of linear or nonlinear equations. Curve fitting: Linear and nonlinear curve fitting, cubic splines, polynomials, orthogonal polynomials. Optimization: find the minimum or maximum of a function in one or more variables, linear programming and mixed integer programming. Numerical integration: Compute integrals over finite or infinite intervals, over 2D and higher dimensional regions. Integrate systems of ordinary differential equations (ODE's). Fast Fourier Transforms: 1D and 2D FFT's using managed or fast native code (32 and 64 bit) BigInteger, BigRational, and BigFloat: Perform operations with arbitrary precision. Vector and Matrix Library Real and complex vectors and matrices. Single and double precision for elements. Structured matrix types: including triangular, symmetrical and band matrices. Sparse matrices. Matrix factorizations: LU decomposition, QR decomposition, singular value decomposition, Cholesky decomposition, eigenvalue decomposition. Portability and performance: Calculations can be done in 100% managed code, or in hand-optimized processor-specific native code (32 and 64 bit). Statistics Data manipulation: Sort and filter data, process missing values, remove outliers, etc. Supports .NET data binding. Statistical Models: Simple, multiple, nonlinear, logistic, Poisson regression. Generalized Linear Models. One and two-way ANOVA. Hypothesis Tests: 12 14 hypothesis tests, including the z-test, t-test, F-test, runs test, and more advanced tests, such as the Anderson-Darling test for normality, one and two-sample Kolmogorov-Smirnov test, and Levene's test for homogeneity of variances. Multivariate Statistics: K-means cluster analysis, hierarchical cluster analysis, principal component analysis (PCA), multivariate probability distributions. Statistical Distributions: 25 29 continuous and discrete statistical distributions, including uniform, Poisson, normal, lognormal, Weibull and Gumbel (extreme value) distributions. Random numbers: Random variates from any distribution, 4 high-quality random number generators, low discrepancy sequences, shufflers. New in version 4.0 (November, 2010) Support for .NET Framework Version 4.0 and Visual Studio 2010 TPL Parallellized – multicore ready sparse linear program solver - can solve problems with more than 1 million variables. Mixed integer linear programming using a branch and bound algorithm. special functions: hypergeometric, Riemann zeta, elliptic integrals, Frensel functions, Dawson's integral. Full set of window functions for FFT's. Product Price Update subscription Single Developer License $999 $399 Team License (3 developers) $1999 $799 Department License (8 developers) $3999 $1599 Site License (Unlimited developers in one physical location) $7999 $3199 NMath http://www.centerspace.net .NET math and statistics libraries matrix and vector classes random number generators Fast Fourier Transforms (FFTs) numerical integration linear programming linear regression curve and surface fitting optimization hypothesis tests analysis of variance (ANOVA) probability distributions principal component analysis cluster analysis built on the Intel Math Kernel Library (MKL), which contains highly-optimized, extensively-threaded versions of BLAS (Basic Linear Algebra Subroutines) and LAPACK (Linear Algebra PACKage). Product Price Update subscription Single Developer License $1295 $388 Team License (5 developers) $5180 $1554 DotNumerics http://www.dotnumerics.com/NumericalLibraries/Default.aspx free DotNumerics is a website dedicated to numerical computing for .NET that includes a C# Numerical Library for .NET containing algorithms for Linear Algebra, Differential Equations and Optimization problems. The Linear Algebra library includes CSLapack, CSBlas and CSEispack, ports from Fortran to C# of LAPACK, BLAS and EISPACK, respectively. Linear Algebra (CSLapack, CSBlas and CSEispack). Systems of linear equations, eigenvalue problems, least-squares solutions of linear systems and singular value problems. Differential Equations. Initial-value problem for nonstiff and stiff ordinary differential equations ODEs (explicit Runge-Kutta, implicit Runge-Kutta, Gear's BDF and Adams-Moulton). Optimization. Unconstrained and bounded constrained optimization of multivariate functions (L-BFGS-B, Truncated Newton and Simplex methods). Math.NET Numerics http://numerics.mathdotnet.com/ free an open source numerical library - includes special functions, linear algebra, probability models, random numbers, interpolation, integral transforms. A merger of dnAnalytics with Math.NET Iridium in addition to a purely managed implementation will also support native hardware optimization. constants & special functions complex type support real and complex, dense and sparse linear algebra (with LU, QR, eigenvalues, ... decompositions) non-uniform probability distributions, multivariate distributions, sample generation alternative uniform random number generators descriptive statistics, including order statistics various interpolation methods, including barycentric approaches and splines numerical function integration (quadrature) routines integral transforms, like fourier transform (FFT) with arbitrary lengths support, and hartley spectral-space aware sequence manipulation (signal processing) combinatorics, polynomials, quaternions, basic number theory. parallelized where appropriate, to leverage multi-core and multi-processor systems fully managed or (if available) using native libraries (Intel MKL, ACMS, CUDA, FFTW) provides a native facade for F# developers

Read the article

SPARC M7 Chip - 32 cores - Mind Blowing performance

- by Angelo-Oracle

The M7 Chip Oracle just announced its Next Generation Processor at the HotChips HC26 conference. As the Tech Lead in our Systems Division's Partner group, I had a front row seat to the extraordinary price performance advantage of Oracle current T5 and M6 based systems. Partner after partner tested these systems and were impressed with it performance. Just read some of the quotes to see what our partner has been saying about our hardware. We just announced our next generation processor, the M7. This has 32 cores (up from 16-cores in T5 and 12-cores in M6). With 20 nm technology this is our most advanced processor. The processor has more cores than anything else in the industry today. After the Sun acquisition Oracle has released 5 processors in 4 years and this is the 6th. The S4 core The M7 is built using the foundation of the S4 core. This is the next generation core technology. Like its predecessor, the S4 has 8 dynamic threads. It increases the frequency while maintaining the Pipeline depth. Each core has its own fine grain power estimator that keeps the core within its power envelop in 250 nano-sec granularity. Each core also includes Software in Silicon features for Application Acceleration Support. Each core includes features to improve Application Data Integrity, with almost no performance loss. The core also allows using part of the Virtual Address to store meta-data. User-Level Synchronization Instructions are also part of the S4 core. Each core has 16 KB Instruction and 16 KB Data L1 cache. The Core Clusters The cores on the M7 chip are organized in sets of 4-core clusters. The core clusters share L2 cache. All four cores in the complex share 256 KB of 4 way set associative L2 Instruction Cache, with over 1/2 TB/s of throughput. Two cores share 256 KB of 8 way set associative L2 Data Cache, with over 1/2 TB/s of throughput. With this innovative Core Cluster architecture, the M7 doubles core execution bandwidth. to maximize per-thread performance. The Chip Each M7 chip has 8 sets of these core-clusters. The chip has 64 MB on-chip L3 cache. This L3 caches is shared among all the cores and is partitioned into 8 x 8 MB chunks. Each chunk is 8-way set associative cache. The aggregate bandwidth for the L3 cache on the chip is over 1.6TB/s. Each chip has 4 DDR4 memory controllers and can support upto 16 DDR4 DIMMs, allowing for 2 TB of RAM/chip. The chip also includes 4 internal links of PCIe Gen3 I/O controllers. Each chip has 7 coherence links, allowing for 8 of these chips to be connected together gluelessly. Also 32 of these chips can be connected in an SMP configuration. A potential system with 32 chips will have 1024 cores and 8192 threads and 64 TB of RAM. Software in Silicon The M7 chip has many built in Application Accelerators in Silicon. These features will be exposed to our Software partners using the SPARC Accelerator Program. The M7 has built-in logic to decompress data at the speed of memory access. This means that applications can directly work on compressed data in memory increasing the data access rates. The VA Masking feature allows the use of part of the virtual address to store meta-data. Realtime Application Data Integrity The Realtime Application Data Integrity feature helps applications safeguard against invalid, stale memory reference and buffer overflows. The first 4-bits if the Pointer can be used to store a version number and this version number is also maintained in the memory & cache lines. When a pointer accesses memory the hardware checks to make sure the two versions match. A SEGV signal is raised when there is a mismatch. This feature can be used by the Database, applications and the OS. M7 Database In-Memory Query Accelerator The M7 chip also includes a In-Silicon Query Engines. These accelerate tasks that work on In-Memory Columnar Vectors. Oracle In-Memory options stores data in Column Format. The M7 Query Engine can speed up In-Memory Format Conversion, Value and Range Comparisons and Set Membership lookups. This engine can work on Compressed data - this means not only are we accelerating the query performance but also increasing the memory bandwidth for queries. SPARC Accelerated Program At the Hotchips conference we also introduced the SPARC Accelerated Program to provide our partners and third part developers access to all the goodness of the M7's SPARC Application Acceleration features. Please get in touch with us if you are interested in knowing more about this program.

Read the article

Extreme Optimization – Numerical Algorithm Support

- by JoshReuben

Function Delegates Many calculations involve the repeated evaluation of one or more user-supplied functions eg Numerical integration. The EO MathLib provides delegate types for common function signatures and the FunctionFactory class can generate new delegates from existing ones. RealFunction delegate - takes one Double parameter – can encapsulate most of the static methods of the System.Math class, as well as the classes in the Extreme.Mathematics.SpecialFunctions namespace: var sin = new RealFunction(Math.Sin); var result = sin(1); BivariateRealFunction delegate - takes two Double parameters: var atan2 = new BivariateRealFunction (Math.Atan2); var result = atan2(1, 2); TrivariateRealFunction delegate – represents a function takes three Double arguments ParameterizedRealFunction delegate - represents a function taking one Integer and one Double argument that returns a real number. The Pow method implements such a function, but the arguments need order re-arrangement: static double Power(int exponent, double x) { return ElementaryFunctions.Pow(x, exponent); } ... var power = new ParameterizedRealFunction(Power); var result = power(6, 3.2); A ComplexFunction delegate - represents a function that takes an Extreme.Mathematics.DoubleComplex argument and also returns a complex number. MultivariateRealFunction delegate - represents a function that takes an Extreme.Mathematics.LinearAlgebra.Vector argument and returns a real number. MultivariateVectorFunction delegate - represents a function that takes a Vector argument and returns a Vector. FastMultivariateVectorFunction delegate - represents a function that takes an input Vector argument and an output Matrix argument – avoiding object construction The FunctionFactory class RealFromBivariateRealFunction and RealFromParameterizedRealFunction helper methods - transform BivariateRealFunction or a ParameterizedRealFunction into a RealFunction delegate by fixing one of the arguments, and treating this as a new function of a single argument. var tenthPower = FunctionFactory.RealFromParameterizedRealFunction(power, 10); var result = tenthPower(x); Note: There is no direct way to do this programmatically in C# - in F# you have partial value functions where you supply a subset of the arguments (as a travelling closure) that the function expects. When you omit arguments, F# generates a new function that holds onto/remembers the arguments you passed in and "waits" for the other parameters to be supplied. let sumVals x y = x + y let sumX = sumVals 10 // Note: no 2nd param supplied. // sumX is a new function generated from partially applied sumVals. // ie "sumX is a partial application of sumVals." let sum = sumX 20 // Invokes sumX, passing in expected int (parameter y from original) val sumVals : int -> int -> int val sumX : (int -> int) val sum : int = 30 RealFunctionsToVectorFunction and RealFunctionsToFastVectorFunction helper methods - combines an array of delegates returning a real number or a vector into vector or matrix functions. The resulting vector function returns a vector whose components are the function values of the delegates in the array. var funcVector = FunctionFactory.RealFunctionsToVectorFunction( new MultivariateRealFunction(myFunc1), new MultivariateRealFunction(myFunc2)); The IterativeAlgorithm<T> abstract base class Iterative algorithms are common in numerical computing - a method is executed repeatedly until a certain condition is reached, approximating the result of a calculation with increasing accuracy until a certain threshold is reached. If the desired accuracy is achieved, the algorithm is said to converge. This base class is derived by many classes in the Extreme.Mathematics.EquationSolvers and Extreme.Mathematics.Optimization namespaces, as well as the ManagedIterativeAlgorithm class which contains a driver method that manages the iteration process. The ConvergenceTest abstract base class This class is used to specify algorithm Termination , convergence and results - calculates an estimate for the error, and signals termination of the algorithm when the error is below a specified tolerance. Termination Criteria - specify the success condition as the difference between some quantity and its actual value is within a certain tolerance – 2 ways: absolute error - difference between the result and the actual value. relative error is the difference between the result and the actual value relative to the size of the result. Tolerance property - specify trade-off between accuracy and execution time. The lower the tolerance, the longer it will take for the algorithm to obtain a result within that tolerance. Most algorithms in the EO NumLib have a default value of MachineConstants.SqrtEpsilon - gives slightly less than 8 digits of accuracy. ConvergenceCriterion property - specify under what condition the algorithm is assumed to converge. Using the ConvergenceCriterion enum: WithinAbsoluteTolerance / WithinRelativeTolerance / WithinAnyTolerance / NumberOfIterations Active property - selectively ignore certain convergence tests Error property - returns the estimated error after a run MaxIterations / MaxEvaluations properties - Other Termination Criteria - If the algorithm cannot achieve the desired accuracy, the algorithm still has to end – according to an absolute boundary. Status property - indicates how the algorithm terminated - the AlgorithmStatus enum values:NoResult / Busy / Converged (ended normally - The desired accuracy has been achieved) / IterationLimitExceeded / EvaluationLimitExceeded / RoundOffError / BadFunction / Divergent / ConvergedToFalseSolution. After the iteration terminates, the Status should be inspected to verify that the algorithm terminated normally. Alternatively, you can set the ThrowExceptionOnFailure to true. Result property - returns the result of the algorithm. This property contains the best available estimate, even if the desired accuracy was not obtained. IterationsNeeded / EvaluationsNeeded properties - returns the number of iterations required to obtain the result, number of function evaluations. Concrete Types of Convergence Test classes SimpleConvergenceTest class - test if a value is close to zero or very small compared to another value. VectorConvergenceTest class - test convergence of vectors. This class has two additional properties. The Norm property specifies which norm is to be used when calculating the size of the vector - the VectorConvergenceNorm enum values: EuclidianNorm / Maximum / SumOfAbsoluteValues. The ErrorMeasure property specifies how the error is to be measured – VectorConvergenceErrorMeasure enum values: Norm / Componentwise ConvergenceTestCollection class - represent a combination of tests. The Quantifier property is a ConvergenceTestQuantifier enum that specifies how the tests in the collection are to be combined: Any / All The AlgorithmHelper Class inherits from IterativeAlgorithm<T> and exposes two methods for convergence testing. IsValueWithinTolerance<T> method - determines whether a value is close to another value to within an algorithm's requested tolerance. IsIntervalWithinTolerance<T> method - determines whether an interval is within an algorithm's requested tolerance.

Read the article

Opposite Force to Apply to a Collided Rigid Body?

- by Milo

I'm working on the physics for my GTA2-like game so I can learn more about game physics. The collision detection and resolution are working great. I'm now just unsure how to compute the force to apply to a body after it collides with a wall. My rigid body looks like this: /our simulation object class RigidBody extends Entity { //linear private Vector2D velocity = new Vector2D(); private Vector2D forces = new Vector2D(); private float mass; private Vector2D v = new Vector2D(); //angular private float angularVelocity; private float torque; private float inertia; //graphical private Vector2D halfSize = new Vector2D(); private Bitmap image; private Matrix mat = new Matrix(); private float[] Vector2Ds = new float[2]; private Vector2D tangent = new Vector2D(); private static Vector2D worldRelVec = new Vector2D(); private static Vector2D relWorldVec = new Vector2D(); private static Vector2D pointVelVec = new Vector2D(); private static Vector2D acceleration = new Vector2D(); public RigidBody() { //set these defaults so we don't get divide by zeros mass = 1.0f; inertia = 1.0f; setLayer(LAYER_OBJECTS); } protected void rectChanged() { if(getWorld() != null) { getWorld().updateDynamic(this); } } //intialize out parameters public void initialize(Vector2D halfSize, float mass, Bitmap bitmap) { //store physical parameters this.halfSize = halfSize; this.mass = mass; image = bitmap; inertia = (1.0f / 20.0f) * (halfSize.x * halfSize.x) * (halfSize.y * halfSize.y) * mass; RectF rect = new RectF(); float scalar = 10.0f; rect.left = (int)-halfSize.x * scalar; rect.top = (int)-halfSize.y * scalar; rect.right = rect.left + (int)(halfSize.x * 2.0f * scalar); rect.bottom = rect.top + (int)(halfSize.y * 2.0f * scalar); setRect(rect); } public void setLocation(Vector2D position, float angle) { getRect().set(position.x,position.y, getWidth(), getHeight(), angle); rectChanged(); } public Vector2D getPosition() { return getRect().getCenter(); } @Override public void update(float timeStep) { doUpdate(timeStep); } public void doUpdate(float timeStep) { //integrate physics //linear acceleration.x = forces.x / mass; acceleration.y = forces.y / mass; velocity.x += (acceleration.x * timeStep); velocity.y += (acceleration.y * timeStep); //velocity = Vector2D.add(velocity, Vector2D.scalarMultiply(acceleration, timeStep)); Vector2D c = getRect().getCenter(); v.x = getRect().getCenter().getX() + (velocity.x * timeStep); v.y = getRect().getCenter().getY() + (velocity.y * timeStep); setCenter(v.x, v.y); forces.x = 0; //clear forces forces.y = 0; //angular float angAcc = torque / inertia; angularVelocity += angAcc * timeStep; setAngle(getAngle() + angularVelocity * timeStep); torque = 0; //clear torque } //take a relative Vector2D and make it a world Vector2D public Vector2D relativeToWorld(Vector2D relative) { mat.reset(); Vector2Ds[0] = relative.x; Vector2Ds[1] = relative.y; mat.postRotate(JMath.radToDeg(getAngle())); mat.mapVectors(Vector2Ds); relWorldVec.x = Vector2Ds[0]; relWorldVec.y = Vector2Ds[1]; return relWorldVec; } //take a world Vector2D and make it a relative Vector2D public Vector2D worldToRelative(Vector2D world) { mat.reset(); Vector2Ds[0] = world.x; Vector2Ds[1] = world.y; mat.postRotate(JMath.radToDeg(-getAngle())); mat.mapVectors(Vector2Ds); worldRelVec.x = Vector2Ds[0]; worldRelVec.y = Vector2Ds[1]; return worldRelVec; } //velocity of a point on body public Vector2D pointVelocity(Vector2D worldOffset) { tangent.x = -worldOffset.y; tangent.y = worldOffset.x; pointVelVec.x = (tangent.x * angularVelocity) + velocity.x; pointVelVec.y = (tangent.y * angularVelocity) + velocity.y; return pointVelVec; } public void applyForce(Vector2D worldForce, Vector2D worldOffset) { //add linear force forces.x += worldForce.x; forces.y += worldForce.y; //add associated torque torque += Vector2D.cross(worldOffset, worldForce); } @Override public void draw( GraphicsContext c) { c.drawRotatedScaledBitmap(image, getPosition().x, getPosition().y, getWidth(), getHeight(), getAngle()); } public Vector2D getVelocity() { return velocity; } public void setVelocity(Vector2D velocity) { this.velocity = velocity; } } The way it is given force is by the applyForce method, this method considers angular torque. I'm just not sure how to come up with the vectors in the case where: RigidBody hits static entity RigidBody hits other RigidBody that may or may not be in motion. Would anyone know a way (without too complex math) that I could figure out the opposite force I need to apply to the car? I know the normal it is colliding with and how deep it collided. My main goal is so that say I hit a building from the side, well the car should not just stay there, it should slowly rotate out of it if I'm more than 45 degrees. Right now when I hit a wall I only change the velocity directly which does not consider angular force. Thanks!

Read the article

Not getting desired results with SSAO implementation

- by user1294203

After having implemented deferred rendering, I tried my luck with a SSAO implementation using this Tutorial. Unfortunately, I'm not getting anything that looks like SSAO, you can see my result below. You can see there is some weird pattern forming and there is no occlusion shading where there needs to be (i.e. in between the objects and on the ground). The shaders I implemented follow: #VS #version 330 core uniform mat4 invProjMatrix; layout(location = 0) in vec3 in_Position; layout(location = 2) in vec2 in_TexCoord; noperspective out vec2 pass_TexCoord; smooth out vec3 viewRay; void main(void){ pass_TexCoord = in_TexCoord; viewRay = (invProjMatrix * vec4(in_Position, 1.0)).xyz; gl_Position = vec4(in_Position, 1.0); } #FS #version 330 core uniform sampler2D DepthMap; uniform sampler2D NormalMap; uniform sampler2D noise; uniform vec2 projAB; uniform ivec3 noiseScale_kernelSize; uniform vec3 kernel[16]; uniform float RADIUS; uniform mat4 projectionMatrix; noperspective in vec2 pass_TexCoord; smooth in vec3 viewRay; layout(location = 0) out float out_AO; vec3 CalcPosition(void){ float depth = texture(DepthMap, pass_TexCoord).r; float linearDepth = projAB.y / (depth - projAB.x); vec3 ray = normalize(viewRay); ray = ray / ray.z; return linearDepth * ray; } mat3 CalcRMatrix(vec3 normal, vec2 texcoord){ ivec2 noiseScale = noiseScale_kernelSize.xy; vec3 rvec = texture(noise, texcoord * noiseScale).xyz; vec3 tangent = normalize(rvec - normal * dot(rvec, normal)); vec3 bitangent = cross(normal, tangent); return mat3(tangent, bitangent, normal); } void main(void){ vec2 TexCoord = pass_TexCoord; vec3 Position = CalcPosition(); vec3 Normal = normalize(texture(NormalMap, TexCoord).xyz); mat3 RotationMatrix = CalcRMatrix(Normal, TexCoord); int kernelSize = noiseScale_kernelSize.z; float occlusion = 0.0; for(int i = 0; i < kernelSize; i++){ // Get sample position vec3 sample = RotationMatrix * kernel[i]; sample = sample * RADIUS + Position; // Project and bias sample position to get its texture coordinates vec4 offset = projectionMatrix * vec4(sample, 1.0); offset.xy /= offset.w; offset.xy = offset.xy * 0.5 + 0.5; // Get sample depth float sample_depth = texture(DepthMap, offset.xy).r; float linearDepth = projAB.y / (sample_depth - projAB.x); if(abs(Position.z - linearDepth ) < RADIUS){ occlusion += (linearDepth <= sample.z) ? 1.0 : 0.0; } } out_AO = 1.0 - (occlusion / kernelSize); } I draw a full screen quad and pass Depth and Normal textures. Normals are in RGBA16F with the alpha channel reserved for the AO factor in the blur pass. I store depth in a non linear Depth buffer (32F) and recover the linear depth using: float linearDepth = projAB.y / (depth - projAB.x); where projAB.y is calculated as: and projAB.x as: These are derived from the glm::perspective(gluperspective) matrix. z_n and z_f are the near and far clip distance. As described in the link I posted on the top, the method creates samples in a hemisphere with higher distribution close to the center. It then uses random vectors from a texture to rotate the hemisphere randomly around the Z direction and finally orients it along the normal at the given pixel. Since the result is noisy, a blur pass follows the SSAO pass. Anyway, my position reconstruction doesn't seem to be wrong since I also tried doing the same but with the position passed from a texture instead of being reconstructed. I also tried playing with the Radius, noise texture size and number of samples and with different kinds of texture formats, with no luck. For some reason when changing the Radius, nothing changes. Does anyone have any suggestions? What could be going wrong?

Read the article

Black Screen: How to set Projection/View Matrix

- by Lisa

I have a Windows Phone 8 C#/XAML with DirectX component project. I'm rendering some particles, but each particle is a rectangle versus a square (as I've set the vertices to be positions equally offset from each other). I used an Identity matrix in the view and projection matrix. I decided to add the windows aspect ratio to prevent the rectangles. But now I get a black screen. None of the particles are rendered now. I don't know what's wrong with my matrices. Can anyone see the problem? These are the default matrices in Microsoft's project example. View Matrix: XMVECTOR eye = XMVectorSet(0.0f, 0.7f, 1.5f, 0.0f); XMVECTOR at = XMVectorSet(0.0f, -0.1f, 0.0f, 0.0f); XMVECTOR up = XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f); XMStoreFloat4x4(&m_constantBufferData.view, XMMatrixTranspose(XMMatrixLookAtRH(eye, at, up))); Projection Matrix: void CubeRenderer::CreateWindowSizeDependentResources() { Direct3DBase::CreateWindowSizeDependentResources(); float aspectRatio = m_windowBounds.Width / m_windowBounds.Height; float fovAngleY = 70.0f * XM_PI / 180.0f; if (aspectRatio < 1.0f) { fovAngleY /= aspectRatio; } XMStoreFloat4x4(&m_constantBufferData.projection, XMMatrixTranspose(XMMatrixPerspectiveFovRH(fovAngleY, aspectRatio, 0.01f, 100.0f))); } I've tried modifying them to use cocos2dx's WP8 example. XMMATRIX identityMatrix = XMMatrixIdentity(); float fovy = 60.0f; float aspect = m_windowBounds.Width / m_windowBounds.Height; float zNear = 0.1f; float zFar = 100.0f; float xmin, xmax, ymin, ymax; ymax = zNear * tanf(fovy * XM_PI / 360); ymin = -ymax; xmin = ymin * aspect; xmax = ymax * aspect; XMMATRIX tmpMatrix = XMMatrixPerspectiveOffCenterRH(xmin, xmax, ymin, ymax, zNear, zFar); XMMATRIX projectionMatrix = XMMatrixMultiply(tmpMatrix, identityMatrix); // View Matrix float fEyeX = m_windowBounds.Width * 0.5f; float fEyeY = m_windowBounds.Height * 0.5f; float fEyeZ = m_windowBounds.Height / 1.1566f; float fLookAtX = m_windowBounds.Width * 0.5f; float fLookAtY = m_windowBounds.Height * 0.5f; float fLookAtZ = 0.0f; float fUpX = 0.0f; float fUpY = 1.0f; float fUpZ = 0.0f; XMMATRIX tmpMatrix2 = XMMatrixLookAtRH(XMVectorSet(fEyeX,fEyeY,fEyeZ,0.f), XMVectorSet(fLookAtX,fLookAtY,fLookAtZ,0.f), XMVectorSet(fUpX,fUpY,fUpZ,0.f)); XMMATRIX viewMatrix = XMMatrixMultiply(tmpMatrix2, identityMatrix); XMStoreFloat4x4(&m_constantBufferData.view, viewMatrix); Vertex Shader cbuffer ModelViewProjectionConstantBuffer : register(b0) { //matrix model; matrix view; matrix projection; }; struct VertexInputType { float4 position : POSITION; float2 tex : TEXCOORD0; float4 color : COLOR; }; struct PixelInputType { float4 position : SV_POSITION; float2 tex : TEXCOORD0; float4 color : COLOR; }; PixelInputType main(VertexInputType input) { PixelInputType output; // Change the position vector to be 4 units for proper matrix calculations. input.position.w = 1.0f; //===================================== // TODO: ADDED for testing input.position.z = 0.0f; //===================================== // Calculate the position of the vertex against the world, view, and projection matrices. //output.position = mul(input.position, model); output.position = mul(input.position, view); output.position = mul(output.position, projection); // Store the texture coordinates for the pixel shader. output.tex = input.tex; // Store the particle color for the pixel shader. output.color = input.color; return output; } Before I render the shader, I set the view/projection matrices into the constant buffer void ParticleRenderer::SetShaderParameters() { ViewProjectionConstantBuffer* dataPtr; D3D11_MAPPED_SUBRESOURCE mappedResource; DX::ThrowIfFailed(m_d3dContext->Map(m_constantBuffer.Get(), 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource)); dataPtr = (ViewProjectionConstantBuffer*)mappedResource.pData; dataPtr->view = m_constantBufferData.view; dataPtr->projection = m_constantBufferData.projection; m_d3dContext->Unmap(m_constantBuffer.Get(), 0); // Now set the constant buffer in the vertex shader with the updated values. m_d3dContext->VSSetConstantBuffers(0, 1, m_constantBuffer.GetAddressOf() ); // Set shader texture resource in the pixel shader. m_d3dContext->PSSetShaderResources(0, 1, &m_textureView); } Nothing, black screen... I tried so many different look at, eye, and up vectors. I tried transposing the matrices. I've set the particle center position to always be (0, 0, 0), I tried different positions too, just to make sure they're not being rendered offscreen.

Read the article

Increasing efficiency of N-Body gravity simulation

- by Postman

I'm making a space exploration type game, it will have many planets and other objects that will all have realistic gravity. I currently have a system in place that works, but if the number of planets goes above 70, the FPS decreases an practically exponential rates. I'm making it in C# and XNA. My guess is that I should be able to do gravity calculations between 100 objects without this kind of strain, so clearly my method is not as efficient as it should be. I have two files, Gravity.cs and EntityEngine.cs. Gravity manages JUST the gravity calculations, EntityEngine creates an instance of Gravity and runs it, along with other entity related methods. EntityEngine.cs public void Update() { foreach (KeyValuePair<string, Entity> e in Entities) { e.Value.Update(); } gravity.Update(); } (Only relevant piece of code from EntityEngine, self explanatory. When an instance of Gravity is made in entityEngine, it passes itself (this) into it, so that gravity can have access to entityEngine.Entities (a dictionary of all planet objects)) Gravity.cs namespace ExplorationEngine { public class Gravity { private EntityEngine entityEngine; private Vector2 Force; private Vector2 VecForce; private float distance; private float mult; public Gravity(EntityEngine e) { entityEngine = e; } public void Update() { //First loop foreach (KeyValuePair<string, Entity> e in entityEngine.Entities) { //Reset the force vector Force = new Vector2(); //Second loop foreach (KeyValuePair<string, Entity> e2 in entityEngine.Entities) { //Make sure the second value is not the current value from the first loop if (e2.Value != e.Value ) { //Find the distance between the two objects. Because Fg = G * ((M1 * M2) / r^2), using Vector2.Distance() and then squaring it //is pointless and inefficient because distance uses a sqrt, squaring the result simple cancels that sqrt. distance = Vector2.DistanceSquared(e2.Value.Position, e.Value.Position); //This makes sure that two planets do not attract eachother if they are touching, completely unnecessary when I add collision, //For now it just makes it so that the planets are not glitchy, performance is not significantly improved by removing this IF if (Math.Sqrt(distance) > (e.Value.Texture.Width / 2 + e2.Value.Texture.Width / 2)) { //Calculate the magnitude of Fg (I'm using my own gravitational constant (G) for the sake of time (I know it's 1 at the moment, but I've been changing it) mult = 1.0f * ((e.Value.Mass * e2.Value.Mass) / distance); //Calculate the direction of the force, simply subtracting the positions and normalizing works, this fixes diagonal vectors //from having a larger value, and basically makes VecForce a direction. VecForce = e2.Value.Position - e.Value.Position; VecForce.Normalize(); //Add the vector for each planet in the second loop to a force var. Force = Vector2.Add(Force, VecForce * mult); //I have tried Force += VecForce * mult, and have not noticed much of an increase in speed. } } } //Add that force to the first loop's planet's position (later on I'll instead add to acceleration, to account for inertia) e.Value.Position += Force; } } } } I have used various tips (about gravity optimizing, not threading) from THIS question (that I made yesterday). I've made this gravity method (Gravity.Update) as efficient as I know how to make it. This O(N^2) algorithm still seems to be eating up all of my CPU power though. Here is a LINK (google drive, go to File download, keep .Exe with the content folder, you will need XNA Framework 4.0 Redist. if you don't already have it) to the current version of my game. Left click makes a planet, right click removes the last planet. Mouse moves the camera, scroll wheel zooms in and out. Watch the FPS and Planet Count to see what I mean about performance issues past 70 planets. (ALL 70 planets must be moving, I've had 100 stationary planets and only 5 or so moving ones while still having 300 fps, the issue arises when 70+ are moving around) After 70 planets are made, performance tanks exponentially. With < 70 planets, I get 330 fps (I have it capped at 300). At 90 planets, the FPS is about 2, more than that and it sticks around at 0 FPS. Strangely enough, when all planets are stationary, the FPS climbs back up to around 300, but as soon as something moves, it goes right back down to what it was, I have no systems in place to make this happen, it just does. I considered multithreading, but that previous question I asked taught me a thing or two, and I see now that that's not a viable option. I've also thought maybe I could do the calculations on my GPU instead, though I don't think it should be necessary. I also do not know how to do this, it is not a simple concept and I want to avoid it unless someone knows a really noob friendly simple way to do it that will work for an n-body gravity calculation. (I have an NVidia gtx 660) Lastly I've considered using a quadtree type system. (Barnes Hut simulation) I've been told (in the previous question) that this is a good method that is commonly used, and it seems logical and straightforward, however the implementation is way over my head and I haven't found a good tutorial for C# yet that explains it in a way I can understand, or uses code I can eventually figure out. So my question is this: How can I make my gravity method more efficient, allowing me to use more than 100 objects (I can render 1000 planets with constant 300+ FPS without gravity calculations), and if I can't do much to improve performance (including some kind of quadtree system), could I use my GPU to do the calculations?

Read the article

C# XNA 4.0 multitextured cube

- by chron

So I am following this tutorial on how to draw a cube in XNA and I ran into a problem. Ok so my shader can only have texture right? I need to have a texture on the front and back of my cube. So I thought I could just have both textures stored in one texture. Problem is I do not know how to map out my UV coords to do so. (I tried dividing by 2 and such with no luck...). I am really new to this (not programming, just game development and some concepts), but how could I get UV coordinates for both halfs of the texture. private void ConstructCube(Vector3 Position,Vector3 Size) { _vertices = new VertexPositionNormalTexture[50]; // Calculate the position of the vertices on the top face. Vector3 topLeftFront = Position + new Vector3(-1.0f, 1.0f, -1.0f) * Size; Vector3 topLeftBack = Position + new Vector3(-1.0f, 1.0f, 1.0f) * Size; Vector3 topRightFront = Position + new Vector3(1.0f, 1.0f, -1.0f) * Size; Vector3 topRightBack = Position + new Vector3(1.0f, 1.0f, 1.0f) * Size; // Calculate the position of the vertices on the bottom face. Vector3 btmLeftFront = Position + new Vector3(-1.0f, -1.0f, -1.0f) * Size; Vector3 btmLeftBack = Position + new Vector3(-1.0f, -1.0f, 1.0f) * Size; Vector3 btmRightFront = Position + new Vector3(1.0f, -1.0f, -1.0f) * Size; Vector3 btmRightBack = Position + new Vector3(1.0f, -1.0f, 1.0f) * Size; // Normal vectors for each face (needed for lighting / display) Vector3 normalFront = new Vector3(0.0f, 0.0f, 1.0f) * Size; Vector3 normalBack = new Vector3(0.0f, 0.0f, -1.0f) * Size; Vector3 normalTop = new Vector3(0.0f, 1.0f, 0.0f) * Size; Vector3 normalBottom = new Vector3(0.0f, -1.0f, 0.0f) * Size; Vector3 normalLeft = new Vector3(-1.0f, 0.0f, 0.0f) * Size; Vector3 normalRight = new Vector3(1.0f, 0.0f, 0.0f) * Size; // UV texture coordinates Vector2 textureTopLeft = new Vector2(1.0f * Size.X, 0.0f * Size.Y); Vector2 textureTopRight = new Vector2(0.0f * Size.X, 0.0f * Size.Y); Vector2 textureBottomLeft = new Vector2(1.0f * Size.X, 1.0f * Size.Y); Vector2 textureBottomRight = new Vector2(0.0f * Size.X, 1.0f * Size.Y); // Add the vertices for the FRONT face. _vertices[0] = new VertexPositionNormalTexture(topLeftFront, normalFront, textureTopLeft); _vertices[1] = new VertexPositionNormalTexture(btmLeftFront, normalFront, textureBottomLeft); _vertices[2] = new VertexPositionNormalTexture(topRightFront, normalFront, textureTopRight); _vertices[3] = new VertexPositionNormalTexture(btmLeftFront, normalFront, textureBottomLeft); _vertices[4] = new VertexPositionNormalTexture(btmRightFront, normalFront, textureBottomRight); _vertices[5] = new VertexPositionNormalTexture(topRightFront, normalFront, textureTopRight); // Add the vertices for the BACK face. _vertices[6] = new VertexPositionNormalTexture(topLeftBack, normalBack, textureTopRight); _vertices[7] = new VertexPositionNormalTexture(topRightBack, normalBack, textureTopLeft); _vertices[8] = new VertexPositionNormalTexture(btmLeftBack, normalBack, textureBottomRight); _vertices[9] = new VertexPositionNormalTexture(btmLeftBack, normalBack, textureBottomRight); _vertices[10] = new VertexPositionNormalTexture(topRightBack, normalBack, textureTopLeft); _vertices[11] = new VertexPositionNormalTexture(btmRightBack, normalBack, textureBottomLeft); // Add the vertices for the TOP face. _vertices[12] = new VertexPositionNormalTexture(topLeftFront, normalTop, textureBottomLeft); _vertices[13] = new VertexPositionNormalTexture(topRightBack, normalTop, textureTopRight); _vertices[14] = new VertexPositionNormalTexture(topLeftBack, normalTop, textureTopLeft); _vertices[15] = new VertexPositionNormalTexture(topLeftFront, normalTop, textureBottomLeft); _vertices[16] = new VertexPositionNormalTexture(topRightFront, normalTop, textureBottomRight); _vertices[17] = new VertexPositionNormalTexture(topRightBack, normalTop, textureTopRight); // Add the vertices for the BOTTOM face. _vertices[18] = new VertexPositionNormalTexture(btmLeftFront, normalBottom, textureTopLeft); _vertices[19] = new VertexPositionNormalTexture(btmLeftBack, normalBottom, textureBottomLeft); _vertices[20] = new VertexPositionNormalTexture(btmRightBack, normalBottom, textureBottomRight); _vertices[21] = new VertexPositionNormalTexture(btmLeftFront, normalBottom, textureTopLeft); _vertices[22] = new VertexPositionNormalTexture(btmRightBack, normalBottom, textureBottomRight); _vertices[23] = new VertexPositionNormalTexture(btmRightFront, normalBottom, textureTopRight); // Add the vertices for the LEFT face. _vertices[24] = new VertexPositionNormalTexture(topLeftFront, normalLeft, textureTopRight); _vertices[25] = new VertexPositionNormalTexture(btmLeftBack, normalLeft, textureBottomLeft ); _vertices[26] = new VertexPositionNormalTexture(btmLeftFront, normalLeft, textureBottomRight ); _vertices[27] = new VertexPositionNormalTexture(topLeftBack, normalLeft, textureTopLeft); _vertices[28] = new VertexPositionNormalTexture(btmLeftBack, normalLeft, textureBottomLeft ); _vertices[29] = new VertexPositionNormalTexture(topLeftFront, normalLeft, textureTopRight ); // Add the vertices for the RIGHT face. _vertices[30] = new VertexPositionNormalTexture(topRightFront, normalRight, textureTopLeft); _vertices[31] = new VertexPositionNormalTexture(btmRightFront, normalRight, textureBottomLeft); _vertices[32] = new VertexPositionNormalTexture(btmRightBack, normalRight, textureBottomRight); _vertices[33] = new VertexPositionNormalTexture(topRightBack, normalRight, textureTopRight); _vertices[34] = new VertexPositionNormalTexture(topRightFront, normalRight, textureTopLeft); _vertices[35] = new VertexPositionNormalTexture(btmRightBack, normalRight, textureBottomRight); done = true; }

Read the article

OpenGL FrameBuffer Objects weird behavior

- by Ben Jones

My algorithm is this: Render the scene to a FBO with shadow mapping from multiple locations Render the scene to the screen with shadow mapping ...black magic that I still have to imlement... Combine the samples from step 1 with the image from step 2 I'm trying to debug steps 1 and 2 and am coming across STRANGE behavior. My algorithm for each shadow mapped pass is: render the scene to a FBO connected to a depth array texture from the POV of each light render the scene from the viewpoint and use vertex/frag shaders to compare the depths When I run my algorithm this way: render from point to FBO render from point to screen glutSwapBuffers() The normal vectors in the screen pass appear to be incorrect (inverted possibly). I'm pretty sure that's the issue because my diffuse lighting calculation is incorrect, but the material colors are correct, and the shadows appear in the correct places. So, it seems like the only thing that could be the culprit is the normals. However if I do render from point to FBO render from point to Screen glutSwapBuffers() //wrong here render from point to Screen glutSwapBuffers() the second pass is correct. I assume there's a problem with my framebuffer calls. Can anyone see what the problem is from the log below? Its from a bugle trace grepped for 'buffer' with a few edits to make it a little more clear. Thanks! [INFO] trace.call: glGenFramebuffersEXT(1, 0xdfeb90 - { 1 }) [INFO] trace.call: glGenFramebuffersEXT(1, 0xdfebac - { 2 }) [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 1) [INFO] trace.call: glDrawBuffer(GL_NONE) [INFO] trace.call: glReadBuffer(GL_NONE) [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 0) //start render to FBO [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 2) [INFO] trace.call: glReadBuffer(GL_NONE) [INFO] trace.call: glFramebufferTexture2DEXT(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, 2, 0) [INFO] trace.call: glFramebufferTexture2DEXT(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, 3, 0) [INFO] trace.call: glDrawBuffer(GL_COLOR_ATTACHMENT0) //bind to the FBO attached to a depth tex array for shadows [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 1) [INFO] trace.call: glFramebufferTextureLayerARB(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, 1, 0, 0) [INFO] trace.call: glClear(GL_DEPTH_BUFFER_BIT) //draw geometry //bind to the FBO I want the shadow mapped image rendered to [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 2) [INFO] trace.call: glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) //draw geometry //draw to screen pass //again shadow mapping FBO [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 1) [INFO] trace.call: glFramebufferTextureLayerARB(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, 1, 0, 0) [INFO] trace.call: glClear(GL_DEPTH_BUFFER_BIT) //draw geometry //bind to the screen [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 0) [INFO] trace.call: glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) //finished, swap buffers [INFO] trace.call: glXSwapBuffers(0xd5fc10, 0x05800002) //INCORRECT OUTPUT //second try at render to screen: [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 1) [INFO] trace.call: glFramebufferTextureLayerARB(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, 1, 0, 0) [INFO] trace.call: glClear(GL_DEPTH_BUFFER_BIT) //draw geometry [INFO] trace.call: glBindFramebufferEXT(GL_FRAMEBUFFER, 0) [INFO] trace.call: glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) draw geometry [INFO] trace.call: glXSwapBuffers(0xd5fc10, 0x05800002) //correct output

Read the article

How to better create stacked bar graphs with multiple variables from ggplot2?

- by deoksu

I often have to make stacked barplots to compare variables, and because I do all my stats in R, I prefer to do all my graphics in R with ggplot2. I would like to learn how to do two things: First, I would like to be able to add proper percentage tick marks for each variable rather than tick marks by count. Counts would be confusing, which is why I take out the axis labels completely. Second, there must be a simpler way to reorganize my data to make this happen. It seems like the sort of thing I should be able to do natively in ggplot2 with plyR, but the documentation for plyR is not very clear (and I have read both the ggplot2 book and the online plyR documentation. My best graph looks like this, the code to create it follows: the R code I use to get it is the following: library(epicalc) ### recode the variables to factors ### recode(c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ), c(1,2,3,4,5,6,7,8,9, NA), c('Very Interested','Somewhat Interested','Not Very Interested','Not At All interested',NA,NA,NA,NA,NA,NA)) ### Combine recoded variables to a common vector Interest1<-c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ) ### Create a second vector to label the first vector by original variable ### a1<-rep("News about Bangladesh", length(int_newcoun)) a2<-rep("Neighboring Countries", length(int_newneigh)) [...] a17<-rep("Education", length(int_educ)) Interest2<-c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17) ### Create a Weighting vector of the proper length ### Interest.weight<-rep(weight, 17) ### Make and save a new data frame from the three vectors ### Interest.df<-cbind(Interest1, Interest2, Interest.weight) Interest.df<-as.data.frame(Interest.df) write.csv(Interest.df, 'C:\\Documents and Settings\\[name]\\Desktop\\Sweave\\InterestBangladesh.csv') ### Sort the factor levels to display properly ### Interest.df$Interest1<-relevel(Interest$Interest1, ref='Not Very Interested') Interest.df$Interest1<-relevel(Interest$Interest1, ref='Somewhat Interested') Interest.df$Interest1<-relevel(Interest$Interest1, ref='Very Interested') Interest.df$Interest2<-relevel(Interest$Interest2, ref='News about Bangladesh') Interest.df$Interest2<-relevel(Interest$Interest2, ref='Education') [...] Interest.df$Interest2<-relevel(Interest$Interest2, ref='European Politics') detach(Interest) attach(Interest) ### Finally create the graph in ggplot2 ### library(ggplot2) p<-ggplot(Interest, aes(Interest2, ..count..)) p<-p+geom_bar((aes(weight=Interest.weight, fill=Interest1))) p<-p+coord_flip() p<-p+scale_y_continuous("", breaks=NA) p<-p+scale_fill_manual(value = rev(brewer.pal(5, "Purples"))) p update_labels(p, list(fill='', x='', y='')) I'd very much appreciate any tips, tricks or hints. Thanks.

Read the article

velocity vector

- by wanderer

Hi, I am trying to simulate a collision. The collision is shown here http://www.freeimagehosting.net/image.php?c5ae01b476.jpg A particle falls down on a sphere and a collision between sphere and particle takes place. The sphere always remain stationary and the collision itself is not elastic. So if the particle falls directly n top of sphere, the velocity of particle will become zero. I was trying to set the velocity of particle to be zero after the collision. But that does not give good simulation when the collision does not occur on top of sphere but along the side of sphere. So now after the collision i need to make sure that the particle has a velocity which is orthogonal to the vector of the point of collision from the center of sphere. The velocity along the vector from center of sphere to point of collision should become zero. How do i do that? I am a bit mathematically challenged but i think it has something to do with dot product of vectors. Or maybe i am wrong :) I have the initial velocity vector and 'radiusvector' say :- 1)velocity <-1.03054, -1.56563, 1.33341e-016 2) radius vector <2.04406, 2.19587, 1.0514 Pseudo code for the problem is: foreach( particle particle in particlesCollections) { //sphere.x, sphere.y sphere.z give the center of the sphere dist = particle.pos-vector(sphere.x,sphere.y,sphere.z); //detect if a collision has taken place. if (dist.mag < sphere.radius) { rVector=dist/dist.mag*sphere.radius; particle.pos=vector(sphere.x,sphere.y,sphere.z) + rVector; //particle.Velocity gives the velocity vector of the particle at the time of collision //i need to modify particle.Velocity so that the component of velocity that runs along // with the rvector becomes zero as i have a non elsatic collision. The remaining //velocity that the particle will have is the one which runs along with tangent to the //rVector. The sphere remains stationary. //example values: particle.Velocity == <-1.03054, -1.56563, .006> //and rVector = <2.04406, 2.19587, 1.0514> } } Thanks

Read the article

Binary Search Help

- by aloh

Hi, for a project I need to implement a binary search. This binary search allows duplicates. I have to get all the index values that match my target. I've thought about doing it this way if a duplicate is found to be in the middle: Target = G Say there is this following sorted array: B, D, E, F, G, G, G, G, G, G, Q, R S, S, Z I get the mid which is 7. Since there are target matches on both sides, and I need all the target matches, I thought a good way to get all would be to check mid + 1 if it is the same value. If it is, keep moving mid to the right until it isn't. So, it would turn out like this: B, D, E, F, G, G, G, G, G, G (MID), Q, R S, S, Z Then I would count from 0 to mid to count up the target matches and store their indexes into an array and return it. That was how I was thinking of doing it if the mid was a match and the duplicate happened to be in the mid the first time and on both sides of the array. Now, what if it isn't a match the first time? For example: B, D, E, F, G, G, J, K, L, O, Q, R, S, S, Z Then as normal, it would grab the mid, then call binary search from first to mid-1. B, D, E, F, G, G, J Since G is greater than F, call binary search from mid+1 to last. G, G, J. The mid is a match. Since it is a match, search from mid+1 to last through a for loop and count up the number of matches and store the match indexes into an array and return. Is this a good way for the binary search to grab all duplicates? Please let me know if you see problems in my algorithm and hints/suggestions if any. The only problem I see is that if all the matches were my target, I would basically be searching the whole array but then again, if that were the case I still would need to get all the duplicates. Thank you BTW, my instructor said we cannot use Vectors, Hash or anything else. He wants us to stay on the array level and get used to using them and manipulating them.

Read the article

why is my 3n+1 problem solution wrong?

- by nunos

I have recently started reading "Programming Challenges" book by S. Skiena and believe or not I am kind of stuck in the very first problem. Here's a link to the problem: 3n+1 problem Here's my code: #include <iostream> #include <vector> #include <algorithm> using namespace std; unsigned long calc(unsigned long n); int main() { int i, j, a, b, m; vector<int> temp; while (true) { cin >> i >> j; if (cin.fail()) break; if (i < j) { a = i; b = j; } else { a = j; b = i; } temp.clear(); for (unsigned int k = a; k != b; k++) { temp.push_back(calc(k)); } m = *max_element(temp.begin(), temp.end()); cout << i << ' ' << j << ' ' << m << endl; } } unsigned long calc(unsigned long n) { unsigned long ret = 1; while (n != 1) { if (n % 2 == 0) n = n/2; else n = 3*n + 1; ret++; } return ret; } I know the code is inefficient and I should not be using vectors to store the data. Anyway, I still would like to know what it's wrong with this, since, for me, it makes perfect sense, even though I am getting WA (wrong answer at programming challenges judge and RE (Runtime Error) at UVa judge). Thanks.

Read the article

wrong operator() overload called

- by user313202

okay, I am writing a matrix class and have overloaded the function call operator twice. The core of the matrix is a 2D double array. I am using the MinGW GCC compiler called from a windows console. the first overload is meant to return a double from the array (for viewing an element). the second overload is meant to return a reference to a location in the array (for changing the data in that location. double operator()(int row, int col) const ; //allows view of element double &operator()(int row, int col); //allows assignment of element I am writing a testing routine and have discovered that the "viewing" overload never gets called. for some reason the compiler "defaults" to calling the overload that returns a reference when the following printf() statement is used. fprintf(outp, "%6.2f\t", testMatD(i,j)); I understand that I'm insulting the gods by writing my own matrix class without using vectors and testing with C I/O functions. I will be punished thoroughly in the afterlife, no need to do it here. Ultimately I'd like to know what is going on here and how to fix it. I'd prefer to use the cleaner looking operator overloads rather than member functions. Any ideas? -Cal the matrix class: irrelevant code omitted class Matrix { public: double getElement(int row, int col)const; //returns the element at row,col //operator overloads double operator()(int row, int col) const ; //allows view of element double &operator()(int row, int col); //allows assignment of element private: //data members double **array; //pointer to data array }; double Matrix::getElement(int row, int col)const{ //transform indices into true coordinates (from sorted coordinates //only row needs to be transformed (user can only sort by row) row = sortedArray[row]; result = array[usrZeroRow+row][usrZeroCol+col]; return result; } //operator overloads double Matrix::operator()(int row, int col) const { //this overload is used when viewing an element return getElement(row,col); } double &Matrix::operator()(int row, int col){ //this overload is used when placing an element return array[row+usrZeroRow][col+usrZeroCol]; } The testing program: irrelevant code omitted int main(void){ FILE *outp; outp = fopen("test_output.txt", "w+"); Matrix testMatD(5,7); //construct 5x7 matrix //some initializations omitted fprintf(outp, "%6.2f\t", testMatD(i,j)); //calls the wrong overload }

Read the article

gcc/g++: error when compiling large file

- by Alexander

Hi, I have a auto-generated C++ source file, around 40 MB in size. It largely consists of push_back commands for some vectors and string constants that shall be pushed. When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command line switches --param ggc-min-expand=0 --param ggc-min-heapsize=4096 may solve the problem. They, however, only seem to work when optimization is turned on. 1) Is this really the solution that I am looking for? 2) Or is there a faster, better (compiling takes ages with these options acitvated) way to do this? Best wishes, Alexander Update: Thanks for all the good ideas. I tried most of them. Using an array instead of several push_back() operations reduced memory usage, but as the file that I was trying to compile was so big, it still crashed, only later. In a way, this behaviour is really interesting, as there is not much to optimize in such a setting -- what does the GCC do behind the scenes that costs so much memory? (I compiled with deactivating all optimizations as well and got the same results) The solution that I switched to now is reading in the original data from a binary object file that I created from the original file using objcopy. This is what I originally did not want to do, because creating the data structures in a higher-level language (in this case Perl) was more convenient than having to do this in C++. However, getting this running under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format, and it seems that some of the problems I had disappeared when I manually set the output format to pe-i386. The symbols in the object file are by standard named after the file name, e.g. converting the file inbuilt_training_data.bin would result in these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. I found some tutorials on the web which claim that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start;, but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.

Search Results

Search found 614 results on 25 pages for 'vectors'.

Page 22/25 | < Previous Page | 18 19 20 21 22 23 24 25 | Next Page >

- by Myx

- by Bob Dealio

- by rchrd

- by Will

- by dsimcha

- by jazzdawg

- by opdude

- by metredigm

- by Rivten

- by therobyouknow

- by JoshReuben

- by Angelo-Oracle

- by JoshReuben

- by Milo

- by user1294203

- by Lisa

- by Postman

- by chron

- by Ben Jones

- by deoksu

- by wanderer

- by aloh

- by nunos

- by user313202

- by Alexander

< Previous Page | 18 19 20 21 22 23 24 25 | Next Page >