three tier - Page 36 - Developer IT

Java Developer Days India Trip Report

- by reza_rahman

October 21st through October 25th I spoke at Java Developer Days India. This was three separate but identical one-day events in the cities of Pune (October 21st), Chennai (October 24th) and Bangalore (October 25th). For those with some familiarity with India, other than Hyderabad these cities are India's IT powerhouses. The events were focused on Java EE. I delivered five sessions on Java EE 7, WebSocket, JAX-RS 2, JMS 2 and EclipeLink/NoSQL. The events went extremely well and was packed in all three cities. More details on the sessions and Java Developer Days India, including the slide decks, posted on my personal blog.

Read the article

TechEd 2010 Thanks and Demos

- by Adam Machanic

Thank you to everyone who attended my three sessions at this year's TechEd show in New Orleans. I had a great time presenting and answering the really great questions posed by attendees. My sessions were: DAT317 T-SQL Power! The OVER Clause: Your Key to No-Sweat Problem Solving Have you ever stared at a convoluted requirement, unsure of where to begin and how to get there with T-SQL? Have you ever spent three days working on a long and complex query, wondering if there might be a better way? Good...(read more)

Read the article

How to setup Dual Head with "radeon" driver for R770?

- by user1709408

I want to make dual head setup without xrandr but with Xinerama. I put "Screen 1" line into xorg.conf, but card still show identical output on DVI-2 and DVI-3 It is important to use xinerama for me (to glue three monitors), that's why i decide not to use ranrd (randr is incompatible with xinerama as i read somewhere) Here is my videocard (HD 4850 X2): lspci | grep R700 03:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI R700 [Radeon HD 4850] 04:00.0 Display controller: Advanced Micro Devices [AMD] nee ATI R700 [Radeon HD 4850] Here is how monitors are connected: grep "DVI" /var/log/Xorg.0.log [ 1210.002] (II) RADEON(0): Output DVI-0 using monitor section Monitor0 [ 1210.048] (II) RADEON(0): Output DVI-1 has no monitor section [ 1210.079] (II) RADEON(0): EDID for output DVI-0 [ 1210.080] (II) RADEON(0): Printing probed modes for output DVI-0 [ 1210.128] (II) RADEON(0): EDID for output DVI-1 [ 1210.128] (II) RADEON(0): Output DVI-0 connected [ 1210.128] (II) RADEON(0): Output DVI-1 disconnected [ 1210.128] (II) RADEON(0): Output DVI-0 using initial mode 1920x1200 [ 1210.160] (II) RADEON(1): Output DVI-2 using monitor section Monitor2 [ 1210.215] (II) RADEON(1): Output DVI-3 has no monitor section [ 1210.246] (II) RADEON(1): EDID for output DVI-2 [ 1210.247] (II) RADEON(1): Printing probed modes for output DVI-2 [ 1210.299] (II) RADEON(1): EDID for output DVI-3 [ 1210.300] (II) RADEON(1): Printing probed modes for output DVI-3 [ 1210.300] (II) RADEON(1): Output DVI-2 connected [ 1210.300] (II) RADEON(1): Output DVI-3 connected [ 1210.300] (II) RADEON(1): Output DVI-2 using initial mode 1920x1200 [ 1210.300] (II) RADEON(1): Output DVI-3 using initial mode 1920x1200 Here is my /etc/X11/xorg.conf Section "ServerFlags" Option "RandR" "0" Option "Xinerama" "1" EndSection Section "ServerLayout" Identifier "Three Head Layout" Screen "MyPrecious0" Screen "MyPrecious2" RightOf "MyPrecious0" Screen "MyPrecious3" LeftOf "MyPrecious0" EndSection Section "Screen" Identifier "MyPrecious0" Monitor "Monitor0" Device "Device300" EndSection Section "Screen" Identifier "MyPrecious2" Monitor "Monitor2" Device "Device400" EndSection Section "Screen" Identifier "MyPrecious3" Monitor "Monitor3" Device "Device401" EndSection Section "Device" Identifier "Device300" BusID "PCI:3:0:0" Screen 0 Driver "radeon" EndSection Section "Device" Identifier "Device400" BusID "PCI:4:0:0" Screen 0 Driver "radeon" EndSection Section "Device" Identifier "Device401" BusID "PCI:4:0:0" Screen 1 Driver "radeon" EndSection Section "Monitor" Identifier "Monitor0" EndSection Section "Monitor" Identifier "Monitor2" EndSection Section "Monitor" Identifier "Monitor3" EndSection I tried to switch to vesa driver (didn't work for me) I tried to add options like Option "ZaphodHeads" "DVI-2" and Option "ZaphodHeads" "DVI-3" into sections "Device 400" and "Device 401" (this didn't help because "ZaphodHeads" option is for ranrd, and randr is disabled by decision) I tried to merge sections "Device 400" and "Device 401" into one section and add Option "ZaphodHeads" "DVI-2,DVI-3" (see comment about randr above) single section setup helps to change log line RADEON(1): Output DVI-3 has no monitor section into RADEON(1): Output DVI-3 using monitor section Monitor3 but nothing was enough to switch from screen cloning to separate screens. This problem (lack of documentation on radeon driver) is similar to these: Radeon display driver clones monitors while using Xinerama (moderators decision to close that problem was wrong) Ubuntu 12.10 multi-monitor setup isn't working The problem is solvable, because this hardware worked as three headed for me earlier with gentoo/xorg-server-1.3 Xorg -configure creates setup for the first monitor on the first GPU Please don't advise to use fglrx/aticonfig/amdcccle (this goes against my religion beliefs)

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

C#/.NET Little Wonders: The Predicate, Comparison, and Converter Generic Delegates

- by James Michael Hare

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. In the last three weeks, we examined the Action family of delegates (and delegates in general), the Func family of delegates, and the EventHandler family of delegates and how they can be used to support generic, reusable algorithms and classes. This week I will be completing my series on the generic delegates in the .NET Framework with a discussion of three more, somewhat less used, generic delegates: Predicate<T>, Comparison<T>, and Converter<TInput, TOutput>. These are older generic delegates that were introduced in .NET 2.0, mostly for use in the Array and List<T> classes. Though older, it’s good to have an understanding of them and their intended purpose. In addition, you can feel free to use them yourself, though obviously you can also use the equivalents from the Func family of delegates instead. Predicate<T> – delegate for determining matches The Predicate<T> delegate was a very early delegate developed in the .NET 2.0 Framework to determine if an item was a match for some condition in a List<T> or T[]. The methods that tend to use the Predicate<T> include: Find(), FindAll(), FindLast() Uses the Predicate<T> delegate to finds items, in a list/array of type T, that matches the given predicate. FindIndex(), FindLastIndex() Uses the Predicate<T> delegate to find the index of an item, of in a list/array of type T, that matches the given predicate. The signature of the Predicate<T> delegate (ignoring variance for the moment) is: 1: public delegate bool Predicate<T>(T obj); So, this is a delegate type that supports any method taking an item of type T and returning bool. In addition, there is a semantic understanding that this predicate is supposed to be examining the item supplied to see if it matches a given criteria. 1: // finds first even number (2) 2: var firstEven = Array.Find(numbers, n => (n % 2) == 0); 3: 4: // finds all odd numbers (1, 3, 5, 7, 9) 5: var allEvens = Array.FindAll(numbers, n => (n % 2) == 1); 6: 7: // find index of first multiple of 5 (4) 8: var firstFiveMultiplePos = Array.FindIndex(numbers, n => (n % 5) == 0); This delegate has typically been succeeded in LINQ by the more general Func family, so that Predicate<T> and Func<T, bool> are logically identical. Strictly speaking, though, they are different types, so a delegate reference of type Predicate<T> cannot be directly assigned to a delegate reference of type Func<T, bool>, though the same method can be assigned to both. 1: // SUCCESS: the same lambda can be assigned to either 2: Predicate<DateTime> isSameDayPred = dt => dt.Date == DateTime.Today; 3: Func<DateTime, bool> isSameDayFunc = dt => dt.Date == DateTime.Today; 4: 5: // ERROR: once they are assigned to a delegate type, they are strongly 6: // typed and cannot be directly assigned to other delegate types. 7: isSameDayPred = isSameDayFunc; When you assign a method to a delegate, all that is required is that the signature matches. This is why the same method can be assigned to either delegate type since their signatures are the same. However, once the method has been assigned to a delegate type, it is now a strongly-typed reference to that delegate type, and it cannot be assigned to a different delegate type (beyond the bounds of variance depending on Framework version, of course). Comparison<T> – delegate for determining order Just as the Predicate<T> generic delegate was birthed to give Array and List<T> the ability to perform type-safe matching, the Comparison<T> was birthed to give them the ability to perform type-safe ordering. The Comparison<T> is used in Array and List<T> for: Sort() A form of the Sort() method that takes a comparison delegate; this is an alternate way to custom sort a list/array from having to define custom IComparer<T> classes. The signature for the Comparison<T> delegate looks like (without variance): 1: public delegate int Comparison<T>(T lhs, T rhs); The goal of this delegate is to compare the left-hand-side to the right-hand-side and return a negative number if the lhs < rhs, zero if they are equal, and a positive number if the lhs > rhs. Generally speaking, null is considered to be the smallest value of any reference type, so null should always be less than non-null, and two null values should be considered equal. In most sort/ordering methods, you must specify an IComparer<T> if you want to do custom sorting/ordering. The Array and List<T> types, however, also allow for an alternative Comparison<T> delegate to be used instead, essentially, this lets you perform the custom sort without having to have the custom IComparer<T> class defined. It should be noted, however, that the LINQ OrderBy(), and ThenBy() family of methods do not support the Comparison<T> delegate (though one could easily add their own extension methods to create one, or create an IComparer() factory class that generates one from a Comparison<T>). So, given this delegate, we could use it to perform easy sorts on an Array or List<T> based on custom fields. Say for example we have a data class called Employee with some basic employee information: 1: public sealed class Employee 2: { 3: public string Name { get; set; } 4: public int Id { get; set; } 5: public double Salary { get; set; } 6: } And say we had a List<Employee> that contained data, such as: 1: var employees = new List<Employee> 2: { 3: new Employee { Name = "John Smith", Id = 2, Salary = 37000.0 }, 4: new Employee { Name = "Jane Doe", Id = 1, Salary = 57000.0 }, 5: new Employee { Name = "John Doe", Id = 5, Salary = 60000.0 }, 6: new Employee { Name = "Jane Smith", Id = 3, Salary = 59000.0 } 7: }; Now, using the Comparison<T> delegate form of Sort() on the List<Employee>, we can sort our list many ways: 1: // sort based on employee ID 2: employees.Sort((lhs, rhs) => Comparer<int>.Default.Compare(lhs.Id, rhs.Id)); 3: 4: // sort based on employee name 5: employees.Sort((lhs, rhs) => string.Compare(lhs.Name, rhs.Name)); 6: 7: // sort based on salary, descending (note switched lhs/rhs order for descending) 8: employees.Sort((lhs, rhs) => Comparer<double>.Default.Compare(rhs.Salary, lhs.Salary)); So again, you could use this older delegate, which has a lot of logical meaning to it’s name, or use a generic delegate such as Func<T, T, int> to implement the same sort of behavior. All this said, one of the reasons, in my opinion, that Comparison<T> isn’t used too often is that it tends to need complex lambdas, and the LINQ ability to order based on projections is much easier to use, though the Array and List<T> sorts tend to be more efficient if you want to perform in-place ordering. Converter<TInput, TOutput> – delegate to convert elements The Converter<TInput, TOutput> delegate is used by the Array and List<T> delegate to specify how to convert elements from an array/list of one type (TInput) to another type (TOutput). It is used in an array/list for: ConvertAll() Converts all elements from a List<TInput> / TInput[] to a new List<TOutput> / TOutput[]. The delegate signature for Converter<TInput, TOutput> is very straightforward (ignoring variance): 1: public delegate TOutput Converter<TInput, TOutput>(TInput input); So, this delegate’s job is to taken an input item (of type TInput) and convert it to a return result (of type TOutput). Again, this is logically equivalent to a newer Func delegate with a signature of Func<TInput, TOutput>. In fact, the latter is how the LINQ conversion methods are defined. So, we could use the ConvertAll() syntax to convert a List<T> or T[] to different types, such as: 1: // get a list of just employee IDs 2: var empIds = employees.ConvertAll(emp => emp.Id); 3: 4: // get a list of all emp salaries, as int instead of double: 5: var empSalaries = employees.ConvertAll(emp => (int)emp.Salary); Note that the expressions above are logically equivalent to using LINQ’s Select() method, which gives you a lot more power: 1: // get a list of just employee IDs 2: var empIds = employees.Select(emp => emp.Id).ToList(); 3: 4: // get a list of all emp salaries, as int instead of double: 5: var empSalaries = employees.Select(emp => (int)emp.Salary).ToList(); The only difference with using LINQ is that many of the methods (including Select()) are deferred execution, which means that often times they will not perform the conversion for an item until it is requested. This has both pros and cons in that you gain the benefit of not performing work until it is actually needed, but on the flip side if you want the results now, there is overhead in the behind-the-scenes work that support deferred execution (it’s supported by the yield return / yield break keywords in C# which define iterators that maintain current state information). In general, the new LINQ syntax is preferred, but the older Array and List<T> ConvertAll() methods are still around, as is the Converter<TInput, TOutput> delegate. Sidebar: Variance support update in .NET 4.0 Just like our descriptions of Func and Action, these three early generic delegates also support more variance in assignment as of .NET 4.0. Their new signatures are: 1: // comparison is contravariant on type being compared 2: public delegate int Comparison<in T>(T lhs, T rhs); 3: 4: // converter is contravariant on input and covariant on output 5: public delegate TOutput Contravariant<in TInput, out TOutput>(TInput input); 6: 7: // predicate is contravariant on input 8: public delegate bool Predicate<in T>(T obj); Thus these delegates can now be assigned to delegates allowing for contravariance (going to a more derived type) or covariance (going to a less derived type) based on whether the parameters are input or output, respectively. Summary Today, we wrapped up our generic delegates discussion by looking at three lesser-used delegates: Predicate<T>, Comparison<T>, and Converter<TInput, TOutput>. All three of these tend to be replaced by their more generic Func equivalents in LINQ, but that doesn’t mean you shouldn’t understand what they do or can’t use them for your own code, as they do contain semantic meanings in their names that sometimes get lost in the more generic Func name. Tweet Technorati Tags: C#,CSharp,.NET,Little Wonders,delegates,generics,Predicate,Converter,Comparison

Read the article

Professional WCF 4.0: Windows Communication Foundation with .NET 4.0

- by cibrax

The book in which I been working on since last year finally went to the light this week. It has been the result of hard work between me and three other Connected Systems MVP, my friend Fabio Cozzolino, Kurt Claeys and Johann Grabner. If you are interested in learning the new features in WCF 4.0, but also WCF in general and how to apply in real world scenarios, this book is for you. I dedicated three chapters of this book to one of my favorites topics, Security, from the basics to more complicated scenarios with Claim-Based security and Federated authentication using WCF services with Windows Identity Foundation. You can find more information about the book and the table of contents in the Wrox web site here.

Read the article

Professional WCF 4.0: Windows Communication Foundation with .NET 4.0

The book in which I been working on since last year finally went to the light this week. It has been the result of hard work between me and three other Connected Systems MVP, my friend Fabio Cozzolino, Kurt Claeys and Johann Grabner. If you are interested in learning the new features in WCF 4.0, but also WCF in general and how to apply in real world scenarios, this book is for you. I dedicated three chapters of this book to one of my favorites topics, Security, from the basics to more complicated...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article

Useful utilities - PFE, Notepad++ and XML Notepad 2007

- by TATWORTH

All three editors are free to download and use! PFE http://www.lancs.ac.uk/staff/steveb/cpaap/pfe XML Notepad 2007 http://www.microsoft.com/downloads/details.aspx?familyid=72d6aa49-787d-4118-ba5f-4f30fe913628&displaylang=en Notepad++ http://notepad-plus.sourceforge.net/uk/download.php PFE development has stopped and I have included it mainly for reference. It does however have the facility to: Run DOS commands and capture the output Run simple macros XML Note Pad 2007 is excellent if you need to data values from attributes or elements. Notepad++ has various add-ons available for it. - When you download it, be sure to run its internal update function. Needless to say, all three are better than Notepad!

Read the article

CMS DITA North America Conference / Agile Doc

- by ultan o'broin

I attended and presented, along with a colleague, at the Content Management Strategies DITA North America Conference 2010 in Santa Clara this week. It was touch and go whether I would make it across the Atlantic, but as usual the Irish always got through! Our presentation was about DITA and Writing Patterns, and there was three other presentations from Oracle folks too, all very well delivered and received. The interaction with other companies was superb, and the sparks of innovation that flew as a result left me with three use case ideas for UX investigation and implementation. My colleague had a similar experience. Well worth attending! One of the last sessions was about Authoring in an Agile environment, presented by Julio Vasquez. This was an excellent, common sense, and forthright no-nonsense delivery that made complete sense to me. I'd encourage you, if you are interested in the subject, to check out Julio's white paper on the subject too, available from the SDI website.

Read the article

Triple-head on a Lenovo T520

- by codeape

Lenovo T520 with integrated Intel HD graphics + a NVidia card (Optimus) Ubuntu 11.10 on the computer. I would like to use the built-in screen plus two external screens. This PDF indicates that it is possible to connect up to four external monitors to the laptop. The information is Windows only. I was planning to disable the NVidia card, since I have read that Linux support for Optimus is not good. Questions: Has anyone set up three monitors on NVidia hardware? Has anyone set up three monitors using Intel HD 3000? Can I expect it to work out of the box, or are there tricks I need to be aware of?

Read the article

TechEd 2010 Thanks and Demos

- by Adam Machanic

Thank you to everyone who attended my three sessions at this year's TechEd show in New Orleans. I had a great time presenting and answering the really great questions posed by attendees. My sessions were: DAT317 T-SQL Power! The OVER Clause: Your Key to No-Sweat Problem Solving Have you ever stared at a convoluted requirement, unsure of where to begin and how to get there with T-SQL? Have you ever spent three days working on a long and complex query, wondering if there might be a better way? Good...(read more)

Read the article

Switching from Visual Studio to Eclipse [closed]

- by Jouke van der Maas

I've been using Visual Studio for about 6 years now, which is enough time to know most useful keyboard shortcuts and little features. I recently had to switch to Eclipse and java for school, and now I'm constantly searching for the right keys to press. I have searched around for a definitve guide on this, but I couldn't find any. Here's what I want to know: For any feature in Visual Studio, what is the equivalent feature in Eclipse called and what is it's default keyboard shortcut? Are there any things that work very differently in Eclipse, that one might misunderstand or do wrong at first when switching? Are there features in Visual Studio that Eclipse does not have, and is there a workaround? I hope we can create a guide to make life easier for future developers that have to make this switch. You can answer any of the three questions above (no need to do all three), and multiple per answer if you want. I can't mark questions as community wiki anymore, but I do think that's appropriate here.

Read the article

Configuring Multi-Tap on Synaptics Touchpad

- by nunos

I am having a hard time configuring my notebook's touchpad. The touchpad already works. It successfully responds to one-finger tap, two-finger tap and two-finger vertical scrolling. What I want to accomplish: change two-finger tap action from right-mouse click to middle-mouse click add three-finger tap functionality to yield right-mouse click action (i have checked that the three-finger tap is supported by my laptop's touchpad since it works on Windows) I read on a forum to use this as a guide. I have successfully accomplished point 1 with synclient TapButton2=2. However, I have to do it everytime I log in. I have tried to put that command on /etc/rc.local but the computer always boots and logins with the default configuration. Regarding point 2, I have tried synclient TapButton3=3 but it doesn't do anything when I three-finger tap the touchpad. I am running Ubuntu 11.10 on an Asus N82JV. /etc/X11/xorg.conf: nuno@mozart:~$ cat /etc/X11/xorg.conf Section "InputClass" Identifier "touchpad catchall" Driver "synaptics" MatchIsTouchpad "on" MatchDevicePath "/dev/input/event*" Option "TapButton1" "1" Option "TapButton2" "2" Option "TapButton3" "3" EndSection /usr/share/X11/xorg.conf.d/50-synaptics.conf: nuno@mozart:~$ cat /usr/share/X11/xorg.conf.d/50-synaptics.conf # Example xorg.conf.d snippet that assigns the touchpad driver # to all touchpads. See xorg.conf.d(5) for more information on # InputClass. # DO NOT EDIT THIS FILE, your distribution will likely overwrite # it when updating. Copy (and rename) this file into # /etc/X11/xorg.conf.d first. # Additional options may be added in the form of # Option "OptionName" "value" # Section "InputClass" Identifier "touchpad catchall" Driver "synaptics" MatchIsTouchpad "on" MatchDevicePath "/dev/input/event*" Option "TapButton1" "1" Option "TapButton2" "2" Option "TapButton3" "3" EndSection xinput list: nuno@mozart:~$ xinput list ? Virtual core pointer id=2 [master pointer (3)] ? ? Virtual core XTEST pointer id=4 [slave pointer (2)] ? ? Microsoft Microsoft® Nano Transceiver v2.0 id=12 [slave pointer (2)] ? ? Microsoft Microsoft® Nano Transceiver v2.0 id=13 [slave pointer (2)] ? ? ETPS/2 Elantech Touchpad id=16 [slave pointer (2)] ? Virtual core keyboard id=3 [master keyboard (2)] ? Virtual core XTEST keyboard id=5 [slave keyboard (3)] ? Power Button id=6 [slave keyboard (3)] ? Video Bus id=7 [slave keyboard (3)] ? Video Bus id=8 [slave keyboard (3)] ? Sleep Button id=9 [slave keyboard (3)] ? USB2.0 2.0M UVC WebCam id=10 [slave keyboard (3)] ? Microsoft Microsoft® Nano Transceiver v2.0 id=11 [slave keyboard (3)] ? Asus Laptop extra buttons id=14 [slave keyboard (3)] ? AT Translated Set 2 keyboard id=15 [slave keyboard (3)]

Read the article

Project Management - Asana / activeCollab / basecamp / alternative / none

- by rickyduck

I don't know whether this should be on programmers - I've been looking at the above three apps over the past few weeks just for myself and I'm in two minds. All three look good, are easy to use, and I came to this conclusion; Asana is the easiest to use ActiveCollab is the feature rich and easiest flow BaseCamp is the best UX / design But I didn't really find my workflow was any more quicker / efficient, in fact it was a bit slower and organized. Is there a realistic place for them in workflow - should programmers use them for themselves, or only when a project manager can take control of it?

Read the article

A design pattern for data binding an object (with subclasses) to asp.net user control

- by Rohith Nair

I have an abstract class called Address and I am deriving three classes ; HomeAddress, Work Address, NextOfKin address. My idea is to bind this to a usercontrol and based on the type of Address it should bind properly to the ASP.NET user control. My idea is the user control doesn't know which address it is going to present and based on the type it will parse accordingly. How can I design such a setup, based on the fact that, the user control can take any type of address and bind accordingly. I know of one method like :- Declare class objects for all the three types (Home,Work,NextOfKin). Declare an enum to hold these types and based on the type of this enum passed to user control, instantiate the appropriate object based on setter injection. As a part of my generic design, I just created a class structure like this :- I know I am missing a lot of pieces in design. Can anybody give me an idea of how to approach this in proper way.

Read the article

Share files - Ubuntu 12.4 and Windows 7 - one network - password not accepted

- by gotqn

I ask this question in SuperUser but no one helps me. I hope to get more attention here. I have three computers connected in one network by modem. I want to share files in this network in the most easy way (I have read about solutions using Samba). So, I have three machines: One with Windows 7 One with Windows XP One with Ubuntu 12.04 and I have the following situation: The windows PCs can share files between each other. The windows PCs can see that Ubuntu's one is in the network The PC with Ubuntu can see only the PC with Windows 7, but when I click on a folder it ask to enter the network password and it is not accepting it (I am 100% sure it's the correct one) Is there to fix this situation a little bit - at least to enable the file sharing between the Ubuntu and Windows 7 PCs or I should choose a different approach (please advice).

Read the article

Using GridView and DetailsView in ASP.NET MVC - Part 1

- by bipinjoshi

For any beginner in ASP.NET MVC the first disappointment is possibly the lack of any server controls. ASP.NET MVC divides the entire processing logic into three distinct parts namely model, view and controller. In the process views (that represent UI under MVC architecture) need to sacrifice three important features of web forms viz. Postbacks, ViewState and rich event model. Though server controls are not a recommended choice under ASP.NET MVC there are situations where you may need to use server controls. In this two part article I am going to explain, as an example, how GridView and DetailsView can be used in ASP.NET MVC without breaking the MVC pattern.http://www.bipinjoshi.net/articles/59b91531-3fb2-4504-84a4-9f52e2d65c20.aspx

Read the article

Eee PC brighness contol with 12.04 release

- by Terry

is there a solution to the low screen brightness issue with the Eee PC and release 12.04? When I use the brightness control, the screen goes through three adjustment cycles of dark to semi bright, but never gets to bright. As you index the control up, brightness increases, then suddenly cuts back to dark. use the brightness button to further increase the brightness and the same cycle happens. As though there are three distinct brightness events, each one setting back to low level. Under no circumstances other than initial boot up can you get to a bright screen. I just finished installing 12.04 on two Acer (Gateway netbooks) with no brightness issue. Just on the Eee PC

Read the article

Data Import Resources for Release 17

- by Pete

With Release 17 you now have three ways to import data into CRM On Demand: The Import Assistant Oracle Data Loader On Demand, a new, Java-based, command-line utility with a programmable API Web Services We have created the Data Import Options Overview document to help you choose the method that works best for you. We have also created the Data Import Resources page as a single point of reference. It guides you to all resources related to these three import options. So if you're looking for the Data Import Options Overview document, the Data Loader Overview for Release 17, the Data Loader User Guide, or the Data Loader FAQ, here's where you find them: On our new Training and Support Center, under the Learn More tab, go to the What's New section and click Data Import Resources.

Read the article

Simple heart container script for 2D game (Unity)?

- by N1ghtshade3

I'm attempting to create a simple mobile game (C#) that involves a simple three-heart life system. After searching for hours online, many of the solutions use OnGUI (which is apparently horrible for performance) and the rest are too complicated for me to understand and add to my code. The other solutions involve using a single texture and just hiding part of it when damage is taken. In my game, however, the player should be able to go over three hearts (for example, every 100 points). Sebastian Lague's Zelda-Style Health is what I'm looking for, but even though it's a tutorial there is way too much going on that I don't need or can't customize to fit in mine. What I have so far is a script called HealthScript.cs which contains a variable lives. I have another script, PlayerPhysics.cs which calls HealthScript and subtracts a life when an enemy is hit. The part I don't get is actually drawing the hearts. I think I understand what needs to happen, I just am not experienced enough with Unity to know how. The Start function should draw three (or whatever lives is set to) hearts in the top right corner. Since the game should be resolution-independent to accommodate the various sizes of Android devices, I'd rather use scaling rather than PixelInset. When the player hits an enemy as detected by PlayerPhysics.cs, it should subtract from lives. I think that I have this working using this.GetComponent<HealthScript>().lives -= 1 but I'm not sure if it actually works. This should trigger a redraw of the hearts so that there are now two hearts. The same principle would apply for adding hearts when a score is reached, except when lives > maxHeartsPerRow, the new hearts should be drawn below the old ones. I realise I don't have much code to show but believe me; I've tried for quite some time to figure this out and have little to show for it. Any help at all would be welcome; it seems like it shouldn't take that much code to put an image on the screen for each life there is, but I haven't found anything yet. Thanks!

Read the article

D, Vala or Go for game development [on hold]

- by Sheosi

I'm looking forward to choose some compiled language for my 3D engine. The engine it's written in C++, however I would like to help coders by using a language which is good for games. I came with these three: Vala, D and Go. The engine is being made to write as less code as posible, also the "main" language it's going to be Lua so any of these will be the "advanced" one (mainly things which could affect performance or . Because of all this and the fact that I heard that Go is good for small projects I thought it would be a pretty good option, however it does not seems to be made (at least originally) for games and also some say I can have trouble with garbage collection in games. So what do you think? Do you have any experience with any of these three in games? How was it?

Read the article

How do I get brightness controls working properly on an Eee PC 1001P?

- by Terry

Is there a solution to the low screen brightness issue with the Eee PC 1001P and release 12.04? When I use the brightness control, the screen goes through three adjustment cycles of dark to semi bright, but never gets to bright. As you index the control up, brightness increases, then suddenly cuts back to dark. use the brightness button to further increase the brightness and the same cycle happens. As though there are three distinct brightness events, each one setting back to low level. Under no circumstances other than initial boot up can you get to a bright screen. I just finished installing 12.04 on two Acer (Gateway netbooks) with no brightness issue. Just on the Eee PC 1001P Eee PC model is 1001P

Read the article

Clone an Azure VM using Powershell

- by jamiet

In a few months time I will, in association with Technitrain, be running a training course entitled Introduction to SQL Server Data Tools. I am currently working on putting together some hands-on lab material for the course delegates and have decided that in order to save time in asking people to install software during the course I am simply going to prepare a virtual machine (VM) containing all the software and lab material for each delegate to use. Given that I am an MSDN subscriber it makes sense to use Windows Azure to host those VMs given that it will be close to, if not completely, free to do so. What I don’t want to do however is separately build a VM for each delegate, I would much rather build one VM and clone it for each delegate. I’ve spent a bit of time figuring out how to do this using Powershell and in this blog post I am sharing a script that will: Prompt for some information (Azure credentials, Azure subscription name, VM name, username & password, etc…) Create a VM on Azure using that information Prompt you to sysprep the VM and image it (this part can’t be done with Powershell so has to be done manually, a link to instructions is provided in the script output) Create three new VMs based on the image Remove those three VMs Simply download the script and execute it within Powershell, assuming you have an Azure account it should take about 20minutes to execute (spinning up VMs and shutting the down isn’t instantaneous). If you experience any issues please do let me know. There are additional notes below. Hope this is useful! @Jamiet Notes: Obviously there isn’t a lot of point in creating some new VMs and then instantly deleting them. However, this demo script does provide everything you need should you want to do any of these operations in isolation. The names of the three VMs that get created will be suffixed with 001, 002, 003 but you can edit the script to call them whatever you like. The script doesn’t totally clean up after itself. If you specify a service name & storage account name that don’t already exist then it will create them however it won’t remove them when everything is complete. The created image file will also not be deleted. Removing these items can be done by visiting http://manage.windowsazure.com. When creating the image, ensure you use the correct name (the script output tells you what name to use): Here are some screenshots taken from running the script: When the third and final VM gets removed you are asked to confirm via this dialog: Select ‘Yes’

Read the article

Lubuntu with only Ubuntu's workspace/tiling features

- by Johnny Tremain

I searched around for a long time but found nothing. I am moving from Ubuntu to Lubuntu. Everything is great (I am okay with the bland style) except for three features that I use regularly in Ubuntu. 1) Win+w / Win+s zooms out to see an overview of the current workspace and all the workspaces respectively. 2) Ctrl+Alt+num which puts the current application in a specific portion of the workspace. 3) Snap to edge of workspace. How would I get those three features onto Lubuntu? Would that cancel out the benefit of Lubuntu, so I should just stick with Ubuntu (or any distro you can recommend)? Thank you.

Read the article

Sample Code and Slides from DevConnection Germany

- by Stephen Walther

Thank you everyone who came to my three talks this week at DevConnections Germany! I really enjoyed my time in Karlsruhe. Here are the slides and sample code for the three talks: jQuery Templates In this talk, I discuss how you can take advantage of jQuery templates when building both ASP.NET Web Forms and ASP.NET MVC applications. I demonstrate several advanced features of templates such as wrapped templates and remote templates. Download the slides Download the code HTML5 In this talk, I discuss the features of HTML5 which matter most when building database-driven web applications. I demonstrate WebSockets, Web Workers, Web Storage, IndexedDB, and Offline Web Applications. Download the slides Download the code jQuery + OData In this talk, I demonstrate how you can build entire web applications by taking advantage of jQuery and OData. I demonstrate how you can use jQuery and OData to both query and update database data. I also discuss two approaches for supporting validation. Download the slides Download the code

Search Results

Search found 9462 results on 379 pages for 'three tier'.

Page 36/379 | < Previous Page | 32 33 34 35 36 37 38 39 40 41 42 43 | Next Page >

- by reza_rahman

- by Adam Machanic

- by user1709408

- by rchrd

- by James Michael Hare

- by cibrax

- by TATWORTH

- by ultan o'broin

- by codeape

- by Adam Machanic

- by Jouke van der Maas

- by nunos

- by rickyduck

- by Rohith Nair

- by gotqn

- by bipinjoshi

- by Terry

- by Pete

- by N1ghtshade3

- by Sheosi

- by Terry

- by jamiet

- by Johnny Tremain

- by Stephen Walther

< Previous Page | 32 33 34 35 36 37 38 39 40 41 42 43 | Next Page >