Search Results

Search found 8266 results on 331 pages for 'distributed systems'.

Page 180/331 | < Previous Page | 176 177 178 179 180 181 182 183 184 185 186 187  | Next Page >

  • unlinked libraries in a makefile?

    - by wyatt
    I'm trying to install a libspopc, but when I run the make I get the following output: cc -Wall -Wextra -pedantic -pipe -fPIC -Os -DUSE_SSL -c session.c cc -Wall -Wextra -pedantic -pipe -fPIC -Os -DUSE_SSL -c queries.c cc -Wall -Wextra -pedantic -pipe -fPIC -Os -DUSE_SSL -c parsing.c cc -Wall -Wextra -pedantic -pipe -fPIC -Os -DUSE_SSL -c format.c cc -Wall -Wextra -pedantic -pipe -fPIC -Os -DUSE_SSL -c objects.c cc -Wall -Wextra -pedantic -pipe -fPIC -Os -DUSE_SSL -c libspopc.c rm -f libspopc*.a ar r libspopc-0.9n.a session.o queries.o parsing.o format.o objects.o libspopc.o ar: creating libspopc-0.9n.a ranlib libspopc-0.9n.a ln -s libspopc-0.9n.a libspopc.a rm -f libspopc*.so cc -o libspopc-0.9n.so -shared session.o queries.o parsing.o format.o objects.o libspopc.o ln -s libspopc-0.9n.so libspopc.so cc -o poptest1 -Wall -Wextra -pedantic -pipe -fPIC -Os -DUSE_SSL examples/poptest1.c -L. -lspopc -lssl -lcrypto /usr/local/lib/libcrypto.a(dso_dlfcn.o): In function `dlfcn_globallookup': dso_dlfcn.c:(.text+0x2d): undefined reference to `dlopen' dso_dlfcn.c:(.text+0x43): undefined reference to `dlsym' dso_dlfcn.c:(.text+0x4d): undefined reference to `dlclose' /usr/local/lib/libcrypto.a(dso_dlfcn.o): In function `dlfcn_pathbyaddr': dso_dlfcn.c:(.text+0x8f): undefined reference to `dladdr' dso_dlfcn.c:(.text+0xe9): undefined reference to `dlerror' /usr/local/lib/libcrypto.a(dso_dlfcn.o): In function `dlfcn_bind_func': dso_dlfcn.c:(.text+0x491): undefined reference to `dlsym' dso_dlfcn.c:(.text+0x570): undefined reference to `dlerror' /usr/local/lib/libcrypto.a(dso_dlfcn.o): In function `dlfcn_bind_var': dso_dlfcn.c:(.text+0x5f1): undefined reference to `dlsym' dso_dlfcn.c:(.text+0x6d0): undefined reference to `dlerror' /usr/local/lib/libcrypto.a(dso_dlfcn.o): In function `dlfcn_unload': dso_dlfcn.c:(.text+0x735): undefined reference to `dlclose' /usr/local/lib/libcrypto.a(dso_dlfcn.o): In function `dlfcn_load': dso_dlfcn.c:(.text+0x817): undefined reference to `dlopen' dso_dlfcn.c:(.text+0x88e): undefined reference to `dlclose' dso_dlfcn.c:(.text+0x8d5): undefined reference to `dlerror' collect2: ld returned 1 exit status make: *** [poptest1] Error 1 A quick search suggested that this was due to libdl being unlinked, though this seems unlikely in a distributed library, particularly a seemingly relatively popular one. Could anything else be causing this? And if it is due to an unlinked library, how would I go about fixing it? Thanks

    Read the article

  • ISA Server 2006 SSL Certificate Dilemma

    - by JohnyD
    I'm making so great headway in offering our services over https with help from a Go Daddy certificate, later to be upgraded to Thawte SSL123 certs. But, I've just run into one whopper of a problem. Here's my setup: I run an ISA 2006 firewall. Our web services are distributed over 2 servers. One is Windows 2000 (www.domain.com) and the other is Windows 2003 (services.domain.com). So, I'll need to purchase 2 certs for both www and services, import them into IIS6 on their respective machines, then export them with the primary key (making sure to Include all certificates in the certification path if possible... that had me stumped for a while), and then to finally import them into ISA's local computer Personal store. The problem I've just run into is that I have separate firewall rules for services.domain.com and www.domain.com... because requests need to be forwarded to different web servers. Each of these firewall rules use the same httplistener. I have just found out that you can only use 1 certificate per httplistener. To make matters worse you can only have a single httplistener per ip / port. Is this correct? I can only use a single certificate for a single ip address? This would seem to be a severe limitation. Am I wrong? If I'm not then I've got a whole lot more work ahead of me as I'll have to set up extra ip's, add them to the firewall's network interface, create new listeners using that ip, etc... Can someone please confirm that I'm doing this correctly / incorrectly? Once I got my head wrapped around it all it seemed easy... then this. Thanks in advance.

    Read the article

  • Can you "swap" the Sysprep answer file in Windows 7

    - by Ben
    I have a load of new Lenovo laptops which I am due to distribute in my company. We are distributed in multiple locations and I want to ship the laptops "boxed" and untouched by IT hand for distribution. We are using LANDesk to do all the software distribution and provisioning, but are currently falling at the first hurdle as when booted, the laptops kick into the Lenovo mini-setup wizard. I assume this is because they have been sysprepped at Lenovo. In order to keep with our (almost) zero touch strategy I want the users to PXE boot into a PE of some sort, which will run a script on startup which replaces the sysprep answer file with one of my own. (i.e. prepopulated with product key, company info etc.) and then reboot to complete Sysprep. The plan is that this will run, and then install the LANDesk agent as a post-sysprep task, which in turn will complete the provisioning. Anyone have any experience / know any pitfalls to look out for / can suggest a suitable, PXE-bootable PE environment? Apologies for the verbosity of the question - it takes a bit of explaining! Thanks in advance, Ben

    Read the article

  • Nexus 1000v VEM fails on 2 out of 8 hosts.

    - by cougar694u
    I have 8 ESXi hosts. I do a fresh install from the installable CD directly to 4u1. We have another 2-node cluster with a working Nexus 1000v primary & secondary. Everything's up and running. I installed 6 hosts and everything worked great, migrated them to the Nexus DVS, and VUM installed the modules. I did the 7th host, and when I tried to migrate it to the DVS, it failed with the following error: Cannot complete a Distributed Virtual Switch operation for one or more host memebers. DVS Operation failed on host , error durring the configuration of the host: create dvswitch failed with the following error message: SysinfoException: Node (VSI_NODE_net_create) ; Status(bad0003)= Not found ; Message = Instance(0): Inpute(3) DvsPortset-0 256 cisco_nexus_1000v got (vim.fault.PlatformConfigFault) exception Then, I tried to do host 8, and got the exact same problem. It worked about 15 minutes prior when I did host 6, nothing changed, then went to host 7 and it failed. If I try to remediate either of these two hosts, either patches or extensions, it fails. Anyone else have these problems?

    Read the article

  • Biztalk 2009 logshipping with SQL 2008

    - by Manjot
    Hi, I am setting up biztalk logshipping for Biztalk 2009 database. Following http://msdn.microsoft.com/en-us/library/aa560961.aspx article, I am doing the following to setup biztalk logshipping on destination server: Enable Ad-hoc queries by: sp_configure 'show advanced options',1 go reconfigure go sp_configure 'Ad Hoc Distributed Queries',1 go reconfigure go sp_configure 'show advanced options',0 go reconfigure go Execute LogShipping_Destination_Schema & LogShipping_Destination_Logic in master on destinations server Run: exec bts_ConfigureBizTalkLogShipping @nvcDescription = '', @nvcMgmtDatabaseName = '', @nvcMgmtServerName = '', @SourceServerName = null, -- null indicates that this destination server restores all databases @fLinkServers = 1 -- 1 automatically links the server to the management database When I run this I am receiving the following error: Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. After some research I found some info : Usually this error means that the SQL Server Service Principal Name (SPN) was not configured, and NTLM was not being used as an authentication mechanism. SQl services are runing under different domain accounts. So, I asked the domain admin to create SPNs for the servers, SQL service accounts for beoth source and destination using name and FQDN. enabled computer name and service accounts for delegation. When I run the following: select * from sys.dm_exec_connections I get the the same error: Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON' Any help please?

    Read the article

  • Distribute IP packets accross different NIC queues with MSI (Message Signalled Interrupts)

    - by Ansis Atteka
    NetXtreme II BCM5709 Gigabit Ethernet NIC supports MSI feature (Message Signaled Interrupts) and it has 8 queues. Each queue has its own Interrupt handler in /proc/interrupts. What I am trying to accomplish is to tell NIC which packets should go to which queue. Questions: Is it possible to manually specify which IP packets should go to which queue by encapsulated protocol type (e.g. IPsec packets go in one queue, while TCP packets go in another queue)? If it is possible - how can I do it under Linux? If it is not possible - should I look at MSI-X capable NIC cards to solve this problem? More details: We have one Interface that is terminating IPSec and forwarding/terminating TCP connections. The IPSec packet decryption is inlined (this means that decryption is done under the same ksoftirqd/X context). We are trying to find out if we will be able to improve total performance if IPSec packets will be scheduled on another CPU than TCP packets. One more limitation is that IPSec code is not MP-safe, hence I can not run it under more than one ksoftirqd/X. By default it seems that packets are distributed/hashed by source IP over the 8 NIC queues. The bottleneck is IPSec that chokes out TCP traffic while it is decrypting/encrypting IPSec packets at ~100% CPU. OS is Ubuntu 10.10 (2.6.32-27-server) and NIC is Broadcom BCM5709.

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Using multiple computers effectively

    - by Benjamin Oakes
    I have some extra (old) Macs and PCs around the house and a MacBook that's sometimes overworked. I'm looking for tips on using multiple computers effectively. Basically, I'd like to add to the following list. Here's what I'm using so far: Teleport: lets you use a single mouse and keyboard to control several Macs, like Synergy Built-in file sharing: lets me run programs on another Mac, but only maintain one copy of the data Bazaar: distributed version control Mail.app, Thunderbird, etc.: IMAP for my mail accounts TuneConnect: control iTunes on another Mac with a nice interface, using the library on my MacBook (if I choose it by pressing option at startup) over file sharing OmniFocus: syncs across computers pretty seamlessly Web browsing across computers VNC/Remote Desktop Running X-windows programs using ssh -Y hostname for headless operation (but they die when I sleep the connecting computer -- something like GNU screen would be ideal) Plain-old ssh with GNU screen Really, a better idea of what I do might be necessary. Generally though, I'd like to distribute tasks across more than one computer when possible, but not have much overhead in doing so. The perfect solution? An Xgrid-like program that pushes processing across multiple computers automatically and seamlessly (although that seems unlikely). Here's what I have, in case it makes a difference: MacBook (Dual 2.16 GHz, OS X 10.6.3) eMac (1.25 GHz, OS X 10.4.11, soon to be 10.5) Dell Dimension (800 MHz, some version of Ubuntu) -- no dedicated monitor PowerMac G3 (400 MHz, OS X 10.4.11) -- no dedicated monitor iMac G3 DV (400 MHz, OS X 10.4.11) -- currently in the kitchen for recipes, email, web browsing, music, movies (DVDs), etc. (Total, they cost me around $650, mostly for the MacBook. Freecycle is wonderful, just in case you haven't heard of it.) I'm really only using the MacBook and eMac at this point, but I'd like to push more onto it and possibly the PowerMac and Dell.

    Read the article

  • Convert raw IMAP server data into local folders, then upload partial dataset to new IMAP server?

    - by Manca Weeks
    I am transitioning a company with about 30 IMAP accounts, loaded with data (about 77GB total), to a new email host. The majority of the data will be converted into a local archive and distributed to the company computers as a static reference data set. The server side folders the users absolutely cannot do without being on the server will be uploaded back to the new server. I used Mac OS X Mail (Snow Leopard 10.6.6) to download the content. I notice some messages have the name [xxx].partial.emlx, which leads me to believe they have not been downloaded all the way. I have root access to the mail server data and could download the IMAP server data via FTP. I am not sure what utility to use to convert that data to local Mail.app mailboxes. Furthermore, I would appreciate any input on the best way to upload a portion of the data to the new server (GoDaddy), preserving the original dates of the messages. edit OK - forget the raw server data. I found a script that apparently does pretty good archiving IMAP folders to local mbx files. My main quest now is to batch upload a mailbox hierarchy to the new IMAP server without having to start-stop and deal with similar issues. Anyone know of a utility (hopefully for OS X, but if not, I'll fire up my XP virtual system...) that would be capable of this? Thanks, M

    Read the article

  • Port forwarding for samba

    - by EternallyGreen
    Alright, here's the setup: Internet - Modem - WRT54G - hubs - winxp workstations & linux smb server. Its basically a home-style distributed internet connection setup, except its at a school. What I want is remote, offsite smb access. I figured I'd need to find out which ports need forwarding and then forward them to the server on the router. I'm told in another question on SF that multiple ports will need forwarding, and it gets somewhat complicated. One of the things I need to know is which ports require forwarding for this, and what complications or vulnerabilities could arise from this. Any additional information you think I should have before doing this would be great. I'm told SMB doesn't support encryption, which is fine. Given I set up authentication/access control, all this means is that once one of my users authenticates and starts downloading data, the unencrypted traffic could be intercepted and read by a MITM, correct? Given that that's the only problem arising from lack of encryption, this is of no concern to me. I suppose that it could also mean a MITM injecting false data into the data stream, eg: user requests file A, MITM intercepts and replaces the contents of file A with some false data. This isn't really an issue either, because my users would know that something was wrong, and its not likely anyone would have incentive to do this anyway. Another thing I've been informed of is Microsoft's poor implementation of SMB, and its crap track record for security. Does this apply if only the client-end is MS? My server is linux.

    Read the article

  • HTTP Upload Problems

    - by jfoster
    We are running a marketplace on ColdFusion8 and IIS with a widely geographically distributed user base and have been receiving complaints of issues with some HTTP uploads. Most of the complaints are coming from geographically distant locations from our main datacenter on the US east coast. I've attempted to upload the same 70MB file from a US West coast test server to both our main site and a backup running the same code on a different network route and I saw the same issues fairly consistently in both places, so I've ruled out the code, route, and internal network errors. I've also tested uploads using both the native cf upload tag and a third party tool called SaFileUp. I saw the same issues with both upload tools, so I also don't think this is necessarily a ColdFusion problem. I don't have any problems uploading the test file from the East coast to other east coast servers, so I'm beginning to think that the distance between our users and our equipment is a factor. I've also found that smaller files are more likely to succeed than large ones (< 10MB) I tried the test upload with both IE and FF and did notice a difference in the way that the browsers seemed to handle packet errors. IE seemed to have a tough time continuing an upload after dropped / bad packets, whereas FF seemed to have the ability to gracefully resume an upload after experiencing packet problems. Has anyone experienced similar issues? Is there anything we can do on our side to make uploads more forgiving to packet loss or resumable after an error? A different upload tool etc… Do we need upload servers in more than one location to shorten the network routes between clients and servers? Does anyone think that switching uploads to SSL will help (no layer7 packet sniffing may lead to a smoother upload). Thanks.

    Read the article

  • Symantec Protection Suite Enterprise Edition

    - by rihatum
    We (our company) are planning to deploy Symantec Endpoint Protection and Symantec Desktop Recovery 2011 Desktop Edition to our 3000 - 4000 workstations (Windows7 32 and 64) with a few 100s with Windows XP 32/64 Bit. I have read the implementation guide for SEP and have read tech-notes for Desktop Recovery 2011. Our team have planned to deploy this as follows : 1 x dedicated SQL 2008R2 for Symantec Endpoint Protection (Instead of using the Embedded Database) 1 x Dedicated SQL 2008R2 for Symantec Desktop Recovery 2011 (Instead of using the Embedded Database) 1 x Dedicated W2K8 R2 Box for the SEPM (Symantec Endpoint Protection Manager - Mgmt. APP) 1 x Dedicated W2K8 R2 Box for the Symantec Desktop Recovery 2011 Management Application Agent Deployment : As per Symantec Documentation for both of the above, an agent can be pushed via the Mgmt. Application (provided no firewalls are blocking ports required etc. - we have Windows firewall disabled already). Above is the initial plan we have for 3000 - 4000 client workstation (Windows) Now my Questions :-) a) If we had these users distributed amongst two sites with AD DC / GC in each site, How would I restrict SEPM and Desktop Mgmt. solution to only check for users in their respective site ? b) At present all users are under one building but we are going to move some dept. to a new location (with dedicated connectivity), How would we control which SEPM / MGMT Server is responsible for which site ? c) What Hardware would you recommend as a Server spec for the SQL server 16GB RAM, Dual XEON? d) What Hardware would you recommend as a Server spec for the MGMT Servers 16GB RAM each with DUAL xeon and sas disks? e) Also, how do you or would you recommend to protect these 4 servers (2 x SQL and 2 x MGMT Servers)? f) How would you recommend to store backups for these desktops? We do have a SAN and a NAS in our environment and we do have one spare DAS (Dell MD3000). If you have anything to add / correct - that will be really helpful before diving into the actual implementation phase. Will be most grateful with your suggestions, recommendations and corrections with above - Many Thanks ! Rihatum

    Read the article

  • Temporary boot problem after thunder storm - likely causes?

    - by alastairs
    The village where I live was sat under a thunder cloud for most of Friday, and we suffered a few power fluctuations (specifically, what seemed to be split-second outages). When I got back home from work, I found that my PCs had shut down during one of these outages. When I went to boot one of them back up, I couldn't get anything to display on screen, nor did the boot seem to complete correctly. I tried a number of things - unplugging different bits of hardware, swapping graphics adaptors, etc. - to no avail. I thought I was looking at a fried motherboard or CPU. Power seemed to be distributed correctly to the peripherals (the drives all appeared to be working) so I figured it couldn't be the PSU. Eventually I unplugged it from the mains and left it overnight (approx 12hrs unplugged). I tried it again this morning, and it booted up correctly. Woo-hoo! I have all my equipment protected by surge-protected power strips, so I don't think a spike caused these problems. Obviously it has something to do with the power fluctuations, and maybe the PSU in the problem machine got itself confused somehow. The questions are, for future reference and to help people with similar problems: What are the likely causes of the boot failure I experienced? Is a UPS a simple and cost-effective solution, or might other things help prevent this happening in future? What UPS can you recommend (my budget is limited)?

    Read the article

  • Raid-5 Performance per spindle scaling

    - by Bill N.
    So I am stuck in a corner, I have a storage project that is limited to 24 spindles, and requires heavy random Write (the corresponding read side is purely sequential). Needs every bit of space on my Drives, ~13TB total in a n-1 raid-5, and has to go fast, over 2GB/s sort of fast. The obvious answer is to use a Stripe/Concat (Raid-0/1), or better yet a raid-10 in place of the raid-5, but that is disallowed for reasons beyond my control. So I am here asking for help in getting a sub optimal configuration to be as good as it can be. The array built on direct attached SAS-2 10K rpm drives, backed by a ARECA 18xx series controller with 4GB of cache. 64k array stripes and an 4K stripe aligned XFS File system, with 24 Allocation groups (to avoid some of the penalty for being raid 5). The heart of my question is this: In the same setup with 6 spindles/AG's I see a near disk limited performance on the write, ~100MB/s per spindle, at 12 spindles I see that drop to ~80MB/s and at 24 ~60MB/s. I would expect that with a distributed parity and matched AG's, the performance should scale with the # of spindles, or be worse at small spindle counts, but this array is doing the opposite. What am I missing ? Should Raid-5 performance scale with # of spindles ? Many thanks for your answers and any ideas, input, or guidance. --Bill Edit: Improving RAID performance The other relevant thread I was able to find, discusses some of the same issues in the answers, though it still leaves me with out an answer on the performance scaling.

    Read the article

  • Hardware, network infrastructure for runnng gaming server nd on VirtualGL

    - by archer
    Foud nice project VirtualGL (http://www.virtualgl.org/). Tried to run 3D fames (EVE Online, Prototype) on server and display the output on thin client using 100Mbps network. Server: Gentoo Linux on AMD Phoenom II x6 3.4Gz, 8GB RAM, 2x NVIDIA 9800 GTX in single session with display resulution 1024x768 on client. Performance is very promising. Going to increase network speed to 1Gbps (using either Ethernet or Fiber) and run 5-6 clients simultenously. My questions are: a) what would be better for network - 1Gbps Ethernet or Fiber (clients are distributed in max 20m around server)? Is that a must to use managed switch for better network performance? b) Should I increase number of video cards to put in SLI on server (going to use Gigabyte GA-890FXA-UD7 which has 6 PCIExpress slots [2 x4, 2 x8 and 2 x16]). Will it impact performance significantly. If I need to increase the number of video cards - what would be better - put 2 banks of video cards with 3 in bank using SLI, or 3 banks with 2 in the bank? Would linux recognize that and properly use all banks of video cards? c) any suggestions on good thin clients supporting 1920x1080 HDMI video and 1Gbps network I understand that my questions can't be answered clearly (unless someone already managed to use this kind of stuff ;)) although any suggestions would be very helpful.

    Read the article

  • Amazon CloudFront and EC2: Global Load Balancing

    - by Matt Rogish
    We have an app that is going to store and serve up a decent amount of data in S3 to a global audience where latency should be minimized. So, we've been doing tests with Amazon CloudFront and have seen favorable results. However, we need a thin middleware layer (to do security etc.) and we'd like to put that in EC2. Due to security restrictions, this middleware layer will do the file streaming from S3/CloudFront: S3/CloudFront - EC2 - Clients We can geographically distribute the EC2 nodes (US East/West, and Ireland) but the problem is that a client in the EU would hit our US server and be fed data from there, thus rendering much of the performance benefit of CloudFront moot. I've been digging through the EC2 docs but I can't find a built-in way to get a geographically distributed version of EC2 a la CloudFront. Elastic Load Balancing sounds like the way to go, but I can't seem to find a way with that to direct based on routing... Preferably, we'd like to keep the amount of stuff outside of EC2/S3/etc. to a minimum (for obvious reasons). Any ideas how to do that within the EC2/S3 framework? DNS/routing tricks? Thanks!

    Read the article

  • Set up linux box for secure local hosting a-z

    - by microchasm
    I am in the process of reinstalling the OS on a machine that will be used to host a couple of apps for our business. The apps will be local only; access from external clients will be via vpn only. The prior setup used a hosting control panel (Plesk) for most of the admin, and I was looking at using another similar piece of software for the reinstall - but I figured I should finally learn how it all works. I can do most of the things the software would do for me, but am unclear on the symbiosis of it all. This is all an attempt to further distance myself from the land of Configuration Programmer/Programmer, if at all possible. I can't find a full walkthrough anywhere for what I'm looking for, so I thought I'd put up this question, and if people can help me on the way I will edit this with the answers, and document my progress/pitfalls. Hopefully someday this will help someone down the line. The details: CentOS 5.5 x86_64 httpd: Apache/2.2.3 mysql: 5.0.77 (to be upgraded) php: 5.1 (to be upgraded) The requirements: SECURITY!! Secure file transfer Secure client access (SSL Certs and CA) Secure data storage Virtualhosts/multiple subdomains Local email would be nice, but not critical The Steps: Download latest CentOS DVD-iso (torrent worked great for me). Install CentOS: While going through the install, I checked the Server Components option thinking I was going to be using another Plesk-like admin. In hindsight, considering I've decided to try to go my own way, this probably wasn't the best idea. Basic config: Setup users, networking/ip address etc. Yum update/upgrade. Upgrade PHP/MySQL: To upgrade PHP and MySQL to the latest versions, I had to look to another repo outside CentOS. IUS looks great and I'm happy I found it! Add IUS repository to our package manager cd /tmp wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/epel-release-1-1.ius.el5.noarch.rpm rpm -Uvh epel-release-1-1.ius.el5.noarch.rpm wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/ius-release-1-4.ius.el5.noarch.rpm rpm -Uvh ius-release-1-4.ius.el5.noarch.rpm yum list | grep -w \.ius\. # list all the packages in the IUS repository; use this to find PHP/MySQL version and libraries you want to install Remove old version of PHP and install newer version from IUS rpm -qa | grep php # to list all of the installed php packages we want to remove yum shell # open an interactive yum shell remove php-common php-mysql php-cli #remove installed PHP components install php53 php53-mysql php53-cli php53-common #add packages you want transaction solve #important!! checks for dependencies transaction run #important!! does the actual installation of packages. [control+d] #exit yum shell php -v PHP 5.3.2 (cli) (built: Apr 6 2010 18:13:45) Upgrade MySQL from IUS repository /etc/init.d/mysqld stop rpm -qa | grep mysql # to see installed mysql packages yum shell remove mysql mysql-server #remove installed MySQL components install mysql51 mysql51-server mysql51-devel transaction solve #important!! checks for dependencies transaction run #important!! does the actual installation of packages. [control+d] #exit yum shell service mysqld start mysql -v Server version: 5.1.42-ius Distributed by The IUS Community Project Upgrade instructions courtesy of IUS wiki: http://wiki.iuscommunity.org/Doc/ClientUsageGuide Install rssh (restricted shell) to provide scp and sftp access, without allowing ssh login cd /tmp wget http://dag.wieers.com/rpm/packages/rssh/rssh-2.3.2-1.2.el5.rf.x86_64.rpm rpm -ivh rssh-2.3.2-1.2.el5.rf.x86_64.rpm useradd -m -d /home/dev -s /usr/bin/rssh dev passwd dev Edit /etc/rssh.conf to grant access to SFTP to rssh users. vi /etc/rssh.conf Uncomment or add: allowscp allowsftp This allows me to connect to the machine via SFTP protocol in Transmit (my FTP program of choice; I'm sure it's similar with other FTP apps). rssh instructions appropriated (with appreciation!) from http://www.cyberciti.biz/tips/linux-unix-restrict-shell-access-with-rssh.html Set up virtual interfaces ifconfig eth1:1 192.168.1.3 up #start up the virtual interface cd /etc/sysconfig/network-scripts/ cp ifcfg-eth1 ifcfg-eth1:1 #copy default script and match name to our virtual interface vi ifcfg-eth1:1 #modify eth1:1 script #ifcfg-eth1:1 | modify so it looks like this: DEVICE=eth1:1 IPADDR=192.168.1.3 NETMASK=255.255.255.0 NETWORK=192.168.1.0 ONBOOT=yes NAME=eth1:1 Add more Virtual interfaces as needed by repeating. Because of the ONBOOT=yes line in the ifcfg-eth1:1 file, this interface will be brought up when the system boots, or the network starts/restarts. service network restart Shutting down interface eth0: [ OK ] Shutting down interface eth1: [ OK ] Shutting down loopback interface: [ OK ] Bringing up loopback interface: [ OK ] Bringing up interface eth0: [ OK ] Bringing up interface eth1: [ OK ] ping 192.168.1.3 64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.105 ms Virtualhosts In the rssh section above I added a user to use for SFTP. In this users' home directory, I created a folder called 'https'. This is where the documents for this site will live, so I need to add a virtualhost that will point to it. I will use the above virtual interface for this site (herein called dev.site.local). vi /etc/http/conf/httpd.conf Add the following to the end of httpd.conf: <VirtualHost 192.168.1.3:80> ServerAdmin [email protected] DocumentRoot /home/dev/https ServerName dev.site.local ErrorLog /home/dev/logs/error_log TransferLog /home/dev/logs/access_log </VirtualHost> I put a dummy index.html file in the https directory just to check everything out. I tried browsing to it, and was met with permission denied errors. The logs only gave an obscure reference to what was going on: [Mon May 17 14:57:11 2010] [error] [client 192.168.1.100] (13)Permission denied: access to /index.html denied I tried chmod 777 et. al., but to no avail. Turns out, I needed to chmod+x the https directory and its' parent directories. chmod +x /home chmod +x /home/dev chmod +x /home/dev/https This solved that problem. DNS I'm handling DNS via our local Windows Server 2003 box. However, the CentOS documentation for BIND can be found here: http://www.centos.org/docs/5/html/Deployment_Guide-en-US/ch-bind.html SSL To get SSL working, I changed the following in httpd.conf: NameVirtualHost 192.168.1.3:443 #make sure this line is in httpd.conf <VirtualHost 192.168.1.3:443> #change port to 443 ServerAdmin [email protected] DocumentRoot /home/dev/https ServerName dev.site.local ErrorLog /home/dev/logs/error_log TransferLog /home/dev/logs/access_log </VirtualHost> Unfortunately, I keep getting (Error code: ssl_error_rx_record_too_long) errors when trying to access a page with SSL. As JamesHannah gracefully pointed out below, I had not set up the locations of the certs in httpd.conf, and thusly was getting the page thrown at the broswer as the cert making the browser balk. So first, I needed to set up a CA and make certificate files. I found a great (if old) walkthrough on the process here: http://www.debian-administration.org/articles/284. Here are the relevant steps I took from that article: mkdir /home/CA cd /home/CA/ mkdir newcerts private echo '01' > serial touch index.txt #this and the above command are for the database that will keep track of certs Create an openssl.cnf file in the /home/CA/ dir and edit it per the walkthrough linked above. (For reference, my finished openssl.cnf file looked like this: http://pastebin.com/raw.php?i=hnZDij4T) openssl req -new -x509 -extensions v3_ca -keyout private/cakey.pem -out cacert.pem -days 3650 -config ./openssl.cnf #this creates the cacert.pem which gets distributed and imported to the browser(s) Modified openssl.cnf again per walkthrough instructions. openssl req -new -nodes -out dev.req.pem -config ./openssl.cnf #generates certificate request, and key.pem which I renamed dev.key.pem. Modified openssl.cnf again per walkthrough instructions. openssl ca -out dev.cert.pem -config ./openssl.cnf -infiles dev.req.pem #create and sign certificate. cp dev.cert.pem /home/dev/certs/cert.pem cp dev.key.pem /home/certs/key.pem I updated httpd.conf to reflect the certs and turn SSLEngine on: NameVirtualHost 192.168.1.3:443 <VirtualHost 192.168.1.3:443> ServerAdmin [email protected] DocumentRoot /home/dev/https SSLEngine on SSLCertificateFile /home/dev/certs/cert.pem SSLCertificateKeyFile /home/dev/certs/key.pem ServerName dev.site.local ErrorLog /home/dev/logs/error_log TransferLog /home/dev/logs/access_log </VirtualHost> Put the CA cert.pem in a web-accessible place, and downloaded/imported it into my browser. Now I can visit https://dev.site.local with no errors or warnings. And this is where I'm at. I will keep editing this as I make progress. Any tips on how to configure SSL email would be appreciated.

    Read the article

  • Use Alladin eToken with ThunderBird and other tool

    - by Yurij73
    I'm looking for an example on how to setup the eToken PRO Java device to work with Mozilla Thunderbird and with other Linux tool such as PAM logon. I installed distributed pkiclient-5.00.28-0.i386.RPM from the official product page eToken Pro but that tool only handles importing/exporting certificates on the device. I read a glance an old HOWTO from eToken on Linux, but I couldn't install pkcs11-lib for this device as recommended for Thunderbird use this crypto device. It seems my usb token isn't listed in system, unless lsusb show it, so that is the matter modutil -list -dbdir /etc/pki/nssdb Listing of PKCS #11 Modules NSS Internal PKCS #11 Module Blockquote slots: 2 slots attached Blockquote status: loaded Blockquote slot: NSS User Private Key and Certificate Services Blockquote token: NSS Certificate DB Blockquote CoolKey PKCS #11 Module Blockquote library name: libcoolkeypk11.so Blockquote slots: 1 slot attached Blockquote status: loaded Blockquote slot: AKS ifdh [Main Interface] 00 00 token: is my token absent? on other hand i don't know which module is convenient to Java Pro, does CoolKey does all the job well? It seems Java token is too new hardware for Linux? there is excerpt from /etc/pam_pkcs11.conf #filename of the PKCS #11 module. The default value is "default" use_pkcs11_module = coolkey; screen_savers = gnome-screensaver,xscreensaver,kscreensaver pkcs11_module coolkey { module = libcoolkeypk11.so; description = "Cool Key"`

    Read the article

  • Wastage of resources in Virtualization

    - by Sabeen Malik
    I have asked this question on SO, but was suggested that i ask it here on SF, so here it goes. http://stackoverflow.com/questions/3010753/wastage-of-resources-in-virtualization I am not sure if this is the right place to ask the question. However i hope it is. When looking for a VPS earlier today, I was trying to understand how each container would work in the background. Keeping in mind the fact that the operating system uses most of the memory and power on a system, wouldn't having multiple operating systems in the same machine mean more wastage of resources. For instance if i was running centOS on a dedicated box and it was running lets say 20 background OS level processes. Then i go and install a virtualization platform and install 5 more centOS virtual machines in the same system which are exactly the same as the host operating system. Doesn't this mean duplication of those 20 processes 6 times? So internally the context switching is happening between 120 processes instead of 20? Further Notes: Here is an example of what i am thinking: I have a master-slave configuration for a long running, cpu + memory intensive process, which can be distributed to 4 machines. Lets say when the process runs on these 4 machines with lets say 1 Gh CPU and 1 Gig RAM, i get 400 results per hour from the cluster (assuming 100 results from one machine) . Now i get a bigger machine ( lets say 4Gh and 4 Gig RAM), have 4 virtual hosts on it with 1 Gz CPU and 1 Gig RAM. Will this configuration give me the same 4 results per hour from these 4 virtual hosts?

    Read the article

  • How to manage processes-to-CPU cores affinities ?

    - by Philippe
    I use a distributed user-space filesystem (GlusterFS) and I would like to be sure GlusterFS processes will always have the computing power they need. Each execution node of my grid have 2 CPU, with 4 cores per CPU and 2 threads per core (16 "processors" are seen by Linux). My goal is to guarantee that GlusterFS processes have enough processing power to be reliable, responsive and fast. (There is no marketing here, just the dreams of a sysadmin ;-) I consider two main points : GlusterFS processes I/O for data access (on local disks, or remote disks) I thought about binding the Linux Kernel and GlusterFS instances on a specific "processor". I would like to be sure that : No grid job will impact the kernel and the GlusterFS instances Researchers jobs won't be affected by system processes (I'd like to reserve a pool of cores to job execution and be sure that no system process will use these CPUs) But what about I/O ? As we handle a huge amount of data (several terabytes), we'll have a lot of interuptions. How can I distribute these operations on my processors ? What are the "best practices" ? Thanks for your comments!

    Read the article

  • Replicated MongoDB server slower than simple shards

    - by displayName
    I tried to compare the performance of a sharded configuration against a sharded and replicated configuration. The sharded configuration consists of 8 shards each running on three different machines thereby constituting a total of 24 shards. All 8 of these shards run in the same partition on each machine. The sharded and replicated version is 8 shards again just like plain sharding, and all 8 mongods run on the same partition in each machine. But apart from this, each of these three machine now run additional 16 threads on another partition which serve as the secondary for the 8 mongods running on other machines. This is the way I prepared a sharded and replicated configuration with data chunks having replication factor of 3. Important point to note is that once the data has been loaded, it is not modified. So after primary and secondaries have synchronized then it doesn't matter which one i read from. To run the queries, I use an entirely different machine (let's call it config) which runs mongos and this machine's only purpose is to receive queries and run them on the cluster. Contrary to my expectations, plain sharding of 8 threads on each machine (total = 3 * 8 = 24) is performing better for queries than the sharded + replicated configuration. I have a script written to perform the query. So in order to time the scripts, I use time ./testScript and see the result. I tried changing the reading preference for replicated cluster by logging to mongo of config and run db.getMongo().setReadPref('secondary') and then exit the shell and run the queries like time ./testScript. The questions are: Where am i going wrong in the replication? Why is it slower than its plain sharding version? Does the db.getMongo().ReadPref('secondary') persist when i leave the shell and try to perform the query? All the four machines are running Linux and i have already increased the ulimit -n to 2048 from initial value of 1024 to allow more connections. The collections are properly distributed and all the mongods have equal number of chunks. Goes without saying that indices in both configurations are the same.

    Read the article

  • "Delivered-To" Header in Exchange

    - by Kaii
    In some SMTP server implementations (i.e. Postfix) you can enable Delivered-To and X-Original-To headers that will be added to your email. (or [X-]Envelope-To) This is very helpful with distribution lists to determine which e-mail address the mail has been redirected to. So, when the mail has been sent to [email protected], you can see in the Delivered-To or Envelope-To header that it has been redirected (distributed) to [email protected], which is one of many other e-mail addresses that are linked to a single mailbox. How do I find which address was used to deliver this mail to a specific mailbox on Microsoft Exchange 2010? Looking at the plain message (with all headers) i can not find any information that the mail arrived via address [email protected] I think I need the Delivered-To header (or a similar one) to be set on Microsoft Exchange when a mail is delivered via distribution lists. Is there any way to enable such header in Exchange 2010? I need it so that our Ticket system (OTRS) correctly recognizes where the ticket belongs to. Adding all the e-mail addresses of all distribution lists to the system configuration is not the right solution. And if there is a solution for Exchange 2010, is this possibly also applicable to Exchange 2007?

    Read the article

  • Redis as substitution for Memcache

    - by Boban P.
    We have distributed web app, and for now, as session handler, we use two separate instances of memcache in redundancy, so everything that is written in one memcache is also written in other. Memcache is fairly easy to install, use, and maintain but we have one problem: if one memcache fail, everything is fine, php comunicate with other instance which has all data (although, half of connections have a delay because they try to use failed one, wait a little, and then contact other memcache). When failed instance comes back to life again, it starts up empty. If established session request data from that instance, session fails, and user logs out, and that happens to half of users.So, we are thinking about to switch to redis for session handling, and maybe keep memcache for cache only. My questions are: If we setup redis instances as master-slave, and if master fails, can sentinel promote slave as new master and when old master comes back to life, will it stay as slave or not? Is redis call malloc at startup to allocate part of memory, like memcache or varnish, or it calls malloc for every key inserted? And what are pros and cons of that?

    Read the article

  • Designing a software based load balancer

    - by Kishore pandey
    Hello to all Server fault users, I am new to this website but have constantly been using the mother website, stackover flow. Well to begin with, i would like to design a load balancer for the organization i am working for. As i am very new to this whole, idea about load balancing and networks. I am finding it very difficult to start my project. I did a lot of research on already existing load balancer and found some(HAPROXY,NGINX) that could solve my problems, but the point is, I am still in a dilemma if they could answer the following requirements of mine: The client and server in my architecture are distributed. The load balancer should take care of the firewall. LB server should balance the load among all servers present in WWW cloud. The LB server should have some sort of configuration file, with the help of which it is possible to configure the servers. Heart beat: With the help of which it would be possible to check if any server is down, if any server is down the request should be passed to some other server. Various load balancing algorithms of the incoming requests. Easy error handling. It should be fairly possible to prioritize the incoming requests. Is there any already available load balncer solution on the market that could satisfy these requirements? If not is there any base code available with the help of which i could develop my own load balncer. If not where should i start from scratch? I am practically new to everything. Any help from a load balancer expert is very much appreciated. Thanx a ton in advance. Cheers and regards. Kishore

    Read the article

  • Can I run Excel 2010 on a server?

    - by Glen Little
    This question is not about a person using Excel on a computer that happens to have an Windows Server OS. And it is not about using any Sharepoint services features! The question is about automated processes that use code (Office Automation) to open Excel files, manipulate them, run calculations, read data, save copies of the file and close the files... all in code. In previous versions of Excel the licensing agreement prevented use on a public server, notes from Microsoft warned about the problems trying to use Office Automation in a server environment, and we were warned that Excel was single threaded and not designed for use on a server. Most of the articles about this were written before Office 2010. But now, Excel 2010 is designed to work on a High Performance Computing server using HPC Services for Excel. One HPC document mentions "Windows HPC Server 2008 R2 includes a comprehensive pop-up manager that can handle occasional dialog boxes and pop-up messages". So my question is... is it now "safe" to run code that automates Excel 2010 on a "normal" server without using the HPC services? If not, can the HPC Services for Excel work on a single server? I don't need the high performance, distributed computing, aspect of HPC Services for Excel... just the ability to run Excel on a server. Can that now be done? Thanks, Glen

    Read the article

< Previous Page | 176 177 178 179 180 181 182 183 184 185 186 187  | Next Page >