Search Results

Search found 19586 results on 784 pages for 'machine instruction'.

Page 121/784 | < Previous Page | 117 118 119 120 121 122 123 124 125 126 127 128  | Next Page >

  • Azure Linux out of band managment

    - by faker
    I have a Linux (Ubuntu) virtual machine running on Azure. It seems like the only way to connect to it is via SSH. This is OK for normal operation, but what to do when something goes wrong (fsck waiting for user-input, new kernel doesn't boot, mis-configured network, etc.)? There is a grayed out "Connect" button in the management interface, and the help for it says: To access a virtual machine running Windows Server, click Connect and follow the instructions. Enter the password that was set when the virtual machine was created. The Connect button is not available for a virtual machine running Linux, but you can use your favorite SSH program to access it. I've read the documentation on the command line tools, but there is also no way to connect to it. Is there any way for me to get such a console?

    Read the article

  • Unidentified network in Win7

    - by gylns
    I connect the internet through Ad-hoc network, My machine uses win7 and another uses winows xp, There's no problem when I connect the XP machine, but if i disconnect and reconnect the net, then my local network is marked as "Unidentified network",unless restart the XP machine, I don't know why?

    Read the article

  • Migrating VMWare Fusion Bootcamp Partition to New Mac

    - by 107217170653252726124
    What is the best way for me to migrate my VMWare Fusion Bootcamp Partition to a new MAC. Preferably I would like to import the current bootcamp partition in as a strict virtual machine on the new mac, instead of in bootcamp. The "older mac" is about two years old, so I don't think there will be any compatibility issues, but i do not have enough disc space on the old machine to import it as a virtual machine, and then migrate it. Thanks for the help.

    Read the article

  • Software disables itself when the PC is accessed via RDP

    - by blckgrffn
    We have a large, specialty printer that has vendor specific software that enables its use outside of it showing up simply as a printer in the Windows Control Panel. This software recognizes when we RDP into the machine and "disconnects" the PC from the printer within its proprietary control panel. All is well when an application like TeamViewer is used to access the machine. Ostensibly, the application is helping us be safe by "enforcing" that the machine used for the printer is a walk up workstation, or so the support folks informed me. If TeamViewer etc, fixes the issue, then what is the problem? We have many headless workstations in our warehouse attached to a variety of specialty machines, all used via RDP. We want/need to keep access to the machines the same for the sanity of our production staff. The meat of the question - how, specifically, might a machine know that it is being accessed via RDP (terminal services management???) and how might this be defeated without altering an application or driver. Of note, the system being used is a Windows 7 Pro machine hooked to the printer via USB. Thanks! Nat edit Is there any combination of /admin switches, etc. that will possibly fix this? Simply putting /admin did not.

    Read the article

  • I have install on my xpsp pc VMware® Workstation 7.1.1 build-282343

    - by Ajifan
    I have installed on my XP sp pc to VMware® Workstation 7.1.1 build-282343 And also I have more two separate machine one laptop and one pc I have already connect those three machine's to a Nortel switch to distribute internet as a work group Now I want add a virtual Windows server 2003 enterprise and I want to connect those my to machine as a client and want to give some policies and security for users Like real network environment t how I can implement those steps please explain me step by step

    Read the article

  • VMWare Player does not save changes in guest operating system

    - by Giorgos Manoltzas
    I have installed VMWare Player in my Windows 7 Ultimate and I have added Ubuntu 12.10 in a virtual machine. The problem is that every time I power off the virtual machine everything is lost. I mean every program I install or folder i save or anything else is lost completely. And when I start the again the virtual machine is like newly installed. Is there an option to change this (strange) behaviour?

    Read the article

  • HP LaserJet 1020 doesn't print

    - by Rod
    I've got a HP LaserJet 1020 printer connected to my 64-bit Windows 7 Ultimate machine. I'm sharing this printer out. Got another 64-bit Windows 7 machine, with Home Premium. I installed the printer driver from HP for 64-bit on both machines. However, the strange thing is that if the Home Premium machine prints something, it never comes out unless I reboot the Ultimate machine. This is less than optimal. Any idea what's causing this and if there's anything I can do about it?

    Read the article

  • I have Oracle SQL Developer Installed, Now What?

    - by thatjeffsmith
    If you’re here because you downloaded a copy of Oracle SQL Developer and now you need help connecting to a database, then you’re in the right place. I’ll show you what you need to get up and going so you can finish your homework, teach yourself Oracle database, or get ready for that job interview. You’ll need about 30 minutes to set everything up…and about 5 years to become proficient with Oracle Oracle Database come with SQL Developer but SQL Developer doesn’t include a database If you install Oracle database, it includes a copy of SQL Developer. If you’re running that copy of SQL Developer, please take a second to upgrade now, as it is WAY out of date. But I’m here to talk to the folks that have downloaded SQL Developer and want to know what to do next. You’ve got it running. You see this ‘Connection’ dialog, and… Where am I connecting to, and who as? You NEED a database Installing SQL Developer does not give you a database. So you’re going to need to install Oracle and create a database, or connect to a database that is already up and running somewhere. Basically you need to know the following: where is this database, what’s it called, and what port is the listener running on? The Default Connection properties in SQL Developer These default settings CAN work, but ONLY if you have installed Oracle Database Express Edition (XE). Localhost is a network alias for 127.0.0.1 which is an IP address that maps to the ‘local’ machine, or the machine you are reading this blog post on. The listener is a service that runs on the server and handles connections for the databases on that machine. You can run a database without a listener and you can run a listener without a database, but you can’t connect to a database on a different server unless both that database and listener are up and running. Each listener ‘listens’ on one or more ports, you need to know the port number for each connection. The default port is 1521, but 1522 is often pretty common. I know all of this sounds very complicated Oracle is a very sophisticated piece of software. It’s not analogous to downloading a mobile phone app and and using it 10 seconds later. It’s not like installing Office/Access either – it requires services, environment setup, kernel tweaks, etc. However. Normally an administrator will setup and install Oracle, create the database, and configure the listener for everyone else to use. They’ll often also setup the connection details for everyone via a ‘TNSNAMES.ORA’ file. This file contains a list of database connection details for folks to browse – kind of like an Oracle database phoneboook. If someone has given you a TNSNAMES.ORA file, or setup your machine to have access to a TNSNAMES file, then you can just switch to the ‘TNS’ connection type, and use the dropdown to select the database you want to connect to. Then you don’t have to worry about the server names, database names, and the port numbers. ORCL – that sounds promising! ORCL is the default SID when creating a new database with the Database Creation Assistant (DBCA). It’s just me, and I need help! No administrator, no database, no nothing. What do you do? You have a few options: Buy a copy of Oracle and download, install, and create a database Download and install XE (FREE!) Download, import, and run our Developer Days Hands-on-Lab (FREE!) If you’re a student (or anyone else) with little to no experience with Oracle, then I recommend the third option. Oracle Technology Network Developer Day: Hands-on Database Application Development Lab The OTN lab runs on a A Virtual Box image which contains: 11gR2 Enterprise Edition copy of Oracle a database and listener running for you to connect to lots of demo data for you to play with SQL Developer installed and ready to connect Some browser based labs you can step through to learn Oracle You download the image, you download and install Virtual Box (also FREE!), then you IMPORT the image you previously downloaded. You then ‘Start’ the image. It will boot a copy of Oracle Enterprise Linux (OEL), start your database, and all that jazz. You can then start up and run SQL Developer inside the image OR you can connect to the database running on the image using the copy of SQL Developer you installed on your host machine. Setup Port Forwarding to Make It Easy to Connect From Your Host When you start the image, it will be assigned an IP address. Depending on what network adapter you select in the image preferences, you may get something that can get out to the internet from your image, something your host machine can see and connect to, or something that kind of just lives out there in a vacuum. You want to avoid the ‘vacuum’ option – unless you’re OK with running SQL Developer inside the Linux image. Open the Virtual Box image properties and go to the Networking options. We’re going to setup port forwarding. This will tell your machine that anything that happens on port 1521 (the default Oracle Listener port), should just go to the image’s port 1521. So I can connect to ‘localhost’ and it will magically get transferred to the image that is running. Oracle Virtual Box Port Forwarding 1521 listener database Now You Just Need a Username and Password The default passwords on this image are all ‘oracle’ – so you can connect as SYS, HR, or whatever – just use ‘oracle’ as the password. The Linux passowrds are all ‘oracle’ too, so you can login as ‘root’ or as ‘oracle’ in the Linux desktop. Connect! Connect as HR to your Oracle database running on the OTN Developer Days Virtual Box image If you’re connecting to someone else’s database, you need to ask the person that manages that environment to create for you an account. Don’t try to ‘guess’ or ‘figure out’ what the username and password is. Introduce yourself, explain your situation, and ask kindly for access. This is your first test – can you connect? I know it’s hard to get started with Oracle. There are however many things we offer to make this easier. You’ll need to do a bit of RTM first though. Once you know what’s required, you will be much more likely to succeed. Of course, if you need help, you know where to find me

    Read the article

  • Integrating Coherence & Java EE 6 Applications using ActiveCache

    - by Ricardo Ferreira
    OK, so you are a developer and are starting a new Java EE 6 application using the most wonderful features of the Java EE platform like Enterprise JavaBeans, JavaServer Faces, CDI, JPA e another cool stuff technologies. And your architecture need to hold piece of data into distributed caches to improve application's performance, scalability and reliability? If this is your current facing scenario, maybe you should look closely in the solutions provided by Oracle WebLogic Server. Oracle had integrated WebLogic Server and its champion data caching technology called Oracle Coherence. This seamless integration between this two products provides a comprehensive environment to develop applications without the complexity of extra Java code to manage cache as a dependency, since Oracle provides an DI ("Dependency Injection") mechanism for Coherence, the same DI mechanism available in standard Java EE applications. This feature is called ActiveCache. In this article, I will show you how to configure ActiveCache in WebLogic and at your Java EE application. Configuring WebLogic to manage Coherence Before you start changing your application to use Coherence, you need to configure your Coherence distributed cache. The good news is, you can manage all this stuff without writing a single line of code of XML or even Java. This configuration can be done entirely in the WebLogic administration console. The first thing to do is the setup of a Coherence cluster. A Coherence cluster is a set of Coherence JVMs configured to form one single view of the cache. This means that you can insert or remove members of the cluster without the client application (the application that generates or consume data from the cache) knows about the changes. This concept allows your solution to scale-out without changing the application server JVMs. You can growth your application only in the data grid layer. To start the configuration, you need to configure an machine that points to the server in which you want to execute the Coherence JVMs. WebLogic Server allows you to do this very easily using the Administration Console. In this example, I will call the machine as "coherence-server". Remember that in order to the machine concept works, you need to ensure that the NodeManager are being executed in the target server that the machine points to. The NodeManager executable can be found in <WLS_HOME>/server/bin/startNodeManager.sh. The next thing to do is to configure a Coherence cluster. In the WebLogic administration console, go to Environment > Coherence Clusters and click in "New". Call this Coherence cluster of "my-coherence-cluster". Click in next. Specify a valid cluster address and port. The Coherence members will communicate with each other through this address and port. Our Coherence cluster are now configured. Now it is time to configure the Coherence members and add them to this cluster. In the WebLogic administration console, go to Environment > Coherence Servers and click in "New". In the field "Name" set to "coh-server-1". In the field "Machine", associate this Coherence server to the machine "coherence-server". In the field "Cluster", associate this Coherence server to the cluster named "my-coherence-cluster". Click in "Finish". Start the Coherence server using the "Control" tab of WebLogic administration console. This will instruct WebLogic to start a new JVM of Coherence in the target machine that should join the pre-defined Coherence cluster. Configuring your Java EE Application to Access Coherence Now lets pass to the funny part of the configuration. The first thing to do is to inform your Java EE application which Coherence cluster to join. Oracle had updated WebLogic server deployment descriptors so you will not have to change your code or the containers deployment descriptors like application.xml, ejb-jar.xml or web.xml. In this example, I will show you how to enable DI ("Dependency Injection") to a Coherence cache from a Servlet 3.0 component. In the WEB-INF/weblogic.xml deployment descriptor, put the following metadata information: <?xml version="1.0" encoding="UTF-8"?> <wls:weblogic-web-app xmlns:wls="http://xmlns.oracle.com/weblogic/weblogic-web-app" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd http://xmlns.oracle.com/weblogic/weblogic-web-app http://xmlns.oracle.com/weblogic/weblogic-web-app/1.4/weblogic-web-app.xsd"> <wls:context-root>myWebApp</wls:context-root> <wls:coherence-cluster-ref> <wls:coherence-cluster-name>my-coherence-cluster</wls:coherence-cluster-name> </wls:coherence-cluster-ref> </wls:weblogic-web-app> As you can see, using the "coherence-cluster-name" tag, we are informing our Java EE application that it should join the "my-coherence-cluster" when it loads in the web container. Without this information, the application will not be able to access the predefined Coherence cluster. It will form its own Coherence cluster without any members. So never forget to put this information. Now put the coherence.jar and active-cache-1.0.jar dependencies at your WEB-INF/lib application classpath. You need to deploy this dependencies so ActiveCache can automatically take care of the Coherence cluster join phase. This dependencies can be found in the following locations: - <WLS_HOME>/common/deployable-libraries/active-cache-1.0.jar - <COHERENCE_HOME>/lib/coherence.jar Finally, you need to write down the access code to the Coherence cache at your Servlet. In the following example, we have a Servlet 3.0 component that access a Coherence cache named "transactions" and prints into the browser output the content (the ammount property) of one specific transaction. package com.oracle.coherence.demo.activecache; import java.io.IOException; import javax.annotation.Resource; import javax.servlet.ServletException; import javax.servlet.annotation.WebServlet; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; import com.tangosol.net.NamedCache; @WebServlet("/demo/specificTransaction") public class TransactionServletExample extends HttpServlet { @Resource(mappedName = "transactions") NamedCache transactions; protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { int transId = Integer.parseInt(request.getParameter("transId")); Transaction transaction = (Transaction) transactions.get(transId); response.getWriter().println("<center>" + transaction.getAmmount() + "</center>"); } } Thats it! No more configuration is necessary and you have all set to start producing and getting data to/from Coherence. As you can see in the example code, the Coherence cache are treated as a normal dependency in the Java EE container. The magic happens behind the scenes when the ActiveCache allows your application to join the defined Coherence cluster. The most interesting thing about this approach is, no matter which type of Coherence cache your are using (Distributed, Partitioned, Replicated, WAN-Remote) for the client application, it is just a simple attribute member of com.tangosol.net.NamedCache type. And its all managed by the Java EE container as an dependency. This means that if you inject the same dependency (the Coherence cache named "transactions") in another Java EE component (JSF managed-bean, Stateless EJB) the cache will be the same. Cool isn't it? Thanks to the CDI technology, we can extend the same support for non-Java EE standards components like simple POJOs. This means that you are not forced to only use Servlets, EJBs or JSF in order to inject Coherence caches. You can do the same approach for regular POJOs created for you and managed by lightweight containers like Spring or Seam.

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Subterranean IL: Exception handling 2

    - by Simon Cooper
    Control flow in and around exception handlers is tightly controlled, due to the various ways the handler blocks can be executed. To start off with, I'll describe what SEH does when an exception is thrown. Handling exceptions When an exception is thrown, the CLR stops program execution at the throw statement and searches up the call stack looking for an appropriate handler; catch clauses are analyzed, and filter blocks are executed (I'll be looking at filter blocks in a later post). Then, when an appropriate catch or filter handler is found, the stack is unwound to that handler, executing successive finally and fault handlers in their own stack contexts along the way, and program execution continues at the start of the catch handler. Because catch, fault, finally and filter blocks can be executed essentially out of the blue by the SEH mechanism, without any reference to preceding instructions, you can't use arbitary branches in and out of exception handler blocks. Instead, you need to use specific instructions for control flow out of handler blocks: leave, endfinally/endfault, and endfilter. Exception handler control flow try blocks You cannot branch into or out of a try block or its handler using normal control flow instructions. The only way of entering a try block is by either falling through from preceding instructions, or by branching to the first instruction in the block. Once you are inside a try block, you can only leave it by throwing an exception or using the leave <label> instruction to jump to somewhere outside the block and its handler. The leave instructions signals the CLR to execute any finally handlers around the block. Most importantly, you cannot fall out of the block, and you cannot use a ret to return from the containing method (unlike in C#); you have to use leave to branch to a ret elsewhere in the method. As a side effect, leave empties the stack. catch blocks The only way of entering a catch block is if it is run by the SEH. At the start of the block execution, the thrown exception will be the only thing on the stack. The only way of leaving a catch block is to use throw, rethrow, or leave, in a similar way to try blocks. However, one thing you can do is use a leave to branch back to an arbitary place in the handler's try block! In other words, you can do this: .try { // ... newobj instance void [mscorlib]System.Exception::.ctor() throw MidTry: // ... leave.s RestOfMethod } catch [mscorlib]System.Exception { // ... leave.s MidTry } RestOfMethod: // ... As far as I know, this mechanism is not exposed in C# or VB. finally/fault blocks The only way of entering a finally or fault block is via the SEH, either as the result of a leave instruction in the corresponding try block, or as part of handling an exception. The only way to leave a finally or fault block is to use endfinally or endfault (both compile to the same binary representation), which continues execution after the finally/fault block, or, if the block was executed as part of handling an exception, signals that the SEH can continue walking the stack. filter blocks I'll be covering filters in a separate blog posts. They're quite different to the others, and have their own special semantics. Phew! Complicated stuff, but it's important to know if you're writing or outputting exception handlers in IL. Dealing with the C# compiler is probably best saved for the next post.

    Read the article

  • determine parity of a bit representation of a number in MIPS

    - by Hristo
    Is there some instruction in MIPS that will determine the parity of a certain bit representation? I know to determine whether a "number" has an even parity or an odd parity is to XOR the individual bits of the binary representation together, but that seems computationally-intensive for a set of MIPS instructions... and I need to do this as quick as possible. Also, the number I'm working in is represented in Grey Code... just to throw that in there. So is there some pseudo-instruction in MIPS to determine the parity of a "number" or do I have to do it by hand? If there is no MIPS instruction, which it seems very unlikely to be, any advice on how to do it by hand? Thanks, Hristo follow-up: I found a optimization, but my implementation isn't working. unsigned int v; // 32-bit word v ^= v >> 1; v ^= v >> 2; v = (v & 0x11111111U) * 0x11111111U; return (v >> 28) & 1;

    Read the article

  • Disassembling with python - no easy solution?

    - by Abc4599
    Hi, I'm trying to create a python script that will disassemble a binary (a Windows exe to be precise) and analyze its code. I need the ability to take a certain buffer, and extract some sort of struct containing information about the instructions in it. I've worked with libdisasm in C before, and I found it's interface quite intuitive and comfortable. The problem is, its Python interface is available only through SWIG, and I can't get it to compile properly under Windows. At the availability aspect, diStorm provides a nice out-of-the-box interface, but it provides only the Mnemonic of each instruction, and not a binary struct with enumerations defining instruction type and what not. This is quite uncomfortable for my purpose, and will require a lot of what I see as spent time wrapping the interface to make it fit my needs. I've also looked at BeaEngine, which does in fact provide the output I need, a struct with binary info concerning each instruction, but its interface is really odd and counter-intuitive, and it crashes pretty much instantly when provided with wrong arguments. The CTypes sort of ultimate-death-to-your-python crashes. So, I'd be happy to hear about other solutions, which are a little less time consuming than messing around with djgcc or mingw to make SWIGed libdisasm, or writing an OOP wrapper for diStorm. If anyone has some guidance as to how to compile SWIGed libdisasm, or better yet, a compiled binary (pyd or dll+py), I'd love to hear/have it. :) Thanks ahead.

    Read the article

  • Compile and optimize for different target architectures

    - by Peter Smit
    Summary: I want to take advantage of compiler optimizations and processor instruction sets, but still have a portable application (running on different processors). Normally I could indeed compile 5 times and let the user choose the right one to run. My question is: how can I can automate this, so that the processor is detected at runtime and the right executable is executed without the user having to chose it? I have an application with a lot of low level math calculations. These calculations will typically run for a long time. I would like to take advantage of as much optimization as possible, preferably also of (not always supported) instruction sets. On the other hand I would like my application to be portable and easy to use (so I would not like to compile 5 different versions and let the user choose). Is there a possibility to compile 5 different versions of my code and run dynamically the most optimized version that's possible at execution time? With 5 different versions I mean with different instruction sets and different optimizations for processors. I don't care about the size of the application. At this moment I'm using gcc on Linux (my code is in C++), but I'm also interested in this for the Intel compiler and for the MinGW compiler for compilation to Windows. The executable doesn't have to be able to run on different OS'es, but ideally there would be something possible with automatically selecting 32 bit and 64 bit as well. Edit: Please give clear pointers how to do it, preferably with small code examples or links to explanations. From my point of view I need a super generic solution, which is applicable on any random C++ project I have later. Edit I assigned the bounty to ShuggyCoUk, he had a great number of pointers to look out for. I would have liked to split it between multiple answers but that is not possible. I'm not having this implemented yet, so the question is still 'open'! Please, still add and/or improve answers, even though there is no bounty to be given anymore. Thanks everybody!

    Read the article

  • Add two 32-bit integers in Assembler for use in VB6

    - by Emtucifor
    I would like to come up with the byte code in assembler (assembly?) for Windows machines to add two 32-bit longs and throw away the carry bit. I realize the "Windows machines" part is a little vague, but I'm assuming that the bytes for ADD are pretty much the same in all modern Intel instruction sets. I'm just trying to abuse VB a little and make some things faster. So... if the string "8A4C240833C0F6C1E075068B442404D3E0C20800" is the assembly code for SHL that can be "injected" into a VB6 program for a fast SHL operation expecting two Long parameters (we're ignoring here that 32-bit longs in VB6 are signed, just pretend they are unsigned), what is the hex string of bytes representing assembler instructions that will do the same thing to return the sum? The hex code above for SHL is, according to the author: mov eax, [esp+4] mov cl, [esp+8] shl eax, cl ret 8 I spit those bytes into a file and tried unassembling them in a windows command prompt using the old debug utility, but I figured out it's not working with the newer instruction set because it didn't like EAX when I tried assembling something but it was happy with AX. I know from comments in the source code that SHL EAX, CL is D3E0, but I don't have any reference to know what the bytes are for instruction ADD EAX, CL or I'd try it. I tried flat assembler and am not getting anything I can figure out how to use. I used it to assemble the original SHL code and got a very different result, not the same bytes. Help?

    Read the article

  • Security Pattern to store SSH Keys

    - by Mehdi Sadeghi
    I am writing a simple flask application to submit scientific tasks to remote HPC resources. My application in background talks to remote machines via SSH (because it is widely available on various HPC resources). To be able to maintain this connection in background I need either to use the user's ssh keys on the running machine (when user's have passwordless ssh access to the remote machine) or I have to store user's credentials for the remote machines. I am not sure which path I have to take, should I store remote machine's username/password or should I store user's SSH key pair in database? I want to know what is the correct and safe way to connect to remote servers in background in context of a web application.

    Read the article

  • xrdp and xfce4 Ubuntu 14.04 slow refreshrate

    - by Awatatah
    I am running Ubuntu 14.04 LTS 32bit and I am connecting via windows rdp to my ubuntu machine that has xrdp with xfce4 session. Network is 40mbps/5mbps. RDP to a windows machine from a windows machine is flawless from same location. I can connect fine, but the refresh rate is horrible. When it fist opens the desktop, the picture drags as seen in this image. The same thing happens when items are opened, closed, etc. Windows have the same effect. Sometimes up to 4 seconds before the image is rendered, depending on the image. Is there a setting to improve the refresh rate or something I can do to fix?

    Read the article

  • Oracle Announces New Oracle Exastack Program for ISV Partners

    - by pfolgado
    Oracle Exastack Program Enables ISV Partners to Leverage a Scalable, Integrated Infrastructure to Deliver Their Applications Tuned and Optimized for High-Performance News Facts Enabling Independent Software Vendors (ISVs) and other members of Oracle Partner Network (OPN) to rapidly build and deliver faster, more reliable applications to end customers, Oracle today introduced Oracle Exastack Ready, available now, and Oracle Exastack Optimized, available in fall 2011 through OPN. The Oracle Exastack Program focuses on helping ISVs run their solutions on Oracle Exadata Database Machine and Oracle Exalogic Elastic Cloud -- integrated systems in which the software and hardware are engineered to work together. These products provide partners with a lower cost and high performance infrastructure for database and application workloads across on-premise and cloud based environments. Leveraging the new Oracle Exastack Program in which applications can qualify as Oracle Exastack Ready or Oracle Exastack Optimized, partners can use available OPN resources to optimize their applications to run faster and more reliably -- providing increased performance to their end users. By deploying their applications on Oracle Exadata Database Machine and Oracle Exalogic Elastic Cloud, ISVs can reduce the cost, time and support complexities typically associated with building and maintaining a disparate application infrastructure -- enabling them to focus more on their core competencies, accelerating innovation and delivering superior value to customers. After qualifying their applications as Oracle Exastack Ready, partners can note to customers that their applications run on and support Oracle Exadata Database Machine and Oracle Exalogic Elastic Cloud component products including Oracle Solaris, Oracle Linux, Oracle Database and Oracle WebLogic Server. Customers can be confident when choosing a partner's Oracle Exastack Optimized application, knowing it has been tuned by the OPN member on Oracle Exadata Database Machine or Oracle Exalogic Elastic Cloud with a goal of delivering optimum speed, scalability and reliability. Partners participating in the Oracle Exastack Program can also leverage their Oracle Exastack Ready and Oracle Exastack Optimized applications to advance to Platinum or Diamond level in OPN. Oracle Exastack Programs Provide ISVs a Reliable, High-Performance Application Infrastructure With the Oracle Exastack Program ISVs have several options to qualify and tune their applications with Oracle Exastack, including: Oracle Exastack Ready: Oracle Exastack Ready provides qualifying partners with specific branding and promotional benefits based on their adoption of Oracle products. If a partner application supports the latest major release of one of these products, the partner may use the corresponding logo with their product marketing materials: Oracle Solaris Ready, Oracle Linux Ready, Oracle Database Ready, and Oracle WebLogic Ready. Oracle Exastack Ready is available to OPN members at the Gold level or above. Additionally, OPN members participating in the program can leverage their Oracle Exastack Ready applications toward advancement to the Platinum or Diamond levels in the OPN Specialized program and toward achieving Oracle Exastack Optimized status. Oracle Exastack Optimized: When available, for OPN members at the Gold level or above, Oracle Exastack Optimized will provide direct access to Oracle technical resources and dedicated Oracle Exastack lab environments so OPN members can test and tune their applications to deliver optimal performance and scalability on Oracle Exadata Database Machine or Oracle Exalogic Elastic Cloud. Oracle Exastack Optimized will provide OPN members with specific branding and promotional benefits including the use of the Oracle Exastack Optimized logo. OPN members participating in the program will also be able to leverage their Oracle Exastack Optimized applications toward advancement to Platinum or Diamond level in the OPN Specialized program. Oracle Exastack Labs and ISV Enablement: Dedicated Oracle Exastack lab environments and related technical enablement resources (including Guided Learning Paths and Boot Camps) will be available through OPN for OPN members to further their knowledge of Oracle Exastack offerings, and qualify their applications for Oracle Exastack Optimized or Oracle Exastack Ready. Oracle Exastack labs will be available to qualifying OPN members at the Gold level or above. Partners are eligible to participate in the Oracle Exastack Ready program immediately, which will help them meet the requirements to attain Oracle Exastack Optimized status in the future. Guidelines for Oracle Exastack Optimized, as well as Oracle Exastack Labs will be available in fall 2011. Supporting Quotes "In order to effectively differentiate their software applications in the marketplace, ISVs need to rapidly deliver new capabilities and performance improvements," said Judson Althoff, Oracle senior vice president of Worldwide Alliances and Channels and Embedded Sales. "With Oracle Exastack, ISVs have the ability to optimize and deploy their applications with a complete, integrated and cloud-ready infrastructure that will help them accelerate innovation, unlock new features and functionality, and deliver superior value to customers." "We view performance as absolutely critical and a key differentiator," said Tom Stock, SVP of Product Management, GoldenSource. "As a leading provider of enterprise data management solutions for securities and investment management firms, with Oracle Exadata Database Machine, we see an opportunity to notably improve data processing performance -- providing high quality 'golden copy' data in a reduced timeframe. Achieving Oracle Exastack Optimized status will be a stamp of approval that our solution will provide the performance and scalability that our customers demand." "As a leading provider of Revenue Intelligence solutions for telecommunications, media and entertainment service providers, our customers continually demand more readily accessible, enriched and pre-analyzed information to minimize their financial risks and maximize their margins," said Alon Aginsky, President and CEO of cVidya Networks. "Oracle Exastack enables our solutions to deliver the power, infrastructure, and innovation required to transform our customers' business operations and stay ahead of the game." Supporting Resources Oracle PartnerNetwork (OPN) Oracle Exastack Oracle Exastack Datasheet Judson Althoff blog Connect with the Oracle Partner community at OPN on Facebook, OPN on LinkedIn, OPN on YouTube, or OPN on Twitter

    Read the article

  • Running your SSMS client as a domain user even if you&rsquo;re not in a domain

    - by Luca Zavarella
    I wonder if it is possible to use the SQL Server Management Studio (SSMS) client on my machine with a specific domain user when my machine wasn’t in that domain. In fact, many developers use some SSMS add-ons installed on their machine (with appropriate licenses), which greatly simplify their daily work. For example, I’m a Red Gate SQL Prompt addicted , so it’d be convenient for me to work on customers’ SQL Server instances with this tool. After reading Davide Mauri’s post, a friend and collegue of mine, I created a batch file in order to specify a domain and a user for SSMS: @echo off echo *************************************** echo *** Run SSMS 2008 R2 as domain user *** echo *************************************** echo. set /P user="Type the domain\username: " C:\Windows\System32\runas.exe /netonly /user:%user% "C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\Ssms.exe" Then, you can create on your desktop a shortcut to the file batch previously developed and you can also change the shortcut icon, using the same SSMS icon (get it from the Ssms.exe file). Now if you double-click on the shortcut, you can set domain and user for the SSMS client on-the-fly: So enjoy using your “personal” SSMS client on your preferred domain

    Read the article

  • Will Ubuntu break my RAID 0 array?

    - by Chad
    I am upgrading an older machine today with new Motherboard, RAM, and CPU. Then I am going to do a fresh install of Ubuntu 64bit. Currently the old machine has an 80gb system drive, and a 4TB RAID 0 array. The old Motherboard has no SATA ports, so I used a SATA card. Ubuntu set up the old RAID array, will it still recognize the array on a newer machine? Are there any steps I should take to ensure the array isn't damaged? It's non-crucial data, but I would rather not start over if it can be avoided. Thanks.

    Read the article

  • New Whitepaper: Oracle E-Business Suite on Exadata

    - by Steven Chan
    Our Maximum Availability Architecture (MAA) team has quietly been amassing a formidable set of whitepapers about the Oracle Exadata Database Machine.  They're available here:MAA Best Practices - Exadata Database MachineIf you're one of the lucky ones with access to this hardware platform, you'll be pleased to hear that the MAA team has just published a new whitepaper with best practices for EBS environments:Oracle E-Business Suite on ExadataThis whitepaper covers the following topics:Getting to Exadata -- a high level overview of fresh installation on, and migration to, Exadata Database Machine with pointers to more detailed documentation High Availability and Disaster Recovery -- an overview of our MAA best practices with pointers to our detailed MAA Best Practices documentation Performance and Scalability -- best practices for running Oracle E-Business Suite on Exadata Database Machine based on our internal testing

    Read the article

  • IFS Achieves Oracle Exadata Optimized and Oracle Exalogic Optimized Status

    - by Javier Puerta
    IFS, the global enterprise applications company, announces that it has earned Oracle Exadata Optimized and Oracle Exalogic Optimized status through Oracle PartnerNetwork (OPN), demonstrating that IFS Applications Release 8 has been tested and tuned on Oracle Exadata Database Machine and Oracle Exalogic Elastic Cloud to deliver speed, scalability and reliability to customers. By combining IFS Applications with the Oracle Exadata Database Machine and Oracle Exalogic Elastic Cloud, IFS customers will be able to leverage benefits such as faster time to implementation, increased performance, as well as reduced energy and hardware footprint. IFS is a Platinum level member in Oracle PartnerNetwork. Initial test results showed that IFS Applications Release 8 material resource planning (MRP) batch jobs achieved a 2.5x performance improvement and a 2.2x increase in user transactions on Oracle Exadata Database Machine and Oracle Exalogic Elastic Cloud. Additionally, IFS Applications 8 achieved a 37x higher compression ratio, resulting in significantly shorter time for daily backup routines and lowering storage costs. Read full press release here

    Read the article

  • IBM DB2 and the “'DbProviderFactories' section can only appear once per config” error

    - by Davide Mauri
    IBM doesn’t like MS. That’s a fact. And that’s why you can get your machine.config file (!!!) corrupted if you try to install IBM DB2 data providers on your server machine. If at some point, after having installed IBM DB2 data providers your SSIS packages or SSAS cubes or SSRS Reports starts to complain that 'DbProviderFactories' section can only appear once per config you may want to check into you machine.config, located in the %runtime install path%\Config http://msdn.microsoft.com/en-us/library/ms229697%28v=vs.71%29.aspx Almost surely you’ll find a IBM DB2 Provider into an additional DbProviderFactories section all alone. Poor guy. Remove the double DBProviderFactories entry, and merge everything inside only one section DBProviderFactories and after that everything will start to work again.

    Read the article

  • How to Install KVM and Create Virtual Machines on Ubuntu

    - by Chris Hoffman
    If you’re using Linux, you don’t need VirtualBox or VMware to create virtual machines. You can use KVM – the kernel-based virtual machine – to run both Windows and Linux in virtual machines. You can use KVM directly or with other command-line tools, but the graphical Virtual Machine Manager (Virt-Manager) application will feel most familiar to people that have used other virtual machine programs. How to Banish Duplicate Photos with VisiPic How to Make Your Laptop Choose a Wired Connection Instead of Wireless HTG Explains: What Is Two-Factor Authentication and Should I Be Using It?

    Read the article

  • keybinding issues with xmodmap across synergy

    - by Rick
    I've got two systems I use across Synergy. On the main one I have a normal keyboard that I swap caps lock and ctrl for. So I do: xmodmap -e 'keycode 66 = Control_L' xmodmap -e 'clear lock' xmodmap -e 'add Control = Control_L' Where keycode 66 is my caps lock key. The trouble is that I can't get this key to act as a control key on the other machine I connect to with synergy. The strange thing is that if I plug a keyboard into the machine, and run xev, the control key there is keycode 37. When I then hit my modified control key (keycode 66 on the master) it's registering as keycode 37 on the remote machine. So according to xev, it should be picking it up as a control keypress. Anyone have any hints on if Synergy is doing something overly helpful for me?

    Read the article

< Previous Page | 117 118 119 120 121 122 123 124 125 126 127 128  | Next Page >