Search Results

Search found 13937 results on 558 pages for 'save money'.

Page 296/558 | < Previous Page | 292 293 294 295 296 297 298 299 300 301 302 303  | Next Page >

  • Can Anything be Done to Make Improv (a 1993 Win 3.1 App) handle larger Files?

    - by user75185
    My very favorite spradsheet is Improv, a 1993 Windows 3.1 application. It still puts Excel to shame for building spreadsheets and writing formulas. The only problem is because Improv was written when 1 Meg of RAM was state of the art, it becomes unstable when working with larger spreadsheets and often crashes and/or corrupts the data file. I am working on a project that greatly exceeds Improv's limits. Although it will ultimately require more robust databasing capability, I could save a lot of critical time if I could delay that headache and continue working in Improv for now. To that end, I moved to the only product I could find that comes close, Quantrix, which is nothing more than Improv updated to handle large spreadsheets and utilize today's technologies. The problems with Quantrix are its speed (significantly slower than Improv) and its $1000 price (which I cannot afford). I have already had 3 15 day extensions after the initial 30 day trial, so my time to use Quantrix as a bridge is at its end. Searches for Improv over the years have gotten me nowhere and, not surprisingly after reading some posts on this site, I got nothing for the money and time invested to find a programmer to write code to "fix" this problem. Improv is freely available as "abandonware" at http://vetusware.com/download/LotusImprov2.1/?id=5797 , and the best background info can be found on Wikipedia and at "Moose's Greatest Software Products of All Time - Lotus Improv" http://moosevalley.fhost.com.au/mooses_review_page_lotus_improv.html It is critically urgent for me to focus on analyzing the data asap. Working in a stable Improv would, without question, be the fastest route. To that end, I am looking for answers to the following questions and anything else that might be helpful: 1) Is it lawful to hire someone to fix Improv for my own use? If so, 2) About how much should it cost? 3) About how long should it take? 4) What skills should I be looking for &/or how should a post be worded? 5) Is there a niche site where it should it be posted? 6) What questions can I ask to quickly screen candidates? Since I am not a programmer, I need questions the answers to which leave no room to confuse me, whether intentional or not. For example, what tools or players should someone with an acceptable competency level have knowledge of?

    Read the article

  • Cloud Computing Business Benefits

    - by workflowman
    If you have been living under a rock for the past year, you wouldn't have heard about cloud computing. Cloud computing is a loose term that describes anything that is hosted in data centers and accessed via the internet. It is normally associated with developers who draw clouds in diagrams indicating where services or how systems communicate with each other. Cloud computing also incorporates such well-known trends as Web 2.0 and Software as a Service (SaaS) and more recently Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). Its aim is to change the way we compute, moving from traditional desktop and on-premises servers to services and resources that are hosted in the cloud.  Benefits of Cloud Computing  There are clearly benefits in building applications using cloud computing, some of which are listed here:  Zero up- front investment:  Delivering a large-scale system costs a fortune in both time and money. Often IT departments are split into hardware/network and software services. The hardware team provisions servers and so forth under the requirements of the software team. Often the hardware team has a different budget that requires approval. Although hardware and software management are two separate disciplines, sometimes what happens is developers are given the task to estimate CPU cycles, disk space, and so forth, which ends up in underutilized servers.  Usage-based costing:  You pay for what you use, no more, no less, because you never actually own the server. This is similar to car leasing, where in the long run you get a new car every three years and maintenance is never a worry.  Potential for shrinking the processing time:  If processes are split over multiple machines, parallel processing is performed, which decreases processing time.  More office space:  Walk into most offices, and guaranteed you will find a medium- sized room dedicated to servers.  Efficient resource utilization:  The resource utilization is handed by a centralized cloud administrator who is in charge of deciding exactly the right amount of resources for a system. This takes the task away from local administrators, who have to regularly monitor these servers.  Just-in-time infrastructure:  If your system is a success and needs to scale to meet demand, this can cause further time delays or a slow- performing service. Cloud computing solves this because you can add more resources at any time.  Lower environmental impact:  If servers are centralized, potentially an environment initiative is more likely to succeed. As an example, if servers are placed in sunny or windy parts of the world, then why not use these resources to power those servers?  Lower costs:  Unfortunately, this is one point that administrators will not like. If you have people administrating your e-mail server and network along with support staff doing other cloud-based tasks, this workforce can be reduced. This saves costs, though it also reduces jobs.

    Read the article

  • Response: Agile's Second Chasm

    - by Malcolm Anderson
    William Pietri over at Agile Focus has written an interesting article entitled, "Agile’s Second Chasm (and how we fell in)" in which he talks about how agile development has fallen into a common trap where large companies are now spending a lot of money hiring agile (Scrum) consultants just so that they can say they are agile, but all the while avoiding any change that is required by Scrum.   It echoes the questions that I've been asking for a while, "Can a fortune 500 company actually do agile development?"  I'm starting to think that the answer is "usually not"   William ask 3 questions at the end of his article that I will answer here.   1) Have I seen agile development brought in and then preemptively customized (read: made into ScrummerFall)?   Yes, Scrum is hard and disruptive.  It's a spotlight on company dysfunction.  In a low trust environment like most fortune 500 companies Scrum will be subverted by anyone who has ever seen "transparency" translate into someone being laid off.   2) If I had to do it all over again, would I change anything?  No, this is a natural progression, but the agile principles are powerful enough, that the companies that don't adopt them will no longer be competitive and will start to fail.   3) Is this situation solvable?  I think it is.  I think that one of the issues is that you often see companies implementing Scrum, but avoiding the agile engineering practices.  I believe that you cannot do one without the other.  Scrum keeps the ship sailing in smooth deep waters.  The agile engineering practices keep the engine running smoothly and cleanly.  If you implement agile engineering practices without Scrum, you run the risk of ending up with a great running piece of software that is useful to no one.  On the other hand, implementing cargo-cult Scrum without the agile engineering practices and you end up (especially in a fortune 500 company) being steered in the right direction, but with your development practices coming to a dead halt because you have code that can not keep up with the changes in requirements.   If you are trying to do Scrum, make sure that you hire some agile engineering coaches, or else you may find your deveolpment engines grinding to a dead halt in the middle of the open ocean.

    Read the article

  • Windows Phone 7, taking the mobile out of mobile

    - by markritchie
    I’ll start off this blog entry with a little background which is necessary for you to understand my situation. I moved from to the UK to Germany about three years ago and while I can converse in German, English is still very much my first language and the language I want my content delivered in. I happily purchased a HTC HD7 when WP7 was released here in Germany thinking foolishly that Microsoft really would get internationalization and localization as it’s at the heart of their business and developer products. Overall I’m very happy with my WP7, the camera sucks but that’s more HTC’s fault that Microsoft, but other than that it’s a very promising platform. My problems started when I purchased my first app. Initially everything appeared to be fine and things were as smooth as things had been with my previous free applications, however about a month after I received an email from Zune informing me that the credit card that they had registered against my account had expired. No problem I foolishly thought; I’ll simply add a new one. I don’t want to bore you with the details of what I’ve been through other than to take the low-lights: trying numerous websites, posts on Microsoft Answers and a tweet to Microsoft Support all to no avail. Today somebody suggested that I call the Xbox support line in the UK as they had solved their billing problems this way. So I called up the Xbox support line since I hadn’t resolved the problem using the Zune portal to resolve my problem with my WP7 (anybody else thinks Microsoft might need some consolidation here). After being on the phone with a very friendly representative in Ireland I was informed that because my British credit card has a billing address in Germany they are unable to accept it as a credit card. Because my Live ID and Zune Tag are registered in the UK they cannot take my card. Their solution is that I have to create a new Zune Tag that has its locale in Germany and then associate it with my phone. Now, first thing I go to register for a Zune Tag with a German locale it very kindly switches into German for me, nothing quite like reading a EULA in a language you’re not fluent in. I’ve no idea how this will affect the apps that are available to me in the app store, and I’m pretty sure it means that all my Xbox live achievements will become history. And what if I was to move to say France at some point do I have to go through all this again? At the end of the day I’m trying to set something up so I can give them my money and they’re making it VERY difficult for me. Could you imagine walking into a book store when you were on holiday and being told by the clerk that they couldn’t take your credit card because you came from another country?

    Read the article

  • Usage of Pirated software at a company

    - by hakesh
    Hi, I am redirected from stack over flow since the topic is ethics rather than about programming. Thank you for letting me know that. and please give me advices. So, I started to work at a company as an engineer a couple of months ago. It's a small company and what they basically do is answering service on phones. Now they are switching from normal phones to IP phones so that computers take more important place in the work. However, all the computers used by workers are equipped with pirated software ,even their operating systems are. Moreover, they don't even buy one license to make copies for other computers. In other words, they do not spend any money for the software in office. I am not saying copying a licensed one is legit, but the situation is too much. There is one guy who did installing those pirated soft. He does not feel any sense of guilt and even justified when I asked about it. and he is not even a specialist. He just searched on the internet to install pirated software. Our boss does not have any knowledge of computers, so he took cheaper way. How do you guys think about this? Since I am still new to the company, I am not doing maintenance or managing on those cracked computers. But I have to use those software daily. And later on I will be doing support, help desk kind of staff. I really don't want to take responsibility for operating pirated software. and from an aspect of developer and engineer, pirated software are not able to get legal support and it may work unexpectedly. So, I am thinking about changing job. Am I thinking too much? should I wait until I have more credit from the boss and try to change his policy? So far, the boss does not take any words from me. Any opinions are welcome. Thank you

    Read the article

  • What's a good way to get an IT internship? [closed]

    - by user1419715
    I'm a second year CS student who's worked really hard to build and expand my skills. I've spent the past week now trying to find a place to volunteer (i.e. work for FREE) so I can get a little bit of in-the-door experience with web development. I have a portfolio with several decent projects, a handful of languages and other hard/soft skills that employers constantly say they're clamoring for. I can't even get people to take my calls. This is me offering to work for them for FREE, remember. I'm in a reputable program at a respected school, get decent grades and...yeah, I've worked really hard to be presentable. On the rare occassions I actually get to speak to somebody at a design firm they hedge and do everything they can to get me off the phone. Nobody's ever expressed even the slightest interest in taking me on. The answer to the experience problem is supposed to be "you need to spend a year or two building up a big portfolio of projects on your own" so that employers will be impressed. I've done that. Websites, standalone apps, etc.. Nobody will even look at my resume, though. Question: Why does there seem to be so little interest in taking on upaid interns in the world of IT? Update: Sorry you all think I'm too aggressive or angry. It wasn't my intent to be a jerk to people while asking them for their opinions. That said, how would you feel if employer after employer turned you down cold when you offered yourself to them without asking for remuneration? One can't even get an unpaid job in this economy now, it seems. How am I going about my search? I find web firms in my area and contact them via email with a brief sales pitch of myself and a resume attached. Then a couple of days later I follow up with a phone contact. Nobody--anywhere--is advertising for interns of any kind. If there were I'm sure there'd be about 500 resumes per position, even unpaid. I've had good experiences in the past with cold-calling firms for actual paid jobs in other industries (hiring is a pain in the ass process and a call like this can show initiative while reducing a busy employer's need to do all the hiring overhead work), so I thought volunteering would work at least as well. My skills are pretty good for a CS student and include the usual suspects: HTML/CSS/Javascript, Python, Java, C, C#/.Net etc etc. I made a point on my resume to tie each ability claim to a project as well. Oh, and regarding the "working for free still costs the employer money" argument: that's an excellent point I hadn't though of. But it means...what? I have to pay the employer for the privilege of working there now?

    Read the article

  • Is it smart to take a year off from school to get experience?

    - by user134147
    firstly I apologize if this question is not appropriate for the site, but I've seen other similar (though slightly deviant) questions on this sight before and I know the people here are the most qualified to answer my question. Anyways, I'm currently between my sophomore and junior years at a 4 year university, and after a bit of deliberation I've decided on computer science as a major (BA, by the way, as a BS would require me to stay at least an extra year the way our program is set up). I've been interested now in programming for a few months and I've developed a passion for it in a very short time. I began learning C++, migrating to Java recently when I learned my school focuses on this language. Now, I should mention that the concept of higher education has never sat well with me, so part of my motivation for wanting to take time off is to truly challenge myself and see what I can accomplish when I actually try at something. The autodidact in me finds it difficult to focus on my passions while trying to keep a high GPA in unrelated classes. However, I understand the times we live in and therefore would plan to complete my degree after this year. So my question is whether or not the skills I learn in a year off from college could justify the time off from school. Unfortunately, I don't believe I know enough yet to gain any professional experience (internship, etc.) so I would mostly focus my time on learning Java and another language, possibly Wordpress (to gain an understanding of web programming concepts as I have not yet decided what field I want to get into, and to make some money to fund my off-year), and to delve into security concepts, which also interest me. I'm hoping I could work on projects, such as simple applications or contributions to open source software during this time to enhance my resume once I do finish school, so I can find a job out of college easier. I do not want to be the new hire who knows nothing beyond the concepts of his Java textbooks. Does anyone have any input about these thoughts of mine, or any ideas for where I should focus my studies or how high I might set the bar for my work? Thanks a lot everyone!

    Read the article

  • Developing a cloud based app

    - by user134897
    I am a company owner that has developed a cloud based app. My code writer has told me how good he is more than once, well, better stated, he did a good job telling me he was better than everyone else in my rather small community. In the last 18 months I have spent nearly 160,000.00 dollars trying to get this company to the "making money" stage. I am now nearly broke, sitting on the edge of a brilliant marketing plan to launch a much needed cloud based app. We did launch our app last year (late 2013), and the feedback was amazing from the users. One user that signed up to use the free app stated that we needed to call him the moment our company goes public because he wants to be the first to buy stock. Now, here's my problem. We did not originally set out to develop a freemium app, we just sort of ended up there by the natural progression of the app. So, now I have an app that really needs to be scrapped and re-built. Although I do feel my code writer has displayed some brilliance in what he has done, he was extremely weak on graphics and every time we speak he tells me there is a newer better way to code that he is trying to learn. So, here's the million dollar question. Ho do I find code writers that already know the newest, best ways to write code? Or maybe better asked, what is the newest best code writing technique? Second, is it even possible to find code writers that are good at graphics? In short, I am nearly broke and need to start over, but I do not know where to find people qualified to write it good the first time around and display good graphic skills. I am trying to build a team of writers instead of just one person. Maybe 3 good at code and two good at graphics, but I am clueless as to what criteria I should use to determine if I am building the right team members. Please help, I am sure you can tell I am fairly lost by my continued rambling.

    Read the article

  • Router for creating site to site VPN to server provider using Cisco ASA 5540

    - by fondie
    We have dedicated servers hosted for us by a third party, we connect to these over a VPN. My server provider uses Cisco ASA 5540 as VPN devices. Currently we're using software clients on individual machines to connect to this VPN, either: Cisco VPN Client Shrew Soft VPN Connect However, I'm looking to purchase a new load balancing router for our office and thought this could be an opportunity to get VPN client duties taken over by hardware. We could then create a permanent VPN tunnel that could be used by anyone on the network with no software client necessary. Sadly I'm not the most knowledgeable on this kind of stuff so is: 1) This a realizable goal? Next I need to know what kind of hardware I will need. I'm not looking to spend lots of money on this (~$500), so doubtful I can afford any Cisco kit. Therefore, this is the most promising candidate I've seen (as far as my limited knowledge goes): Draytek Vigor 2955 - http://www.draytek.co.uk/products/vigor2955.html 2) Would this be compatible with the Cisco kit my server provider uses? 3) If not, are there any alternatives I should consider? Many thanks in advance.

    Read the article

  • pfSense Firewall or Linsys/Cisco router for small offices

    - by Tim Meers
    I'm about to start switching some networks around for multiple small offices. Each office has about 10 to 15 users and 10 to 15 computers. Each office has a spread of generic routers and access points. The routers vary from being used as routers, to just being an access point for wireless. Nothing formal has really ever beem implemented for each of the 10 offices. What I'm wanting is to set up a pfSense box for each office to configure things like: traffic shaping (for VoIP QOS) URL Filtering DHCP static routing multiple VLANs I'll then use some of the existing hardware for wireless. Maybe even integrate the wireless right into the firewall depending on the office layout. So my question, would this be better to do a full blown firewall box, or but a new business class or high end consumer class Linksys router to do the URL filtering, QOS and DHPC? Each option could allow for remote access and VPN for remote maintnance and each would only cost a nominal about of money for something decent, i.e. under $250.

    Read the article

  • Is there a way to tell what the download speed is from a site/server

    - by Memor-X
    i'm looking into which ISP i should go with at the place i'm moving into, one ISP which i have been told good things about has data limits (which when breached will drop your speed to dial-up speed) but multiple memberships which, apart from the cheapest membership, have the same data limits (the cheapest has a 10GB data limit) in their fine print, they say that each different membership has different port speeds, one particular part jumps out at me These speeds are the NBN (National Broadband Network) port speed and not the actual Internet data speed which will vary based on numerous factors including destination you are reaching, your network equipment, network congestion etc. i plan to use the net to download DLC and patch updates for games (particular the insanely large update for the Wii U) and games from Steam (if i find any good one other than this one JRPG) and downloading development resources from free sites like Deposit Files and Mediafire since one membership with a 1000GB data limit is $145 with the port speed being 12Mbps/1Mbps (cheapest) while another with the same limit is $190 with the port speed of 100Mbps/40Mbps (expensive) i am wondering how i can tell what the speed coming from site is since i don't want to be wasting money on speed that makes no difference (unlike memory which i rather have to spare) NOTE: the speeds are for a fiber optic network which where my new place is can only connect via fixed wireless which i may not be able to get with this ISP but if i can get this network then good NOTE 2: most of the resources i get from Deposit Files are always about 200 MB or less, if a resource pack is greater then it's split into multiple archives (like .7z.part) while Mediafire i have to see one bigger than 150MB NOTE 3: one update patch for a PS3 game is close to 4 GB (Disgaea 4) which i need to get access to the DLC and on the weekend i downloaded 5 GB for the Final Fantasy XIV Open Beta for the PS3 which took almost 5 hours

    Read the article

  • Install Linux Mint on PC without bootable CD

    - by crosenblum
    Unfortunately my PC's CD drive is not bootable; I have such a mixture of SATA and IDE drives, so until I have more money to redo my controller setup, I can't boot from any cd. Currently, I have a DVD burned with latest version of Linux Mint, and I have an USB drive with an old version of Mint. I have a partition ready to install Linx Mint into, but no idea how to install it, since I can only boot to my hard drive. I am totally unable to boot to CD, so that is definitely out. My main partition is WinXP Pro SP3. Is there software I can use to format my Linux partition, so that I can then just copy Mint over to that partition? Or is there a better way to install linux mint? I have to do it within Windows XP, since that's all that I can boot right now. I have considered Mint4Win, but that doesn't allow a full installation of Linux Mint. Any ideas?

    Read the article

  • Htpc aka "Media Center": cheap and *silent*?

    - by Unknown
    It may be me, or the place I live (Italy), but it seems pretty hard to get a build or a prebuilt nettop or a laptop that fits the need. I need something silent able to playback all h.264 fullhd content without stuttering, and well (and not loosing the hw acceleration because of softsubs...) silent not ugly silent and (possibly) cheap. I'm going the linux route, therefore i'm moving towards a cpu-based or nvida-integrated solution (i don't think ati hw accelerated playback - or the intel "hd" acceleration - is useable yet). Ion nettop; it's either the Acer Revo (but here it's incredibly pricey and it's hard to find the dualcore version) or the Asrock Ion 330, that in the current version is rated "silent" at 26Db. 26. Sounds pretty noisy to me!!! the previous version was even worse. was this product really aimed at htpc market?? the Dell Zino - i think it's ATI based unfortunately. Laptop: correct me if I'm wrong: sub 600€/$ units are quite loud under full load (because of the tiny fans). ULW laptops are indeed quite similar: tiniest fans = high pitched noise and the cpu still lacks power for non hd-accelerated video decoding Handmade build: little money can be saved with underpowered cpus, a low-midrange cpu would help in the case of non-hw-accelerated content the cases are quite pricey the PSU one has to get ranges between 100/150 €/$ minimum to keep the noise down a low-mid build, all included, sums up to over 650 €/$ for a still-looking-ugly-unit, without the blu-ray drive. Please help. What do you advise on this? ;) Am I ignoring laptops too much, maybe? Are low-priced Acers that noisy/high pitched under full load?

    Read the article

  • Switch Text Paragraphs in OpenOfficeOrg Writer mailmerge

    - by Glen S. Dalton
    I am using mailmerge to write the same letter with minor differenes to many peolpe. I experienced that switching text paragraphs depending on database values was not easy for me. I ended up putting huge text paragraphs into the database becaus switching did not really work for me. Actually I dont' understand how writer does it and maybe the boolean evaluation is buggy? There is some possibility making paragraphs invisible depending on database fields, but it was frustrating. After marking a paragraph as invisible (depending on a condition) it went invisible in the main document and did not come back, I lost the content. An example in pseudocode of what I want in my mailmerge document: {if [[balance]] 10} We owe you money. Please can you send your bank details. {end if} {if [[balance]] < -10} Please transfer the remaining amount to our banc account 123... {end if} Maybe this could be done with makros? But how to combine makros with mailmerge? Can you tell me what are the pitfalls and how to master them? I once did this with ms word, it was a lot easier. The normal mailmerge (including database fields in the letters) works fine for me in OpenOffice writer.

    Read the article

  • I cannot format my PC

    - by Jesus Buelna
    I have a Toshiba Satellite(1) l505 6gb RAM, 6.00GB hard disk.Initially I have problem with another satellite(2) I had (mother board problem). I took my Laptop to a technician and cost a lot of money (almost as much as buying new one). So, since I have HDD problems with the first one(1) I decided to use the hard disk of the other one(2). I formatted the HDD and erased the partitions it had into 1 partition (or no partition). The problem is that when I try to format with the SO CD, in the screen, where I have to decide in which partition I want to install the SO, the only one option I have says "unallocated partition and I receive this message "Windows cannot install the SO in this partition, run files do not existed or maybe corrupted" When I erased the disk with Parted Magic, Did I erased any files needed for running the installing disk? I don't know. Is it possible to fixed or reinstate the disk to install the OS? By the way, I checked the disk physical health with Parted Magic, and it is OK. One more thing when I erased the disc to 0, I used the safety option offered by the Parted Magic.Need help please.

    Read the article

  • .com domain transfer failing

    - by digital
    Hi, I'm trying to transfer one of my .com addresses between registrars. I'm down as the owner contact (confirmed working) and the losing registrar is down as the tech and admin contact. Last week I received an email stating that the domain transfer had been rejected by the losing registrar. I contacted the losing registrar and they denied that. My money from the winning registrar was refunded and I was told to try again. I've initiated the transfer again and received confirmation of pending transfer, I gave the correct EPP code and confirmed the transfer. Currently the status on the domain is set as OK, should it not be transfer pending? According to my name.com transfer page if the transfer is not authd in 5 days it will auto transfer anyway. I don't believe this will happen. Name.com have been really helpful but they can't really do much more now. The losing registrar is not being helpful hence me turning here. What can I do to make sure the domain transfers? The domain transfer is set to expire on the 17th. Any help would be greatly appreciated.

    Read the article

  • Recommendations for secure business collaboration tools

    - by Michael Prescott
    I'm searching for a secure and easy way for business partners to collaboratively edit and exchange documents, share calendars, create schedules, and assign tasks. I speculate that the ideal collaboration environment or work-flow would actually involve several technologies and services. My co-workers and I have tried a variety of things from Google Apps to Wiki's, but nothing feels very fluid or complete. I suppose defining what we need and our constraints is probably in order: collaboratively edit basic text documents and spreadsheets exchange documents like flow-charts, graphs, and files generated by our other desktop applications, but not source code assign tasks to each other and ourselves and track the history of those tasks easily see when relevant documents have been modified since last viewing and ability to easily push notifications to relevant workers (a clean front page that shows updates would probably suffice) provide limited access to contract workers and guests users if a remote user system is compromised (keystroke logger or other spyware) we don't want the criminal to be able to gain access to all business documents (processes, trade-secrets, customer lists, etc.) simply because they gained access to a single Google account (or whatever web service) Cannot be a difficult to administer VPN infrastructure Cannot cost more than $100 per month (yeah, money is tight) Needs to support up to 25 users We can host our own web applications, but it must be low maintenance solution

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Does the Intel DX79TO motherboard support x8 devices (SAS HBAs) on PCIe x16 slots?

    - by Zac B
    Context: I have an Intel DX79TO motherboard and a Sun SAS3081E-S/LSI 1060E-S HBA card with a PCIe x8 interface. I plug the HBA into my mobo next to my graphics card, and the HBA power lights illuminate, but the BIOS and OSes (tried Linux, ESXi, Win7) don't see the HBA at all. Question: Does the DX79TO motherboard support non-x16/non-GPU devices in its PCIe x16 slots? According to this question, some consumer motherboards don't support this, but I can't figure out whether or not this motherboard/family does. The answer will affect whether I buy a new motherboard or RMA the SAS card, with money attached to each course, so I figured I'd ask here first. What I've Tried: I've read the spec/manuals for the motherboard and the HBA, and I didn't see anything regarding whether or not the x16 slots were back-compatible to lower lane widths/non graphics-card devices, or whether or not the card could run in wider slots than x8. I've tried contacting Intel, but that was over a month ago and I haven't yet heard anything back except an automated "we got your email!" message.

    Read the article

  • Performance of Virtual machines on very low end machines

    - by TheLQ
    I am managing a few cheap servers as my user base isn't large enough to get much more powerful servers. I also don't have the money lying around to invest in a server to prepare for the larger user base. So I'm stuck with the old hardware I have. I am toying with the idea of virtualizing all the current OS's with most likely VMware vSphere Hypervisor (AKA ESXi) Xen (ESXi has too strict of an HCL, and my hardware is too old). Big reasons for doing so: Ability to upgrade and scale hardware rapidly - This is most likely what I'll be doing as I distribute services, get a bigger server, centralize (electricity bills are horrible), distribute, get a bigger server, etc... Manually doing this by reinstalling the entire OS would be a big pain Safety from me - I've made many rookie mistakes, like doing lots of risky work on a vital production server. With a VM I can just backup the state, work on my machine, test, and revert if necessary. No worries, and no OS reinstallation Safety from other factors - As I scale servers might go down, and a backup VM can instantly be started. Various other reasons. However the limiting factor here is hardware. And I mean very depressing hardware. The current server's run off of a Pentium 3 and 4, and have 512 MB and 768 MB RAM respectively (RAM can be upgraded soon however). Is the Virtualization layer small enough to run itself and a Linux OS effectively? Will performance be acceptable (50% CPU overhead for every operation isn't acceptable)? Does it leave enough RAM for the Linux OS? Is this even feasible?

    Read the article

  • Troubleshooting wireless connection problem / site survey?

    - by johnnyb10
    I just started in the IT department of a small company (200 users) and it's clear that one of the main problems that is driving everyone crazy is the spotty nature of the wireless connectivity throughout the office, particularly in certain conference rooms. This is a huge problem because the connection often drops during important presentations to clients. I was hired to help ease the load on the existing IT admin, who has done a great job, but is overloaded with many other tasks to deal with. So I would like to try to help out with this wireless issue. I am looking for advice on the best way to solve this problem--a realistic troubleshooting methodology that does not require me to spend any money. So far, I've experimented with Ekahau Heat Mapper, which is free and helps create a site survey. But I'm not exactly sure what I'm looking for or if there are other programs/tools/methods I should try as well. Any advice would be greatly appreciated. [Some background: The wireless setup consists of an HP ProCurve Mobility MSM (710?) controller that controls 10 access points throughout the building. There are three virtual wireless networks configured on the controller: one seems to be a default that cannot be changed, one is for internal employees and authenticates via Active Directory, and the third is a guest network for visitors. When I use HeatMapper, these show up as three different SSIDs, with different MAC addresses, all on the same channel. At first I thought maybe this would cause interference, but this seems to be the way the controller works;apparently, it automatically configures the channels to avoid interference from the other APs on the network.]

    Read the article

  • Alternatives to Splunk?

    - by MichaelGG
    I'm pretty impressed with Splunk, especially version 4. Pretty graphs, alerting (Enterprise only), and fast, accurate, searching. It's a great product. However, the cost just way too high to consider for full production use for our company. All we really need is to be able to index different logs in a central place, and have reasonable searching on that. Having alerts based on a saved search is also really nice. We don't really go beyond that. In fact, our biggest usage has been in deploying new applications. Everything gets logged via log4net to either the Event log on Windows or a text file on Linux. Splunk makes it pretty easy to quickly search across those to make sure all the parts of the app are working ok -- that's saved us tons of time versus hunting down individual logging sources. What alternatives exist in this market? I have a sinking feeling Splunk's pricing is so high because they have the best product by far, and they know it. We want the server to run on Windows. I'd be open to a split model, using one product for general logs (collect via syslog/Snare), and a dedicated product for our custom apps (like Log4Net Dashboard). Would using a simple syslog server such as Kiwi, sent to SQL Server (perhaps with fulltext enabled) work? I'd hope the cost should be well under 5 figures, USD. (And yes, I know, we're cheap. We're a startup with little money, and BizSpark takes care of all our MS licensing.) Edit: I should add, we have about 10 physical servers, 20 VMs, and a couple firewalls and switches. 90% is Windows.

    Read the article

  • How do I resolve BSOD: PAGE_FAULT_IN_NONPAGED_AREA?

    - by Burnzy
    I have been trouble shooting this for a few days and cannot fix this anyhow. Computer specifications Mobo: ASUS Sabertooth X58 LGA 1366 Intel X58 SATA 6Gb/s USB 3.0 ATX Intel Motherboard CPU: Intel(R) Core(TM) i7 CPU 920 (Bloomfield) @ 2.67 ( no OC ) RAM: 6144MB RAM GPU: 2x NVIDIA GeForce GTS 250 1Go in SLI (sli is not enabled anyway at the moment anyway) Drives: OCZ RevoDrive OCZSSDPX-1RVD0120 PCI-E x4 120GB PCI Express MLC Internal SSD [RAID-0]. (I know this could potentilly cause trouble but I had the BSOD before using this drive) Seagate Barracuda 7200.11 ST31500341AS 1.5TB 7200 RPM 32MB Cache SATA 3.0Gb/s 3.5" Internal Hard Drive - Bare Drive Click here for a log of a crash I just had. Click here for a log of a crash I had 30 minutes later, note that it's another driver. Some info Occurence: It seems pretty random so far, haven't noticed any kind of pattern I tried: Windows memory diagnostic (went smoothly at 1066mhz) As I said, it was still happening on my HDD, so when I bought the revodrive I install a new OS on there and still got the error, I believed it happened and I had no drivers installed at that point (not 100% sure) Change the following registry value to 1 (true): HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SessionManager \MemoryManagement\ClearPageFileAtShutdown Tried to lower even more ram clock Made sure ram timing was set to recommended by manufacturer Verified if motherboard was in good physical condition (yes and its brand new) There is one thing to note, when I got the new motherboard, I installed the new drivers WITHOUT formatting and the I removed the motherboard drivers that I could remove from the control panel (pretty much the first things that have been installed). Could this cause an issue even ON THE OTHER drive (revodrive). Hopefully someone can help me, I am getting tired of this, spending so much money and cannot get this to work correctly. If you need any other information let me know, thank you!

    Read the article

  • Good PC-based video conference control tools?

    - by Jeremy Wadhams
    I'm looking to improve the user experience in our video conference rooms, by simplifying the things users do all the time (setting up a call, muting and unmuting, panning and zooming between a few room-appropriate presets) and totally taking away the functions that we don't use or that are more likely to ruin the experience (changing the brightness of the TV, using the VNC presenter mode). The rooms each have Tandberg Edge or MXP video units, I'm not looking at PC-based solutions like Skype or iChat. The classic way to do this would be to plunk down huge money for an AMX or Crestron panel. This does have some advantages, and I am considering that approach, but in my initial investigation, that looks expensive, inflexible, and proprietary. On the other hand, the Tandberg video units I'm looking to control are very thorough XML API (PDF), so some of the integration magic that Crestron and AMX consider to be a value add, I could reproduce at mashup speeds. Anyone aware of an Open Source or Commercial product that takes advantage of the readily available web APIs, some simple touch screen PCs, and builds a product that is more like skinning an AJAX web app, and ideally more cost effective than the proprietary panels?

    Read the article

  • is it worth to use load balancer on web server/website

    - by user427969
    I have a website and a while ago, the web server of the company hosting my website was down for about a day. I consulted the company for a solution on how i can stop this from happening in future and they suggested to have a second machine and which will be connected to my current website/web server by a "load balancer" (at an additional huge cost!!!). The second machine will be replicate of the first one and so if i goes down, the other will always be running. ---- Explanation ----- My hosting company suggested that it will be a good idea to have a second machine running at the same time and both the machines will be connected by a load balancer which reduces the rist of a downtime. The second machine will be a mirror of the first and any changes to first must be replicated in the second. I don't mind spending money if it really saves my website from going down. I want to know is it worth having this "load balancer" for my purpose? My website is a 24/7 service. I cannot afford an outage of 24 hours/1 hour. I don't mind using this "load balancer" as far as it is really worth. I am not sure if its just a marketing trick of my hosting company or really a "best" solution Thanks for help. Regards

    Read the article

< Previous Page | 292 293 294 295 296 297 298 299 300 301 302 303  | Next Page >