Search Results

Search found 3911 results on 157 pages for 'research papers'.

Page 14/157 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21  | Next Page >

  • How Important is Keyword Research to My Marketing Campaign?

    Engaging in a good deal of keyword research will help you ensure that your website obtains the type of attention that you desire. The way that the internet is set up these days, everything is done through keywords, if your website content does not contain relevant keywords then your website may not be able to get the respectable attention that you desire. Keywords are commonly defined as one word or a phrase of words that describes the type of product or service that you are opting to promote.

    Read the article

  • Micro Niche Finder Review - Is it Worth Investing in For Niche Research?

    As you know, keywords are at the heart of SEO (Search Engine Optimization) but as you may also know the general search terms on Google are getting very competitive and hard to rank for so the secret nowadays is to do niche research to try and find "meaty", long-tail, low competition keywords. Can Micro Niche Finder help you do this? Find out in this review.

    Read the article

  • It’s On! Oracle Open World 2012 Opens Call for Papers is Open

    - by David Hope-Ross
    Oracle OpenWorld is among the world’s largest industry events for good reason. It offers a vast array of learning and networking opportunities in one of the planet’s great cities.  And one of the key reasons for its popularity among procurement and supply chain professionals is the prominence of presentations by customers.   If you’d like to deliver a presentation based on your experience, now is the time to submit your abstract for review by the selection panel. The competition is strong: roughly 18% of entries are accepted each year from more than 3,000 submissions. Review panels are made up of experts both internal and external to Oracle. Successful submissions often (but not exclusively) focus on customer successes, how-tos, or best practices. What’s in it for you? Recognition, for one thing. Accepted sessions are publicized in the content catalog, which goes live in mid-June, and sessions given by external speakers often prove the most popular. Plus, accepted speakers get a complimentary pass to Oracle OpenWorld with access to all sessions and networking events- that could save you up to $2,595!   Be sure designate your session for inclusion in the correct track by selecting  “APPLICATIONS: Supply Chain Management” or “APPLICATIONS: Sourcing and Procurement” from the Primary Track drop down menu.   We look forward to seeing you in San Francisco!

    Read the article

  • Call for Papers for both Devoxx UK and France now open!

    - by Yolande
    Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin;} The two conferences are taking place the last week of March 2013 with London on March 26th and 27 and Paris on March 28th and 29th. Oracle fully supports "Devoxx UK" and "Devoxx France" as a European Platinum Partner. Submit proposals and participate in both conferences since they are a two-hour train ride away from one another. The Devoxx conferences are designed “for developers by developers.” The conference committees are looking for speakers who are passionate developers unafraid to share their knowledge of Java, mobile, web and beyond. The sessions are about frameworks, tools and development with in-depth conference sessions, short practical quickies, and bird-of-a-feather discussions. Those different formats allow speakers to choose the best way to present their topics and can be mentioned during the submission process Devoxx has proven its success under Stephan Janssen, organizer of Devoxx in Belgium for the past 11 years. Devoxx has been the biggest Java conference in Europe for many years. To organize those local conferences, Stephan has enrolled the top community leaders in the UK and France. Ben Evans and Martijn Verberg are the leaders of London Java User Group (JUG) and are also known internationally for starting the Adopt-a-JSR program. Antonio Goncalves is the leader of the Paris JUG. He organized last year’s Devoxx France, which was a big success with twice the size first expected. The organizers made sure to add the local character to the conferences. "The community energy has to feel right," said Ben Evans and for that he picked an "old Victoria hall" for the venue. Those leaders are part of very dynamic Java communities in France and in the UK. France has 22 JUGs; the Paris JUG alone has 2,000 members. The UK has over 50,000 developers working in London and its surroundings; a lot of them are Java developers working in the financial industry. The conference fee is kept as low as possible to encourage those developers to attend. Devoxx promises to be crowded and sold out in advance. Make sure to submit your talks to both Devoxx UK and France before January 31st, 2013. 

    Read the article

  • Design Pattern Books, Papers or Resources for Non-Object Orientated Paradigms?

    - by FinnNk
    After viewing this video on InfoQ about functional design patterns I was wondering what resources are out there on design patterns for non-object orientated paradigms. There are plenty out there for the OO world (GOF, etc, etc) and for architecture (EoEAA, etc, etc) but I'm not aware of what's out there for functional, logic, or other programming paradigms. Is there anything? A comment during the video suggests possibly not - does anyone know better? (By the way, by design patterns I don't mean language features or data structures but higher level approaches to designing an application - as discussed in the linked video)

    Read the article

  • Master Note for Generic Data Warehousing

    - by lajos.varady(at)oracle.com
    ++++++++++++++++++++++++++++++++++++++++++++++++++++ The complete and the most recent version of this article can be viewed from My Oracle Support Knowledge Section. Master Note for Generic Data Warehousing [ID 1269175.1] ++++++++++++++++++++++++++++++++++++++++++++++++++++In this Document   Purpose   Master Note for Generic Data Warehousing      Components covered      Oracle Database Data Warehousing specific documents for recent versions      Technology Network Product Homes      Master Notes available in My Oracle Support      White Papers      Technical Presentations Platforms: 1-914CU; This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review. Applies to: Oracle Server - Enterprise Edition - Version: 9.2.0.1 to 11.2.0.2 - Release: 9.2 to 11.2Information in this document applies to any platform. Purpose Provide navigation path Master Note for Generic Data Warehousing Components covered Read Only Materialized ViewsQuery RewriteDatabase Object PartitioningParallel Execution and Parallel QueryDatabase CompressionTransportable TablespacesOracle Online Analytical Processing (OLAP)Oracle Data MiningOracle Database Data Warehousing specific documents for recent versions 11g Release 2 (11.2)11g Release 1 (11.1)10g Release 2 (10.2)10g Release 1 (10.1)9i Release 2 (9.2)9i Release 1 (9.0)Technology Network Product HomesOracle Partitioning Advanced CompressionOracle Data MiningOracle OLAPMaster Notes available in My Oracle SupportThese technical articles have been written by Oracle Support Engineers to provide proactive and top level information and knowledge about the components of thedatabase we handle under the "Database Datawarehousing".Note 1166564.1 Master Note: Transportable Tablespaces (TTS) -- Common Questions and IssuesNote 1087507.1 Master Note for MVIEW 'ORA-' error diagnosis. For Materialized View CREATE or REFRESHNote 1102801.1 Master Note: How to Get a 10046 trace for a Parallel QueryNote 1097154.1 Master Note Parallel Execution Wait Events Note 1107593.1 Master Note for the Oracle OLAP OptionNote 1087643.1 Master Note for Oracle Data MiningNote 1215173.1 Master Note for Query RewriteNote 1223705.1 Master Note for OLTP Compression Note 1269175.1 Master Note for Generic Data WarehousingWhite Papers Transportable Tablespaces white papers Database Upgrade Using Transportable Tablespaces:Oracle Database 11g Release 1 (February 2009) Platform Migration Using Transportable Database Oracle Database 11g and 10g Release 2 (August 2008) Database Upgrade using Transportable Tablespaces: Oracle Database 10g Release 2 (April 2007) Platform Migration using Transportable Tablespaces: Oracle Database 10g Release 2 (April 2007)Parallel Execution and Parallel Query white papers Best Practices for Workload Management of a Data Warehouse on the Sun Oracle Database Machine (June 2010) Effective resource utilization by In-Memory Parallel Execution in Oracle Real Application Clusters 11g Release 2 (Feb 2010) Parallel Execution Fundamentals in Oracle Database 11g Release 2 (November 2009) Parallel Execution with Oracle Database 10g Release 2 (June 2005)Oracle Data Mining white paper Oracle Data Mining 11g Release 2 (March 2010)Partitioning white papers Partitioning with Oracle Database 11g Release 2 (September 2009) Partitioning in Oracle Database 11g (June 2007)Materialized Views and Query Rewrite white papers Oracle Materialized Views  and Query Rewrite (May 2005) Improving Performance using Query Rewrite in Oracle Database 10g (December 2003)Database Compression white papers Advanced Compression with Oracle Database 11g Release 2 (September 2009) Table Compression in Oracle Database 10g Release 2 (May 2005)Oracle OLAP white papers On-line Analytic Processing with Oracle Database 11g Release 2 (September 2009) Using Oracle Business Intelligence Enterprise Edition with the OLAP Option to Oracle Database 11g (July 2008)Generic white papers Enabling Pervasive BI through a Practical Data Warehouse Reference Architecture (February 2010) Optimizing and Protecting Storage with Oracle Database 11g Release 2 (November 2009) Oracle Database 11g for Data Warehousing and Business Intelligence (August 2009) Best practices for a Data Warehouse on Oracle Database 11g (September 2008)Technical PresentationsA selection of ObE - Oracle by Examples documents: Generic Using Basic Database Functionality for Data Warehousing (10g) Partitioning Manipulating Partitions in Oracle Database (11g Release 1) Using High-Speed Data Loading and Rolling Window Operations with Partitioning (11g Release 1) Using Partitioned Outer Join to Fill Gaps in Sparse Data (10g) Materialized View and Query Rewrite Using Materialized Views and Query Rewrite Capabilities (10g) Using the SQLAccess Advisor to Recommend Materialized Views and Indexes (10g) Oracle OLAP Using Microsoft Excel With Oracle 11g Cubes (how to analyze data in Oracle OLAP Cubes using Excel's native capabilities) Using Oracle OLAP 11g With Oracle BI Enterprise Edition (Creating OBIEE Metadata for OLAP 11g Cubes and querying those in BI Answers) Building OLAP 11g Cubes Querying OLAP 11g Cubes Creating Interactive APEX Reports Over OLAP 11g CubesSelection of presentations from the BIWA website:Extreme Data Warehousing With Exadata  by Hermann Baer (July 2010) (slides 2.5MB, recording 54MB)Data Mining Made Easy! Introducing Oracle Data Miner 11g Release 2 New "Work flow" GUI   by Charlie Berger (May 2010) (slides 4.8MB, recording 85MB )Best Practices for Deploying a Data Warehouse on Oracle Database 11g  by Maria Colgan (December 2009)  (slides 3MB, recording 18MB, white paper 3MB )

    Read the article

  • Commémoration du centenaire d'Alan Turing, "il était l'un des premiers à établir ce qu'était un ordinateur" pour le directeur de Microsoft Research

    Commémoration du centenaire d'Alan Turing « il était l'un des premiers à établir ce qu'était un ordinateur » pour le directeur de Microsoft Research Ce samedi 23 juin 2012 marquera le centième anniversaire d'Alan Turing, un pionnier de l'informatique moderne. Considéré comme « le père de l'ordinateur », Alan Turing est un mathématicien britannique né en 1912. Il est connu pour avoir percé le code de l'encodeuse Enigma utilisée par les nazis lors de la Seconde Guerre mondiale pour chiffrer les messages. En 1950, il créa le célèbre « Test de Turing », qu'il formula dans l'article « Computing Machinery and Intelligence », dans lequel il qualifie un ordinateur d'intelligent si celui-ci...

    Read the article

  • What languages are most commonly used in medical research?

    - by Chris Taylor
    For someone about to go into a career in medical research, what language would be the most useful to learn? From my limited experience (I have been a researcher in mathematics and in finance) I have been able to recommend looking at R (for statistics) Matlab (for general numeric processing) and Python (for general purpose programming with statistics/numerics as an add-on) but I don't know which of those (if any) are in common use -- or if there are other, more specialized languages that are used. To be clear, I'm not talking about a professional programmer working in a medical setting. I am talking about a medical or genetics researcher who uses programming to analyse data, or generally to help get their work done.

    Read the article

  • Is it better to concentrate on one or two research projects throughout undergrad?

    - by AruniRC
    Currently in the 4th semester of engineering in an Indian university. The thing is - is it better to do as many short-lived projects/research work on diverse topics of computer science or stick to one/two projects consistently throughout my undergraduate years? Case in point: currently working on an image-processing project that promises to carry on for a year or so (as per the prof). Does this seem like being over-specialized at too early a level? Although taking on too many things will spread me out thin and in all probability not end up getting any meaningful work done. Especially as I hope to apply for grad school in the US. Would really appreciate any views and suggestions on this.

    Read the article

  • MoodScope : l'outil qui détermine l'humeur d'un utilisateur de smartphone, quelle application intéressante voyez-vous à ce projet Microsoft Research ?

    MoodScope : l'outil qui peut determiner l'humeur de l'utilisateur d'un smartphone Quelle application intéressante voyez-vous à ce projet de Microsoft Research ?Google a dévoilé une API qui permet de déterminer si l'utilisateur marche, court ou fait du vélo. Microsoft, lui, travaille sur un outil qui va encore plus loin dans l'analyse du contexte. Un outil qui permet de déterminer l'humeur de l'utilisateur à l'instant T.Baptisé MoodScope, l'outil des Labos R&D de l'éditeur analyse plusieurs facteurs parmi lesquels le degré d'activité sur le téléphone, l'heure de la journée, le jour de la semaine (au bureau, en week-end, etc.) ou les appl...

    Read the article

  • Which topics do I need to research to enable me to complete my self-assigned "Learning Project"?

    - by Anonymous -
    I want to continue learning C#. I've read parts of a few books recommended on here and the language is feeling more familiar by the day. I'd like to tackle a mid-sized personal project to take my expertise to the next level. What I'd like to do, is create an application that 'manages expenses', that runs on multiple machines on a LAN. So for example, say we have person1 and person2 on seperate machines running the application, when person1 enters an expense, it will appear on person2's (pretty UI) view of the expenses database and vice versa. What topics do I need to research to make this possible for me? I plan on learning WPF for the UI (though the steep learning curve (or so I'm told) has me a little anxious about that at this stage. With regards to the database, which database would you recommend I use? I don't want a 'server' for the database to run on, so do I need to use an embedded database that each client machine runs a copy of that updates to each other (upon startup/entering of expense on any machine etc)? What topics under networking should I be looking at? I haven't studied networking before in any language, so do I need to learn about sockets or?

    Read the article

  • How can I best manage making open source code releases from my company's confidential research code?

    - by DeveloperDon
    My company (let's call them Acme Technology) has a library of approximately one thousand source files that originally came from its Acme Labs research group, incubated in a development group for a couple years, and has more recently been provided to a handful of customers under non-disclosure. Acme is getting ready to release perhaps 75% of the code to the open source community. The other 25% would be released later, but for now, is either not ready for customer use or contains code related to future innovations they need to keep out of the hands of competitors. The code is presently formatted with #ifdefs that permit the same code base to work with the pre-production platforms that will be available to university researchers and a much wider range of commercial customers once it goes to open source, while at the same time being available for experimentation and prototyping and forward compatibility testing with the future platform. Keeping a single code base is considered essential for the economics (and sanity) of my group who would have a tough time maintaining two copies in parallel. Files in our current base look something like this: > // Copyright 2012 (C) Acme Technology, All Rights Reserved. > // Very large, often varied and restrictive copyright license in English and French, > // sometimes also embedded in make files and shell scripts with varied > // comment styles. > > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > #ifdef UNDER_RESEARCH > holographicVisualization(on); > #endif > } And we would like to convert them to something like: > // GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved. > // Acme appreciates your interest in its technology, please contact [email protected] > // for technical support, and www.acme.com/emergingTech for updates and RSS feed. > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > } Is there a tool, parse library, or popular script that can replace the copyright and strip out not just #ifdefs, but variations like #if defined(UNDER_RESEARCH), etc.? The code is presently in Git and would likely be hosted somewhere that uses Git. Would there be a way to safely link repositories together so we can efficiently reintegrate our improvements with the open source versions? Advice about other pitfalls is welcome.

    Read the article

  • Why has there been no serious research in statistical programming languages for 25 years?

    - by Robert
    The two main statistical languages today are S (in the form of R) and SAS, which today pretty much have the form they had 25 years ago. Whatever usability problems or worker productivity problems they had then, they still have today. I'm a data language designer, and I look at, largely, four aspects: Usability (learning curve & readability - here Python scores high) Productivity (how long it takes to finish your work) Flexibility (SAS and R don't have problems here, but a macro library will) Reliability (in the QA/reproducibility sense, usually a PL does better than a GUI here) By the way, I have a language that can produce complex statistical tables much faster than SAS (like 25 lines of code instead of several hundred lines of code). And I'm going to produce a language for data cleaning that will be great for usability (it'll be my third).

    Read the article

  • Knowledge and user generated content management system to track files, research, proposals, etc.?

    - by Eshwar
    I'll try keep it short. Here's the scenario: We have employees all over the world performing similar work i.e. research, generating powerpoint slides, word documents, graphics, etc. Many times a lot of this previous work can be reused for another future project. The current arrangement is email and phone calls which as you would agree is quick if you know where to look but otherwise archaic and very very inefficient. So I am looking for software that will allow me to do the following: Tag files e.g. an investor presentation on cellphone usage in kenya would be tagged investor, cellphone, kenya Manage references e.g. if we read something on the internet, should be able to paste that link in some fashion and tag it as above. Preferably cloud based so that it can be accessed by anybody and additionally would be nice (though NOT must) to have access levels (director, manager, everyone) A nice interface that non technically savvy folks can warm up to ;) A desktop app would be handy so that people don't always have to click upload or something A tree based system is inefficient in this case because content is usually linked across branches and also people might not quite agree on one format of a tree. Tagging works around this very nicely. What I have considered so far: Evernote (for its more professional look) Springpad (for its versatility with content) Mendeley (this is a research manager and in some ways ideal, but i fear its limited to PDFs) The goal is that when somebody wants to look for a document, they don't have to ask a colleague, they can just search with keywords and all relevant information shows up. Thanks!

    Read the article

  • Keyword Research - Does it Do Anything For Your Website?

    When it comes to helping a start up online company get off the ground properly and set it on its way to success there are certain actions which can be undertaken and certain services which must be used. The above refer to every step from the web design and web content to the amount of online visibility that specific website gets.

    Read the article

  • How to determine the amount to spend per phrase on Adwords research?

    - by Anonymous -
    My company would like to start a PPC advertising campaign. Whilst I understand the concept and how to set everything up from a technical point of view, this is something I've never done before. Logically, we'd like to test out a wide range of keywords that we think would lead to conversions, which we've put together through brainstorming and with some help from Google's External Keyword Tool. Sub-question whilst I remember - am I correct in thinking that in Google's keyword tool, keywords that we think will perform well that have a low competition yet high monthly searches are good since there will be less advertisers, meaning our bid per click will be less? Is there a common benchmark or process of doing a round of tests with keywords? Should we wait for 100 clicks on each keyword, see which ones have lead to the most sales (or rather, sales that are sustainable with the cost per click of that keyword), then drop the ones which aren't converting and put that budget onto the converting keywords? We realistically have a few hundred keywords/phrases we would like to test, but spending $100 per keyword/phrase is going to work out as quite an expensive test. It would be nice to be able to spend $5-10 per phrase, but I don't think the sample size would be great enough to determine anything usefully reliable. Another approach might be to setup all the keywords, and those that bring the most sales within x hours/days would be the ones we use. What is the common procedure with things like this? I know there are a plethora of companies that specialize in exactly this, but this is something we anticipate doing a lot in the future, so it would make sense to do it in house if at all possible.

    Read the article

  • is it legal/ethical to use source code provided in academic papers, or talks given at trade events l

    - by lucid
    so, is it legal to use source code from papers and such: like this paper on perlin noise: [url]http://mrl.nyu.edu/~perlin/paper445.pdf[/url] links to this source code: [url]http://mrl.nyu.edu/~perlin/noise/[/url] and stam's famous talk on fluid dynamics, includes source code throughout, annotated with instructions like "add these macros to the beginning of your code" [url]http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/GDC03.pdf[/url] I'm just not sure if it's legal to copy and paste this to use in your own commercial code. if I were to make my own implementation, it would end up being close to identical, since I'd probably use the source code as a reference. I know very little about copyright law, including how it applies in these situations, and I can never find usage and licensing terms for these. Nor did googling any terms I could think of provide me the specific answer I need. does anyone know for sure what the rules/laws are here, or where I can find the answer?

    Read the article

  • Which browser is the most secure? (research and practically based)

    - by wag2639
    I was wondering which browser is the most secure today, Firefox, Internet Explorer, Chrome, or Safari on a Windows machine with the user running as a Power User/Administrator account. This is not a question about which browser is the best because its the most usable, but more of a question if asked for security, which browser is the most secure given an everyday user's experience (JavaScript, Flash, Ads, etc). Also, would the choice for most secure change if the user was running as a restricted user? To clarify, I'm looking for an answer that's based in research on potential and common exploits and how long it takes for critical problems to be patched.

    Read the article

  • What area of Software Engineering are you going to focus your research on?

    - by ultrajohn
    hi guys! I have this very subjective question regarding software engineering. Let's say you want to pursue a graduate degree i.e. master degree with a major in software engineering, what particular topic or area of research in the field are your going to pursue? From your experience, what are the different aspects of software engineering which are vital in our field that are "under"(less) research. I know this is very subjective, I just want to elicit ideas from you guys whom I think knows a lot about the field. Thanks a lot.

    Read the article

  • Call for Paper: Oracle OpenWorld 2011

    - by jean-pierre.dijcks
    OpenWorld 2011 is now open for the public to submit session proposals. We would like to encourage our customers, and partners to participate in this ‘call for papers” (CFP) process. CFP for the general public, non-Oracle employee submitters, closes on March 27, 2011. Here are the details: Conference Location: Moscone Convention Center, San Francisco, CA. Conference Date: Sunday - Thursday, October 2 - 6, 2011 Conference Website: http://www.oracle.com/us/openworld CFP Website: https://oracleus.wingateweb.com/portal/cfp/ Paper submission key dates: Deliverables Due Dates Call for Papers Begins Wednesday, March 9 Call for Papers Ends Sunday, March 27 – 11:59 pm PDT Notifications for Accepted and Declined Submissions Sent End of May Questions regarding the Call for Papers, send an email to [email protected]

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

< Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21  | Next Page >