Search Results

Search found 82 results on 4 pages for 'convergence'.

Page 1/4 | 1 2 3 4  | Next Page >

  • What Will Happen to Real Estate Leases when Operating Leases are Gone?

    - by Theresa Hickman
    Many people are concerned about what will happen to real estate leases when FASB and IASB abolish operating leases. They plan to unveil the proposed standards on treating leases this summer as part of the convergence project but no "finalized ruling" is expected for at least a year because it will need to get formal consensus from many players, such as the SEC, American Association of Investors, Congress, the Big Four, American Associate of Realtors, the international equivalents of these, etc. If your accounting is a bit rusty, an Operating Lease is where you lease equipment or some asset for a shorter period than the actual (expected) life of the asset and then give the asset back while it still has some useful life in it. (Think leasing a car). Because an Operating Lease does not contain any of the provisions that would qualify it as a Capital Lease, the lease is not treated as a sale or purchase and hits the lessee's rental expense and the lessor's revenue. So it all stays on the P&L (assuming no prepayments are made). Capital Leases, on the other hand, hit lessee's and lessor's balance sheets because the asset is treated as a sale. (I'm ignoring interest and depreciation here to emphasize my point). Question: What will happen to real estate leases when Operating Leases go away and how will Oracle Financials address these changes? Before I attempt to address these questions, here's a real-life example to expound on some of the issues: Let's say a U.S. retailer leases a store in a mall for 15 years. Under U.S. GAAP, the lease is considered an operating or expense lease. Will that same lease be considered a capital lease under IFRS? Real estate leases are supposedly going to be capitalized under IFRS. If so, will everyone need to change all leases from operating to capital? Or, could we make some adjustments so we report the lease as an expense for operations reporting but capitalize it for SEC reporting? Would all aspects of the lease be capitalized, or would some line items still be expensed? For example, many retail store leases are defined to include (1) the agreed-to rent amount; (2) a negotiated increase in base rent, e.g., maybe a 5% increase in Year 5; (3) a sales rent component whereby the retailer pays a variable additional amount based on the sales generated in the prior month; (4) parking lot maintenance fees. Would the entire lease be capitalized, or would some portions still be expensed? To help answer these questions, I met up with our resident accounting expert and walking encyclopedia, Seamus Moran. Here's what he had to say: Oracle is aware of the potential changes specific to reporting/capitalization of real estate leases; i.e., we are aware that FASB and IASB have identified real estate leases as one of the areas for standards convergence. Oracle stays apprised of the on-going convergence through our domain expertise staff, our relationship with customers, our market awareness, and, of course, our relationships with the Big 4. This is part of our normal process with respect to regulatory compliance worldwide. At this time, Oracle expects that the standards convergence committee will make a recommendation about reporting standards for real estate leases in about a year. Following typical procedures, we also expect that the recommendation will be up for review for a year, and customers will then need to start reporting to the new standard about a year after that. So that means we would expect the first customer to report under the new standard in maybe 3 years. Typically, after the new standard is finalized and distributed, we find that our customers then begin to evaluate how they plan to meet the new standard. And through groups like the Customer Advisory Boards (CABs), our customers tell us what kind of product changes are needed in order to satisfy their new reporting requirements. Of course, Oracle is also working with the Big 4 and Accenture and other implementers in order to ascertain that these recommended changes will indeed meet new reporting standards. So the best advice we can offer right now is, stay apprised of the standards convergence committee; know that Oracle is also staying abreast of developments; get involved with your CAB so your voice is heard; know that Oracle products continue to be GAAP compliant, and we will continue to maintain that as our standard. But exactly what is that "standard"--we need to wait on the standards convergence committee. In a nut shell, operating leases will become either capital leases or month to month rentals, but it is still too early, too political and too uncertain to call out at this point.

    Read the article

  • Setup Convergence Address Book DisplayName Lookup

    - by user13332755
    At Convergence Address Book the default lookup for 'Display Name' is the LDAP attribute 'cn', which leads into confusion if you start to setup 'displayName' LDAP attributes for your users. LDAP User Entry with diaplyName attribute: This behavior can be controlled by the configuration file xlate-inetorgperson.xml. Change the default value of XPATH abperson\entry\displayname from 'cn' to 'displayName'. <convergence_deploy_location>/config/templates/ab/corp-dir/xlate-inetorgperson.xml Note: In case the user has no 'displayName' attribute in LDAP you might noticed '...' at the user entry, if found via 'mail' attribute.

    Read the article

  • US GAAP and IFRS Convergence May Be Delayed Even More

    - by Theresa Hickman
    Yesterday, on March 10, the Financial Accounting Standards Board (FASB) met to discuss the changes in financial statement presentation. Over the last six months, the FASB and IASB have been working feverishly to converge US GAAP and IFRS standards to meet the 2011 deadline. In March alone, the standards-setters met eight times. Many people fear that this accelerated pace is compromising the quality of the end product and that maybe they should slow down and do their due diligence. According to WG&L Accounting & Compliance Alert Checkpoint 3/10/2010, (which requires a subscription to view the full article) "Some statement preparers and investors who advise the FASB believe that the process would be better served if it was slowed down so that more attention could be paid to quality." "Should 2011 be looked at as a line in the sand?" asked Joan Amble, executive vice president and corporate comptroller for American Express Co. "We don't think that due process should be compromised for the due date," concurred Lewis Dulitz, vice president of accounting policies and research for medical products supplier Covidien plc. I personally have mixed emotions about this. On one hand, I have been growing impatient with how slow the US has jumped on the IFRS band wagon. On the other hand, being the conservative that I am and knowing this convergence will be costly and disruptive to businesses, I would prefer to be safe than sorry and get it right the first time.

    Read the article

  • Slower Rate of Convergence for U.S. GAAP and IFRS

    - by Theresa Hickman
    The original date of June 30, 2011 where FASB and IASB would align/converge major areas of accounting has been extended to the end of 2011. They will still meet the June 2011 date for many "urgently required" projects but some projects will not come until the second half of 2011. The reason for this is to allow more time for due diligence, review and consensus. Will this delay the U.S. adoption to IFRS? According to Ms. Schapiro, no, it will not; she is confident that the decision to adopt IFRS in the U.S. will be decided by 2011. I personally hope so because I fear that if the decision is delayed further, it might seep into the 2012 presidential election which could delay the adoption further. For more information, see reuters.com.

    Read the article

  • The convergence of Risk and Performance Management

    Historically, the market has viewed Enterprise Performance Management (EPM) and Governance, Risk and Compliance (GRC) as separate processes and solutions. But these two worlds are coming together – in fact industry analyst firms such as AMR Research believe that by the end of 2009, risk management will be part of every EPM discussion. Tune into this conversation with John O'Rourke, VP of Product Marketing for Oracle Enterprise Performance Management Solutions, and Karen dela Torre, Senior Director of Product Marketing for Financial Applications to learn how EPM and GRC are converging, what the integration points are, and what Oracle is doing to help customers perform more effective risk and performance management.

    Read the article

  • The Convergence of Risk and Performance Management

    Historically, the market has viewed Enterprise Performance Management (EPM) and Governance, Risk and Compliance (GRC) as separate processes and solutions. But these two worlds are coming together-in fact industry analyst firms such as AMR Research believe that by the end of 2009, risk management will be part of every EPM discussion. Tune into this conversation with John O'Rourke, VP of Product Marketing for Oracle Enterprise Performance Management Solutions, and Karen dela Torre, Senior Director of Product Marketing for Financial Applications to learn how EPM and GRC are converging, what the integration points are, and what Oracle is doing to help customers perform more effective risk and performance management.

    Read the article

  • Oracle WebCenter: The Best of the Best

    - by kellsey.ruppel(at)oracle.com
    You may remember that the key goals of the new release of WebCenter are providing a Modern User Experience, unparalleled Application Integration, converging all the best of the existing portal platforms into WebCenter and delivering a Common User Experience Architecture.  Last week, we provided an overview of Oracle WebCenter, and this week, we'll focus on Convergence and how the new release of Oracle WebCenter is the Best of the Best..Our development team has been working very hard to bring all the best capabilities from each of the existing portal products into one modern user experience platform that provides a robust foundation for moving customers into the future.  Each of the development teams still maintain their existing products to support the current customers,  but they've been tasked with converging their unique best of breed features into the new WebCenter release so that it will meet the broadest set of use cases possible. For example, we've taken the fastest and most scalable portlet engine in the industry with Oracle WebLogic Portal, integrated it within WebCenter, and improved performance further, to deliver even more performance for our customers.  In addition, we've focused on extending the reach of all the different user experience resources so that customers can deliver robust capabilities into their existing portals, applications, composite applications, dashboards, mobile applications, really any channel that requires information.  And finally, we've combined a whole set of community and multi-site capabilities leveraging the pioneering capabilities of Plumtree portal directly into the new WebCenter release.  While at the same time we've built and delivered the new WebCenter release, we've also provided new feature releases of all the existing products.  In this way, customers can continue to gain value out of their existing investments while at the same time have the smoothest path to upgrading to the new WebCenter release. With the new WebCenter release, we are delivering a converged platform to address all portal requirements that have been delivered by different point products in our portal portfolio in the past. Additionally, this release delivers the most modern user experience that goes well beyond the experience the other portal products provided. This is because the new WebCenter release has been built from the ground up with modern technologies around rich clients, SOA, and customizations compared with other portal products whose architecture has been adapted to add capabilities like AJAX, personalization, and social computing.The new WebCenter release addresses the broadest set of use cases using single product set and single architecture spanning extranet sites to social communities. It helps customers manage, maintain and develop one technology set, but leverage it throughout their organization whether it's embedded in an application or a new destination for improved customer and employee productivity. Additionally, the new release of WebCenter leverages the best and most performant features of all the existing portfolio products to deliver the fastest and most scalable portal platform.  Most importantly, it supports the broadest development models spanning from J2EE/Java to HTML/REST to .NET.Keep checking back this week as we provide additional resources and information on how the new release of Oracle WebCenter is the Best of the Best - converging all the best capabilities from each of the existing portal products into one modern user experience platform.

    Read the article

  • Issues in Convergence of Sequential minimal optimization for SVM

    - by Amol Joshi
    I have been working on Support Vector Machine for about 2 months now. I have coded SVM myself and for the optimization problem of SVM, I have used Sequential Minimal Optimization(SMO) by Mr. John Platt. Right now I am in the phase where I am going to grid search to find optimal C value for my dataset. ( Please find details of my project application and dataset details here http://stackoverflow.com/questions/2284059/svm-classification-minimum-number-of-input-sets-for-each-class) I have successfully checked my custom implemented SVM`s accuracy for C values ranging from 2^0 to 2^6. But now I am having some issues regarding the convergence of the SMO for C 128. Like I have tried to find the alpha values for C=128 and it is taking long time before it actually converges and successfully gives alpha values. Time taken for the SMO to converge is about 5 hours for C=100. This huge I think ( because SMO is supposed to be fast. ) though I`m getting good accuracy? I am screwed right not because I can not test the accuracy for higher values of C. I am actually displaying number of alphas changed in every pass of SMO and getting 10, 13, 8... alphas changing continuously. The KKT conditions assures convergence so what is so weird happening here? Please note that my implementation is working fine for C<=100 with good accuracy though the execution time is long. Please give me inputs on this issue. Thank You and Cheers.

    Read the article

  • HP ouvre ses serveurs de mission critique à l'architecture x86 et annonce le projet Odyssey de convergence avec les systèmes UNIX

    HP ouvre ses serveurs à mission critique à l'architecture x86 Et annonce le Projet Odyssey de convergence avec les systèmes UNIX HP se lance dans un projet d'envergure qui vise à réunir les architectures serveur UNIX et x86 au sein d'une plateforme unique pour les systèmes critiques : son Projet Odyssey. Les serveurs haut de gamme Integrity vont pouvoir accueillir des processeurs Intel Xeon x86, compatibles Windows et Linux, qui viennent disputer le règne de l'Itanium vieillissant que HP et Intel

    Read the article

  • Handling SMS/email convergence: how does a good business app do it?

    - by Tim Cooper
    I'm writing a school administration software package, but it strikes me that many developers will face this same issue: when communicating with users, should you use email or SMS or both, and should you treat them as fundamentally equivalent channels such that any message can get sent using any media, (with long and short forms of the message template obviously) or should different business functions be specifically tailored to each of the 3? This question got kicked off "StackOverflow" for being overly general, so I'm hoping it's not too general for this site - the answers will no doubt be subjective but "you don't need to write a whole book to answer the question". I'm particularly interested in people who have direct experience of having written comparable business applications. Sub-questions: Do I treat SMS as "moderately secure" and email as less secure? (I'm thinking about booking tokens for parent/teacher nights, permission slips for excursions, absence explanation notes - so high security is not a requirement for us, although medium security is) Is it annoying for users to receive the same message on multiple channels? Should we have a unified framework that reports on delivery or lack thereof of emails and SMS's?

    Read the article

  • Unified Communications Suite Ships New Version

    - by joesciallo
    We shipped the latest version (7.0.5.0.0) of Unified Communications Suite. The following information should get you started: Get the Software New Features Release Notes Some Changes for 7.0.5.0.0 Convergence: Version 3.0.0.0.0 enables you to use the add-on framework to add third-party services to the Convergence UI. These services include: Advertising Click-to-call service Multinetwork IM SMS (both one-way and two-way) Social media applications (Facebook, Twitter, and Flickr) Video and voice calling capability For more information, see Overview of Add-on Services in Convergence. Calendar Server: Version 7.0.4.14.0 provides a number of security enhancements, including supporting the SSL protocol for all front-end and back-end communications, and the ability to list hosts that are allowed to send iSchedule POST requests. For more information, see Securing Communications to Calendar Server Back Ends. New Platform Support: Oracle GlassFish Server 3, Oracle Solaris 11, and Oracle Enterprise Linux 6.x are supported in this release of Communications Suite.

    Read the article

  • Extreme Optimization – Numerical Algorithm Support

    - by JoshReuben
    Function Delegates Many calculations involve the repeated evaluation of one or more user-supplied functions eg Numerical integration. The EO MathLib provides delegate types for common function signatures and the FunctionFactory class can generate new delegates from existing ones. RealFunction delegate - takes one Double parameter – can encapsulate most of the static methods of the System.Math class, as well as the classes in the Extreme.Mathematics.SpecialFunctions namespace: var sin = new RealFunction(Math.Sin); var result = sin(1); BivariateRealFunction delegate - takes two Double parameters: var atan2 = new BivariateRealFunction (Math.Atan2); var result = atan2(1, 2); TrivariateRealFunction delegate – represents a function takes three Double arguments ParameterizedRealFunction delegate - represents a function taking one Integer and one Double argument that returns a real number. The Pow method implements such a function, but the arguments need order re-arrangement: static double Power(int exponent, double x) { return ElementaryFunctions.Pow(x, exponent); } ... var power = new ParameterizedRealFunction(Power); var result = power(6, 3.2); A ComplexFunction delegate - represents a function that takes an Extreme.Mathematics.DoubleComplex argument and also returns a complex number. MultivariateRealFunction delegate - represents a function that takes an Extreme.Mathematics.LinearAlgebra.Vector argument and returns a real number. MultivariateVectorFunction delegate - represents a function that takes a Vector argument and returns a Vector. FastMultivariateVectorFunction delegate - represents a function that takes an input Vector argument and an output Matrix argument – avoiding object construction  The FunctionFactory class RealFromBivariateRealFunction and RealFromParameterizedRealFunction helper methods - transform BivariateRealFunction or a ParameterizedRealFunction into a RealFunction delegate by fixing one of the arguments, and treating this as a new function of a single argument. var tenthPower = FunctionFactory.RealFromParameterizedRealFunction(power, 10); var result = tenthPower(x); Note: There is no direct way to do this programmatically in C# - in F# you have partial value functions where you supply a subset of the arguments (as a travelling closure) that the function expects. When you omit arguments, F# generates a new function that holds onto/remembers the arguments you passed in and "waits" for the other parameters to be supplied. let sumVals x y = x + y     let sumX = sumVals 10     // Note: no 2nd param supplied.     // sumX is a new function generated from partially applied sumVals.     // ie "sumX is a partial application of sumVals." let sum = sumX 20     // Invokes sumX, passing in expected int (parameter y from original)  val sumVals : int -> int -> int val sumX : (int -> int) val sum : int = 30 RealFunctionsToVectorFunction and RealFunctionsToFastVectorFunction helper methods - combines an array of delegates returning a real number or a vector into vector or matrix functions. The resulting vector function returns a vector whose components are the function values of the delegates in the array. var funcVector = FunctionFactory.RealFunctionsToVectorFunction(     new MultivariateRealFunction(myFunc1),     new MultivariateRealFunction(myFunc2));  The IterativeAlgorithm<T> abstract base class Iterative algorithms are common in numerical computing - a method is executed repeatedly until a certain condition is reached, approximating the result of a calculation with increasing accuracy until a certain threshold is reached. If the desired accuracy is achieved, the algorithm is said to converge. This base class is derived by many classes in the Extreme.Mathematics.EquationSolvers and Extreme.Mathematics.Optimization namespaces, as well as the ManagedIterativeAlgorithm class which contains a driver method that manages the iteration process.  The ConvergenceTest abstract base class This class is used to specify algorithm Termination , convergence and results - calculates an estimate for the error, and signals termination of the algorithm when the error is below a specified tolerance. Termination Criteria - specify the success condition as the difference between some quantity and its actual value is within a certain tolerance – 2 ways: absolute error - difference between the result and the actual value. relative error is the difference between the result and the actual value relative to the size of the result. Tolerance property - specify trade-off between accuracy and execution time. The lower the tolerance, the longer it will take for the algorithm to obtain a result within that tolerance. Most algorithms in the EO NumLib have a default value of MachineConstants.SqrtEpsilon - gives slightly less than 8 digits of accuracy. ConvergenceCriterion property - specify under what condition the algorithm is assumed to converge. Using the ConvergenceCriterion enum: WithinAbsoluteTolerance / WithinRelativeTolerance / WithinAnyTolerance / NumberOfIterations Active property - selectively ignore certain convergence tests Error property - returns the estimated error after a run MaxIterations / MaxEvaluations properties - Other Termination Criteria - If the algorithm cannot achieve the desired accuracy, the algorithm still has to end – according to an absolute boundary. Status property - indicates how the algorithm terminated - the AlgorithmStatus enum values:NoResult / Busy / Converged (ended normally - The desired accuracy has been achieved) / IterationLimitExceeded / EvaluationLimitExceeded / RoundOffError / BadFunction / Divergent / ConvergedToFalseSolution. After the iteration terminates, the Status should be inspected to verify that the algorithm terminated normally. Alternatively, you can set the ThrowExceptionOnFailure to true. Result property - returns the result of the algorithm. This property contains the best available estimate, even if the desired accuracy was not obtained. IterationsNeeded / EvaluationsNeeded properties - returns the number of iterations required to obtain the result, number of function evaluations.  Concrete Types of Convergence Test classes SimpleConvergenceTest class - test if a value is close to zero or very small compared to another value. VectorConvergenceTest class - test convergence of vectors. This class has two additional properties. The Norm property specifies which norm is to be used when calculating the size of the vector - the VectorConvergenceNorm enum values: EuclidianNorm / Maximum / SumOfAbsoluteValues. The ErrorMeasure property specifies how the error is to be measured – VectorConvergenceErrorMeasure enum values: Norm / Componentwise ConvergenceTestCollection class - represent a combination of tests. The Quantifier property is a ConvergenceTestQuantifier enum that specifies how the tests in the collection are to be combined: Any / All  The AlgorithmHelper Class inherits from IterativeAlgorithm<T> and exposes two methods for convergence testing. IsValueWithinTolerance<T> method - determines whether a value is close to another value to within an algorithm's requested tolerance. IsIntervalWithinTolerance<T> method - determines whether an interval is within an algorithm's requested tolerance.

    Read the article

  • Enterprise Software Development with Java by Markus Eisele

    - by JuergenKress
    This is a blog about software development for the enterprise. It focuses on Java Enterprise Edition (J2EE/Java EE). Beside this, I blog about Oracle WebLogic and GlassFish Server and other technologies that hit my road. Java Mission Control 5.2 is Finally Here! Welcome 7u40! It has been a while since we last heard of this fancy little thing called Mission Control. It came all the way from JRockit and was renamed to Java Mission Control. This is one of the parts which literally survived the convergence strategy between HotSpot and JRockit. With today's Java SE 7 Update 40 you can actually use it again. Java Mission Control 5.2 The former JRockit Mission Control (JRMC) is now called Java Mission Control (JMC) and is a tools suite which includes tools to monitor, manage, profile, and eliminate memory leaks in your Java application without introducing the performance overhead normally associated with tools of this type. Up to today the 5.1 version was available within the Oracle HotSpot downloads which could only be received by paying customers from the Oracle Support Website. Todays release is the first release of Java Mission Control that is bundled with the Hotspot JDK! The convergence project between JRockit and Hotspot has reached critical mass. With the 7u40 release of the Hotspot JDK there is an equivalent amount of Flight Recorder information available from Hotspot. Read the full article here. WebLogic Partner Community For regular information become a member in the WebLogic Partner Community please visit: http://www.oracle.com/partners/goto/wls-emea ( OPN account required). If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Mix Forum Wiki Technorati Tags: Markus Eisele,Java Development,WebLogic,WebLogic Community,Oracle,OPN,Jürgen Kress

    Read the article

  • A Taxonomy of Numerical Methods v1

    - by JoshReuben
    Numerical Analysis – When, What, (but not how) Once you understand the Math & know C++, Numerical Methods are basically blocks of iterative & conditional math code. I found the real trick was seeing the forest for the trees – knowing which method to use for which situation. Its pretty easy to get lost in the details – so I’ve tried to organize these methods in a way that I can quickly look this up. I’ve included links to detailed explanations and to C++ code examples. I’ve tried to classify Numerical methods in the following broad categories: Solving Systems of Linear Equations Solving Non-Linear Equations Iteratively Interpolation Curve Fitting Optimization Numerical Differentiation & Integration Solving ODEs Boundary Problems Solving EigenValue problems Enjoy – I did ! Solving Systems of Linear Equations Overview Solve sets of algebraic equations with x unknowns The set is commonly in matrix form Gauss-Jordan Elimination http://en.wikipedia.org/wiki/Gauss%E2%80%93Jordan_elimination C++: http://www.codekeep.net/snippets/623f1923-e03c-4636-8c92-c9dc7aa0d3c0.aspx Produces solution of the equations & the coefficient matrix Efficient, stable 2 steps: · Forward Elimination – matrix decomposition: reduce set to triangular form (0s below the diagonal) or row echelon form. If degenerate, then there is no solution · Backward Elimination –write the original matrix as the product of ints inverse matrix & its reduced row-echelon matrix à reduce set to row canonical form & use back-substitution to find the solution to the set Elementary ops for matrix decomposition: · Row multiplication · Row switching · Add multiples of rows to other rows Use pivoting to ensure rows are ordered for achieving triangular form LU Decomposition http://en.wikipedia.org/wiki/LU_decomposition C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-lu-decomposition-for-solving.html Represent the matrix as a product of lower & upper triangular matrices A modified version of GJ Elimination Advantage – can easily apply forward & backward elimination to solve triangular matrices Techniques: · Doolittle Method – sets the L matrix diagonal to unity · Crout Method - sets the U matrix diagonal to unity Note: both the L & U matrices share the same unity diagonal & can be stored compactly in the same matrix Gauss-Seidel Iteration http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method C++: http://www.nr.com/forum/showthread.php?t=722 Transform the linear set of equations into a single equation & then use numerical integration (as integration formulas have Sums, it is implemented iteratively). an optimization of Gauss-Jacobi: 1.5 times faster, requires 0.25 iterations to achieve the same tolerance Solving Non-Linear Equations Iteratively find roots of polynomials – there may be 0, 1 or n solutions for an n order polynomial use iterative techniques Iterative methods · used when there are no known analytical techniques · Requires set functions to be continuous & differentiable · Requires an initial seed value – choice is critical to convergence à conduct multiple runs with different starting points & then select best result · Systematic - iterate until diminishing returns, tolerance or max iteration conditions are met · bracketing techniques will always yield convergent solutions, non-bracketing methods may fail to converge Incremental method if a nonlinear function has opposite signs at 2 ends of a small interval x1 & x2, then there is likely to be a solution in their interval – solutions are detected by evaluating a function over interval steps, for a change in sign, adjusting the step size dynamically. Limitations – can miss closely spaced solutions in large intervals, cannot detect degenerate (coinciding) solutions, limited to functions that cross the x-axis, gives false positives for singularities Fixed point method http://en.wikipedia.org/wiki/Fixed-point_iteration C++: http://books.google.co.il/books?id=weYj75E_t6MC&pg=PA79&lpg=PA79&dq=fixed+point+method++c%2B%2B&source=bl&ots=LQ-5P_taoC&sig=lENUUIYBK53tZtTwNfHLy5PEWDk&hl=en&sa=X&ei=wezDUPW1J5DptQaMsIHQCw&redir_esc=y#v=onepage&q=fixed%20point%20method%20%20c%2B%2B&f=false Algebraically rearrange a solution to isolate a variable then apply incremental method Bisection method http://en.wikipedia.org/wiki/Bisection_method C++: http://numericalcomputing.wordpress.com/category/algorithms/ Bracketed - Select an initial interval, keep bisecting it ad midpoint into sub-intervals and then apply incremental method on smaller & smaller intervals – zoom in Adv: unaffected by function gradient à reliable Disadv: slow convergence False Position Method http://en.wikipedia.org/wiki/False_position_method C++: http://www.dreamincode.net/forums/topic/126100-bisection-and-false-position-methods/ Bracketed - Select an initial interval , & use the relative value of function at interval end points to select next sub-intervals (estimate how far between the end points the solution might be & subdivide based on this) Newton-Raphson method http://en.wikipedia.org/wiki/Newton's_method C++: http://www-users.cselabs.umn.edu/classes/Summer-2012/csci1113/index.php?page=./newt3 Also known as Newton's method Convenient, efficient Not bracketed – only a single initial guess is required to start iteration – requires an analytical expression for the first derivative of the function as input. Evaluates the function & its derivative at each step. Can be extended to the Newton MutiRoot method for solving multiple roots Can be easily applied to an of n-coupled set of non-linear equations – conduct a Taylor Series expansion of a function, dropping terms of order n, rewrite as a Jacobian matrix of PDs & convert to simultaneous linear equations !!! Secant Method http://en.wikipedia.org/wiki/Secant_method C++: http://forum.vcoderz.com/showthread.php?p=205230 Unlike N-R, can estimate first derivative from an initial interval (does not require root to be bracketed) instead of inputting it Since derivative is approximated, may converge slower. Is fast in practice as it does not have to evaluate the derivative at each step. Similar implementation to False Positive method Birge-Vieta Method http://mat.iitm.ac.in/home/sryedida/public_html/caimna/transcendental/polynomial%20methods/bv%20method.html C++: http://books.google.co.il/books?id=cL1boM2uyQwC&pg=SA3-PA51&lpg=SA3-PA51&dq=Birge-Vieta+Method+c%2B%2B&source=bl&ots=QZmnDTK3rC&sig=BPNcHHbpR_DKVoZXrLi4nVXD-gg&hl=en&sa=X&ei=R-_DUK2iNIjzsgbE5ID4Dg&redir_esc=y#v=onepage&q=Birge-Vieta%20Method%20c%2B%2B&f=false combines Horner's method of polynomial evaluation (transforming into lesser degree polynomials that are more computationally efficient to process) with Newton-Raphson to provide a computational speed-up Interpolation Overview Construct new data points for as close as possible fit within range of a discrete set of known points (that were obtained via sampling, experimentation) Use Taylor Series Expansion of a function f(x) around a specific value for x Linear Interpolation http://en.wikipedia.org/wiki/Linear_interpolation C++: http://www.hamaluik.com/?p=289 Straight line between 2 points à concatenate interpolants between each pair of data points Bilinear Interpolation http://en.wikipedia.org/wiki/Bilinear_interpolation C++: http://supercomputingblog.com/graphics/coding-bilinear-interpolation/2/ Extension of the linear function for interpolating functions of 2 variables – perform linear interpolation first in 1 direction, then in another. Used in image processing – e.g. texture mapping filter. Uses 4 vertices to interpolate a value within a unit cell. Lagrange Interpolation http://en.wikipedia.org/wiki/Lagrange_polynomial C++: http://www.codecogs.com/code/maths/approximation/interpolation/lagrange.php For polynomials Requires recomputation for all terms for each distinct x value – can only be applied for small number of nodes Numerically unstable Barycentric Interpolation http://epubs.siam.org/doi/pdf/10.1137/S0036144502417715 C++: http://www.gamedev.net/topic/621445-barycentric-coordinates-c-code-check/ Rearrange the terms in the equation of the Legrange interpolation by defining weight functions that are independent of the interpolated value of x Newton Divided Difference Interpolation http://en.wikipedia.org/wiki/Newton_polynomial C++: http://jee-appy.blogspot.co.il/2011/12/newton-divided-difference-interpolation.html Hermite Divided Differences: Interpolation polynomial approximation for a given set of data points in the NR form - divided differences are used to approximately calculate the various differences. For a given set of 3 data points , fit a quadratic interpolant through the data Bracketed functions allow Newton divided differences to be calculated recursively Difference table Cubic Spline Interpolation http://en.wikipedia.org/wiki/Spline_interpolation C++: https://www.marcusbannerman.co.uk/index.php/home/latestarticles/42-articles/96-cubic-spline-class.html Spline is a piecewise polynomial Provides smoothness – for interpolations with significantly varying data Use weighted coefficients to bend the function to be smooth & its 1st & 2nd derivatives are continuous through the edge points in the interval Curve Fitting A generalization of interpolating whereby given data points may contain noise à the curve does not necessarily pass through all the points Least Squares Fit http://en.wikipedia.org/wiki/Least_squares C++: http://www.ccas.ru/mmes/educat/lab04k/02/least-squares.c Residual – difference between observed value & expected value Model function is often chosen as a linear combination of the specified functions Determines: A) The model instance in which the sum of squared residuals has the least value B) param values for which model best fits data Straight Line Fit Linear correlation between independent variable and dependent variable Linear Regression http://en.wikipedia.org/wiki/Linear_regression C++: http://www.oocities.org/david_swaim/cpp/linregc.htm Special case of statistically exact extrapolation Leverage least squares Given a basis function, the sum of the residuals is determined and the corresponding gradient equation is expressed as a set of normal linear equations in matrix form that can be solved (e.g. using LU Decomposition) Can be weighted - Drop the assumption that all errors have the same significance –-> confidence of accuracy is different for each data point. Fit the function closer to points with higher weights Polynomial Fit - use a polynomial basis function Moving Average http://en.wikipedia.org/wiki/Moving_average C++: http://www.codeproject.com/Articles/17860/A-Simple-Moving-Average-Algorithm Used for smoothing (cancel fluctuations to highlight longer-term trends & cycles), time series data analysis, signal processing filters Replace each data point with average of neighbors. Can be simple (SMA), weighted (WMA), exponential (EMA). Lags behind latest data points – extra weight can be given to more recent data points. Weights can decrease arithmetically or exponentially according to distance from point. Parameters: smoothing factor, period, weight basis Optimization Overview Given function with multiple variables, find Min (or max by minimizing –f(x)) Iterative approach Efficient, but not necessarily reliable Conditions: noisy data, constraints, non-linear models Detection via sign of first derivative - Derivative of saddle points will be 0 Local minima Bisection method Similar method for finding a root for a non-linear equation Start with an interval that contains a minimum Golden Search method http://en.wikipedia.org/wiki/Golden_section_search C++: http://www.codecogs.com/code/maths/optimization/golden.php Bisect intervals according to golden ratio 0.618.. Achieves reduction by evaluating a single function instead of 2 Newton-Raphson Method Brent method http://en.wikipedia.org/wiki/Brent's_method C++: http://people.sc.fsu.edu/~jburkardt/cpp_src/brent/brent.cpp Based on quadratic or parabolic interpolation – if the function is smooth & parabolic near to the minimum, then a parabola fitted through any 3 points should approximate the minima – fails when the 3 points are collinear , in which case the denominator is 0 Simplex Method http://en.wikipedia.org/wiki/Simplex_algorithm C++: http://www.codeguru.com/cpp/article.php/c17505/Simplex-Optimization-Algorithm-and-Implemetation-in-C-Programming.htm Find the global minima of any multi-variable function Direct search – no derivatives required At each step it maintains a non-degenerative simplex – a convex hull of n+1 vertices. Obtains the minimum for a function with n variables by evaluating the function at n-1 points, iteratively replacing the point of worst result with the point of best result, shrinking the multidimensional simplex around the best point. Point replacement involves expanding & contracting the simplex near the worst value point to determine a better replacement point Oscillation can be avoided by choosing the 2nd worst result Restart if it gets stuck Parameters: contraction & expansion factors Simulated Annealing http://en.wikipedia.org/wiki/Simulated_annealing C++: http://code.google.com/p/cppsimulatedannealing/ Analogy to heating & cooling metal to strengthen its structure Stochastic method – apply random permutation search for global minima - Avoid entrapment in local minima via hill climbing Heating schedule - Annealing schedule params: temperature, iterations at each temp, temperature delta Cooling schedule – can be linear, step-wise or exponential Differential Evolution http://en.wikipedia.org/wiki/Differential_evolution C++: http://www.amichel.com/de/doc/html/ More advanced stochastic methods analogous to biological processes: Genetic algorithms, evolution strategies Parallel direct search method against multiple discrete or continuous variables Initial population of variable vectors chosen randomly – if weighted difference vector of 2 vectors yields a lower objective function value then it replaces the comparison vector Many params: #parents, #variables, step size, crossover constant etc Convergence is slow – many more function evaluations than simulated annealing Numerical Differentiation Overview 2 approaches to finite difference methods: · A) approximate function via polynomial interpolation then differentiate · B) Taylor series approximation – additionally provides error estimate Finite Difference methods http://en.wikipedia.org/wiki/Finite_difference_method C++: http://www.wpi.edu/Pubs/ETD/Available/etd-051807-164436/unrestricted/EAMPADU.pdf Find differences between high order derivative values - Approximate differential equations by finite differences at evenly spaced data points Based on forward & backward Taylor series expansion of f(x) about x plus or minus multiples of delta h. Forward / backward difference - the sums of the series contains even derivatives and the difference of the series contains odd derivatives – coupled equations that can be solved. Provide an approximation of the derivative within a O(h^2) accuracy There is also central difference & extended central difference which has a O(h^4) accuracy Richardson Extrapolation http://en.wikipedia.org/wiki/Richardson_extrapolation C++: http://mathscoding.blogspot.co.il/2012/02/introduction-richardson-extrapolation.html A sequence acceleration method applied to finite differences Fast convergence, high accuracy O(h^4) Derivatives via Interpolation Cannot apply Finite Difference method to discrete data points at uneven intervals – so need to approximate the derivative of f(x) using the derivative of the interpolant via 3 point Lagrange Interpolation Note: the higher the order of the derivative, the lower the approximation precision Numerical Integration Estimate finite & infinite integrals of functions More accurate procedure than numerical differentiation Use when it is not possible to obtain an integral of a function analytically or when the function is not given, only the data points are Newton Cotes Methods http://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas C++: http://www.siafoo.net/snippet/324 For equally spaced data points Computationally easy – based on local interpolation of n rectangular strip areas that is piecewise fitted to a polynomial to get the sum total area Evaluate the integrand at n+1 evenly spaced points – approximate definite integral by Sum Weights are derived from Lagrange Basis polynomials Leverage Trapezoidal Rule for default 2nd formulas, Simpson 1/3 Rule for substituting 3 point formulas, Simpson 3/8 Rule for 4 point formulas. For 4 point formulas use Bodes Rule. Higher orders obtain more accurate results Trapezoidal Rule uses simple area, Simpsons Rule replaces the integrand f(x) with a quadratic polynomial p(x) that uses the same values as f(x) for its end points, but adds a midpoint Romberg Integration http://en.wikipedia.org/wiki/Romberg's_method C++: http://code.google.com/p/romberg-integration/downloads/detail?name=romberg.cpp&can=2&q= Combines trapezoidal rule with Richardson Extrapolation Evaluates the integrand at equally spaced points The integrand must have continuous derivatives Each R(n,m) extrapolation uses a higher order integrand polynomial replacement rule (zeroth starts with trapezoidal) à a lower triangular matrix set of equation coefficients where the bottom right term has the most accurate approximation. The process continues until the difference between 2 successive diagonal terms becomes sufficiently small. Gaussian Quadrature http://en.wikipedia.org/wiki/Gaussian_quadrature C++: http://www.alglib.net/integration/gaussianquadratures.php Data points are chosen to yield best possible accuracy – requires fewer evaluations Ability to handle singularities, functions that are difficult to evaluate The integrand can include a weighting function determined by a set of orthogonal polynomials. Points & weights are selected so that the integrand yields the exact integral if f(x) is a polynomial of degree <= 2n+1 Techniques (basically different weighting functions): · Gauss-Legendre Integration w(x)=1 · Gauss-Laguerre Integration w(x)=e^-x · Gauss-Hermite Integration w(x)=e^-x^2 · Gauss-Chebyshev Integration w(x)= 1 / Sqrt(1-x^2) Solving ODEs Use when high order differential equations cannot be solved analytically Evaluated under boundary conditions RK for systems – a high order differential equation can always be transformed into a coupled first order system of equations Euler method http://en.wikipedia.org/wiki/Euler_method C++: http://rosettacode.org/wiki/Euler_method First order Runge–Kutta method. Simple recursive method – given an initial value, calculate derivative deltas. Unstable & not very accurate (O(h) error) – not used in practice A first-order method - the local error (truncation error per step) is proportional to the square of the step size, and the global error (error at a given time) is proportional to the step size In evolving solution between data points xn & xn+1, only evaluates derivatives at beginning of interval xn à asymmetric at boundaries Higher order Runge Kutta http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods C++: http://www.dreamincode.net/code/snippet1441.htm 2nd & 4th order RK - Introduces parameterized midpoints for more symmetric solutions à accuracy at higher computational cost Adaptive RK – RK-Fehlberg – estimate the truncation at each integration step & automatically adjust the step size to keep error within prescribed limits. At each step 2 approximations are compared – if in disagreement to a specific accuracy, the step size is reduced Boundary Value Problems Where solution of differential equations are located at 2 different values of the independent variable x à more difficult, because cannot just start at point of initial value – there may not be enough starting conditions available at the end points to produce a unique solution An n-order equation will require n boundary conditions – need to determine the missing n-1 conditions which cause the given conditions at the other boundary to be satisfied Shooting Method http://en.wikipedia.org/wiki/Shooting_method C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-shooting-method-for-solving.html Iteratively guess the missing values for one end & integrate, then inspect the discrepancy with the boundary values of the other end to adjust the estimate Given the starting boundary values u1 & u2 which contain the root u, solve u given the false position method (solving the differential equation as an initial value problem via 4th order RK), then use u to solve the differential equations. Finite Difference Method For linear & non-linear systems Higher order derivatives require more computational steps – some combinations for boundary conditions may not work though Improve the accuracy by increasing the number of mesh points Solving EigenValue Problems An eigenvalue can substitute a matrix when doing matrix multiplication à convert matrix multiplication into a polynomial EigenValue For a given set of equations in matrix form, determine what are the solution eigenvalue & eigenvectors Similar Matrices - have same eigenvalues. Use orthogonal similarity transforms to reduce a matrix to diagonal form from which eigenvalue(s) & eigenvectors can be computed iteratively Jacobi method http://en.wikipedia.org/wiki/Jacobi_method C++: http://people.sc.fsu.edu/~jburkardt/classes/acs2_2008/openmp/jacobi/jacobi.html Robust but Computationally intense – use for small matrices < 10x10 Power Iteration http://en.wikipedia.org/wiki/Power_iteration For any given real symmetric matrix, generate the largest single eigenvalue & its eigenvectors Simplest method – does not compute matrix decomposition à suitable for large, sparse matrices Inverse Iteration Variation of power iteration method – generates the smallest eigenvalue from the inverse matrix Rayleigh Method http://en.wikipedia.org/wiki/Rayleigh's_method_of_dimensional_analysis Variation of power iteration method Rayleigh Quotient Method Variation of inverse iteration method Matrix Tri-diagonalization Method Use householder algorithm to reduce an NxN symmetric matrix to a tridiagonal real symmetric matrix vua N-2 orthogonal transforms     Whats Next Outside of Numerical Methods there are lots of different types of algorithms that I’ve learned over the decades: Data Mining – (I covered this briefly in a previous post: http://geekswithblogs.net/JoshReuben/archive/2007/12/31/ssas-dm-algorithms.aspx ) Search & Sort Routing Problem Solving Logical Theorem Proving Planning Probabilistic Reasoning Machine Learning Solvers (eg MIP) Bioinformatics (Sequence Alignment, Protein Folding) Quant Finance (I read Wilmott’s books – interesting) Sooner or later, I’ll cover the above topics as well.

    Read the article

  • The 7 Pillars of Search Engine Optimization and Marketing

    Search engine optimization (SEO) is all about bringing your website to the 1st page of every major search engine. However, the ultimate goals of SEO services are to increase targeted traffic to your website, improve convergence and increase sales. A successful search engine optimization strategy must involve all possible elements which fit a specific website. It must include the 7 pillars of search engine optimization.

    Read the article

  • Project Jigsaw: Late for the train: The Q&A

    - by Mark Reinhold
    I recently proposed, to the Java community in general and to the SE 8 (JSR 337) Expert Group in particular, to defer Project Jigsaw from Java 8 to Java 9. I also proposed to aim explicitly for a regular two-year release cycle going forward. Herewith a summary of the key questions I’ve seen in reaction to these proposals, along with answers. Making the decision Q Has the Java SE 8 Expert Group decided whether to defer the addition of a module system and the modularization of the Platform to Java SE 9? A No, it has not yet decided. Q By when do you expect the EG to make this decision? A In the next month or so. Q How can I make sure my voice is heard? A The EG will consider all relevant input from the wider community. If you have a prominent blog, column, or other communication channel then there’s a good chance that we’ve already seen your opinion. If not, you’re welcome to send it to the Java SE 8 Comments List, which is the EG’s official feedback channel. Q What’s the overall tone of the feedback you’ve received? A The feedback has been about evenly divided as to whether Java 8 should be delayed for Jigsaw, Jigsaw should be deferred to Java 9, or some other, usually less-realistic, option should be taken. Project Jigsaw Q Why is Project Jigsaw taking so long? A Project Jigsaw started at Sun, way back in August 2008. Like many efforts during the final years of Sun, it was not well staffed. Jigsaw initially ran on a shoestring, with just a handful of mostly part-time engineers, so progress was slow. During the integration of Sun into Oracle all work on Jigsaw was halted for a time, but it was eventually resumed after a thorough consideration of the alternatives. Project Jigsaw was really only fully staffed about a year ago, around the time that Java 7 shipped. We’ve added a few more engineers to the team since then, but that can’t make up for the inadequate initial staffing and the time lost during the transition. Q So it’s really just a matter of staffing limitations and corporate-integration distractions? A Aside from these difficulties, the other main factor in the duration of the project is the sheer technical difficulty of modularizing the JDK. Q Why is modularizing the JDK so hard? A There are two main reasons. The first is that the JDK code base is deeply interconnected at both the API and the implementation levels, having been built over many years primarily in the style of a monolithic software system. We’ve spent considerable effort eliminating or at least simplifying as many API and implementation dependences as possible, so that both the Platform and its implementations can be presented as a coherent set of interdependent modules, but some particularly thorny cases remain. Q What’s the second reason? A We want to maintain as much compatibility with prior releases as possible, most especially for existing classpath-based applications but also, to the extent feasible, for applications composed of modules. Q Is modularizing the JDK even necessary? Can’t you just put it in one big module? A Modularizing the JDK, and more specifically modularizing the Java SE Platform, will enable standard yet flexible Java runtime configurations scaling from large servers down to small embedded devices. In the long term it will enable the convergence of Java SE with the higher-end Java ME Platforms. Q Is Project Jigsaw just about modularizing the JDK? A As originally conceived, Project Jigsaw was indeed focused primarily upon modularizing the JDK. The growing demand for a truly standard module system for the Java Platform, which could be used not just for the Platform itself but also for libraries and applications built on top of it, later motivated expanding the scope of the effort. Q As a developer, why should I care about Project Jigsaw? A The introduction of a modular Java Platform will, in the long term, fundamentally change the way that Java implementations, libraries, frameworks, tools, and applications are designed, built, and deployed. Q How much progress has Project Jigsaw made? A We’ve actually made a lot of progress. Much of the core functionality of the module system has been prototyped and works at both compile time and run time. We’ve extended the Java programming language with module declarations, worked out a structure for modular source trees and corresponding compiled-class trees, and implemented these features in javac. We’ve defined an efficient module-file format, extended the JVM to bootstrap a modular JRE, and designed and implemented a preliminary API. We’ve used the module system to make a good first cut at dividing the JDK and the Java SE API into a coherent set of modules. Among other things, we’re currently working to retrofit the java.util.ServiceLoader API to support modular services. Q I want to help! How can I get involved? A Check out the project page, read the draft requirements and design overview documents, download the latest prototype build, and play with it. You can tell us what you think, and follow the rest of our work in real time, on the jigsaw-dev list. The Java Platform Module System JSR Q What’s the relationship between Project Jigsaw and the eventual Java Platform Module System JSR? A At a high level, Project Jigsaw has two phases. In the first phase we’re exploring an approach to modularity that’s markedly different from that of existing Java modularity solutions. We’ve assumed that we can change the Java programming language, the virtual machine, and the APIs. Doing so enables a design which can strongly enforce module boundaries in all program phases, from compilation to deployment to execution. That, in turn, leads to better usability, diagnosability, security, and performance. The ultimate goal of the first phase is produce a working prototype which can inform the work of the Module-System JSR EG. Q What will happen in the second phase of Project Jigsaw? A The second phase will produce the reference implementation of the specification created by the Module-System JSR EG. The EG might ultimately choose an entirely different approach than the one we’re exploring now. If and when that happens then Project Jigsaw will change course as necessary, but either way I think that the end result will be better for having been informed by our current work. Maven & OSGi Q Why not just use Maven? A Maven is a software project management and comprehension tool. As such it can be seen as a kind of build-time module system but, by its nature, it does nothing to support modularity at run time. Q Why not just adopt OSGi? A OSGi is a rich dynamic component system which includes not just a module system but also a life-cycle model and a dynamic service registry. The latter two facilities are useful to some kinds of sophisticated applications, but I don’t think they’re of wide enough interest to be standardized as part of the Java SE Platform. Q Okay, then why not just adopt the module layer of OSGi? A The OSGi module layer is not operative at compile time; it only addresses modularity during packaging, deployment, and execution. As it stands, moreover, it’s useful for library and application modules but, since it’s built strictly on top of the Java SE Platform, it can’t be used to modularize the Platform itself. Q If Maven addresses modularity at build time, and the OSGi module layer addresses modularity during deployment and at run time, then why not just use the two together, as many developers already do? A The combination of Maven and OSGi is certainly very useful in practice today. These systems have, however, been built on top of the existing Java platform; they have not been able to change the platform itself. This means, among other things, that module boundaries are weakly enforced, if at all, which makes it difficult to diagnose configuration errors and impossible to run untrusted code securely. The prototype Jigsaw module system, by contrast, aims to define a platform-level solution which extends both the language and the JVM in order to enforce module boundaries strongly and uniformly in all program phases. Q If the EG chooses an approach like the one currently being taken in the Jigsaw prototype, will Maven and OSGi be made obsolete? A No, not at all! No matter what approach is taken, to ensure wide adoption it’s essential that the standard Java Platform Module System interact well with Maven. Applications that depend upon the sophisticated features of OSGi will no doubt continue to use OSGi, so it’s critical that implementations of OSGi be able to run on top of the Java module system and, if suitably modified, support OSGi bundles that depend upon Java modules. Ideas for how to do that are currently being explored in Project Penrose. Java 8 & Java 9 Q Without Jigsaw, won’t Java 8 be a pretty boring release? A No, far from it! It’s still slated to include the widely-anticipated Project Lambda (JSR 335), work on which has been going very well, along with the new Date/Time API (JSR 310), Type Annotations (JSR 308), and a set of smaller features already in progress. Q Won’t deferring Jigsaw to Java 9 delay the eventual convergence of the higher-end Java ME Platforms with Java SE? A It will slow that transition, but it will not stop it. To allow progress toward that convergence to be made with Java 8 I’ve suggested to the Java SE 8 EG that we consider specifying a small number of Profiles which would allow compact configurations of the SE Platform to be built and deployed. Q If Jigsaw is deferred to Java 9, would the Oracle engineers currently working on it be reassigned to other Java 8 features and then return to working on Jigsaw again after Java 8 ships? A No, these engineers would continue to work primarily on Jigsaw from now until Java 9 ships. Q Why not drop Lambda and finish Jigsaw instead? A Even if the engineers currently working on Lambda could instantly switch over to Jigsaw and immediately become productive—which of course they can’t—there are less than nine months remaining in the Java 8 schedule for work on major features. That’s just not enough time for the broad review, testing, and feedback which such a fundamental change to the Java Platform requires. Q Why not ship the module system in Java 8, and then modularize the platform in Java 9? A If we deliver a module system in one release but don’t use it to modularize the JDK until some later release then we run a big risk of getting something fundamentally wrong. If that happens then we’d have to fix it in the later release, and fixing fundamental design flaws after the fact almost always leads to a poor end result. Q Why not ship Jigsaw in an 8.5 release, less than two years after 8? Or why not just ship a new release every year, rather than every other year? A Many more developers work on the JDK today than a couple of years ago, both because Oracle has dramatically increased its own investment and because other organizations and individuals have joined the OpenJDK Community. Collectively we don’t, however, have the bandwidth required to ship and then provide long-term support for a big JDK release more frequently than about every other year. Q What’s the feedback been on the two-year release-cycle proposal? A For just about every comment that we should release more frequently, so that new features are available sooner, there’s been another asking for an even slower release cycle so that large teams of enterprise developers who ship mission-critical applications have a chance to migrate at a comfortable pace.

    Read the article

  • Launch Webcast Q&A: Oracle WebCenter Suite 11g - The Platform for the Modern User Experience

    - by howard.beader(at)oracle.com
    Did you have a chance to watch the Oracle WebCenter Suite 11g Launch Webcast yet? Andy MacMillan presented some great information on the webcast and answered quite a few of your questions in the Q&A session as well. For your reading pleasure we have captured a number of the questions and answers and they are summarized below: Question: Can you tell me what should our Portal strategy be for integrating and extending our Oracle enterprise applications? Answer: We recommend that you look at this in two steps, the first would be to ensure that you have a good understanding of our common user experience architecture. Internally our product teams at Oracle are already investing in this quite heavily today for Fusion Applications and this is driving natural convergence from a UX strategy standpoint. The second step would be to look at how best to componentize the back office applications so that the business users across your organization can take advantage of these -- don't make it just about putting a new skin on top of what you already have from an application standpoint, instead look at how best to embed the social computing capabilities as part of the solution for your business users. Question: We are currently using the BEA WebLogic Portal now, should we stay on WLP or should we be looking at moving to WebCenter or when should we move to WebCenter? Answer: Our strategy has been called "Continue & Converge", this theme means that you can continue to use WebLogic or Plumtree portals until your organization is ready to move to WebCenter and in the mean time you can continue to deploy what you need to in your organization of WLP or WCI Portals with the full support of Oracle. In addition WebCenter Services can be leveraged for social computing to complement what you are already doing today and enable your organization to take advantage of some of the latest and greatest social computing capabilities. We have migration scripts and conversion capabilities available as well as programs where Oracle can help you evaluate your options to decide how best to move forward. WebCenter provides the best of the best capabilities and will enable you to take advantage of new capabilities that may not exist in your current portal today. In the end though it's up to you as a customer as to when you want to make the transition to Oracle WebCenter Suite. Question: Can you tell me how is Oracle leveraging WebCenter internally and for its Application and Middleware product UX strategies? Answer: Internally, Oracle is leveraging WebCenter for our employees and thus far we are seeing significant updates with our users taking advantage of the business activity streams, team spaces and collaboration capabilities. From a product strategy standpoint, our product teams are taking advantage of the common user experience architecture and leveraging WebCenter to provide social and collaborative capabilities to the Oracle Applications and providing new types of composite applications with what is coming with Fusion Applications. WebCenter also provides a common user experience across all the products in the Oracle Fusion Middleware family as well. Question: Our organization is currently using SharePoint, but we are also an Oracle Applications customer, how should we be thinking about WebCenter as we move forward? Answer: Great question. Typically, we are seeing organizations using SharePoint for its core use cases of small team collaboration and file server replacement. WebCenter can connect to SharePoint as a content source to feed into WebCenter quite easily and it leverages the robust Oracle ECM product under WebCenter as well. In addition, SharePoint team sites can be connected to WebCenter utilizing our SharePoint connector. With Oracle WebCenter though, we are really targeting business users and enterprise applications, thus affecting positive change on the processes that drive the business to improve productivity across your organization. Question: Are organizations today using WebCenter as a Web platform for externally facing public websites? Answer: Yes, we are seeing a convergence around web content management and portal types of websites with customers converting them from just broadcasting content to making it a much richer personalized experience and also exposing back-office applications as well. Web Content Management capabilities are already embedded in WebCenter so that organizations can take advantage now of the benefits a personalized web experience provides for your customers. This is simply a short summary of a few of the questions addressed on the webcast, please tune in now to learn more about Oracle WebCenter, the user experience platform for the enterprise and the web! The Oracle WebCenter Suite 11g Launch Webcast can be found here

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Making Active Directory changes atomic

    - by Matt Simmons
    I've got a Windows 2003 Active Directory infrastructure, and there are times (such as when terminating an employee) that I want instantaneous propagation across both of my AD servers. Currently, I make the change in both places, which I suspect is unhealthy, but it's the only way I know to make sure that the account is disabled to every machine. Is there a better way? Do I have to wait for the normal propagation time for convergence, or is there a way to "force" it?

    Read the article

1 2 3 4  | Next Page >