Search Results

Search found 90546 results on 3622 pages for 'code optimization'.

Page 13/3622 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20  | Next Page >

  • Performance Optimization for Matrix Rotation

    - by Summer_More_More_Tea
    Hello everyone: I'm stuck on a performance optimization lab from the book "Computer Systems: A Programmer's Perspective", described as follows.

    In an N*N matrix M, where N is a multiple of 32, the rotate operation can be represented as:

        * Transpose: interchange elements M(i,j) and M(j,i)
        * Exchange rows: row i is exchanged with row N-1-i

    An example of matrix rotation (N is 3 instead of 32 for simplicity):

        -------                     -------
        |1|2|3|                     |3|6|9|
        -------                     -------
        |4|5|6|   after rotate is   |2|5|8|
        -------                     -------
        |7|8|9|                     |1|4|7|
        -------                     -------

    A naive implementation is:

        #define RIDX(i,j,n) ((i)*(n)+(j))

        void naive_rotate(int dim, pixel *src, pixel *dst)
        {
            int i, j;
            for (i = 0; i < dim; i++)
                for (j = 0; j < dim; j++)
                    dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
        }

    I came up with the idea of unrolling the inner loop. The result is:

        Code version      Speedup
        original          1x
        unrolled by 2     1.33x
        unrolled by 4     1.33x
        unrolled by 8     1.55x
        unrolled by 16    1.67x
        unrolled by 32    1.61x

    I also got a code snippet from pastebin.com that seems to solve this problem:

        void rotate(int dim, pixel *src, pixel *dst)
        {
            int stride = 32;
            int count = dim >> 5;
            src += dim - 1;
            int a1 = count;
            do {
                int a2 = dim;
                do {
                    int a3 = stride;
                    do {
                        *dst++ = *src;
                        src += dim;
                    } while (--a3);
                    src -= dim * stride + 1;
                    dst += dim - stride;
                } while (--a2);
                src += dim * (stride + 1);
                dst -= dim * dim - stride;
            } while (--a1);
        }

    After carefully reading the code, I think the main idea of this solution is to treat each group of 32 rows as a data zone and rotate each zone separately. The speedup of this version is 1.85x, beating all of the loop-unrolled versions.

    Here are the questions:

        1. In the unrolled versions, why does the gain shrink as the unrolling factor increases? In particular, going from 8 to 16 does not help as much as going from 4 to 8. Is the result related to the depth of the CPU pipeline? If so, could the diminishing gain reflect the pipeline length?
        2. What is the probable reason the data-zone version is faster? It does not seem essentially different from the original naive version.

    EDIT: My test environment is an Intel Centrino Duo processor and the version of gcc is 4.4.

    Any advice will be highly appreciated! Kind regards!
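
    The index mapping used by the two C routines above can be checked with a small sketch. This is an illustration in Python, not the lab's code: `ridx`, `blocked_rotate`, and the `block` parameter are names chosen here, and Python will not reproduce the cache behaviour of the C versions. It only shows that processing the source in strips of `block` rows writes the same destination cells as the naive loop, while writing each run of `block` destination cells consecutively.

```python
def ridx(i, j, n):
    """Row-major index, mirroring the RIDX macro."""
    return i * n + j


def naive_rotate(dim, src):
    """Same mapping as naive_rotate in the post: dst[dim-1-j][i] = src[i][j]."""
    dst = [None] * (dim * dim)
    for i in range(dim):
        for j in range(dim):
            dst[ridx(dim - 1 - j, i, dim)] = src[ridx(i, j, dim)]
    return dst


def blocked_rotate(dim, src, block=32):
    """Strip-wise variant; assumes dim is a multiple of block."""
    dst = [None] * (dim * dim)
    for i0 in range(0, dim, block):           # one strip of `block` source rows
        for j in range(dim - 1, -1, -1):      # source columns, right to left
            row = dim - 1 - j                 # destination row for this column
            for i in range(i0, i0 + block):   # consecutive destination cells
                dst[ridx(row, i, dim)] = src[ridx(i, j, dim)]
    return dst
```

    With the 3x3 example from the post, `naive_rotate(3, [1, 2, 3, 4, 5, 6, 7, 8, 9])` gives the rows 3 6 9 / 2 5 8 / 1 4 7.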

  • Need help with basic optimization problem

    - by ??iu
    I know little of optimization problems, so hopefully this will be didactic for me:

        rotors = [1, 2, 3, 4...]
        widgets = ['a', 'b', 'c', 'd' ...]
        assert len(rotors) == len(widgets)
        part_values = [
            (1, 'a', 34), (1, 'b', 26), (1, 'c', 11), (1, 'd', 8),
            (2, 'a', 5), (2, 'b', 17), ....
        ]

    Given a fixed number of widgets and a fixed number of rotors, how can you get a series of widget-rotor pairs that maximizes the total value, where each widget and rotor can only be used once?
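
    To make the problem statement concrete, here is a minimal brute-force sketch. It assumes every (rotor, widget) pair appears in `part_values`; the function name `best_pairing` is hypothetical. It tries every permutation, so it is only practical for a handful of items. The task as stated is the classic assignment problem, for which polynomial-time methods such as the Hungarian algorithm exist.

```python
from itertools import permutations


def best_pairing(rotors, widgets, part_values):
    """Return (best_total, pairs) by trying every one-to-one pairing."""
    # Look-up table: (rotor, widget) -> value.
    value = {(r, w): v for r, w, v in part_values}
    best_total, best_pairs = None, None
    for perm in permutations(widgets):
        pairs = list(zip(rotors, perm))       # rotor i gets widget perm[i]
        total = sum(value[p] for p in pairs)
        if best_total is None or total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs
```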

  • C++ Performance/memory optimization guidelines

    - by ML
    Hi all, does anyone have a resource for C++ memory optimization guidelines? Best practices, tuning, etc.?

    As an example:

        class xxx
        {
        public:
            xxx();
            virtual ~xxx();
        protected:
        private:
        };

    Would there be ANY benefit for the compiler or for memory allocation in getting rid of protected and private, since there are no protected or private members in this class?

  • Any Javascript optimization benchmarks?

    - by int3
    I watched Nicholas Zakas' talk, Speed up your Javascript, with some interest. I liked how he benchmarked the various performance improvements created by various optimization techniques, e.g. reducing calls to deeply nested objects, changing loops to count down instead of up, etc. I would like to run these benchmarks myself though, to see exactly how our current browsers are faring. I guess it wouldn't be too difficult to cook up some timed loops, but I'd like to know if there are any existing implementations out there.

  • Optimization of a c++ matrix/bitmap class

    - by Andrew
    I am looking for a 2D matrix (or bitmap) class that is flexible but also offers fast element access.

    A flexible class should allow you to choose the dimensions at runtime, and would look something like this (simplified):

        class Matrix
        {
        public:
            Matrix(int w, int h) : width(w), data(new int[w*h]) {}
            void SetElement(int x, int y, int val) { data[x+y*width] = val; }
            // ...
        private:
            // symbols
            int width;
            int* data;
        };

    A faster, often-proposed solution using templates is (simplified):

        template <int W, int H>
        class TMatrix
        {
        public:
            TMatrix() : data(new int[W*H]) {}
            void SetElement(int x, int y, int val) { data[x+y*W] = val; }
        private:
            int* data;
        };

    This is faster because the width can be "inlined" in the code. The first solution does not do this. However, it is not very flexible, as you can no longer change the size at runtime.

    So my question is: is there a way to tell the compiler to generate faster code (as with the template solution) when the size is fixed in the code, and to generate flexible code when it is runtime dependent? I tried to achieve this by writing "const" wherever possible. I tried it with gcc and VS2005, but with no success. This kind of optimization would be useful for many other similar cases.

  • Code formatter for SSMS

    - by blakmk
    I was recently searching for a code formatter for T-SQL and I came across this nice little utility that I wanted to share: http://www.wangz.net/cgi-bin/pp/gsqlparser/sqlpp/sqlformat.tpl I've been dealing with a lot of legacy code lately and there is nothing I find more infuriating than unformatted code. This tool seems to work quite well. Just one click and it formats everything nicely. There is also a free web version.

  • Demonstrate bad code to client?

    - by jtiger
    I have a new client that has asked me to do a redesign of their website, an ASP.NET Webforms application that was developed by another consultant. It seemed straight-forward (it never is) but I took a look at the code to make sure I knew what I was in for. This application was not written well. At all. It is extremely vulnerable to SQL Injection attacks, business logic is spread throughout the entire application, a lot of duplication, and dead end code that does nothing. On top of that, it keeps throwing exceptions that are being smothered, so it all appears to be running smoothly. My job is to simply update the html and css, but much of the html is being generated in business logic and would be a nightmare for me to sort everything out. My estimates on the redesign were longer than the client was aiming for, and they are asking why so long. How can I explain to my client just how bad this code is? In their mind, the application is running great and the redesign should be a quick one-off. It's my word against the previous consultant, so how can I actually give simple, concrete examples that a non-technical client would understand?

  • Code structure for multiple applications with a common core

    - by Azrael Seraphin
    I want to create two applications that will have a lot of common functionality. Basically, one system is a more advanced version of the other system. Let's call them Simple and Advanced. The Advanced system will add to, extend, alter and sometimes replace the functionality of the Simple system. For instance, the Advanced system will add new classes, add properties and methods to existing Simple classes, change the behavior of classes, etc. Initially I was thinking that the Advanced classes simply inherited from the Simple classes but I can see the functionality diverging quite significantly as development progresses, even while maintaining a core base functionality. For instance, the Simple system might have a Project class with a Sponsor property whereas the Advanced system has a list of Project.Sponsors. It seems poor practice to inherit from a class and then hide, alter or throw away significant parts of its features. An alternative is just to run two separate code bases and copy the common code between them but that seems inefficient, archaic and fraught with peril. Surely we have moved beyond the days of "copy-and-paste inheritance". Another way to structure it would be to use partial classes and have three projects: Core which has the common functionality, Simple which extends the Core partial classes for the simple system, and Advanced which also extends the Core partial classes for the advanced system. Plus having three test projects as well for each system. This seems like a cleaner approach. What would be the best way to structure the solution/projects/code to create two versions of a similar system? Let's say I later want to create a third system called Extreme, largely based on the Advanced system. Do I then create an AdvancedCore project which both Advanced and Extreme extend using partial classes? Is there a better way to do this? 
If it matters, this is likely to be a C#/MVC system but I'd be happy to do this in any language/framework that is suitable.

  • Need help eliminating dead code paths and variables from C source code

    - by Anjum Kaiser
    I have legacy C code on my hands, and I have been given the task of filtering dead/unused symbols and paths out of it. Over time there were many insertions and deletions, leaving lots of unused symbols. I have identified many dead variables which were only written to once or twice but never read from. Black-box, white-box, and regression testing showed that the dead code removal did not affect any procedures. (We have a comprehensive test suite.) But this removal was done only on a small part of the code. Now I am looking for some way to automate this work. We rely on GCC to do the work.

    P.S. I'm interested in removing stuff like:

        * variables which are being read just for the sake of reading from them.
        * variables which are spread across multiple source files and only being written to.

    For example:

        file1.c:
            int i;

        file2.c:
            extern int i;
            ....
            i=x;

  • How to tell whether Code Access Security is allowed in library code

    - by Sander Rijken
    In .NET 4 Code Access Security (CAS) is deprecated. Whenever you call a method that implicitly uses it, it fails with a NotSupportedException, that can be resolved with a configuration switch that makes it fall back to the old behavior. We have a common library that's used in both .NET 3.5 and .NET 4, so we need to be able to tell whether or not we should use the CAS method. For example, in .NET 3.5 I should call: Assembly.Load(string, Evidence); Whereas in .NET 4 I want to call Assembly.Load(string); Calling Load(string, Evidence) throws a NotSupportedException. Of course this works, but I'd like to know if there's a better method: try { asm = Assembly.Load(someString, someEvidence); } catch(NotSupportedException) { asm = Assembly.Load(someString); }

  • Decision Tree code golf

    - by Chris Jester-Young
    In Google Code Jam 2009, Round 1B, there is a problem called Decision Tree that lent itself to rather creative solutions. Post your shortest solution; I'll update the Accepted Answer to the current shortest entry on a semi-frequent basis, assuming you didn't just create a new language just to solve this problem. :-P

    Current rankings:

         107  Perl
         121  PostScript (binary)
         136  Ruby
         154  Arc
         160  PostScript (ASCII85)
         170  PostScript
         192  Python
         199  Common Lisp
         214  LilyPond
         222  JavaScript
         273  Scheme
         280  R
         312  Haskell
         314  PHP
         339  m4
         346  C
         406  Fortran
         462  Java
         476  Java (well, kind of)
         718  OCaml
         759  F#
        1741  sed

    C++ not qualified for now

  • Array Searching code challenge

    - by RCIX
    Here's my (code golf) challenge: Take two arrays of bytes and determine if the second array is a substring of the first. If it is, output the index at which the contents of the second array appear in the first. If you do not find the second array in the first, then output -1. Example Input: { 63, 101, 245, 215, 0 } { 245, 215 } Expected Output: 2 Example Input 2: { 24, 55, 74, 3, 1 } { 24, 56, 74 } Expected Output 2: -1 Edit: Someone has pointed out that the bool is redundant, so all your function has to do is return an int representing the index of the value or -1 if not found.
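
    For reference, an ungolfed sketch of the behaviour the challenge describes could look like the following. The name `find_subarray` is hypothetical, and returning 0 for an empty second array is an assumption, since the challenge does not cover that case.

```python
def find_subarray(haystack, needle):
    """Return the first index where needle occurs in haystack, or -1."""
    n, m = len(haystack), len(needle)
    # Try every start position that leaves room for the whole needle.
    for start in range(n - m + 1):
        if haystack[start:start + m] == needle:
            return start
    return -1
```

    Called with the two example inputs from the challenge, this returns 2 and -1 respectively.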

  • Haskell optimization of the following function

    - by me2
    Profiling of some code of mine showed that about 65% of the time was spent running the following code. What it does is use the Data.Binary.Get monad to walk through a bytestring looking for the terminator. If it detects 0xff, it checks whether the next byte is 0x00. If it is, it drops the 0x00 and continues. If it is not 0x00, then it drops both bytes, and the resulting list of bytes is converted to a bytestring and returned. Any obvious ways to optimize this code? I can't see it.

        parseECS = f [] False
          where
            f acc ff = do
              b <- getWord8
              if ff
                then if b == 0x00
                       then f (0xff:acc) False
                       else return $ L.pack (reverse acc)
                else if b == 0xff
                       then f acc True
                       else f (b:acc) False
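
    The scanning logic described above can be restated as a flat loop. The sketch below is in Python purely to spell out the state handling; it is not a translation of the Data.Binary.Get code and says nothing about Haskell performance. The name `parse_ecs`, the returned "bytes consumed" count, and the `ValueError` on missing terminator are all choices made for this illustration.

```python
def parse_ecs(data):
    """Decode bytes up to the terminator; return (payload, bytes_consumed).

    0xFF 0x00 stands for a literal 0xFF; 0xFF followed by any other byte
    ends the segment, and both of those bytes are dropped.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b != 0xFF:
            out.append(b)                 # ordinary byte: keep it
            i += 1
        elif i + 1 >= len(data):
            break                         # 0xFF at end of input: no terminator
        elif data[i + 1] == 0x00:
            out.append(0xFF)              # stuffed 0xFF: keep one, drop the 0x00
            i += 2
        else:
            return bytes(out), i + 2      # terminator: drop both bytes
    raise ValueError("input ended before a terminator was found")
```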

  • Java code optimization leads to numerical inaccuracies and errors

    - by rano
    I'm trying to implement a version of the Fuzzy C-Means algorithm in Java, and I'm trying to optimize it by computing just once everything that can be computed just once. This is an iterative algorithm, and for the update of the membership matrix U (clusters x pixels), this is the update rule I want to optimize:

        u_ij = 1 / sum_k ( ||x_i - v_j|| / ||x_i - v_k|| )^(2/(m-1))

    where the x are the elements of a matrix X (pixels x features) and v belongs to the matrix V (clusters x features). m is a parameter that ranges from 1.1 to infinity. The distance used is the Euclidean norm.

    If I had to implement this formula in a banal way I'd do:

        for(int i = 0; i < X.length; i++) {
            int count = 0;
            for(int j = 0; j < V.length; j++) {
                double num = D[i][j];
                double sumTerms = 0;
                for(int k = 0; k < V.length; k++) {
                    double thisDistance = D[i][k];
                    sumTerms += Math.pow(num / thisDistance, (1.0 / (m - 1.0)));
                }
                U[i][j] = (float) (1f / sumTerms);
            }
        }

    Some optimization is already done here: I precomputed all the possible squared distances between X and V and stored them in a matrix D. But that is not enough, since I'm cycling through the elements of V twice, resulting in two nested loops. Looking at the formula, the numerator of the fraction is independent of the sum, so I can compute numerator and denominator independently, and the denominator can be computed just once for each pixel.

    So I came to a solution like this:

        int nClusters = V.length;
        double exp = (1.0 / (m - 1.0));
        for(int i = 0; i < X.length; i++) {
            int count = 0;
            for(int j = 0; j < nClusters; j++) {
                double distance = D[i][j];
                double denominator = D[i][nClusters];
                double numerator = Math.pow(distance, exp);
                U[i][j] = (float) (1f / (numerator * denominator));
            }
        }

    where I precomputed the denominator into an additional column of the matrix D while I was computing the distances:

        for (int i = 0; i < X.length; i++) {
            for (int j = 0; j < V.length; j++) {
                double sum = 0;
                for (int k = 0; k < nDims; k++) {
                    final double d = X[i][k] - V[j][k];
                    sum += d * d;
                }
                D[i][j] = sum;
                D[i][V.length] += Math.pow(1 / D[i][j], exp);
            }
        }

    By doing so I encounter numerical differences between the 'banal' computation and the second one, which lead to different numerical values in U (not in the first iterations, but soon enough). I guess the problem is that raising very small numbers to high powers (the elements of U can range from 0.0 to 1.0, and exp, for m = 1.1, is 10) leads to very small values, whereas dividing the numerator by the denominator and THEN exponentiating the result seems to be better numerically. The problem is that it involves many more operations.

    Am I doing something wrong? Is there a possible solution that gets the code both optimized and numerically stable? Any suggestion or criticism will be appreciated.
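
    The two formulations in the post are algebraically the same, which a small sketch can make explicit for a single pixel. The code below is Python rather than Java, operates on one row of squared distances, and all function names are hypothetical. The third function is one possible rearrangement, not something taken from the post: it divides by the smallest distance in the row before exponentiating, so every powered term lies in (0, 1], while still needing only one pass for the sum. It assumes all distances are strictly positive.

```python
def memberships_naive(d_row, m):
    """Ratio first, then power: O(C^2) pow calls per pixel."""
    e = 1.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** e for dk in d_row) for dj in d_row]


def memberships_factored(d_row, m):
    """Power each distance separately: O(C) pow calls, extreme intermediates."""
    e = 1.0 / (m - 1.0)
    denom = sum(dk ** -e for dk in d_row)
    return [1.0 / (dj ** e * denom) for dj in d_row]


def memberships_scaled(d_row, m):
    """Scale by the row minimum before the power: O(C) pow calls."""
    e = 1.0 / (m - 1.0)
    d_min = min(d_row)
    powered = [(d_min / dk) ** e for dk in d_row]   # each term is in (0, 1]
    total = sum(powered)
    return [p / total for p in powered]
```

    Whether the scaled form removes the discrepancy seen in the post would have to be measured on the real data; the sketch only shows that the rearrangement computes the same quantity.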

  • PHP website Optimization

    - by ana
    I have a high-traffic website and I need to make sure my site is fast enough to display my pages to everyone rapidly. I searched Google for articles about speed and optimization, and here's what I found:

        * Cache the page
        * Save it to the disk

    Caching the page in memory: This is very fast, but if I need to change the content of my page I have to remove it from the cache and then re-save the file on the disk.

    Saving it to disk: This is very easy to maintain, but every time the page is accessed I have to read from the disk.

    Which method should I go with? Thanks

  • Performance optimization strategies of last resort?

    - by jerryjvl
    There are plenty of performance questions on this site already, but it occurs to me that almost all are very problem-specific and fairly narrow. And almost all repeat the advice to avoid premature optimization.

    Let's assume:

        * the code already is working correctly
        * the algorithms chosen are already optimal for the circumstances of the problem
        * the code has been measured, and the offending routines have been isolated
        * all attempts to optimize will also be measured to ensure they do not make matters worse

    What I am looking for here is strategies and tricks to squeeze out up to the last few percent in a critical algorithm when there is nothing else left to do but whatever it takes. Ideally, try to make answers language agnostic, and indicate any down-sides to the suggested strategies where applicable. I'll add a reply with my own initial suggestions, and look forward to whatever else the SO community can think of.

  • Does MATLAB perform tail call optimization?

    - by Shea Levy
    I've recently learned Haskell, and am trying to carry the pure functional style over to my other code when possible. An important aspect of this is treating all variables as immutable, i.e. constants. In order to do so, many computations that would be implemented using loops in an imperative style have to be performed using recursion, which typically incurs a memory penalty due to the allocation of a new stack frame for each function call. In the special case of a tail call (where the return value of a called function is immediately returned to the calling function's own caller), however, this penalty can be bypassed by a process called tail call optimization (in one method, this can be done by essentially replacing a call with a jmp after setting up the stack properly). Does MATLAB perform TCO by default, or is there a way to tell it to?
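
    As an illustration of what the post means by a tail call, the sketch below shows a tail-recursive function next to the loop that tail call optimization would effectively turn it into. It is written in Python, which does not perform this optimization itself, and it says nothing about what MATLAB does; the function names are hypothetical.

```python
def sum_to(n, acc=0):
    """Tail-recursive: the recursive call's result is returned unchanged."""
    if n == 0:
        return acc
    return sum_to(n - 1, acc + n)     # tail call: nothing left to do afterwards


def sum_to_loop(n, acc=0):
    """What TCO amounts to: rebind the arguments and jump back to the top."""
    while n != 0:
        n, acc = n - 1, acc + n       # same argument update, no new stack frame
    return acc
```

    Without the optimization, `sum_to` uses one stack frame per step, so a large `n` exhausts the stack, whereas `sum_to_loop` runs in constant stack space.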

  • Including associations optimization in Rails

    - by Vitaly
    Hey, I'm looking for help with Ruby optimization regarding loading of associations on demand. This is a simplified example. I have 3 models: Post, Comment, User. The references are: Post has many comments, and Comment has a reference to User (:author). Now when I go to the post page, I expect to see the post body plus all comments (and their respective authors' names). This requires the following 2 queries:

        select * from Post                     -- to get post data (1 row)
        select * from Comment inner join User  -- to get comment + usernames (N rows)

    In the code I have:

        Post.find(params[:id], :include => { :comments => [:author] })

    But it doesn't work as expected: as I see in the back end, there are still N+1 hits (some of them are cached, though). How can I optimize that?

  • why optimization does not happen?

    - by aaa
    Hi. I have C/C++ code that looks like this:

        static int function(double *I) {
            int n = 0;
            // more instructions, loops,
            for (int i; ...; ++i)
                n += fabs(I[i] > tolerance);
            return n;
        }

        function(I); // return value is not used.

    The compiler inlines the function, but it does not optimize out the manipulations of n. I would expect the compiler to be able to recognize that the value is only ever assigned and never used as an rhs. Is there some side effect which prevents the optimization? Thanks

  • Static variable for optimization

    - by keithjgrant
    I'm wondering if I can use a static variable for optimization:

        public function Bar() {
            static $i = moderatelyExpensiveFunctionCall();
            if ($i) {
                return something();
            } else {
                return somethingElse();
            }
        }

    I know that once $i is initialized, it won't be changed by that line of code on successive calls to Bar(). I assume this means that moderatelyExpensiveFunctionCall() won't be evaluated every time I call, but I'd like to know for certain. Once PHP sees a static variable that has been initialized, does it skip over that line of code? In other words, is this going to optimize my execution time if I make a lot of calls to Bar(), or am I wasting my time?
