micro optimization - Page 6

Free Optimization Library in C#

- by Ngu Soon Hui

Is there any optimization library in C#? I have to optimize a complicated equation in excel, for this equation there are a few coefficients. And I have to optimize them according to a fitness function that I define. So I wonder whether there is such a library that does what I need?

Read the article

Objective-C optimization

- by SpecialK

Are there standard optimization tricks for Objective-C to make for faster execution along the lines of "inlining" frequent methods as in C++ or the "g++ -fast" tag?

Read the article

Performance Optimization for Matrix Rotation

- by Summer_More_More_Tea

Hello everyone: I'm now trapped by a performance optimization lab in the book "Computer System from a Programmer's Perspective" described as following: In a N*N matrix M, where N is multiple of 32, the rotate operation can be represented as: Transpose: interchange elements M(i,j) and M(j,i) Exchange rows: Row i is exchanged with row N-1-i A example for matrix rotation(N is 3 instead of 32 for simplicity): ------- ------- |1|2|3| |3|6|9| ------- ------- |4|5|6| after rotate is |2|5|8| ------- ------- |7|8|9| |1|4|7| ------- ------- A naive implementation is: #define RIDX(i,j,n) ((i)*(n)+(j)) void naive_rotate(int dim, pixel *src, pixel *dst) { int i, j; for (i = 0; i < dim; i++) for (j = 0; j < dim; j++) dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)]; } I come up with an idea by inner-loop-unroll. The result is: Code Version Speed Up original 1x unrolled by 2 1.33x unrolled by 4 1.33x unrolled by 8 1.55x unrolled by 16 1.67x unrolled by 32 1.61x I also get a code snippet from pastebin.com that seems can solve this problem: void rotate(int dim, pixel *src, pixel *dst) { int stride = 32; int count = dim >> 5; src += dim - 1; int a1 = count; do { int a2 = dim; do { int a3 = stride; do { *dst++ = *src; src += dim; } while(--a3); src -= dim * stride + 1; dst += dim - stride; } while(--a2); src += dim * (stride + 1); dst -= dim * dim - stride; } while(--a1); } After carefully read the code, I think main idea of this solution is treat 32 rows as a data zone, and perform the rotating operation respectively. Speed up of this version is 1.85x, overwhelming all the loop-unroll version. Here are the questions: In the inner-loop-unroll version, why does increment slow down if the unrolling factor increase, especially change the unrolling factor from 8 to 16, which does not effect the same when switch from 4 to 8? Does the result have some relationship with depth of the CPU pipeline? If the answer is yes, could the degrade of increment reflect pipeline length? What is the probable reason for the optimization of data-zone version? It seems that there is no too much essential difference from the original naive version. EDIT: My test environment is Intel Centrino Duo processor and the verion of gcc is 4.4 Any advice will be highly appreciated! Kind regards!

Read the article

Need help with basic optimization problem

- by ??iu

I know little of optimization problems, so hopefully this will be didactic for me: rotors = [1, 2, 3, 4...] widgets = ['a', 'b', 'c', 'd' ...] assert len(rotors) == len(widgets) part_values = [ (1, 'a', 34), (1, 'b', 26), (1, 'c', 11), (1, 'd', 8), (2, 'a', 5), (2, 'b', 17), .... ] Given a fixed number of widgets and a fixed number of rotors, how can you get a series of widget-rotor pairs that maximizes the total value where each widget and rotor can only be used once?

Read the article

Any Javascript optimization benchmarks?

- by int3

I watched Nicholas Zakas' talk, Speed up your Javascript, with some interest. I liked how he benchmarked the various performance improvements created by various optimization techniques, e.g. reducing calls to deeply nested objects, changing loops to count down instead of up, etc. I would like to run these benchmarks myself though, to see exactly how our current browsers are faring. I guess it wouldn't be too difficult to cook up some timed loops, but I'd like to know if there are any existing implementations out there.

Read the article

C++ Performance/memory optimization guidelines

- by ML

Hi All, Does anyone have a resource for C++ memory optimization guidelines? Best practices, tuning, etc? As an example: Class xxx { public: xxx(); virtual ~xxx(); protected: private: }; Would there be ANY benefit on the compiler or memory allocation to get rid of protected and private since there there are no items that are protected and private in this class?

Read the article

Java code optimization leads to numerical inaccuracies and errors

- by rano

I'm trying to implement a version of the Fuzzy C-Means algorithm in Java and I'm trying to do some optimization by computing just once everything that can be computed just once. This is an iterative algorithm and regarding the updating of a matrix, the clusters x pixels membership matrix U, this is the update rule I want to optimize: where the x are the element of a matrix X (pixels x features) and v belongs to the matrix V (clusters x features). And m is a parameter that ranges from 1.1 to infinity. The distance used is the euclidean norm. If I had to implement this formula in a banal way I'd do: for(int i = 0; i < X.length; i++) { int count = 0; for(int j = 0; j < V.length; j++) { double num = D[i][j]; double sumTerms = 0; for(int k = 0; k < V.length; k++) { double thisDistance = D[i][k]; sumTerms += Math.pow(num / thisDistance, (1.0 / (m - 1.0))); } U[i][j] = (float) (1f / sumTerms); } } In this way some optimization is already done, I precomputed all the possible squared distances between X and V and stored them in a matrix D but that is not enough, since I'm cycling througn the elements of V two times resulting in two nested loops. Looking at the formula the numerator of the fraction is independent of the sum so I can compute numerator and denominator independently and the denominator can be computed just once for each pixel. So I came to a solution like this: int nClusters = V.length; double exp = (1.0 / (m - 1.0)); for(int i = 0; i < X.length; i++) { int count = 0; for(int j = 0; j < nClusters; j++) { double distance = D[i][j]; double denominator = D[i][nClusters]; double numerator = Math.pow(distance, exp); U[i][j] = (float) (1f / (numerator * denominator)); } } Where I precomputed the denominator into an additional column of the matrix D while I was computing the distances: for (int i = 0; i < X.length; i++) { for (int j = 0; j < V.length; j++) { double sum = 0; for (int k = 0; k < nDims; k++) { final double d = X[i][k] - V[j][k]; sum += d * d; } D[i][j] = sum; D[i][B.length] += Math.pow(1 / D[i][j], exp); } } By doing so I encounter numerical differences between the 'banal' computation and the second one that leads to different numerical value in U (not in the first iterates but soon enough). I guess that the problem is that exponentiate very small numbers to high values (the elements of U can range from 0.0 to 1.0 and exp , for m = 1.1, is 10) leads to ver y small values, whereas by dividing the numerator and the denominator and THEN exponentiating the result seems to be better numerically. The problem is it involves much more operations. Am I doing something wrong? Is there a possible solution to get both the code optimized and numerically stable? Any suggestion or criticism will be appreciated.

Read the article

PHP website Optimization

- by ana

I have a high traffic website and I need make sure my site is fast enough to display my pages to everyone rapidly. I searched on Google many articles about speed and optimization and here's what I found: Cache the page Save it to the disk Caching the page in memory: This is very fast but if I need to change the content of my page I have to remove it from cache and then re-save the file on the disk. Save it to disk This is very easy to maintain but every time the page is accessed I have to read on the disk. Which method should I go with? Thanks

Read the article

Performance optimization strategies of last resort?

- by jerryjvl

There are plenty of performance questions on this site already, but it occurs to me that almost all are very problem-specific and fairly narrow. And almost all repeat the advice to avoid premature optimization. Let's assume: the code already is working correctly the algorithms chosen are already optimal for the circumstances of the problem the code has been measured, and the offending routines have been isolated all attempts to optimize will also be measured to ensure they do not make matters worse What I am looking for here is strategies and tricks to squeeze out up to the last few percent in a critical algorithm when there is nothing else left to do but whatever it takes. Ideally, try to make answers language agnostic, and indicate any down-sides to the suggested strategies where applicable. I'll add a reply with my own initial suggestions, and look forward to whatever else the SO community can think of.

Read the article

Does MATLAB perform tail call optimization?

- by Shea Levy

I've recently learned Haskell, and am trying to carry the pure functional style over to my other code when possible. An important aspect of this is treating all variables as immutable, i.e. constants. In order to do so, many computations that would be implemented using loops in an imperative style have to be performed using recursion, which typically incurs a memory penalty due to the allocation a new stack frame for each function call. In the special case of a tail call (where the return value of a called function is immediately returned to the callee's caller), however, this penalty can be bypassed by a process called tail call optimization (in one method, this can be done by essentially replacing a call with a jmp after setting up the stack properly). Does MATLAB perform TCO by default, or is there a way to tell it to?

Read the article

Is there a way that I can hard code a const XmlNameTable to be reused by all of my XmlTextReader(s)?

- by highone

Before I continue I would just like to say I know that "Premature optimization is the root of all evil." However this program is only a hobby project and I enjoy trying to find ways to optimize it. That being said, I was reading an article on improving xml performance and it recommended sharing "the XmlNameTable class that is used to store element and attribute names across multiple XML documents of the same type to improve performance." I wasn't able to find any information about doing this in my googling, so it is likely that this is either not possible, a no-no, or a stupid question, but what's the harm in asking?

Read the article

Including associations optimization in Rails

- by Vitaly

Hey, I'm looking for help with Ruby optimization regarding loading of associations on demand. This is simplified example. I have 3 models: Post, Comment, User. References are: Post has many comments and Comment has reference to User (:author). Now when I go to the post page, I expect to see post body + all comments (and their respective authors names). This requires following 2 queries: select * from Post -- to get post data (1 row) select * from Comment inner join User -- to get comment + usernames (N rows) In the code I have: Post.find(params[:id], :include => { :comments => [:author] } But it doesn't work as expected: as I see in the back end, there're still N+1 hits (some of them are cached though). How can I optimize that?

Read the article

why optimization does not happen?

- by aaa

hi. I have C/C++ code, that looks like this: static int function(double *I) { int n = 0; // more instructions, loops, for (int i; ...; ++i) n += fabs(I[i] > tolerance); return n; } function(I); // return value is not used. compiler inlines function, however it does not optimize out n manipulations. I would expect compiler is able to recognize that value is never used as rhs only. Is there some side effect, which prevents optimization? Thanks

Read the article

Copy method optimization in compilers

- by Dženan

Hi All! I have the following code: void Stack::operator =(Stack &rhs) { //do the actual copying } Stack::Stack(Stack &rhs) //copy-constructor { top=NULL; //initialize this as an empty stack (which it is) *this=rhs; //invoke assignment operator } Stack& Stack::CopyStack() { return *this; //this statement will invoke copy contructor } It is being used like this: unsigned Stack::count() { unsigned c=0; Stack copy=CopyStack(); while (!copy.empty()) { copy.pop(); c++; } return c; } Removing reference symbol from declaration of CopyStack (returning a copy instead of reference) makes no difference in visual studio 2008 (with respect to number of times copying is invoked). I guess it gets optimized away - normally it should first make a copy for the return value, then call assignment operator once more to assign it to variable sc. What is your experience with this sort of optimization in different compilers? Regards, Dženan

Read the article

Static variable for optimization

- by keithjgrant

I'm wondering if I can use a static variable for optimization: public function Bar() { static $i = moderatelyExpensiveFunctionCall(); if ($i) { return something(); } else { return somethingElse(); } } I know that once $i is initialized, it won't be changed by by that line of code on successive calls to Bar(). I assume this means that moderatelyExpensiveFunctionCall() won't be evaluated every time I call, but I'd like to know for certain. Once PHP sees a static variable that has been initialized, does it skip over that line of code? In other words, is this going to optimize my execution time if I make a lot of calls to Bar(), or am I wasting my time?

Read the article

Common optimization rules

- by mafutrct

This is a dangerous question, so let me try to phrase it correctly. Premature optimization is the root of all evil, but if you know you need it, there is a basic set of rules that should be considered. This set is what I'm wondering about. For instance, imagine you got a list of a few thousand items. How do you look up an item with a specific, unique ID? Of course, you simply use a Dictionary to map the ID to the item. And if you know that there is a setting stored in a database that is required all the time, you simply cache it instead of issuing a database request hundred times a second. I guess there are a few even more basic ideas. I am specifically not looking for "don't do it, for experts: don't do it yet" or "use a profiler" answers, but for really simple, general hints. If you feel this is an argumentative question, you probably misunderstood my intention.

Read the article

When should an API favour optimization over readability and ease-of-use?

- by jmlane

I am in the process of designing a small library, where one of my design goals is to use as much of the native domain language as possible in the API. While doing so, I've noticed that there are some cases in the API outline where a more intuitive, readable attribute/method call requires some functionally unnecessary encapsulation. Since the final product will not necessarily require high performance, I am unconcerned about making the decision to favour ease-of-use in my current project over the most efficient implementation of the code in question. I know not to assume readability and ease-of-use are paramount in all expected use-cases, such as when performance is required. I would like to know if there are more general reasons that argue for an API design preferring (marginally) more efficient implementations?

Read the article

Which isometric angles can be mirrored (and otherwise transformed) for optimization?

- by Tom

I am working on a basic isometric game, and am struggling to find the correct mirrors. Mirror can be any form of transform. I have managed to get SE out of SW, by scaling the sprite on X axis by -1. Same applies for NE angle. Something is bugging me, that I should be able to also mirror N to S, but I cannot manage to pull this one off. Am I just too sleepy and trying to do the impossible, or a basic -1 scale on Y axis is not enough? What are the common used mirror table for optimizing 8 angle (N, NE, E, SE, S, SW, W, NW) isometric sprites?

Read the article

When should code favour optimization over readability and ease-of-use?

- by jmlane

I am in the process of designing a small library, where one of my design goals is that the API should be as close to the domain language as possible. While working on the design, I've noticed that there are some cases in the code where a more intuitive, readable attribute/method call requires some functionally unnecessary encapsulation. Since the final product will not necessarily require high performance, I am unconcerned about making the decision to favour ease-of-use in my current project over the most efficient implementation of the code in question. I know not to assume readability and ease-of-use are paramount in all expected use-cases, such as when performance is required. I would like to know if there are more general reasons that argue for a design preferring more efficient implementations—even if only marginally so?

Read the article

Can knowing C actually hurt the code you write in higher level languages?

- by Jurily

The question seems settled, beaten to death even. Smart people have said smart things on the subject. To be a really good programmer, you need to know C. Or do you? I was enlightened twice this week. The first one made me realize that my assumptions don't go further than my knowledge behind them, and given the complexity of software running on my machine, that's almost non-existent. But what really drove it home was this Slashdot comment: The end result is that I notice the many naive ways in which traditional C "bare metal" programmers assume that higher level languages are implemented. They make bad "optimization" decisions in projects they influence, because they have no idea how a compiler works or how different a good runtime system may be from the naive macro-assembler model they understand. Then it hit me: C is just one more abstraction, like all others. Even the CPU itself is only an abstraction! I've just never seen it break, because I don't have the tools to measure it. I'm confused. Has my mind been mutilated beyond recovery, like Dijkstra said about BASIC? Am I living in a constant state of premature optimization? Is there hope for me, now that I realized I know nothing about anything? Is there anything to know, even? And why is it so fascinating, that everything I've written in the last five years might have been fundamentally wrong? To sum it up: is there any value in knowing more than the API docs tell me? EDIT: Made CW. Of course this also means now you must post examples of the interpreter/runtime optimizing better than we do :)

Read the article

Finding perfect numbers in C# (optimization)

- by paradox

I coded up a program in C# to find perfect numbers within a certain range as part of a programming challenge . However, I realized it is very slow when calculating perfect numbers upwards of 10000. Are there any methods of optimization that exist for finding perfect numbers? My code is as follows: using System; using System.Collections.Generic; using System.Linq; namespace ConsoleTest { class Program { public static List<int> FindDivisors(int inputNo) { List<int> Divisors = new List<int>(); for (int i = 1; i<inputNo; i++) { if (inputNo%i==0) Divisors.Add(i); } return Divisors; } public static void Main(string[] args) { const int limit = 100000; List<int> PerfectNumbers = new List<int>(); List<int> Divisors=new List<int>(); for (int i=1; i<limit; i++) { Divisors = FindDivisors(i); if (i==Divisors.Sum()) PerfectNumbers.Add(i); } Console.Write("Output ="); for (int i=0; i<PerfectNumbers.Count; i++) { Console.Write(" {0} ",PerfectNumbers[i]); } Console.Write("\n\n\nPress any key to continue . . . "); Console.ReadKey(true); } } }

Read the article

sql-server performance optimization by removing print statements

- by AG

We're going through a round of sql-server stored procedure optimizations. The one recommendation we've found that clearly applies for us is 'SET NOCOUNT ON' at the top of each procedure. (Yes, I've seen the posts that point out issues with this depending on what client objects you run the stored procedures from but these are not issues for us.) So now I'm just trying to add in a bit of common sense. If the benefit of SET NOCOUNT ON is simply to reduce network traffic by some small amount every time, wouldn't it also make sense to turn off all the PRINT statements we have in the stored procedures that we only use for debugging? I can't see how it can hurt performance. OTOH, it's a bit of a hassle to implement due to the fact that some of the print statements are the only thing within else clauses, so you can't just always comment out the one line and be done. The change carries some amount of risk so I don't want to do it if it isn't going to actually help. But I don't see eliminating print statements mentioned anywhere in articles on optimization. Is that because it is so obvious no one bothers to mention it?

Read the article

Simplification / optimization of GPS track

- by GreyCat

I've got a GPS track, produces by gpxlogger(1) (supplied as a client for gpsd). GPS receiver updates its coordinates every 1 second, gpxlogger's logic is very simple, it writes down location (lat, lon, ele) and a timestamp (time) received from GPS every n seconds (n = 3 in my case). After writing down a several hours worth of track, gpxlogger saves several megabyte long GPX file that includes several thousands of points. Afterwards, I try to plot this track on a map and use it with OpenLayers. It works, but several thousands of points make using the map a sloppy and slow experience. I understand that having several thousands of points of suboptimal. There are myriads of points that can be deleted without losing almost anything: when there are several points making up roughly the straight line and we're moving with the same constant speed between them, we can just leave the first and the last point and throw anything else. I thought of using gpsbabel for such track simplification / optimization job, but, alas, it's simplification filter works only with routes, i.e. analyzing only geometrical shape of path, without timestamps (i.e. not checking that the speed was roughly constant). Is there some ready-made utility / library / algorithm available to optimize tracks? Or may be I'm missing some clever option with gpsbabel?

Read the article

Optimization of a c++ matrix/bitmap class

- by Andrew

I am searching a 2D matrix (or bitmap) class which is flexible but also fast element access. The contents A flexible class should allow you to choose dimensions during runtime, and would look something like this (simplified): class Matrix { public: Matrix(int w, int h) : data(new int[x*y]), width(w) {} void SetElement(int x, int y, int val) { data[x+y*width] = val; } // ... private: // symbols int width; int* data; }; A faster often proposed solution using templates is (simplified): template <int W, int H> class TMatrix { TMatrix() data(new int[W*H]) {} void SetElement(int x, int y, int val) { data[x+y*W] = val; } private: int* data; }; This is faster as the width can be "inlined" in the code. The first solution does not do this. However this is not very flexible anymore, as you can't change the size anymore at runtime. So my question is: Is there a possibility to tell the compiler to generate faster code (like when using the template solution), when the size in the code is fixed and generate flexible code when its runtime dependend? I tried to achieve this by writing "const" where ever possible. I tried it with gcc and VS2005, but no success. This kind of optimization would be useful for many other similar cases.

Read the article

MySQL table organization and optimization (Rails)

- by aguynamedloren

I've been learning Ruby on Rails over the past few months with no prior programming experience. Lately, I've been thinking about database optimization and table organization. I know there are great books on the subject, but I typically learn by example / as I go. Here's a hypothetical situation: Let's say I am building a social network for a niche community with 250,000 members (users). The users have the ability to attend events. Let's say there are 50,000 past/present/future events. Much like Facebook events, a user can attend any number of events and an event can have any number of attendees. In the database, there would be a table for users and a table for events. Somehow I would have to create an association between the users and events. I could create an "events" column in the users table such that each user row would contain a hash of event IDs, or I could create an "attendees" column in the events table such that each event row would contain a hash of user IDs. Neither of these solutions seem ideal, however. On a users profile page, I want to display the list of events they are associated with, which would require scanning the 50,000 event rows for the user ID of said user if I include an "attendees" column in the events table. Likewise, on an event page, I want to display a list of attendees for the event, which would require scanning the 250,000 user rows for the event ID of said event if I include an "events" column in the users table. Option 3 would be to create a third table that contains the attendee information for each and every event - but I don't see how this would solve any problems. Are these non-issues? Rails makes accessing all of this information easy, but I guess I'm worried about scale. It is entirely possible that I am under-estimating the speed and processing power of modern databases / servers / etc. How long would it take to scan 250,000 user rows for specific event IDs - 10ms? 100ms? 1,000ms? I guess that's not that bad. Am I just over-thinking this?

Search Results

Search found 4136 results on 166 pages for 'micro optimization'.

Page 6/166 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

- by Ngu Soon Hui

- by SpecialK

- by Summer_More_More_Tea

- by ??iu

- by int3

- by ML

- by rano

- by ana

- by jerryjvl

- by Shea Levy

- by highone

- by Vitaly

- by aaa

- by Dženan

- by keithjgrant

- by mafutrct

- by jmlane

- by Tom

- by jmlane

- by Jurily

- by paradox

- by AG

- by GreyCat

- by Andrew

- by aguynamedloren

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >