Search Results

Search found 14719 results on 589 pages for 'optimization level'.

Page 7/589 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >

  • Performance Optimization for Matrix Rotation

    - by Summer_More_More_Tea
    Hello everyone: I'm now trapped by a performance optimization lab in the book "Computer System from a Programmer's Perspective" described as following: In a N*N matrix M, where N is multiple of 32, the rotate operation can be represented as: Transpose: interchange elements M(i,j) and M(j,i) Exchange rows: Row i is exchanged with row N-1-i A example for matrix rotation(N is 3 instead of 32 for simplicity): ------- ------- |1|2|3| |3|6|9| ------- ------- |4|5|6| after rotate is |2|5|8| ------- ------- |7|8|9| |1|4|7| ------- ------- A naive implementation is: #define RIDX(i,j,n) ((i)*(n)+(j)) void naive_rotate(int dim, pixel *src, pixel *dst) { int i, j; for (i = 0; i < dim; i++) for (j = 0; j < dim; j++) dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)]; } I come up with an idea by inner-loop-unroll. The result is: Code Version Speed Up original 1x unrolled by 2 1.33x unrolled by 4 1.33x unrolled by 8 1.55x unrolled by 16 1.67x unrolled by 32 1.61x I also get a code snippet from pastebin.com that seems can solve this problem: void rotate(int dim, pixel *src, pixel *dst) { int stride = 32; int count = dim >> 5; src += dim - 1; int a1 = count; do { int a2 = dim; do { int a3 = stride; do { *dst++ = *src; src += dim; } while(--a3); src -= dim * stride + 1; dst += dim - stride; } while(--a2); src += dim * (stride + 1); dst -= dim * dim - stride; } while(--a1); } After carefully read the code, I think main idea of this solution is treat 32 rows as a data zone, and perform the rotating operation respectively. Speed up of this version is 1.85x, overwhelming all the loop-unroll version. Here are the questions: In the inner-loop-unroll version, why does increment slow down if the unrolling factor increase, especially change the unrolling factor from 8 to 16, which does not effect the same when switch from 4 to 8? Does the result have some relationship with depth of the CPU pipeline? If the answer is yes, could the degrade of increment reflect pipeline length? What is the probable reason for the optimization of data-zone version? It seems that there is no too much essential difference from the original naive version. EDIT: My test environment is Intel Centrino Duo processor and the verion of gcc is 4.4 Any advice will be highly appreciated! Kind regards!

    Read the article

  • Need help with basic optimization problem

    - by ??iu
    I know little of optimization problems, so hopefully this will be didactic for me: rotors = [1, 2, 3, 4...] widgets = ['a', 'b', 'c', 'd' ...] assert len(rotors) == len(widgets) part_values = [ (1, 'a', 34), (1, 'b', 26), (1, 'c', 11), (1, 'd', 8), (2, 'a', 5), (2, 'b', 17), .... ] Given a fixed number of widgets and a fixed number of rotors, how can you get a series of widget-rotor pairs that maximizes the total value where each widget and rotor can only be used once?

    Read the article

  • C++ Performance/memory optimization guidelines

    - by ML
    Hi All, Does anyone have a resource for C++ memory optimization guidelines? Best practices, tuning, etc? As an example: Class xxx { public: xxx(); virtual ~xxx(); protected: private: }; Would there be ANY benefit on the compiler or memory allocation to get rid of protected and private since there there are no items that are protected and private in this class?

    Read the article

  • Any Javascript optimization benchmarks?

    - by int3
    I watched Nicholas Zakas' talk, Speed up your Javascript, with some interest. I liked how he benchmarked the various performance improvements created by various optimization techniques, e.g. reducing calls to deeply nested objects, changing loops to count down instead of up, etc. I would like to run these benchmarks myself though, to see exactly how our current browsers are faring. I guess it wouldn't be too difficult to cook up some timed loops, but I'd like to know if there are any existing implementations out there.

    Read the article

  • Hibernate - query caching/second level cache does not work by value object containing subitems

    - by Zoltan Hamori
    Hi! I have been struggling with the following problem: I have a value object containing different panels. Each panel has a list of fields. Mapping: <class name="com.aviseurope.core.application.RACountryPanels" table="CTRY" schema="DBDEV1A" where="PEARL_CTRY='Y'" lazy="join"> <cache usage="read-only"/> <id name="ctryCode"> <column name="CTRY_CD_ID" sql-type="VARCHAR2(2)" not-null="true"/> </id> <bag name="panelPE" table="RA_COUNTRY_MAPPING" fetch="join" where="MANDATORY_FLAG!='N'"> <key column="COUNTRY_LOCATION_ID"/> <many-to-many class="com.aviseurope.core.application.RAFieldVO" column="RA_FIELD_MID" where="PANEL_ID='PE'"/> </bag> </class> I use the following criteria to get the value object: Session m_Session = HibernateUtil.currentSession(); m_Criteria = m_Session.createCriteria(RACountryPanels.class); m_Criteria.add(Expression.eq("ctryCode", p_Country)); m_Criteria.setCacheable(true); As I see the query cache contains only the main select like select * from CTRY where ctry_cd_id=? Both RACountryPanels and RAFieldVO are second level cached. If I check the 2nd level cache content I can see that it cointains the RAFields and the RACountryPanels as well and I can see the select .. from CTRY where ctry_cd_id=... in query cache region as well. When I call the servlet it seems that it is using the cache, but second time not. If I check the content of the cache using JMX, everything seems to be ok, but when I measure the object access time, it seems that it does not always use the cache. Cheers Zoltan

    Read the article

  • Mysql Server Optimization

    - by Ish Kumar
    Hi Geeks, We are having serious MySQL(InnoDB) performance issues at a moment when we do: (10-20) insertions on TABLE1 (10-20) updates on TABLE2 Note: Both above operations happens within fraction of a second. And this occurs every few (10-15) minutes. And all online users (approx 400-600) doing read operation on join of TABLE1 & TABLE2 every 1 second. Here is our mysql configuration info: http://docs.google.com/View?id=dfrswh7c_117fmgcmb44 Issues: Lot queries wait and expire later (saw it from phpmyadmin / processes). My poor MySQL server crashes sometimes Questions Q1: Any suggestions to optimize at MySQL level? Q2: I thinking to use persistent connections at application level, is it right? Info Added Later: Database Engine: InnoDB TABLE1 : 400,000 rows (inserting 8,000 daily) & TABLE2: 8,000 rows 1 second query: SELECT b.id, b.user_id, b.description, b.debit, b.created, b.price, u.username, u.email, u.mobile FROM TABLE1 b, TABLE2 u WHERE b.credit = 0 AND b.user_id = u.id AND b.auction_id = "12345" ORDER BY b.id DESC LIMIT 10; // there are few more but they are not so critical. Indexing is good, we are using them wisely. In above query all id's are indexed And TABLE1 has frequent insertions and TABLE2 has frequent updates.

    Read the article

  • Java code optimization leads to numerical inaccuracies and errors

    - by rano
    I'm trying to implement a version of the Fuzzy C-Means algorithm in Java and I'm trying to do some optimization by computing just once everything that can be computed just once. This is an iterative algorithm and regarding the updating of a matrix, the clusters x pixels membership matrix U, this is the update rule I want to optimize: where the x are the element of a matrix X (pixels x features) and v belongs to the matrix V (clusters x features). And m is a parameter that ranges from 1.1 to infinity. The distance used is the euclidean norm. If I had to implement this formula in a banal way I'd do: for(int i = 0; i < X.length; i++) { int count = 0; for(int j = 0; j < V.length; j++) { double num = D[i][j]; double sumTerms = 0; for(int k = 0; k < V.length; k++) { double thisDistance = D[i][k]; sumTerms += Math.pow(num / thisDistance, (1.0 / (m - 1.0))); } U[i][j] = (float) (1f / sumTerms); } } In this way some optimization is already done, I precomputed all the possible squared distances between X and V and stored them in a matrix D but that is not enough, since I'm cycling througn the elements of V two times resulting in two nested loops. Looking at the formula the numerator of the fraction is independent of the sum so I can compute numerator and denominator independently and the denominator can be computed just once for each pixel. So I came to a solution like this: int nClusters = V.length; double exp = (1.0 / (m - 1.0)); for(int i = 0; i < X.length; i++) { int count = 0; for(int j = 0; j < nClusters; j++) { double distance = D[i][j]; double denominator = D[i][nClusters]; double numerator = Math.pow(distance, exp); U[i][j] = (float) (1f / (numerator * denominator)); } } Where I precomputed the denominator into an additional column of the matrix D while I was computing the distances: for (int i = 0; i < X.length; i++) { for (int j = 0; j < V.length; j++) { double sum = 0; for (int k = 0; k < nDims; k++) { final double d = X[i][k] - V[j][k]; sum += d * d; } D[i][j] = sum; D[i][B.length] += Math.pow(1 / D[i][j], exp); } } By doing so I encounter numerical differences between the 'banal' computation and the second one that leads to different numerical value in U (not in the first iterates but soon enough). I guess that the problem is that exponentiate very small numbers to high values (the elements of U can range from 0.0 to 1.0 and exp , for m = 1.1, is 10) leads to ver y small values, whereas by dividing the numerator and the denominator and THEN exponentiating the result seems to be better numerically. The problem is it involves much more operations. Am I doing something wrong? Is there a possible solution to get both the code optimized and numerically stable? Any suggestion or criticism will be appreciated.

    Read the article

  • How to clear the entire second level cache in NHibernate

    - by Bittercoder
    I wish to clear the entire second level cache in NHibernate via code. Is there a way to do this which is independent of the cache provider being used? (we have customers using both memcache and syscache within the same application). We wish to clear the entire cache because changes external the database would have occurred (which we have no guarantees re: which tables/entities were affected).

    Read the article

  • QT warning level suggestion

    - by metdos
    What is the warning level you use while compiling QT projects? When I compiled with W4, I'm getting a lot of warnings such as: C4127: conditional expression is constant Should I compile at W3, or find other ways to handle warnings at W4, such as: adding a new header file and using pragma's(mentioned here C++ Coding Standards: 101 Rules, Guidelines, and Best Practices). What are your practices? Thansk.

    Read the article

  • PHP website Optimization

    - by ana
    I have a high traffic website and I need make sure my site is fast enough to display my pages to everyone rapidly. I searched on Google many articles about speed and optimization and here's what I found: Cache the page Save it to the disk Caching the page in memory: This is very fast but if I need to change the content of my page I have to remove it from cache and then re-save the file on the disk. Save it to disk This is very easy to maintain but every time the page is accessed I have to read on the disk. Which method should I go with? Thanks

    Read the article

  • Performance optimization strategies of last resort?

    - by jerryjvl
    There are plenty of performance questions on this site already, but it occurs to me that almost all are very problem-specific and fairly narrow. And almost all repeat the advice to avoid premature optimization. Let's assume: the code already is working correctly the algorithms chosen are already optimal for the circumstances of the problem the code has been measured, and the offending routines have been isolated all attempts to optimize will also be measured to ensure they do not make matters worse What I am looking for here is strategies and tricks to squeeze out up to the last few percent in a critical algorithm when there is nothing else left to do but whatever it takes. Ideally, try to make answers language agnostic, and indicate any down-sides to the suggested strategies where applicable. I'll add a reply with my own initial suggestions, and look forward to whatever else the SO community can think of.

    Read the article

  • Does MATLAB perform tail call optimization?

    - by Shea Levy
    I've recently learned Haskell, and am trying to carry the pure functional style over to my other code when possible. An important aspect of this is treating all variables as immutable, i.e. constants. In order to do so, many computations that would be implemented using loops in an imperative style have to be performed using recursion, which typically incurs a memory penalty due to the allocation a new stack frame for each function call. In the special case of a tail call (where the return value of a called function is immediately returned to the callee's caller), however, this penalty can be bypassed by a process called tail call optimization (in one method, this can be done by essentially replacing a call with a jmp after setting up the stack properly). Does MATLAB perform TCO by default, or is there a way to tell it to?

    Read the article

  • Including associations optimization in Rails

    - by Vitaly
    Hey, I'm looking for help with Ruby optimization regarding loading of associations on demand. This is simplified example. I have 3 models: Post, Comment, User. References are: Post has many comments and Comment has reference to User (:author). Now when I go to the post page, I expect to see post body + all comments (and their respective authors names). This requires following 2 queries: select * from Post -- to get post data (1 row) select * from Comment inner join User -- to get comment + usernames (N rows) In the code I have: Post.find(params[:id], :include => { :comments => [:author] } But it doesn't work as expected: as I see in the back end, there're still N+1 hits (some of them are cached though). How can I optimize that?

    Read the article

  • why optimization does not happen?

    - by aaa
    hi. I have C/C++ code, that looks like this: static int function(double *I) { int n = 0; // more instructions, loops, for (int i; ...; ++i) n += fabs(I[i] > tolerance); return n; } function(I); // return value is not used. compiler inlines function, however it does not optimize out n manipulations. I would expect compiler is able to recognize that value is never used as rhs only. Is there some side effect, which prevents optimization? Thanks

    Read the article

  • Static variable for optimization

    - by keithjgrant
    I'm wondering if I can use a static variable for optimization: public function Bar() { static $i = moderatelyExpensiveFunctionCall(); if ($i) { return something(); } else { return somethingElse(); } } I know that once $i is initialized, it won't be changed by by that line of code on successive calls to Bar(). I assume this means that moderatelyExpensiveFunctionCall() won't be evaluated every time I call, but I'd like to know for certain. Once PHP sees a static variable that has been initialized, does it skip over that line of code? In other words, is this going to optimize my execution time if I make a lot of calls to Bar(), or am I wasting my time?

    Read the article

  • Copy method optimization in compilers

    - by Dženan
    Hi All! I have the following code: void Stack::operator =(Stack &rhs) { //do the actual copying } Stack::Stack(Stack &rhs) //copy-constructor { top=NULL; //initialize this as an empty stack (which it is) *this=rhs; //invoke assignment operator } Stack& Stack::CopyStack() { return *this; //this statement will invoke copy contructor } It is being used like this: unsigned Stack::count() { unsigned c=0; Stack copy=CopyStack(); while (!copy.empty()) { copy.pop(); c++; } return c; } Removing reference symbol from declaration of CopyStack (returning a copy instead of reference) makes no difference in visual studio 2008 (with respect to number of times copying is invoked). I guess it gets optimized away - normally it should first make a copy for the return value, then call assignment operator once more to assign it to variable sc. What is your experience with this sort of optimization in different compilers? Regards, Dženan

    Read the article

  • Common optimization rules

    - by mafutrct
    This is a dangerous question, so let me try to phrase it correctly. Premature optimization is the root of all evil, but if you know you need it, there is a basic set of rules that should be considered. This set is what I'm wondering about. For instance, imagine you got a list of a few thousand items. How do you look up an item with a specific, unique ID? Of course, you simply use a Dictionary to map the ID to the item. And if you know that there is a setting stored in a database that is required all the time, you simply cache it instead of issuing a database request hundred times a second. I guess there are a few even more basic ideas. I am specifically not looking for "don't do it, for experts: don't do it yet" or "use a profiler" answers, but for really simple, general hints. If you feel this is an argumentative question, you probably misunderstood my intention.

    Read the article

  • How do I generate a level randomly?

    - by Charlton Santana
    I am currently hard coding 10 different instances like the code below, but but I'd like to create many more. Instead of having the same layout for the new level, I was wondering if there is anyway to generate a random X value for each block (this will be how far into the level it is). A level 100,000 pixels wide would be good enough but if anyone knows a system to make the level go on and on, I'd like to know that too. This is basically how I define a block now (with irrelevant code removed): block = new Block(R.drawable.block, 400, platformheight); block2 = new Block(R.drawable.block, 600, platformheight); block3 = new Block(R.drawable.block, 750, platformheight); The 400 is the X position, which I'd like to place randomly through the level, the platformheight variable defines the Y position which I don't want to change.

    Read the article

  • Which isometric angles can be mirrored (and otherwise transformed) for optimization?

    - by Tom
    I am working on a basic isometric game, and am struggling to find the correct mirrors. Mirror can be any form of transform. I have managed to get SE out of SW, by scaling the sprite on X axis by -1. Same applies for NE angle. Something is bugging me, that I should be able to also mirror N to S, but I cannot manage to pull this one off. Am I just too sleepy and trying to do the impossible, or a basic -1 scale on Y axis is not enough? What are the common used mirror table for optimizing 8 angle (N, NE, E, SE, S, SW, W, NW) isometric sprites?

    Read the article

  • When should an API favour optimization over readability and ease-of-use?

    - by jmlane
    I am in the process of designing a small library, where one of my design goals is to use as much of the native domain language as possible in the API. While doing so, I've noticed that there are some cases in the API outline where a more intuitive, readable attribute/method call requires some functionally unnecessary encapsulation. Since the final product will not necessarily require high performance, I am unconcerned about making the decision to favour ease-of-use in my current project over the most efficient implementation of the code in question. I know not to assume readability and ease-of-use are paramount in all expected use-cases, such as when performance is required. I would like to know if there are more general reasons that argue for an API design preferring (marginally) more efficient implementations?

    Read the article

  • When should code favour optimization over readability and ease-of-use?

    - by jmlane
    I am in the process of designing a small library, where one of my design goals is that the API should be as close to the domain language as possible. While working on the design, I've noticed that there are some cases in the code where a more intuitive, readable attribute/method call requires some functionally unnecessary encapsulation. Since the final product will not necessarily require high performance, I am unconcerned about making the decision to favour ease-of-use in my current project over the most efficient implementation of the code in question. I know not to assume readability and ease-of-use are paramount in all expected use-cases, such as when performance is required. I would like to know if there are more general reasons that argue for a design preferring more efficient implementations—even if only marginally so?

    Read the article

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >