Search Results

Search found 4136 results on 166 pages for 'micro optimization'.

Page 65/166 | < Previous Page | 61 62 63 64 65 66 67 68 69 70 71 72  | Next Page >

  • unroll nested for loops in C++

    - by Hristo
    How would I unroll the following nested loops? for(k = begin; k != end; ++k) { for(j = 0; j < Emax; ++j) { for(i = 0; i < N; ++i) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); } } } I tried the following, but my output isn't the same, and it should be: for(k = begin; k != end; ++k) { for(j = 0; j < Emax; ++j) { for(i = 0; i+4 < N; i+=4) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]); array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]); array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]); } if (i < N) { for (; i < N; ++i) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); } } } } I will be running this code in parallel using Intel's TBB so that it takes advantage of multiple cores. After this is finished running, another function prints out what is in array[] and right now, with my unrolling, the output isn't identical. Any help is appreciated. Thanks, Hristo

    Read the article

  • Hierarchical Hibernate, how many queries are executed?

    - by ghost1
    So I've been dealing with a home brew DB framework that has some seriously flaws, the justification for use being that not using an ORM will save on the number of queries executed. If I'm selecting all possibile records from the top level of a joinable object hierarchy, how many separate calls to the DB will be made when using an ORM (such as Hibernate)? I feel like calling bullshit on this, as joinable entities should be brought down in one query , right? Am I missing something here? note: lazy initialization doesn't matter in this scenario as all records will be used.

    Read the article

  • Is this implementation truely tail-recursive?

    - by CFP
    Hello everyone! I've come up with the following code to compute in a tail-recursive way the result of an expression such as 3 4 * 1 + cos 8 * (aka 8*cos(1+(3*4))) The code is in OCaml. I'm using a list refto emulate a stack. type token = Num of float | Fun of (float->float) | Op of (float->float->float);; let pop l = let top = (List.hd !l) in l := List.tl (!l); top;; let push x l = l := (x::!l);; let empty l = (l = []);; let pile = ref [];; let eval data = let stack = ref data in let rec _eval cont = match (pop stack) with | Num(n) -> cont n; | Fun(f) -> _eval (fun x -> cont (f x)); | Op(op) -> _eval (fun x -> cont (op x (_eval (fun y->y)))); in _eval (fun x->x) ;; eval [Fun(fun x -> x**2.); Op(fun x y -> x+.y); Num(1.); Num(3.)];; I've used continuations to ensure tail-recursion, but since my stack implements some sort of a tree, and therefore provides quite a bad interface to what should be handled as a disjoint union type, the call to my function to evaluate the left branch with an identity continuation somehow irks a little. Yet it's working perfectly, but I have the feeling than in calling the _eval (fun y->y) bit, there must be something wrong happening, since it doesn't seem that this call can replace the previous one in the stack structure... Am I misunderstanding something here? I mean, I understand that with only the first call to _eval there wouldn't be any problem optimizing the calls, but here it seems to me that evaluation the _eval (fun y->y) will require to be stacked up, and therefore will fill the stack, possibly leading to an overflow... Thanks!

    Read the article

  • Optimizing Vector elements swaps using CUDA

    - by Orion Nebula
    Hi all, Since I am new to cuda .. I need your kind help I have this long vector, for each group of 24 elements, I need to do the following: for the first 12 elements, the even numbered elements are multiplied by -1, for the second 12 elements, the odd numbered elements are multiplied by -1 then the following swap takes place: Graph: because I don't yet have enough points, I couldn't post the image so here it is: http://www.freeimagehosting.net/image.php?e4b88fb666.png I have written this piece of code, and wonder if you could help me further optimize it to solve for divergence or bank conflicts .. //subvector is a multiple of 24, Mds and Nds are shared memory _shared_ double Mds[subVector]; _shared_ double Nds[subVector]; int tx = threadIdx.x; int tx_mod = tx ^ 0x0001; int basex = __umul24(blockDim.x, blockIdx.x); Mds[tx] = M.elements[basex + tx]; __syncthreads(); // flip the signs if (tx < (tx/24)*24 + 12) { //if < 12 and even if ((tx & 0x0001)==0) Mds[tx] = -Mds[tx]; } else if (tx < (tx/24)*24 + 24) { //if >12 and < 24 and odd if ((tx & 0x0001)==1) Mds[tx] = -Mds[tx]; } __syncthreads(); if (tx < (tx/24)*24 + 6) { //for the first 6 elements .. swap with last six in the 24elements group (see graph) Nds[tx] = Mds[tx_mod + 18]; Mds [tx_mod + 18] = Mds [tx]; Mds[tx] = Nds[tx]; } else if (tx < (tx/24)*24 + 12) { // for the second 6 elements .. swp with next adjacent group (see graph) Nds[tx] = Mds[tx_mod + 6]; Mds [tx_mod + 6] = Mds [tx]; Mds[tx] = Nds[tx]; } __syncthreads(); Thanks in advance ..

    Read the article

  • Should Python import statements always be at the top of a module?

    - by Adam J. Forster
    PEP 08 states: Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants. However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed? Isn't this: class SomeClass(object): def not_often_called(self) from datetime import datetime self.datetime = datetime.now() more efficient than this? from datetime import datetime class SomeClass(object): def not_often_called(self) self.datetime = datetime.now()

    Read the article

  • Splitting tables by field to optimize MySQL?

    - by AK
    Do splitting fields into multiple tables ever yield faster queries? Consider the following two scenarios: Table1 ----------- int PersonID text Value1 float Value2 or Table1 ----------- int PersonID text Value1 Table2 ----------- int PersonID float Value2 If Value1 and Value2 are always being displayed together, I imagine Table1 is always faster because the second schema would require two SELECT statements. But are there any situations where you would choose the second? If the number of records were expected to be really large?

    Read the article

  • Optimizing an embedded SELECT query in mySQL

    - by Crazy Serb
    Ok, here's a query that I am running right now on a table that has 45,000 records and is 65MB in size... and is just about to get bigger and bigger (so I gotta think of the future performance as well here): SELECT count(payment_id) as signup_count, sum(amount) as signup_amount FROM payments p WHERE tm_completed BETWEEN '2009-05-01' AND '2009-05-30' AND completed > 0 AND tm_completed IS NOT NULL AND member_id NOT IN (SELECT p2.member_id FROM payments p2 WHERE p2.completed=1 AND p2.tm_completed < '2009-05-01' AND p2.tm_completed IS NOT NULL GROUP BY p2.member_id) And as you might or might not imagine - it chokes the mysql server to a standstill... What it does is - it simply pulls the number of new users who signed up, have at least one "completed" payment, tm_completed is not empty (as it is only populated for completed payments), and (the embedded Select) that member has never had a "completed" payment before - meaning he's a new member (just because the system does rebills and whatnot, and this is the only way to sort of differentiate between an existing member who just got rebilled and a new member who got billed for the first time). Now, is there any possible way to optimize this query to use less resources or something, and to stop taking my mysql resources down on their knees...? Am I missing any info to clarify this any further? Let me know... EDIT: Here are the indexes already on that table: PRIMARY PRIMARY 46757 payment_id member_id INDEX 23378 member_id payer_id INDEX 11689 payer_id coupon_id INDEX 1 coupon_id tm_added INDEX 46757 tm_added, product_id tm_completed INDEX 46757 tm_completed, product_id

    Read the article

  • Delphi fast large bitmap creation (without clearing)

    - by Ritsaert Hornstra
    When using the TBitmap wrapper for a GDI bitmap from the unit Graphics I noticed it will always clear out the bitmap (using a PatBlt call) when setting up a bitmap with SetSize( w, h ). When I copy in the bits later on (see routine below) it seems ScanLine is the fastest possibility and not SetDIBits. function ToBitmap: TBitmap; var i, N, x: Integer; S, D: PAnsiChar; begin Result := TBitmap.Create(); Result.PixelFormat := pf32bit; Result.SetSize( width, height ); S := Src; D := Result.ScanLine[ 0 ]; x := Integer( Result.ScanLine[ 1 ] ) - Integer( D ); N := width * sizeof( longword ); for i := 0 to height - 1 do begin Move( S^, D^, N ); Inc( S, N ); Inc( D, x ); end; end; The bitmaps I need to work with are quite large (150MB of RGB memory). With these iomages it takes 150ms to simply create an empty bitmap and a further 140ms to overwrite it's contents. Is there a way of initializing a TBitmap with the correct size WITHOUT initializing the pixels itself and leaving the memory of the pixels uninitialized (eg dirty)? Or is there another way to do such a thing. I know we could work on the pixels in place but this still leaves the 150ms of unnessesary initializtion of the pixels.

    Read the article

  • Avoid the use of loops (for) with R

    - by albergali
    Hi, I'm working with R and I have a code like this: i<-1 j<-1 for (i in 1:10) for (j in 1:100) if (data[i] == paths[j,1]) cluster[i,4] <- paths[j,2] where : data is a vector with 100 rows and 1 column paths is a matrix with 100 rows and 5 columns cluster is a matrix with 100 rows and 5 columns My question is: how could I avoid the use of "for" loops to iterate through the matrix? I don't know whether apply functions (lapply, tapply...) are useful in this case. This is a problem when j=10000 for example, because execution time is very long. Thank you

    Read the article

  • MySQL subqueries

    - by swamprunner7
    Can we do this query without subqueries? SELECT login, post_n, (SELECT SUM(vote) FROM votes WHERE votes.post_n=posts.post_n)AS votes, (SELECT COUNT(comments.post_n) FROM comments WHERE comments.post_n=posts.post_n)AS comments_count FROM users, posts WHERE posts.id=users.id AND (visibility=2 OR visibility=3) ORDER BY date DESC LIMIT 0, 15 tables: Users: id, login Posts: post_n, id, visibility Votes: post_n, vote id — it`s user id, Users the main table.

    Read the article

  • When compiling programs to run inside a VM, what should march and mtune be set to?

    - by Russ
    With VMs being slave to whatever the host machine is providing, what compiler flags should be provided to gcc? I would normally think that -march=native would be what you would use when compiling for a dedicated box, but the fine detail that -march=native is going to as indicated in this article makes me extremely wary of using it. So... what to set -march and -mtune to inside a VM? For a specific example... My specific case right now is compiling python (and more) in a linux guest inside a KVM-based "cloud" host that I have no real control over the host hardware (aside from 'simple' stuff like CPU GHz m CPU count, and available RAM). Currently, cpuinfo tells me I've got an "AMD Opteron(tm) Processor 6176" but I honestly don't know (yet) if that is reliable and whether the guest can get moved around to different architectures on me to meet the host's infrastructure shuffling needs (sounds hairy/unlikely). All I can really guarantee is my OS, which is a 64-bit linux kernel where uname -m yields x86_64.

    Read the article

  • Unicorn: Which number of worker processes to use?

    - by blackbird07
    I am running a Ruby on Rails app on a virtual Linux server that is capped at 1GB RAM. Currently, I am constantly hitting the limit and would like to optimize memory utilization. One option I am looking at is reducing the number of unicorn workers. So what is the best way to determine the number of unicorn workers to use? The current setting is 10 workers, but the maximum number of requests per second I have seen on Google Analytics Real-Time is 3 (only scored once at a peak time; in 99% of the time not going above 1 request per second). So is it a save assumption that I can - for now - go with 4 workers, leaving room for unexpected amounts of requests? What are the metrics I should have a look at for determining the number of workers and what are the tools I can use for that on my Ubuntu machine?

    Read the article

  • Should not a tail-recursive function also be faster?

    - by Balint Erdi
    I have the following Clojure code to calculate a number with a certain "factorable" property. (what exactly the code does is secondary). (defn factor-9 ([] (let [digits (take 9 (iterate #(inc %) 1)) nums (map (fn [x] ,(Integer. (apply str x))) (permutations digits))] (some (fn [x] (and (factor-9 x) x)) nums))) ([n] (or (= 1 (count (str n))) (and (divisible-by-length n) (factor-9 (quot n 10)))))) Now, I'm into TCO and realize that Clojure can only provide tail-recursion if explicitly told so using the recur keyword. So I've rewritten the code to do that (replacing factor-9 with recur being the only difference): (defn factor-9 ([] (let [digits (take 9 (iterate #(inc %) 1)) nums (map (fn [x] ,(Integer. (apply str x))) (permutations digits))] (some (fn [x] (and (factor-9 x) x)) nums))) ([n] (or (= 1 (count (str n))) (and (divisible-by-length n) (recur (quot n 10)))))) To my knowledge, TCO has a double benefit. The first one is that it does not use the stack as heavily as a non tail-recursive call and thus does not blow it on larger recursions. The second, I think is that consequently it's faster since it can be converted to a loop. Now, I've made a very rough benchmark and have not seen any difference between the two implementations although. Am I wrong in my second assumption or does this have something to do with running on the JVM (which does not have automatic TCO) and recur using a trick to achieve it? Thank you.

    Read the article

  • Nodes set of the same type with if-test. Make it less.

    - by Kalinin
    How to make the code more beautiful (compact)? <xsl:template match="part"> <table class="part"> <xsl:if test="name != ''"> <tr> <td>????????</td><td><xsl:value-of select="name"/></td> </tr> </xsl:if> <xsl:if test="model != ''"> <tr> <td>??????</td><td><xsl:value-of select="model"/></td> </tr> </xsl:if> <xsl:if test="year != ''"> <tr> <td>???</td><td><xsl:value-of select="year"/></td> </tr> </xsl:if> <xsl:if test="glass_type != ''"> <tr> <td>???</td><td><xsl:value-of select="glass_type"/></td> </tr> </xsl:if> <xsl:if test="scancode != ''"> <tr> <td>???????</td><td><xsl:value-of select="scancode"/></td> </tr> </xsl:if> <xsl:if test="eurocode != ''"> <tr> <td>???????</td><td><xsl:value-of select="eurocode"/></td> </tr> </xsl:if> <xsl:if test="coment != ''"> <tr> <td>???????????</td><td><xsl:value-of select="coment"/></td> </tr> </xsl:if> <xsl:if test="glass_size != ''"> <tr> <td>??????</td><td><xsl:value-of select="glass_size"/></td> </tr> </xsl:if> <xsl:if test="vendor != ''"> <tr> <td>?????????????</td><td><xsl:value-of select="vendor"/></td> </tr> </xsl:if> <xsl:if test="trademark != ''"> <tr> <td>???????? ?????</td><td><xsl:value-of select="trademark"/></td> </tr> </xsl:if> <xsl:if test="fprice != ''"> <tr> <td>????</td><td><xsl:value-of select="fprice"/></td> </tr> </xsl:if> </table> </xsl:template> Update: i wrote: <my:translations xmlns:my="my:my"> <w e="name" r="????????"/> <w e="model" r="??????"/> <w e="year" r="???"/> <w e="glass_type" r="???"/> <w e="scancode" r="???????"/> <w e="eurocode" r="???????"/> <w e="comment" r="???????????"/> <w e="glass_size" r="??????"/> <w e="vendor" r="?????????????"/> <w e="trademark" r="???????? ?????"/> <w e="fprice" r="????"/> </my:translations> <xsl:value-of select="//w/@r"/> And have no result from this code. Is it normal? And how can i get new element w?

    Read the article

  • How can I Query only __key__ on a Google Appengine PolyModel child?

    - by Gabriel
    So the situation is: I want to optimize my code some for doing counting of records. So I have a parent Model class Base, a PolyModel class Entry, and a child class of Entry Article: How would I query Article.key so I can reduce the query load but only get the Article count. My first thought was to use: q = db.GqlQuery("SELECT __key__ from Article where base = :1", i_base) but it turns out GqlQuery doesn't like that because articles are actually stored in a table called Entry. Would it be possible to Query the class attribute? something like: q = db.GqlQuery("select __key__ from Entry where base = :1 and :2 in class", i_base, 'Article') neither of which work. Turns out the answer is even easier. But I am going to finish this question because I looked everywhere for this. q = db.GqlQuery("select __key__ from Entry where base = :1 and class = :2", i_base, 'Article')

    Read the article

  • WPF: Improving Performance for Running on Older PCs

    - by Phil Sandler
    So, I'm building a WPF app and did a test deployment today, and found that it performed pretty poorly. I was surprised, as we are really not doing much in the way of visual effects or animations. I deployed on two machines: the fastest and the slowest that will need to run the application (the slowest PC has an Intel Celeron 1.80GHz with 2GB RAM). The application ran pretty well on the faster machine, but was choppy on the slower machine. And when I say "choppy", I mean the cursor jumped even just passing it over any open window of the app that had focus. I opened the Task Manager Performance window, and could see that the CPU usage jumped whenever the app had focus and the cursor was moving over it. If I gave focus to another (e.g. Excel), the CPU usage went back down after a second. This happened on both machines, but the choppiness was only noticeable on the slower machine. I had very limited time to tinker on the deployment machines, so didn't do a lot of detailed testing. The app runs fine on my development machine, but I also see the CPU spiking up to 10% there, just running the cursor over the window. I downloaded the WPF performance tool from MS and have been tinkering with it (on my dev machine). The docs say this about the "Frame Rate" metric in the Perforator tool: For applications without animation, this value should be near 0. The app is not doing any heavy animation, but the frame rate stays near 50 when the cursor is over any window. The screens I tested on have column headers in a grid that "highlight" and buttons that change color and appearance when scrolled over. Even moving the mouse on blank areas of the windows cause the same Frame rate and CPU usage (doesn't seem to be related to these minor animations). (Also, I am unable to figure out how to get anything but the two default tools--Perforator and Visual Profiler--installed into the WPF performance tool. That is probably a separate question). I also have Redgate's profiling tool, but I'm not sure if that can shed any light on rendering performance. So, I realize this is not an easy thing to troubleshoot without specifics or sample code (which I can't post). My questions are: What are some general things to look for (or avoid) in the code to improve performance? What steps can I take using the WPF performance tool to narrow down the problem? Is the PC spec listed above (Intel Celeron 1.80GHz with 2GB RAM) too slow to be running even vanilla WPF applications?

    Read the article

  • How to store static content across branches in a single location in version control

    - by Shravan
    [Just a random thought] I have a pdf doc that is downloaded when the user clicks on 'help' on my website. Now, this is a pretty huge document and is saved in version control (SVN) and is thus copied for all branches that exist in SVN. This is static content and something that developers are not working on, and does not change often. Is there a more efficient way to store it (that would not hamper local deployments) that would make SVN checkouts and updates relatively faster. I know the benefit we get is not huge, this is something that came to my head none the less.

    Read the article

  • Javascriptlibrary more efficient than Rickshaw for realtime visualizations

    - by dan kutz
    I want to visualize data as time-series graphs on mobile devices(tablets) and therefore stumbled upon rickshaw, which is based on D3. First I must say I was a little bit confused when I realized that realtime in web design is defined totally different to realtime in engineering which has fixed(and often very short) timeframes. Anyway my aim is to visualize the data as fast as possible, and on older tablets visualization with rickshaw is quite slow. Can anybody recommend another library, which may be more efficient in rendering? Or is there no way out and I have to go native? regards Dan.

    Read the article

  • How to optimize an database suggestion engine

    - by Dimitar Vouldjeff
    Hi, I`m making an online engine for item-to-item recommending movies. I have made some researches and I think that the best way to implement that is using pearson correlation and make a table with item1, item2 and correlation fields, but the problem is that after each rate of item I have to regenerate the correlation for in the worst case N records (where N is the number of items). Another think that I read is the following article, but I haven`t thought a way to implement it. So what is your suggestion to optimize this process? Or any other suggestions? Thanks.

    Read the article

  • Vector [] vs copying

    - by sak
    What is faster and/or generally better? vector<myType> myVec; int i; myType current; for( i = 0; i < 1000000; i ++ ) { current = myVec[ i ]; doSomethingWith( current ); doAlotMoreWith( current ); messAroundWith( current ); checkSomeValuesOf( current ); } or vector<myType> myVec; int i; for( i = 0; i < 1000000; i ++ ) { doSomethingWith( myVec[ i ] ); doAlotMoreWith( myVec[ i ] ); messAroundWith( myVec[ i ] ); checkSomeValuesOf( myVec[ i ] ); } I'm currently using the first solution. There are really millions of calls per second and every single bit comparison/move is performance-problematic.

    Read the article

  • Why is Javascript's Math.floor the slowest way to calculate floor in Javascript?

    - by z5h
    I'm generally not a fan of microbenchmarks. But this one has a very interesting result. http://ernestdelgado.com/archive/benchmark-on-the-floor/ It suggests that Math.floor is the SLOWEST way to calculate floor in Javascript. ~~n, n|n, n&n all being faster. This seems pretty shocking as I would expect that people implementing Javascript in today's modern browsers would be some pretty smart people. Does floor do something important that the other methods fail to do? Is there any reason to use it?

    Read the article

  • Does the order of columns in a query matter?

    - by James Simpson
    When selecting columns from a MySQL table, is performance affected by the order that you select the columns as compared to their order in the table (not considering indexes that may cover the columns)? For example, you have a table with rows uid, name, bday, and you have the following query. SELECT uid, name, bday FROM table Does MySQL see the following query any differently and thus cause any sort of performance hit? SELECT uid, bday, name FROM table

    Read the article

  • Is it better to echo javascript in raw format with php, or echo a script include that has been minif

    - by Scarface
    Hey guys quick question, I am currently echoing a lot of javascript that is based conditionally on login status and other variables. I was wondering if it would be better to simply echo the script include like <script type="text/javascript" src="javascript/openlogin.js"></script> that has been run through a minifying program and been gzipped or to echo the full script in raw format. The latter suggestion is messier to me but it reduces http requests while the latter would probably be smaller but take more cpu? Just wondering what some other people think. Thanks in advance for any advice.

    Read the article

  • NHibernate, each property is filled with a different select statement

    - by Eitan
    I'm retrieving a list of nhibernate entites which have relationships to other tables/entities. I've noticed instead of NHibernate performing JOINS and populating the properties, it retrieves the entity and then calls a select for each property. For example if a user can have many roles and I retrieve a user from the DB, Nhibernate retrieves the user and then populates the roles with another select statement. The problem is that I want to retrieve oh let's say a list of products which have various many-to-many relationships and relationships to items which have their own relationships. In the end I'm left with over a thousand DB calls to retrieve a list of 30 products. Thanks. I've also set default lazy loading to false because whenever I save the list of entities to a session, I get an error when trying to retrieve it on another page: LazyInitializationException: could not initialize proxy If anybody could shed any light I would truly appreciate it. Thanks. Eitan

    Read the article

  • The explain tells that the query is awful (it doesn't use a single key) but I'm using LIMIT 1. Is th

    - by Ricardo
    The explain command with the query: explain SELECT * FROM leituras WHERE categorias_id=75 AND textos_id=190304 AND cookie='3f203349ce5ad3c67770ebc882927646' AND endereco_ip='127.0.0.1' LIMIT 1 The result: id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE leituras ALL (null) (null) (null) (null) 1022597 Using where Will it make any difference adding some keys on the table? Even that the query will always return only one row.

    Read the article

< Previous Page | 61 62 63 64 65 66 67 68 69 70 71 72  | Next Page >