premature optimization - Page 37

Smoothing found path on grid

- by Denis Ermolin

I implemented several approaches, such as A* and potential fields, for my tower defense game. But I want smooth paths. First I tried finding paths on a very fine grid (5x5 pixels per tile), but that is extremely slow. I found a nice video showing an RTS demo where paths are found on a coarse grid but units don't move from cell center to cell center. How do I implement such behavior? Some examples would be great.

Read the article

Component-wise GLSL vector branching

- by Gustavo Maciel

I'm aware that it is usually a BAD idea to operate on a GLSL vec's components separately. For example:

    // Use intrinsic functions; they do the calculation on 4 components at a time.
    float dot = v1.x*v2.x + v1.y * v2.y + v1.z * v2.z; //NEVER
    float dot = dot(v1, v2); //YES

    // Multiplying one by one is not good either, since the ALU can do the 4 components at a time too.
    vec3 mul = vec3(v1.x * v2.x, v1.y * v2.y, v1.z * v2.z); //NEVER
    vec3 mul = v1 * v2;

I've been struggling with the question of whether there are equivalent operations for branching. For example:

    vec4 Overlay(vec4 v1, vec4 v2, vec4 opacity)
    {
        bvec4 less = lessThan(v1, vec4(0.5));
        vec4 blend;
        for(int i = 0; i < 4; ++i)
        {
            if(less[i])
                blend[i] = 2.0 * v1[i]*v2[i];
            else
                blend[i] = 1.0 - 2.0 * (1.0 - v1[i])*(1.0 - v2[i]);
        }
        return v1 + (blend-v1)*opacity;
    }

This is an Overlay operator that works component-wise. I'm not sure if this is the best way to do it, since I'm afraid the for and if could become a bottleneck later. TL;DR: can I branch component-wise? If yes, how can I optimize that Overlay function with it?
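A common way to remove a per-component branch is to compute both results and select between them with a 0/1 weight; in GLSL that role is played by the built-ins step and mix. The sketch below only checks the arithmetic on plain Python lists (Python is used for illustration here, not as shader code). The names overlay_branch and overlay_select are made up for this example, and the step and mix helpers are assumed to mirror the GLSL built-ins of the same name.

```python
import math

def step(edge, x):
    # Mirrors GLSL step(): 0.0 if x < edge, else 1.0.
    return 0.0 if x < edge else 1.0

def mix(x, y, a):
    # Mirrors GLSL mix(): linear blend x*(1-a) + y*a.
    return x * (1.0 - a) + y * a

def overlay_branch(v1, v2, opacity):
    # Per-component branch, as in the question's Overlay function.
    out = []
    for a, b, o in zip(v1, v2, opacity):
        if a < 0.5:
            blend = 2.0 * a * b
        else:
            blend = 1.0 - 2.0 * (1.0 - a) * (1.0 - b)
        out.append(a + (blend - a) * o)
    return out

def overlay_select(v1, v2, opacity):
    # Branch-free form: compute both candidates, select with a 0/1 weight.
    out = []
    for a, b, o in zip(v1, v2, opacity):
        dark = 2.0 * a * b
        light = 1.0 - 2.0 * (1.0 - a) * (1.0 - b)
        blend = mix(dark, light, step(0.5, a))
        out.append(a + (blend - a) * o)
    return out
```

In GLSL the selection could be written on whole vectors as mix(dark, light, step(vec4(0.5), v1)), with no loop. Whether that beats the branch depends on the compiler and the GPU, so it is worth measuring.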

Read the article

How can I replicate Google Page Speed's lossless image compression as part of my workflow?

- by Keefer

I love that Google's Page Speed is able to losslessly compress a lot of my images, but I'd love to make it part of my workflow, prior to uploading a site and making it live. Is there anything I can run locally to give me the same lossless compression? I currently export images with Photoshop's Export For Web, and use a little application called PNGCrusher to reduce the file size of PNGs. I'd love to find a faster way, though, than saving out and replacing the individual images from Page Speed's results.

Read the article

Research about best way to present multiple products on one page

- by Michael Dibbets

I am updating a webshop page. This is a fairly simple page that displays all the products that we currently sell. The page in development is visible here ( https://www.ortho.nl/wwebshop ). I am curious what the best way is to present multiple products on one page, since I can't find anything via Google etc. (I probably don't know the right keywords). Should you use borders? Should you use colours? Which colours? What kind of tweaks will direct the customer's attention to the right place? Does anyone know from experience or from research (and could you point me in the right direction to find that research) what the best way to present products is so that conversion/clickthrough is optimised?

Read the article

Wikipedia A* pathfinding algorithm takes a lot of time

- by Vee

I've successfully implemented A* pathfinding in C#, but it is very slow, and I don't understand why. I even tried not sorting the openNodes list, but it's still the same. The map is 80x80, and there are 10-11 nodes. I took the pseudocode from Wikipedia, and this is my implementation:

    public static List<PGNode> Pathfind(PGMap mMap, PGNode mStart, PGNode mEnd)
    {
        mMap.ClearNodes();
        mMap.GetTile(mStart.X, mStart.Y).Value = 0;
        mMap.GetTile(mEnd.X, mEnd.Y).Value = 0;

        List<PGNode> openNodes = new List<PGNode>();
        List<PGNode> closedNodes = new List<PGNode>();
        List<PGNode> solutionNodes = new List<PGNode>();

        mStart.G = 0;
        mStart.H = GetManhattanHeuristic(mStart, mEnd);

        solutionNodes.Add(mStart);
        solutionNodes.Add(mEnd);

        openNodes.Add(mStart); // 1) Add the starting square (or node) to the open list.

        while (openNodes.Count > 0) // 2) Repeat the following:
        {
            openNodes.Sort((p1, p2) => p1.F.CompareTo(p2.F));
            PGNode current = openNodes[0]; // a) We refer to this as the current square.)

            if (current == mEnd)
            {
                while (current != null)
                {
                    solutionNodes.Add(current);
                    current = current.Parent;
                }
                return solutionNodes;
            }

            openNodes.Remove(current);
            closedNodes.Add(current); // b) Switch it to the closed list.

            List<PGNode> neighborNodes = current.GetNeighborNodes();
            double cost = 0;
            bool isCostBetter = false;

            for (int i = 0; i < neighborNodes.Count; i++)
            {
                PGNode neighbor = neighborNodes[i];
                cost = current.G + 10;
                isCostBetter = false;

                if (neighbor.Passable == false || closedNodes.Contains(neighbor))
                    continue; // If it is not walkable or if it is on the closed list, ignore it.

                if (openNodes.Contains(neighbor) == false)
                {
                    openNodes.Add(neighbor); // If it isn’t on the open list, add it to the open list.
                    isCostBetter = true;
                }
                else if (cost < neighbor.G)
                {
                    isCostBetter = true;
                }

                if (isCostBetter)
                {
                    neighbor.Parent = current; // Make the current square the parent of this square.
                    neighbor.G = cost;
                    neighbor.H = GetManhattanHeuristic(current, neighbor);
                }
            }
        }

        return null;
    }

Here's the heuristic I'm using:

    private static double GetManhattanHeuristic(PGNode mStart, PGNode mEnd)
    {
        return Math.Abs(mStart.X - mEnd.X) + Math.Abs(mStart.Y - mEnd.Y);
    }

What am I doing wrong? I've been staring at the same code for an entire day.
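In the implementation above, the open list is fully re-sorted on every iteration and both lists are searched linearly with Contains, which gets expensive on an 80x80 map. A minimal sketch of the usual alternative follows: a binary heap for the open list and a hash set for the closed list. It is written in Python rather than C#, assumes a 4-connected grid given as a list of strings with '#' as a wall (a format made up for this example), and always measures the heuristic from the neighbor to the goal.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; cells are (row, col), '#' is a wall (assumed format)."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance from the cell to the goal.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # entries are (f, g, cell)
    best_g = {start: 0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue                     # stale heap entry, already expanded
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        closed.add(cur)
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == '#' or nxt in closed:
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float('inf')):
                best_g[nxt] = ng
                parent[nxt] = cur
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None                          # open list exhausted: no path
```

Instead of updating a node in place, the sketch pushes a new heap entry and skips stale ones when they are popped, which keeps each step at O(log n) for the heap plus O(1) for the set lookups.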

Read the article

Which of these algorithms is best for my goal?

- by JonathonG

I have created a program that restricts the mouse to a certain region based on a black/white bitmap. The program is 100% functional as-is, but uses an inaccurate, albeit fast, algorithm for repositioning the mouse when it strays outside the area.

Currently, when the mouse moves outside the area, basically what happens is this:

- A line is drawn between a pre-defined static point inside the region and the mouse's new position.
- The point where that line intersects the edge of the allowed area is found.
- The mouse is moved to that point.

This works, but only works perfectly for a perfect circle with the pre-defined point set in the exact center. Unfortunately, this will never be the case. The application will be used with a variety of rectangles and irregular, amorphous shapes. On such shapes, the point where the line drawn intersects the edge will usually not be the closest point on the shape to the mouse.

I need to create a new algorithm that finds the closest point to the mouse's new position on the edge of the allowed area. I have several ideas about this, but I am not sure of their validity, in that they may have far too much overhead. While I am not asking for code, it might help to know that I am using Objective C / Cocoa, developing for OS X, as I feel the language being used might affect the efficiency of potential methods.

My ideas are:

- Using a bit of trigonometry to project lines would work, but that would require some kind of intense algorithm to test every point on every line until it found the edge of the region... That seems too resource intensive, since there could be something like 200 lines that would each have to have as many as 200 pixels checked for black/white.
- Using something like an A* pathing algorithm to find the shortest path to a black pixel; however, A* seems resource intensive, even though I could probably restrict it to only checking roughly in one direction. It also seems like it will take more time and effort than I have available to spend on this small portion of the much larger project I am working on; correct me if I am wrong and it would not be a significant amount of code (100 lines or around there).
- Mapping the border of the region before the application begins running the event tap loop. I think I could accomplish this by using my current line-based algorithm to find an edge point and then initiating an algorithm that checks all 8 pixels around that pixel, finds the next border pixel in one direction, and continues to do this until it comes back to the starting pixel. I could then store that data in an array to be used for the entire duration of the program, and have the mouse re-positioning method check the array for the closest pixel on the border to the mouse target position.

That last method would presumably execute its initial border mapping fairly quickly. (It would only have to map between 2,000 and 8,000 pixels, which means 8,000 to 64,000 checked, and I could even permanently store the data to make launching faster.) However, I am uncertain as to how much overhead it would take to scan through that array for the shortest distance for every single mouse move event... I suppose there could be a shortcut to restrict the number of elements in the array that will be checked to a variable number starting with the intersecting point on the line (from my original algorithm), and raise/lower that number to experiment with the overhead/accuracy tradeoff.

Please let me know if I am overthinking this and there is an easier way that will work just fine, or which of these methods would be able to execute something like 30 times per second to keep mouse movement smooth, or if you have a better/faster method.

I've posted relevant parts of my code below for reference, and included an example of what the area might look like. (I check for color value against a loaded bitmap that is black/white.)

    //
    // This part of my code runs every single time the mouse moves.
    //
    CGPoint point = CGEventGetLocation(event);
    float tX = point.x;
    float tY = point.y;

    if( is_in_area(tX,tY, mouse_mask)){
        // target is inside O.K. area, do nothing
    }else{
        CGPoint target;

        //point inside restricted region:
        float iX = 600; // inside x
        float iY = 500; // inside y

        // delta to midpoint between iX,iY and tX,tY
        float dX;
        float dY;

        float accuracy = .5; //accuracy to loop until reached

        do {
            dX = (tX-iX)/2;
            dY = (tY-iY)/2;

            if(is_in_area((tX-dX),(tY-dY),mouse_mask)){
                iX += dX;
                iY += dY;
            } else {
                tX -= dX;
                tY -= dY;
            }
        } while (abs(dX)>accuracy || abs(dY)>accuracy);

        target = CGPointMake(roundf(tX), roundf(tY));
        CGDisplayMoveCursorToPoint(CGMainDisplayID(),target);
    }

Here is "is_in_area(int x, int y)" :

    bool is_in_area(NSInteger x, NSInteger y, NSBitmapImageRep *mouse_mask){
        NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
        NSUInteger pixel[4];
        [mouse_mask getPixel:pixel atX:x y:y];
        if(pixel[0]!= 0){
            [pool release];
            return false;
        }
        [pool release];
        return true;
    }
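The third idea, mapping the border once and then scanning it for the nearest pixel, can be sketched as below. This is Python rather than Objective-C and is only an illustration under assumptions: the mask is a list of strings with '#' marking the allowed area (a hypothetical format), a border pixel is taken to be an allowed pixel with at least one non-allowed 4-neighbor, and the function names are made up for this example.

```python
def border_pixels(mask):
    """Collect allowed pixels that touch a non-allowed pixel or the image edge."""
    rows, cols = len(mask), len(mask[0])

    def allowed(x, y):
        return 0 <= y < rows and 0 <= x < cols and mask[y][x] == '#'

    border = []
    for y in range(rows):
        for x in range(cols):
            if not allowed(x, y):
                continue
            neighbors = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if not all(allowed(nx, ny) for nx, ny in neighbors):
                border.append((x, y))
    return border

def closest_border_point(border, tx, ty):
    """Linear scan for the border pixel nearest to (tx, ty); squared distance avoids sqrt."""
    return min(border, key=lambda p: (p[0] - tx) ** 2 + (p[1] - ty) ** 2)
```

The border list is built once at startup; each mouse event then costs one pass over it. For the 2,000 to 8,000 border pixels estimated in the question that is a few thousand multiply-adds per event, but whether it holds 30 updates per second in the real event tap would need to be measured.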

Read the article

Space-efficient data structures for broad-phase collision detection

- by Marian Ivanov

As far as I know, there are three types of data structures that can be used for the broad phase of collision detection:

- Unsorted arrays: Check every object against every object - O(n^2) time; O(log n) space. It's so slow that it's useless unless n is really small.

    for (i=1;i<objects;i++){
        for(j=0;j<i;j++) narrowPhase(i,j);
    };

- Sorted arrays: Sort the objects, so that you get O(n^(2-1/k)) time for k dimensions (O(n^1.5) for 2D and O(n^1.67) for 3D) and O(n) space. Assuming the space is 2D and sortedArray is sorted so that if one object begins at sortedArray[i] and another object ends at sortedArray[i-1], they don't collide.
- Heaps of stacks: Divide the objects between a heap of stacks, so that you only have to check the bucket, its children and its parents - O(n log n) time, but O(n^2) space. This is probably the most frequently used approach.

Is there a way of having O(n log n) time with less space? When is it more efficient to use sorted arrays over heaps and vice versa?
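The sorted-array idea is often implemented as a sweep along one axis. A minimal Python sketch follows, assuming each object is an axis-aligned box (xmin, ymin, xmax, ymax); the box format and the function name are assumptions made for this example, not part of the question.

```python
def sweep_and_prune(boxes):
    """Return candidate pairs (i, j), i < j, whose boxes overlap on both axes.

    boxes: list of (xmin, ymin, xmax, ymax) tuples (assumed format).
    """
    # Visit boxes in order of their left edge.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active = []   # boxes whose x-interval may still reach the sweep position
    pairs = []
    for i in order:
        xmin = boxes[i][0]
        # Drop boxes that end before the current box begins.
        active = [j for j in active if boxes[j][2] >= xmin]
        for j in active:
            # x-intervals overlap by construction; check the y-intervals.
            if boxes[j][1] <= boxes[i][3] and boxes[i][1] <= boxes[j][3]:
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return pairs
```

The extra memory is the sorted index list plus the active list, so O(n). The running time depends on how many boxes overlap along the sweep axis, which is why the approach degrades when many objects line up on that axis.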

Read the article

Speed up lighting in deferred shading

- by kochol

I implemented a simple deferred shading renderer. I use three G-buffers for storing position (R32F), normal (G16R16F) and albedo (ARGB8). I use a sphere map algorithm to store normals in world space. Currently I use the inverse of the view * projection matrix to calculate the position of each pixel from the stored depth value.

- First, I want to avoid the per-pixel matrix multiplication for calculating the position. Is there another way to store and calculate position in the G-buffer without the need for a matrix multiplication?
- Store the normal in view space. All lighting in my engine is done in world space, and I want to do the lighting in view space to speed up my lighting pass.

I want an optimized lighting pass for my deferred engine.

Read the article

How can I find unused/unapplied CSS rules in a stylesheet?

- by liori

Hello,

I've got a huge CSS file and an HTML file. I'd like to find out which rules are not used while displaying the HTML file. Are there tools for this? The CSS file has evolved over a few years, and from what I know no one has ever removed anything from it; people just wrote new overriding rules again and again.

EDIT: It was suggested to use Dust-Me Selectors or Chrome's Web Page Performance tool. But they both work on the level of selectors, not individual rules. I've got lots of cases where a rule inside a selector is always overridden, and this is what I mostly want to get rid of. For example:

    body { color: white; padding: 10em; }
    h1 { color: black; }
    p { color: black; }
    ...
    ul { color: black; }

All the text in my HTML is inside some wrapper element, so it is never white. body's padding always works, so of course the whole body selector cannot be removed. And I'd like to get rid of such useless rules too.

EDIT: And another case of a useless rule: when it duplicates an existing one without changing anything:

    a { margin-left: 5px; color: blue; }
    a:hover { margin-left: 5px; color: red; }

I'd happily get rid of the second margin-left... again, it seems to me that those tools do not find such things.

Thank you,

Read the article

How to handle shoot instructions, in a multiplayer TD

- by Martin Elvar Jensen

I'm currently working on a multiplayer tower defense game, using ImpactJS & Node. I seek some clarification about how to handle projectiles from towers; let me explain. The server is running the master game, and the clients just follow the instructions from the server. Let's say there are about 20 towers on the stage, and all of them need instructions for which creeps to shoot at. Now let's say each tower fires twice a second: that's 40 shots each second (worst-case scenario), which is 40 requests per second to each client. Wouldn't this cause a lot of stress on the server, given that we have 50 games running at the same time? So what I am really asking is: is this method inefficient, and is there a smarter way to handle all these instructions? Thank you.

Read the article

A*, Tile costs and heuristic; How to approach

- by Kevin Toet

I'm doing exercises in tile games and AI to improve my programming. I've written a highly unoptimised pathfinder that does the trick and a simple tile class. The first problem I ran into was that the heuristic was rounded to ints, which resulted in very straight paths. Resorting to a Euclidean heuristic, as opposed to the Manhattan approach, seemed to fix it. The second problem I ran into was when I tried adding tile costs. I was hoping to use the values of the flags that I set on the tiles, but the values were too small to make the pathfinder consider them a huge obstacle, so I increased their values, but that breaks the flags in a certain way and no paths were found anymore.

So my questions, before posting the code, are:

- What am I doing wrong that the Manhattan heuristic isn't working?
- In what ways can I store the tile costs? I was hoping to (ab)use the enum flags for this.
- The pathfinder isn't considering the chance that no path is available; how do I check this?

Any code optimisations are welcome, as I'd love to improve my coding.

    public static List<Tile> FindPath( Tile startTile, Tile endTile, Tile[,] map )
    {
        return FindPath( startTile, endTile, map, TileFlags.WALKABLE );
    }

    public static List<Tile> FindPath( Tile startTile, Tile endTile, Tile[,] map, TileFlags acceptedFlags )
    {
        List<Tile> open = new List<Tile>();
        List<Tile> closed = new List<Tile>();

        open.Add( startTile );
        Tile tileToCheck;

        do
        {
            tileToCheck = open[0];
            closed.Add( tileToCheck );
            open.Remove( tileToCheck );

            for( int i = 0; i < tileToCheck.neighbors.Count; i++ )
            {
                Tile tile = tileToCheck.neighbors[ i ];

                //has the node been processed
                if( !closed.Contains( tile ) && ( tile.flags & acceptedFlags ) != 0 )
                {
                    //Not in the open list?
                    if( !open.Contains( tile ) )
                    {
                        //Set G
                        int G = 10;
                        G += tileToCheck.G;

                        //Set Parent
                        tile.parentX = tileToCheck.x;
                        tile.parentY = tileToCheck.y;
                        tile.G = G;

                        //tile.H = Math.Abs(endTile.x - tile.x ) + Math.Abs( endTile.y - tile.y ) * 10;
                        //TODO omg wtf and other incredible stories
                        tile.H = Vector2.Distance( new Vector2( tile.x, tile.y ), new Vector2(endTile.x, endTile.y) );
                        tile.Cost = tile.G + tile.H + (int)tile.flags;

                        //Calculate H; Manhattan style
                        open.Add( tile );
                    }
                    //Update the cost if it is
                    else
                    {
                        int G = 10;//cost of going to non-diagonal tiles
                        G += map[ tile.parentX, tile.parentY ].G;

                        //If this path is shorter (G cost is lower) then change
                        //the parent cell, G cost and F cost.
                        if ( G < tile.G ) //if G cost is less,
                        {
                            tile.parentX = tileToCheck.x; //change the square's parent
                            tile.parentY = tileToCheck.y;
                            tile.G = G;//change the G cost
                            tile.Cost = tile.G + tile.H + (int)tile.flags; // add terrain cost
                        }
                    }
                }
            }

            //Sort costs
            open = open.OrderBy( o => o.Cost).ToList();
        }
        while( tileToCheck != endTile );

        closed.Reverse();
        List<Tile> validRoute = new List<Tile>();
        Tile currentTile = closed[ 0 ];
        validRoute.Add( currentTile );

        do
        {
            //Look up the parent of the current cell.
            currentTile = map[ currentTile.parentX, currentTile.parentY ];
            currentTile.renderer.material.color = Color.green;

            //Add tile to list
            validRoute.Add( currentTile );
        }
        while ( currentTile != startTile );

        validRoute.Reverse();
        return validRoute;
    }

And my Tile class:

    [Flags]
    public enum TileFlags: int
    {
        NONE = 0,
        DIRT = 1,
        STONE = 2,
        WATER = 4,
        BUILDING = 8,

        //handy
        WALKABLE = DIRT | STONE | NONE,
        endofenum
    }

    public class Tile : MonoBehaviour
    {
        //Tile Properties
        public int x, y;
        public TileFlags flags = TileFlags.DIRT;
        public Transform cachedTransform;

        //A* properties
        public int parentX, parentY;
        public int G;
        public float Cost;
        public float H;
        public List<Tile> neighbors = new List<Tile>();

        void Awake()
        {
            cachedTransform = transform;
        }
    }
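One way to keep terrain cost apart from the flag bits is a separate lookup table, so the flags stay powers of two for masking while costs can be any size. The Python sketch below illustrates that split; the cost values, STEP_COST, is_accepted, next_g and heuristic are all hypothetical names and numbers chosen for this example. It also folds the terrain cost into G (the accumulated path cost) and scales the Manhattan heuristic by the cheapest step so it never overestimates.

```python
from enum import Flag

class TileFlags(Flag):
    NONE = 0
    DIRT = 1
    STONE = 2
    WATER = 4
    BUILDING = 8
    WALKABLE = DIRT | STONE      # mask only, never used as a cost

# Hypothetical per-terrain step costs, independent of the flag values.
STEP_COST = {
    TileFlags.DIRT: 10,
    TileFlags.STONE: 15,
    TileFlags.WATER: 40,
}

def is_accepted(flags, accepted=TileFlags.WALKABLE):
    """True if the tile has at least one of the accepted flag bits."""
    return bool(flags & accepted)

def next_g(parent_g, flags):
    """Path cost so far after stepping onto a tile with the given terrain."""
    return parent_g + STEP_COST[flags]

def heuristic(cell, goal):
    """Manhattan distance scaled by the cheapest step cost (stays admissible)."""
    cheapest = min(STEP_COST.values())
    return cheapest * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))
```

With this split the total used for ordering the open list is simply G + H; adding the terrain term a second time on top of that sum is not needed because it is already part of G.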

Read the article

Slick2D Rendering Lots of Polygons

- by Hazzard

I'm writing a little isometric game using Slick. The world terrain is made up of lots of quadrilaterals. In a small world that is 128 by 128 squares, over 16,000 quadrilaterals need to be rendered. This puts my pretty powerful computer down to 30 fps. I've thought about caching "chunks" of the world so only single chunks would ever need updating at a time, but I don't know how to do this, and I am sure there are other ways to optimize it besides that. Maybe I'm doing the whole thing wrong; surely fancy 3D games that run fine on my machine are more intensive than this. My question is: how can I improve the FPS, and am I doing something wrong? Or does it actually take that much power to render those polygons?

Here is the source code for the render method in my game state. It iterates through a 2D array of heights and draws polygons based on the height.

    public void render(GameContainer container, StateBasedGame game, Graphics gfx) throws SlickException {
        gfx.translate(offsetX * d + container.getWidth() / 2, offsetY * d + container.getHeight() / 2);
        gfx.scale(d, d);
        for (int y = 0; y < placeholder.length; y++) {// x & y are isometric
                                                      // diag
            for (int x = 0; x < placeholder[0].length; x++) {
                Polygon poly;
                int hor = TestState.TILE_WIDTH * (x - y);// hor and ver are orthogonal
                int W = TestState.TILE_HEIGHT * (x + y) - 1 * heights[y + 1][x];//points to go off of
                int S = TestState.TILE_HEIGHT * (x + y) - 1 * heights[y + 1][x + 1];
                int E = TestState.TILE_HEIGHT * (x + y) - 1 * heights[y][x + 1];
                int N = TestState.TILE_HEIGHT * (x + y) - 1 * heights[y][x];
                if (placeholder[y][x] == null) {
                    poly = new Polygon();//Create actual surface polygon
                    poly.addPoint(-TestState.TILE_WIDTH + hor, W);
                    poly.addPoint(hor, S + TestState.TILE_HEIGHT);
                    poly.addPoint(TestState.TILE_WIDTH + hor, E);
                    poly.addPoint(hor, N - TestState.TILE_HEIGHT);
                    float z = ((float) heights[y][x + 1] - heights[y + 1][x]) / 32 + 0.5f;
                    placeholder[y][x] = new Tile(poly, new Color(z, z, z));
                    //ShapeRenderer.fill(placeholder[y][x]);
                }
                if (true) {//ONLY draw tile if it's on screen
                    gfx.setColor(placeholder[y][x].getColor());
                    ShapeRenderer.fill(placeholder[y][x]);
                    //gfx.fill(placeholder[y][x]);
                    //placeholder[y][x].
                    //DRAW EDGES
                    if (y + 1 == placeholder.length) {//draw South foundation edges
                        gfx.setColor(Color.gray);
                        Polygon found = new Polygon();
                        found.addPoint(-TestState.TILE_WIDTH + hor, W);
                        found.addPoint(hor, S + TestState.TILE_HEIGHT);
                        found.addPoint(hor, TestState.TILE_HEIGHT * (x + y + 1));
                        found.addPoint(-TestState.TILE_WIDTH + hor, TestState.TILE_HEIGHT * (x + y));
                        gfx.fill(found);
                    }
                    if (x + 1 == placeholder[0].length) {//north
                        gfx.setColor(Color.darkGray);
                        Polygon found = new Polygon();
                        found.addPoint(TestState.TILE_WIDTH + hor, E);
                        found.addPoint(hor, S + TestState.TILE_HEIGHT);
                        found.addPoint(hor, TestState.TILE_HEIGHT * (x + y + 1));
                        found.addPoint(TestState.TILE_WIDTH + hor, TestState.TILE_HEIGHT * (x + y));
                        gfx.fill(found);
                    }//*/
                }
            }
        }
    }

Read the article

Is micro-optimisation important when coding?

- by BozKay

I recently asked a question on stackoverflow.com to find out why isset() was faster than strlen() in PHP. This raised questions around the importance of readable code and whether performance improvements of microseconds in code were worth even considering. My father is a retired programmer; I showed him the responses, and he was absolutely certain that if a coder does not consider performance in their code even at the micro level, they are not a good programmer. I'm not so sure. Perhaps the increase in computing power means we no longer have to consider these kinds of micro-performance improvements? Perhaps this kind of consideration is up to the people who write the actual language code (of PHP in the above case)? The environmental factors could be important: the internet consumes 10% of the world's energy, and I wonder how wasteful a few microseconds of code are when replicated trillions of times on millions of websites. I'd like answers preferably based on facts about programming. Is micro-optimisation important when coding?

EDIT: My personal summary of 25 answers, thanks to all. Sometimes we need to really worry about micro-optimisations, but only in very rare circumstances. Reliability and readability are far more important in the majority of cases. However, considering micro-optimisation from time to time doesn't hurt. A basic understanding can help us avoid obviously bad choices when coding, such as

    if (expensiveFunction() && counter < X)

which should be

    if (counter < X && expensiveFunction())

(example from @zidarsk8). This could be an inexpensive function, and therefore changing the code would be micro-optimisation. But, with a basic understanding, you would not have to, because you would write it correctly in the first place.
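The effect of operand order under short-circuit evaluation can be checked by counting calls. A small Python sketch follows; CountingCheck, expensive_first and cheap_first are names invented for this illustration, and Python's `and` is assumed to stand in for `&&`.

```python
class CountingCheck:
    """Stands in for an expensive function; counts how often it is called."""
    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return True

def expensive_first(check, counter, limit):
    # The expensive call always runs, even when counter >= limit.
    return check() and counter < limit

def cheap_first(check, counter, limit):
    # The expensive call runs only when the cheap comparison is true.
    return counter < limit and check()
```

Running both over counters 0 to 9 with a limit of 3 gives identical results, but the first form calls the expensive function ten times and the second only three times.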

Read the article

Include all php files in one file and include that file in every page if we're using hiphop?

- by Hasan Khan

I understand that in normal PHP, including a file merges its source into the script, so the page takes longer to be parsed/processed. But if we're using HipHop, shouldn't it be OK to create one single PHP file that includes every file (each containing some class), so that every page which needs those classes can just include that single PHP file? Would this be OK in the presence of HipHop?

Read the article

Displaying possible movement tiles

- by Ash Blue

What's the fastest way to highlight all possible movement tiles for a player on a square grid? Players can only move up, down, left, right. Tiles can cost more than one movement, multiple levels are available to move, and players can be larger than one tile. Think of games like Fire Emblem, Front Mission, and XCOM. My first thought was to recursively search for connecting tiles. This quickly demonstrated many shortcomings when blockers, movement costs, and other features were added into the mix. My second thought was to use an A* pathfinding algorithm to check all tiles presumed valid. Presumed valid tiles would come from an algorithm that generates a diamond of tiles from the player's speed (see example here http://jsfiddle.net/truefreestyle/Suww8/9/). Problem is this seems a little slow and expensive. Is there a faster way? Edit: In Lua for Corona SDK, I integrated the following movement generation controller. I've linked to a Gist here because the solution is around 90 lines of code. https://gist.github.com/ashblue/5546009
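For movement ranges with per-tile costs, a budget-limited Dijkstra flood fill from the player's tile visits each reachable tile once instead of running a separate path search per candidate. The Python sketch below is a minimal version under assumptions: a single level, a one-tile unit, and a grid `cost` where each entry is the cost to enter that tile or None for a blocker. The function name and grid format are made up for this example.

```python
import heapq

def reachable_tiles(cost, start, budget):
    """Return {tile: movement spent} for every tile reachable within budget.

    cost[r][c] is the cost to enter tile (r, c), or None if blocked (assumed format).
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        spent, (r, c) = heapq.heappop(heap)
        if spent > best[(r, c)]:
            continue                      # stale entry, a cheaper route was found
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = cost[nr][nc]
            if step is None:
                continue                  # blocker
            total = spent + step
            if total <= budget and total < best.get((nr, nc), float('inf')):
                best[(nr, nc)] = total
                heapq.heappush(heap, (total, (nr, nc)))
    return best
```

The keys of the returned dictionary are the tiles to highlight. Multiple levels or units larger than one tile would need a richer state than (row, col) and a different passability test, which this sketch does not attempt.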

Read the article

Mobile cross-platform SDK for computationally intensive apps

- by K.Steff

I am aware of the PhoneGap toolkit for creating mobile applications for virtually all mobile platforms with a significant market share. However, the code in PhoneGap that is shared between the different platforms is written in JavaScript. While I like JS, I think it's hardly appropriate for computationally intensive tasks. The situation with Titanium is pretty much the same. So, is there any way that I can create a cross-platform mobile app that has the computationally intensive code shared between the platforms? Some context: Obviously, I don't want to implement the time-consuming algorithm in many different languages, since this violates DRY, increases the chance of bugs slipping into at least one version, and requires boilerplate code to work. I've looked at Xamarin's MonoTouch and Mono for Android tools, but while they cover iOS and Android, they're not nearly as versatile for deployment as PhoneGap. On the other hand, (IMO) the statically typed nature of C# is more suited for intense computation than JS. Are there any other SDKs/tools appropriate for the task that I don't know about, or a point about the ones mentioned above that I've missed? Also, uploading data to a web service for processing is not an option, because of the traffic required.

Read the article

Is the STL efficient enough for mobile devices?

- by mx2

When it comes to mobile game development on iOS and Android NDK, some developers write their own C++ containers, while others claim that STL is more than adequate for mobile game development (For example, the author of iPhone 3D Programming uses STL rather than Objective-C in his examples. His defense is that STL is no slower than Objective-C). Then there are also mobile developers who abandon C++ entirely and develop games entirely (or mostly) in the C language (C89/C90). What are the benefits and drawbacks of each approach?

Read the article

Redirect AFTER Initiating download

- by mashup

I have a question - is there any way to initiate a download and AFTER the user has confirmed the download then redirect to another site? Is something like that possible via ASP or another language commonly used for websites?

Bad PHP "user experience" scenario (in use right now):

a) User comes to site, clicks download button
b) User sees "download" landing page, gets redirected after 5 seconds
c) Download starts on thank-you page

Good "user experience" scenario (my dream solution, what I want):

a) User comes to site, clicks download button
b) Download starts immediately on landing page
c) Download confirmed, redirects now to thank-you page

Any programming language is a go for this.

Read the article

Memory read/write access efficiency

- by wolfPack88

I've heard conflicting information from different sources, and I'm not really sure which one to believe. As such, I'll post what I understand and ask for corrections. Let's say I want to use a 2D matrix. There are three ways that I can do this (at least that I know of).

1:

    int i;
    char **matrix;
    matrix = malloc(50 * sizeof(char *));
    for(i = 0; i < 50; i++)
        matrix[i] = malloc(50);

2:

    int i;
    int rowSize = 50;
    int pointerSize = 50 * sizeof(char *);
    int dataSize = 50 * 50;
    char **matrix;
    matrix = malloc(dataSize + pointerSize);
    /* cast so the offset is counted in bytes, not in char * elements */
    char *pData = (char *)matrix + pointerSize - rowSize;
    for(i = 0; i < 50; i++) {
        pData += rowSize;
        matrix[i] = pData;
    }

3:

    //instead of accessing matrix[i][j] here, we would access matrix[i * 50 + j]
    char *matrix = malloc(50 * 50);

In terms of memory usage, my understanding is that 3 is the most efficient, 2 is next, and 1 is least efficient, for the reasons below:

3: There is only one pointer and one allocation, and therefore, minimal overhead.
2: Once again, there is only one allocation, but there are now 51 pointers. This means there is 50 * sizeof(char *) more overhead.
1: There are 51 allocations and 51 pointers, causing the most overhead of all options.

In terms of performance, once again my understanding is that 3 is the most efficient, 2 is next, and 1 is least efficient. Reasons being:

3: Only one memory access is needed. We will have to do a multiplication and an addition as opposed to two additions (as in the case of a pointer to a pointer), but memory access is slow enough that this doesn't matter.
2: We need two memory accesses: one to get a char *, and then one to get the appropriate char. Only two additions are performed here (one to get to the correct char * pointer from the original memory location, and one to get to the correct char variable from wherever the char * points to), so multiplication (which is slower than addition) is not required. However, on modern CPUs, multiplication is faster than memory access, so this point is moot.
1: Same issues as 2, but now the memory isn't contiguous. This causes cache misses and extra page table lookups, making it the least efficient of the lot.

First and foremost: Is this correct? Second: Is there an option 4 that I am missing that would be even more efficient?
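The index arithmetic behind option 3 is a row-major mapping from (i, j) to a single offset. The short Python sketch below shows only that mapping, using a bytearray as a stand-in for the malloc'd block; ROWS, COLS and the helper names are hypothetical, and Python says nothing about the cache behaviour discussed above.

```python
ROWS, COLS = 50, 50

# One contiguous block of ROWS * COLS bytes, like malloc(50 * 50) in option 3.
matrix = bytearray(ROWS * COLS)

def offset(i, j):
    """Row-major offset of element (i, j): one multiply and one add."""
    return i * COLS + j

def get(i, j):
    return matrix[offset(i, j)]

def put(i, j, value):
    matrix[offset(i, j)] = value
```

Consecutive j values in the same row map to consecutive offsets, which is what makes row-by-row traversal of the flat layout sequential in memory.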

Read the article

What calls trigger a new batch?

- by sebf

I am finding my project is starting to show performance degradation and I need to optimize it. The answer to my previous question and this presentation from NVidia have helped greatly in understanding the performance characteristics of code using the GPU, but there are a couple of things that aren't clear that I need to know to optimize my drawing. Specifically, which calls make the distinction between batches?

I know that any state change causes a new batch, so that includes:

- Render state changes
- Buffer changes
- Shader changes
- Render target changes

Correct? What else counts as a 'state change'?

Does each Draw**Primitive() call constitute a new batch? Even if I were to issue the same call twice, with no state changes, or call it once on one part of the buffer, then again on another?

If I were to update a buffer, but not change the bindings, would that be a new batch?

That presentation and a DX9 page suggest using all of the texture slots available, which I take to mean loading multiple objects in 'parallel' by mapping their buffers/shaders/textures to slots 1-16. But I am not sure how this works - surely to do this you would need to change the buffer binding, and that would count as a state change? (Or is it a case of you do, but it saves 16 calls so it's OK?)

Read the article

Make pygame's frame rate faster

- by Smashery

By profiling my game, I see that the vast majority of the execution time of my hobby game is between the blit and the flip calls. Currently, it's only running at around 13fps. My video card is fairly decent, so my guess is that pygame is not using it. Does anyone know of any graphics/display options I need to set in pygame to make this faster? Or is this just something that I have to live with since I've chosen pygame?

Read the article

Investigate disk writes further to find out which process writes to my SSD

- by zuba

I try to minimize disk writes to my new SSD system drive. I'm stuck with iostat output:

    ~ > iostat -d 10 /dev/sdb
    Linux 2.6.32-44-generic (Pluto)    13.11.2012    _i686_    (2 CPU)

    Device:    tps     Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
    sdb        8,60    212,67        119,45        21010156    11800488

    Device:    tps     Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
    sdb        3,00    0,00          40,00         0           400

    Device:    tps     Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
    sdb        1,70    0,00          18,40         0           184

    Device:    tps     Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
    sdb        1,20    0,00          28,80         0           288

    Device:    tps     Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
    sdb        2,20    0,00          32,80         0           328

    Device:    tps     Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
    sdb        1,20    0,00          23,20         0           232

    Device:    tps     Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
    sdb        3,40    19,20         42,40         192         424

As I see, there are writes to sdb. How can I find out which process is writing? I know about iotop, but it doesn't show which filesystem is being accessed.

Read the article

Object pools for efficient resource management

- by GameDevEnthusiast

How can I avoid using the default new() to create each object? My previous demo had very unpleasant framerate hiccups during dynamic memory allocations (usually when arrays were resized), and creating lots of small objects, which often contain just one pointer to some DirectX resource, seems like an awful lot of waste.

I'm thinking about:

- Creating a master look-up table to refer to objects by handles (for safety and ease of serialization), much like EntityList in the Source engine.
- Creating a templated object pool, which will store items contiguously (more cache-friendly, fast iteration, etc.). The stored elements will be accessed by external systems via the global look-up table.

The object pool will use the swap-with-last trick for fast removal (it will invoke the object's destructor first) and will update the corresponding indices in the global table accordingly when growing, shrinking, or moving elements. The elements will be copied via plain memcpy().

Is it a good idea? Will it be safe to store objects of non-POD types (e.g. pointers, vtable) in such containers?

Related post: Dynamic Memory Allocation and Memory Management
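The design described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Source engine's EntityList or any existing library: `Handle`, `ObjectPool`, and all member names are hypothetical, and the look-up table lives inside the pool rather than being global. It also swaps in move assignment where the question proposes memcpy(), because memcpy() is only well-defined for trivially copyable types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical handle: an index into the look-up table plus a generation
// counter, so that handles to destroyed objects can be detected.
struct Handle {
    std::uint32_t slot;
    std::uint32_t generation;
};

template <typename T>
class ObjectPool {
public:
    // Stores the value at the end of the dense array and returns a handle.
    Handle create(T value) {
        std::uint32_t slot;
        if (!free_slots_.empty()) {
            slot = free_slots_.back();  // reuse a previously freed table entry
            free_slots_.pop_back();
        } else {
            slot = static_cast<std::uint32_t>(slots_.size());
            slots_.push_back(Slot{0, 0, false});
        }
        slots_[slot].dense_index = static_cast<std::uint32_t>(items_.size());
        slots_[slot].alive = true;
        items_.push_back(std::move(value));
        owners_.push_back(slot);
        assert(items_.size() == owners_.size());
        return Handle{slot, slots_[slot].generation};
    }

    // Returns nullptr for out-of-range, destroyed, or stale handles.
    T* get(Handle h) {
        if (h.slot >= slots_.size()) return nullptr;
        const Slot& s = slots_[h.slot];
        if (!s.alive || s.generation != h.generation) return nullptr;
        return &items_[s.dense_index];
    }

    // Swap-with-last removal: the last element is moved into the hole and
    // its table entry is patched to point at the new position.
    bool destroy(Handle h) {
        if (get(h) == nullptr) return false;
        Slot& s = slots_[h.slot];
        const std::uint32_t idx = s.dense_index;
        const std::uint32_t last = static_cast<std::uint32_t>(items_.size() - 1);
        if (idx != last) {
            items_[idx] = std::move(items_[last]);
            owners_[idx] = owners_[last];
            slots_[owners_[idx]].dense_index = idx;
        }
        items_.pop_back();  // runs the destructor of the removed element
        owners_.pop_back();
        s.alive = false;
        ++s.generation;     // invalidates every outstanding copy of h
        free_slots_.push_back(h.slot);
        return true;
    }

    std::size_t size() const { return items_.size(); }
    const std::vector<T>& items() const { return items_; }  // contiguous view

private:
    struct Slot {
        std::uint32_t dense_index;
        std::uint32_t generation;
        bool alive;
    };
    std::vector<T> items_;                   // densely packed objects
    std::vector<std::uint32_t> owners_;      // table slot owning each item
    std::vector<Slot> slots_;                // the handle look-up table
    std::vector<std::uint32_t> free_slots_;  // recycled table entries
};
```

External systems would hold only `Handle` values and call `get()` each time they need the object, since raw pointers into `items()` are invalidated whenever the vector grows or an element is moved by `destroy()`. Calling `reserve()` on the vectors up front would be one way to avoid the resize hiccups the question mentions; that is left out here to keep the sketch short.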

Read the article

Which opcodes are faster at the CPU level?

- by Geotarget

In every programming language there are sets of opcodes that are recommended over others. I've tried to list them here, in order of speed:

1. Bitwise
2. Integer Addition / Subtraction
3. Integer Multiplication / Division
4. Comparison
5. Control flow
6. Float Addition / Subtraction
7. Float Multiplication / Division

Where you need high-performance code, C++ can be hand-optimized in assembly to use SIMD instructions or more efficient control flow, data types, etc. So I'm trying to understand whether the data type (int32 / float32 / float64) or the operation used (*, +, &) affects performance at the CPU level.

Is a single multiply slower on the CPU than an addition? In MCU theory you learn that the speed of an opcode is determined by the number of CPU cycles it takes to execute. So does that mean that multiply takes 4 cycles and add takes 2? Exactly what are the speed characteristics of the basic math and control flow opcodes? If two opcodes take the same number of cycles to execute, can both be used interchangeably without any performance gain or loss?

Any other technical details you can share regarding x86 CPU performance are appreciated.

Read the article

Game has noticeable frame drops, but when run through a profiler it always runs smoothly

- by felipedrl

I'm trying to optimize my PC game but I can't find the bottleneck, since every time I run it through a profiler (gDEBugger) it runs smoothly. When running outside gDEBugger I get these annoying hiccups. It's not just the graphics; the sound also gets choppy. The drops are inconsistent across runs, i.e., sometimes I run the same scenario and get no drops at all, sometimes I get a few drops, and other times the game is consistently slow. The only constant is: when running through gDEBugger I ALWAYS get a smooth run. I suspect something outside my game is interfering and causing these drops, but what in the hell does gDEBugger do that nullifies them? A higher process priority? Any ideas? Thanks in advance.

Search Results

Search found 3512 results on 141 pages for 'premature optimization'.
