vector art - Page 70 - Developer IT

determine collision angle on a rotating body

- by jorb

update: new diagram and updated description I have a contact listener set up to try and determine the side that a collision happened at relative to the a bodies rotation. One way to solve this is to find the value of the yellow angle between the red and blue vectors drawn above. The angle can be found by taking the arc cosine of the dot product of the two vectors (Evan pointed this out). One of my points of confusion is the difference in domain of the atan2 function html canvas coordinates and the Box2d rotation information. I know I have to account for this somehow... SS below questions: Does Box2D provide these angles more directly in the collision information? Am I even on the right track? If so, any hints? I have the following javascript so far: Ship.prototype.onCollide = function (other_ent,cx,cy) { var pos = this.body.GetPosition(); //collision position relative to body var d_cx = pos.x - cx; var d_cy = pos.y - cy; //length of initial vector var len = Math.sqrt(Math.pow(pos.x -cx,2) + Math.pow(pos.y-cy,2)); //body angle - can over rotate hence mod 2*Pi var ang = this.body.GetAngle() % (Math.PI * 2); //vector representing body's angle - same magnitude as the first var b_vx = len * Math.cos(ang); var b_vy = len * Math.sin(ang); //dot product of the two vectors var dot_prod = d_cx * b_vx + d_cy * b_vy; //new calculation of difference in angle - NOT WORKING! var d_ang = Math.acos(dot_prod); var side; if (Math.abs(d_ang) < Math.PI/2 ) side = "front"; else side = "back"; console.log("length",len); console.log("pos:",pos.x,pos.y); console.log("offs:",d_cx,d_cy); console.log("body vec",b_vx,b_vy); console.log("body angle:",ang); console.log("dot product",dot_prod); console.log("result:",d_ang); console.log("side",side); console.log("------------------------"); }

Read the article

How does gluLookAt work?

- by Chan

From my understanding, gluLookAt( eye_x, eye_y, eye_z, center_x, center_y, center_z, up_x, up_y, up_z ); is equivalent to: glRotatef(B, 0.0, 0.0, 1.0); glRotatef(A, wx, wy, wz); glTranslatef(-eye_x, -eye_y, -eye_z); But when I print out the ModelView matrix, the call to glTranslatef() doesn't seem to work properly. Here is the code snippet: #include <stdlib.h> #include <stdio.h> #include <GL/glut.h> #include <iomanip> #include <iostream> #include <string> using namespace std; static const int Rx = 0; static const int Ry = 1; static const int Rz = 2; static const int Ux = 4; static const int Uy = 5; static const int Uz = 6; static const int Ax = 8; static const int Ay = 9; static const int Az = 10; static const int Tx = 12; static const int Ty = 13; static const int Tz = 14; void init() { glClearColor(0.0, 0.0, 0.0, 0.0); glEnable(GL_DEPTH_TEST); glShadeModel(GL_SMOOTH); glEnable(GL_LIGHTING); glEnable(GL_LIGHT0); GLfloat lmodel_ambient[] = { 0.8, 0.0, 0.0, 0.0 }; glLightModelfv(GL_LIGHT_MODEL_AMBIENT, lmodel_ambient); } void displayModelviewMatrix(float MV[16]) { int SPACING = 12; cout << left; cout << "\tMODELVIEW MATRIX\n"; cout << "--------------------------------------------------" << endl; cout << setw(SPACING) << "R" << setw(SPACING) << "U" << setw(SPACING) << "A" << setw(SPACING) << "T" << endl; cout << "--------------------------------------------------" << endl; cout << setw(SPACING) << MV[Rx] << setw(SPACING) << MV[Ux] << setw(SPACING) << MV[Ax] << setw(SPACING) << MV[Tx] << endl; cout << setw(SPACING) << MV[Ry] << setw(SPACING) << MV[Uy] << setw(SPACING) << MV[Ay] << setw(SPACING) << MV[Ty] << endl; cout << setw(SPACING) << MV[Rz] << setw(SPACING) << MV[Uz] << setw(SPACING) << MV[Az] << setw(SPACING) << MV[Tz] << endl; cout << setw(SPACING) << MV[3] << setw(SPACING) << MV[7] << setw(SPACING) << MV[11] << setw(SPACING) << MV[15] << endl; cout << "--------------------------------------------------" << endl; cout << endl; } void reshape(int w, int h) { float ratio = static_cast<float>(w)/h; glViewport(0, 0, w, h); glMatrixMode(GL_PROJECTION); glLoadIdentity(); gluPerspective(45.0, ratio, 1.0, 425.0); } void draw() { float m[16]; glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); glMatrixMode(GL_MODELVIEW); glLoadIdentity(); glGetFloatv(GL_MODELVIEW_MATRIX, m); gluLookAt( 300.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f ); glColor3f(1.0, 0.0, 0.0); glutSolidCube(100.0); glGetFloatv(GL_MODELVIEW_MATRIX, m); displayModelviewMatrix(m); glutSwapBuffers(); } int main(int argc, char** argv) { glutInit(&argc, argv); glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH); glutInitWindowSize(400, 400); glutInitWindowPosition(100, 100); glutCreateWindow("Demo"); glutReshapeFunc(reshape); glutDisplayFunc(draw); init(); glutMainLoop(); return 0; } No matter what value I use for the eye vector: 300, 0, 0 or 0, 300, 0 or 0, 0, 300 the translation vector is the same, which doesn't make any sense because the order of code is in backward order so glTranslatef should run first, then the 2 rotations. Plus, the rotation matrix, is completely independent of the translation column (in the ModelView matrix), then what would cause this weird behavior? Here is the output with the eye vector is (0.0f, 300.0f, 0.0f) MODELVIEW MATRIX -------------------------------------------------- R U A T -------------------------------------------------- 0 0 0 0 0 0 0 0 0 1 0 -300 0 0 0 1 -------------------------------------------------- I would expect the T column to be (0, -300, 0)! So could anyone help me explain this? The implementation of gluLookAt from http://www.mesa3d.org void GLAPIENTRY gluLookAt(GLdouble eyex, GLdouble eyey, GLdouble eyez, GLdouble centerx, GLdouble centery, GLdouble centerz, GLdouble upx, GLdouble upy, GLdouble upz) { float forward[3], side[3], up[3]; GLfloat m[4][4]; forward[0] = centerx - eyex; forward[1] = centery - eyey; forward[2] = centerz - eyez; up[0] = upx; up[1] = upy; up[2] = upz; normalize(forward); /* Side = forward x up */ cross(forward, up, side); normalize(side); /* Recompute up as: up = side x forward */ cross(side, forward, up); __gluMakeIdentityf(&m[0][0]); m[0][0] = side[0]; m[1][0] = side[1]; m[2][0] = side[2]; m[0][1] = up[0]; m[1][1] = up[1]; m[2][1] = up[2]; m[0][2] = -forward[0]; m[1][2] = -forward[1]; m[2][2] = -forward[2]; glMultMatrixf(&m[0][0]); glTranslated(-eyex, -eyey, -eyez); }

Read the article

SIMD Extensions for the Database Storage Engine

- by jchang

For the last 15 years, Intel and AMD have been progressively adding special purpose extensions to their processor architectures. The extensions mostly pertain to vector operations with Single Instruction, Multiple Data (SIMD) concept. The reasoning was that achieving significant performance improvement over each successive generation for the general purpose elements had become extraordinarily difficult. On the other hand, SIMD performance could be significantly improved with special purpose registers...(read more)

Read the article

How to implement explosion in OpenGL with a particle effect?

- by Chan

I'm relatively new to OpenGL and I'm clueless how to implement explosion. So could anyone give me some ideas how to start? Suppose the explosion occurs at location $(x, y, z)$, then I'm thinking of randomly generate a collection of vectors with $(x, y, z)$ as origin, then draw some particle (glutSolidCube) which move along this vector for some period of time, says after 1000 updates, it disappear. Is this approach feasible? A minimal example would be greatly appreciated.

Read the article

Drawing a texture at the end of a trace (crosshair?) UDK

- by Dave Voyles

I'm trying to draw a crosshair at the end of my trace. If my crosshair does not hit a pawn or static mesh (ex, just a skybox) then the crosshair stays locked on a certain point at that actor - I want to say its origin. Ex: Run across a pawn, then it turns yellow and stays on that pawn. If it runs across the skybox, then it stays at one point on the box. Weird? How can I get my crosshair to stay consistent? I've included two images for reference, to help illustrate. Note: The wrench is actually my crosshair. The "X" is just a debug crosshair. Ignore that. /// Image 1 /// /// Image 2 /// /*************************************************************************** * Draws the crosshair ***************************************************************************/ function bool CheckCrosshairOnFriendly() { local float CrosshairSize; local vector HitLocation, HitNormal, StartTrace, EndTrace, ScreenPos; local actor HitActor; local MyWeapon W; local Pawn MyPawnOwner; /** Sets the PawnOwner */ MyPawnOwner = Pawn(PlayerOwner.ViewTarget); /** Sets the Weapon */ W = MyWeapon(MyPawnOwner.Weapon); /** If we don't have an owner, then get out of the function */ if ( MyPawnOwner == None ) { return false; } /** If we have a weapon... */ if ( W != None) { /** Values for the trace */ StartTrace = W.InstantFireStartTrace(); EndTrace = StartTrace + W.MaxRange() * vector(PlayerOwner.Rotation); HitActor = MyPawnOwner.Trace(HitLocation, HitNormal, EndTrace, StartTrace, true, vect(0,0,0),, TRACEFLAG_Bullet); DrawDebugLine(StartTrace, EndTrace, 100,100,100,); /** Projection for the crosshair to convert 3d coords into 2d */ ScreenPos = Canvas.Project(HitLocation); /** If we haven't hit any actors... */ if ( Pawn(HitActor) == None ) { HitActor = (HitActor == None) ? None : Pawn(HitActor.Base); } } /** If our trace hits a pawn... */ if ((Pawn(HitActor) == None)) { /** Draws the crosshair for no one - Grey*/ CrosshairSize = 28 * (Canvas.ClipY / 768) * (Canvas.ClipX /1024); Canvas.SetDrawColor(100,100,128,255); Canvas.SetPos(ScreenPos.X - (CrosshairSize * 0.5f), ScreenPos.Y -(CrosshairSize * 0.5f)); Canvas.DrawTile(class'UTHUD'.default.AltHudTexture, CrosshairSize, CrosshairSize, 600, 262, 28, 27); return false; } /** Draws the crosshair for friendlies - Yellow */ CrosshairSize = 28 * (Canvas.ClipY / 768) * (Canvas.ClipX /1024); Canvas.SetDrawColor(255,255,128,255); Canvas.SetPos(ScreenPos.X - (CrosshairSize * 0.5f), ScreenPos.Y -(CrosshairSize * 0.5f)); Canvas.DrawTile(class'UTHUD'.default.AltHudTexture, CrosshairSize, CrosshairSize, 600, 262, 28, 27); return true; }

Read the article

Pathfinding for fleeing

- by Philipp

As you know there are plenty of solutions when you wand to find the best path in a 2-dimensional environment which leads from point A to point B. But how do I calculate a path when an object is at point A, and wants to get away from point B, as fast and far as possible? A bit of background information: My game uses a 2d environment which isn't tile-based but has floating point accuracy. The movement is vector-based. The pathfinding is done by partitioning the game world into rectangles which are walkable or non-walkable and building a graph out of their corners. I already have pathfinding between points working by using Dijkstras algorithm. The use-case for the fleeing algorithm is that in certain situations, actors in my game should perceive another actor as a danger and flee from it. The trivial solution would be to just move the actor in a vector in the direction which is opposite from the threat until a "safe" distance was reached or the actor reaches a wall where it then covers in fear. The problem with this approach is that actors will be blocked by small obstacles they could easily get around. As long as moving along the wall wouldn't bring them closer to the threat they could do that, but it would look smarter when they would avoid obstacles in the first place: Another problem I see is with dead ends in the map geometry. In some situations a being must choose between a path which gets it faster away now but ends in a dead end where it would be trapped, or another path which would mean that it wouldn't get that far away from the danger at first (or even a bit closer) but on the other hand would have a much greater long-term reward in that it would eventually get them much further away. So the short-term reward of getting away fast must be somehow valued against the long-term reward of getting away far. There is also another rating problem for situations where an actor should accept to move closer to a minor threat to get away from a much larger threat. But completely ignoring all minor threats would be foolish, too (that's why the actor in this graphic goes out of its way to avoid the minor threat in the upper right area): Are there any standard solutions for this problem?

Read the article

Best C++ containers for UI in Games.

- by Vijayendra

I am writing some UI stuff for my games in C++. Basically its a very common problem, but I dont know the best answer yet. Suppose inside my UI Library I have a view class which renders 2D/3D scene. This view can contain many subviews. I needs a container which allows me to iterate over these views fast and also insert/delete subviews. I am not sure which container is best for the job - list, vector or something else?

Read the article

Build a view frustum from angles

- by MulletDevil

I have 4 angles, left, right, top & bottom. These angles are in degrees. They define the angle between the forward vector and the corresponding side. I am trying to use these to calculate the required values for Perseective Off Centre function found here http://docs.unity3d.com/Documentation/ScriptReference/Camera-projectionMatrix.html I tried doing (near plane-far plane) * Tan(angle) But that didn't give the correct results.

Read the article

Create a kind of Interface c++ [migrated]

- by Liuka

I'm writing a little 2d rendering framework with managers for input and resources like textures and meshes (for 2d geometry models, like quads) and they are all contained in a class "engine" that interacts with them and with a directX class. So each class have some public methods like init or update. They are called by the engine class to render the resources, create them, but a lot of them should not be called by the user: //in pseudo c++ //the textures manager class class TManager { private: vector textures; .... public: init(); update(); renderTexture(); //called by the "engine class" loadtexture(); gettexture(); //called by the user } class Engine { private: Tmanager texManager; public: Init() { //initialize all the managers } Render(){...} Update(){...} Tmanager* GetTManager(){return &texManager;} //to get a pointer to the manager //if i want to create or get textures } In this way the user, calling Engine::GetTmanager will have access to all the public methods of Tmanager, including init update and rendertexture, that must be called only by Engine inside its init, render and update functions. So, is it a good idea to implement a user interface in the following way? //in pseudo c++ //the textures manager class class TManager { private: vector textures; .... public: init(); update(); renderTexture(); //called by the "engine class" friend class Tmanager_UserInterface; operator Tmanager_UserInterface*(){return reinterpret_cast<Tmanager_UserInterface*>(this)} } class Tmanager_UserInterface : private Tmanager { //delete constructor //in this class there will be only methods like: loadtexture(); gettexture(); } class Engine { private: Tmanager texManager; public: Init() Render() Update() Tmanager_UserInterface* GetTManager(){return texManager;} } //in main function //i need to load a texture //i always have access to Engine class engine-GetTmanger()-LoadTexture(...) //i can just access load and get texture; In this way i can implement several interface for each object, keeping visible only the functions i (and the user) will need. There are better ways to do the same?? Or is it just useless(i dont hide the "framework private functions" and the user will learn to dont call them)? Before i have used this method: class manager { public: //engine functions userfunction(); } class engine { private: manager m; public: init(){//call manager init function} manageruserfunciton() { //call manager::userfunction() } } in this way i have no access to the manager class but it's a bad way because if i add a new feature to the manager i need to add a new method in the engine class and it takes a lot of time. sorry for the bad english.

Read the article

SIMD Extensions for the Database Storage Engine

- by jchang

For the last 15 years, Intel and AMD have been progressively adding special purpose extensions to their processor architectures. The extensions mostly pertain to vector operations with Single Instruction, Multiple Data (SIMD) concept. The motivation was that achieving significant performance improvement over each successive generation for the general purpose elements had become extraordinarily difficult. On the other hand, SIMD performance could be significantly improved with special purpose registers...(read more)

Read the article

Pseudo-magnet implementation with chipmunk

- by Eimantas

How should I go about implementing "natural" magnet on a certain body in chipmunk space? Context is of simple bodies lying in the space (think chessboard). When one of the figures is activated as a magnet - others should start moving towards it. Currently I'm applying force (cpBodyApplyForce)to the other figures with vector calculated towards the activated figure. However this doesn't really feel "natural". Are there any known algorithms for imitating magnets?

Read the article

Getting Started with Inkscape

<b>MakeTechEasier:</b> "Inkscape is a powerful free vector drawing program for Windows, Linux, and Mac, and this guide will get you started with using it to create your own smooth, colorful, scalable graphics."

Read the article

Easy road from DisplayObject to Molehill?

- by Bart van Heukelom

I have a finished Flash game which is rendered using the built-in display tree, i.e. Bitmaps contained in Sprites (and a text here and there, few vector graphics, and one bitmap-filled shape). For extra performance, I'd like it to use Molehill for rendering, but that's not possible out of the box. What's the easiest way to make this game use Molehill when available, but fall back to the current method if it's not available?

Read the article

Problem creating levels using inherited classes/polymorphism

- by Adam

I'm trying to write my level classes by having a base class that each level class inherits from...The base class uses pure virtual functions. My base class is only going to be used as a vector that'll have the inherited level classes pushed onto it...This is what my code looks like at the moment, I've tried various things and get the same result (segmentation fault). //level.h class Level { protected: Mix_Music *music; SDL_Surface *background; SDL_Surface *background2; vector<Enemy> enemy; bool loaded; int time; public: Level(); virtual ~Level(); int bgX, bgY; int bg2X, bg2Y; int width, height; virtual void load(); virtual void unload(); virtual void update(); virtual void draw(); }; //level.cpp Level::Level() { bgX = 0; bgY = 0; bg2X = 0; bg2Y = 0; width = 2048; height = 480; loaded = false; time = 0; } Level::~Level() { } //virtual functions are empty... I'm not sure exactly what I'm supposed to include in the inherited class structure, but this is what I have at the moment... //level1.h class Level1: public Level { public: Level1(); ~Level1(); void load(); void unload(); void update(); void draw(); }; //level1.cpp Level1::Level1() { } Level1::~Level1() { enemy.clear(); Mix_FreeMusic(music); SDL_FreeSurface(background); SDL_FreeSurface(background2); music = NULL; background = NULL; background2 = NULL; Mix_CloseAudio(); } void Level1::load() { music = Mix_LoadMUS("music/song1.xm"); background = loadImage("image/background.png"); background2 = loadImage("image/background2.png"); Mix_OpenAudio(48000, MIX_DEFAULT_FORMAT, 2, 4096); Mix_PlayMusic(music, -1); } void Level1::unload() { } //functions have level-specific code in them... Right now for testing purposes, I just have the main loop call Level1 level1; and use the functions, but when I run the game I get a segmentation fault. This is the first time I've tried writing inherited classes, so I know I'm doing something wrong, but I can't seem to figure out what exactly.

Read the article

Adobe Air Mobile AS3 app: challenges and how to overcome them?

- by Arthur Wulf White

I made a PC flash game for LD 26 - minimalism and I am working on porting it to Android. Some questions I'd like to ask: Is it bad to heavily use vector graphics (ie. this.graphics.lineTo()) in Mobile Air? Does Stencyl completely alleviate this issue? Are there any inherit disadvantages to using Air Mobile that I'm missing? Where is the documentation for Air mobile (I googled and found no recent books or documentation pdf so far)

Read the article

How to calculate continuous motion with angular velocity in 2d

- by Rulk

I'm really new with physics. Maybe someone would be able to help me to solve the next problem: I need to calculate position of an agent on the plane(2D) in next time step where time step is large(20+ seconds) What I know about agent's motion: Initial Position Direction(normalised vector) Velocity(linear function from time ) - object always moves along it's direction Angular Velocity(linear function from time) Optional: External force direction External force (linear function from time) Running discreet simulation with t-0 is not an option.

Read the article

Are there specific benefits to using XNA for 2D development if you don't plan on releasing on xbox/windows phone?

- by ssb

I've been using XNA for a while to tinker with 2D game development, but I can't help but feel constrained by the content pipeline when targeting PC only. Things like no vector fonts or direct use of graphics files make it a pain while other frameworks do these things with no problem. I like XNA because it's robust and has a lot of support, but what are the specific benefits that I'd get developing exclusively for PC, if there are any at all?

Read the article

LWJGL: Camera distance from image plane?

- by Rogem

Let me paste some code before I ask the question... public static void createWindow(int[] args) { try { Display.setFullscreen(false); DisplayMode d[] = Display.getAvailableDisplayModes(); for (int i = 0; i < d.length; i++) { if (d[i].getWidth() == args[0] && d[i].getHeight() == args[1] && d[i].getBitsPerPixel() == 32) { displayMode = d[i]; break; } } Display.setDisplayMode(displayMode); Display.create(); } catch (Exception e) { e.printStackTrace(); System.exit(0); } } public static void initGL() { GL11.glEnable(GL11.GL_TEXTURE_2D); GL11.glShadeModel(GL11.GL_SMOOTH); GL11.glClearColor(0.0f, 0.0f, 0.0f, 0.0f); GL11.glClearDepth(1.0); GL11.glEnable(GL11.GL_DEPTH_TEST); GL11.glDepthFunc(GL11.GL_LEQUAL); GL11.glMatrixMode(GL11.GL_PROJECTION); GL11.glLoadIdentity(); GLU.gluPerspective(45.0f, (float) displayMode.getWidth() / (float) displayMode.getHeight(), 0.1f, 100.0f); GL11.glMatrixMode(GL11.GL_MODELVIEW); GL11.glHint(GL11.GL_PERSPECTIVE_CORRECTION_HINT, GL11.GL_NICEST); } So, with the camera and screen setup out of the way, I can now ask the actual question: How do I know what the camera distance is from the image plane? I also would like to know what the angle between the image plane's center normal and a line drawn from the middle of one of the edges to the camera position is. This will be used to consequently draw a vector from the camera's position through the player's click-coordinates to determine the world coordinates they clicked (or could've clicked). Also, when I set the camera coordinates, do I set the coordinates of the camera or do I set the coordinates of the image plane? Thank you for your help. EDIT: So, I managed to solve how to calculate the distance of the camera... Here's the relevant code... private static float getScreenFOV(int dim) { if (dim == 0) { float dist = (float) Math.tan((Math.PI / 2 - Math.toRadians(FOV_Y))/2) * 0.5f; float FOV_X = 2 * (float) Math.atan(getScreenRatio() * 0.5f / dist); return FOV_X; } else if (dim == 1) { return FOV_Y; } return 0; } FOV_Y is the Field of View that one defines in gluPerspective (float fovy in javadoc). This seems to be (and would logically be) for the height of the screen. Now I just need to figure out how to calculate that vector.

Read the article

Silverlight Cream for March 28, 2010 -- #823

- by Dave Campbell

In this Issue: Michael Washington, Andy Beaulieu, Bill Reiss, jocelyn, Shawn Wildermuth, Cameron Albert, Shawn Oster, Alex Yakhnin, ondrejsv, Giorgetti Alessandro, Jeff Handley, SilverLaw, deepm, and Kyle McClellan. Shoutouts: If I've listed this before, it's worth another... Introduction to Prototyping with SketchFlow (twelve video series) and on the same page is Creating a Beehive Game with Behaviors in Blend 3 (ten video series) Shawn Oster announced his Slides + Code + Video from ‘An Introduction to Developing Applications for Microsoft Silverlight’ from MIX10 Tim Heuer announced earlier this week: Silverlight Client for Facebook updated for Silverlight 4 RC Nikhil Kothari announced the availability of his MIX10 Talk - Slides and Code András Velvárt backed up his great MIX09 effort with MIX10.Zoomery.com... everything in one DZ effort... thanks András! Andy Beaulieu posted his material for his Code Camp 13 in Waltham: Windows Phone: Silverlight for Casual Games From SilverlightCream.com: Silverlight MVVM - The Revolution Has Begun Michael Washington did an awesome tutorial on MVVM and Silverlight creating a simple Silverlight File Manager. The post has a link to the tutorial at CodeProject... great tutorial. Windows Phone 7 + Silverlight Performance Andy Beaulieu has a post up we should all bookmark... getting a handle on the graphics performance of our app on WP7. Great examples, and external links. Space Rocks game step 6: Keyboard handling Bill Reiss has a post up about keyboard input for the WP7 game he's building ... this is Episode 6 ... you're working along with him, right? Panoramic Navigation on Windows Phone 7 with No Code! jocelyn at InnovativeSingapore (I found this by way of Shawn's post), has a Panoramic Navigation template out there for WP7 for all of us to grab... great post about it too. My First WP7 Application Shawn Wildermuth has been playing with WP7 development and has his XBOX Game library app up on the emulator... all with source of course Silverlight and Windows Phone 7 Game Cameron Albert built a web-based game called 'Shape Attack' and also did it for WP7 to compare the performance... check it out for yourself, but hey, it's game source for the phone... cool :) Changing the Onscreen Keyboard layout in Silverlight for Windows Phone using InputScope Shawn Oster has a cool post on changing the keyboard on WP7 to go along with what you're expecting the user to type... how cool is that?? Deep Zoom on WP7 Check out the quick work Alex Yakhnin made of putting DeepZoom on WP7... all source included. How to: Create a sketchy Siverlight GroupBox in Blend/SketchFlow ondrejsv has the xaml up to take Tim Greenfield's GroupBox control and insert it into SketchFlow. Silverlight / Castle Windsor – implementing a simple logging framework Giorgetti Alessandro posted about CastleWindsor for Silverlight, and a logging system inherited from LevelFilteredLogger in the absence of Log4Net. DomainDataSource in a ViewModel Jeff Handley responds to a common forum post about using DomainDataSource in a ViewModel. Read his comments on AutoLoad and ElementName Bindins. Digital Jugendstil TextEffect (Art Nouveau) - Silverlight 3 SilverLaw has a cool TagCloud demo and a UserControl he calls Art Nouveau up at the Expression Gallery... not for a business app, I don't think :) Configuring your DomainService for a Windows Phone 7 application deepm discusses RIA Services for WP7 and how to enable a WP7 app to communicate with a DomainService. Writing a Custom Filter or Parameter for DomainDataSource Kyle McClellan by way of Jeff Handley's blog, is discussing how to leverage the custom parameter types you defined in the previous version of RIA Services. Stay in the 'Light! Twitter SilverlightNews | Twitter WynApse | WynApse.com | Tagged Posts | SilverlightCream Join me @ SilverlightCream | Phoenix Silverlight User Group Technorati Tags: Silverlight Silverlight 3 Silverlight 4 Windows Phone MIX10

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

Editor's Notebook - Social Aura: Insights from the Oracle Social Media Summit

- by user462779

Panelists talk social marketing at the Oracle Social Media Summit On November 14, I traveled to Las Vegas for the first-ever Oracle Social Media Summit. The two day event featured an impressive collection of social media luminaries including: David Kirkpatrick (founder and CEO of Techonomy Media and author of The Facebook Effect), John Yi (Head of Marketing Partnerships, Facebook), Matt Dickman (EVP of Social Business Innovation, Weber Shandwick), and Lyndsay Iorio (Social Media & Communications Manager, NBC Sports Group) among others. It was also a great opportunity to talk shop with some of our new Vitrue and Involver colleagues who have been returning great social media results even before their companies were acquired by Oracle. I was live tweeting the event from @OracleProfit which was great for those who wanted to follow along with the proceedings from the comfort of their office or blackjack table. But I've also found over the years that live tweeting an event is a handy way to take notes: I can sift back through my record of what people said or thoughts I had at the time and organize the Twitter messages into some kind of summary account of the proceedings. I've had nearly a month to reflect on the presentations and conversations at the event and a few key topics have emerged: David Kirkpatrick's comment during the opening presentation really set the stage for the conversations that followed. Especially if you are a marketer or publisher, the idea that you are in a one-way broadcast relationship with your audience is a thing of the past. "Rising above the noise" does not mean reaching for a megaphone, ALL CAPS, or exclamation marks. Hype will not motivate social media denizens to do anything but unfollow and tune you out. But knowing your audience, creating quality content and/or offers for them, treating them with respect, and making an authentic effort to please them: that's what I believe is now necessary. And Kirkpatrick's comment early in the day really made the point. Later in the day, our friends @Vitrue demonstrated this point by elaborating on a comment by Facebook's John Yi. If a social strategy is comprised of nothing more than cutting/pasting the same message into different social media properties, you're missing the opportunity to have an actual conversation. That's not shouting at your audience, but it does feel like an empty gesture. Walter Benjamin, perplexed by auraless Twitter messages Not to get too far afield, but 20th century cultural critic Walter Benjamin has a concept that is useful for understanding the dynamics of the empty social media gesture: Aura. In his work The Work of Art in the Age of Mechanical Reproduction, Benjamin struggled to understand the difference he percieved between the value of a hand-made art object (a painting, wood cutting, sculpture, etc.) and a photograph. For Benjamin, aura is similar to the "soul" of an artwork--the intangible essence that is created when an artist picks up a tool and puts creative energy and effort into a work. I'll defer to Wikipedia: "He argues that the "sphere of authenticity is outside the technical" so that the original artwork is independent of the copy, yet through the act of reproduction something is taken from the original by changing its context. He also introduces the idea of the "aura" of a work and its absence in a reproduction." So make sure you put aura into your social interactions. Don't just mechanically reproduce them. Keeping aura in your interactions requires the intervention of an actual human being. That's why @NoahHorton's comment about content curation struck me as incredibly important. Maybe it's just my own prejudice, being in the content curation business myself. And it's not to totally discount machine-aided content management systems, content recommendation engines, and other tech-driven tools for building an exceptional content experience. It's just that without that human interaction--that editor who reviews the analytics and responds to user feedback--interactions over social media feel a bit empty. It is SOCIAL media, right? (We'll leave the conversation about social machines for another day). At the end of the day, experimentation is key. Just like trying to find that right joke to tell at the beginning of your presentation or that good opening like at a cocktail party, social media messages and interactions can take some trial and error. Don't be afraid to try things, tinker with incomplete ideas, abandon things that don't work, and engage in the conversation. And make sure your heart is in it, otherwise your audience can tell. And finally:

Read the article

Developer Training – 6 Online Courses to Learn SQL Server, MySQL and Technology

- by Pinal Dave

Video courses are the next big thing and I am so happy that I have so far authored 6 different video courses with Pluralsight. Here is the list of the courses. I have listed all of my video courses over here. Note: If you click on the courses and it does not open, you need to login to Pluralsight with a valid username and password or sign up for a FREE trial. Please leave a comment with your favorite course in the comment section. Random 10 winners will get surprise gift via email. Bonus: If you list your favorite module from the course site. SQL Server Performance: Introduction to Query Tuning SQL Server performance tuning is an in-depth topic, and an art to master. A key component of overall application performance tuning is query tuning. Writing queries in an efficient manner, and making sure they execute in the most optimal way possible, is always a challenge. The basics revolve around the details of how SQL Server carries out query execution, so the optimizations explored in this course follow along the same lines. Click to View Course SQL Server Performance: Indexing Basics Indexes are the most crucial objects of the database. They are the first stop for any DBA and Developer when it is about performance tuning. There is a good side as well evil side of the indexes. To master the art of performance tuning one has to understand the fundamentals of the indexes and the best practices associated with the same. This course is for every DBA and Developer who deals with performance tuning and wants to use indexes to improve the performance of the server. Click to View Course SQL Server Questions and Answers This course is designed to help you better understand how to use SQL Server effectively. The course presents many of the common misconceptions about SQL Server, and then carefully debunks those misconceptions with clear explanations and short but compelling demos, showing you how SQL Server really works. This course is for anyone working with SQL Server databases who wants to improve her knowledge and understanding of this complex platform. Click to View Course MySQL Fundamentals MySQL is a popular choice of database for use in web applications, and is a central component of the widely used LAMP open source web application software stack. This course covers the fundamentals of MySQL, including how to install MySQL as well as written basic data retrieval and data modification queries. Click to View Course Building a Successful Blog Expressing yourself is the most common behavior of humans. Blogging has made easy to express yourself. Just like a letter or book has a structure and formula, blogging also has structure and formula. In this introductory course on blogging we will go over a few of the basics of blogging and show the way to get started with blogging immediately. If you already have a blog, this course will be even more relevant as this will discuss many of the common questions and issue you face in your blogging routine. Click to View Course Introduction to ColdFusion ColdFusion is rapid web application development platform. In this course you will learn the basics of how to use ColdFusion platform and rapidly develop web sites. The course begins with learning basics of ColdFusion Markup Language and moves to common development language practices. From there we move to frequent database operations and advanced concepts of Forms, Sessions and Cookies. The last module sums up all the concepts covered in the course with sample application. Click to View Course Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, T SQL, Technology

Read the article

Is a university education really worth it for a good programmer?

- by Jon Purdy

The title says it all, but here's the personal side of it: I've been doing design and programming for about as long as I can remember. If there's a programming problem, I can figure it out. (Though admittedly StackOverflow has allowed me to skip the figuring out and get straight to the doing in many instances.) I've made games, esoteric programming languages, and widgets and gizmos galore. I'm currently working on a general-purpose programming language. There's nothing I do better than programming. However, I'm just as passionate about design. Thus when I felt leaving high school that my design skills were lacking, I decided to attend university for New Media Design and Imaging, a digital design-related major. For a year, I diligently studied art and programmed in my free time. As the next year progressed, however, I was obligated to take fewer art and design classes and more technical classes. The trouble was of course that these classes were geared toward non-technical students, and were far beneath my skill level at the time. No amount of petitioning could overcome the institution's reluctance to allow me to test out of such classes, and the major offered no promise for any greater challenge in the future, so I took the extreme route: I switched into the technical equivalent of the major, New Media Interactive Development. A lot of my credits moved over into the new major, but many didn't. It would have been infeasible to switch to a more rigorous technical major such as Computer Science, and having tutored Computer Science students at every level here, I doubt I would be exposed to anything that I haven't already or won't eventually find out on my own, since I'm so involved in the field. I'm now on track to graduate perhaps a year later than I had planned, which puts a significant financial strain on my family and my future self. My schedule continues to be bogged down with classes that are wholly unnecessary for me to take. I'm being re-introduced to subjects that I've covered a thousand times over, simply because I've always been interested in it all. And though I succeed in avoiding the cynical and immature tactic of failing to complete work out of some undeserved sense of superiority, I'm becoming increasingly disillusioned by the lack of intellectual stimulation. Further, my school requires students to complete a number of quarters of co-op work experience proportional to their major. My original major required two quarters, but my current requires three, delaying my graduation even more. To top it all off, college is putting a severe strain on my relationship with my very close partner of a few years, so I've searched diligently for co-op jobs in my area, alas to no avail. I'm now in my third year, and approaching that point past which I can no longer handle this. Either I keep my head down, get a degree no matter what it takes, and try to get a job with a company that will pay me enough to do what I love that I can eventually pay off my loans; or I cut my losses now, move wherever there is work, and in six months start paying off what debt I've accumulated thus far. So the real question is: is a university education really more than just a formality? It's a big decision, and one I can't make lightly. I think this is the appropriate venue for this kind of question, and I hope it sticks around for the sake of others who might someday find themselves in similar situations. My heartfelt thanks for reading, and in advance for your help.

Read the article

Join us on our Journey to be #1 in SaaS!

- by jessica.ebbelaar(at)oracle.com

WHY ORACLE? Oracle is a robust organization that has proven to maintain growth and innovation at all levels with a constant evolving attitude. The main ingredient of Oracles success is the 105.000 talented employees who constantly amaze each other in building a better and more innovative organization. Oracle is a company where YOU can make a difference. What is OD? Oracle Direct is a state-of-the-art, multi-channel EMEA sales operation bringing to life the benefits of Oracle’s complete technology stack. It offers you the unique opportunity to work with the most talented and like-minded sales professionals in the industry. You will have access to world class training and structured career development programmes allowing you to accelerate your Solution Sales career across a multitude of product lines and a choice of attractive locations. What positions are OD Hiring? Oracle is on a journey to be the #1 SaaS vendor in EMEA. Due to recent expansion and acquisitions within our Cloud Business, we are now growing our EMEA Cloud Applications Sales Group in Dublin. We have many exciting NEW opportunities across our CRM and HCM SaaS Sales teams. As a SaaS Sales Account Manager, you will proactively manage an assigned territory / vertical with responsibility for the full sales cycle. This role requires strong business development, solution selling, account management and closing skills. WHY ORACLE? Oracle is a robust organization that has proven to maintain growth and innovation at all levels with a constant evolving attitude. The main ingredient of Oracles success is the 105.000 talented employees who constantly amaze each other in building a better and more innovative organization. Oracle is a company where YOU can make a difference. What is OD? Oracle Direct is a state-of-the-art, multi-channel EMEA sales operation bringing to life the benefits of Oracle’s complete technology stack. It offers you the unique opportunity to work with the most talented and like-minded sales professionals in the industry. You will have access to world class training and structured career development programmes allowing you to accelerate your Solution Sales career across a multitude of product lines and a choice of attractive locations. What positions are OD Hiring? Oracle is on a journey to be the #1 SaaS vendor in EMEA. Due to recent expansion and acquisitions within our Cloud Business, we are now growing our EMEA Cloud Applications Sales Group in Dublin. We have many exciting NEW opportunities across our CRM and HCM SaaS Sales teams. As a SaaS Sales Account Manager, you will proactively manage an assigned territory / vertical with responsibility for the full sales cycle. This role requires strong business development, solution selling, account management and closing skills. What is the Business Development Group (BDG) The Business Development Group is the key entry point in Oracle for the future Sales and Management talent of the organisation. We are the Demand Generation engine for Oracle in EMEA. We provide revenue generating, quality sales pipeline to our Inside and Field Sales professionals as well as to our Channel Partners. Our current focus is to provide an agile and flexible service offering to our customers and stakeholders to meet ever changing business needs, whilst constantly striving to improve the customer experience, quality of our pipeline, market coverage and penetration. As a SaaS Business Development Consultant (BDC) you will be the first touch point with new customers. Your goal is to proactively identify and qualify business opportunities leading to revenue for Oracle. You will work closely with your Inside Sales colleagues who will progress your qualified pipeline and opportunities. Work for us Work for the only multi-pillar SaaS vendor in the market Be part of a FUN, fast paced and truly International sales team Develop you solution sales EXPERTISE Drive your CAREER development within a structured and supportive environment The Profile You have a passion for selling cutting-edge technology You thrive in a fast paced and dynamic work environment where being the best is paramount Your priority is always the customer You live for a challenge and you love to win Join us on our Journey to be #1 in SaaS and be part of our Cloud Success Story! You will find more information about open roles here

Read the article

Best approach to depth streaming via existing codec

- by Kevin

I'm working on a development system (and game) intended for games set mostly in static third-person views. We produce our scenery by CG and photographic techniques. Our background art is rendered off-line by a production-grade renderer. To allow the runtime imagery to properly interact with the background art, I wrote a program to convert from depth output by Mental Ray into a texture, and a pixel shader to draw a quad such that the Z data comes from the texture. This technique is working out very well, but now we've decided that some of the camera angle changes between scenes should be animated. The animation itself is straightforward to produce from our CG models. We intend to encode it to some HD video codec such as H.264. The problem is that in order to maintain our runtime imagery on the screen, the depth buffer will need to be loaded for each video frame. Due to the bandwidth, the video's depth data will need to be compressed efficiently. I've looked into methods for performing temporal compression of depth info and found an interesting research paper here: http://web4.cs.ucl.ac.uk/staff/j.kautz/publications/depth-streaming.pdf The method establishes a mapping between 16-bit depth values and YCbCr values. The mapping is tuned to the properties of existing video codecs in order to maximize precision of the decoded depths after the YCbCr has undergone video compression. It allows an existing, unmodified video codec to be used on the backend. I'm looking at how to pull this off with the least possible work. (This design change was unplanned.) Our game engine itself is native C++, presently for Win32 and DirectX, although we've worked hard to keep platform dependence segregated because we intend other ports. We don't have motion video facilities in the engine yet but will ultimately need that anyway for cinematics. I was planning on using some off-the-shelf motion video solution we can plug into our engine, and haven't chosen one yet. This new added requirement makes selecting one harder since, among other things, we'll now need to bypass colourspace conversion on one of the streams, and also will need to be playing two streams simultaneously in lockstep, on top of in some cases audio on one of them (for the cinematics). I'm also wondering if it's possible (or even useful) to do the conversion from YCbCr to depth in a pixel shader, or if it's better to just do it in CPU and separately load the resulting depth values into a locked tex. The conversion unfortunately does involve branching logic per-pixel. (There are more naive mappings that don't need branching, but they produce inferior results.) It could be reduced to a table lookup but the table would be 32MB. Programming is second-nature to me but I'm not that experienced with pix shaders and have zero knowledge of off-the-shelf video solutions. I'd therefore be interested in advice from others who may have dealt more with depth streaming, pixel shaders, and/or off-the-shelf codecs, regarding how feasible the proposed application is and what off-the-shelf video systems out there would best get along with this usage case.

Search Results

Search found 3456 results on 139 pages for 'vector art'.

Page 70/139 | < Previous Page | 66 67 68 69 70 71 72 73 74 75 76 77 | Next Page >

- by jorb

- by Chan

- by jchang

- by Chan

- by Dave Voyles

- by Philipp

- by Vijayendra

- by MulletDevil

- by Liuka

- by jchang

- by Eimantas

- by Bart van Heukelom

- by Adam

- by Arthur Wulf White

- by Rulk

- by ssb

- by Rogem

- by Dave Campbell

- by rchrd

- by user462779

- by Pinal Dave

- by Jon Purdy

- by jessica.ebbelaar(at)oracle.com

- by Kevin

< Previous Page | 66 67 68 69 70 71 72 73 74 75 76 77 | Next Page >