Search Results

Search found 1886 results on 76 pages for 'matrix convolution'.

Page 11/76 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18  | Next Page >

  • matrix to transform unit cube to space defined by 8 arbitrary points

    - by aadster
    I asked a similar question already, but I think this is a clearer statement of what I'm trying to achieve, or whether it's possible at all! I'm trying to find a transformation (ideally a matrix) which would transform the 8 points of a 3D unit cube to 8 arbitrary points in space. The 8 target points have no known structure. My gut feeling is that a matrix is unable to provide this transform, since the vertices of the cube faces can be concave, but are there any other methods of transformation? Thanks!
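    One non-matrix option that fits the question is trilinear interpolation. An affine 3x4 matrix has only 12 free parameters and a projective 4x4 matrix has 15, while 8 arbitrary target points impose 24 constraints, so a single matrix cannot hit all of them in general. The sketch below is only an illustration: the function name `trilinear_map` and the `corners[i][j][k]` layout are assumptions, not something from the question.

```python
import numpy as np

def trilinear_map(corners, u, v, w):
    """Map a point (u, v, w) of the unit cube into the cell spanned by `corners`.

    corners[i][j][k] is the target position of the cube vertex (i, j, k),
    so `corners` has shape (2, 2, 2, 3). This layout is an assumption.
    """
    c = np.asarray(corners, dtype=float)
    wu = np.array([1.0 - u, u])  # interpolation weights along the first cube axis
    wv = np.array([1.0 - v, v])  # ... along the second axis
    ww = np.array([1.0 - w, w])  # ... along the third axis
    # Weighted sum of the 8 target points.
    return np.einsum("i,j,k,ijkd->d", wu, wv, ww, c)
```

    Every cube vertex lands exactly on its target point, and the cube centre lands on the average of the 8 targets. The map is not linear in (u, v, w), which is why it can do what a matrix cannot.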


  • R: How can I reorder the rows of a matrix, data.frame or vector according to another one.

    - by John
        test1 <- as.matrix(c(1, 2, 3, 4, 5))
        row.names(test1) <- c("a", "d", "c", "b", "e")
        test2 <- as.matrix(c(6, 7, 8, 9, 10))
        row.names(test2) <- c("e", "d", "c", "b", "a")

        test1
          [,1]
        a    1
        d    2
        c    3
        b    4
        e    5

        test2
          [,1]
        e    6
        d    7
        c    8
        b    9
        a   10

    How can I reorder test2 so that the rows are in the same order as test1? E.g.:

        test2
          [,1]
        a   10
        d    7
        c    8
        b    9
        e    6

    I tried to use the reorder function with reorder(test1, test2), but I could not figure out the correct syntax. I see that reorder takes a vector, and I'm using a matrix here. My real data has one character vector and another as a data.frame. I figured that the data structure would not matter too much for the example above; I just need help with the syntax and can adapt it to my real problem.


  • python - from matrix to dictionary in single line

    - by Sanich
    matrix is a list of lists. I have to return a dictionary of the form

        {i: (l1[i], l2[i], ..., lm[i])}

    where the key i is matched with a tuple of the i'th elements from each list. Say

        matrix = [[1,2,3,4],[9,8,7,6],[4,8,2,6]]

    so the line:

        >>> dict([(i,tuple(matrix[k][i] for k in xrange(len(matrix)))) for i in xrange(len(matrix[0]))])

    does the job pretty well and outputs:

        {0: (1, 9, 4), 1: (2, 8, 8), 2: (3, 7, 2), 3: (4, 6, 6)}

    but fails if the matrix is empty: matrix=[]. The output should be: {}

    How can I deal with this?
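    One way to sidestep the empty case is to avoid indexing matrix[0] at all. The sketch below assumes Python 3 and rows of equal length; the helper name `columns_by_index` is made up for illustration.

```python
def columns_by_index(matrix):
    """Map each column index to the tuple of that column's values.

    zip(*matrix) yields nothing for an empty list, so the result is {}
    without a special case.
    """
    return dict(enumerate(zip(*matrix)))
```

    Because zip stops at the shortest row, ragged input would be silently truncated rather than raise an error.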


  • How do I overload () operator with two parameters; like (3,5)?

    - by hkBattousai
    I have a mathematical matrix class. It contains a member function which is used to access any element of the matrix.

        template <class T>
        class Matrix
        {
        public:
            // ...
            void SetElement(T dbElement, uint64_t unRow, uint64_t unCol);
            // ...
        };

        template <class T>
        void Matrix<T>::SetElement(T Element, uint64_t unRow, uint64_t unCol)
        {
            try
            {
                // "TheMatrix" is defined as "std::vector<T> TheMatrix"
                TheMatrix.at(m_unColSize * unRow + unCol) = Element;
            }
            catch (std::out_of_range & e)
            {
                // Do error handling here
            }
        }

    I'm using this method in my code like this:

        // create a matrix with 2 rows and 3 columns whose elements are double
        Matrix<double> matrix(2, 3);
        // change the value of the element at 1st row and 2nd column to 6.78
        matrix.SetElement(6.78, 1, 2);

    This works well, but I want to use operator overloading to simplify things, like below:

        Matrix<double> matrix(2, 3);
        matrix(1, 2) = 6.78; // HOW DO I DO THIS?


  • Create a basic matrix in C (input by user !)

    - by DM
    Hi there, I'm trying to ask the user to enter the number of columns and rows they want in a matrix, and then enter the values in the matrix. I'm going to let them insert numbers one row at a time. How can I create such a function?

        #include <stdio.h>

        int main(void)
        {
            int mat[10][10], i, j;

            for (i = 0; i < 2; i++)
                for (j = 0; j < 2; j++) {
                    scanf("%d", &mat[i][j]);
                }

            for (i = 0; i < 2; i++)
                for (j = 0; j < 2; j++)
                    printf("%d", mat[i][j]);
        }

    This works for inputting the numbers, but it displays them all on one line. The issue here is that I don't know how many columns or rows the user wants, so I can't print out %d %d %d in a matrix form. Any thoughts? Thanks :)


  • Python — How can I find the square matrix of a lower triangular numpy matrix? (with a symmetrical upper triangle)

    - by Dana Gray
    I generated a lower triangular matrix, and I want to complete the matrix using the values in the lower triangular matrix to form a square matrix, symmetrical around the diagonal zeros.

        lower_triangle = numpy.array([
            [0,0,0,0],
            [1,0,0,0],
            [2,3,0,0],
            [4,5,6,0]])

    I want to generate the following complete matrix, maintaining the zero diagonal:

        complete_matrix = numpy.array([
            [0, 1, 2, 4],
            [1, 0, 3, 5],
            [2, 3, 0, 6],
            [4, 5, 6, 0]])

    Thanks.
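    A possible approach is to add the matrix to its own transpose. The sketch below is one way to do it, not necessarily what the asker ended up using; `symmetrize` is a hypothetical helper name. Subtracting the diagonal once keeps the function correct even when the diagonal is not zero.

```python
import numpy as np

def symmetrize(lower):
    """Mirror a lower-triangular matrix across its diagonal."""
    lower = np.asarray(lower)
    # lower + lower.T counts the diagonal twice, so remove one copy of it.
    return lower + lower.T - np.diag(np.diag(lower))
```

    With the zero diagonal from the question, this reduces to `lower_triangle + lower_triangle.T`.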


  • How do I compute the transformation matrix needed to transform a rectangle into a trapezium?

    - by Rich Bradshaw
    I'm playing around with CSS transforms and the equivalent filters in IE, and want to simulate perspective by transforming a 2D rectangle into a trapezium. Specifically, I want the right-hand side of the rectangle to stay the same height, and the left-hand side to be, say, 80% of the height, so that the midpoints of both sides are horizontally in line with each other. I'm familiar with matrix algebra, but can't think how to determine what matrix will do that.
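    A 2D affine matrix keeps parallel lines parallel, so turning a rectangle into a trapezium needs a projective (3x3 homogeneous) matrix instead. The sketch below solves for such a matrix from four corner correspondences. The function names and the unit-square coordinates are assumptions chosen for illustration; how the result is fed into a CSS or IE transform is not covered.

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] fixed to 1) mapping each src point to its dst point."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1), rearranged the same way.
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, point):
    """Apply H to a 2D point, including the perspective divide."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])

# Unit square; the left edge shrinks to 80% of the height and stays centred.
SRC = [(0, 0), (1, 0), (1, 1), (0, 1)]
DST = [(0, 0.1), (1, 0), (1, 1), (0, 0.9)]
H = homography(SRC, DST)
```

    The bottom row of `H` is what makes the transform projective; if it came out as (0, 0, 1), the mapping would be affine after all.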


  • What's a good matrix manipulation library available for C ?

    - by banister
    Hi, I am doing a lot of image processing in C and I need a good, reasonably lightweight, and above all FAST matrix manipulation library. I am mostly focusing on affine transformations and matrix inversions, so I do not need anything too sophisticated or bloated. Primarily I would like something that is very fast (using SSE perhaps?), with a clean API and (hopefully) prepackaged by many of the Unix package management systems. Note this is for C, not for C++. Thanks :)


  • Best way to get the highest sum from a Matrix (using Java but algorithm is the issue here)

    - by user294896
    Sorry, I don't know the correct terminology to use, but I have a 3x3 matrix like this:

        1 3 4
        5 4 5
        2 2 5

    and I want to get the highest score by picking a value from each row/column, but I can't pick the same row or column more than once, so the answer in this case is

        3 + 5 + 5 = 13 (row0,col1 + row1,col0 + row2,col2)

    4 + 5 + 5 = 14 is not allowed because it would have picked two values from col2. I'm using Java, and typically the matrix would be 15 by 15 in size. Is there a name for what I'm trying to do, and what's the algorithm? Thanks, Paul
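    This is usually called the assignment problem, and the Hungarian (Kuhn-Munkres) algorithm solves it in polynomial time. As a simpler illustration, the sketch below uses dynamic programming over bitmasks of used columns, which stays practical at around 15x15. It is written in Python rather than Java, and `best_assignment` is a hypothetical name.

```python
from functools import lru_cache

def best_assignment(matrix):
    """Return the maximum sum that picks exactly one value per row and per column."""
    n = len(matrix)

    @lru_cache(maxsize=None)
    def solve(row, used_cols):
        # used_cols is a bitmask of the columns already taken by earlier rows.
        if row == n:
            return 0
        best = float("-inf")
        for col in range(n):
            if not used_cols & (1 << col):
                candidate = matrix[row][col] + solve(row + 1, used_cols | (1 << col))
                best = max(best, candidate)
        return best

    return solve(0, 0)
```

    There are at most 2^n distinct column masks, so a 15x15 input needs on the order of 15 * 2^15 steps instead of 15! permutations.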


  • Transform 3D vectors between coordinate systems

    - by Nir Cig
    I've got 6 points in 3D space: A,B,C,D,E,F, that represent 4 vectors. AB is perpendicular to AC and DE is perpendicular to DF. I need to find a transformation matrix M, that transforms AB to DE and AC to DF. In other words: M·AB=DE, M·AC=DF If no scaling was involved, this could be solved with a simple rotation matrix. But since the ratios |AB|/|DE|, |AC|/|DF| might be different, I'm not sure how to proceed.
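    One way to proceed is to treat AB, AC and DE, DF as two sets of basis vectors and solve for the matrix that maps one basis to the other. Two vectors leave the third axis undetermined, so the sketch below assumes the cross product AB x AC should map to DE x DF; that choice, and the name `frame_transform`, are assumptions, not part of the question.

```python
import numpy as np

def frame_transform(ab, ac, de, df):
    """Return the 3x3 matrix M with M @ ab == de and M @ ac == df.

    The third axis is completed with cross products (an assumed convention).
    """
    ab, ac, de, df = (np.asarray(v, dtype=float) for v in (ab, ac, de, df))
    src = np.column_stack([ab, ac, np.cross(ab, ac)])
    dst = np.column_stack([de, df, np.cross(de, df)])
    # M @ src == dst, so M == dst @ inverse(src).
    return dst @ np.linalg.inv(src)
```

    M acts on direction vectors. To map the points themselves, a translation D - M·A would be applied as well.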


  • Transforming object world space matrix to a position in world space

    - by Fredrik Boston Westman
    I'm trying to make a function for picking objects with a bounding sphere, but I have run into a problem. First I check against my bounding sphere, and if that check passes I test against the vertices. I have already tested my vertex picking method and it works fine; however, when I check first with my bounding sphere method it doesn't register anything. My conclusion is that when I transform my sphere position into the position of the object in world space, the transformation goes wrong (I base this on the fact that the x coordinate always becomes 1, even though I translate none of my meshes along the x-axis to 1). So my question is: what is the proper way to transform an object's world space matrix to a position vector? This is how I do it now. First I set my position vector to 0:

        XMVECTOR meshPos = XMVectorSet(0.0f, 0.0f, 0.0f, 0.0f);

    Then I transform it with my object space matrix, and then add the offset to the center of the mesh:

        meshPos = XMVector3TransformCoord(meshPos, meshWorld) + centerOffset;
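    For reference, DirectXMath uses row vectors, so the translation of a world matrix sits in its fourth row, and XMVector3TransformCoord treats the input's w as 1 and divides by the resulting w. The NumPy sketch below mimics that behaviour to compare two ways of placing a sphere center. `transform_coord` and the example matrix are illustrative assumptions, and the sketch does not establish what caused the bug in the question.

```python
import numpy as np

def transform_coord(point, world):
    """Transform a 3D point by a 4x4 row-vector matrix, with w = 1 and a final divide by w."""
    x, y, z, w = np.append(np.asarray(point, dtype=float), 1.0) @ np.asarray(world, dtype=float)
    return np.array([x, y, z]) / w

# Assumed example: uniform scale by 2, then translation by (5, 0, -3).
WORLD = np.array([
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [5.0, 0.0, -3.0, 1.0],
])
CENTER_OFFSET = np.array([1.0, 1.0, 0.0])

origin_in_world = transform_coord([0.0, 0.0, 0.0], WORLD)  # equals the translation row
offset_added_after = origin_in_world + CENTER_OFFSET       # ignores scale and rotation
offset_transformed = transform_coord(CENTER_OFFSET, WORLD) # offset goes through the matrix
```

    The two results differ whenever the world matrix scales or rotates, which is one thing worth checking when a bounding sphere test never registers.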


  • Nucleus Research – Research Note: Technology Value Matrix – First Half 2012 Enterprise Applications

    - by LanaProut
    The Technology Value Matrix evaluates products that have a global presence and provide core functionality for finance and accounting, human resources, manufacturing, supply chain management, project management, and customer relationship management. Oracle E-Business Suite and Oracle JD Edwards EnterpriseOne are leaders in the Value Matrix for the first half of 2012. Click here to view the report.


  • Dot Matrix printers setup...

    - by Parhs
    Hello! I am using Debian, which is similar to Ubuntu. They have 7 dot matrix printers, some very old, like this one, http://www.omnidatasys.net/product/desc_printer_ti880.htm, which has worked daily since 1979 and prints text faster than many inkjets. I believe that it has its own language. Sending text to the serial port (port server) prints garbage. However, I think it prints only capital English letters up to ASCII 95, and the rest up to 127 are, I think, Greek capitals (special chip). Sending English capital letters prints garbage, I think, but I am not sure; I will try again. The other printers are ESC/P compatible and I use the generic Epson driver provided by Ghostscript. However, I think that sending text via lp -dpr1 filename prints the text as a graphic: changing the printer font face (Courier, Times Roman, etc.) or pitch has no effect. I am wondering if there is any workaround for this. On AIX they claim that the lp command printed output as text, and COBOL programs send raw text to the lp printers. However, on AIX they use some custom filters for the printers, which have more options for dot matrix printers. I would like to know if there is a solution for this: a way to avoid graphics mode for text and change the font face somehow. The most straightforward approach would be to use no driver and just send ESC/P from COBOL, but this requires too much work. Thank you again!


  • Matrix Multiplication with C++ AMP

    - by Daniel Moth
    As part of our API tour of C++ AMP, we looked recently at parallel_for_each. I ended that post by saying we would revisit parallel_for_each after introducing array and array_view. Now is the time, so this is part 2 of parallel_for_each, and also a post that brings together everything we've seen until now.

    The code for serial and accelerated

    Consider a naïve (or brute force) serial implementation of matrix multiplication:

         0: void MatrixMultiplySerial(std::vector<float>& vC, const std::vector<float>& vA, const std::vector<float>& vB, int M, int N, int W)
         1: {
         2:   for (int row = 0; row < M; row++)
         3:   {
         4:     for (int col = 0; col < N; col++)
         5:     {
         6:       float sum = 0.0f;
         7:       for(int i = 0; i < W; i++)
         8:         sum += vA[row * W + i] * vB[i * N + col];
         9:       vC[row * N + col] = sum;
        10:     }
        11:   }
        12: }

    We notice that each loop iteration is independent from each other and so can be parallelized. If in addition we have really large amounts of data, then this is a good candidate to offload to an accelerator. First, I'll just show you an example of what that code may look like with C++ AMP, and then we'll analyze it. It is assumed that you included at the top of your file #include <amp.h>

        13: void MatrixMultiplySimple(std::vector<float>& vC, const std::vector<float>& vA, const std::vector<float>& vB, int M, int N, int W)
        14: {
        15:   concurrency::array_view<const float,2> a(M, W, vA);
        16:   concurrency::array_view<const float,2> b(W, N, vB);
        17:   concurrency::array_view<concurrency::writeonly<float>,2> c(M, N, vC);
        18:   concurrency::parallel_for_each(c.grid,
        19:     [=](concurrency::index<2> idx) restrict(direct3d) {
        20:       int row = idx[0]; int col = idx[1];
        21:       float sum = 0.0f;
        22:       for(int i = 0; i < W; i++)
        23:         sum += a(row, i) * b(i, col);
        24:       c[idx] = sum;
        25:     });
        26: }

    First a visual comparison, just for fun: The beginning and end is the same, i.e. lines 0,1,12 are identical to lines 13,14,26. The double nested loop (lines 2,3,4,5 and 10,11) has been transformed into a parallel_for_each call (18,19,20 and 25). The core algorithm (lines 6,7,8,9) is essentially the same (lines 21,22,23,24). We have extra lines in the C++ AMP version (15,16,17). Now let's dig in deeper.

    Using array_view and extent

    When we decided to convert this function to run on an accelerator, we knew we couldn't use the std::vector objects in the restrict(direct3d) function. So we had a choice of copying the data to a concurrency::array<T,N> object, or wrapping the vector container (and hence its data) with a concurrency::array_view<T,N> object from amp.h; here we used the latter (lines 15,16,17). Now we can access the same data through the array_view objects (a and b) instead of the vector objects (vA and vB), and the added benefit is that we can capture the array_view objects in the lambda (lines 19-25) that we pass to the parallel_for_each call (line 18) and the data will get copied on demand for us to the accelerator.

    Note that line 15 (and ditto for 16 and 17) could have been written as two lines instead of one:

        extent<2> e(M, W);
        array_view<const float, 2> a(e, vA);

    In other words, we could have explicitly created the extent object instead of letting the array_view create it for us under the covers through the constructor overload we chose. The benefit of the extent object in this instance is that we can express that the data is indeed two dimensional, i.e. a matrix. When we were using a vector object we could not do that, and instead we had to track via additional unrelated variables the dimensions of the matrix (i.e. with the integers M and W). Aren't you loving C++ AMP already?

    Note that the const before the float when creating a and b will result in the underlying data only being copied to the accelerator and not being copied back, a nice optimization. A similar thing is happening on line 17 when creating array_view c, where we have indicated that we do not need to copy the data to the accelerator, only copy it back.

    The kernel dispatch

    On line 18 we make the call to the C++ AMP entry point (parallel_for_each) to invoke our parallel loop or, as some may say, dispatch our kernel. The first argument we need to pass describes how many threads we want for this computation. For this algorithm we decided that we want exactly the same number of threads as the number of elements in the output matrix, i.e. in array_view c which will eventually update the vector vC. So each thread will compute exactly one result. Since the elements in c are organized in a two-dimensional manner we can organize our threads in a two-dimensional manner too. We don't have to think too much about how to create the first argument (a grid) since the array_view object helpfully exposes that as a property. Note that instead of c.grid we could have written grid<2>(c.extent) or grid<2>(extent<2>(M, N)); the result is the same in that we have specified M*N threads to execute our lambda.

    The second argument is a restrict(direct3d) lambda that accepts an index object. Since we elected to use a two-dimensional extent as the first argument of parallel_for_each, the index will also be two-dimensional and as covered in the previous posts it represents the thread ID, which in our case maps perfectly to the index of each element in the resulting array_view.

    The kernel itself

    The lambda body (lines 20-24), or as some may say, the kernel, is the code that will actually execute on the accelerator. It will be called by M*N threads and we can use those threads to index into the two input array_views (a,b) and write results into the output array_view (c). The four lines (21-24) are essentially identical to the four lines of the serial algorithm (6-9). The only difference is how we index into a,b,c versus how we index into vA,vB,vC. The code we wrote with C++ AMP is much nicer in its indexing, because the dimensionality is a first class concept, so you don't have to do funny arithmetic calculating the index of where the next row starts, which you have to do when working with vectors directly (since they store all the data in a flat manner).

    I skipped over describing line 20. Note that we didn't really need to read the two components of the index into temporary local variables. This mostly reflects my personal choice, in some algorithms, to break down the index into local variables with names that make sense for the algorithm, i.e. in this case row and col. In other cases it may be i,j,k or x,y,z, or M,N or whatever. Also note that we could have written line 24 as c(idx[0], idx[1])=sum or c(row, col)=sum instead of the simpler c[idx]=sum.

    Targeting a specific accelerator

    Imagine that we had more than one hardware accelerator on a system and we wanted to pick a specific one to execute this parallel loop on. So there would be some code like this anywhere before line 18:

        vector<accelerator> accs = MyFunctionThatChoosesSuitableAccelerators();
        accelerator acc = accs[0];

    …and then we would modify line 18 so we would be calling another overload of parallel_for_each that accepts an accelerator_view as the first argument, so it would become:

        concurrency::parallel_for_each(acc.default_view, c.grid,

    ...and the rest of your code remains the same… how simple is that?

    Comments about this post by Daniel Moth welcome at the original blog.
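    The indexing point made in the post can be tried outside C++ AMP. The sketch below is a Python/NumPy analogue, not C++ AMP code: `matmul_flat` mirrors the serial version with its flat index arithmetic, while `matmul_views` reshapes the same flat data into 2D arrays (loosely analogous to wrapping a vector in an array_view) and visits one (row, col) pair per output element. Both function names are made up for illustration.

```python
import numpy as np

def matmul_flat(vA, vB, M, N, W):
    """Serial multiply of an M x W by a W x N matrix, both stored as flat row-major lists."""
    vC = [0.0] * (M * N)
    for row in range(M):
        for col in range(N):
            total = 0.0
            for i in range(W):
                total += vA[row * W + i] * vB[i * N + col]
            vC[row * N + col] = total
    return vC

def matmul_views(vA, vB, M, N, W):
    """Same computation, indexing through 2D arrays built over the flat data."""
    a = np.asarray(vA, dtype=float).reshape(M, W)
    b = np.asarray(vB, dtype=float).reshape(W, N)
    c = np.empty((M, N))
    # One iteration per output element, like one thread per element of c.
    for row, col in np.ndindex(M, N):
        c[row, col] = sum(a[row, i] * b[i, col] for i in range(W))
    return c.ravel().tolist()
```

    The loop in `matmul_views` still runs serially here; it only illustrates that each output element can be computed independently of the others.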


  • template style matrix implementation in c

    - by monkeyking
    From time to time I use the following code for generating a matrix-style data structure:

        typedef double myType;

        typedef struct matrix_t {
            myType **matrix;
            size_t x;
            size_t y;
        } matrix;

        matrix alloc_matrix(size_t x, size_t y) {
            if (0)
                fprintf(stderr, "\t-> Alloc matrix with dim (%lu,%lu) byteprline=%lu bytetotal:%lu\n",
                        x, y, y * sizeof(myType), x * y * sizeof(myType));

            /* x row pointers */
            myType **m = (myType **)malloc(x * sizeof(myType *));
            /* y entries per row */
            for (size_t i = 0; i < x; i++)
                m[i] = (myType *)malloc(y * sizeof(myType));

            matrix ret;
            ret.x = x;
            ret.y = y;
            ret.matrix = m;
            return ret;
        }

    And then I would change my typedef accordingly if I needed a different kind of type for the entries in my matrix. Now I need 2 matrices with different types. An easy solution would be to copy/paste the code, but is there some way to do a template-styled implementation? Thanks


  • How to prevent 2D camera rotation if it would violate the bounds of the camera?

    - by Andrew Price
    I'm working on a Camera class and I have a rectangle field named Bounds that determines the bounds of the camera. I have it working for zooming and moving the camera so that the camera cannot exit its bounds. However, I'm a bit confused on how to do the same for rotation. Currently I allow rotating the camera around its Z-axis. However, if sufficiently zoomed out, upon rotating the camera, areas of the screen outside the camera's bounds can be shown. I'd like to deny the rotation if the newly rotated camera would expose areas outside the camera's bounds, but I'm not quite sure how. I'm still new to Matrix and Vector math and I'm not quite sure how to model "if the newly rotated camera sees outside of its bounds, undo the rotation". Here's an image showing the problem: http://i.stack.imgur.com/NqprC.png The red is out of bounds and as a result, the camera should never be allowed to rotate itself like this. This seems like it would be a problem with all rotated values, but this is not the case when the camera is zoomed in enough. Here are the current member variables for the Camera class:

        private Vector2 _position = Vector2.Zero;
        private Vector2 _origin = Vector2.Zero;
        private Rectangle? _bounds = Rectangle.Empty;
        private float _rotation = 0.0f;
        private float _zoom = 1.0f;

    Is this possible to do? If so, could someone give me some guidance on how to accomplish this? Thanks.

    EDIT: I forgot to mention I am using a transformation matrix style camera that I pass to SpriteBatch.Begin. I am using the same transformation matrix from this tutorial.
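    One way to model the check is to compute the four world-space corners of the rotated view rectangle and reject the rotation if any corner falls outside the bounds. The sketch below is in Python rather than C#/XNA, and it assumes the camera position is the centre of the view, that zoom divides the visible size, and that bounds are given as (left, top, right, bottom). All names are illustrative.

```python
import math

def view_corners(center, width, height, rotation, zoom):
    """World-space corners of the visible rectangle after rotating about its centre."""
    half_w, half_h = width / (2 * zoom), height / (2 * zoom)
    c, s = math.cos(rotation), math.sin(rotation)
    cx, cy = center
    local = ((-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h))
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]

def rotation_allowed(center, width, height, rotation, zoom, bounds):
    """True if every corner of the rotated view stays inside bounds."""
    left, top, right, bottom = bounds
    return all(
        left <= x <= right and top <= y <= bottom
        for x, y in view_corners(center, width, height, rotation, zoom)
    )
```

    A setter for the rotation could call `rotation_allowed` with the proposed angle and keep the old angle when it returns False. Zooming in shrinks the view rectangle, which matches the observation that the problem disappears when zoomed in enough.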


  • Strange 3D game engine camera with X,Y,Zoom instead of X,Y,Z

    - by Jenko
    I'm using a 3D game engine that uses a 4x4 matrix to modify the camera projection, in this format:

        r r r x
        r r r y
        r r r z
        - - - zoom

    Strangely though, the camera does not respond to the Z translation parameter, and so you're forced to use X, Y, Zoom to move the camera around. Technically this is plausible for isometric-style games such as Age Of Empires III. But this is a 3D engine, so why would they have designed the camera to ignore Z and respond only to zoom? Am I missing something here? I've tried every method of setting the camera and it really seems to ignore Z. So currently I have to resort to moving the main object in the scene graph instead of moving the camera in relation to the objects.

    My question: Do you have any idea why the engine would use such a scheme? Is it common? Why? Or does it seem like I'm missing something and the SetProjection(Matrix) function is broken and somehow ignores the Z translation in the matrix? (unlikely, but possible) Anyhow, what are the workarounds? Is moving objects around the only way?

    Edit: I'm sorry I cannot reveal much about the engine because we're in a binding contract. It's a locally developed engine (Australia) written in managed C# used for data visualizations.

    Edit: The default mode of the engine is orthographic, although I've switched it into perspective mode. It's probably more effective to use X, Y, Zoom in orthographic mode, but I need to use perspective mode to render everyday objects as well.


  • 3D rotation matrices deform object while rotating

    - by Kevin
    I'm writing a small 3D renderer (using an orthographic projection right now). I've run into some trouble with my 3D rotation matrices. They seem to squeeze my 3D object (a box primitive) at certain angles. Here's a live demo (only tested in Google Chrome): http://dl.dropbox.com/u/109400107/3D/index.html The box is viewed from the top along the Y axis and is rotating around the X and Z axes. These are my 3 rotation matrices (only rX and rZ are being used):

        var rX = new Matrix([
            [1, 0, 0],
            [0, Math.cos(radiants), -Math.sin(radiants)],
            [0, Math.sin(radiants), Math.cos(radiants)]
        ]);

        var rY = new Matrix([
            [Math.cos(radiants), 0, Math.sin(radiants)],
            [0, 1, 0],
            [-Math.sin(radiants), 0, Math.cos(radiants)]
        ]);

        var rZ = new Matrix([
            [Math.cos(radiants), -Math.sin(radiants), 0],
            [Math.sin(radiants), Math.cos(radiants), 0],
            [0, 0, 1]
        ]);

    Before projecting the vertices I multiply them by rZ and rX like so:

        vert1.multiply(rZ);
        vert1.multiply(rX);
        vert2.multiply(rZ);
        vert2.multiply(rX);
        vert3.multiply(rZ);
        vert3.multiply(rX);

    The projection itself looks like this:

        bX = (pos.x + (vert1.x*scale));
        bY = (pos.y + (vert1.z*scale));

    where "pos.x" and "pos.y" are an offset for centering the box on the screen. I just can't seem to find a solution to this and I'm still relatively new to working with matrices. You can view the source code of the demo page if you want to see the whole thing.
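    A quick sanity check is to confirm that the matrices as written are proper rotations, since a product of proper rotations cannot stretch or squeeze anything. The NumPy sketch below rebuilds rX and rZ from the question and tests that property. It does not reproduce the demo's Matrix class or its multiply method, so it cannot say where the deformation in the demo comes from.

```python
import numpy as np

def rot_x(angle):
    """Rotation about the X axis, same layout as rX in the question."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(angle):
    """Rotation about the Z axis, same layout as rZ in the question."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def is_rotation(matrix):
    """True if the matrix is orthogonal with determinant +1."""
    return np.allclose(matrix.T @ matrix, np.eye(3)) and np.isclose(np.linalg.det(matrix), 1.0)
```

    If this check passes for every angle, the remaining suspects are outside the matrices themselves, for example how the vertices are multiplied or how the rotated box is projected to 2D.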


  • PeopleSoft HCM @ OHUG 11: Enter the Matrix

    - by Jay Zuckert
    The PeopleSoft HCM team is back from a very busy and exciting OHUG conference in Orlando. The packed, standing-room-only PeopleSoft HCM Roadmap keynote was the highlight of the conference for many attendees, and the reviews are in:

        PeopleSoft rocked the house!
        Great demonstration of products in the keynote.
        Best keynote in a long time, and fun.
        Engaging and entertaining, great demonstration of capabilities.
        Message received loud and clear, PeopleSoft applications are here to stay.
        PeopleSoft has a real vision moving forward.
        Real-time polls using mobile texting were cutting edge.

    Tracy Martin (as Trinity) and other members of the PeopleSoft HCM team presented a 'must-see' Matrix-themed session while dressed as movie characters. The keynote highlighted planned HCM capabilities for Matrix administration and future organization visualization enhancements. The team also previewed the planned Manager Dashboard and Talent Summary.

    Following the keynote, some of the cast posed for photo opportunities at the OHUG booth in the exhibition hall. As you can imagine, they received some interesting looks walking by the other vendor booths. The PeopleSoft HCM team also presented numerous other OHUG sessions covering PeopleSoft Talent Management, Compensation, HR HelpDesk, Payroll, Global HCM Practices, Time & Labor, Absence Management, and Benefits. All of those presentations are available from the OHUG site at www.ohug.org. When not in one of the well-attended PeopleSoft HCM sessions, conference attendees filled the Oracle booth in the exhibition hall to see live product demonstrations. True to their PeopleSoft roots, some of the PeopleSoft HCM team played as hard as they worked in Orlando and enjoyed the OHUG Appreciation event along with customers at the Hard Rock. We are already busy planning for Oracle OpenWorld 2011 and prepping sessions our PeopleSoft HCM customers are sure to like. We hope to see you there in San Francisco from Oct. 2-6. To learn more about OpenWorld or to register, click here.


  • tile_static, tile_barrier, and tiled matrix multiplication with C++ AMP

    - by Daniel Moth
    We ended the previous post with a mechanical transformation of the C++ AMP matrix multiplication example to the tiled model and in the process introduced tiled_index and tiled_grid. This is part 2.

    tile_static memory

    You all know that in regular CPU code, static variables have the same value regardless of which thread accesses the static variable. This is in contrast with non-static local variables, where each thread has its own copy. Back to C++ AMP, the same rules apply and each thread has its own value for local variables in your lambda, whereas all threads see the same global memory, which is the data they have access to via the array and array_view. In addition, on an accelerator like the GPU, there is a programmable cache, a third kind of memory type if you'd like to think of it that way (some call it shared memory, others call it scratchpad memory). Variables stored in that memory share the same value for every thread in the same tile. So, when you use the tiled model, you can have variables where each thread in the same tile sees the same value for that variable, that threads from other tiles do not. The new storage class for local variables introduced for this purpose is called tile_static. You can only use tile_static in restrict(direct3d) functions, and only when explicitly using the tiled model. What this looks like in code should be no surprise, but here is a snippet to confirm your mental image, using a good old regular C array:

        // each tile of threads has its own copy of locA,
        // shared among the threads of the tile
        tile_static float locA[16][16];

    Note that tile_static variables are scoped and have the lifetime of the tile, and they cannot have constructors or destructors.

    tile_barrier

    In amp.h one of the types introduced is tile_barrier. You cannot construct this object yourself (although if you had one, you could use a copy constructor to create another one). So how do you get one of these? You get it from a tiled_index object. Beyond the 4 properties returning index objects, tiled_index has another property, barrier, that returns a tile_barrier object. The tile_barrier class exposes a single member, the method wait.

        15: // Given a tiled_index object named t_idx
        16: t_idx.barrier.wait();
        17: // more code

    …in the code above, all threads in the tile will reach line 16 before a single one progresses to line 17. Note that all threads must be able to reach the barrier, i.e. if you had branchy code in such a way which meant that there is a chance that not all threads could reach line 16, then the code above would be illegal.

    Tiled Matrix Multiplication Example – part 2

    So now that we added to our understanding the concepts of tile_static and tile_barrier, let me rewrite the matrix multiplication code so that it takes advantage of tiling. Before you start reading this, I suggest you get a cup of your favorite non-alcoholic beverage to enjoy while you try to fully understand the code.

        01: void MatrixMultiplyTiled(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W)
        02: {
        03:   static const int TS = 16;
        04:   array_view<const float,2> a(M, W, vA);
        05:   array_view<const float,2> b(W, N, vB);
        06:   array_view<writeonly<float>,2> c(M,N,vC);
        07:   parallel_for_each(c.grid.tile< TS, TS >(),
        08:     [=] (tiled_index< TS, TS> t_idx) restrict(direct3d)
        09:     {
        10:       int row = t_idx.local[0]; int col = t_idx.local[1];
        11:       float sum = 0.0f;
        12:       for (int i = 0; i < W; i += TS) {
        13:         tile_static float locA[TS][TS], locB[TS][TS];
        14:         locA[row][col] = a(t_idx.global[0], col + i);
        15:         locB[row][col] = b(row + i, t_idx.global[1]);
        16:         t_idx.barrier.wait();
        17:         for (int k = 0; k < TS; k++)
        18:           sum += locA[row][k] * locB[k][col];
        19:         t_idx.barrier.wait();
        20:       }
        21:       c[t_idx.global] = sum;
        22:     });
        23: }

    Notice that all the code up to line 9 is the same as per the changes we made in part 1 of the tiling introduction. If you squint, the body of the lambda itself preserves the original algorithm on lines 10, 11, and 17, 18, and 21. The difference is that those lines use new indexing and the tile_static arrays; the tile_static arrays are declared and initialized on the brand new lines 13-15. On those lines we copy from the global memory represented by the array_view objects (a and b) to the tile_static vanilla arrays (locA and locB); we are copying enough to fit a tile. Because in the code that follows on line 18 we expect the data for this tile to be in the tile_static storage, we need to synchronize the threads within each tile with a barrier, which we do on line 16 (to avoid accessing uninitialized memory on line 18). We also need to synchronize the threads within a tile on line 19, again to avoid the race between lines 14, 15 (retrieving the next set of data for each tile and overwriting the previous set) and line 18 (not being done processing the previous set of data). Luckily, as part of the awesome C++ AMP debugger in Visual Studio there is an option that helps you find such races, but that is a story for another blog post another time. May I suggest reading the next section, and then coming back to re-read and walk through this code with pen and paper to really grok what is going on, if you haven't already? Cool.

    Why would I introduce this tiling complexity into my code?

    Funny you should ask that, I was just about to tell you. There is only one reason we tiled our extent, had to deal with finding a good tile size, ensure the number of threads we schedule is correctly divisible by the tile size, had to use a tiled_index instead of a normal index, had to understand tile_barrier and figure out where we need to use it, and doubled the size of our lambda in terms of lines of code: the reason is to be able to use tile_static memory.

    Why do we want to use tile_static memory? Because accessing tile_static memory is around 10 times faster than accessing the global memory on an accelerator like the GPU, e.g. in the code above, if you can get 150GB/second accessing data from the array_view a, you can get 1500GB/second accessing the tile_static array locA. And since by definition you are dealing with really large data sets, the savings really pay off. We have seen tiled implementations being twice as fast as their non-tiled counterparts.

    Now, some algorithms will not have performance benefits from tiling (and in fact may deteriorate), e.g. algorithms that require you to go only once to global memory will not benefit from tiling, since with tiling you already have to fetch the data once from global memory! Other algorithms may benefit, but you may decide that you are happy with your code being 150 times faster than the serial version you had, and you do not need to invest to make it 250 times faster. Also, algorithms with more than 3 dimensions, which C++ AMP supports in the non-tiled model, cannot be tiled.

    Also note that in future releases, we may invest in making the non-tiled model, which already uses tiling under the covers, go the extra step and use tile_static memory on your behalf, but it is obviously way too early to commit to anything like that, and we certainly don't do any of that today.

    Comments about this post by Daniel Moth welcome at the original blog.
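    The data movement in the tiled kernel can be walked through on the CPU. The sketch below is a Python/NumPy simulation, not C++ AMP: each (tile_row, tile_col) pair stands in for one tile of threads, `loc_a` and `loc_b` play the role of the tile_static arrays, and the two explicit copies mark where the barriers sit in the real kernel. The function name and the requirement that all dimensions divide evenly by the tile size are assumptions of this sketch.

```python
import numpy as np

def matmul_tiled(a, b, tile_size):
    """Multiply a (M x W) by b (W x N) one tile_size x tile_size block at a time."""
    m, w = a.shape
    w2, n = b.shape
    assert w == w2, "inner dimensions must match"
    assert m % tile_size == 0 and n % tile_size == 0 and w % tile_size == 0
    c = np.zeros((m, n))
    for tile_row in range(0, m, tile_size):
        for tile_col in range(0, n, tile_size):
            acc = np.zeros((tile_size, tile_size))  # one running sum per thread in the tile
            for i in range(0, w, tile_size):
                # Stage one tile of each input (lines 14-15 in the post's kernel).
                loc_a = a[tile_row:tile_row + tile_size, i:i + tile_size].copy()
                loc_b = b[i:i + tile_size, tile_col:tile_col + tile_size].copy()
                # The first barrier would sit here: every element must be staged before use.
                acc += loc_a @ loc_b
                # The second barrier would sit here: nobody overwrites the tiles too early.
            c[tile_row:tile_row + tile_size, tile_col:tile_col + tile_size] = acc
    return c
```

    Each staged tile is read tile_size times per element from the fast copy instead of from the original arrays, which is the reuse that tile_static memory is meant to exploit.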

