Search Results

Search found 2221 results on 89 pages for 'inverse matrix'.

Page 12/89 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >

  • Best way to get the highest sum from a Matrix (using Java but algorithm is the issue here)

    - by user294896
    Sorry I dont know the correct terminology to use but I have a 3x3 matrix like this 1 3 4 5 4 5 2 2 5 and I want get the highest score by picking a value from each row/column but I cant pick the same row or column more than once , so the answer in this case is 3 + 5 + 5 = 13 (row0,col1 + row1,col0 + row2,col2) 4 + 5 + 5 = 14 is not allowed because would have picked two values from col2 I'm using Java, and typically the matrix would be 15 by 15 in size. Is there a name for what Im trying to do, and whats the algorithm thanks Paul

    Read the article

  • Transform 3D vectors between coordinate systems

    - by Nir Cig
    I've got 6 points in 3D space: A,B,C,D,E,F, that represent 4 vectors. AB is perpendicular to AC and DE is perpendicular to DF. I need to find a transformation matrix M, that transforms AB to DE and AC to DF. In other words: M·AB=DE, M·AC=DF If no scaling was involved, this could be solved with a simple rotation matrix. But since the ratios |AB|/|DE|, |AC|/|DF| might be different, I'm not sure how to proceed.

    Read the article

  • Transforming object world space matrix to a position in world space

    - by Fredrik Boston Westman
    Im trying to make a function for picking objects with a bounding sphere however I have run in to a problem. First I check against my my bounding sphere, then if it checks out then I test against the vertexes. I have already tested my vertex picking method and it work fine, however when I check first with my bounding sphere method it dosnt register anything. My conclusion is that when im transform my sphere position in to the position of the object in world space, the transformation goes wrong ( I base this on the fact the the x coordinate always becomes 1, even tho i translate non of my meshes along the x-axis to 1). So my question is: What is the proper way to transform a objects world space matrix to a position vector ? This is how i do it now: First i set my position vector to 0. XMVECTOR meshPos = XMVectorSet(0.0f, 0.0f, 0.0f, 0.0f); Then I trannsform it with my object space matrix, and then add the offset to the center of the mesh. meshPos = XMVector3TransformCoord(meshPos, meshWorld) + centerOffset;

    Read the article

  • Nucleus Research – Research Note: Technology Value Matrix – First Half 2012 Enterprise Applications

    - by LanaProut
    1024x768 Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif";} The Technology Value Matrix evaluates products that have a global presence and provide core functionality for finance and accounting, human resources, manufacturing, supply chain management, project management, and customer relationship management.  Oracle E-Business Suite and Oracle JD Edwards EnterpriseOne are leaders in the Value Matrix for the first half of 2012.  Click here to view the report.

    Read the article

  • Dot Matrix printers setup...

    - by Parhs
    Hello! I am using debian which is similar to ubuntu. They have 7 dot matrix printers some very old like this one http://www.omnidatasys.net/product/desc_printer_ti880.htm which works from 1979 daily and at text is faster than many inkjects. I believe that it has his own language... Sending text to serial port (port server) prints garbage. However i think is prints only capital english up to 95 asccii and greek and the rest up to 127 i think greek capital.(special chip ) Sending english capital letters prints garbage i think but i amnt sure... i will try again... The other printer are ESC/P compatible and i use generic epson driver provided from ghostscript... However i think that sending text via lp -dpr1 filename It prints the text as a grafic...Changing from printer font face(courier,times roman etc) or pitch has no effect... I am wondering if is there any work arround for this? In AIX they claim that lp command printed output as text as it prints and cobol programs send raw text to to lp printers . However in AIX they use some custom filters for the printers and has more options for dot matrix printers.. I would like to know if there is a solution for this.. To avoid graphics mode for text and change font face somehow.. The most Straight-through approach would be to use no driver ,just send ESC/P from cobol but this requires too much work... Thank you again!

    Read the article

  • Matrix Multiplication with C++ AMP

    - by Daniel Moth
    As part of our API tour of C++ AMP, we looked recently at parallel_for_each. I ended that post by saying we would revisit parallel_for_each after introducing array and array_view. Now is the time, so this is part 2 of parallel_for_each, and also a post that brings together everything we've seen until now. The code for serial and accelerated Consider a naïve (or brute force) serial implementation of matrix multiplication  0: void MatrixMultiplySerial(std::vector<float>& vC, const std::vector<float>& vA, const std::vector<float>& vB, int M, int N, int W) 1: { 2: for (int row = 0; row < M; row++) 3: { 4: for (int col = 0; col < N; col++) 5: { 6: float sum = 0.0f; 7: for(int i = 0; i < W; i++) 8: sum += vA[row * W + i] * vB[i * N + col]; 9: vC[row * N + col] = sum; 10: } 11: } 12: } We notice that each loop iteration is independent from each other and so can be parallelized. If in addition we have really large amounts of data, then this is a good candidate to offload to an accelerator. First, I'll just show you an example of what that code may look like with C++ AMP, and then we'll analyze it. It is assumed that you included at the top of your file #include <amp.h> 13: void MatrixMultiplySimple(std::vector<float>& vC, const std::vector<float>& vA, const std::vector<float>& vB, int M, int N, int W) 14: { 15: concurrency::array_view<const float,2> a(M, W, vA); 16: concurrency::array_view<const float,2> b(W, N, vB); 17: concurrency::array_view<concurrency::writeonly<float>,2> c(M, N, vC); 18: concurrency::parallel_for_each(c.grid, 19: [=](concurrency::index<2> idx) restrict(direct3d) { 20: int row = idx[0]; int col = idx[1]; 21: float sum = 0.0f; 22: for(int i = 0; i < W; i++) 23: sum += a(row, i) * b(i, col); 24: c[idx] = sum; 25: }); 26: } First a visual comparison, just for fun: The beginning and end is the same, i.e. lines 0,1,12 are identical to lines 13,14,26. The double nested loop (lines 2,3,4,5 and 10,11) has been transformed into a parallel_for_each call (18,19,20 and 25). The core algorithm (lines 6,7,8,9) is essentially the same (lines 21,22,23,24). We have extra lines in the C++ AMP version (15,16,17). Now let's dig in deeper. Using array_view and extent When we decided to convert this function to run on an accelerator, we knew we couldn't use the std::vector objects in the restrict(direct3d) function. So we had a choice of copying the data to the the concurrency::array<T,N> object, or wrapping the vector container (and hence its data) with a concurrency::array_view<T,N> object from amp.h – here we used the latter (lines 15,16,17). Now we can access the same data through the array_view objects (a and b) instead of the vector objects (vA and vB), and the added benefit is that we can capture the array_view objects in the lambda (lines 19-25) that we pass to the parallel_for_each call (line 18) and the data will get copied on demand for us to the accelerator. Note that line 15 (and ditto for 16 and 17) could have been written as two lines instead of one: extent<2> e(M, W); array_view<const float, 2> a(e, vA); In other words, we could have explicitly created the extent object instead of letting the array_view create it for us under the covers through the constructor overload we chose. The benefit of the extent object in this instance is that we can express that the data is indeed two dimensional, i.e a matrix. When we were using a vector object we could not do that, and instead we had to track via additional unrelated variables the dimensions of the matrix (i.e. with the integers M and W) – aren't you loving C++ AMP already? Note that the const before the float when creating a and b, will result in the underling data only being copied to the accelerator and not be copied back – a nice optimization. A similar thing is happening on line 17 when creating array_view c, where we have indicated that we do not need to copy the data to the accelerator, only copy it back. The kernel dispatch On line 18 we make the call to the C++ AMP entry point (parallel_for_each) to invoke our parallel loop or, as some may say, dispatch our kernel. The first argument we need to pass describes how many threads we want for this computation. For this algorithm we decided that we want exactly the same number of threads as the number of elements in the output matrix, i.e. in array_view c which will eventually update the vector vC. So each thread will compute exactly one result. Since the elements in c are organized in a 2-dimensional manner we can organize our threads in a two-dimensional manner too. We don't have to think too much about how to create the first argument (a grid) since the array_view object helpfully exposes that as a property. Note that instead of c.grid we could have written grid<2>(c.extent) or grid<2>(extent<2>(M, N)) – the result is the same in that we have specified M*N threads to execute our lambda. The second argument is a restrict(direct3d) lambda that accepts an index object. Since we elected to use a two-dimensional extent as the first argument of parallel_for_each, the index will also be two-dimensional and as covered in the previous posts it represents the thread ID, which in our case maps perfectly to the index of each element in the resulting array_view. The kernel itself The lambda body (lines 20-24), or as some may say, the kernel, is the code that will actually execute on the accelerator. It will be called by M*N threads and we can use those threads to index into the two input array_views (a,b) and write results into the output array_view ( c ). The four lines (21-24) are essentially identical to the four lines of the serial algorithm (6-9). The only difference is how we index into a,b,c versus how we index into vA,vB,vC. The code we wrote with C++ AMP is much nicer in its indexing, because the dimensionality is a first class concept, so you don't have to do funny arithmetic calculating the index of where the next row starts, which you have to do when working with vectors directly (since they store all the data in a flat manner). I skipped over describing line 20. Note that we didn't really need to read the two components of the index into temporary local variables. This mostly reflects my personal choice, in some algorithms to break down the index into local variables with names that make sense for the algorithm, i.e. in this case row and col. In other cases it may i,j,k or x,y,z, or M,N or whatever. Also note that we could have written line 24 as: c(idx[0], idx[1])=sum  or  c(row, col)=sum instead of the simpler c[idx]=sum Targeting a specific accelerator Imagine that we had more than one hardware accelerator on a system and we wanted to pick a specific one to execute this parallel loop on. So there would be some code like this anywhere before line 18: vector<accelerator> accs = MyFunctionThatChoosesSuitableAccelerators(); accelerator acc = accs[0]; …and then we would modify line 18 so we would be calling another overload of parallel_for_each that accepts an accelerator_view as the first argument, so it would become: concurrency::parallel_for_each(acc.default_view, c.grid, ...and the rest of your code remains the same… how simple is that? Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Inverse Kinematics with OpenGL/Eigen3 : unstable jacobian pseudoinverse

    - by SigTerm
    I'm trying to implement simple inverse kinematics test using OpenGL, Eigen3 and "jacobian pseudoinverse" method. The system works fine using "jacobian transpose" algorithm, however, as soon as I attempt to use "pseudoinverse", joints become unstable and start jerking around (eventually they freeze completely - unless I use "jacobian transpose" fallback computation). I've investigated the issue and turns out that in some cases jacobian.inverse()*jacobian has zero determinant and cannot be inverted. However, I've seen other demos on the internet (youtube) that claim to use same method and they do not seem to have this problem. So I'm uncertain where is the cause of the issue. Code is attached below: *.h: struct Ik{ float targetAngle; float ikLength; VectorXf angles; Vector3f root, target; Vector3f jointPos(int ikIndex); size_t size() const; Vector3f getEndPos(int index, const VectorXf& vec); void resize(size_t size); void update(float t); void render(); Ik(): targetAngle(0), ikLength(10){ } }; *.cpp: size_t Ik::size() const{ return angles.rows(); } Vector3f Ik::getEndPos(int index, const VectorXf& vec){ Vector3f pos(0, 0, 0); while(true){ Eigen::Affine3f t; float radAngle = pi*vec[index]/180.0f; t = Eigen::AngleAxisf(radAngle, Vector3f(-1, 0, 0)) * Eigen::Translation3f(Vector3f(0, 0, ikLength)); pos = t * pos; if (index == 0) break; index--; } return pos; } void Ik::resize(size_t size){ angles.resize(size); angles.setZero(); } void drawMarker(Vector3f p){ glBegin(GL_LINES); glVertex3f(p[0]-1, p[1], p[2]); glVertex3f(p[0]+1, p[1], p[2]); glVertex3f(p[0], p[1]-1, p[2]); glVertex3f(p[0], p[1]+1, p[2]); glVertex3f(p[0], p[1], p[2]-1); glVertex3f(p[0], p[1], p[2]+1); glEnd(); } void drawIkArm(float length){ glBegin(GL_LINES); float f = 0.25f; glVertex3f(0, 0, length); glVertex3f(-f, -f, 0); glVertex3f(0, 0, length); glVertex3f(f, -f, 0); glVertex3f(0, 0, length); glVertex3f(f, f, 0); glVertex3f(0, 0, length); glVertex3f(-f, f, 0); glEnd(); glBegin(GL_LINE_LOOP); glVertex3f(f, f, 0); glVertex3f(-f, f, 0); glVertex3f(-f, -f, 0); glVertex3f(f, -f, 0); glEnd(); } void Ik::update(float t){ targetAngle += t * pi*2.0f/10.0f; while (t > pi*2.0f) t -= pi*2.0f; target << 0, 8 + 3*sinf(targetAngle), cosf(targetAngle)*4.0f+5.0f; Vector3f tmpTarget = target; Vector3f targetDiff = tmpTarget - root; float l = targetDiff.norm(); float maxLen = ikLength*(float)angles.size() - 0.01f; if (l > maxLen){ targetDiff *= maxLen/l; l = targetDiff.norm(); tmpTarget = root + targetDiff; } Vector3f endPos = getEndPos(size()-1, angles); Vector3f diff = tmpTarget - endPos; float maxAngle = 360.0f/(float)angles.size(); for(int loop = 0; loop < 1; loop++){ MatrixXf jacobian(diff.rows(), angles.rows()); jacobian.setZero(); float step = 1.0f; for (int i = 0; i < angles.size(); i++){ Vector3f curRoot = root; if (i) curRoot = getEndPos(i-1, angles); Vector3f axis(1, 0, 0); Vector3f n = endPos - curRoot; float l = n.norm(); if (l) n /= l; n = n.cross(axis); if (l) n *= l*step*pi/180.0f; //std::cout << n << "\n"; for (int j = 0; j < 3; j++) jacobian(j, i) = n[j]; } std::cout << jacobian << std::endl; MatrixXf jjt = jacobian.transpose()*jacobian; //std::cout << jjt << std::endl; float d = jjt.determinant(); MatrixXf invJ; float scale = 0.1f; if (!d /*|| true*/){ invJ = jacobian.transpose(); scale = 5.0f; std::cout << "fallback to jacobian transpose!\n"; } else{ invJ = jjt.inverse()*jacobian.transpose(); std::cout << "jacobian pseudo-inverse!\n"; } //std::cout << invJ << std::endl; VectorXf add = invJ*diff*step*scale; //std::cout << add << std::endl; float maxSpeed = 15.0f; for (int i = 0; i < add.size(); i++){ float& cur = add[i]; cur = std::max(-maxSpeed, std::min(maxSpeed, cur)); } angles += add; for (int i = 0; i < angles.size(); i++){ float& cur = angles[i]; if (i) cur = std::max(-maxAngle, std::min(maxAngle, cur)); } } } void Ik::render(){ glPushMatrix(); glTranslatef(root[0], root[1], root[2]); for (int i = 0; i < angles.size(); i++){ glRotatef(angles[i], -1, 0, 0); drawIkArm(ikLength); glTranslatef(0, 0, ikLength); } glPopMatrix(); drawMarker(target); for (int i = 0; i < angles.size(); i++) drawMarker(getEndPos(i, angles)); } Any help will be appreciated.

    Read the article

  • template style matrix implementation in c

    - by monkeyking
    From time to time I use the following code for generating a matrix style datastructure typedef double myType; typedef struct matrix_t{ |Compilation started at Mon Apr 5 02:24:15 myType **matrix; | size_t x; |gcc structreaderGeneral.c -std=gnu99 -lz size_t y; | }matrix; |Compilation finished at Mon Apr 5 02:24:15 | | matrix alloc_matrix(size_t x, size_t y){ | if(0) | fprintf(stderr,"\t-> Alloc matrix with dim (%lu,%lu) byteprline=%lu bytetotal:%l\| u\n",x,y,y*sizeof(myType),x*y*sizeof(myType)); | | myType **m = (myType **)malloc(x*sizeof(myType **)); | for(size_t i=0;i<x;i++) | m[i] =(myType *) malloc(y*sizeof(myType *)); | | matrix ret; | ret.x=x; | ret.y=y; | ret.matrix=m; | return ret; | } And then I would change my typedef accordingly if I needed a different kind of type for the entries in my matrix. Now I need 2 matrices with different types, an easy solution would be to copy/paste the code, but is there some way to do a template styled implementation. Thanks

    Read the article

  • How to prevent 2D camera rotation if it would violate the bounds of the camera?

    - by Andrew Price
    I'm working on a Camera class and I have a rectangle field named Bounds that determines the bounds of the camera. I have it working for zooming and moving the camera so that the camera cannot exit its bounds. However, I'm a bit confused on how to do the same for rotation. Currently I allow rotating of the camera's Z-axis. However, if sufficiently zoomed out, upon rotating the camera, areas of the screen outside the camera's bounds can be shown. I'd like to deny the rotation assuming it meant that the newly rotated camera would expose areas outside the camera's bounds, but I'm not quite sure how. I'm still new to Matrix and Vector math and I'm not quite sure how to model if the newly rotated camera sees outside of its bounds, undo the rotation. Here's an image showing the problem: http://i.stack.imgur.com/NqprC.png The red is out of bounds and as a result, the camera should never be allowed to rotate itself like this. This seems like it would be a problem with all rotated values, but this is not the case when the camera is zoomed in enough. Here are the current member variables for the Camera class: private Vector2 _position = Vector2.Zero; private Vector2 _origin = Vector2.Zero; private Rectangle? _bounds = Rectangle.Empty; private float _rotation = 0.0f; private float _zoom = 1.0f; Is this possible to do? If so, could someone give me some guidance on how to accomplish this? Thanks. EDIT: I forgot to mention I am using a transformation matrix style camera that I input in to SpriteBatch.Begin. I am using the same transformation matrix from this tutorial.

    Read the article

  • Strange 3D game engine camera with X,Y,Zoom instead of X,Y,Z

    - by Jenko
    I'm using a 3D game engine, that uses a 4x4 matrix to modify the camera projection, in this format: r r r x r r r y r r r z - - - zoom Strangely though, the camera does not respond to the Z translation parameter, and so you're forced to use X, Y, Zoom to move the camera around. Technically this is plausible for isometric-style games such as Age Of Empires III. But this is a 3D engine, and so why would they have designed the camera to ignore Z and respond only to zoom? Am I missing something here? I've tried every method of setting the camera and it really seems to ignore Z. So currently I have to resort to moving the main object in the scene graph instead of moving the camera in relation to the objects. My question: Do you have any idea why the engine would use such a scheme? Is it common? Why? Or does it seem like I'm missing something and the SetProjection(Matrix) function is broken and somehow ignores the Z translation in the matrix? (unlikely, but possible) Anyhow, what are the workarounds? Is moving objects around the only way? Edit: I'm sorry I cannot reveal much about the engine because we're in a binding contract. It's a locally developed engine (Australia) written in managed C# used for data visualizations. Edit: The default mode of the engine is orthographic, although I've switched it into perspective mode. Its probably more effective to use X, Y, Zoom in orthographic mode, but I need to use perspective mode to render everyday objects as well.

    Read the article

  • 3D rotation matrices deform object while rotating

    - by Kevin
    I'm writing a small 3D renderer (using an orthographic projection right now). I've run into some trouble with my 3D rotation matrices. They seem to squeeze my 3D object (a box primitive) at certain angles. Here's a live demo (only tested in Google Chrome): http://dl.dropbox.com/u/109400107/3D/index.html The box is viewed from the top along the Y axis and is rotating around the X and Z axis. These are my 3 rotation matrices (Only rX and rZ are being used): var rX = new Matrix([ [1, 0, 0], [0, Math.cos(radiants), -Math.sin(radiants)], [0, Math.sin(radiants), Math.cos(radiants)] ]); var rY = new Matrix([ [Math.cos(radiants), 0, Math.sin(radiants)], [0, 1, 0], [-Math.sin(radiants), 0, Math.cos(radiants)] ]); var rZ = new Matrix([ [Math.cos(radiants), -Math.sin(radiants), 0], [Math.sin(radiants), Math.cos(radiants), 0], [0, 0, 1] ]); Before projecting the verticies I multiply them by rZ and rX like so: vert1.multiply(rZ); vert1.multiply(rX); vert2.multiply(rZ); vert2.multiply(rX); vert3.multiply(rZ); vert3.multiply(rX); The projection itself looks like this: bX = (pos.x + (vert1.x*scale)); bY = (pos.y + (vert1.z*scale)); Where "pos.x" and "pos.y" is an offset for centering the box on the screen. I just can't seem to find a solution to this and I'm still relativly new to working with Matricies. You can view the source-code of the demo page if you want to see the whole thing.

    Read the article

  • inverse relation to multiple inheriting classes in django

    - by Ofri Raviv
    Here are my schematic models: class Law(models.Model): ... class Bill(models.Model): ... # data for a proposed law, or change of an existing law class PrivateBill(Bill): ... # data for a Bill that was initiated by a parliament member class GovernmentBill(Bill): ... # data for a Bill that was initiated by the government It is possible and likely that in the future I (or maybe someone else) would want to add more Bill types. Every Bill should point to a Law (indicating what law this bill is going to change) and my question is: What is the best way to implement this? If I add the ForeignKey(Law) to Bill, I'll have a relation from every Bill to Law, but a Law would only have an inverse relation to Bills (bill_set), and not a different inverse relation to each type of bill. Of course I'll be able to filter each type of bill to get only the ones pointing to a specific Law, but this is something I think I'll need to use often, so I think having privatebill_set, governmentbill_set etc would make the code more readable. Another possible solution is to add the foreign key to each of the inheriting classes (this would give me a privatebill_set, governmentbill_set, futurebill_set), but that seems hairy because I would be relying on future programmers to remember to add that relation. How would you solve this?

    Read the article

  • PeopleSoft HCM @ OHUG 11: Enter the Matrix

    - by Jay Zuckert
    The PeopleSoft HCM team is back from a very busy and exciting OHUG conference in Orlando. The packed, standing-room only PeopleSoft HCM Roadmap keynote was the highlight of the conference for many attendees and the reviews are in : PeopleSoft rocked the house ! Great demonstration of products in the keynote. Best keynote in a long time, and fun. Engaging and entertaining, great demonstration of capabilities. Message received loud and clear, PeopleSoft applications are here to stay.  PeopleSoft has a real vision moving forward. Real-time polls using mobile texting were cutting edge.                          Tracy Martin (as Trinity) and other members of the PeopleSoft HCM team presented a ‘must-see’ Matrix-themed session while dressed as movie characters. The keynote highlighted planned HCM capabilities for Matrix administration and future organization visualization enhancements. The team also previewed the planned Manager Dashboard and Talent Summary.                           Following the keynote, some of the cast posed for photo opportunities at the OHUG booth in the exhibition hall. As you can imagine, they received some interesting looks walking by the other vendor booths. The PeopleSoft HCM team also presented numerous other OHUG sessions covering PeopleSoft Talent Management, Compensation, HR HelpDesk, Payroll, Global HCM Practices, Time & Labor, Absence Management, and Benefits. All of those presentations are available from the OHUG site at www.ohug.org. When not in one of the well-attended PeopleSoft HCM sessions, conference attendees filled the Oracle booth in the exhibition hall to see live product demonstrations. True to their PeopleSoft roots, some of the PeopleSoft HCM team played as hard as they worked in Orlando and enjoyed the OHUG Appreciation event along with customers at the Hard Rock. We are already busy planning for Oracle OpenWorld 2011 and prepping sessions our PeopleSoft HCM customers are sure to like. We hope to see you there in San Francisco from Oct. 2-6. To learn more about OpenWorld or to register, click here.

    Read the article

  • tile_static, tile_barrier, and tiled matrix multiplication with C++ AMP

    - by Daniel Moth
    We ended the previous post with a mechanical transformation of the C++ AMP matrix multiplication example to the tiled model and in the process introduced tiled_index and tiled_grid. This is part 2. tile_static memory You all know that in regular CPU code, static variables have the same value regardless of which thread accesses the static variable. This is in contrast with non-static local variables, where each thread has its own copy. Back to C++ AMP, the same rules apply and each thread has its own value for local variables in your lambda, whereas all threads see the same global memory, which is the data they have access to via the array and array_view. In addition, on an accelerator like the GPU, there is a programmable cache, a third kind of memory type if you'd like to think of it that way (some call it shared memory, others call it scratchpad memory). Variables stored in that memory share the same value for every thread in the same tile. So, when you use the tiled model, you can have variables where each thread in the same tile sees the same value for that variable, that threads from other tiles do not. The new storage class for local variables introduced for this purpose is called tile_static. You can only use tile_static in restrict(direct3d) functions, and only when explicitly using the tiled model. What this looks like in code should be no surprise, but here is a snippet to confirm your mental image, using a good old regular C array // each tile of threads has its own copy of locA, // shared among the threads of the tile tile_static float locA[16][16]; Note that tile_static variables are scoped and have the lifetime of the tile, and they cannot have constructors or destructors. tile_barrier In amp.h one of the types introduced is tile_barrier. You cannot construct this object yourself (although if you had one, you could use a copy constructor to create another one). So how do you get one of these? You get it, from a tiled_index object. Beyond the 4 properties returning index objects, tiled_index has another property, barrier, that returns a tile_barrier object. The tile_barrier class exposes a single member, the method wait. 15: // Given a tiled_index object named t_idx 16: t_idx.barrier.wait(); 17: // more code …in the code above, all threads in the tile will reach line 16 before a single one progresses to line 17. Note that all threads must be able to reach the barrier, i.e. if you had branchy code in such a way which meant that there is a chance that not all threads could reach line 16, then the code above would be illegal. Tiled Matrix Multiplication Example – part 2 So now that we added to our understanding the concepts of tile_static and tile_barrier, let me obfuscate rewrite the matrix multiplication code so that it takes advantage of tiling. Before you start reading this, I suggest you get a cup of your favorite non-alcoholic beverage to enjoy while you try to fully understand the code. 01: void MatrixMultiplyTiled(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W) 02: { 03: static const int TS = 16; 04: array_view<const float,2> a(M, W, vA); 05: array_view<const float,2> b(W, N, vB); 06: array_view<writeonly<float>,2> c(M,N,vC); 07: parallel_for_each(c.grid.tile< TS, TS >(), 08: [=] (tiled_index< TS, TS> t_idx) restrict(direct3d) 09: { 10: int row = t_idx.local[0]; int col = t_idx.local[1]; 11: float sum = 0.0f; 12: for (int i = 0; i < W; i += TS) { 13: tile_static float locA[TS][TS], locB[TS][TS]; 14: locA[row][col] = a(t_idx.global[0], col + i); 15: locB[row][col] = b(row + i, t_idx.global[1]); 16: t_idx.barrier.wait(); 17: for (int k = 0; k < TS; k++) 18: sum += locA[row][k] * locB[k][col]; 19: t_idx.barrier.wait(); 20: } 21: c[t_idx.global] = sum; 22: }); 23: } Notice that all the code up to line 9 is the same as per the changes we made in part 1 of tiling introduction. If you squint, the body of the lambda itself preserves the original algorithm on lines 10, 11, and 17, 18, and 21. The difference being that those lines use new indexing and the tile_static arrays; the tile_static arrays are declared and initialized on the brand new lines 13-15. On those lines we copy from the global memory represented by the array_view objects (a and b), to the tile_static vanilla arrays (locA and locB) – we are copying enough to fit a tile. Because in the code that follows on line 18 we expect the data for this tile to be in the tile_static storage, we need to synchronize the threads within each tile with a barrier, which we do on line 16 (to avoid accessing uninitialized memory on line 18). We also need to synchronize the threads within a tile on line 19, again to avoid the race between lines 14, 15 (retrieving the next set of data for each tile and overwriting the previous set) and line 18 (not being done processing the previous set of data). Luckily, as part of the awesome C++ AMP debugger in Visual Studio there is an option that helps you find such races, but that is a story for another blog post another time. May I suggest reading the next section, and then coming back to re-read and walk through this code with pen and paper to really grok what is going on, if you haven't already? Cool. Why would I introduce this tiling complexity into my code? Funny you should ask that, I was just about to tell you. There is only one reason we tiled our extent, had to deal with finding a good tile size, ensure the number of threads we schedule are correctly divisible with the tile size, had to use a tiled_index instead of a normal index, and had to understand tile_barrier and to figure out where we need to use it, and double the size of our lambda in terms of lines of code: the reason is to be able to use tile_static memory. Why do we want to use tile_static memory? Because accessing tile_static memory is around 10 times faster than accessing the global memory on an accelerator like the GPU, e.g. in the code above, if you can get 150GB/second accessing data from the array_view a, you can get 1500GB/second accessing the tile_static array locA. And since by definition you are dealing with really large data sets, the savings really pay off. We have seen tiled implementations being twice as fast as their non-tiled counterparts. Now, some algorithms will not have performance benefits from tiling (and in fact may deteriorate), e.g. algorithms that require you to go only once to global memory will not benefit from tiling, since with tiling you already have to fetch the data once from global memory! Other algorithms may benefit, but you may decide that you are happy with your code being 150 times faster than the serial-version you had, and you do not need to invest to make it 250 times faster. Also algorithms with more than 3 dimensions, which C++ AMP supports in the non-tiled model, cannot be tiled. Also note that in future releases, we may invest in making the non-tiled model, which already uses tiling under the covers, go the extra step and use tile_static memory on your behalf, but it is obviously way to early to commit to anything like that, and we certainly don't do any of that today. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Card deck and sparse matrix interview questions

    - by MrDatabase
    I just had a technical phone screen w/ a start-up. Here's the technical questions I was asked ... and my answers. What do think of these answers? Feel free to post better answers :-) Question 1: how would you represent a standard 52 card deck in (basically any language)? How would you shuffle the deck? Answer: use an array containing a "Card" struct or class. Each instance of card has some unique identifier... either it's position in the array or a unique integer member variable in the range [0, 51]. Shuffle the cards by traversing the array once from index zero to index 51. Randomly swap ith card with "another card" (I didn't remember how this shuffle algorithm works exactly). Watch out for using the same probability for each card... that's a gotcha in this algorithm. I mentioned the algorithm is from Programming Pearls. Question 2: how to represent a large sparse matrix? the matrix can be very large... like 1000x1000... but only a relatively small number (~20) of the entries are non-zero. Answer: condense the array into a list of the non-zero entries. for a given entry (i,j) in the array... "map" (i,j) to a single integer k... then use k as a key into a dictionary or hashtable. For the 1000x1000 sparse array map (i,j) to k using something like f(i, j) = i + j * 1001. 1001 is just one plus the maximum of all i and j. I didn't recall exactly how this mapping worked... but the interviewer got the idea (I think). Are these good answers? I'm wondering because after I finished the second question the interviewer said the dreaded "well that's all the questions I have for now." Cheers!

    Read the article

  • SSRS 2005 Matrix and border styles when exporting to XLS

    - by Mufasa
    The Matrix in SSRS (SQL Server Reporting Services 2005) seems to have issues with certain the border styles when exporting to XLS (but not PDF or web view; maybe other formats, not sure?). For example: Create a matrix and set the Matrix border style to Black Solid 1px, but all 4 of the cells to have a border style of Black None 1px. When viewed via the ASP.NET control, it looks correct. But after export to XLS, it creates borders around all of the header cells (column and row headers, and the top left cell), and even the right most data column. But all the cells in the middle of the report correctly have no border set. Update: If the Matrix borders are set to None, then the borders on the cells don't show up in XLS. So, how do you set an outer border around the Matrix, but not have it apply the 'all sides' border to every cell that touches the edge of the Matrix when exported to Excel?

    Read the article

  • One row is skipped each time the program scans a matrix from file !

    - by ZaZu
    Hello there, I had this code working yesterday, but it seems like I edited it a bit and lost the working version. I cant get this to work anymore. I basically want to scan a matrix from a .txt file. But each time it scans the first row, the second one is skipped, and it reads the third instead :( Here is my code : for(i=0;i<=test->rowmat1;i++){ for(j=0;j<=test->colmat1;j++){ fscanf(fin,"%f\t",&test->mat[i][j]); } fscanf(fin,"%*[^\n]",&test->mat[i][j]); } For example, for a matrix of : 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 If I extract 3 rows and 3 cols, I get : 1.00 2.00 3.00 7.00 8.00 9.00 Then fails, it wants to skip over the second line but there is nothing after 10 11 12 Why did it stop working ? What do I have wrong ? Please help, Thanks in advance.

    Read the article

  • Can I make a matrix row group span its child groups in SSRS?

    - by AaronSieb
    I have a matrix, whose rows are grouped into two groups. A class, and a time for that class. The class cell is going to end up being several lines long, and I'd like the rows for each time slot of the class to line up next to the class description, like this: ----------------------------------------- **Class** | 7:00am | [row data] Description of |---------------------- the class, this | 12:00pm | [row data] is several lines |---------------------- long. | 1:00pm | [row data] ----------------------------------------- But what I'm getting is this: ----------------------------------------- **Class** | 7:00am | [row data] Description of | | the class, this | | is several lines | | long. | | ----------------------------------------- | 12:00pm | [row data] | | | | | | | | ----------------------------------------- | 1:00pm | [row data] | | | | | | | | ----------------------------------------- Is there any way to make SSRS collapse the matrix?

    Read the article

  • Are CMAttitude and CATransform3D related by rotational matrices?

    - by Alex Stone
    I'm looking at the core motion class CMAttitude, it can express the device's orientation as a 3x3 rotational matrix. At the same time I've taken a look at the CATransform3D, which encapsulates the view's attitude, as well as scaling. The CATransform3D is a 4x4 matrix. I've seen that the OpenGL rotational matrix is 4x4 and is simply 0001 padded in the 4th row and column. I'm wandering if the CMAttitude's rotational matrix is related to CATransform's matrix? Can I use the device's rotation in space obtained via a rotational matrix to transform a UIView using CATransform3D? My intention is to let the user move the phone and apply the same transform to a UIView on the screen. Bonus question: if they are related, how do I transform a CMAttitude's rotational matrix to CATransform3D?

    Read the article

  • How can I find out how many rows of a matrix satisfy a rather complicated criterion (in R)?

    - by Brani
    As an example, here is a way to get a matrix of all possible outcomes of rolling 4 (fair) dice. z <- as.matrix(expand.grid(c(1:6),c(1:6),c(1:6),c(1:6))) As you may already have understood, I'm trying to work out a question that was closed, though, in my opinion, it's a challenging one. I used counting techniques to solve it (I mean by hand) and I finaly arrived to a number of outcomes, with a sum of subset being 5, equal to 1083 out of 1296. That result is consistent with the answers provided to that question, before it was closed. I was wondering how could that subset of outcomes (say z1, where dim(z1) = [1083,4] ) be generated using R. Do you have any ideas? Thank you.

    Read the article

  • k-means clustering in R on very large, sparse matrix?

    - by movingabout
    Hello, I am trying to do some k-means clustering on a very large matrix. The matrix is approximately 500000 rows x 4000 cols yet very sparse (only a couple of "1" values per row). The whole thing does not fit into memory, so I converted it into a sparse ARFF file. But R obviously can't read the sparse ARFF file format. I also have the data as a plain CSV file. Is there any package available in R for loading such sparse matrices efficiently? I'd then use the regular k-means algorithm from the cluster package to proceed. Many thanks

    Read the article

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >