Search Results

Search found 1557 results on 63 pages for 'daniel scocco'.

Page 3/63 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Lead, Follow, or Get out of the way

- by Daniel Moth

This is one of the sayings (attributed to Thomas Paine) that totally resonated with me from the first time I heard it, which was only 3 years ago during some training course at work: "Lead, Follow, or Get out of the way" You'll find many books with this title and you'll find it quoted by politicians and other leaders in various countries at various times... the quote is open to interpretation and works on many levels. To set the tone of what this means to me, I'll use a simple micro example: In any given conversation, you are either leading it or following it, at different times/snapshots of the conversation. If you are not willing or able to lead it, and you are not willing or able to follow it, then you should depart. The bad alternative which this guidance encourages you NOT to do is to stick around and obstruct progress by not following, not leading, and simply complaining or trying to derail the discussion in no particular direction. The same pattern applies at your position/role at work. Either follow your management/leadership team, or try to lead them to what you think is a better place, or change jobs. Don't stick around complaining about the direction things are going, while not actively trying to either change things or make peace with it. In the previous paragraph you can replace the word "your management" with "the people reporting to you" and the guidance still holds. Either lead your direct reports to where you think they should go, or follow their lead, or change jobs. Complaining about folks not taking direction while doing nothing is not a maintainable state. To me this quote is not about a permanent state, it is not about some people always leading and some always following: It is about a role/hat that anybody can play/wear at any given moment. One minute I am leading you, the next I am following you, and the next we are both following someone else and so on... When there is disagreement, debate the different directions for as long as it takes for you to be comfortable that you can either follow or lead. If you don't become comfortable with either of those, get out of the way. Something to remember is that it is impossible to learn how to lead well, without learning how to follow well (probably deserves its own blog entry)... Things go wrong when someone thinks that they must always be leading, or when everybody wants to follow and nobody steps up to lead... Things go wrong when more than one person wants to lead and they don't try to reach agreement on a shared direction, stubbornly sticking to their guns pulling the rest of the team in multiple directions... Things go wrong when more than one person wants to lead and after numerous and lengthy discussions, none of them decides to follow or get out of the way... Things go wrong when people don't want to lead, don't want to follow, and insist on sticking around... While there are a few ways things that can go wrong as enumerated in the previous paragraph, the most common one in my experience is the last one I mentioned. You'll recognize these folks as the ones that always complain about everything that is wrong with their company/product but do nothing about it. Every time you hear someone giving feedback on how something is wrong or suboptimal, ask them "So now that you identified the problem, what do you think the solution is and what are you doing to drive us to that solution?" The next time things start going wrong, step up and remind everyone: Lead, Follow, or Get out of the way. For more perspectives, and for input to help you form your own interpretation, search the web for this phrase to see in what contexts it is being used (bing, google). Finally, regardless of your political views, I hope you can appreciate if only as an example this perspective of someone leading by actually getting out of the way. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Give a session on C++ AMP – here is how

- by Daniel Moth

Ever since presenting on C++ AMP at the AMD Fusion conference in June, then the Gamefest conference in August, and the BUILD conference in September, I've had numerous requests about my material from folks that want to re-deliver the same session. The C++ AMP session I put together has evolved over the 3 presentations to its final form that I used at BUILD, so that is the one I recommend you base yours on. Please get the slides and the recording from channel9 (I'll refer to slide numbers below). This is how I've been presenting the C++ AMP session: Context (slide 3, 04:18-08:18) Start with a demo, on my dual-GPU machine. I've been using the N-Body sample (for VS 11 Developer Preview). (slide 4) Use an nvidia slide that has additional examples of performance improvements that customers enjoy with heterogeneous computing. (slide 5) Talk a bit about the differences today between CPU and GPU hardware, leading to the fact that these will continue to co-exist and that GPUs are great for data parallel algorithms, but not much else today. One is a jack of all trades and the other is a number cruncher. (slide 6) Use the APU example from amd, as one indication that the hardware space is still in motion, emphasizing that the C++ AMP solution is a data parallel API, not a GPU API. It has a future proof design for hardware we have yet to see. (slide 7) Provide more meta-data, as blogged about when I first introduced C++ AMP. Code (slide 9-11) Introduce C++ AMP coding with a simplistic array-addition algorithm – the slides speak for themselves. (slide 12-13) index<N>, extent<N>, and grid<N>. (Slide 14-16) array<T,N>, array_view<T,N> and comparison between them. (Slide 17) parallel_for_each. (slide 18, 21) restrict. (slide 19-20) actual restrictions of restrict(direct3d) – the slides speak for themselves. (slide 22) bring it altogether with a matrix multiplication example. (slide 23-24) accelerator, and accelerator_view. (slide 26-29) Introduce tiling incl. tiled matrix multiplication [tiling probably deserves a whole session instead of 6 minutes!]. IDE (slide 34,37) Briefly touch on the concurrency visualizer. It supports GPU profiling, but enhancements specific to C++ AMP we hope will come at the Beta timeframe, which is when I'll be spending more time talking about it. (slide 35-36, 51:54-59:16) Demonstrate the GPU debugging experience in VS 11. Summary (slide 39) Re-iterate some of the points of slide 7, and add the point that the C++ AMP spec will be open for other compiler vendors to implement, even on other platforms (in fact, Microsoft is actively working on that). (slide 40) Links to content – see slide – including where all your questions should go: http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads. "But I don't have time for a full blown session, I only need 2 (or just 1, or 3) C++ AMP slides to use in my session on related topic X" If all you want is a small number of slides, you can take some from the session above and customize them. But because I am so nice, I have created some slides for you, including talking points in the notes section. Download them here. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
concurrency::extent<N> from amp.h

- by Daniel Moth

Overview We saw in a previous post how index<N> represents a point in N-dimensional space and in this post we'll see how to define the N-dimensional space itself. With C++ AMP, an N-dimensional space can be specified with the template class extent<N> where you define the size of each dimension. From a look and feel perspective, you'd expect the programmatic interface of a point type and size type to be similar (even though the concepts are different). Indeed, exactly like index<N>, extent<N> is essentially a coordinate vector of N integers ordered from most- to least- significant, BUT each integer represents the size for that dimension (and hence cannot be negative). So, if you read the description of index, you won't be surprised with the below description of extent<N> There is the rank field returning the value of N you passed as the template parameter. You can construct one extent from another (via the copy constructor or the assignment operator), you can construct it by passing an integer array, or via convenience constructor overloads for 1- 2- and 3- dimension extents. Note that the parameterless constructor creates an extent of the specified rank with all bounds initialized to 0. You can access the components of the extent through the subscript operator (passing it an integer). You can perform some arithmetic operations between extent objects through operator overloading, i.e. ==, !=, +=, -=, +, -. There are operator overloads so that you can perform operations between an extent and an integer: -- (pre- and post- decrement), ++ (pre- and post- increment), %=, *=, /=, +=, –= and, finally, there are additional overloads for plus and minus (+,-) between extent<N> and index<N> objects, returning a new extent object as the result. In addition to the usual suspects, extent offers a contains function that tests if an index is within the bounds of the extent (assuming an origin of zero). It also has a size function that returns the total linear size of this extent<N> in units of elements. Example code extent<2> e(3, 4); _ASSERT(e.rank == 2); _ASSERT(e.size() == 3 * 4); e += 3; e[1] += 6; e = e + index<2>(3,-4); _ASSERT(e == extent<2>(9, 9)); _ASSERT( e.contains(index<2>(8, 8))); _ASSERT(!e.contains(index<2>(8, 9))); grid<N> Our upcoming pre-release bits also have a similar type to extent, grid<N>. The way you create a grid is by passing it an extent, e.g. extent<3> e(4,2,6); grid<3> g(e); I am not going to dive deeper into grid, suffice for now to think of grid<N> simply as an alias for the extent<N> object, that you create when you encounter a function that expects a grid object instead of an extent object. Usage The extent class on its own simply defines the size of the N-dimensional space. We'll see in future posts that when you create containers (arrays) and wrappers (array_views) for your data, it is an extent<N> object that you'll need to use to create those (and use an index<N> object to index into them). We'll also see that it is a grid<N> object that you pass to the new parallel_for_each function that I'll cover in the next post. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
The way I think about Diagnostic tools

- by Daniel Moth

Every software has issues, or as we like to call them "bugs". That is not a discussion point, just a mere fact. It follows that an important skill for developers is to be able to diagnose issues in their code. Of course we need to advance our tools and techniques so we can prevent bugs getting into the code (e.g. unit testing), but beyond designing great software, diagnosing bugs is an equally important skill. To diagnose issues, the most important assets are good techniques, skill, experience, and maybe talent. What also helps is having good diagnostic tools and what helps further is knowing all the features that they offer and how to use them. The following classification is how I like to think of diagnostics. Note that like with any attempt to bucketize anything, you run into overlapping areas and blurry lines. Nevertheless, I will continue sharing my generalizations ;-) It is important to identify at the outset if you are dealing with a performance or a correctness issue. If you have a performance issue, use a profiler. I hear people saying "I am using the debugger to debug a performance issue", and that is fine, but do know that a dedicated profiler is the tool for that job. Just because you don't need them all the time and typically they cost more plus you are not as familiar with them as you are with the debugger, doesn't mean you shouldn't invest in one and instead try to exclusively use the wrong tool for the job. Visual Studio has a profiler and a concurrency visualizer (for profiling multi-threaded apps). If you have a correctness issue, then you have several options - that's next :-) This is how I think of identifying a correctness issue Do you want a tool to find the issue for you at design time? The compiler is such a tool - it gives you an exact list of errors. Compilers now also offer warnings, which is their way of saying "this may be an error, but I am not smart enough to know for sure". There are also static analysis tools, which go a step further than the compiler in identifying issues in your code, sometimes with the aid of code annotations and other times just by pointing them at your raw source. An example is FxCop and much more in Visual Studio 11 Code Analysis. Do you want a tool to find the issue for you with code execution? Just like static tools, there are also dynamic analysis tools that instead of statically analyzing your code, they analyze what your code does dynamically at runtime. Whether you have to setup some unit tests to invoke your code at runtime, or have to manually run your app (and interact with it) under the tool, or have to use a script to execute your binary under the tool… that varies. The result is still a list of issues for you to address after the analysis is complete or a pause of the execution when the first issue is encountered. If a code path was not taken, no analysis for it will exist, obviously. An example is the GPU Race detection tool that I'll be talking about on the C++ AMP team blog. Another example is the MSR concurrency CHESS tool. Do you want you to find the issue at design time using a tool? Perform a code walkthrough on your own or with colleagues. There are code review tools that go beyond just diffing sources, and they help you with that aspect too. For example, there is a new one in Visual Studio 11 and searching with my favorite search engine yielded this article based on the Developer Preview. Do you want you to find the issue with code execution? Use a debugger - let’s break this down further next. This is how I think of debugging: There is post mortem debugging. That means your code has executed and you did something in order to examine what happened during its execution. This can vary from manual printf and other tracing statements to trace events (e.g. ETW) to taking dumps. In all cases, you are left with some artifact that you examine after the fact (after code execution) to discern what took place hoping it will help you find the bug. Learn how to debug dump files in Visual Studio. There is live debugging. I will elaborate on this in a separate post, but this is where you inspect the state of your program during its execution, and try to find what the problem is. More from me in a separate post on live debugging. There is a hybrid of live plus post-mortem debugging. This is for example what tools like IntelliTrace offer. If you are a tools vendor interested in the diagnostics space, it helps to understand where in the above classification your tool excels, where its primary strength is, so you can market it as such. Then it helps to see which of the other areas above your tool touches on, and how you can make it even better there. Finally, see what areas your tool doesn't help at all with, and evaluate whether it should or continue to stay clear. Even though the classification helps us think about this space, the reality is that the best tools are either extremely excellent in only one of this areas, or more often very good across a number of them. Another approach is to offer a toolset covering all areas, with appropriate integration and hand off points from one to the other. Anyway, with that brain dump out of the way, in follow-up posts I will dive into live debugging, and specifically live debugging in Visual Studio - stay tuned if that interests you. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
concurrency::accelerator

- by Daniel Moth

Overview An accelerator represents a "target" on which C++ AMP code can execute and where data can reside. Typically (but not necessarily) an accelerator is a GPU device. Accelerators are represented in C++ AMP as objects of the accelerator class. For many scenarios, you do not need to obtain an accelerator object, since the runtime has a notion of a default accelerator, which is what it thinks is the best one in the system. Examples where you need to deal with accelerator objects are if you need to pick your own accelerator (based on your specific criteria), or if you need to use more than one accelerators from your app. Construction and operator usage You can query and obtain a std::vector of all the accelerators on your system, which the runtime discovers on startup. Beyond enumerating accelerators, you can also create one directly by passing to the constructor a system-wide unique path to a device if you know it (i.e. the “Device Instance Path” property for the device in Device Manager), e.g. accelerator acc(L"PCI\\VEN_1002&DEV_6898&SUBSYS_0B001002etc"); There are some predefined strings (for predefined accelerators) that you can pass to the accelerator constructor (and there are corresponding constants for those on the accelerator class itself, so you don’t have to hardcode them every time). Examples are the following: accelerator::default_accelerator represents the default accelerator that the C++ AMP runtime picks for you if you don’t pick one (the heuristics of how it picks one will be covered in a future post). Example: accelerator acc; accelerator::direct3d_ref represents the reference rasterizer emulator that simulates a direct3d device on the CPU (in a very slow manner). This emulator is available on systems with Visual Studio installed and is useful for debugging. More on debugging in general in future posts. Example: accelerator acc(accelerator::direct3d_ref); accelerator::direct3d_warp represents a target that I will cover in future blog posts. Example: accelerator acc(accelerator::direct3d_warp); accelerator::cpu_accelerator represents the CPU. In this first release the only use of this accelerator is for using the staging arrays technique that I'll cover separately. Example: accelerator acc(accelerator::cpu_accelerator); You can also create an accelerator by shallow copying another accelerator instance (via the corresponding constructor) or simply assigning it to another accelerator instance (via the operator overloading of =). Speaking of operator overloading, you can also compare (for equality and inequality) two accelerator objects between them to determine if they refer to the same underlying device. Querying accelerator characteristics Given an accelerator object, you can access its description, version, device path, size of dedicated memory in KB, whether it is some kind of emulator, whether it has a display attached, whether it supports double precision, and whether it was created with the debugging layer enabled for extensive error reporting. Below is example code that accesses some of the properties; in your real code you'd probably be checking one or more of them in order to pick an accelerator (or check that the default one is good enough for your specific workload): void inspect_accelerator(concurrency::accelerator acc) { std::wcout << "New accelerator: " << acc.description << std::endl; std::wcout << "is_debug = " << acc.is_debug << std::endl; std::wcout << "is_emulated = " << acc.is_emulated << std::endl; std::wcout << "dedicated_memory = " << acc.dedicated_memory << std::endl; std::wcout << "device_path = " << acc.device_path << std::endl; std::wcout << "has_display = " << acc.has_display << std::endl; std::wcout << "version = " << (acc.version >> 16) << '.' << (acc.version & 0xFFFF) << std::endl; } accelerator_view In my next blog post I'll cover a related class: accelerator_view. Suffice to say here that each accelerator may have from 1..n related accelerator_view objects. You can get the accelerator_view from an accelerator via the default_view property, or create new ones by invoking the create_view method that creates an accelerator_view object for you (by also accepting a queuing_mode enum value of deferred or immediate that we'll also explore in the next blog post). Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Sending Outlook Invites

- by Daniel Moth

Sending an Outlook invite for a meeting (also referred to as S+ in Microsoft) is a simple thing to get right if you just run the quick mental check below, which is driven by visual cues in the Outlook UI. I know that some folks don’t do this often or are new to Outlook, so if you know one of those folks share this blog post with them and if they read nothing else ask them to read step 7. Add on the To line the folks that you want to be at the meeting. Indicate optional invitees. Click on the “To” button to bring up the dialog that lets you move folks to be Optional (you can also do this from the Scheduling Assistant). Set the Reminder according to the attendee that has to travel the most. 5 minutes is the minimum. Use the Response Options and uncheck the "Request Response" if your event is going ahead regardless of who can make it or not, i.e. if everyone is optional. Don’t force every recipient to make an extra click, instead make the extra click yourself - you are the organizer. Add a good subject Make the subject such that just by reading it folks know what the meeting is about. Examples, e.g. "Review…", "Finalize…", "XYZ sync up" If this is only between two people and what is commonly referred to as a one to one, the subject would be something like "MyName/YourName 1:1" Write the subject in such a way that when the recipient sees this on their calendar among all the other items, they know what this meeting is about without having to see location, recipients, or any other information about the invite. Add a location, typically a meeting room. If recipients are from different buildings, schedule it where the folks that are doing the other folks a favor live. Otherwise schedule it wherever the least amount of people will have to travel. If you send me an invite to come to your building, and there is more of us than you, you are silently sending me the message that you are doing me a favor so if you don’t want to do that, include a note of why this is in your building, e.g. "Sorry we are slammed with back to back meetings today so hope you can come over to our building". If this is in someone's office, the location would be something like "Moth's office (7/666)" where in parenthesis you see the office location. If some folks are remote in another building/country, or if you know you picked a time which wasn't free for everyone, add an Online option (click the Lync Meeting button). Add a date and time. This MUST be at a time that is showing on the recipients’ calendar as FREE or at worst TENTATIVE. You can check that on the Scheduling Assistant. The reality is that this is not always possible, so in that case you MUST say something about it in the Invite Body, e.g. "Sorry I can see X has a conflict, but I cannot find a better slot", or "With so many of us there are some conflicts and I cannot find a better slot so hope this works", or "Apologies but due to Y we must have this meeting at this time and I know there are some conflicts, hope you can make it anyway". When you do that, I better not be able to find a better slot myself for all of us, and of course when you do that you have implicitly designated the Busy folks as optional. Finally, the body of the invite. This has the agenda of the meeting and if applicable the courtesy apologies due to messing up steps 6 & 7. This should not be the introduction to the meeting, in other words the recipients should not be surprised when they see the invite and go to the body to read it. Notifying them of the meeting takes place via separate email where you explain the purpose and give them a heads up that you'll be sending an invite. That separate email is also your chance to attach documents, don’t do that as part of the invite. TIP: If you have sent mail about the meeting, you can then go to your sent folder to select the message and click the "Meeting" button (Ctrl+Alt+R). This will populate the body with the necessary background, auto select the mandatory and optional attendees as per the TO/CC line, and have a subject that may be good enough already (or you can tweak it). Long to write, but very quick to remember and enforce since most of it is common sense and the checklist is driven of the visual cues in the UI you use to send the invite. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
concurrency::index<N> from amp.h

- by Daniel Moth

Overview C++ AMP introduces a new template class index<N>, where N can be any value greater than zero, that represents a unique point in N-dimensional space, e.g. if N=2 then an index<2> object represents a point in 2-dimensional space. This class is essentially a coordinate vector of N integers representing a position in space relative to the origin of that space. It is ordered from most-significant to least-significant (so, if the 2-dimensional space is rows and columns, the first component represents the rows). The underlying type is a signed 32-bit integer, and component values can be negative. The rank field returns N. Creating an index The default parameterless constructor returns an index with each dimension set to zero, e.g. index<3> idx; //represents point (0,0,0) An index can also be created from another index through the copy constructor or assignment, e.g. index<3> idx2(idx); //or index<3> idx2 = idx; To create an index representing something other than 0, you call its constructor as per the following 4-dimensional example: int temp[4] = {2,4,-2,0}; index<4> idx(temp); Note that there are convenience constructors (that don’t require an array argument) for creating index objects of rank 1, 2, and 3, since those are the most common dimensions used, e.g. index<1> idx(3); index<2> idx(3, 6); index<3> idx(3, 6, 12); Accessing the component values You can access each component using the familiar subscript operator, e.g. One-dimensional example: index<1> idx(4); int i = idx[0]; // i=4 Two-dimensional example: index<2> idx(4,5); int i = idx[0]; // i=4 int j = idx[1]; // j=5 Three-dimensional example: index<3> idx(4,5,6); int i = idx[0]; // i=4 int j = idx[1]; // j=5 int k = idx[2]; // k=6 Basic operations Once you have your multi-dimensional point represented in the index, you can now treat it as a single entity, including performing common operations between it and an integer (through operator overloading): -- (pre- and post- decrement), ++ (pre- and post- increment), %=, *=, /=, +=, -=,%, *, /, +, -. There are also operator overloads for operations between index objects, i.e. ==, !=, +=, -=, +, –. Here is an example (where no assertions are broken): index<2> idx_a; index<2> idx_b(0, 0); index<2> idx_c(6, 9); _ASSERT(idx_a.rank == 2); _ASSERT(idx_a == idx_b); _ASSERT(idx_a != idx_c); idx_a += 5; idx_a[1] += 3; idx_a++; _ASSERT(idx_a != idx_b); _ASSERT(idx_a == idx_c); idx_b = idx_b + 10; idx_b -= index<2>(4, 1); _ASSERT(idx_a == idx_b); Usage You'll most commonly use index<N> objects to index into data types that we'll cover in future posts (namely array and array_view). Also when we look at the new parallel_for_each function we'll see that an index<N> object is the single parameter to the lambda, representing the (multi-dimensional) thread index… In the next post we'll go beyond being able to represent an N-dimensional point in space, and we'll see how to define the N-dimensional space itself through the extent<N> class. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
tile_static, tile_barrier, and tiled matrix multiplication with C++ AMP

- by Daniel Moth

We ended the previous post with a mechanical transformation of the C++ AMP matrix multiplication example to the tiled model and in the process introduced tiled_index and tiled_grid. This is part 2. tile_static memory You all know that in regular CPU code, static variables have the same value regardless of which thread accesses the static variable. This is in contrast with non-static local variables, where each thread has its own copy. Back to C++ AMP, the same rules apply and each thread has its own value for local variables in your lambda, whereas all threads see the same global memory, which is the data they have access to via the array and array_view. In addition, on an accelerator like the GPU, there is a programmable cache, a third kind of memory type if you'd like to think of it that way (some call it shared memory, others call it scratchpad memory). Variables stored in that memory share the same value for every thread in the same tile. So, when you use the tiled model, you can have variables where each thread in the same tile sees the same value for that variable, that threads from other tiles do not. The new storage class for local variables introduced for this purpose is called tile_static. You can only use tile_static in restrict(direct3d) functions, and only when explicitly using the tiled model. What this looks like in code should be no surprise, but here is a snippet to confirm your mental image, using a good old regular C array // each tile of threads has its own copy of locA, // shared among the threads of the tile tile_static float locA[16][16]; Note that tile_static variables are scoped and have the lifetime of the tile, and they cannot have constructors or destructors. tile_barrier In amp.h one of the types introduced is tile_barrier. You cannot construct this object yourself (although if you had one, you could use a copy constructor to create another one). So how do you get one of these? You get it, from a tiled_index object. Beyond the 4 properties returning index objects, tiled_index has another property, barrier, that returns a tile_barrier object. The tile_barrier class exposes a single member, the method wait. 15: // Given a tiled_index object named t_idx 16: t_idx.barrier.wait(); 17: // more code …in the code above, all threads in the tile will reach line 16 before a single one progresses to line 17. Note that all threads must be able to reach the barrier, i.e. if you had branchy code in such a way which meant that there is a chance that not all threads could reach line 16, then the code above would be illegal. Tiled Matrix Multiplication Example – part 2 So now that we added to our understanding the concepts of tile_static and tile_barrier, let me obfuscate rewrite the matrix multiplication code so that it takes advantage of tiling. Before you start reading this, I suggest you get a cup of your favorite non-alcoholic beverage to enjoy while you try to fully understand the code. 01: void MatrixMultiplyTiled(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W) 02: { 03: static const int TS = 16; 04: array_view<const float,2> a(M, W, vA); 05: array_view<const float,2> b(W, N, vB); 06: array_view<writeonly<float>,2> c(M,N,vC); 07: parallel_for_each(c.grid.tile< TS, TS >(), 08: [=] (tiled_index< TS, TS> t_idx) restrict(direct3d) 09: { 10: int row = t_idx.local[0]; int col = t_idx.local[1]; 11: float sum = 0.0f; 12: for (int i = 0; i < W; i += TS) { 13: tile_static float locA[TS][TS], locB[TS][TS]; 14: locA[row][col] = a(t_idx.global[0], col + i); 15: locB[row][col] = b(row + i, t_idx.global[1]); 16: t_idx.barrier.wait(); 17: for (int k = 0; k < TS; k++) 18: sum += locA[row][k] * locB[k][col]; 19: t_idx.barrier.wait(); 20: } 21: c[t_idx.global] = sum; 22: }); 23: } Notice that all the code up to line 9 is the same as per the changes we made in part 1 of tiling introduction. If you squint, the body of the lambda itself preserves the original algorithm on lines 10, 11, and 17, 18, and 21. The difference being that those lines use new indexing and the tile_static arrays; the tile_static arrays are declared and initialized on the brand new lines 13-15. On those lines we copy from the global memory represented by the array_view objects (a and b), to the tile_static vanilla arrays (locA and locB) – we are copying enough to fit a tile. Because in the code that follows on line 18 we expect the data for this tile to be in the tile_static storage, we need to synchronize the threads within each tile with a barrier, which we do on line 16 (to avoid accessing uninitialized memory on line 18). We also need to synchronize the threads within a tile on line 19, again to avoid the race between lines 14, 15 (retrieving the next set of data for each tile and overwriting the previous set) and line 18 (not being done processing the previous set of data). Luckily, as part of the awesome C++ AMP debugger in Visual Studio there is an option that helps you find such races, but that is a story for another blog post another time. May I suggest reading the next section, and then coming back to re-read and walk through this code with pen and paper to really grok what is going on, if you haven't already? Cool. Why would I introduce this tiling complexity into my code? Funny you should ask that, I was just about to tell you. There is only one reason we tiled our extent, had to deal with finding a good tile size, ensure the number of threads we schedule are correctly divisible with the tile size, had to use a tiled_index instead of a normal index, and had to understand tile_barrier and to figure out where we need to use it, and double the size of our lambda in terms of lines of code: the reason is to be able to use tile_static memory. Why do we want to use tile_static memory? Because accessing tile_static memory is around 10 times faster than accessing the global memory on an accelerator like the GPU, e.g. in the code above, if you can get 150GB/second accessing data from the array_view a, you can get 1500GB/second accessing the tile_static array locA. And since by definition you are dealing with really large data sets, the savings really pay off. We have seen tiled implementations being twice as fast as their non-tiled counterparts. Now, some algorithms will not have performance benefits from tiling (and in fact may deteriorate), e.g. algorithms that require you to go only once to global memory will not benefit from tiling, since with tiling you already have to fetch the data once from global memory! Other algorithms may benefit, but you may decide that you are happy with your code being 150 times faster than the serial-version you had, and you do not need to invest to make it 250 times faster. Also algorithms with more than 3 dimensions, which C++ AMP supports in the non-tiled model, cannot be tiled. Also note that in future releases, we may invest in making the non-tiled model, which already uses tiling under the covers, go the extra step and use tile_static memory on your behalf, but it is obviously way to early to commit to anything like that, and we certainly don't do any of that today. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Scheduling thread tiles with C++ AMP

- by Daniel Moth

This post assumes you are totally comfortable with, what some of us call, the simple model of C++ AMP, i.e. you could write your own matrix multiplication. We are now ready to explore the tiled model, which builds on top of the non-tiled one. Tiling the extent We know that when we pass a grid (which is just an extent under the covers) to the parallel_for_each call, it determines the number of threads to schedule and their index values (including dimensionality). For the single-, two-, and three- dimensional cases you can go a step further and subdivide the threads into what we call tiles of threads (others may call them thread groups). So here is a single-dimensional example: extent<1> e(20); // 20 units in a single dimension with indices from 0-19 grid<1> g(e); // same as extent tiled_grid<4> tg = g.tile<4>(); …on the 3rd line we subdivided the single-dimensional space into 5 single-dimensional tiles each having 4 elements, and we captured that result in a concurrency::tiled_grid (a new class in amp.h). Let's move on swiftly to another example, in pictures, this time 2-dimensional: So we start on the left with a grid of a 2-dimensional extent which has 8*6=48 threads. We then have two different examples of tiling. In the first case, in the middle, we subdivide the 48 threads into tiles where each has 4*3=12 threads, hence we have 2*2=4 tiles. In the second example, on the right, we subdivide the original input into tiles where each has 2*2=4 threads, hence we have 4*3=12 tiles. Notice how you can play with the tile size and achieve different number of tiles. The numbers you pick must be such that the original total number of threads (in our example 48), remains the same, and every tile must have the same size. Of course, you still have no clue why you would do that, but stick with me. First, we should see how we can use this tiled_grid, since the parallel_for_each function that we know expects a grid. Tiled parallel_for_each and tiled_index It turns out that we have additional overloads of parallel_for_each that accept a tiled_grid instead of a grid. However, those overloads, also expect that the lambda you pass in accepts a concurrency::tiled_index (new in amp.h), not an index<N>. So how is a tiled_index different to an index? A tiled_index object, can have only 1 or 2 or 3 dimensions (matching exactly the tiled_grid), and consists of 4 index objects that are accessible via properties: global, local, tile_origin, and tile. The global index is the same as the index we know and love: the global thread ID. The local index is the local thread ID within the tile. The tile_origin index returns the global index of the thread that is at position 0,0 of this tile, and the tile index is the position of the tile in relation to the overall grid. Confused? Here is an example accompanied by a picture that hopefully clarifies things: array_view<int, 2> data(8, 6, p_my_data); parallel_for_each(data.grid.tile<2,2>(), [=] (tiled_index<2,2> t_idx) restrict(direct3d) { /* todo */ }); Given the code above and the picture on the right, what are the values of each of the 4 index objects that the t_idx variables exposes, when the lambda is executed by T (highlighted in the picture on the right)? If you can't work it out yourselves, the solution follows: t_idx.global = index<2> (6,3) t_idx.local = index<2> (0,1) t_idx.tile_origin = index<2> (6,2) t_idx.tile = index<2> (3,1) Don't move on until you are comfortable with this… the picture really helps, so use it. Tiled Matrix Multiplication Example – part 1 Let's paste here the C++ AMP matrix multiplication example, bolding the lines we are going to change (can you guess what the changes will be?) 01: void MatrixMultiplyTiled_Part1(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W) 02: { 03: 04: array_view<const float,2> a(M, W, vA); 05: array_view<const float,2> b(W, N, vB); 06: array_view<writeonly<float>,2> c(M, N, vC); 07: parallel_for_each(c.grid, 08: [=](index<2> idx) restrict(direct3d) { 09: 10: int row = idx[0]; int col = idx[1]; 11: float sum = 0.0f; 12: for(int i = 0; i < W; i++) 13: sum += a(row, i) * b(i, col); 14: c[idx] = sum; 15: }); 16: } To turn this into a tiled example, first we need to decide our tile size. Let's say we want each tile to be 16*16 (which assumes that we'll have at least 256 threads to process, and that c.grid.extent.size() is divisible by 256, and moreover that c.grid.extent[0] and c.grid.extent[1] are divisible by 16). So we insert at line 03 the tile size (which must be a compile time constant). 03: static const int TS = 16; ...then we need to tile the grid to have tiles where each one has 16*16 threads, so we change line 07 to be as follows 07: parallel_for_each(c.grid.tile<TS,TS>(), ...that means that our index now has to be a tiled_index with the same characteristics as the tiled_grid, so we change line 08 08: [=](tiled_index<TS, TS> t_idx) restrict(direct3d) { ...which means, without changing our core algorithm, we need to be using the global index that the tiled_index gives us access to, so we insert line 09 as follows 09: index<2> idx = t_idx.global; ...and now this code just works and it is tiled! Closing thoughts on part 1 The process we followed just shows the mechanical transformation that can take place from the simple model to the tiled model (think of this as step 1). In fact, when we wrote the matrix multiplication example originally, the compiler was doing this mechanical transformation under the covers for us (and it has additional smarts to deal with the cases where the total number of threads scheduled cannot be divisible by the tile size). The point is that the thread scheduling is always tiled, even when you use the non-tiled model. But with this mechanical transformation, we haven't gained anything… Hint: our goal with explicitly using the tiled model is to gain even more performance. In the next post, we'll evolve this further (beyond what the compiler can automatically do for us, in this first release), so you can see the full usage of the tiled model and its benefits… Comments about this post by Daniel Moth welcome at the original blog.

Read the article
parallel_for_each from amp.h – part 1

- by Daniel Moth

This posts assumes that you've read my other C++ AMP posts on index<N> and extent<N>, as well as about the restrict modifier. It also assumes you are familiar with C++ lambdas (if not, follow my links to C++ documentation). Basic structure and parameters Now we are ready for part 1 of the description of the new overload for the concurrency::parallel_for_each function. The basic new parallel_for_each method signature returns void and accepts two parameters: a grid<N> (think of it as an alias to extent) a restrict(direct3d) lambda, whose signature is such that it returns void and accepts an index of the same rank as the grid So it looks something like this (with generous returns for more palatable formatting) assuming we are dealing with a 2-dimensional space: // some_code_A parallel_for_each( g, // g is of type grid<2> [ ](index<2> idx) restrict(direct3d) { // kernel code } ); // some_code_B The parallel_for_each will execute the body of the lambda (which must have the restrict modifier), on the GPU. We also call the lambda body the "kernel". The kernel will be executed multiple times, once per scheduled GPU thread. The only difference in each execution is the value of the index object (aka as the GPU thread ID in this context) that gets passed to your kernel code. The number of GPU threads (and the values of each index) is determined by the grid object you pass, as described next. You know that grid is simply a wrapper on extent. In this context, one way to think about it is that the extent generates a number of index objects. So for the example above, if your grid was setup by some_code_A as follows: extent<2> e(2,3); grid<2> g(e); ...then given that: e.size()==6, e[0]==2, and e[1]=3 ...the six index<2> objects it generates (and hence the values that your lambda would receive) are: (0,0) (1,0) (0,1) (1,1) (0,2) (1,2) So what the above means is that the lambda body with the algorithm that you wrote will get executed 6 times and the index<2> object you receive each time will have one of the values just listed above (of course, each one will only appear once, the order is indeterminate, and they are likely to call your code at the same exact time). Obviously, in real GPU programming, you'd typically be scheduling thousands if not millions of threads, not just 6. If you've been following along you should be thinking: "that is all fine and makes sense, but what can I do in the kernel since I passed nothing else meaningful to it, and it is not returning any values out to me?" Passing data in and out It is a good question, and in data parallel algorithms indeed you typically want to pass some data in, perform some operation, and then typically return some results out. The way you pass data into the kernel, is by capturing variables in the lambda (again, if you are not familiar with them, follow the links about C++ lambdas), and the way you use data after the kernel is done executing is simply by using those same variables. In the example above, the lambda was written in a fairly useless way with an empty capture list: [ ](index<2> idx) restrict(direct3d), where the empty square brackets means that no variables were captured. If instead I write it like this [&](index<2> idx) restrict(direct3d), then all variables in the some_code_A region are made available to the lambda by reference, but as soon as I try to use any of those variables in the lambda, I will receive a compiler error. This has to do with one of the direct3d restrictions, where only one type can be capture by reference: objects of the new concurrency::array class that I'll introduce in the next post (suffice for now to think of it as a container of data). If I write the lambda line like this [=](index<2> idx) restrict(direct3d), all variables in the some_code_A region are made available to the lambda by value. This works for some types (e.g. an integer), but not for all, as per the restrictions for direct3d. In particular, no useful data classes work except for one new type we introduce with C++ AMP: objects of the new concurrency::array_view class, that I'll introduce in the post after next. Also note that if you capture some variable by value, you could use it as input to your algorithm, but you wouldn’t be able to observe changes to it after the parallel_for_each call (e.g. in some_code_B region since it was passed by value) – the exception to this rule is the array_view since (as we'll see in a future post) it is a wrapper for data, not a container. Finally, for completeness, you can write your lambda, e.g. like this [av, &ar](index<2> idx) restrict(direct3d) where av is a variable of type array_view and ar is a variable of type array - the point being you can be very specific about what variables you capture and how. So it looks like from a large data perspective you can only capture array and array_view objects in the lambda (that is how you pass data to your kernel) and then use the many threads that call your code (each with a unique index) to perform some operation. You can also capture some limited types by value, as input only. When the last thread completes execution of your lambda, the data in the array_view or array are ready to be used in the some_code_B region. We'll talk more about all this in future posts… (a)synchronous Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality. That's all for now, we'll revisit the parallel_for_each description, once we introduce properly array and array_view – coming next. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Effectiveness and Efficiency

- by Daniel Moth

In the professional environment, i.e. at work, I am always seeking personal growth and to be challenged. The result is that my assignments, my work list, my tasks, my goals, my commitments, my [insert whatever word resonates with you] keep growing (in scope and desired impact). Which in turn means I have to keep finding new ways to deliver more value, while not falling into the trap of working more hours. To do that I continuously evaluate both my effectiveness and my efficiency. EFFECTIVENESS The first thing I check is my effectiveness: Am I doing the right things? Am I focusing too much on unimportant things? Am I spending more time doing stuff that is important to my team/org/division/business/company, or am I spending it on stuff that is important to me and that I enjoy doing? Am I valuing activities that maybe I have outgrown and should be delegated to others who are at a stage I have surpassed (in Microsoft speak: is the work I am doing level appropriate or am I still operating at the previous level)? Notice how the answers to those questions change over time and due to certain events, so I have to remind myself to revisit them frequently. Events that force me to re-examine them are: change of role, change of team/org/etc, change of direction of team/org/etc, re-org, new hires on the team that take on some of the work I did, personal promotion, change of manager... and if none of those events has occurred since the last annual review, I ask myself those at each annual review anyway. If you think you are not being effective at work, make a list of the stuff that you do and start tracking where your time goes. In parallel, have a discussion with your manager about where they think your time should go. Ultimately your time is finite and hence it is your most precious investment, don't waste it. If your management doesn't value as highly what you spend your time on, then either convince your management, or stop spending your time on it, or find different management: Lead, Follow, or get out of the way! That's my view on effectiveness. You have to fix that before moving to being efficient, or you may end up being very efficient at stuff that nobody wants you to be doing in the first place. For example, you may be spending your time writing blog posts and becoming better and faster at it all the time. If your manager thinks that is not even part of your job description, you are wasting your time to satisfy your inner desires. Nobody can help you with your effectiveness other than your management chain and your management peers - they are the judges of it. EFFICIENCY The second thing I check is my efficiency: Am I doing things right? For me, doing things right means that I deliver the same quality of work faster [than what I used to, and than my peers, and than expected of me]. The result is that I can achieve more [than what I used to, and than my peers, and than expected of me]. Notice how the efficiency goal is a more portable one. If, by whatever criteria, you think you are the best at [insert your own skill here], this can change at two events: because you have new colleagues (who are potentially better than your older ones), and it can change with a change of manager (who has potentially higher expectations). That's about it. Once you are efficient at something, you carry that with you... All you need to really be doing here is, when taking on new kinds of work that you haven't done before, try a few approaches and devise a system so that you can become efficient at this new activity too... Just keep "collecting" stuff that you are efficient at. If you think you are not being efficient at something, break it down: What are the steps you take to complete that task? How long do you spend on each step? Talk to others about what steps they take, to see if you can optimize some steps away or trade them for better steps, or just learn how to complete a step faster. Have a system for every task you take so that you can have repeatable success. That's my view on efficiency. You have to fix it so that you can free up time to do more. When you plan a route from A to B - all else being equal - you try to get there as fast as possible so why would you not want to do that with your everyday work? For example, imagine you are inefficient at processing email: You spend more time than necessary dealing with email, and you still end up with dropped email threads and with slower response times than others. How can you improve? Talk to someone that you think is good at this, understand their system (e.g. here is my email processing system) and come up with one that works for you. Parting Thoughts Are you considered, by your colleagues and manager, an effective and efficient person at your workplace? If you are, what would you change if you were asked by your management to do the job of two people? Seriously, think about that! Your immediate reaction may be "that is not possible", but it actually is. You just have to re-assess what things that were previously important will now stop being important, by discussing them with your management and reaching agreement on relative priorities. For example, stuff that was previously on your plate may now have to be delegated or dropped. Where you thought you were efficient, maybe now you have to find an even faster path to completion, perhaps keeping in mind that Perfect is the Enemy of “Good Enough”. My personal experience (from both observing others and from my own reflection) is that when folks are struggling to keep up at work it is because of two reasons: They are investing energy in stuff that they enjoy doing which the business regards as having a lower priority than a lot of other things on their plate. They are completing tasks to a level of higher quality than what is required (due to personal pride) missing the big picture which almost always mandates completing three tasks at good enough quality than knocking only one of them out of the park while the other two come in late or not at all. There is a lot of content on the web, so I strongly encourage you to use your favorite search engine to read other views on effectiveness and efficiency (Bing, Google). Comments about this post by Daniel Moth welcome at the original blog.

Read the article
How I Record Screencasts

- by Daniel Moth

I get this asked a lot so here is my brain dump on the topic. What A screencast is just a demo that you present to yourself while recording the screen. As such, my advice for clearing your screen for demo purposes and setting up Visual Studio still applies here (adjusting for the fact I wrote those blog posts when I was running Vista and VS2008, not Windows 8 and VS2012). To see examples of screencasts, watch any of my screencasts on channel9. Why If you are a technical presenter, think of when you get best reactions from a developer audience in your sessions: when you are doing demos, of course. Imagine if you could package those alone and share them with folks to watch over and over? If you have ever gone through a tutorial trying to recreate steps to explore a feature, think how much more helpful it would be if you could watch a video and follow along. Think of how many folks you "touch" with a conference presentation, and how many more you can reach with an online shorter recording of the demo. If you invest so much of your time for the first type of activity, isn't the second type of activity also worth an investment? Fact: If you are able to record a screencast of a demo, you will be much better prepared to deliver it in person. In fact lately I will force myself to make a screencast of any demo I need to present live at an upcoming event. It is also a great backup - if for whatever reason something fails (software, network, etc) during an attempt of a live demo, you can just play the recorded video for the live audience. There are other reasons (e.g. internal sharing of the latest implemented feature) but the context above is the one within which I create most of my screencasts. Software & Hardware I use Camtasia from Tech Smith, version 7.1.1. Microsoft has a variety of options for capturing the screen to video, but I have been using this software for so long now that I have not invested time to explore alternatives… I also use whatever cheapo headset is near me, but sometimes I get some complaints from some folks about the audio so now I try to remember to use "the good headset". I do not use a web camera as I am not a huge fan of PIP. Preparation First you have to know your technology and demo. Once you think you know it, write down the outline and major steps of the demo. Keep it short 5-20 minutes max. I break that rule sometimes but try not to. The longer the video is the more chances that people will not have the patience to sit through it and the larger the download wmv file ends up being. Run your demo a few times, timing yourself each time to ensure that you have the planned timing correct, but also to make sure that you are comfortable with what you are going to demo. Unlike with a live audience, there is no live reaction/feedback to steer you, so it can be a bit unnerving at first. It can also lead you to babble too much, so try extra hard to be succinct when demoing/screencasting on your own. TIP: Before recording, hide your desktop/taskbar clock if it is showing. Recording To record you start the Camtasia Recorder tool Configure the settings thought the menus Capture menu to choose custom size or full screen. I try to use full screen and remember to lower the resolution of your screen to as low as possible, e.g. 1024x768 or 1360x768 or something like that. From the Tools -> Options dialog you can choose to record audio and the volume level. Effects menu I typically leave untouched but you should explore and experiment to your liking, e.g. how the mouse pointer is captured, and whether there should be a delay for the recording when you start it. Once you've configured these settings, typically you just launch this tool and hit the F9 key to start recording. TIP: As you record, if you ever start to "lose your way" hit F9 again to pause recording, regroup your thoughts and flow, and then hit F9 again to resume. Finally, hit F10 to stop recording. At that point the video starts playing for you in the recorder. This is where you can preview the video to see that you are happy with it before saving. If you are happy, hit the Save As menu to choose where you want to save the video. TIP: If you've really lost your way to the extent where you'll need to do some editing, hit F10 to stop recording, save the video and then record some more - you'll be able to stitch the videos together later and this will make it easier for you to delete the parts where you messed up. TIP: Before you commit to recording the whole demo, every time you should record 5 seconds and preview them to ensure that you are capturing the screen the way you want to and that your audio is still correctly configured and at the right level. Trust me, you do not want to be recording 15 minutes only to find out that you messed up on the configuration somewhere. Editing To edit the video you launch another Camtasia app, the Camtasia Studio. File->New Project. File->Save Project and choose location. File->Import Media and choose the video(s) you saved earlier. These adds them to the area at the top/middle but not at the timeline at the bottom. Right click on the video and choose Add to timeline. It will prompt you for the Editing dimensions and I always choose Recording Dimensions. Do whatever edits you want to do for this video, then add the next video if you have one to stitch and repeat. In terms of edits there are many options. The simplest is to do nothing, which is the option I did when I first starting doing these in 2006. Nowadays, I typically cut out pieces that I don't like and also lower/mute the audio in other areas and also speed up the video in some areas. A full tutorial on how to do this is beyond the scope of this blog post, but your starting point is to select portions on the timeline and then open the Edit menu at the very top (tip: the context menu doesn't have all options). You can spend hours editing a recording, so don’t lose track of time! When you are done editing, save again, and you are now ready to Produce. Producing Production is specific to where you will publish. I've only ever published on channel9, so for that I do the following File -> Produce and share. This opens a wizard dialog In the dropdown choose Custom production settings Hit Next and then choose WMV Hit Next and keep the default of Camtasia Studio Best Quality and File Size (recommended) Hit Next and choose Editing dimensions video size Hit Next, hit Options and you get a dialog. Enter a Title for the project tab and then on the author tab enter the Creator and Homepage. Hit OK Hit Next. Hit Next again. Enter a video file name in the Production name textbox and then hit Finish. Now do other stuff while you wait for the video to be produced and you hear it playing. After the video is produced watch it to ensure it was produced correctly (e.g. sometimes you get mouse issues) and then you are ready for publishing it. Publishing Follow the instructions of the place where you are going to publish. If you are MSFT internal and want to choose channel9 then contact those folks so they can share their instructions (if you don't know who they are ping me and I'll connect you but they are easy to find in the GAL). For me this involves using a tool to point to the video, choosing a file name (again), choosing an image from the video to display when it is not playing, choosing what output formats I want, and then later on a webpage adding tags, adding a description, and adding a title. That’s all folks, have fun! Comments about this post by Daniel Moth welcome at the original blog.

Read the article
array and array_view from amp.h

- by Daniel Moth

This is a very long post, but it also covers what are probably the classes (well, array_view at least) that you will use the most with C++ AMP, so I hope you enjoy it! Overview The concurrency::array and concurrency::array_view template classes represent multi-dimensional data of type T, of N dimensions, specified at compile time (and you can later access the number of dimensions via the rank property). If N is not specified, it is assumed that it is 1 (i.e. single-dimensional case). They are rectangular (not jagged). The difference between them is that array is a container of data, whereas array_view is a wrapper of a container of data. So in that respect, array behaves like an STL container, whereas the closest thing an array_view behaves like is an STL iterator (albeit with random access and allowing you to view more than one element at a time!). The data in the array (whether provided at creation time or added later) resides on an accelerator (which is specified at creation time either explicitly by the developer, or set to the default accelerator at creation time by the runtime) and is laid out contiguously in memory. The data provided to the array_view is not stored by/in the array_view, because the array_view is simply a view over the real source (which can reside on the CPU or other accelerator). The underlying data is copied on demand to wherever the array_view is accessed. Elements which differ by one in the least significant dimension of the array_view are adjacent in memory. array objects must be captured by reference into the lambda you pass to the parallel_for_each call, whereas array_view objects must be captured by value (into the lambda you pass to the parallel_for_each call). Creating array and array_view objects and relevant properties You can create array_view objects from other array_view objects of the same rank and element type (shallow copy, also possible via assignment operator) so they point to the same underlying data, and you can also create array_view objects over array objects of the same rank and element type e.g. array_view<int,3> a(b); // b can be another array or array_view of ints with rank=3 Note: Unlike the constructors above which can be called anywhere, the ones in the rest of this section can only be called from CPU code. You can create array objects from other array objects of the same rank and element type (copy and move constructors) and from other array_view objects, e.g. array<float,2> a(b); // b can be another array or array_view of floats with rank=2 To create an array from scratch, you need to at least specify an extent object, e.g. array<int,3> a(myExtent);. Note that instead of an explicit extent object, there are convenience overloads when N<=3 so you can specify 1-, 2-, 3- integers (dependent on the array's rank) and thus have the extent created for you under the covers. At any point, you can access the array's extent thought the extent property. The exact same thing applies to array_view (extent as constructor parameters, incl. convenience overloads, and property). While passing only an extent object to create an array is enough (it means that the array will be written to later), it is not enough for the array_view case which must always wrap over some other container (on which it relies for storage space and actual content). So in addition to the extent object (that describes the shape you'd like to be viewing/accessing that data through), to create an array_view from another container (e.g. std::vector) you must pass in the container itself (which must expose .data() and a .size() methods, e.g. like std::array does), e.g. array_view<int,2> aaa(myExtent, myContainerOfInts); Similarly, you can create an array_view from a raw pointer of data plus an extent object. Back to the array case, to optionally initialize the array with data, you can pass an iterator pointing to the start (and optionally one pointing to the end of the source container) e.g. array<double,1> a(5, myVector.begin(), myVector.end()); We saw that arrays are bound to an accelerator at creation time, so in case you don’t want the C++ AMP runtime to assign the array to the default accelerator, all array constructors have overloads that let you pass an accelerator_view object, which you can later access via the accelerator_view property. Note that at the point of initializing an array with data, a synchronous copy of the data takes place to the accelerator, and then to copy any data back we'll see that an explicit copy call is required. This does not happen with the array_view where copying is on demand... refresh and synchronize on array_view Note that in the previous section on constructors, unlike the array case, there was no overload that accepted an accelerator_view for array_view. That is because the array_view is simply a wrapper, so the allocation of the data has already taken place before you created the array_view. When you capture an array_view variable in your call to parallel_for_each, the copy of data between the non-CPU accelerator and the CPU takes place on demand (i.e. it is implicit, versus the explicit copy that has to happen with the array). There are some subtleties to the on-demand-copying that we cover next. The assumption when using an array_view is that you will continue to access the data through the array_view, and not through the original underlying source, e.g. the pointer to the data that you passed to the array_view's constructor. So if you modify the data through the array_view on the GPU, the original pointer on the CPU will not "know" that, unless one of two things happen: you access the data through the array_view on the CPU side, i.e. using indexing that we cover below you explicitly call the array_view's synchronize method on the CPU (this also gets called in the array_view's destructor for you) Conversely, if you make a change to the underlying data through the original source (e.g. the pointer), the array_view will not "know" about those changes, unless you call its refresh method. Finally, note that if you create an array_view of const T, then the data is copied to the accelerator on demand, but it does not get copied back, e.g. array_view<const double, 5> myArrView(…); // myArrView will not get copied back from GPU There is also a similar mechanism to achieve the reverse, i.e. not to copy the data of an array_view to the GPU. copy_to, data, and global copy/copy_async functions Both array and array_view expose two copy_to overloads that allow copying them to another array, or to another array_view, and these operations can also be achieved with assignment (via the = operator overloads). Also both array and array_view expose a data method, to get a raw pointer to the underlying data of the array or array_view, e.g. float* f = myArr.data();. Note that for array_view, this only works when the rank is equal to 1, due to the data only being contiguous in one dimension as covered in the overview section. Finally, there are a bunch of global concurrency::copy functions returning void (and corresponding concurrency::copy_async functions returning a future) that allow copying between arrays and array_views and iterators etc. Just browse intellisense or amp.h directly for the full set. Note that for array, all copying described throughout this post is deep copying, as per other STL container expectations. You can never have two arrays point to the same data. indexing into array and array_view plus projection Reading or writing data elements of an array is only legal when the code executes on the same accelerator as where the array was bound to. In the array_view case, you can read/write on any accelerator, not just the one where the original data resides, and the data gets copied for you on demand. In both cases, the way you read and write individual elements is via indexing as described next. To access (or set the value of) an element, you can index into it by passing it an index object via the subscript operator. Furthermore, if the rank is 3 or less, you can use the function ( ) operator to pass integer values instead of having to use an index object. e.g. array<float,2> arr(someExtent, someIterator); //or array_view<float,2> arr(someExtent, someContainer); index<2> idx(5,4); float f1 = arr[idx]; float f2 = arr(5,4); //f2 ==f1 //and the reverse for assigning, e.g. arr(idx[0], 7) = 6.9; Note that for both array and array_view, regardless of rank, you can also pass a single integer to the subscript operator which results in a projection of the data, and (for both array and array_view) you get back an array_view of rank N-1 (or if the rank was 1, you get back just the element at that location). Not Covered In this already very long post, I am not going to cover three very cool methods (and related overloads) that both array and array_view expose: view_as, section, reinterpret_as. We'll revisit those at some point in the future, probably on the team blog. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Matrix Multiplication with C++ AMP

- by Daniel Moth

As part of our API tour of C++ AMP, we looked recently at parallel_for_each. I ended that post by saying we would revisit parallel_for_each after introducing array and array_view. Now is the time, so this is part 2 of parallel_for_each, and also a post that brings together everything we've seen until now. The code for serial and accelerated Consider a naïve (or brute force) serial implementation of matrix multiplication 0: void MatrixMultiplySerial(std::vector<float>& vC, const std::vector<float>& vA, const std::vector<float>& vB, int M, int N, int W) 1: { 2: for (int row = 0; row < M; row++) 3: { 4: for (int col = 0; col < N; col++) 5: { 6: float sum = 0.0f; 7: for(int i = 0; i < W; i++) 8: sum += vA[row * W + i] * vB[i * N + col]; 9: vC[row * N + col] = sum; 10: } 11: } 12: } We notice that each loop iteration is independent from each other and so can be parallelized. If in addition we have really large amounts of data, then this is a good candidate to offload to an accelerator. First, I'll just show you an example of what that code may look like with C++ AMP, and then we'll analyze it. It is assumed that you included at the top of your file #include <amp.h> 13: void MatrixMultiplySimple(std::vector<float>& vC, const std::vector<float>& vA, const std::vector<float>& vB, int M, int N, int W) 14: { 15: concurrency::array_view<const float,2> a(M, W, vA); 16: concurrency::array_view<const float,2> b(W, N, vB); 17: concurrency::array_view<concurrency::writeonly<float>,2> c(M, N, vC); 18: concurrency::parallel_for_each(c.grid, 19: [=](concurrency::index<2> idx) restrict(direct3d) { 20: int row = idx[0]; int col = idx[1]; 21: float sum = 0.0f; 22: for(int i = 0; i < W; i++) 23: sum += a(row, i) * b(i, col); 24: c[idx] = sum; 25: }); 26: } First a visual comparison, just for fun: The beginning and end is the same, i.e. lines 0,1,12 are identical to lines 13,14,26. The double nested loop (lines 2,3,4,5 and 10,11) has been transformed into a parallel_for_each call (18,19,20 and 25). The core algorithm (lines 6,7,8,9) is essentially the same (lines 21,22,23,24). We have extra lines in the C++ AMP version (15,16,17). Now let's dig in deeper. Using array_view and extent When we decided to convert this function to run on an accelerator, we knew we couldn't use the std::vector objects in the restrict(direct3d) function. So we had a choice of copying the data to the the concurrency::array<T,N> object, or wrapping the vector container (and hence its data) with a concurrency::array_view<T,N> object from amp.h – here we used the latter (lines 15,16,17). Now we can access the same data through the array_view objects (a and b) instead of the vector objects (vA and vB), and the added benefit is that we can capture the array_view objects in the lambda (lines 19-25) that we pass to the parallel_for_each call (line 18) and the data will get copied on demand for us to the accelerator. Note that line 15 (and ditto for 16 and 17) could have been written as two lines instead of one: extent<2> e(M, W); array_view<const float, 2> a(e, vA); In other words, we could have explicitly created the extent object instead of letting the array_view create it for us under the covers through the constructor overload we chose. The benefit of the extent object in this instance is that we can express that the data is indeed two dimensional, i.e a matrix. When we were using a vector object we could not do that, and instead we had to track via additional unrelated variables the dimensions of the matrix (i.e. with the integers M and W) – aren't you loving C++ AMP already? Note that the const before the float when creating a and b, will result in the underling data only being copied to the accelerator and not be copied back – a nice optimization. A similar thing is happening on line 17 when creating array_view c, where we have indicated that we do not need to copy the data to the accelerator, only copy it back. The kernel dispatch On line 18 we make the call to the C++ AMP entry point (parallel_for_each) to invoke our parallel loop or, as some may say, dispatch our kernel. The first argument we need to pass describes how many threads we want for this computation. For this algorithm we decided that we want exactly the same number of threads as the number of elements in the output matrix, i.e. in array_view c which will eventually update the vector vC. So each thread will compute exactly one result. Since the elements in c are organized in a 2-dimensional manner we can organize our threads in a two-dimensional manner too. We don't have to think too much about how to create the first argument (a grid) since the array_view object helpfully exposes that as a property. Note that instead of c.grid we could have written grid<2>(c.extent) or grid<2>(extent<2>(M, N)) – the result is the same in that we have specified M*N threads to execute our lambda. The second argument is a restrict(direct3d) lambda that accepts an index object. Since we elected to use a two-dimensional extent as the first argument of parallel_for_each, the index will also be two-dimensional and as covered in the previous posts it represents the thread ID, which in our case maps perfectly to the index of each element in the resulting array_view. The kernel itself The lambda body (lines 20-24), or as some may say, the kernel, is the code that will actually execute on the accelerator. It will be called by M*N threads and we can use those threads to index into the two input array_views (a,b) and write results into the output array_view ( c ). The four lines (21-24) are essentially identical to the four lines of the serial algorithm (6-9). The only difference is how we index into a,b,c versus how we index into vA,vB,vC. The code we wrote with C++ AMP is much nicer in its indexing, because the dimensionality is a first class concept, so you don't have to do funny arithmetic calculating the index of where the next row starts, which you have to do when working with vectors directly (since they store all the data in a flat manner). I skipped over describing line 20. Note that we didn't really need to read the two components of the index into temporary local variables. This mostly reflects my personal choice, in some algorithms to break down the index into local variables with names that make sense for the algorithm, i.e. in this case row and col. In other cases it may i,j,k or x,y,z, or M,N or whatever. Also note that we could have written line 24 as: c(idx[0], idx[1])=sum or c(row, col)=sum instead of the simpler c[idx]=sum Targeting a specific accelerator Imagine that we had more than one hardware accelerator on a system and we wanted to pick a specific one to execute this parallel loop on. So there would be some code like this anywhere before line 18: vector<accelerator> accs = MyFunctionThatChoosesSuitableAccelerators(); accelerator acc = accs[0]; …and then we would modify line 18 so we would be calling another overload of parallel_for_each that accepts an accelerator_view as the first argument, so it would become: concurrency::parallel_for_each(acc.default_view, c.grid, ...and the rest of your code remains the same… how simple is that? Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Objective-C As A First OOP Language?

- by Daniel Scocco

I am just finishing the second semester of my CS degree. So far I learned C, all the fundamental algorithms and data structures (e.g., searching, sorting, linked lists, heaps, hash tables, trees, graphs, etc). Next year we'll start with OOP, using either Java or C++. Recently I got some ideas for some iPhone apps and got itchy to start working on them. However I heard some bad things about Objectice-C in the past, so I am wondering if learning it as my first OOP language could be a problem. Not to mention that I think it will be hard to find books/online courses that teach basic OOP concepts using Objective-C to illustrate the concepts (as opposed to books using Java or C++, which are plenty), so this could be another problem. In summary: should I start learning Objective-C and OOP concepts right now by my own, or wait one more semester until I learn Java/C++ at university and then jump into Objective-C? Update: For those interested in getting started with OOP via Objective-C I just found some nice tutorials inside Apple's Developer Library - http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/OOP_ObjC/Introduction/Introduction.html

Read the article
When can I be sure a directed graph is acyclic?

- by Daniel Scocco

The definition for directed acyclic graph is this: "there is no way to start at some vertex v and follow a sequence of edges that eventually loops back to v again." So far so good, but I am trying to find some premises that will be simpler to test and that will also guarantee the graph is acyclic. I came up with those premises, but they are pretty basic so I am sure other people figured it out in the past (or they are incorrect). The problem is I couldn't find anything related on books/online, hence why I decided to post this question. Premise 1: If all vertices of the graph have an incoming edge, then the graph can't be acyclic. Is this correct? Premise 2: Assume the graph in question does have one vertex with no incoming edges. In this case, in order to have a cycle, at least one of the other vertices would need to have two or more incoming edges. Is this correct?

Read the article
Processing Email in Outlook

- by Daniel Moth

A. Why Goal 1 = Help others: Have at most a 24-hour response turnaround to internal (from colleague) emails, typically achieving same day response. Goal 2 = Help projects: Not to implicitly pass/miss an opportunity to have impact on electronic discussions around any project on the radar. Not achieving goals 1 & 2 = Colleagues stop relying on you, drop you off conversations, don't see you as a contributing resource or someone that cares, you are perceived as someone with no peripheral vision. Note this is perfect if all you are doing is cruising at your job, trying to fly under the radar, with no ambitions of having impact beyond your absolute minimum 'day job'. B. DON'T: Leave unread email lurking around Don't: Receive or process all incoming emails in a single folder ('inbox' or 'unread mail'). This is actually possible if you receive a small number of emails (e.g. new to the job, not working at a company like Microsoft). Even so, with (your future) success at any level (company, community) comes large incoming email, so learn to deal with it. With large volumes, it is best to let the system help you by doing some categorization and filtering on your behalf (instead of trying to do that in your head as you process the single folder). See later section on how to achieve this. Don't: Leave emails as 'unread' (or worse: read them, then mark them as unread). Often done by individuals who think they possess super powers ("I can mentally cache and distinguish between the emails I chose not to read, the ones that are actually new, and the ones I decided to revisit in the future; the fact that they all show up the same (bold = unread) does not confuse me"). Interactions with this super-powered individuals typically end up with them saying stuff like "I must have missed that email you are talking about (from 2 weeks ago)" or "I am a bit behind, so I haven't read your email, can you remind me". TIP: The only place where you are "allowed" unread email is in your Deleted Items folder. Don't: Interpret a read email as an email that has been processed. Doing that, means you will always end up with fake unread email (that you have actually read, but haven't dealt with completely so you then marked it as unread) lurking between actual unread email. Another side effect is reading the email and making a 'mental' note to action it, then leaving the email as read, so the only thing left to remind you to carry out the action is… you. You are not super human, you will forget. This is a key distinction. Reading (or even scanning) a new email, means you now know what needs to be done with it, in order for it to be truly considered processed. Truly processing an email is to, for example, write an email of your own (e.g. to reply or forward), or take a non-email related action (e.g. create calendar entry, do something on some website), or read it carefully to gain some knowledge (e.g. it had a spec as an attachment), or keep it around as reference etc. 'Reading' means that you know what to do, not that you have done it. An email that is read is an email that is triaged, not an email that is resolved. Sometimes the thing that needs to be done based on receiving the email, you can (and want) to do immediately after reading the email. That is fine, you read the email and you processed it (typically when it takes no longer than X minutes, where X is your personal tolerance – mine is roughly 2 minutes). Other times, you decide that you don't want to spend X minutes at that moment, so after reading the email you need a quick system for "marking" the email as to be processed later (and you still leave it as 'read' in outlook). See later section for how. C. DO: Use Outlook rules and have multiple folders where incoming email is automatically moved to Outlook email rules are very powerful and easy to configure. Use them to automatically file email into folders. Here are mine (note that if a rule catches an email message then no further rules get processed): "personal" Email is either personal or business related. Almost all personal email goes to my gmail account. The personal emails that end up on my work email account, go to a dedicated folder – that is achieved via a rule that looks at the email's 'From' field. For those that slip through, I use the new Outlook 2010 quick step of "Conversation To Folder" feature to let the slippage only occur once per conversation, and then update my rules. "External" and "ViaBlog" The remaining external emails either come from my blog (rule on the subject line) or are unsolicited (rule on the domain name not being microsoft) and they are filed accordingly. "invites" I may do a separate blog post on calendar management, but suffice to say it should be kept up to date. All invite requests end up in this folder, so that even if mail gets out of control, the calendar can stay under control (only 1 folder to check). I.e. so I can let the organizer know why I won't be attending their meeting (or that I will be). Note: This folder is the only one that shows the total number of items in it, instead of the total unread. "Inbox" The only email that ends up here is email sent TO me and me only. Note that this is also the only email that shows up above the systray icon in the notification toast – all other emails cannot interrupt. "ToMe++" Email where I am on the TO line, but there are other recipients as well (on the TO or CC line). "CC" Email where I am on the CC line. I need to read these, but nobody is expecting a response or action from me so they are not as urgent (and if they are and follow up with me, they'll receive a link to this). "@ XYZ" Emails to aliases that are about projects that I directly work on (and I wasn't on the TO or CC line, of course). Test: these projects are in my commitments that I get measured on at the end of the year. "Z Mass" and subfolders under it per distribution list (DL) Emails to aliases that are about topics that I am interested in, but not that I formally own/contribute to. Test: if I unsubscribed from these aliases, nobody could rightfully complain. "Admin" folder, which resides under "Z Mass" folder Emails to aliases that I was added typically by an admin, e.g. broad emails to the floor/group/org/building/division/company that I am a member of. "BCC" folder, which resides under "Z Mass" Emails where I was not on the TO or the CC line explicitly and the alias it was sent to is not one I explicitly subscribed to (or I have been added to the BCC line, which I briefly touched on in another post). When there are only a few quick minutes to catch up on email, read as much as possible from these folders, in this order: Invites, Inbox, ToMe++. Only when these folders are all read (remember that doesn't mean that each email in them has been fully dealt with), we can move on to the @XYZ and then the CC folders. Only when those are read we can go on to the remaining folders. Note that the typical flow in the "Z Mass" subfolders is to scan subject lines and use the new Ctrl+Delete Outlook 2010 feature to ignore conversations. D. DO: Use Outlook Search folders in combination with categories As you process each folder, when you open a new email (i.e. click on it and read it in the preview pane) the email becomes read and stays read and you have to decide whether: It can take 2 minutes to deal with for good, right now, or It will take longer than 2 minutes, so it needs to be postponed with a clear next step, which is one of ToReply – there may be intermediate action steps, but ultimately someone else needs to receive email about this Action – no email is required, but I need to do something ReadLater – no email is required from the quick scan, but this is too long to fully read now, so it needs to be read it later WaitingFor – the email is informing of an intermediate status and 'promising' a future email update. Need to track. SomedayMaybe – interesting but not important, non-urgent, non-time-bound information. I may want to spend part of one of my weekends reading it. For all these 'next steps' use Outlook categories (right click on the email and assign category, or use shortcut key). Note that I also use category 'WaitingFor' for email that I send where I am expecting a response and need to track it. Create a new search folder for each category (I dragged the search folders into my favorites at the top left of Outlook, above my inboxes). So after the activity of reading/triaging email in the normal folders (where the email arrived) is done, the result is a bunch of emails appearing in the search folders (configure them to show the total items, not the total unread items). To actually process email (that takes more than 2 minutes to deal with) process the search folders, starting with ToReply and Action. E. DO: Get into a Routine Now you have a system in place, get into a routine of using it. Here is how I personally use mine, but this part I keep tweaking: Spend short bursts of time (between meetings, during boring but mandatory meetings and, in general, 2-4 times a day) aiming to have no unread emails (and in the process deal with some emails that take less than 2 minutes). Spend around 30 minutes at the end of each day processing most urgent items in search folders. Spend as long as it takes each Friday (or even the weekend) ensuring there is no unnecessary email baggage carried forward to the following week. F. Other resources Official Outlook help on: Create custom actions rules, Manage e-mail messages with rules, creating a search folder. Video on ignoring conversations (Ctrl+Del). Official blog post on Quick Steps and in particular the Move Conversation to folder. If you've read "Getting Things Done" it is very obvious that my approach to email management is driven by GTD. A very similar approach was described previously by ScottHa (also influenced by GTD), worth reading here. He also described how he sets up 2 outlook rules ('invites' and 'external') which I also use – worth reading that too. Comments about this post welcome at the original blog.

Read the article
Processing Email in Outlook

- by Daniel Moth

A. Why Goal 1 = Help others: Have at most a 24-hour response turnaround to internal (from colleague) emails, typically achieving same day response. Goal 2 = Help projects: Not to implicitly pass/miss an opportunity to have impact on electronic discussions around any project on the radar. Not achieving goals 1 & 2 = Colleagues stop relying on you, drop you off conversations, don't see you as a contributing resource or someone that cares, you are perceived as someone with no peripheral vision. Note this is perfect if all you are doing is cruising at your job, trying to fly under the radar, with no ambitions of having impact beyond your absolute minimum 'day job'. B. DON'T: Leave unread email lurking around Don't: Receive or process all incoming emails in a single folder ('inbox' or 'unread mail'). This is actually possible if you receive a small number of emails (e.g. new to the job, not working at a company like Microsoft). Even so, with (your future) success at any level (company, community) comes large incoming email, so learn to deal with it. With large volumes, it is best to let the system help you by doing some categorization and filtering on your behalf (instead of trying to do that in your head as you process the single folder). See later section on how to achieve this. Don't: Leave emails as 'unread' (or worse: read them, then mark them as unread). Often done by individuals who think they possess super powers ("I can mentally cache and distinguish between the emails I chose not to read, the ones that are actually new, and the ones I decided to revisit in the future; the fact that they all show up the same (bold = unread) does not confuse me"). Interactions with this super-powered individuals typically end up with them saying stuff like "I must have missed that email you are talking about (from 2 weeks ago)" or "I am a bit behind, so I haven't read your email, can you remind me". TIP: The only place where you are "allowed" unread email is in your Deleted Items folder. Don't: Interpret a read email as an email that has been processed. Doing that, means you will always end up with fake unread email (that you have actually read, but haven't dealt with completely so you then marked it as unread) lurking between actual unread email. Another side effect is reading the email and making a 'mental' note to action it, then leaving the email as read, so the only thing left to remind you to carry out the action is… you. You are not super human, you will forget. This is a key distinction. Reading (or even scanning) a new email, means you now know what needs to be done with it, in order for it to be truly considered processed. Truly processing an email is to, for example, write an email of your own (e.g. to reply or forward), or take a non-email related action (e.g. create calendar entry, do something on some website), or read it carefully to gain some knowledge (e.g. it had a spec as an attachment), or keep it around as reference etc. 'Reading' means that you know what to do, not that you have done it. An email that is read is an email that is triaged, not an email that is resolved. Sometimes the thing that needs to be done based on receiving the email, you can (and want) to do immediately after reading the email. That is fine, you read the email and you processed it (typically when it takes no longer than X minutes, where X is your personal tolerance – mine is roughly 2 minutes). Other times, you decide that you don't want to spend X minutes at that moment, so after reading the email you need a quick system for "marking" the email as to be processed later (and you still leave it as 'read' in outlook). See later section for how. C. DO: Use Outlook rules and have multiple folders where incoming email is automatically moved to Outlook email rules are very powerful and easy to configure. Use them to automatically file email into folders. Here are mine (note that if a rule catches an email message then no further rules get processed): "personal" Email is either personal or business related. Almost all personal email goes to my gmail account. The personal emails that end up on my work email account, go to a dedicated folder – that is achieved via a rule that looks at the email's 'From' field. For those that slip through, I use the new Outlook 2010 quick step of "Conversation To Folder" feature to let the slippage only occur once per conversation, and then update my rules. "External" and "ViaBlog" The remaining external emails either come from my blog (rule on the subject line) or are unsolicited (rule on the domain name not being microsoft) and they are filed accordingly. "invites" I may do a separate blog post on calendar management, but suffice to say it should be kept up to date. All invite requests end up in this folder, so that even if mail gets out of control, the calendar can stay under control (only 1 folder to check). I.e. so I can let the organizer know why I won't be attending their meeting (or that I will be). Note: This folder is the only one that shows the total number of items in it, instead of the total unread. "Inbox" The only email that ends up here is email sent TO me and me only. Note that this is also the only email that shows up above the systray icon in the notification toast – all other emails cannot interrupt. "ToMe++" Email where I am on the TO line, but there are other recipients as well (on the TO or CC line). "CC" Email where I am on the CC line. I need to read these, but nobody is expecting a response or action from me so they are not as urgent (and if they are and follow up with me, they'll receive a link to this). "@ XYZ" Emails to aliases that are about projects that I directly work on (and I wasn't on the TO or CC line, of course). Test: these projects are in my commitments that I get measured on at the end of the year. "Z Mass" and subfolders under it per distribution list (DL) Emails to aliases that are about topics that I am interested in, but not that I formally own/contribute to. Test: if I unsubscribed from these aliases, nobody could rightfully complain. "Admin" folder, which resides under "Z Mass" folder Emails to aliases that I was added typically by an admin, e.g. broad emails to the floor/group/org/building/division/company that I am a member of. "BCC" folder, which resides under "Z Mass" Emails where I was not on the TO or the CC line explicitly and the alias it was sent to is not one I explicitly subscribed to (or I have been added to the BCC line, which I briefly touched on in another post). When there are only a few quick minutes to catch up on email, read as much as possible from these folders, in this order: Invites, Inbox, ToMe++. Only when these folders are all read (remember that doesn't mean that each email in them has been fully dealt with), we can move on to the @XYZ and then the CC folders. Only when those are read we can go on to the remaining folders. Note that the typical flow in the "Z Mass" subfolders is to scan subject lines and use the new Ctrl+Delete Outlook 2010 feature to ignore conversations. D. DO: Use Outlook Search folders in combination with categories As you process each folder, when you open a new email (i.e. click on it and read it in the preview pane) the email becomes read and stays read and you have to decide whether: It can take 2 minutes to deal with for good, right now, or It will take longer than 2 minutes, so it needs to be postponed with a clear next step, which is one of ToReply – there may be intermediate action steps, but ultimately someone else needs to receive email about this Action – no email is required, but I need to do something ReadLater – no email is required from the quick scan, but this is too long to fully read now, so it needs to be read it later WaitingFor – the email is informing of an intermediate status and 'promising' a future email update. Need to track. SomedayMaybe – interesting but not important, non-urgent, non-time-bound information. I may want to spend part of one of my weekends reading it. For all these 'next steps' use Outlook categories (right click on the email and assign category, or use shortcut key). Note that I also use category 'WaitingFor' for email that I send where I am expecting a response and need to track it. Create a new search folder for each category (I dragged the search folders into my favorites at the top left of Outlook, above my inboxes). So after the activity of reading/triaging email in the normal folders (where the email arrived) is done, the result is a bunch of emails appearing in the search folders (configure them to show the total items, not the total unread items). To actually process email (that takes more than 2 minutes to deal with) process the search folders, starting with ToReply and Action. E. DO: Get into a Routine Now you have a system in place, get into a routine of using it. Here is how I personally use mine, but this part I keep tweaking: Spend short bursts of time (between meetings, during boring but mandatory meetings and, in general, 2-4 times a day) aiming to have no unread emails (and in the process deal with some emails that take less than 2 minutes). Spend around 30 minutes at the end of each day processing most urgent items in search folders. Spend as long as it takes each Friday (or even the weekend) ensuring there is no unnecessary email baggage carried forward to the following week. F. Other resources Official Outlook help on: Create custom actions rules, Manage e-mail messages with rules, creating a search folder. Video on ignoring conversations (Ctrl+Del). Official blog post on Quick Steps and in particular the Move Conversation to folder. If you've read "Getting Things Done" it is very obvious that my approach to email management is driven by GTD. A very similar approach was described previously by ScottHa (also influenced by GTD), worth reading here. He also described how he sets up 2 outlook rules ('invites' and 'external') which I also use – worth reading that too. Comments about this post welcome at the original blog.

Read the article
Microsoft Windows HPC Server R2 Beta2

- by Daniel Moth

Internally and unofficially we refer to this as "HPC Server v3" and its Beta2 became available last week. Read the full story on this blog post from Ryan and this one from Don. There has been a lot of excitement on the web for this release with coverage from last Wednesday here, here, here, here, here and here. Don't forget that Visual Studio 2010 makes it easy to develop for HPC Server including the MPI Cluster Debugger integration that I explained here and here. Comments about this post welcome at the original blog.

Read the article
dasBlog

- by Daniel Moth

Some people like blogging on a site that is completely managed by someone else (e.g. http://wordpress.com/) and others, like me, prefer hosting their own blog at their own domain. In the latter case you need to decide what blog engine to install on your web space to power your blog. There are many free blog engines to choose from (e.g. the one from http://wordpress.org/). If, like me, you want to use a blog engine that is based on the .NET platform you have many choices including BlogEngine.NET, Subtext and the one I picked: dasBlog. In this post I'll describe the steps I took to get going with the open source dasBlog (home page, source page). A. Installing First I installed dasBlog on my local Windows 7 machine where I have IIS7 installed. To install dasBlog, I started by clicking the "Install" button on its web gallery page. After that I went through configuration, theming and adding content as described below. Once I was happy that everything was working correctly on the local machine, I set this up on a hosting service. I went for a Windows IIS7 shared hosting 3 month Economy plan from GoDaddy. The dasBlog site lists a bunch of other hosts. You can read the installation instructions for dasBlog, and with GoDaddy I just had to click one button since it is available as part of their quick-install apps. With GoDaddy I had a previewdns option that allowed me to play around and preview my site before going live. B. Configuring After it was installed (on local machine and/or hosting provider), I followed the obvious steps to create an admin user and logged in. This displays an admin navigation bar with the following options: 1. Navigator Links: I decided I was not going to use this feature. I manage links on the side of my blog manually elsewhere as part of the theme. So, I deleted every entry on this page and ignored it thereafter. 2. Blogroll: Ditto - same comment as for Navigator Links. 3. Content Filters: I did not delete (or add) these, but I did ensure both checkboxes are not checked. I.e. I am not using this feature now, but I may return to it in the future. 4. Activity: This is a read-only view of various statistics. So nothing to configure here, but useful to come back to for complementary statistics to whatever other statistical package you use (e.g. free stats as part of the hosting and I also use feedburner for syndication stats). 5. Cross-posting: I did not need that, so I turned it off via the Configuration Settings discussed next. 6. Configuration Settings: This is where the bulk of the configuration for the blog takes place and they are stored in a single XML file: Site.Config file. There are truly self-explanatory options to pick for Basic Settings, Services Settings and Services to Ping, Syndication Settings (this is where you link to your feedburner name if you have one) and Mail to Weblog Settings (I keep this turned off). There are also "Xml Storage System Settings" (I keep this turned off), "OpenId Settings" (I allow OpenID commenters), "Spammer Settings" (Enable captcha, never show email addresses) and "Comment settings" (Enable comments, don't allow on older posts, don't allow html). There are also Appearance Settings (I checked the "Use Post Title for Permalink", replaced spaces with hyphen and unchecked the "Use Unique Title"). Finally, there are also Notification Settings, but they are a bit of hit and miss in my case, in that I don’t always get the emails (still investigating this). C. Adding Content You can add content via the "Add Entry" link on the admin navigation bar or by configuring the "Mail to Weblog" settings and sending email or, do what I've started doing, use Live Writer (also the team has a blog). Another way to add content is programmatically if, for example, you are migrating content from another blog (and I'll cover that in separate post sharing the code). What you should know is that all blog content (posts and comments) live in XML files in a folder called "content" under your dasBlog installation. D. Theming There is a very good guide about themes for dasBlog, there is also a similar guide with screenshots (scroll down to "So how do I create a theme") and the dasBlog macro reference. When you install dasBlog, there are many themes available; each theme is in its own folder (representing the folder name) under the themes folder. You may have noticed that you can switch between these via the "Appearance Settings" described above (look for the combobox after the Default Theme label). I created my own theme by copy-pasting an existing theme folder, renaming it and then switching to it as the default. I then opened the folder in Visual Studio and hacked around the HTML in the 3 files (itemTemplate, homeTemplate and dayTemplate). These files have a blogtemplate file extension, which I temporarily renamed to HTML as I was editing them. There is no more advice I can offer here as this is a matter of taste and the aforementioned links is all I used. Personally, I had salvaged the CSS (and structure) from my previous blog and wanted to make this one match it as closely as possible - I think I have succeeded. E. If you run into any issue with dasBlog... ...use your favorite search engine to find answers. Many bloggers have been using this engine for a while and have documented issues and workarounds over time. One such example is ScottHa's dasBlog category; another example is therightstuff where I "borrowed" the idea/macro for the outlook-style on-page navigation. If you don't find what you want through searching, try posting a question to the forums. Comments about this post welcome at the original blog.

Read the article
Windows Phone 7 developer resources

- by Daniel Moth

Developers of Windows Mobile 6.x (and indeed Windows CE) applications still use the rich .NET Compact Framework 3.5 with Visual Studio 2008 for development. That is still a great platform and the Mobile Development Handbook is still a useful resource (if I may say so myself :-). The release of Windows Phone 7, changes the programming paradigm. The programming model has NETCF in its guts, but the developer uses the Silverlight or XNA APIs (and they can call from one into the other). I thought I'd gather here (for your reference and mine) the top 10 resources for getting started. Windows Phone Developer Home - get the official word and latest announcements. Windows Phone Developer Tools RTW - download the free developer tools (on my machine the installation took 30 minutes, over my existing vanilla Visual Studio 2010 install). Windows Phone 7 Jump Start video training - watch the 12 sessions by Wigley/Miles. Windows Phone 7 Developer Training Kit - work through the labs. Windows Phone RSS tag - channel9 has tons more WP7 videos, stay tuned. Windows Phone 7 in 7 Minutes - watch 20 7-minute videos. Programming Windows Phone 7 - read 11 free chapters from Petzold's eBook. The Windows Phone Developer Blog - subscribe to the official blog. Getting Started with Windows Phone Development - explore all links from the MSDN Library root page. Silverlight for Windows Phone – another root MSDN library page. If after all that you get your hands dirty and still can't find the answer ask questions at the WP7 development MSDN Forum. On a personal note, I was pleased to see that the Parallel Stacks debugger window works fine with the WP7 project ;-) Comments about this post welcome at the original blog.

Read the article
Preserving Permalinks

- by Daniel Moth

One of the things that gets me on a rant is websites that break permalinks. If you have posted something somewhere and there is a public URL pointing to it, that URL should never ever return a 404. You are breaking all websites that ever linked to you and you are breaking all search engine links to your content (that others will try and follow). It is a pet peeve of mine. So when I had to move my blog, obviously I would preserve the root URL (www.danielmoth.com/Blog/), but I also wanted to preserve every URL my blog has generated over the years. To be clear, our focus here is on the URL formatting, not the content migration which I'll talk about in my next post. In this post, I'll describe my solution first and then what it solves. 1. The IIS7 Rewrite Module and web.config There are a few ways you can map an old URL to a new one (so when requests to the old URL come in, they get redirected to the new one). The new blog engine I use (dasBlog) has built-in functionality to do that (Scott refers to it here). Instead, the way I chose to address the issue was to use the IIS7 rewrite module. The IIS7 rewrite module allows redirecting URLs based on pattern matching, regular expressions and, of course, hardcoded full URLs for things that don't fall into any pattern. You can configure it visually from IIS Manager using a handy dialog that allows testing patterns against input URLs. Here is what mine looked like after configuring a few rules: To learn more about this technology check out this video, the reference page and this overview blog post; all 3 pages have a collection of related resources at the bottom worth checking out too. All the visual configuration ends up in a web.config file at the root folder of your website. If you are on a shared hosting service, probably the only way you can use the Rewrite Module is by directly editing the web.config file. Next, I'll describe the URLs I had to map and how that manifested itself in the web.config file. What I did was create the rules locally using the GUI, and then took the generated web.config file and uploaded it to my live site. You can view my web.config here. 2. Monthly Archives Observe the difference between the way the two blog engines generate this type of URL Blogger: /Blog/2004_07_01_mothblog_archive.html dasBlog: /Blog/default,month,2004-07.aspx In my web.config file, the rule that deals with this is the one named "monthlyarchive_redirect". 3. Categories Observe the difference between the way the two blog engines generate this type of URL Blogger: /Blog/labels/Personal.html dasBlog: /Blog/CategoryView,category,Personal.aspx In my web.config file the rule that deals with this is the one named "category_redirect". 4. Posts Observe the difference between the way the two blog engines generate this type of URL Blogger: /Blog/2004/07/hello-world.html dasBlog: /Blog/Hello-World.aspx In my web.config file the rule that deals with this is the one named "post_redirect". Note: The decision is taken to use dasBlog URLs that do not include the date info (see the description of my Appearance settings). If we included the date info then it would have to include the day part, which blogger did not generate. This makes it impossible to redirect correctly and to have a single permalink for blog posts moving forward. An implication of this decision, is that no two blog posts can have the same title. The tool I will describe in my next post (inelegantly) deals with duplicates, but not with triplicates or higher. 5. Unhandled by a generic rule Unfortunately, the two blog engines use different rules for generating URLs for blog posts. Most of the time the conversion is as simple as the example of the previous section where a post titled "Hello World" generates a URL with the words separated by a hyphen. Some times that is not the case, for example: /Blog/2006/05/medc-wrap-up.html /Blog/MEDC-Wrapup.aspx or /Blog/2005/01/best-of-moth-2004.html /Blog/Best-Of-The-Moth-2004.aspx or /Blog/2004/11/more-windows-mobile-2005-details.html /Blog/More-Windows-Mobile-2005-Details-Emerge.aspx In short, blogger does not add words to the title beyond ~39 characters, it drops some words from the title generation (e.g. a, an, on, the), and it preserve hyphens that appear in the title. For this reason, we need to detect these and explicitly list them for redirects (no regular expression can help here because the full set of rules is not listed anywhere). In my web.config file the rule that deals with this is the one named "Redirect rule1 for FullRedirects" combined with the rewriteMap named "StaticRedirects". Note: The tool I describe in my next post will detect all the URLs that need to be explicitly redirected and will list them in a file ready for you to copy them to your web.config rewriteMap. 6. C# code doing the same as the web.config I wrote some naive code that does the same thing as the web.config: given a string it will return a new string converted according to the 3 rules above. It does not take into account the 4th case where an explicit hard-coded conversion is needed (the tool I present in the next post does take that into account). static string REGEX_post_redirect = "[0-9]{4}/[0-9]{2}/([0-9a-z-]+).html"; static string REGEX_category_redirect = "labels/([_0-9a-z-% ]+).html"; static string REGEX_monthlyarchive_redirect = "([0-9]{4})_([0-9]{2})_[0-9]{2}_mothblog_archive.html"; static string Redirect(string oldUrl) { GroupCollection g; if (RunRegExOnIt(oldUrl, REGEX_post_redirect, 2, out g)) return string.Concat(g[1].Value, ".aspx"); if (RunRegExOnIt(oldUrl, REGEX_category_redirect, 2, out g)) return string.Concat("CategoryView,category,", g[1].Value, ".aspx"); if (RunRegExOnIt(oldUrl, REGEX_monthlyarchive_redirect, 3, out g)) return string.Concat("default,month,", g[1].Value, "-", g[2], ".aspx"); return string.Empty; } static bool RunRegExOnIt(string toRegEx, string pattern, int groupCount, out GroupCollection g) { if (pattern.Length == 0) { g = null; return false; } g = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled).Match(toRegEx).Groups; return (g.Count == groupCount); } Comments about this post welcome at the original blog.

Read the article
Tool to convert blogger.com content to dasBlog

- by Daniel Moth

Due to blogger.com dropping FTP support, I've had to move my blog. If you are in a similar situation, this post will help you by showing you the necessary steps to take. Goals No loss on blog posts, comments AND all existing permalinks continue to work (redirect to the correct place). Steps Download the XML files corresponding to your blogger.com content and store them in a folder. Install and configure dasBlog on your local machine. Configure your web.config file (will need updating once you run step 4). Use the tool I describe further down to generate the content and place it at the right place. Test your site locally. Once you are happy, repeat step 2 on your hosting provider of choice. Remember to copy up your dasBlog theme folder if you created one. Copy up the local web.config file and the XML dasBlog content files generated by the tool of step 4. Test your site on the server. Once you are happy, go live (following instructions from your hoster). In my case, I gave the nameservers from my new hoster to my existing domain registrar and they made the switch. Tool (code) At step 4 above I referred to a tool. That is an overstatement, it is simply one 450-line C#code file that you can download here: BloggerToDasBlog.cs. I used this from a .NET 2.0 console app (and I run it under the Visual Studio debugger, i.e. F5) like this: Program.cs. The console app referenced the dasBlog 2.3 ASP.NET Blogging Engine i.e. the newtelligence.DasBlog.Runtime.dll assembly. Let me describe what the code does: Input: A path to a folder where the XML files from the old blogger.com blog reside. It can deal with both types of XML file. A full file path to a file where it creates XML redirect input (as required by the rewriteMap mentioned here). The blog URL. The author's email. The blog author name. A path to an empty folder where the new XML dasBlog content files will get created. The subfolder name used after the domain name in the URL. The 3 reg ex patterns to use. You can use the same as mine, but will need to tweak the monthly_archive rule. Again, to see what values I passed for all the above, see my Program.cs file. Output: It creates dasBlog XML files in the folder specified. It creates those by parsing the old blogger.com XML files that reside in the folder specified. After that is generated, copy it to the "Content" folder under your dasBlog installation. It creates an XML file with a single ignorable root element and a bunch of inner XML elements. You can copy paste these in the web.config file as discussed in this post. Other notes: For each blog post, it detects outgoing links to itself (i.e. to the same blog), and rewrites those to point to the new URLs. So internal links do not rely on the web.config redirects. It deals with duplicate post titles; it does not deal with triplicates and higher. Removes all references to blogger.com (e.g. references to [email protected], the injected hidden footer for statistics that each blog post has and others – see the code). It creates a lot of diagnostic output (in the Output window) and indeed the documentation for the code is in the Debug.WriteLine statements ;) This is not code I will maintain or support – it was a throwaway one-use project that I am sharing here as a starting point for anyone finding themselves in the same boat that I was. Enjoy "as is". Comments about this post welcome at the original blog.

Read the article
Visual Studio 2010 released!

- by Daniel Moth

Visual Studio 2010 releases to the word today. Get the full story from Soma's blog post (inc. links for buy, try etc). Our team is very proud of what we have contributed to this release and you can learn more about it through our content on the Parallel Computing MSDN home. Comments about this post welcome at the original blog.

Read the article
Best of "The Moth" 2010

- by Daniel Moth

It is the time again (like in 2004, 2005, 2006, 2007, 2008, 2009) to look back at my blog for the past year and identify areas of interest that seem to be more prominent than others. After doing so, representative posts follow in my top 5 list (in random order). 1. This was the year where I had to move for the first time since 2004 my blog engine (blogger.com –> dasBlog), host provider (zen –> godaddy), web server technology and OS (apache on Linux –> IIS on Windows Server). My goal was not to break any permalinks or the look and feel of this website. A series of posts covered how I achieved that goal, culminating in a tool for others to use if they wanted to do the same: Tool to convert blogger.com content to dasBlog. Going forward I aim to be sharing more small code utilities like that one… 2. At work I am known for being fairly responsive on email, and more importantly never dropping email balls on the floor. This is due to my email processing system, which I shared here: Processing Email in Outlook. I will be sharing more tips with regards to making the best of the Office products. 3. There is no doubt in my mind that this is the year people will remember as the one where Microsoft finally fights back in the mobile space. Even though the new platform means my Windows Mobile book sales will dwindle :-), I am ecstatic about Windows Phone 7 both as a consumer and as a developer. On the release day, to get you started I shared the top 10 Windows Phone 7 developer resources. I will be sharing my tips from my experience in writing code for and consuming this new platform… 4. For my HPC developer friends using Visual Studio, I shared Slides and code for MPI Cluster Debugger and also gave you all the links you need for getting started with Dryad and DryadLINQ from MSR. Expect more from me on cluster development in the coming year… 5. Still in the HPC space, but actually also in the game and even mainstream development, the big disruption and opportunity comes in the form of GPGPU and, on the Microsoft platform, (currently) DirectCompute. Expect more from me on gpgpu development in the coming year… Subscribe via the link on the left to stay tuned for 2011… I wish you a very Happy New Year (with whatever definition of happiness works for you)! Comments about this post welcome at the original blog.

Read the article

Search Results

Search found 1557 results on 63 pages for 'daniel scocco'.

Page 3/63 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Scocco

- by Daniel Scocco

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

- by Daniel Moth

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >