Search Results

Search found 31 results on 2 pages for 'simd'.

Page 1/2 | 1 2  | Next Page >

  • Common SIMD techniques

    - by zxcat
    Hi! Where can I find information about common SIMD tricks? I have an instruction set and know, how to write non-tricky SIMD code, but I know, SIMD now is much more powerful. It can hold complex conditional branchless code. For example (ARMv6), the following sequence of instructions sets each byte of Rd equal to the unsigned minimum of the corresponding bytes of Ra and Rb: USUB8 Rd, Ra, Rb SEL Rd, Rb, Ra Links to tutorials / uncommon SIMD techniques are good too :) ARMv6 is the most interesting for me, but x86(SSE,...)/Neon(in ARMv7)/others are good too. Thank you.

    Read the article

  • What approach to take for SIMD optimizations

    - by goldenmean
    Hi, I am trying to optimize below code for SIMD operations (8way/4way/2way SIMD whiechever possible and if it gives gains in performance) I am tryin to analyze it first on paper to understand the algorithm used. How can i optimize it for SIMD:- void idct(uint8_t *dst, int stride, int16_t *input, int type) { int16_t *ip = input; uint8_t *cm = ff_cropTbl + MAX_NEG_CROP; int A, B, C, D, Ad, Bd, Cd, Dd, E, F, G, H; int Ed, Gd, Add, Bdd, Fd, Hd; int i; /* Inverse DCT on the rows now */ for (i = 0; i < 8; i++) { /* Check for non-zero values */ if ( ip[0] | ip[1] | ip[2] | ip[3] | ip[4] | ip[5] | ip[6] | ip[7] ) { A = M(xC1S7, ip[1]) + M(xC7S1, ip[7]); B = M(xC7S1, ip[1]) - M(xC1S7, ip[7]); C = M(xC3S5, ip[3]) + M(xC5S3, ip[5]); D = M(xC3S5, ip[5]) - M(xC5S3, ip[3]); Ad = M(xC4S4, (A - C)); Bd = M(xC4S4, (B - D)); Cd = A + C; Dd = B + D; E = M(xC4S4, (ip[0] + ip[4])); F = M(xC4S4, (ip[0] - ip[4])); G = M(xC2S6, ip[2]) + M(xC6S2, ip[6]); H = M(xC6S2, ip[2]) - M(xC2S6, ip[6]); Ed = E - G; Gd = E + G; Add = F + Ad; Bdd = Bd - H; Fd = F - Ad; Hd = Bd + H; /* Final sequence of operations over-write original inputs. */ ip[0] = (int16_t)(Gd + Cd) ; ip[7] = (int16_t)(Gd - Cd ); ip[1] = (int16_t)(Add + Hd); ip[2] = (int16_t)(Add - Hd); ip[3] = (int16_t)(Ed + Dd) ; ip[4] = (int16_t)(Ed - Dd ); ip[5] = (int16_t)(Fd + Bdd); ip[6] = (int16_t)(Fd - Bdd); } ip += 8; /* next row */ } ip = input; for ( i = 0; i < 8; i++) { /* Check for non-zero values (bitwise or faster than ||) */ if ( ip[1 * 8] | ip[2 * 8] | ip[3 * 8] | ip[4 * 8] | ip[5 * 8] | ip[6 * 8] | ip[7 * 8] ) { A = M(xC1S7, ip[1*8]) + M(xC7S1, ip[7*8]); B = M(xC7S1, ip[1*8]) - M(xC1S7, ip[7*8]); C = M(xC3S5, ip[3*8]) + M(xC5S3, ip[5*8]); D = M(xC3S5, ip[5*8]) - M(xC5S3, ip[3*8]); Ad = M(xC4S4, (A - C)); Bd = M(xC4S4, (B - D)); Cd = A + C; Dd = B + D; E = M(xC4S4, (ip[0*8] + ip[4*8])) + 8; F = M(xC4S4, (ip[0*8] - ip[4*8])) + 8; if(type==1){ //HACK E += 16*128; F += 16*128; } G = M(xC2S6, ip[2*8]) + M(xC6S2, ip[6*8]); H = M(xC6S2, ip[2*8]) - M(xC2S6, ip[6*8]); Ed = E - G; Gd = E + G; Add = F + Ad; Bdd = Bd - H; Fd = F - Ad; Hd = Bd + H; /* Final sequence of operations over-write original inputs. */ if(type==0){ ip[0*8] = (int16_t)((Gd + Cd ) >> 4); ip[7*8] = (int16_t)((Gd - Cd ) >> 4); ip[1*8] = (int16_t)((Add + Hd ) >> 4); ip[2*8] = (int16_t)((Add - Hd ) >> 4); ip[3*8] = (int16_t)((Ed + Dd ) >> 4); ip[4*8] = (int16_t)((Ed - Dd ) >> 4); ip[5*8] = (int16_t)((Fd + Bdd ) >> 4); ip[6*8] = (int16_t)((Fd - Bdd ) >> 4); }else if(type==1){ dst[0*stride] = cm[(Gd + Cd ) >> 4]; dst[7*stride] = cm[(Gd - Cd ) >> 4]; dst[1*stride] = cm[(Add + Hd ) >> 4]; dst[2*stride] = cm[(Add - Hd ) >> 4]; dst[3*stride] = cm[(Ed + Dd ) >> 4]; dst[4*stride] = cm[(Ed - Dd ) >> 4]; dst[5*stride] = cm[(Fd + Bdd ) >> 4]; dst[6*stride] = cm[(Fd - Bdd ) >> 4]; }else{ dst[0*stride] = cm[dst[0*stride] + ((Gd + Cd ) >> 4)]; dst[7*stride] = cm[dst[7*stride] + ((Gd - Cd ) >> 4)]; dst[1*stride] = cm[dst[1*stride] + ((Add + Hd ) >> 4)]; dst[2*stride] = cm[dst[2*stride] + ((Add - Hd ) >> 4)]; dst[3*stride] = cm[dst[3*stride] + ((Ed + Dd ) >> 4)]; dst[4*stride] = cm[dst[4*stride] + ((Ed - Dd ) >> 4)]; dst[5*stride] = cm[dst[5*stride] + ((Fd + Bdd ) >> 4)]; dst[6*stride] = cm[dst[6*stride] + ((Fd - Bdd ) >> 4)]; } } else { if(type==0){ ip[0*8] = ip[1*8] = ip[2*8] = ip[3*8] = ip[4*8] = ip[5*8] = ip[6*8] = ip[7*8] = ((xC4S4 * ip[0*8] + (IdctAdjustBeforeShift<<16))>>20); }else if(type==1){ dst[0*stride]= dst[1*stride]= dst[2*stride]= dst[3*stride]= dst[4*stride]= dst[5*stride]= dst[6*stride]= dst[7*stride]= cm[128 + ((xC4S4 * ip[0*8] + (IdctAdjustBeforeShift<<16))>>20)]; }else{ if(ip[0*8]){ int v= ((xC4S4 * ip[0*8] + (IdctAdjustBeforeShift<<16))>>20); dst[0*stride] = cm[dst[0*stride] + v]; dst[1*stride] = cm[dst[1*stride] + v]; dst[2*stride] = cm[dst[2*stride] + v]; dst[3*stride] = cm[dst[3*stride] + v]; dst[4*stride] = cm[dst[4*stride] + v]; dst[5*stride] = cm[dst[5*stride] + v]; dst[6*stride] = cm[dst[6*stride] + v]; dst[7*stride] = cm[dst[7*stride] + v]; } } } ip++; /* next column */ dst++; } }

    Read the article

  • SIMD Extensions for the Database Storage Engine

    - by jchang
    For the last 15 years, Intel and AMD have been progressively adding special purpose extensions to their processor architectures. The extensions mostly pertain to vector operations with Single Instruction, Multiple Data (SIMD) concept. The reasoning was that achieving significant performance improvement over each successive generation for the general purpose elements had become extraordinarily difficult. On the other hand, SIMD performance could be significantly improved with special purpose registers...(read more)

    Read the article

  • SIMD Extensions for the Database Storage Engine

    - by jchang
    For the last 15 years, Intel and AMD have been progressively adding special purpose extensions to their processor architectures. The extensions mostly pertain to vector operations with Single Instruction, Multiple Data (SIMD) concept. The motivation was that achieving significant performance improvement over each successive generation for the general purpose elements had become extraordinarily difficult. On the other hand, SIMD performance could be significantly improved with special purpose registers...(read more)

    Read the article

  • implement SIMD in C++

    - by Hristo
    I'm working on a bit of code and I'm trying to optimize it as much as possible, basically get it running under a certain time limit. The following makes the call... static affinity_partitioner ap; parallel_for(blocked_range<size_t>(0, T), LoopBody(score), ap); ... and the following is what is executed. void operator()(const blocked_range<size_t> &r) const { int temp; int i; int j; size_t k; size_t begin = r.begin(); size_t end = r.end(); for(k = begin; k != end; ++k) { // for each trainee temp = 0; for(i = 0; i < N; ++i) { // for each sample int trr = trRating[k][i]; int ei = E[i]; for(j = 0; j < ei; ++j) { // for each expert temp += delta(i, trr, exRating[j][i]); } } myscore[k] = temp; } } I'm using Intel's TBB to optimize this. But I've also been reading about SIMD and SSE2 and things along that nature. So my question is, how do I store the variables (i,j,k) in registers so that they can be accessed faster by the CPU? I think the answer has to do with implementing SSE2 or some variation of it, but I have no idea how to do that. Any ideas? Thanks, Hristo

    Read the article

  • SSE (SIMD extensions) support in gcc

    - by goldenmean
    Hi, I see a code as below: include "stdio.h" #define VECTOR_SIZE 4 typedef float v4sf __attribute__ ((vector_size(sizeof(float)*VECTOR_SIZE))); // vector of four single floats typedef union f4vector { v4sf v; float f[VECTOR_SIZE]; } f4vector; void print_vector (f4vector *v) { printf("%f,%f,%f,%f\n", v->f[0], v->f[1], v->f[2], v->f[3]); } int main() { union f4vector a, b, c; a.v = (v4sf){1.2, 2.3, 3.4, 4.5}; b.v = (v4sf){5., 6., 7., 8.}; c.v = a.v + b.v; print_vector(&a); print_vector(&b); print_vector(&c); } This code builds fine and works expectedly using gcc (it's inbuild SSE / MMX extensions and vector data types. this code is doing a SIMD vector addition using 4 single floats. I want to understand in detail what does each keyword/function call on this typedef line means and does: typedef float v4sf __attribute__ ((vector_size(sizeof(float)*VECTOR_SIZE))); What is the vector_size() function return; What is the __attribute__ keyword for Here is the float data type being type defined to vfsf type? I understand the rest part. thanks, -AD

    Read the article

  • Benefit of using multiple SIMD instruction sets simultaneously

    - by GenTiradentes
    I'm writing a highly parallel application that's multithreaded. I've already got an SSE accelerated thread class written. If I were to write an MMX accelerated thread class, then run both at the same time (one SSE thread and one MMX thread per core) would the performance improve noticeably? I would think that this setup would help hide memory latency, but I'd like to be sure before I start pouring time into it.

    Read the article

  • SSE SIMD Optimization For Loop

    - by Projectile Fish
    I have some code in a loop for(int i = 0; i < n; i++) { u[i] = c * u[i] + s * b[i]; } So, u and b are vectors of the same length, and c and s are scalars. Is this code a good candidate for vectorization for use with SSE in order to get a speedup?

    Read the article

  • approximating log10[x^k0 + k1]

    - by Yale Zhang
    Greetings. I'm trying to approximate the function Log10[x^k0 + k1], where .21 < k0 < 21, 0 < k1 < ~2000, and x is integer < 2^14. k0 & k1 are constant. For practical purposes, you can assume k0 = 2.12, k1 = 2660. The desired accuracy is 5*10^-4 relative error. This function is virtually identical to Log[x], except near 0, where it differs a lot. I already have came up with a SIMD implementation that is ~1.15x faster than a simple lookup table, but would like to improve it if possible, which I think is very hard due to lack of efficient instructions. My SIMD implementation uses 16bit fixed point arithmetic to evaluate a 3rd degree polynomial (I use least squares fit). The polynomial uses different coefficients for different input ranges. There are 8 ranges, and range i spans (64)2^i to (64)2^(i + 1). The rational behind this is the derivatives of Log[x] drop rapidly with x, meaning a polynomial will fit it more accurately since polynomials are an exact fit for functions that have a derivative of 0 beyond a certain order. SIMD table lookups are done very efficiently with a single _mm_shuffle_epi8(). I use SSE's float to int conversion to get the exponent and significand used for the fixed point approximation. I also software pipelined the loop to get ~1.25x speedup, so further code optimizations are probably unlikely. What I'm asking is if there's a more efficient approximation at a higher level? For example: Can this function be decomposed into functions with a limited domain like log2((2^x) * significand) = x + log2(significand) hence eliminating the need to deal with different ranges (table lookups). The main problem I think is adding the k1 term kills all those nice log properties that we know and love, making it not possible. Or is it? Iterative method? don't think so because the Newton method for log[x] is already a complicated expression Exploiting locality of neighboring pixels? - if the range of the 8 inputs fall in the same approximation range, then I can look up a single coefficient, instead of looking up separate coefficients for each element. Thus, I can use this as a fast common case, and use a slower, general code path when it isn't. But for my data, the range needs to be ~2000 before this property hold 70% of the time, which doesn't seem to make this method competitive. Please, give me some opinion, especially if you're an applied mathematician, even if you say it can't be done. Thanks.

    Read the article

  • Fast, Vectorizable method of taking floating point number modulus of special primes?

    - by caffiend
    Is there a fast method for taking the modulus of a floating point number? With integers, there are tricks for Mersenne primes, so that its possible to calculate y = x MOD 2^31 without needing division. Can any similar tricks be applied for floating point numbers? Preferably, in a way that can be converted into vector/SIMD operations, or moved into GPGPU code. The primes I'm interested in would be 2^7 and 2^31, although if there are more efficient ones for floating point numbers, those would be welcome.

    Read the article

  • CPU Architecture and floating-point math

    - by Jo-Herman Haugholt
    I'm trying to wrap my head around some details about how floating point math is performed on the CPU, trying to better understand what data types to use etc. I think I have a fairly good understanding of how integer math is performed. If I've understood correctly, and disregarding SIMD, a 32-bit CPU will generally perform integer math at at least 32-bit precision etc. Is it correct that floating-point math is dependent on the presence of a FPU? And that the FPU on the x86 is 80-bit, so floating point math is performed at this precision unless using SIMD? What about ARM?

    Read the article

  • Getting started with Oracle Database In-Memory Part III - Querying The IM Column Store

    - by Maria Colgan
    In my previous blog posts, I described how to install, enable, and populate the In-Memory column store (IM column store). This weeks post focuses on how data is accessed within the IM column store. Let’s take a simple query “What is the most expensive air-mail order we have received to date?” SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorderWHERE  lo_shipmode = 5; The LINEORDER table has been populated into the IM column store and since we have no alternative access paths (indexes or views) the execution plan for this query is a full table scan of the LINEORDER table. You will notice that the execution plan has a new set of keywords “IN MEMORY" in the access method description in the Operation column. These keywords indicate that the LINEORDER table has been marked for INMEMORY and we may use the IM column store in this query. What do I mean by “may use”? There are a small number of cases were we won’t use the IM column store even though the object has been marked INMEMORY. This is similar to how the keyword STORAGE is used on Exadata environments. You can confirm that the IM column store was actually used by examining the session level statistics, but more on that later. For now let's focus on how the data is accessed in the IM column store and why it’s faster to access the data in the new column format, for analytical queries, rather than the buffer cache. There are four main reasons why accessing the data in the IM column store is more efficient. 1. Access only the column data needed The IM column store only has to scan two columns – lo_shipmode and lo_ordtotalprice – to execute this query while the traditional row store or buffer cache has to scan all of the columns in each row of the LINEORDER table until it reaches both the lo_shipmode and the lo_ordtotalprice column. 2. Scan and filter data in it's compressed format When data is populated into the IM column it is automatically compressed using a new set of compression algorithms that allow WHERE clause predicates to be applied against the compressed formats. This means the volume of data scanned in the IM column store for our query will be far less than the same query in the buffer cache where it will scan the data in its uncompressed form, which could be 20X larger. 3. Prune out any unnecessary data within each column The fastest read you can execute is the read you don’t do. In the IM column store a further reduction in the amount of data accessed is possible due to the In-Memory Storage Indexes(IM storage indexes) that are automatically created and maintained on each of the columns in the IM column store. IM storage indexes allow data pruning to occur based on the filter predicates supplied in a SQL statement. An IM storage index keeps track of minimum and maximum values for each column in each of the In-Memory Compression Unit (IMCU). In our query the WHERE clause predicate is on the lo_shipmode column. The IM storage index on the lo_shipdate column is examined to determine if our specified column value 5 exist in any IMCU by comparing the value 5 to the minimum and maximum values maintained in the Storage Index. If the value 5 is outside the minimum and maximum range for an IMCU, the scan of that IMCU is avoided. For the IMCUs where the value 5 does fall within the min, max range, an additional level of data pruning is possible via the metadata dictionary created when dictionary-based compression is used on IMCU. The dictionary contains a list of the unique column values within the IMCU. Since we have an equality predicate we can easily determine if 5 is one of the distinct column values or not. The combination of the IM storage index and dictionary based pruning, enables us to only scan the necessary IMCUs. 4. Use SIMD to apply filter predicates For the IMCU that need to be scanned Oracle takes advantage of SIMD vector processing (Single Instruction processing Multiple Data values). Instead of evaluating each entry in the column one at a time, SIMD vector processing allows a set of column values to be evaluated together in a single CPU instruction. The column format used in the IM column store has been specifically designed to maximize the number of column entries that can be loaded into the vector registers on the CPU and evaluated in a single CPU instruction. SIMD vector processing enables the Oracle Database In-Memory to scan billion of rows per second per core versus the millions of rows per second per core scan rate that can be achieved in the buffer cache. I mentioned earlier in this post that in order to confirm the IM column store was used; we need to examine the session level statistics. You can monitor the session level statistics by querying the performance views v$mystat and v$statname. All of the statistics related to the In-Memory Column Store begin with IM. You can see the full list of these statistics by typing: display_name format a30 SELECT display_name FROM v$statname WHERE  display_name LIKE 'IM%'; If we check the session statistics after we execute our query the results would be as follow; SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorderWHERE lo_shipmode = 5; SELECT display_name FROM v$statname WHERE  display_name IN ('IM scan CUs columns accessed',                        'IM scan segments minmax eligible',                        'IM scan CUs pruned'); As you can see, only 2 IMCUs were accessed during the scan as the majority of the IMCUs (44) in the LINEORDER table were pruned out thanks to the storage index on the lo_shipmode column. In next weeks post I will describe how you can control which queries use the IM column store and which don't. +Maria Colgan

    Read the article

  • CodePlex Daily Summary for Wednesday, February 17, 2010

    CodePlex Daily Summary for Wednesday, February 17, 2010New ProjectsAcademic Success Accounting System: The system is intended to use by school teacher to set marks to students and estimate their academic success and possibilities. The client applicat...Access.PowerTools: Access PowerTools is currently a sample MS Access add-in project to try & test features of Add-in Express™ 2009 for Microsoft® Office and .net (ht...AntoonCms: AntoonCms makes it easy to maintain a simple website with it's builtin administration pages. It's developed in C# on target Framework 2.0 The CMS...ASP.NET MVC Mehr Lib: Mehr Lib makes it easier for ASP.NET MVC developers to do develop projects. It's developed in C#. This version currently include Ajax master detail...BCryptTool: Developer tool that calculates BCrypt hash codes for strings. BCrypt is an implementation of the Blowfish cipher and a computationally-expensive ha...Coronasoft Cryostasis scripting engine: A scripting engine that allows you to dynamically load plugins from just about any supported .NET language. Its written in C#. Languages supported ...Critical Point Search: Critical Point Searchcritical points: critical pointsFont Family Name Retrieval: This library helps developer to retrieve the font family name from the TTF, OTF and TTC font files, so that developer can display the font without ...jQuery Form Input Hints Plugin: Automatically display hints on input textboxes in your forms using this jQuery plugin. I wrote this code to be as simple and as easy to use as pos...Kojax: kojax projectKronRetro: KronRetro! Making a Habbo Retro just got easier! Powered by PHP & MySQL you can make a Habbo Retro site fast!MVVM Wrapper Kit: MVVM Wrapper Kit makes it easier for View Model programmers to wrap their business objects and collections while preserving change notification and...ObjectCartographer: ObjectCartographer is an object to object mapper and object factory. It's developed in C#.PE-file Reader Writer API (PERWAPI): PERWAPI is a reader writer module for .NET program executables. It has been used as back-end for progamming language compilers such as Gardens Poi...Pinger: A simple Pinger, pings an address until you press a buttonQPV: 0.1: QPV aka Que pelicula es una aplicacion que consiste crear una base de datos potente de peliculas, criticas e informacion para poder filtrar pelicul...SIMD Detector: This SIMD class helps developers to detect the types of SIMD instruction available on users' processor. It supports Intel and AMD CPUs. It is writt...StackOverflow Test Project: Following Andrew Siemer's StackOverflow Knowledge Exchange Project.WeBlog: A blogging platform built on the MVC framework The project will showcase current technologies such as MVC 2, Silverlight 4 and jQuery 1.4. Data pro...Webmedia: this is my webmedia projectWindows Azure RSS Reader: This is and online RSS reader based on the Windows Azure platformWordEditor. A Word Editor for Windows, and an extended RichTextBox control.: This is a word editor that can be used as a stand alone word processor, or added to an existing project.Домашняя Бухгалтерия: Программа для ведения домашнего бухгалтерского учета финансов. New ReleasesAccess.PowerTools: Access PowerTools Add-In Community Edition v.0.0.1: Access PowerTools Add-In Community Edition v.0.0.1 is a sample MS Access add-in project to try & test Add-in Express™ 2009 for Microsoft® Office an...Active CSS: ActiveCSS-0.1.1: revision for version 0.1ASP.NET: Microsoft Ajax Minifier 4.0: The Microsoft Ajax Minifier enables you to improve the performance of your Ajax applications by reducing the size of your Cascading Style Sheet and...ASP.NET MVC Mehr Lib: V1.0: Mehr Lib V1.0 This version currently include ajax master detail combo facilities.ASP.net Ribbon: Version 1.2: New controls : Expandable gallery Color Picker Multi color File Menu Some JS modifications. Some CSS modifications. Includes some functionna...ASP.NET Web Forms Model-View-Presenter (MVP) Contrib: WebForms MVP Contrib CTP6: This is a release of the WebForms MVP Contrib project for WebForms MVP CTP6. Release includes: WebForms MVP Contrib framework Ninject IoC containerAwesomiumDotNet: AwesomiumDotNet 1.2.1: - Added Awesomium 1.5 features: URL filtering, header rewrite rules, SetOpensExternalLinksInCallingFrame. - Numerous fixes and improvements.BCryptTool: BCryptTool v0.1: The Microsoft .NET Framework 3.5 (SP1) is needed to run this program.Buzz Dot Net: Buzz Dot Net v.1.10216: Features Parse Google Buzz feed to Objects Partial MVVM Implementation Partial OptimizationsCanvas VSDOC Intellisense: v1.0.0.0a: canvas-vsdoc.js and canvas-utils.js JavaScript intellisense for HTML5 Canvas element.CheckHeader: CheckHeader v0.8.5: The Microsoft .NET Framework 3.5 (SP1) is needed to run this program.Claymore MVP: Claymore 1.0.2.0: Changelog Added ASP.NET WebForm support via ClaymoreHttpModule class. Added xsd schema for Visual Studio Intellisense within App.config and Web....Dam Gd - URL Shortner: Dam.gd Version 1.1: This is the latest instalment in our URL shortner. It uses The Easy API http://theeasyapi.com to access data that is used for the back-end analyti...D-AMPS: D-AMPS 0.9.1: Initial version.easySMS: easySMS 1.0 Source code: easySMS 1.0 Source codeFont Family Name Retrieval: 1st Release: Version 1.0.0Free Silverlight & WPF Chart Control - Visifire: Visifire Now Supports DataBinding: Hi, Today we are releasing the much awaited DataBinding feature in Visifire 3.0.3 beta 3. Now you can Bind any DataSource at the Series level so t...GenerateTypedBamApi: Version 2.0: Changes in this release: NEW: Export functionality no longer requires Excel to be installed (uses OLE DB vs. Excel Automation; also enables usage i...Gmail Notifier 2: GmailNotifier2 1.2.1: Fixes issues #9652, #9653iTuner - The iTunes Companion: iTuner 1.1.3699: This includes the first pass of the iTuner Librarian including management of dead tracks, duplicates, and empty directories... While I promised a ...jQuery Form Input Hints Plugin: jQuery.InputHints v1.0: jQuery.InputHints v1.0 Includes Standard & minified source Demo HTML file VS2008 SolutionLibWowArmory: LibWowArmory 0.2.3 beta: LibWowArmory 0.2.3 betaThis release of the LibWowArmory source code matches the WoW Armory as of version 3.3.2. Changes since version 0.2.2:Update...Managed Extensibility Framework: MEF Preview 9: We have merged the .net 3.5 and Silverlight 3 into a single zip. The bin folder contains the binaries for .net 3.5 whereas bin\SL contains the bina...MDX Parser,Builder,DOM and OLAP visual controls with Writeback for Silverlight: Ranet.UILibrary.Olap-1.3.3.0-6571.msi: February 16, 2010 * MdxDesigner: Fix for the issue where when an element is clicked, the mouse wheel stops working until the cursor leaves and r...MEFGeneric: MEFGeneric Preview 9: MEFGeneric Preview 9 release.Mesopotamia Experiment: Mesopotamia 1.2.26: Bug Fixes - mud map - progress window - recycle app domains on robotics engine crashes( in command prompt and visual, major work) - fixed rooomba h...Microsoft Solution Framework for Business Intelligence in Media: Release 1.0: This is the public release of the Microsoft Solution Framework for Business Intelligence in Media (Release 1.0).MVVM Wrapper Kit: MVVM Wrapper Beta: A simple test project is included to get you up and running, and wrapping those business objects.nBayes - Bayesian Filtering in C#: nBayes v0.2: nBayes' indexing system is factored in such a way that you can easily replace the index with a custom implementation. This release introduces an ad...NetSqlAzMan - .NET SQL Authorization Manager: 3.6.0.5: 3.6.0.5 16-February-2010 - Fix: SqlAzManSid Class. "Equals" matches object signiture instead of IAzManSid signiture. When a real null object is pas...ObjectCartographer: ObjectCartographer Code 1.0: This is the first release and contains code to help with object to object mapping (including mapping from one object to multiple objects), object f...Office Apps: 0.8.6: Bug fix's, added Calendar.OI - Open Internet: OI HTML and .XAP files (OI offline): this is the HTML code and the XAP file. please right-click the app at http://bit.ly/openinternet and select "install openinternet application to th...PE-file Reader Writer API (PERWAPI): PERWAPI-1.1.3: Perwapi version 1.1.3 is the complete distribution package. It contains Binary files, pdb files and xml files for the PERWAPI and SymbolRW compone...Pinger: Pinger 1.0.0.0 Binary: The Latest BinaryRNA Comparative Analysis Software Tools: RNA Comparative Analysis Software Tools 2.0: RNA Comparative Analysis Software Tools Version 2.0 Note: The RNA Comparative Analysis Software Tools are provided as is, without any warranty. No...SAL- Self Artificial Learning: Artificial Learning working proof of concept: This is a working proof of concept. It includes the Dev version (in .zip format) and the consumer version (in .exe format)SharePoint Management PowerShell scripts: SharePoint 2010 PowerShell Scripts: All the SharePoint 2010 PowerShell Scripts The first file is an Excel 2010 file allowing to find quiclky and easily the new cmdlets available wi...SIMD Detector: 1st Release: Version 1.3 Supports MMX/MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4a, SSE5, 3DNow.Terminals: Terminals 1.9 Beta Release: This is a beta release so the new features being added to terminals can be tested properly. The major change in this release is that Terminals has...Text Designer Outline Text Library: 9th minor release: Added the ability to select brush, such as gradient brush or texture brush for the text body. Added CSharp library, TextDesignerCSLibrary. Manage...VivoSocial: VivoSocial 7.0.2: This release has several updated modules. See the Support Forums for more details. Since we update modules very often, we will be changing how we d...WatchersNET CKEditor™ Provider for DotNetNuke: CKEditor Provider 1.6.00: changes CKEditor Upgrade to Version 3.2 SVN 5132 File Browser: After File Upload, File will be Auto Selected File Browser: Icons are corrected ...WordEditor. A Word Editor for Windows, and an extended RichTextBox control.: WordEditor Source Code: This contains the latest solution file, with all project files included.Домашняя Бухгалтерия: Alapha Realease: Принимаются ваши предложения по дизайну и функциональности программы.Most Popular ProjectsRawrWBFS ManagerAJAX Control ToolkitMicrosoft SQL Server Product Samples: DatabaseSilverlight ToolkitWindows Presentation Foundation (WPF)Image Resizer Powertoy Clone for WindowsMicrosoft SQL Server Community & SamplesASP.NETLiveUpload to FacebookMost Active ProjectsDinnerNow.netRawrBlogEngine.NETSimple SavantNB_Store - Free DotNetNuke Ecommerce Catalog Modulepatterns & practices – Enterprise LibraryPHPExcelSharpyjQuery Library for SharePoint Web ServicesFluent Validation for .NET

    Read the article

  • IBM System x3850 X5 TPC-H Benchmark

    - by jchang
    IBM just published a TPC-H SF 1000 result for their x3850 X5 , 4-way Xeon 7560 system featuring a special MAX5 memory expansion board to support 1.5TB memory. In Dec 2010, IBM also published a TPC-H SF1000 for their Power 780 system, 8-way, quad-core, (4 logical processors per physical core). In Feb 2011, Ingres published a TPC-H SF 100 on a 2-way Xeon 5680 for their VectorWise column-store engine (plus enhancements for memory architecture, SIMD and compression). The figure table below shows TPC-H...(read more)

    Read the article

  • Columnar Databases

    - by jchang
    Ingres just published a TPC-H benchmark for VectorWise , an analytic database technology employing 1) SIMD processing (Intel SSE 4.2), 2) better memory optimizations to leverage on-chip cache, 3) compression, 4) Column-based storage. Ingres originated as a research project at UC Berkeley (see Wikipedia ) in the 1970s, and has since become a commercially supported, open source database system. Apparently, Ingres project people later founded Sybase. So Ingres in a sense, is the grandfather (or perhap...(read more)

    Read the article

  • Installing Lubuntu 14.04.1 forcepae fails

    - by Rantanplan
    I tried to install Lubuntu 14.04.1 from a CD. First, I chose Try Lubuntu without installing which gave: ERROR: PAE is disabled on this Pentium M (PAE can potentially be enabled with kernel parameter "forcepae" ... Following the description on https://help.ubuntu.com/community/PAE, I used forcepae and tried Try Lubuntu without installing again. That worked fine. dmesg | grep -i pae showed: [ 0.000000] Kernel command line: file=/cdrom/preseed/lubuntu.seed boot=casper initrd=/casper/initrd.lz quiet splash -- forcepae [ 0.008118] PAE forced! On the live-CD session, I tried installing Lubuntu double clicking on the install button on the desktop. Here, the CD starts running but then stops running and nothing happens. Next, I rebooted and tried installing Lubuntu directly from the boot menu screen using forcepae again. After a while, I receive the following error message: The installer encountered an unrecoverable error. A desktop session will now be run so that you may investigate the problem or try installing again. Hitting Enter brings me to the desktop. For what errors should I search? And how? Finally, I rebooted once more and tried Check disc for defects with forcepae option; no errors have been found. Now, I am wondering how to find the error or whether it would be better to follow advice c in https://help.ubuntu.com/community/PAE: "Move the hard disk to a computer on which the processor has PAE capability and PAE flag (that is, almost everything else than a Banias). Install the system as usual but don't add restricted drivers. After the install move the disk back." Thanks for some hints! Perhaps some of the following can help: On Lubuntu 12.04: cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 13 model name : Intel(R) Pentium(R) M processor 1.50GHz stepping : 6 microcode : 0x17 cpu MHz : 600.000 cache size : 2048 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr mce cx8 mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe up bts est tm2 bogomips : 1284.76 clflush size : 64 cache_alignment : 64 address sizes : 32 bits physical, 32 bits virtual power management: uname -a Linux humboldt 3.2.0-67-generic #101-Ubuntu SMP Tue Jul 15 17:45:51 UTC 2014 i686 i686 i386 GNU/Linux lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.04.5 LTS Release: 12.04 Codename: precise cpuid eax in eax ebx ecx edx 00000000 00000002 756e6547 6c65746e 49656e69 00000001 000006d6 00000816 00000180 afe9f9bf 00000002 02b3b001 000000f0 00000000 2c04307d 80000000 80000004 00000000 00000000 00000000 80000001 00000000 00000000 00000000 00000000 80000002 20202020 20202020 65746e49 2952286c 80000003 6e655020 6d756974 20295228 7270204d 80000004 7365636f 20726f73 30352e31 007a4847 Vendor ID: "GenuineIntel"; CPUID level 2 Intel-specific functions: Version 000006d6: Type 0 - Original OEM Family 6 - Pentium Pro Model 13 - Stepping 6 Reserved 0 Brand index: 22 [not in table] Extended brand string: " Intel(R) Pentium(R) M processor 1.50GHz" CLFLUSH instruction cache line size: 8 Feature flags afe9f9bf: FPU Floating Point Unit VME Virtual 8086 Mode Enhancements DE Debugging Extensions PSE Page Size Extensions TSC Time Stamp Counter MSR Model Specific Registers MCE Machine Check Exception CX8 COMPXCHG8B Instruction SEP Fast System Call MTRR Memory Type Range Registers PGE PTE Global Flag MCA Machine Check Architecture CMOV Conditional Move and Compare Instructions FGPAT Page Attribute Table CLFSH CFLUSH instruction DS Debug store ACPI Thermal Monitor and Clock Ctrl MMX MMX instruction set FXSR Fast FP/MMX Streaming SIMD Extensions save/restore SSE Streaming SIMD Extensions instruction set SSE2 SSE2 extensions SS Self Snoop TM Thermal monitor 31 reserved TLB and cache info: b0: unknown TLB/cache descriptor b3: unknown TLB/cache descriptor 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries f0: unknown TLB/cache descriptor 7d: unknown TLB/cache descriptor 30: unknown TLB/cache descriptor 04: Data TLB: 4MB pages, 4-way set assoc, 8 entries 2c: unknown TLB/cache descriptor On Lubuntu 14.04.1 live-CD with forcepae: cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 13 model name : Intel(R) Pentium(R) M processor 1.50GHz stepping : 6 microcode : 0x17 cpu MHz : 600.000 cache size : 2048 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe bts est tm2 bogomips : 1284.68 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 32 bits virtual power management: uname -a Linux lubuntu 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:12 UTC 2014 i686 i686 i686 GNU/Linux lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 14.04.1 LTS Release: 14.04 Codename: trusty cpuid CPU 0: vendor_id = "GenuineIntel" version information (1/eax): processor type = primary processor (0) family = Intel Pentium Pro/II/III/Celeron/Core/Core 2/Atom, AMD Athlon/Duron, Cyrix M2, VIA C3 (6) model = 0xd (13) stepping id = 0x6 (6) extended family = 0x0 (0) extended model = 0x0 (0) (simple synth) = Intel Pentium M (Dothan B1) / Celeron M (Dothan B1), 90nm miscellaneous (1/ebx): process local APIC physical ID = 0x0 (0) cpu count = 0x0 (0) CLFLUSH line size = 0x8 (8) brand index = 0x16 (22) brand id = 0x16 (22): Intel Pentium M, .13um feature information (1/edx): x87 FPU on chip = true virtual-8086 mode enhancement = true debugging extensions = true page size extensions = true time stamp counter = true RDMSR and WRMSR support = true physical address extensions = false machine check exception = true CMPXCHG8B inst. = true APIC on chip = false SYSENTER and SYSEXIT = true memory type range registers = true PTE global bit = true machine check architecture = true conditional move/compare instruction = true page attribute table = true page size extension = false processor serial number = false CLFLUSH instruction = true debug store = true thermal monitor and clock ctrl = true MMX Technology = true FXSAVE/FXRSTOR = true SSE extensions = true SSE2 extensions = true self snoop = true hyper-threading / multi-core supported = false therm. monitor = true IA64 = false pending break event = true feature information (1/ecx): PNI/SSE3: Prescott New Instructions = false PCLMULDQ instruction = false 64-bit debug store = false MONITOR/MWAIT = false CPL-qualified debug store = false VMX: virtual machine extensions = false SMX: safer mode extensions = false Enhanced Intel SpeedStep Technology = true thermal monitor 2 = true SSSE3 extensions = false context ID: adaptive or shared L1 data = false FMA instruction = false CMPXCHG16B instruction = false xTPR disable = false perfmon and debug = false process context identifiers = false direct cache access = false SSE4.1 extensions = false SSE4.2 extensions = false extended xAPIC support = false MOVBE instruction = false POPCNT instruction = false time stamp counter deadline = false AES instruction = false XSAVE/XSTOR states = false OS-enabled XSAVE/XSTOR = false AVX: advanced vector extensions = false F16C half-precision convert instruction = false RDRAND instruction = false hypervisor guest status = false cache and TLB information (2): 0xb0: instruction TLB: 4K, 4-way, 128 entries 0xb3: data TLB: 4K, 4-way, 128 entries 0x02: instruction TLB: 4M pages, 4-way, 2 entries 0xf0: 64 byte prefetching 0x7d: L2 cache: 2M, 8-way, sectored, 64 byte lines 0x30: L1 cache: 32K, 8-way, 64 byte lines 0x04: data TLB: 4M pages, 4-way, 8 entries 0x2c: L1 data cache: 32K, 8-way, 64 byte lines extended feature flags (0x80000001/edx): SYSCALL and SYSRET instructions = false execution disable = false 1-GB large page support = false RDTSCP = false 64-bit extensions technology available = false Intel feature flags (0x80000001/ecx): LAHF/SAHF supported in 64-bit mode = false LZCNT advanced bit manipulation = false 3DNow! PREFETCH/PREFETCHW instructions = false brand = " Intel(R) Pentium(R) M processor 1.50GHz" (multi-processing synth): none (multi-processing method): Intel leaf 1 (synth) = Intel Pentium M (Dothan B1), 90nm

    Read the article

  • Low hanging fruit where "a sufficiently smart compiler" is needed to get us back to Moore's Law?

    - by jamie
    Paul Graham argues that: It would be great if a startup could give us something of the old Moore's Law back, by writing software that could make a large number of CPUs look to the developer like one very fast CPU. ... The most ambitious is to try to do it automatically: to write a compiler that will parallelize our code for us. There's a name for this compiler, the sufficiently smart compiler, and it is a byword for impossibility. But is it really impossible? Can someone provide a concrete example where a paralellizing compiler would solve a pain point? Web-apps don't appear to be a problem: just run a bunch of Node processes. Real-time raytracing isn't a problem: the programmers are writing multi-threaded, SIMD assembly language quite happily (indeed, some might complain if we make it easier!). The holy grail is to be able to accelerate any program, be it MySQL, Garage Band, or Quicken. I'm looking for a middle ground: is there a real-world problem that you have experienced where a "smart-enough" compiler would have provided a real benefit, i.e that someone would pay for?

    Read the article

  • Efficiency concerning thread granularity

    - by MaelmDev
    Lately, I've been thinking of ways to use multithreading to improve the speed of different parts of a game engine. What confuses me is the appropriate granularity of threads, especially when dealing with single-instruction-multiple-data (SIMD) tasks. Let's use line-of-sight detection as an example. Each AI actor must be able to detect objects of interest around them and mark them. There are three basic ways to go about this with multithreading: Don't use threading at all. Create a thread for each actor. Create a thread for each actor-object combination. Option 1 is obviously going to be the least efficient method. However, choosing between the next two options is more difficult. Only using one thread per actor is still running through every object in series instead of in parallel. However, are CPU's able to create and join threads in the granularity posed in Option 3 efficiently? It seems like that many calls to the OS could be really slow, and varying enormously between different hardware.

    Read the article

  • Are there any concrete examples of where a paralellizing compiler would provide a value-adding benefit?

    - by jamie
    Paul Graham argues that: It would be great if a startup could give us something of the old Moore's Law back, by writing software that could make a large number of CPUs look to the developer like one very fast CPU. ... The most ambitious is to try to do it automatically: to write a compiler that will parallelize our code for us. There's a name for this compiler, the sufficiently smart compiler, and it is a byword for impossibility. But is it really impossible? Can someone provide a concrete example where a paralellizing compiler would solve a pain point? Web-apps don't appear to be a problem: just run a bunch of Node processes. Real-time raytracing isn't a problem: the programmers are writing multi-threaded, SIMD assembly language quite happily (indeed, some might complain if we make it easier!). The holy grail is to be able to accelerate any program, be it MySQL, Garage Band, or Quicken. I'm looking for a middle ground: is there a real-world problem that you have experienced where a "smart-enough" compiler would have provided a real benefit, i.e that someone would pay for? A good answer is one where there is a process where the computer runs at 100% CPU on a single core for a painful period of time. That time might be 10 seconds, if the task is meant to be quick. It might be 500ms if the task is meant to be interactive. It might be 10 hours. Please describe such a problem. Really, that's all I'm looking for: candidate areas for further investigation. (Hence, raytracing is off the list because all the low-hanging fruit have been feasted upon.) I am not interested in why it cannot be done. There are a million people willing to point to the sound reasons why it cannot be done. Such answers are not useful.

    Read the article

  • Which opcodes are faster at the CPU level?

    - by Geotarget
    In every programming language there are sets of opcodes that are recommended over others. I've tried to list them here, in order of speed. Bitwise Integer Addition / Subtraction Integer Multiplication / Division Comparison Control flow Float Addition / Subtraction Float Multiplication / Division Where you need high-performance code, C++ can be hand optimized in assembly, to use SIMD instructions or more efficient control flow, data types, etc. So I'm trying to understand if the data type (int32 / float32 / float64) or the operation used (*, +, &) affects performance at the CPU level. Is a single multiply slower on the CPU than an addition? In MCU theory you learn that speed of opcodes is determined by the number of CPU cycles it takes to execute. So does it mean that multiply takes 4 cycles and add takes 2? Exactly what are the speed characteristics of the basic math and control flow opcodes? If two opcodes take the same number of cycles to execute, then both can be used interchangeably without any performance gain / loss? Any other technical details you can share regarding x86 CPU performance is appreciated

    Read the article

  • How do I reorder vector data using ARM Neon intrinsics?

    - by goldenmean
    This is specifically related to ARM Neon SIMD coding. I am using ARM Neon instrinsics for certain module in a video decoder. I have a vectorized data as follows: There are four 32 bit elements in a Neon register - say, Q0 - which is of size 128 bit. 3B 3A 1B 1A There are another four, 32 bit elements in other Neon register say Q1 which is of size 128 bit. 3D 3C 1D 1C I want the final data to be in order as shown below: 1D 1C 1B 1A 3D 3C 3D 3A What Neon instrinsics can achieve the desired data order?

    Read the article

  • How do i a vector data using ARM Neon intrinsics?

    - by goldenmean
    Hello, This is specifically related to ARM Neon SIMD coding.I am using ARM Neon instrinsics for certain module in a Video Decoder. I have a vectorized data as follows:- There are four, 32 bit elements in a Neon register say Q0 which is of size 128 bit. 3B 3A 1B 1A There are another four, 32 bit elements in other Neon register say Q1 which is of size 128 bit. 3D 3C 1D 1C I want the final data to be in order as shown below:- 1D 1C 1B 1A 3D 3C 3D 3A Using what Neon instrinscis can achive the desired data order? thanks, -AD

    Read the article

  • CodePlex Daily Summary for Saturday, March 20, 2010

    CodePlex Daily Summary for Saturday, March 20, 2010New ProjectsaMaze Mapa Generator: Parte do Projeto aMazeASP.Net RIA Controls: Simple ASP.Net server controls to integrate Flash and Silverlight controls into your web applications. Included controls don't use any JavaScript,...BMap.NET: BMaps.NET is a .NET application written in C#, for access Bing Maps from your computer without web browsers. With it you can access to Bing Maps an...DaliNet: A .NET API for the Tridonic.Atco DALI USB device.Fabrica7: This is the main project of Fabrica 7 Corp.Image Ripper: A Winform application parse & fetch various HD pictures in specific photo galleries.IoCWrap: Provides interfaces which wrap various IoC container implementations so that it is possible to switch to a different provider without changing any ...NetSockets: NetSockets is a .NET class library that provides easy-to-use, multi-threaded, event-based, client and server network communication.Network Backup: Network Backup is a home and small company backup solution for workstations and a backup server. It incorporates a backup service, scheduler, data ...NUnit.Specs: Specification extensions for NUnit.Nutrivida: Sistema para avaliação de especialização.OHTB Snake: OHTB Snake is a multiplayer game. In this incarnation, snakes may eat 3 types of powerups: standard berries, causing them to grow; sawberries, caus...Playground TDrouen: Tjerk's PlaygroundPower Plan Chooser: This is my first endeavor into a C# Windows application with XAML. The program sits in the notification area (task bar) and lets you quickly activa...Search IMDB in C#: In lack of an IMDB API most of us resort to screen scraping utilities to query the Internet Movie Database. This one is written in C# (.NET 2.0 sta...SIGPRO Desktop: FUNCERNSql2008 PerfMonCounter Fix: Small console application to Fix the SQL 2008 Express Edition installation error: Pequena aplicação para Corrigir o seguinte erro de Instalação do...TwiztedTracker: TwiztedTracker designed to make your bug tracking easy.UmbracoXsltLogHelper: I needed a way to easily add log rows from my xslt macros, and added a single-line-extension for that reason. Then I played around with the umbraco...VisualStock: VisualStock is stock data visualization, analysis application build on the Micorsoft Composite Application Library.WHS File Mover: A Windows Home Server Plugin to move files from a local directory ("drop" or "staging" directory to a folder share)XML based Content Deployment in SharePoint: XML based Content Deployment in Sharepoint helps you to easy deploy content into SharePoint, including webs, lists, items, files and folder. You wi...New ReleasesASP.Net RIA Controls: Version 1.0 Beta: The first functionnal version.BMap.NET: BMap.NET 1: This is the 1st version of BMap.NETDigital Media Processing Project 1: Image Processor: Image Processor 1.0: All features implemented. Added: clipping imageFamily Tree Analyzer: Version 1.3.1.0: Version 1.3.1.0 Added a cancel button to marriage and children IGI Searches Opening Results window now automatically shows first record Updated IGI...Free Silverlight & WPF Chart Control - Visifire: Visifire SL and WPF Charts 3.0.5 Released: Hi, This release contains fix for the following bug: * Chart threw exception if ZoomingEnabled property was set to True at real-time. You ca...Homework Helper: Homework Helper v.1.1: Sorry but the latest release didn't seem to be the latest. This should be the right one!Image Ripper: Image Ripper: Image Ripper based on HtmlAgilityPack and GData library.ManPowerEngine: 0.1: UpdatesSound System added. Bitmap Collider in Physics System works now. Improved the performance of HTTP download in images Physics Framework...NIPO Data Processing Component Framework: NIPO 1.0: The first release of NIPO. Includes the NIPO binary dll and documentation. This release does not include a starter application since it is still in...patterns & practices SharePoint Guidance: SPG2010 Drop7: SharePoint Guidance Drop Notes Microsoft patterns and practices ****************************************** ***************************************...Photosynth Point Cloud Exporter: Photosynth Point Cloud Exporter 1.0.2: Photosynth webservice reference updated to work with the new site OBJ file format support added (Note: this format doesn't support vertex colors)Power Plan Chooser: Power Plan Chooser 1.0.0: Power Plan Chooser is a small utility that sits in the notification area (task bar) in Windows 7 and allows the user to quickly activate one of the...Restart Explorer: RestartExplorer Release 1.00.0001: Initial release: Start, stop and restart Windows Explorer with this utility.Search IMDB in C#: Search IMDB 1.0: Source code included with compiled example.SIMD Detector: 3rd Release: Added Intel AES instruction check Added a CSharp Winform NetSIMDDetector application. Changes the red ball and green ball images to red cross a...Sql2008 PerfMonCounter Fix: Sql2008FIx_PerfMonCounter.zip: Small console application to Fix the SQL 2008 Express Edition installation error: http://support.microsoft.com/kb/300956 Rule Name PerfMonCounter...UmbracoXsltLogHelper: 0.9 Working Beta: First version. XsltLogHelper09 is the installable package.VCC: Latest build, v2.1.30319.0: Automatic drop of latest buildWCF RIA Services Contrib: RIA Services Contrib RC Release: This version is recompiled against the RC release of WCF RIA Services.XML based Content Deployment in SharePoint: SPContentDeployment 1.0.0.0: The first link contains the resources and a sample project. The second link contains everything included in the first package and an additional fo...Yet Another GPS: YAGPS Alfa.2: Yet another GPS tracker is a very powerful GPS track application for Windows Mobile Speed Guage, Sat Count number, KML for google map file formatZGuideTV.NET: ZGuideTV.NET 0.92: Vendredi 19 mars 2010 (ZGuideTV.NET bêta 9 build 0.92) - English below Corrections : - Gestion de certains contrôles dans l'écran principal. - Div...Most Popular ProjectsMetaSharpRawrWBFS ManagerSilverlight ToolkitASP.NET Ajax LibraryMicrosoft SQL Server Product Samples: DatabaseAJAX Control ToolkitLiveUpload to FacebookWindows Presentation Foundation (WPF)ASP.NETMost Active ProjectsLINQ to TwitterRawrOData SDK for PHPjQuery Library for SharePoint Web ServicesDirectQPHPExcelpatterns & practices – Enterprise LibraryBlogEngine.NETFarseer Physics EngineNB_Store - Free DotNetNuke Ecommerce Catalog Module

    Read the article

  • RiverTrail - JavaScript GPPGU Data Parallelism

    - by JoshReuben
    Where is WebCL ? The Khronos WebCL working group is working on a JavaScript binding to the OpenCL standard so that HTML 5 compliant browsers can host GPGPU web apps – e.g. for image processing or physics for WebGL games - http://www.khronos.org/webcl/ . While Nokia & Samsung have some protype WebCL APIs, Intel has one-upped them with a higher level of abstraction: RiverTrail. Intro to RiverTrail Intel Labs JavaScript RiverTrail provides GPU accelerated SIMD data-parallelism in web applications via a familiar JavaScript programming paradigm. It extends JavaScript with simple deterministic data-parallel constructs that are translated at runtime into a low-level hardware abstraction layer. With its high-level JS API, programmers do not have to learn a new language or explicitly manage threads, orchestrate shared data synchronization or scheduling. It has been proposed as a draft specification to ECMA a (known as ECMA strawman). RiverTrail runs in all popular browsers (except I.E. of course). To get started, download a prebuilt version https://github.com/downloads/RiverTrail/RiverTrail/rivertrail-0.17.xpi , install Intel's OpenCL SDK http://www.intel.com/go/opencl and try out the interactive River Trail shell http://rivertrail.github.com/interactive For a video overview, see  http://www.youtube.com/watch?v=jueg6zB5XaM . ParallelArray the ParallelArray type is the central component of this API & is a JS object that contains ordered collections of scalars – i.e. multidimensional uniform arrays. A shape property describes the dimensionality and size– e.g. a 2D RGBA image will have shape [height, width, 4]. ParallelArrays are immutable & fluent – they are manipulated by invoking methods on them which produce new ParallelArray objects. ParallelArray supports several constructors over arrays, functions & even the canvas. // Create an empty Parallel Array var pa = new ParallelArray(); // pa0 = <>   // Create a ParallelArray out of a nested JS array. // Note that the inner arrays are also ParallelArrays var pa = new ParallelArray([ [0,1], [2,3], [4,5] ]); // pa1 = <<0,1>, <2,3>, <4.5>>   // Create a two-dimensional ParallelArray with shape [3, 2] using the comprehension constructor var pa = new ParallelArray([3, 2], function(iv){return iv[0] * iv[1];}); // pa7 = <<0,0>, <0,1>, <0,2>>   // Create a ParallelArray from canvas.  This creates a PA with shape [w, h, 4], var pa = new ParallelArray(canvas); // pa8 = CanvasPixelArray   ParallelArray exposes fluent API functions that take an elemental JS function for data manipulation: map, combine, scan, filter, and scatter that return a new ParallelArray. Other functions are scalar - reduce  returns a scalar value & get returns the value located at a given index. The onus is on the developer to ensure that the elemental function does not defeat data parallelization optimization (avoid global var manipulation, recursion). For reduce & scan, order is not guaranteed - the onus is on the dev to provide an elemental function that is commutative and associative so that scan will be deterministic – E.g. Sum is associative, but Avg is not. map Applies a provided elemental function to each element of the source array and stores the result in the corresponding position in the result array. The map method is shape preserving & index free - can not inspect neighboring values. // Adding one to each element. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.map(function inc(v) {     return v+1; }); //<2,3,4,5,6> combine Combine is similar to map, except an index is provided. This allows elemental functions to access elements from the source array relative to the one at the current index position. While the map method operates on the outermost dimension only, combine, can choose how deep to traverse - it provides a depth argument to specify the number of dimensions it iterates over. The elemental function of combine accesses the source array & the current index within it - element is computed by calling the get method of the source ParallelArray object with index i as argument. It requires more code but is more expressive. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.combine(function inc(i) { return this.get(i)+1; }); reduce reduces the elements from an array to a single scalar result – e.g. Sum. // Calculate the sum of the elements var source = new ParallelArray([1,2,3,4,5]); var sum = source.reduce(function plus(a,b) { return a+b; }); scan Like reduce, but stores the intermediate results – return a ParallelArray whose ith elements is the results of using the elemental function to reduce the elements between 0 and I in the original ParallelArray. // do a partial sum var source = new ParallelArray([1,2,3,4,5]); var psum = source.scan(function plus(a,b) { return a+b; }); //<1, 3, 6, 10, 15> scatter a reordering function - specify for a certain source index where it should be stored in the result array. An optional conflict function can prevent an exception if two source values are assigned the same position of the result: var source = new ParallelArray([1,2,3,4,5]); var reorder = source.scatter([4,0,3,1,2]); // <2, 4, 5, 3, 1> // if there is a conflict use the max. use 33 as a default value. var reorder = source.scatter([4,0,3,4,2], 33, function max(a, b) {return a>b?a:b; }); //<2, 33, 5, 3, 4> filter // filter out values that are not even var source = new ParallelArray([1,2,3,4,5]); var even = source.filter(function even(iv) { return (this.get(iv) % 2) == 0; }); // <2,4> Flatten used to collapse the outer dimensions of an array into a single dimension. pa = new ParallelArray([ [1,2], [3,4] ]); // <<1,2>,<3,4>> pa.flatten(); // <1,2,3,4> Partition used to restore the original shape of the array. var pa = new ParallelArray([1,2,3,4]); // <1,2,3,4> pa.partition(2); // <<1,2>,<3,4>> Get return value found at the indices or undefined if no such value exists. var pa = new ParallelArray([0,1,2,3,4], [10,11,12,13,14], [20,21,22,23,24]) pa.get([1,1]); // 11 pa.get([1]); // <10,11,12,13,14>

    Read the article

1 2  | Next Page >