Search Results

Search found 4580 results on 184 pages for 'faster'.

Page 45/184 | < Previous Page | 41 42 43 44 45 46 47 48 49 50 51 52 | Next Page >

Network latency and speed of light

- by James

This was kinda of covered by the following Is minimum latency fixed by the speed of light? , but i would like to add the follow up a bit. The scenario is as follows; we have two opposing sites one on the West Coast of the US and one in Ireland. The customer is in central Europe, and has requested a latency test. Ireland gives responses of ~65-70ms. However the West Coast guys claim to be faster with a response of 60ms. Now a quick check says that light in fiber would take about 42ms to make the trip to the States and 8.5ms to Ireland. So obviously this is a single hop and does not include routers, switches, firewalls, protocol overhead etc. Would I be right to call BS on their figures? As a final note I tested a ping to Google IP address that was allegedly on the west coast from a site that covered a similar distance and was amazed to get a response time of 20ms. Suggesting ICMP packets that travel twice the speed of light. So A) what am I missing B) Am I right to suspect shenanigans? UPDATE: Guys thanks so far for your help and I have been reading various previous questions on this. About 5 years I had an issue where the hop from the UK to Ireland added 10ms of latency no matter what we did. In the end I moved the servers; So imagine my surprise when I have guys that claim they are 5ms faster with a transatlantic trip. So again should I call BS? Oh and assume both sites are normal mortals that don't have access to Google magical routing, warp dives or flux capacitors. :)

Read the article
Is a Hyperthreaded CPU more powerful and more efficient than a Dual-core CPU? [closed]

- by user1811864

which computer to choose with Pentium processor hello they are getting rid of the old computer equipment in the office and i have to choose the computer to take home i get first choice to pick. -15 inch lcd screen 4 gb of ram core 2 duo dual Core E8400 3.00 GHz dvd writer windows vista/ linux -15 inch crt monitor with 2 gb ram and pentium 4 2 ghz single core HT technology windows xp hardisks both 250 GB my friend is telling me to choose the second one Pentium single core HT because he told me it runs faster becuase of HT technology and cooler and consumes less current electricity so it wont get overheated because it has HT technology so it's high definition for encoding and watching HD movies and HD sound and is like a gaming pc to play internet games. And also he said the dual core 8400 runs at 3 ghz compared to the 2 ghz so it heats very much because of the two extra cores so it takes more current raising electricty bills and is not good for gaming and watching HD movies and internet flash animations and games because of getting heated everytime. And he wants to choose and take the E8400 because he has air conditioning at home so it will be safe from heating. So which one computer should i take is it really faster because of the HT High definition technology and will i be able to play internet flash card games better and watch good HD movies Youtube etc and play all the music and songs.

Read the article
How to speed up request/response to django using apache or another solution?

- by jbcurtin

Hey all, I'm mainly a developer, but every now and again I jump into the sys-admin position. For the most part I've gotten away with deploying php and python apps using apache. I write today because I'm starting to research faster alternatives to apache, yet still have some of the core features I require like put and delete methods and the ability to connect to a socket via apache. ( This I have not tried, but might be a nice whistle if I ever employ comet on my apps. ) As you've probably guessed, I use javascript exclusively for all my websites utilizing deep linking for SEO support. The main areas that I'm looking to increase performance is the connection between the django apps and the web server to the client response. Every day I work my best to keep the smallest memory foot print as possible, however I am getting to the end of my rope when it comes to working with apache. In general, keep in mind that I'm just starting this research so I'm looking more for material to read then solutions at this moment. My main questions: Am I missing something about apache that makes it faster then everything else? What would be a good server environment to deploy just static files one? What are some of the leading open-source and paid alternatives?

Read the article
why would resetting the Netgear N300 router fix my Win 7 laptop's slow wifi?

- by rjnagle

In the past day the wifi download speeds of my Win 7 HP 64 bit laptop have slowed considerably. I am trying to troubleshoot the problem and to figure out whether it's hardware related (i.e., is the Intel(R) Centrino(R) Wireless-N 2230 the problem?) or router related. I have a Netgear N300 router connected to my modem. I'm using Speedtest to measure my speed. First, during my problem state, my ipad can download and upload at normal speeds. It's only my Win 7 laptop which is having problems. Because my ipad downloads at normal speeds, that would tell me that the problem is specific to the laptop (either HW or SW). But when I restarted my Netgear router, the laptop wifi problems disappeared. That just doesn't make sense. If we know that one device can connect properly to the router, why would a laptop have problems? What are some possible reasons why this might happen? Also, during my problem state, I noticed that on my laptop upload speeds were faster than my download speeds. Anybody have a guess about what might cause upload speeds on one device to be faster than another? Is there any actions i could take (or options to enable) so this problem won't occur. (I initially thought my problem might be software related or memory related -- Norton AV or browser plugins. But even after I disabled everything and made sure memory footprint was minimal, the slowdown was still occurring -- and it solved itself altogether when the router was reset).

Read the article
100% CPU when doing 4 or more concurrent requests with Magento

- by pancake

Currently I'm having trouble with a server running Magento, it's unbelievably slow. It's a VPS with a few Magento installations on it used for development, so I'm the only one using them. When I do 4 request all 2 seconds after each other I'm finished in 10 seconds. Slow, but still within the limits of my patience. When I do 4 "concurrent" requests, however (opening 4 tabs in a row, very quickly) all four cores go to 100% and stay there for like a minute. How is this possible? I know that there are a lot of possibilities here, so any tips on how to make an Apache/PHP server go faster are also welcome. It used to go a lot faster before, and I've also tried APC, but it kept causing problems (PHP errors, something with memory pools) so I've disabled it. By the way, the Magento cache is off and compiling is also off. I know this makes Magento slower than usual, but I don't think a 60 second response time is normal for any Magento installation. Virtual hardware: 4 Cores and 4096MB RAM Swap is never used (checked with htop) 100GB disk space, of which 10% is in use Software: Debian 6 DirectAdmin and apache custombuild PHP 5.2.17 (CLI) If you need more info, please tell me how to get it, because I probably don't know how. I do know how to use the command line in linux and the usage of quite a few commands, but my experience with managing a server is limited.

Read the article
Help with optimizing C# function via C and/or Assembly

- by MusiGenesis

I have this C# method which I'm trying to optimize: // assume arrays are same dimensions private void DoSomething(int[] bigArray1, int[] bigArray2) { int data1; byte A1; byte B1; byte C1; byte D1; int data2; byte A2; byte B2; byte C2; byte D2; for (int i = 0; i < bigArray1.Length; i++) { data1 = bigArray1[i]; data2 = bigArray2[i]; A1 = (byte)(data1 >> 0); B1 = (byte)(data1 >> 8); C1 = (byte)(data1 >> 16); D1 = (byte)(data1 >> 24); A2 = (byte)(data2 >> 0); B2 = (byte)(data2 >> 8); C2 = (byte)(data2 >> 16); D2 = (byte)(data2 >> 24); A1 = A1 > A2 ? A1 : A2; B1 = B1 > B2 ? B1 : B2; C1 = C1 > C2 ? C1 : C2; D1 = D1 > D2 ? D1 : D2; bigArray1[i] = (A1 << 0) | (B1 << 8) | (C1 << 16) | (D1 << 24); } } The function basically compares two int arrays. For each pair of matching elements, the method compares each individual byte value and takes the larger of the two. The element in the first array is then assigned a new int value constructed from the 4 largest byte values (irrespective of source). I think I have optimized this method as much as possible in C# (probably I haven't, of course - suggestions on that score are welcome as well). My question is, is it worth it for me to move this method to an unmanaged C DLL? Would the resulting method execute faster (and how much faster), taking into account the overhead of marshalling my managed int arrays so they can be passed to the method? If doing this would get me, say, a 10% speed improvement, then it would not be worth my time for sure. If it was 2 or 3 times faster, then I would probably have to do it. Note: please, no "premature optimization" comments, thanks in advance. This is simply "optimization".

Read the article
What language/framework (technology) to use for website (flash games portal)

- by cripox

Hello, I know there are a lot of similar questions on the net, but because I am a newbie in web development I didn't find the solution for my specific problem. I am planing on creating a flash games portal from scratch. It is a big chance that there will be big traffic from the beginning (millions of pageviews). I want to reduce the server costs as much as possible but in the same time to not be tide to an expensive contract as there is a chance that the project will not be as successfully as I want and in that case the money would be very little. The question is : what technology to use? I don't know any web dev technology yet so it doesn't matter what I will learn. My web dev experience is a little php 8 years ago, and from then I programmed in C++ / Java- game and mobile development. I like Java and C syntax and language very much and I tend to dislike dynamic typing or non robust scripting (like php)- but I can get along if these are the best choices. The candidates are now: - Grails (my best for now) Ruby on Rails Cake PHP Other technologies (Google App Engine, Python/Django etc...) I was considering at first using pure C and compiling the web app in the server- just to squeeze more from the servers, but soon I understand that this is overkill. Next my eyes came on Ruby - as there is a lot of buzz for it's easiness of use. Next I discovered Grails and looked at Java because it is said that it is "faster". But I don't know what this "Faster" really means on my needs, so here comes the first question: 1) What will be my biggest consumption on the server, other than bandwidth, for a lot of flash content requests? Is it memory? I heard that Java needs a lot of memory, but is faster. Is it CPU? I am planning to take some daily VPS.NET nodes at first, to see if there is a demand, and if the "spike" is permanent to move to a dedicated server (serverloft.com has some good offers), else to remain with less nodes. I was also considering developing in Google App Engine- cheap or free hosting to use at first - so I can test my assumption- and also very easy to use (no need for sys administration) but the costs became high if used more ( 3 million games played / month .. x mb/ each). And the issue with Google is that it looks me in this technology. My other concern is scalability (not only for traffic/users, but as adding functionality) My plans are to release a functional site in just 4 weeks (just the basics frontend and some quick basic backend - so I can be able to modify some things and add games manually) - but then to raise it and add more things to it. I am planning to take a little different approach than other portals so I need to write it from scratch (a script will not do). 2) Will Grails take much more resources than RoR or Php server wise? I heard that making it on Java stack will be hardware expensive and is overkill if you don't make a bank application. My application will not be very complex (I hope and i will try to) but will have a lot of traffic. I also took in account using CDN for files, but the cheapest CDN found was 5c/GB (vps.net) and the cost per gb on serverloft (http://www.serverloft.com/dedizierte-server/server-details.php?products=4) is only 1.79 cents/GB and comes with the other resources either. I am new to this domain (web). I am learning the ropes and searching on the web for ~half of year but don't have any really practical experience, so I know that I must have some naive thinking and other issues that i don't know from now, so please give me any advice you want regarding anything, not just the specific questions asked. And thank you so much for such great community!

Read the article
Heaps of Trouble?

- by Paul White NZ

If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material. In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all. The problem is, predictably, that performance is not very good. The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts. In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found. Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here. The Example We’ll use data from the AdventureWorks database, copied to temporary unindexed tables. A script to create these structures is shown below: CREATE TABLE #Custs ( CustomerID INTEGER NOT NULL, TerritoryID INTEGER NULL, CustomerType NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #Prods ( ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, Name NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #OrdHeader ( SalesOrderID INTEGER NOT NULL, OrderDate DATETIME NOT NULL, SalesOrderNumber NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, CustomerID INTEGER NOT NULL, ); GO CREATE TABLE #OrdDetail ( SalesOrderID INTEGER NOT NULL, OrderQty SMALLINT NOT NULL, LineTotal NUMERIC(38,6) NOT NULL, ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, ); GO INSERT #Custs ( CustomerID, TerritoryID, CustomerType ) SELECT C.CustomerID, C.TerritoryID, C.CustomerType FROM AdventureWorks.Sales.Customer C WITH (TABLOCK); GO INSERT #Prods ( ProductMainID, ProductSubID, ProductSubSubID, Name ) SELECT P.ProductID, P.ProductID, P.ProductID, P.Name FROM AdventureWorks.Production.Product P WITH (TABLOCK); GO INSERT #OrdHeader ( SalesOrderID, OrderDate, SalesOrderNumber, CustomerID ) SELECT H.SalesOrderID, H.OrderDate, H.SalesOrderNumber, H.CustomerID FROM AdventureWorks.Sales.SalesOrderHeader H WITH (TABLOCK); GO INSERT #OrdDetail ( SalesOrderID, OrderQty, LineTotal, ProductMainID, ProductSubID, ProductSubSubID ) SELECT D.SalesOrderID, D.OrderQty, D.LineTotal, D.ProductID, D.ProductID, D.ProductID FROM AdventureWorks.Sales.SalesOrderDetail D WITH (TABLOCK); The query itself is a simple join of the four tables: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #OrdDetail D ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID JOIN #OrdHeader H ON D.SalesOrderID = H.SalesOrderID JOIN #Custs C ON H.CustomerID = C.CustomerID ORDER BY P.ProductMainID ASC OPTION (RECOMPILE, MAXDOP 1); Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings). The estimated query plan produced for the test query looks like this (click to enlarge): The Problem The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan. The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate. Every join in the plan shown above estimates that it will produce just a single row as output. Brad covers the factors that lead to the low estimates in his post. In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows. It should not surprise you that this has rather dire consequences for the remainder of the query plan. In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables. Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each. The query takes somewhere in the region of twenty minutes to run to completion on my development machine. A Solution At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere. We can force the exclusive use of hash joins in several ways, the two most common being join and query hints. A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the bottom of the query. The difference is that using join hints also forces the order of the join, whereas the query hint gives the optimizer freedom to reorder the joins at its discretion. Adding the OPTION (HASH JOIN) hint results in this estimated plan: That produces the correct output in around seven seconds, which is quite an improvement! As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there. (We can improve the hashing solution a bit – I’ll come back to that later on). Faster Nested Loops It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set. The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B). Armed with this tremendous new insight, we can rewrite the join predicates like so: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #OrdDetail D JOIN #OrdHeader H ON D.SalesOrderID >= H.SalesOrderID AND D.SalesOrderID <= H.SalesOrderID JOIN #Custs C ON H.CustomerID >= C.CustomerID AND H.CustomerID <= C.CustomerID JOIN #Prods P ON P.ProductMainID >= D.ProductMainID AND P.ProductMainID <= D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER); I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear. The new estimated execution plan is: This new query runs in under 2 seconds. Why Is It Faster? The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools. If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly. An eager index spool consumes all rows from the table it sits above, and builds a index suitable for the join to seek on. Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID. The term ‘eager’ means that the spool consumes all of its input rows when it starts up. The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing. The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index. From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table. A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table. The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself. The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation. Nested loops join preserves the order of rows on its outer input, so sorting early is safe. (Hash joins do not preserve order in this way, of course). The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input. It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other. The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation. It’s presence in a plan is often an indication that a useful index is missing. I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system. Improving the Hash Join I promised I would return to the solution that uses hash joins. You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins. The answer, again, is down to the poor information available to the optimizer. Let’s look at the hash join plan again: Two of the hash joins have single-row estimates on their build inputs. SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory. This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk. The join process then continues, and may again run out of memory. This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow. The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion. You can see this for yourself by tracing the Hash Warning event using the Profiler tool. The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this). Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan. Ok, so now we understand the problems, what can we do to fix it? We can address the hash spilling by forcing a different order for the joins: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #Custs C JOIN #OrdHeader H ON H.CustomerID = C.CustomerID JOIN #OrdDetail D ON D.SalesOrderID = H.SalesOrderID ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (MAXDOP 1, HASH JOIN, FORCE ORDER); With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs. The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk. Even so, the query runs to completion in three or four seconds. That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery. Final Thoughts SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information. We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics. I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world. It’s there primarily for its educational and entertainment value. I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index. Be sure to read Brad’s original post for more details. My grateful thanks to him for granting permission to reuse some of his material. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

Read the article
CodePlex Daily Summary for Sunday, May 25, 2014

CodePlex Daily Summary for Sunday, May 25, 2014Popular ReleasesClosedXML - The easy way to OpenXML: ClosedXML 0.71.0: Major improvement when saving large files.SimCityPak: SimCityPak 0.3.1.0: Main New Features: Fixed Importing of Instance Names (get rid of the Dutch translations) Added advanced editor for Decal Dictionaries Added possibility to import .PNG to generate new decals Added advanced editor for Path display entriesTiny Deduplicator: Tiny Deduplicator 1.0.1.0: Increased version number to 1.0.1.0 Moved all options to a separate 'Options' dialog window. Allows the user to specify a selection strategy which will help when dealing with large numbers of duplicate files. Available options are "None," "Keep First," and "Keep Last"SEToolbox: SEToolbox 01.031.009 Release 1: Added mirroring of ConveyorTubeCurved. Updated Ship cube rotation to rotate ship back to original location (cubes are reoriented but ship appears no different to outsider), and to rotate Grouped items. Repair now fixes the loss of Grouped controls due to changes in Space Engineers 01.030. Added export asteroids. Rejoin ships will merge grouping and conveyor systems (even though broken ships currently only maintain the Grouping on one part of the ship). Installation of this version wi...Player Framework by Microsoft: Player Framework for Windows and WP v2.0: Support for new Universal and Windows Phone 8.1 projects for both Xaml and JavaScript projects. See a detailed list of improvements, breaking changes and a general overview of version 2 ADDITIONAL DOWNLOADSSmooth Streaming Client SDK for Windows 8 Applications Smooth Streaming Client SDK for Windows 8.1 Applications Smooth Streaming Client SDK for Windows Phone 8.1 Applications Microsoft PlayReady Client SDK for Windows 8 Applications Microsoft PlayReady Client SDK for Windows 8.1 Applicat...TerraMap (Terraria World Map Viewer): TerraMap 1.0.6: Added support for the new Terraria v1.2.4 update. New items, walls, and tiles Added the ability to select multiple highlighted block types. Added a dynamic, interactive highlight opacity slider, making it easier to find highlighted tiles with dark colors (and fixed blurriness from 1.0.5 alpha). Added ability to find Enchanted Swords (in the stone) and Water Bolt books Fixed Issue 35206: Hightlight/Find doesn't work for Demon Altars Fixed finding Demon Hearts/Shadow Orbs Fixed inst...DotNet.Highcharts: DotNet.Highcharts 4.0 with Examples: DotNet.Highcharts 4.0 Tested and adapted to the latest version of Highcharts 4.0.1 Added new chart type: Heatmap Added new type PointPlacement which represents enumeration or number for the padding of the X axis. Changed target framework from .NET Framework 4 to .NET Framework 4.5. Closed issues: 974: Add 'overflow' property to PlotOptionsColumnDataLabels class 997: Split container from JS 1006: Series/Categories with numeric names don't render DotNet.Highcharts.Samples Updated s...51Degrees - Device Detection and Redirection: 3.1.1.12: Version 3.1 HighlightsDevice detection algorithm is over 100 times faster. Regular expressions and levenshtein distance calculations are no longer used. The device detection algorithm performance is no longer limited by the number of device combinations contained in the dataset. Two modes of operation are available: Memory – the detection data set is loaded into memory and there is no continuous connection to the source data file. Slower initialisation time but faster detection performanc...VisioAutomation: Visio PowerShell Module (VisioPS) 1.2.0: DocumentationDocumentation is here http://sdrv.ms/11AWkp7 Screencasthttp://vimeo.com/61329170 FilesFor easy installation, download and run the MSI file. If you want to manually install, a ZIP file is provided. ChangeLogInvoke-* cmdlets replaced with more specific PowerShell Verbs Enhanced handling of values in User-Defined Cells Get-VisioShape works more unituitivelyNino Seisei Code Generator: Nino Seisei v2.1.1: Mejoras en la interfaz, posibilidad de ejecutar instrucciones Berta cíclicas una dentro de otra con la nueva instrucción @BSETRECURSIVITY_ON. GUI Improvements, posibility of running ciclic Berta instructions one inside another with the new @BSETRECURSIVITY_ON instruction.PowerShell App Deployment Toolkit: PowerShell App Deployment Toolkit v3.1.3: Added CompressLogs option to the config file. Each Install / Uninstall creates a timestamped zip file with all MSI and PSAppDeployToolkit logs contained within Added variable expansion to all paths in the configuration file Added documentation for each of the Toolkit internal variables that can be used Changed Install-MSUpdates to continue if any errors are encountered when installing updates Implement /Force parameter on Update-GroupPolicy (ensure that any logoff message is ignored) ...ULS Log Viewer: Alpha 0.2: Changeset CI&T ULS Log 22/05/2014 - Inclusão do botão de limpar filtro - Inclusão da possibilidade de filtrar as entradas pelo texto da mensagem; - Inclusão da opção de abrir mais de um arquivo de log no mesmo grid para análise; - Inclusão da tela de Sobre. - Campo de Filtro Rápido Level carrega somente os Levels encontrados no arquivo de log carregado; - Campo de exibição rápida de mensagem setado para somente leitura; - Inclusão da Barra de Status com informações do nome do arquivo ...Application Parameters for Microsoft Dynamics CRM: Application Parameters (1.2.0.1): Fix plugin when updating parameters without changing parameter typeWordMat: WordMat v. 1.07: A quick fix because scientific notation was broken in v. 1.06 read more at http://wordmat.blogspot.com????: 《????》: 《????》（c???）??“????”???????，???????????????C?????????。???????,???????????????????????. ??????????????????????????????????；????????????????????????????。Mini SQL Query: Mini SQL Query (1.0.72.457): Apologies for the previous update! FK issue fixed and also a template data cache issue.Wsus Package Publisher: Release v1.3.1405.17: Add Russian translation (thanks to VSharmanov) Fix a bug that make WPP to crash if the user click on "Connect/Reload" while the Report Tab is loading. Enhance the way WPP store the password for remote computers command.MoreTerra (Terraria World Viewer): More Terra 1.12.9: =========== = Compatibility = =========== Updated to account for new format 1.2.4.1 =========== = Issues = =========== all items have not been added. Some colors for new tiles may be off. I wanted to get this out so people have a usable program.LINQ to Twitter: LINQ to Twitter v3.0.3: Supports .NET 4.5x, Windows Phone 8.x, Windows 8.x, Windows Azure, Xamarin.Android, and Xamarin.iOS. New features include Status/Lookup, Mute APIs, and bug fixes. 100% Twitter API v1.1 coverage, Async, Portable Class Library (PCL).CS-Script for Notepad++ (C# intellisense and code execution): Release v1.0.26.0: Added access to the Release Notes during 'Check for Updates...'' Debug panels Added support for generic types members Members are grouped into 'Raw View' and 'Non-Public members' categories Implemented dedicated (array-like) view for Lists and Dictionaries http://download-codeplex.sec.s-msft.com/Download?ProjectName=csscriptnpp&DownloadId=846498New Projects2111110107: Thanh Loi2111110152: Nguy?n Doãn Tu?nASP.NET MVC4 Warehouse management system: WMSNet is an easy to use warehouse management solution for manually operated warehouses and can smoothly be customized according to your requirements. Code Snippets: Code snippets to empower the developers to write quality code faster while adhering to the industry standards.CRM 2011 / CRM 2013 Form Helper: Library of CRM 2011 / CRM 2013 Web Resources that can be used on Forms for making input simpler; ex Automatic title case Kinect Translation Tool: From Sign Language to spoken text and vice versa: Software System Component 1. Kinect SDKver.1.7 for the Kinect sensor. 2. Windows 7 standard APIs- The audio, speech, and media APIs in Windows 7MDriven Getting started - MVC: This is the suggested getting started template for doing MVC with MDriven. Fork it and begin to fill up with your model. Rules Engine Validator: A simple rules validator. It's based on a rule manager component (an implementation of the Command GoF pattern), with a main method called Validate().Simple Connect To Db: ????? ???? ?? ??????? ?? ????? ??? ? ??????? ????? ??z3-str-purdue: test geekchina.com

Read the article
Table Variables: an empirical approach.

- by Phil Factor

It isn’t entirely a pleasant experience to publish an article only to have it described on Twitter as ‘Horrible’, and to have it criticized on the MVP forum. When this happened to me in the aftermath of publishing my article on Temporary tables recently, I was taken aback, because these critics were experts whose views I respect. What was my crime? It was, I think, to suggest that, despite the obvious quirks, it was best to use Table Variables as a first choice, and to use local Temporary Tables if you hit problems due to these quirks, or if you were doing complex joins using a large number of rows. What are these quirks? Well, table variables have advantages if they are used sensibly, but this requires some awareness by the developer about the potential hazards and how to avoid them. You can be hit by a badly-performing join involving a table variable. Table Variables are a compromise, and this compromise doesn’t always work out well. Explicit indexes aren’t allowed on Table Variables, so one cannot use covering indexes or non-unique indexes. The query optimizer has to make assumptions about the data rather than using column distribution statistics when a table variable is involved in a join, because there aren’t any column-based distribution statistics on a table variable. It assumes a reasonably even distribution of data, and is likely to have little idea of the number of rows in the table variables that are involved in queries. However complex the heuristics that are used might be in determining the best way of executing a SQL query, and they most certainly are, the Query Optimizer is likely to fail occasionally with table variables, under certain circumstances, and produce a Query Execution Plan that is frightful. The experienced developer or DBA will be on the lookout for this sort of problem. In this blog, I’ll be expanding on some of the tests I used when writing my article to illustrate the quirks, and include a subsequent example supplied by Kevin Boles. A simplified example. We’ll start out by illustrating a simple example that shows some of these characteristics. We’ll create two tables filled with random numbers and then see how many matches we get between the two tables. We’ll forget indexes altogether for this example, and use heaps. We’ll try the same Join with two table variables, two table variables with OPTION (RECOMPILE) in the JOIN clause, and with two temporary tables. It is all a bit jerky because of the granularity of the timing that isn’t actually happening at the millisecond level (I used DATETIME). However, you’ll see that the table variable is outperforming the local temporary table up to 10,000 rows. Actually, even without a use of the OPTION (RECOMPILE) hint, it is doing well. What happens when your table size increases? The table variable is, from around 30,000 rows, locked into a very bad execution plan unless you use OPTION (RECOMPILE) to provide the Query Analyser with a decent estimation of the size of the table. However, if it has the OPTION (RECOMPILE), then it is smokin’. Well, up to 120,000 rows, at least. It is performing better than a Temporary table, and in a good linear fashion. What about mixed table joins, where you are joining a temporary table to a table variable? You’d probably expect that the query analyzer would throw up its hands and produce a bad execution plan as if it were a table variable. After all, it knows nothing about the statistics in one of the tables so how could it do any better? Well, it behaves as if it were doing a recompile. And an explicit recompile adds no value at all. (we just go up to 45000 rows since we know the bigger picture now) Now, if you were new to this, you might be tempted to start drawing conclusions. Beware! We’re dealing with a very complex beast: the Query Optimizer. It can come up with surprises What if we change the query very slightly to insert the results into a Table Variable? We change nothing else and just measure the execution time of the statement as before. Suddenly, the table variable isn’t looking so much better, even taking into account the time involved in doing the table insert. OK, if you haven’t used OPTION (RECOMPILE) then you’re toast. Otherwise, there isn’t much in it between the Table variable and the temporary table. The table variable is faster up to 8000 rows and then not much in it up to 100,000 rows. Past the 8000 row mark, we’ve lost the advantage of the table variable’s speed. Any general rule you may be formulating has just gone for a walk. What we can conclude from this experiment is that if you join two table variables, and can’t use constraints, you’re going to need that Option (RECOMPILE) hint. Count Dracula and the Horror Join. These tables of integers provide a rather unreal example, so let’s try a rather different example, and get stuck into some implicit indexing, by using constraints. What unusual words are contained in the book ‘Dracula’ by Bram Stoker? Here we get a table of all the common words in the English language (60,387 of them) and put them in a table. We put them in a Table Variable with the word as a primary key, a Table Variable Heap and a Table Variable with a primary key. We then take all the distinct words used in the book ‘Dracula’ (7,558 of them). We then create a table variable and insert into it all those uncommon words that are in ‘Dracula’. i.e. all the words in Dracula that aren’t matched in the list of common words. To do this we use a left outer join, where the right-hand value is null. The results show a huge variation, between the sublime and the gorblimey. If both tables contain a Primary Key on the columns we join on, and both are Table Variables, it took 33 Ms. If one table contains a Primary Key, and the other is a heap, and both are Table Variables, it took 46 Ms. If both Table Variables use a unique constraint, then the query takes 36 Ms. If neither table contains a Primary Key and both are Table Variables, it took 116383 Ms. Yes, nearly two minutes!! If both tables contain a Primary Key, one is a Table Variables and the other is a temporary table, it took 113 Ms. If one table contains a Primary Key, and both are Temporary Tables, it took 56 Ms.If both tables are temporary tables and both have primary keys, it took 46 Ms. Here we see table variables which are joined on their primary key again enjoying a slight performance advantage over temporary tables. Where both tables are table variables and both are heaps, the query suddenly takes nearly two minutes! So what if you have two heaps and you use option Recompile? If you take the rogue query and add the hint, then suddenly, the query drops its time down to 76 Ms. If you add unique indexes, then you've done even better, down to half that time. Here are the text execution plans.So where have we got to? Without drilling down into the minutiae of the execution plans we can begin to create a hypothesis. If you are using table variables, and your tables are relatively small, they are faster than temporary tables, but as the number of rows increases you need to do one of two things: either you need to have a primary key on the column you are using to join on, or else you need to use option (RECOMPILE) If you try to execute a query that is a join, and both tables are table variable heaps, you are asking for trouble, well- slow queries, unless you give the table hint once the number of rows has risen past a point (30,000 in our first example, but this varies considerably according to context). Kevin’s Skew In describing the table-size, I used the term ‘relatively small’. Kevin Boles produced an interesting case where a single-row table variable produces a very poor execution plan when joined to a very, very skewed table. In the original, pasted into my article as a comment, a column consisted of 100000 rows in which the key column was one number (1) . To this was added eight rows with sequential numbers up to 9. When this was joined to a single-tow Table Variable with a key of 2 it produced a bad plan. This problem is unlikely to occur in real usage, and the Query Optimiser team probably never set up a test for it. Actually, the skew can be slightly less extreme than Kevin made it. The following test showed that once the table had 54 sequential rows in the table, then it adopted exactly the same execution plan as for the temporary table and then all was well. Undeniably, real data does occasionally cause problems to the performance of joins in Table Variables due to the extreme skew of the distribution. We've all experienced Perfectly Poisonous Table Variables in real live data. As in Kevin’s example, indexes merely make matters worse, and the OPTION (RECOMPILE) trick does nothing to help. In this case, there is no option but to use a temporary table. However, one has to note that once the slight de-skew had taken place, then the plans were identical across a huge range. Conclusions Where you need to hold intermediate results as part of a process, Table Variables offer a good alternative to temporary tables when used wisely. They can perform faster than a temporary table when the number of rows is not great. For some processing with huge tables, they can perform well when only a clustered index is required, and when the nature of the processing makes an index seek very effective. Table Variables are scoped to the batch or procedure and are unlikely to hang about in the TempDB when they are no longer required. They require no explicit cleanup. Where the number of rows in the table is moderate, you can even use them in joins as ‘Heaps’, unindexed. Beware, however, since, as the number of rows increase, joins on Table Variable heaps can easily become saddled by very poor execution plans, and this must be cured either by adding constraints (UNIQUE or PRIMARY KEY) or by adding the OPTION (RECOMPILE) hint if this is impossible. Occasionally, the way that the data is distributed prevents the efficient use of Table Variables, and this will require using a temporary table instead. Tables Variables require some awareness by the developer about the potential hazards and how to avoid them. If you are not prepared to do any performance monitoring of your code or fine-tuning, and just want to pummel out stuff that ‘just runs’ without considering namby-pamby stuff such as indexes, then stick to Temporary tables. If you are likely to slosh about large numbers of rows in temporary tables without considering the niceties of processing just what is required and no more, then temporary tables provide a safer and less fragile means-to-an-end for you.

Read the article
CodePlex Daily Summary for Tuesday, November 30, 2010

CodePlex Daily Summary for Tuesday, November 30, 2010Popular ReleasesSense/Net Enterprise Portal & ECMS: SenseNet 6.0.1 Community Edition: Sense/Net 6.0.1 Community Edition This half year we have been working quite fiercely to bring you the long-awaited release of Sense/Net 6.0. Download this Community Edition to see what we have been up to. These months we have worked on getting the WebCMS capabilities of Sense/Net 6.0 up to par. New features include: New, powerful page and portlet editing experience. HTML and CSS cleanup, new, powerful site skinning system. Upgraded, lightning-fast indexing and query via Lucene. Limita...Minecraft GPS: Minecraft GPS 1.1.1: New Features Compass! New style. Set opacity on main window to allow overlay of Minecraft. Open World in any folder. Fixes Fixed style so listbox won't grow the window size. Fixed open file dialog issue on non-vista kernel machines.DotSpatial: DotSpatial 11-28-2001: This release introduces some exciting improvements. Support for big raster, both in display and changing the scheme. Faster raster scheme creation for all rasters. Caching of the "sample" values so once obtained the raster symbolizer dialog loads faster. Reprojection supported for raster and image classes. Affine transform fully supported for images and rasters, so skewed images are now possible. Projection uses better checks when loading unprojected layers. GDAL raster support f...Virtu: Virtu 0.9.0: Source Requirements.NET Framework 4 Visual Studio 2010 or Visual Studio 2010 Express Silverlight 4 Tools for Visual Studio 2010 Windows Phone 7 Developer Tools (which includes XNA Game Studio 4) Binaries RequirementsSilverlight 4 .NET Framework 4 XNA Framework 4SuperWebSocket: SuperWebSocket(60438): It is the first release of SuperWebSocket. Because it is base on SuperSocket, most features of SuperSocket are supported in SuperWebSocket. The source code include a LiveChat demo.MDownloader: MDownloader-0.15.25.7002: Fixed updater Fixed FileServe Fixed LetItBitNotepad.NET: Notepad.NET 0.7 Preview 1: Whats New?* Optimized Code Generation: Which means it will run significantly faster. * Preview of Syntax Highlighting: Only VB.NET highlighting is supported, C# and Ruby will come in Preview 2. * Improved Editing Updates (when the line number, etc updates) to be more graceful. * Recent Documents works! * Images can be inserted but they're extremely large. Known Bugs* The Update Process hangs: This is a bug apparently spawning since 0.5. It will be fixed in Preview 2. Until then, perform a fr...Cropper: 1.9.4: Mostly fixes for issues with a few feature requests. Fixed Issues 2730 & 3638 & 14467 11044 11447 11448 11449 14665 Implemented Features 6123 11581PFC: PFC for PB 11.5: This is just a migration from the 11.0 code. No changes have been made yet (and they are needed) for it to work properly with 11.5.PDF Rider: PDF Rider 0.5: This release does not add any new feature for pdf manipulation, but enables automatic updates checking, so it is reccomended to install it in order to stay updated with next releases. Prerequisites * Microsoft Windows Operating Systems (XP - Vista - 7) * Microsoft .NET Framework 3.5 runtime * A PDF rendering software (i.e. Adobe Reader) that can be opened inside Internet Explorer. Installation instructionsChoose one of the following methods: 1. Download and run the "pdfRider0...BCLExtensions: BCL Extensions v1.0: The files associated with v1.0 of the BCL Extensions library.XamlQuery/WPF - The Write Less, Do More, WPF Library: XamlQuery-WPF v1.2 (Runtime, Source): This is the first release of popular XamlQuery library for WPF. XamlQuery has already gained recognition among Silverlight developers.Math.NET Numerics: Beta 1: First beta of Math.NET Numerics. Only contains the managed linear algebra provider. Beta 2 will include the native linear algebra providers along with better documentation and examples.Microsoft All-In-One Code Framework: Visual Studio 2010 Code Samples 2010-11-25: Code samples for Visual Studio 2010Wii Backup Fusion: Wii Backup Fusion 0.8.5 Beta: - WBFS repair (default) options fixed - Transfer to image fixed - Settings ui widget names fixed - Some little bug fixes You need to reset the settings! Delete WiiBaFu's config file or registry entries on windows: Linux: ~/.config/WiiBaFu/wiibafu.conf Windows: HKEY_CURRENT_USER\Software\WiiBaFu\wiibafu Mac OS X: ~/Library/Preferences/com.wiibafu.wiibafu.plist Caution: This is a BETA version! Errors, crashes and data loss not impossible! Use in test environments only, not on productive syste...Minemapper: Minemapper v0.1.3: Added process count and world size calculation progress to the status bar. Added View->'Status Bar' menu item to show/hide the status bar. Status bar is automatically shown when loading a world. Added a prompt, when loading a world, to use or clear cached images.Sexy Select: sexy select v0.4: Changes in v0.4 Added method : elements. This returns all the option elements that are currently added to the select list Added method : selectOption. This method accepts two values, the element to be modified and the selected state. (true/false)Deep Zoom for WPF: First Release: This first release of the Deep Zoom control has the same source code, binaries and demos as the CodeProject article (http://www.codeproject.com/KB/WPF/DeepZoom.aspx).BlogEngine.NET: BlogEngine.NET 2.0 RC: This is a Release Candidate version for BlogEngine.NET 2.0. The most current, stable version of BlogEngine.NET is version 1.6. Find out more about the BlogEngine.NET 2.0 RC here. If you want to extend or modify BlogEngine.NET, you should download the source code. To get started, be sure to check out our installation documentation and the installation screencast. If you are upgrading from a previous version, please take a look at the Upgrading to BlogEngine.NET 2.0 instructions. As this ...NodeXL: Network Overview, Discovery and Exploration for Excel: NodeXL Excel Template, version 1.0.1.156: The NodeXL Excel template displays a network graph using edge and vertex lists stored in an Excel 2007 or Excel 2010 workbook. What's NewThis release adds a feature for aggregating the overall metrics in a folder full of NodeXL workbooks, adds geographical coordinates to the Twitter import features, and fixes a memory-related bug. See the Complete NodeXL Release History for details. Please Note: There is a new option in the setup program to install for "Just Me" or "Everyone." Most people...New ProjectsActiveRecordTest: ActiveRecordTest is a sample project that is really a quick guide for start using Castle ActiveRecord within an ASP.NET web application.BacteriaManage: just test codeplexDS CMS: Diamond Shop - open source project. 1. ASP.NET MVC 3.0 2. Entity Framework 3. Jquery 4. LinqGeneral Media Access WebService: This project is focused on building a general purpose media access webservice based on WCF.JavaEE server for XUNU: C'est le serveur internet du site à ChoupieLearning management system: Learning management system to help teachers on their work.LogWriterReader using Named pipe: LogWriterReader using Named pipeNMix: NMix???EntLib，NHibernate，log4net??????????，????????????????，?????????、?????、????、????、?????????。Nosso Rico Dinheirinho: Financial control system like Microsoft Money, but via web.Post Template: Post Template (for now) is for craigslist posters looking to make their posts more visually appealing. Abstracting the styling and layout details of HTML and CSS, Post Template eliminates the need to know these languages when posting. Post Template is mostly written in C#.SharePoint Silverlight Clock: SharePoint Silverlight ClockSilverlight MVVM wizard using Caliburn Micro: This MVVM style Silverlight 4 wizard shows some Caliburn Micro features, as well as the use of MEF and MVVM style unit testing. The UI and code are based on the code accompanying the "Code Project" article "Creating an Internationalized Wizard in WPF" from dec. 2008.Spider Framework: A ruler-based spider framework developing with C#syx Open Source Project: syx Open Source ProjectTigerCat: TigerCat will support application development as infrastructure and RAD tools.TitleNetSolution: This my team Solution.!Uploadert: UploadertWidget Suite for DotNetNuke: This project is intended to hold a suite of useful widgets to make your skinning easier, and raise the level of interactivity with DotNetNuke website visitors.ZenBridge for Picasa: ZenBridge for Picasa makes it easy for Zenfolio users to upload edited images directly to a chosen Zenfolio gallery. It's developed in C#.NET 4.

Read the article
How to Manage AutoArchive in Outlook 2010

- by Mysticgeek

If you want to keep Outlook 2010 clean and run faster, one method is to set up the AutoArchive feature. Today we show you how to configure and manage the feature in Outlook 2010. Using AutoArchive allows you to manage space in your mailbox or on the email server by moving older items to another location on your hard drive. Enable and Configure Auto Archive In Outlook 2010 Auto Archive is not enabled by default. To turn it on, click on the File tab to access Backstage View, then click on Options. The Outlook Options window opens then click on Advanced then the AutoArchive Settings button. The AutoArchive window opens and you’ll notice everything is grayed out. Check the box next to Run AutoArchive every… Note: If you select the Permanently delete old items option, mails will not be archived. Now you can choose the settings for how you want to manage the AutoArchive feature. Select how often you want it to run, prompt before the feature runs, where to move items, and other actions you want to happen during the process. After you’ve made your selections click OK. Manually Configure Individual Folders For more control over individual folders that are archived, right-click on the folder and click on Properties. Click on the AutoArchive tab and choose the settings you want to change for that folder. For instance you might not want to archive a certain folder or move archived data to a specific folder. If you want to manually archive and backup an item, click on the File tab, Cleanup Tools, then Archive. Click the radio button next to Archive this folder and all subfolders. Select the folder you want to archive. In this example we want to archive this folder to a specific location of its own. The .pst files are saved in your documents folder and if you need to access them at a later time you can. After you’ve setup AutoArchive you can find items in the archived files. In the Navigation Pane expand the Archives folder in the list. You can then view and access your messages. You can also access them by clicking the File tab \ Open then Open Outlook Data File. Then you can browse to the archived file you want to open. Archiving old emails is a good way to help keep a nice clean mailbox, help speed up your Outlook experience, and save space on the email server. The other nice thing is you can configure your email archives and specific folders to meet your email needs. Similar Articles Productive Geek Tips Configure AutoArchive In Outlook 2007Quickly Clean Your Inbox in Outlook 2003/2007Open Different Outlook Features in Separate Windows to Improve ProductivityMake Outlook Faster by Disabling Unnecessary Add-InsCreate an Email Template in Outlook 2003 TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips CloudBerry Online Backup 1.5 for Windows Home Server Snagit 10 VMware Workstation 7 Acronis Online Backup AceStock, a Tiny Desktop Quote Monitor Gmail Button Addon (Firefox) Hyperwords addon (Firefox) Backup Outlook 2010 Daily Motivator (Firefox) FetchMp3 Can Download Videos & Convert Them to Mp3

Read the article
Microsoft silverlight 5.0 features for developers

- by Jalpesh P. Vadgama

Recently on Silverlight 5.0 firestarter event ScottGu has announced road map for Silverlight 5.0. There will be lots of features that will be there in silverlight 5.0 but here are few glimpses of Silverlight 5.0 Features. Improved Data binding support and Better support for MVVM: One of the greatest strength of Silverlight is its data binding. Microsoft is going to enhanced data binding by providing more ability to debug it. Developer will able to debug the binding expression and other stuff in Siverlight 5.0. Its also going to provide Ancestor Relative source binding which will allow property to bind with container control. MVVM pattern support will also be enhanced. Performance and Speed Enhancement: Now silverlight 5.0 will have support for 64bit browser support. So now you can use that silverlight application on 64 bit platform also. There is no need to take extra care for it.It will also have faster startup time and greater support for hardware acceleration. It will also provide end to end support for hard acceleration features of IE 9. More support for Out Of Browser Application: With Siverlight 4.0 Microsoft has announced new features called out of browser application and it has amazed lots of developer because now possibilities are unlimited with it. Now in silverlight 5.0 Out Of Browser application will have ability to Create Manage child windows just like windows forms or WPF Application. So you can fill power of desktop application with your out of browser application. Testing Support with Visual Studio 2010: Microsoft is going to add automated UI Testing support with Visual Studio 2010 with silverlight 5.0. So now we can test UI of Silverlight much faster. Better Support for RIA Services: RIA Services allows us to create N-tier application with silverlight via creating proxy classes on client and server both side. Now it will more features like complex type support, Custom type support for MVVM(Model View View Model) pattern. WCF Enhancements: There are lots of enhancement with WCF but key enhancement will WSTrust support. Text and Printing Support: Silverlight 5.0 will support vector base graphics. It will also support multicolumn text flow and linked text containers. It will full open type support,Postscript vector enhancement. Improved Power Enhancement: This will prevent screensaver from activating while you are watching videos on silverlight. Silverlight 5.0 is going add that smartness so it can determine while you are going to watch video and while you are not going watch videos. Better support for graphics: Silverlight 5.0 will provide in-depth support for 3D API. Now 3D rendering support is more enhancement in silverlight and 3D graphics can be rendered easily. You can find more details on following links and also don’t forgot to view silverlight firestarter keynot video of scottgu. http://www.silverlight.net/news/events/firestarter-labs/ http://blogs.msdn.com/b/katriend/archive/2010/12/06/silverlight-5-features-firestarter-keynote-and-sessions-resources.aspx http://weblogs.asp.net/scottgu/archive/2010/12/02/announcing-silverlight-5.aspx http://www.silverlight.net/news/events/firestarter/ http://www.microsoft.com/silverlight/future/ Hope this will help you. Stay tuned!!!.

Read the article
GPGPU

WhatGPU obviously stands for Graphics Processing Unit (the silicon powering the display you are using to read this blog post). The extra GP in front of that stands for General Purpose computing.So, altogether GPGPU refers to computing we can perform on GPU for purposes beyond just drawing on the screen. In effect, we can use a GPGPU a bit like we already use a CPU: to perform some calculation (that doesn’t have to have any visual element to it). The attraction is that a GPGPU can be orders of magnitude faster than a CPU.WhyWhen I was at the SuperComputing conference in Portland last November, GPGPUs were all the rage. A quick online search reveals many articles introducing the GPGPU topic. I'll just share 3 here: pcper (ignoring all pages except the first, it is a good consumer perspective), gizmodo (nice take using mostly layman terms) and vizworld (answering the question on "what's the big deal").The GPGPU programming paradigm (from a high level) is simple: in your CPU program you define functions (aka kernels) that take some input, can perform the costly operation and return the output. The kernels are the things that execute on the GPGPU leveraging its power (and hence execute faster than what they could on the CPU) while the host CPU program waits for the results or asynchronously performs other tasks.However, GPGPUs have different characteristics to CPUs which means they are suitable only for certain classes of problem (i.e. data parallel algorithms) and not for others (e.g. algorithms with branching or recursion or other complex flow control). You also pay a high cost for transferring the input data from the CPU to the GPU (and vice versa the results back to the CPU), so the computation itself has to be long enough to justify the overhead transfer costs. If your problem space fits the criteria then you probably want to check out this technology.HowSo where can you get a graphics card to start playing with all this? At the time of writing, the two main vendors ATI (owned by AMD) and NVIDIA are the obvious players in this industry. You can read about GPGPU on this AMD page and also on this NVIDIA page. NVIDIA's website also has a free chapter on the topic from the "GPU Gems" book: A Toolkit for Computation on GPUs.If you followed the links above, then you've already come across some of the choices of programming models that are available today. Essentially, AMD is offering their ATI Stream technology accessible via a language they call Brook+; NVIDIA offers their CUDA platform which is accessible from CUDA C. Choosing either of those locks you into the GPU vendor and hence your code cannot run on systems with cards from the other vendor (e.g. imagine if your CPU code would run on Intel chips but not AMD chips). Having said that, both vendors plan to support a new emerging standard called OpenCL, which theoretically means your kernels can execute on any GPU that supports it. To learn more about all of these there is a website: gpgpu.org. The caveat about that site is that (currently) it completely ignores the Microsoft approach, which I touch on next.On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the DirectX API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side, to set up and execute the kernels that run on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as a "kernel". For a comprehensive collection of links about this (including tutorials, videos and samples) please see my blog post: DirectCompute.Note that there are many efforts to build even higher level languages on top of DirectX that aim to expose GPGPU programming to a wider audience by making it as easy as today's mainstream programming models. I'll mention here just two of those efforts: Accelerator from MSR and Brahma by Ananth. Comments about this post welcome at the original blog.

Read the article
Automating XNA Performance Testing?

- by Grofit

I was wondering what peoples approaches or thoughts were on automating performance testing in XNA. Currently I am looking at only working in 2d, but that poses many areas where performance can be improved with different implementations. An example would be if you had 2 different implementations of spatial partitioning, one may be faster than another but without doing some actual performance testing you wouldn't be able to tell which one for sure (unless you saw the code was blatantly slow in certain parts). You could write a unit test which for a given time frame kept adding/updating/removing entities for both implementations and see how many were made in each timeframe and the higher one would be the faster one (in this given example). Another higher level example would be if you wanted to see how many entities you can have on the screen roughly without going beneath 60fps. The problem with this is to automate it you would need to use the hidden form trick or some other thing to kick off a mock game and purely test which parts you care about and disable everything else. I know that this isnt a simple affair really as even if you can automate the tests, really it is up to a human to interpret if the results are performant enough, but as part of a build step you could have it run these tests and publish the results somewhere for comparison. This way if you go from version 1.1 to 1.2 but have changed a few underlying algorithms you may notice that generally the performance score would have gone up, meaning you have improved your overall performance of the application, and then from 1.2 to 1.3 you may notice that you have then dropped overall performance a bit. So has anyone automated this sort of thing in their projects, and if so how do you measure your performance comparisons at a high level and what frameworks do you use to test? As providing you have written your code so its testable/mockable for most parts you can just use your tests as a mechanism for getting some performance results... === Edit === Just for clarity, I am more interested in the best way to make use of automated tests within XNA to track your performance, not play testing or guessing by manually running your game on a machine. This is completely different to seeing if your game is playable on X hardware, it is more about tracking the change in performance as your game engine/framework changes. As mentioned in one of the comments you could easily test "how many nodes can I insert/remove/update within QuadTreeA within 2 seconds", but you have to physically look at these results every time to see if it has changed, which may be fine and is still better than just relying on playing it to see if you notice any difference between version. However if you were to put an Assert in to notify you of a fail if it goes lower than lets say 5000 in 2 seconds you have a brittle test as it is then contextual to the hardware, not just the implementation. Although that being said these sort of automated tests are only really any use if you are running your tests as some sort of build pipeline i.e: Checkout - Run Unit Tests - Run Integration Tests - Run Performance Tests - Package So then you can easily compare the stats from one build to another on the CI server as a report of some sort, and again this may not mean much to anyone if you are not used to Continuous Integration. The main crux of this question is to see how people manage this between builds and how they find it best to report upon. As I said it can be subjective but as knowledge will be gained from the answers it seems a worthwhile question.

Read the article
Innovative SPARC: Lighting a Fire Under Oracle's New Hardware Business

- by Paulo Folgado

"There's a certain level of things you can do with commercially available parts," says Oracle Executive Vice President Mike Splain. But, he notes, you can do so much more if you design the parts yourself. Mike Splain,EVP, OracleYou can, for example, design cryptographic accelerators into your microprocessors so customers can run their networks fully encrypted if they choose.Of course, it helps if you've already built multiple processing "cores" into those chips so they can handle all that encrypting and decrypting while still getting their other work done.System on a ChipAs the leader of Oracle Microelectronics, Mike knows how implementing clever innovations in silicon can give systems a real competitive advantage.The SPARC microprocessors that his team designed at Sun pioneered the concept of multiple cores several years ago, and the UltraSPARC T2 processor--the industry's first "system on a chip"--packs up to eight cores per chip, each running as many as eight threads at once. That's the most cores and threads of any general-purpose processor. Looking back, Mike points out that the real value of large enterprise-class servers was their ability to run a lot of very large applications in parallel."The beauty of our CMT [chip multi-threading] machines is you can get that same kind of parallel-processing capability at a much lower cost and in a much smaller footprint," he says.The Whole StackWhat has Mike excited these days is that suddenly the opportunity to innovate is much bigger as part of Oracle."In my group, we used to look up the software stack and say, 'We can do any innovation we want, provided the only thing we have to change is what's in the Solaris operating system'--or maybe Java," he says. "If we wanted to change things beyond that, we'd have to go outside the walls of Sun and we'd have to convince the vendors: 'You have to align with us, you have to test with us, you have to build for us, and then you'll reap the benefits.' Now we get access to the entire stack. We can look all the way through the stack and say, 'Okay, what would make the database go faster? What would make the middleware go faster?'"Changing the WorldMike and his microelectronics team also like the fact that Oracle is not just any software company. We're #1 in database, middleware, business intelligence, and more."We're like all the other engineers from Sun; we believe we can change the world, if we can just figure out how to get people to pay attention to us," he says. "Now there's a mechanism at Oracle--much more so than we ever had at Sun."He notes, too, that every innovation in SPARC has involved some combination of hardware and softwareoptimization."Take our cryptography framework, for example. Sure, we can accelerate rapidly, but the Solaris OS has to provide the right set of interfaces that applications can tap into," Mike says. "Same thing with our multicore architecture. We have to have software that can utilize all those threads and run in parallel." His engineers, he points out, have never been interested in producing chips that sell as mere components."Our chips are always designed to go into systems and be combined with various pieces of software," he says. "Our job is to enable the creation of systems."

Read the article
Oracle Data Integration 12c: Simplified, Future-Ready, High-Performance Solutions

- by Thanos Terentes Printzios

In today’s data-driven business environment, organizations need to cost-effectively manage the ever-growing streams of information originating both inside and outside the firewall and address emerging deployment styles like cloud, big data analytics, and real-time replication. Oracle Data Integration delivers pervasive and continuous access to timely and trusted data across heterogeneous systems. Oracle is enhancing its data integration offering announcing the general availability of 12c release for the key data integration products: Oracle Data Integrator 12c and Oracle GoldenGate 12c, delivering Simplified and High-Performance Solutions for Cloud, Big Data Analytics, and Real-Time Replication. The new release delivers extreme performance, increase IT productivity, and simplify deployment, while helping IT organizations to keep pace with new data-oriented technology trends including cloud computing, big data analytics, real-time business intelligence. With the 12c release Oracle becomes the new leader in the data integration and replication technologies as no other vendor offers such a complete set of data integration capabilities for pervasive, continuous access to trusted data across Oracle platforms as well as third-party systems and applications. Oracle Data Integration 12c release addresses data-driven organizations’ critical and evolving data integration requirements under 3 key themes: Future-Ready Solutions : Supporting Current and Emerging Initiatives Extreme Performance : Even higher performance than ever before Fast Time-to-Value : Higher IT Productivity and Simplified Solutions With the new capabilities in Oracle Data Integrator 12c, customers can benefit from: Superior developer productivity, ease of use, and rapid time-to-market with the new flow-based mapping model, reusable mappings, and step-by-step debugger. Increased performance when executing data integration processes due to improved parallelism. Improved productivity and monitoring via tighter integration with Oracle GoldenGate 12c and Oracle Enterprise Manager 12c. Improved interoperability with Oracle Warehouse Builder which enables faster and easier migration to Oracle Data Integrator’s strategic data integration offering. Faster implementation of business analytics through Oracle Data Integrator pre-integrated with Oracle BI Applications’ latest release. Oracle Data Integrator also integrates simply and easily with Oracle Business Analytics tools, including OBI-EE and Oracle Hyperion. Support for loading and transforming big and fast data, enabled by integration with big data technologies: Hadoop, Hive, HDFS, and Oracle Big Data Appliance. Only Oracle GoldenGate provides the best-of-breed real-time replication of data in heterogeneous data environments. With the new capabilities in Oracle GoldenGate 12c, customers can benefit from: Simplified setup and management of Oracle GoldenGate 12c when using multiple database delivery processes via a new Coordinated Delivery feature for non-Oracle databases. Expanded heterogeneity through added support for the latest versions of major databases such as Sybase ASE v 15.7, MySQL NDB Clusters 7.2, and MySQL 5.6., as well as integration with Oracle Coherence. Enhanced high availability and data protection via integration with Oracle Data Guard and Fast-Start Failover integration. Enhanced security for credentials and encryption keys using Oracle Wallet. Real-time replication for databases hosted on public cloud environments supported by third-party clouds. Tight integration between Oracle Data Integrator 12c and Oracle GoldenGate 12c and other Oracle technologies, such as Oracle Database 12c and Oracle Applications, provides a number of benefits for organizations: Tight integration between Oracle Data Integrator 12c and Oracle GoldenGate 12c enables developers to leverage Oracle GoldenGate’s low overhead, real-time change data capture completely within the Oracle Data Integrator Studio without additional training. Integration with Oracle Database 12c provides a strong foundation for seamless private cloud deployments. Delivers real-time data for reporting, zero downtime migration, and improved performance and availability for Oracle Applications, such as Oracle E-Business Suite and ATG Web Commerce . Oracle’s data integration offering is optimized for Oracle Engineered Systems and is an integral part of Oracle’s fast data, real-time analytics strategy on Oracle Exadata Database Machine and Oracle Exalytics In-Memory Machine. Oracle Data Integrator 12c and Oracle GoldenGate 12c differentiate the new offering on data integration with these many new features. This is just a quick glimpse into Oracle Data Integrator 12c and Oracle GoldenGate 12c. Find out much more about the new release in the video webcast "Introducing 12c for Oracle Data Integration", where customer and partner speakers, including SolarWorld, BT, Rittman Mead will join us in launching the new release. Resource Kits Meet Oracle Data Integration 12c Discover what's new with Oracle Goldengate 12c Oracle EMEA DIS (Data Integration Solutions) Partner Community is available for all your questions, while additional partner focused webcasts will be made available through our blog here, so stay connected. For any questions please contact us at partner.imc-AT-beehiveonline.oracle-DOT-com Stay Connected Oracle Newsletters

Read the article
Fast Data: Go Big. Go Fast.

- by Dain C. Hansen

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 For those of you who may have missed it, today’s second full day of Oracle OpenWorld 2012 started with a rumpus. Joe Tucci, from EMC outlined the human face of big data with real examples of how big data is transforming our world. And no not the usual tried-and-true weblog examples, but real stories about taxi cab drivers in Singapore using big data to better optimize their routes as well as folks just trying to get a better hair cut. Next we heard from Thomas Kurian who talked at length about the important platform characteristics of Oracle’s Cloud and more specifically Oracle’s expanded Cloud Services portfolio. Especially interesting to our integration customers are the messaging support for Oracle’s Cloud applications. What this means is that now Oracle’s Cloud applications have a lightweight integration fabric that on-premise applications can communicate to it via REST-APIs using Oracle SOA Suite. It’s an important element to our strategy at Oracle that supports this idea that whether your requirements are for private or public, Oracle has a solution in the Cloud for all of your applications and we give you more deployment choice than any vendor. If this wasn’t enough to get the juices flowing, later that morning we heard from Hasan Rizvi who outlined in his Fusion Middleware session the four most important enterprise imperatives: Social, Mobile, Cloud, and a brand new one: Fast Data. Today, Rizvi made an important step in the definition of this term to explain that he believes it’s a convergence of four essential technology elements: Event Processing for event filtering, business rules – with Oracle Event Processing Data Transformation and Loading - with Oracle Data Integrator Real-time replication and integration – with Oracle GoldenGate Analytics and data discovery – with Oracle Business Intelligence Each of these four elements can be considered (and architect-ed) together on a single integrated platform that can help customers integrate any type of data (structured, semi-structured) leveraging new styles of big data technologies (MapReduce, HDFS, Hive, NoSQL) to process more volume and variety of data at a faster velocity with greater results. Fast data processing (and especially real-time) has always been our credo at Oracle with each one of these products in Fusion Middleware. For example, Oracle GoldenGate continues to be made even faster with the recent 11g R2 Release of Oracle GoldenGate which gives us some even greater optimization to Oracle Database with Integrated Capture, as well as some new heterogeneity capabilities. With Oracle Data Integrator with Big Data Connectors, we’re seeing much improved performance by running MapReduce transformations natively on Hadoop systems. And with Oracle Event Processing we’re seeing some remarkable performance with customers like NTT Docomo. Check out their upcoming session at Oracle OpenWorld on Wednesday to hear more how this customer is using Event processing and Big Data together. If you missed any of these sessions and keynotes, not to worry. There's on-demand versions available on the Oracle OpenWorld website. You can also checkout our upcoming webcast where we will outline some of these new breakthroughs in Data Integration technologies for Big Data, Cloud, and Real-time in more details. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";}

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
Silverlight Cream for February 17, 2011 -- #1048

- by Dave Campbell

In this Issue: Oren Gal, Andrea Boschin(-2-), Kevin Hoffman, Rudi Grobler(-2-, -3-), Michael Crump, Yochay Kiriaty, Peter Kuhn, Loek van den Ouweland, Jeremy Likness, Jesse Liberty, and WindowsPhoneGeek. Above the Fold: Silverlight: "Multiple page printing in Silverlight4 - Part 2 - preview before printing" Oren Gal WP7: "Windows Phone 7 Tombstoning with MVVM and Sterling" Jeremy Likness XNA: "XNA for Silverlight developers: Part 4 - Animation (frame-based)" Peter Kuhn From SilverlightCream.com: Multiple page printing in Silverlight4 - Part 2 - preview before printing Oren Gal has part 2 of his Printing with Silverlight 4 series up, and this time he's putting up a preview... how cool is that? Inject ApplicationServices with MEF reloaded: supporting recomposition Andrea Boschin revisited his Inject ApplicationServices with MEF post because of feedback, and took it from the realm of an interesting example to a useful solution. Windows Phone 7 - Part #5: Panorama and Pivot controls Andrea Boschin also has part 5 of his WP7 series up at SilverlightShow... want a good demo of both the panorama and the pivot controls... here it is all in one tutorial WP7 for iPhone and Android Developers - Introduction to C# This should be good.. a 12-part series on SilverlightShow by Kevin Hoffman on porting your iPhone/Android app to WP7... this first part an intro to C# Balls of Steel Rudi Grobler discusses the upcoming (?) release of 'Duke Nukem Forever', and has a 'soundboard' for WP7 to celebrate the event... get your Duke Nukem on with these sounds! Moonlight 4 (Preview) is here Rudi Grobler also has a post up about the release of Moonlight by Novel for Silverlight 4!... explanation and links on his post. WP7 Podcasts Rudi Grobler highlights two WP7 Podcasts that are putting out good material... check them out if you haven't already. Having Fun with Coding4Fun’s Windows Phone 7 Controls Michael Crump takes a look at his WP7 app and uses the Coding4Fun project toolset while doing so... getting the tools, setting them up, and consuming them. Windows Phone Silverlight Application Faster Load Time Yochay Kiriaty has a good long discussion up about how to get faster load time out of your WP7 apps... good useful external links throughout. XNA for Silverlight developers: Part 4 - Animation (frame-based) Peter Kuhn's part 4 of his XNA for Silverlight devs is up at Silverlightshow and is a great tutorial on frame-based animation. Windows Phone SoundEffect clipping Loek van den Ouweland has some good information about soudn clips on WP7... the solutions aren't always code solutions.... good to know info. Windows Phone 7 Tombstoning with MVVM and Sterling Jeremy Likness is discussing Tombstoning via MVVM and Sterling... read on how Sterling gives you a leg up on the Tombstone express. Video: Reactive Phone Programming For Windows Phone 7 Fitting in nicely with his podcast on Reactive Programming, Jesse Liberty releases a video on Reactive Programming for WP7. Talking about Data Binding in WP7 | Coding4fun TextBoxBinding helper in depth WindowsPhoneGeek's latest post walks through WP7 databinding in detail with lots of good external links, then follows up with a discussion of the Coding4Fun Binding Helpers Stay in the 'Light! Twitter SilverlightNews | Twitter WynApse | WynApse.com | Tagged Posts | SilverlightCream Join me @ SilverlightCream | Phoenix Silverlight User Group Technorati Tags: Silverlight Silverlight 3 Silverlight 4 Windows Phone MIX10

Read the article
Bad Spot to Be In: Playing Catch-up with Mobile Advertising

- by Mike Stiles

You probably noticed, there’s a mass migration going on from online desktop/laptop usage to smartphone/tablet usage. It’s an indicator of how we live our lives in the modern world: always on the go, with no intention of being disconnected while out there. Consequently, paid as it relates to mobile advertising is taking the social spotlight. eMarketer estimated that in 2013, US adults would spend about 2 hours, 21 minutes a day on mobile, not counting talking time. More people in the world own smartphones than own toothbrushes (bad news I suppose if you’re marketing toothpaste). They’re using those mobile devices to access social networks, consuming at least 17% of their mobile time on them. Frankly, you don’t need a deep dive into mobile usage stats to know what’s going on. Just look around you in any store, venue or coffee shop. It’s really obvious…our mobile devices are now where we “are,” so that’s where marketers can increasingly reach us. And it’s a smart place for them to do just that. Mobile devices can be viewed more and more as shopping facilitators. Usually when someone is on mobile, they are not in passive research mode. They are likely standing near a store or in front of a product, using their mobile to seek reassurance that buying that product is the right move. They are the hottest of hot prospects. Consider that 4 out of 5 consumers use smartphones to shop, 52% of Americans use mobile devices for in-store for research, 70% of mobile searches lead to online action inside of an hour, and people that find you on mobile convert at almost 3x the rate as those that find you on desktop or laptop. But what are marketers doing? Enter statistics from Mary Meeker’s latest State of the Internet report. Common sense says you buy advertising where people are spending their eyeball time, right? But while mobile is 20% of media use and rising, the ad spend there is 4%. Conversely, while print usage is at 5% and falling, ad spend there is 19%. We all love nostalgia, but come on. There are reasons marketing dollar migration to mobile has not matched user migration, including the availability of mobile ad products and the ability to measure user response to mobile ads. But interesting things are happening now. First came Facebook’s mobile ad, which let app developers pay to get potential downloads. Then their mobile ad network was announced at F8, allowing marketers to target users across non-Facebook apps while leveraging the wealth of diverse data Facebook has on those users, a big deal since Nielsen has pointed out mobile apps make up 89% of the media time spent on mobile. Twitter has a similar play in motion with their MoPub acquisition. And now mobile deeplinks have arrived, which can take users straight to sub-pages of mobile apps for a faster, more direct shopper/researcher user experience. The sooner the gratification, the smoother and faster the conversion. To be clear, growth in mobile ad spending is well underway. After posting $13.1 billion in 2013, Gartner expects global mobile ad spending to reach $18 billion this year, then go to $41.9 billion by 2017. Cheap smartphones and data plans are spreading worldwide, further fueling the shift to mobile. Mobile usage in India alone should grow 400% by 2018. And, of course, there’s the famous statistic that mobile should overtake desktop Internet usage this year. How can we as marketers mess up this opportunity? Two ways. We could position ourselves in perpetual “catch-up” mode and keep spending ad dollars where the public used to be. And we could annoy mobile users with horrid old-school marketing practices. Two-thirds of users told Forrester they think interruptive in-app ads are more annoying than TV ads. Make sure your brand’s social marketing technology platform is delivering a crystal clear picture of your social connections so the mobile touch point is highly relevant, mobile optimized, and delivering real value and satisfying experiences. Otherwise, all we’ve done is find a new way to be unwanted. @mikestiles @oraclesocialPhoto: Kate Mallatratt, freeimages.com

Read the article
What's New in Oracle's EPM System?

- by jmorourke

Oracle’s EPM System R11.1.2.2 is now generally available to customers and partners on the download center. Although the release number doesn’t sound significant, this is a major release of Oracle’s Hyperion EPM Suite with new modules as well as significant enhancements across the suite. This release was announced back on April 4th as part of Oracle’s Business Analytics Strategy launch, so analytics is a key aspect of the release. But the three biggest pieces of news in this release are Oracle Hyperion Planning support for the Exalytics In-Memory Machine, the new Project Financial Planning Application and the new Account Reconciliations Manager module. The Oracle Exalytics In-Memory Machine was announced back in October 2011, at Oracle OpenWorld. It’s the latest installment from Oracle in a line of engineered systems that combine Oracle Sun hardware, with Oracle database and application technologies – in solutions that are designed to provide high scalability and performance for specific tasks. Exalytics is the first engineered system specifically designed for high performance analytics. Running in-memory versions of Oracle Essbase, as well as the Oracle TimesTen database and Oracle BI tools, Exalytics provides speed of thought response times for complex analytic processes with advanced visualizations. Early adopter customers have achieved 5X to 100X faster interactivity and 6X to 10X faster planning cycles. Hyperion Planning running with Oracle Exalytics will support enterprise-wide planning, budgeting and forecasting with more detailed data, with hundreds to thousands of users across an organization getting speed of thought performance. The new Hyperion Project Financial Planning application delivered with EPM 11.1.2.2 is also great news for Oracle customers. This application follows on the heels of other special-purpose planning applications that Oracle has delivered for Workforce and Capital Asset planning. It allows Project Managers to identify project-related expenses and revenues, plan and propose new projects, and track results over time. Finance Managers can evaluate and compare different projects, manage the funding process, monitor and report the actual financial results and impacts of projects and project portfolios. This new application is applicable to capital projects, contract projects and indirect projects like IT and HR projects across all industries. This application is a great complement to existing Project Management applications, and helps bridge the gap between these applications, and the financial planning and budgeting process. Account reconciliations has to be one of the biggest bottlenecks and risks in the financial close and reporting process, and many organizations rely on spreadsheets and manual processes to perform this critical process. To help address this problem, Oracle developed an Account Reconciliation Manager module that is being delivered as part of Oracle Hyperion Financial Close Management. This module helps automate and streamline account reconciliations and eliminates the chances for errors, omissions and fraud. But unlike standalone account reconciliation packages, it’s integrated with the rest of the Oracle Hyperion Financial Close suite, and can integrate balances from any source system. This can help alleviate a major bottleneck in the financial close process, increase accuracy and reduce risk, and can complement existing investments in Hyperion Financial Management, as well as Oracle and non-Oracle transaction processing systems. Other enhancements in this release include an enhanced Web 2.0 interface for Hyperion Planning and Hyperion Financial Management (HFM), configurable dimensionality in HFM, new Predictive Planning feature in Hyperion Planning, new Detailed Profitability feature in Hyperion Profitability and Cost Management, new Smart View interface for Hyperion Strategic Finance, and integration of the Hyperion applications with JD Edwards Financials. For more information about Oracle EPM System R11.1.2.2 check out the links below: Press Release: http://www.oracle.com/us/corporate/press/1575775 Product Information on O.com: http://www.oracle.com/us/solutions/business-analytics/overview/index.html Product Information on OTN: http://www.oracle.com/technetwork/middleware/epm/downloads/index.html Webcast Replay: http://www.oracle.com/us/go/index.html?Src=7317510&Act=65&pcode=WWMK11054701MPP046 Please contact me if you have any questions or need additional information – [email protected]

Read the article
The theory of evolution applied to software

- by Michel Grootjans

I recently realized the many parallels you can draw between the theory of evolution and evolving software. Evolution is not the proverbial million monkeys typing on a million typewriters, where one of them comes up with the complete works of Shakespeare. We would have noticed by now, since the proverbial monkeys are now blogging on the Internet ;-) One of the main ideas of the theory of evolution is the balance between random mutations and natural selection. Random mutations happen all the time: millions of mutations over millions of years. Most of them are totally useless. Some of them are beneficial to the evolved species. Natural selection favors the beneficially mutated species. Less beneficial mutations die off. The mutated rabbit doesn't have to be faster than the fox. It just has to be faster than the other rabbits. Theory of evolution Evolving software Random mutations happen all the time. Most of these mutations are so bad, the new species dies off, or cannot reproduce. Developers write new code all the time. New ideas come up during the act of writing software. The really bad ones don't get past the stage of idea. The bad ones don't get committed to source control. Natural selection favors the beneficial mutated species Good ideas and new code gets discussed in group during informal peer review. Less than good code gets refactored. Enhanced code makes it more readable, maintainable... A good set of traits makes the species superior to others. It becomes widespread A good design tends to make it easier to add new features, easier to understand the current implementations, easier to optimize for performance...thus superior. The best designs get carried over from project to project. They appear in blogs, articles and books about principles, patterns and practices. Of course the act of writing software is deliberate. This can hardly be called random mutations. Though it sometimes might seem that code evolves through a will of its own ;-) Does this mean that evolving software (evolution) is better than a big design up front (creationism)? Not necessarily. It's a false idea to think that a project starts from scratch and everything evolves from there. Everyone carries his experience of what works and what doesn't. Up front design is necessary, but is best kept simple and minimal, just enough to get you started. Let the good experiences and ideas help to drive the process, whether they come from you or from others, from past experience or from the most junior developer on your team. Once again, balance is the keyword. Balance design up front with evolution on a daily basis. How do you know what balance is right? Through your own experience of what worked and what didn't (here's evolution again). Notes: The evolution of software can quickly degenerate without discipline. TDD is a discipline that leaves little to chance on that part. Write your test to describe the new behavior. Write just enough code to make it behave as specified. Refactor to evolve the code to a higher standard. The responsibility of good design rests continuously on each developers' shoulders. Promiscuous pair programming helps quickly spreading the design to the whole team.

Read the article
Windows Azure Recipe: Social Web / Big Media

- by Clint Edmonson

With the rise of social media there’s been an explosion of special interest media web sites on the web. From athletics to board games to funny animal behaviors, you can bet there’s a group of people somewhere on the web talking about it. Social media sites allow us to interact, share experiences, and bond with like minded enthusiasts around the globe. And through the power of software, we can follow trends in these unique domains in real time. Drivers Reach Scalability Media hosting Global distribution Solution Here’s a sketch of how a social media application might be built out on Windows Azure: Ingredients Traffic Manager (optional) – can be used to provide hosting and load balancing across different instances and/or data centers. Perfect if the solution needs to be delivered to different cultures or regions around the world. Access Control – this service is essential to managing user identity. It’s backed by a full blown implementation of Active Directory and allows the definition and management of users, groups, and roles. A pre-built ASP.NET membership provider is included in the training kit to leverage this capability but it’s also flexible enough to be combined with external Identity providers including Windows LiveID, Google, Yahoo!, and Facebook. The provider model has extensibility points to hook into other identity providers as well. Web Role – hosts the core of the web application and presents a central social hub users. Database – used to store core operational, functional, and workflow data for the solution’s web services. Caching (optional) – as a web site traffic grows caching can be leveraged to keep frequently used read-only, user specific, and application resource data in a high-speed distributed in-memory for faster response times and ultimately higher scalability without spinning up more web and worker roles. It includes a token based security model that works alongside the Access Control service. Tables (optional) – for semi-structured data streams that don’t need relational integrity such as conversations, comments, or activity streams, tables provide a faster and more flexible way to store this kind of historical data. Blobs (optional) – users may be creating or uploading large volumes of heterogeneous data such as documents or rich media. Blob storage provides a scalable, resilient way to store terabytes of user data. The storage facilities can also integrate with the Access Control service to ensure users’ data is delivered securely. Content Delivery Network (CDN) (optional) – for sites that service users around the globe, the CDN is an extension to blob storage that, when enabled, will automatically cache frequently accessed blobs and static site content at edge data centers around the world. The data can be delivered statically or streamed in the case of rich media content. Training These links point to online Windows Azure training labs and resources where you can learn more about the individual ingredients described above. (Note: The entire Windows Azure Training Kit can also be downloaded for offline use.) Windows Azure (16 labs) Windows Azure is an internet-scale cloud computing and services platform hosted in Microsoft data centers, which provides an operating system and a set of developer services which can be used individually or together. It gives developers the choice to build web applications; applications running on connected devices, PCs, or servers; or hybrid solutions offering the best of both worlds. New or enhanced applications can be built using existing skills with the Visual Studio development environment and the .NET Framework. With its standards-based and interoperable approach, the services platform supports multiple internet protocols, including HTTP, REST, SOAP, and plain XML SQL Azure (7 labs) Microsoft SQL Azure delivers on the Microsoft Data Platform vision of extending the SQL Server capabilities to the cloud as web-based services, enabling you to store structured, semi-structured, and unstructured data. Windows Azure Services (9 labs) As applications collaborate across organizational boundaries, ensuring secure transactions across disparate security domains is crucial but difficult to implement. Windows Azure Services provides hosted authentication and access control using powerful, secure, standards-based infrastructure. See my Windows Azure Resource Guide for more guidance on how to get started, including links web portals, training kits, samples, and blogs related to Windows Azure.

Read the article
HOWTO Turn off SPARC T4 or Intel AES-NI crypto acceleration.

- by darrenm

Since we released hardware crypto acceleration for SPARC T4 and Intel AES-NI support we have had a common question come up: 'How do I test without the hardware crypto acceleration?'. Initially this came up just for development use so developers can do unit testing on a machine that has hardware offload but still cover the code paths for a machine that doesn't (our integration and release testing would run on all supported types of hardware anyway). I've also seen it asked in a customer context too so that we can show that there is a performance gain from the hardware crypto acceleration, (not just the fact that SPARC T4 much faster performing processor than T3) and measure what it is for their application. With SPARC T2/T3 we could easily disable the hardware crypto offload by running 'cryptoadm disable provider=n2cp/0'. We can't do that with SPARC T4 or with Intel AES-NI because in both of those classes of processor the encryption doesn't require a device driver instead it is unprivileged user land callable instructions. Turns out there is away to do this by using features of the Solaris runtime loader (ld.so.1). First I need to expose a little bit of implementation detail about how the Solaris Cryptographic Framework is implemented in Solaris 11. One of the new Solaris 11 features of the linker/loader is the ability to have a single ELF object that has multiple different implementations of the same functions that are selected at runtime based on the capabilities of the machine. The alternate to this is having the application coded to call getisax() and make the choice itself. We use this functionality of the linker/loader when we build the userland libraries for the Solaris Cryptographic Framework (specifically libmd.so, and the unfortunately misnamed due to historical reasons libsoftcrypto.so) The Solaris linker/loader allows control of a lot of its functionality via environment variables, we can use that to control the version of the cryptographic functions we run. To do this we simply export the LD_HWCAP environment variable with values that tell ld.so.1 to not select the HWCAP section matching certain features even if isainfo says they are present. For SPARC T4 that would be: export LD_HWCAP="-aes -des -md5 -sha256 -sha512 -mont -mpul" and for Intel systems with AES-NI support: export LD_HWCAP="-aes" This will work for consumers of the Solaris Cryptographic Framework that use the Solaris PKCS#11 libraries or use libmd.so interfaces directly. It also works for the Oracle DB and Java JCE. However does not work for the default enabled OpenSSL "t4" or "aes-ni" engines (unfortunately) because they do explicit calls to getisax() themselves rather than using multiple ELF cap sections. However we can still use OpenSSL to demonstrate this by explicitly selecting "pkcs11" engine using only a single process and thread. $ openssl speed -engine pkcs11 -evp aes-128-cbc ... type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-128-cbc 54170.81k 187416.00k 489725.70k 805445.63k 1018880.00k $ LD_HWCAP="-aes" openssl speed -engine pkcs11 -evp aes-128-cbc ... type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-128-cbc 29376.37k 58328.13k 79031.55k 86738.26k 89191.77k We can clearly see the difference this makes in the case where AES offload to the SPARC T4 was disabled. The "t4" engine is faster than the pkcs11 one because there is less overhead (again on a SPARC T4-1 using only a single process/thread - using -multi you will get even bigger numbers). $ openssl speed -evp aes-128-cbc ... type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-128-cbc 85526.61k 89298.84k 91970.30k 92662.78k 92842.67k Yet another cool feature of the Solaris linker/loader, thanks Rod and Ali. Note these above openssl speed output is not intended to show the actual performance of any particular benchmark just that there is a significant improvement from using hardware acceleration on SPARC T4. For cryptographic performance benchmarks see the http://blogs.oracle.com/BestPerf/ postings.

Read the article

Search Results

Search found 4580 results on 184 pages for 'faster'.

Page 45/184 | < Previous Page | 41 42 43 44 45 46 47 48 49 50 51 52 | Next Page >

- by James

- by user1811864

- by jbcurtin

- by rjnagle

- by pancake

- by MusiGenesis

- by cripox

- by Paul White NZ

- by Phil Factor

- by Mysticgeek

- by Jalpesh P. Vadgama

- by Grofit

- by Paulo Folgado

- by Thanos Terentes Printzios

- by Dain C. Hansen

- by rchrd

- by Dave Campbell

- by Mike Stiles

- by jmorourke

- by Michel Grootjans

- by Clint Edmonson

- by darrenm

< Previous Page | 41 42 43 44 45 46 47 48 49 50 51 52 | Next Page >