Search Results

Search found 1058 results on 43 pages for 'compute'.

Page 42/43 | < Previous Page | 38 39 40 41 42 43  | Next Page >

  • When is a SQL function not a function?

    - by Rob Farley
    Should SQL Server even have functions? (Oh yeah – this is a T-SQL Tuesday post, hosted this month by Brad Schulz) Functions serve an important part of programming, in almost any language. A function is a piece of code that is designed to return something, as opposed to a piece of code which isn’t designed to return anything (which is known as a procedure). SQL Server is no different. You can call stored procedures, even from within other stored procedures, and you can call functions and use these in other queries. Stored procedures might query something, and therefore ‘return data’, but a function in SQL is considered to have the type of the thing returned, and can be used accordingly in queries. Consider the internal GETDATE() function. SELECT GETDATE(), SomeDatetimeColumn FROM dbo.SomeTable; There’s no logical difference between the field that is being returned by the function and the field that’s being returned by the table column. Both are the datetime field – if you didn’t have inside knowledge, you wouldn’t necessarily be able to tell which was which. And so as developers, we find ourselves wanting to create functions that return all kinds of things – functions which look up values based on codes, functions which do string manipulation, and so on. But it’s rubbish. Ok, it’s not all rubbish, but it mostly is. And this isn’t even considering the SARGability impact. It’s far more significant than that. (When I say the SARGability aspect, I mean “because you’re unlikely to have an index on the result of some function that’s applied to a column, so try to invert the function and query the column in an unchanged manner”) I’m going to consider the three main types of user-defined functions in SQL Server: Scalar Inline Table-Valued Multi-statement Table-Valued I could also look at user-defined CLR functions, including aggregate functions, but not today. I figure that most people don’t tend to get around to doing CLR functions, and I’m going to focus on the T-SQL-based user-defined functions. Most people split these types of function up into two types. So do I. Except that most people pick them based on ‘scalar or table-valued’. I’d rather go with ‘inline or not’. If it’s not inline, it’s rubbish. It really is. Let’s start by considering the two kinds of table-valued function, and compare them. These functions are going to return the sales for a particular salesperson in a particular year, from the AdventureWorks database. CREATE FUNCTION dbo.FetchSales_inline(@salespersonid int, @orderyear int) RETURNS TABLE AS  RETURN (     SELECT e.LoginID as EmployeeLogin, o.OrderDate, o.SalesOrderID     FROM Sales.SalesOrderHeader AS o     LEFT JOIN HumanResources.Employee AS e     ON e.EmployeeID = o.SalesPersonID     WHERE o.SalesPersonID = @salespersonid     AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')     AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ) ; GO CREATE FUNCTION dbo.FetchSales_multi(@salespersonid int, @orderyear int) RETURNS @results TABLE (     EmployeeLogin nvarchar(512),     OrderDate datetime,     SalesOrderID int     ) AS BEGIN     INSERT @results (EmployeeLogin, OrderDate, SalesOrderID)     SELECT e.LoginID, o.OrderDate, o.SalesOrderID     FROM Sales.SalesOrderHeader AS o     LEFT JOIN HumanResources.Employee AS e     ON e.EmployeeID = o.SalesPersonID     WHERE o.SalesPersonID = @salespersonid     AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')     AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')     ;     RETURN END ; GO You’ll notice that I’m being nice and responsible with the use of the DATEADD function, so that I have SARGability on the OrderDate filter. Regular readers will be hoping I’ll show what’s going on in the execution plans here. Here I’ve run two SELECT * queries with the “Show Actual Execution Plan” option turned on. Notice that the ‘Query cost’ of the multi-statement version is just 2% of the ‘Batch cost’. But also notice there’s trickery going on. And it’s nothing to do with that extra index that I have on the OrderDate column. Trickery. Look at it – clearly, the first plan is showing us what’s going on inside the function, but the second one isn’t. The second one is blindly running the function, and then scanning the results. There’s a Sequence operator which is calling the TVF operator, and then calling a Table Scan to get the results of that function for the SELECT operator. But surely it still has to do all the work that the first one is doing... To see what’s actually going on, let’s look at the Estimated plan. Now, we see the same plans (almost) that we saw in the Actuals, but we have an extra one – the one that was used for the TVF. Here’s where we see the inner workings of it. You’ll probably recognise the right-hand side of the TVF’s plan as looking very similar to the first plan – but it’s now being called by a stack of other operators, including an INSERT statement to be able to populate the table variable that the multi-statement TVF requires. And the cost of the TVF is 57% of the batch! But it gets worse. Let’s consider what happens if we don’t need all the columns. We’ll leave out the EmployeeLogin column. Here, we see that the inline function call has been simplified down. It doesn’t need the Employee table. The join is redundant and has been eliminated from the plan, making it even cheaper. But the multi-statement plan runs the whole thing as before, only removing the extra column when the Table Scan is performed. A multi-statement function is a lot more powerful than an inline one. An inline function can only be the result of a single sub-query. It’s essentially the same as a parameterised view, because views demonstrate this same behaviour of extracting the definition of the view and using it in the outer query. A multi-statement function is clearly more powerful because it can contain far more complex logic. But a multi-statement function isn’t really a function at all. It’s a stored procedure. It’s wrapped up like a function, but behaves like a stored procedure. It would be completely unreasonable to expect that a stored procedure could be simplified down to recognise that not all the columns might be needed, but yet this is part of the pain associated with this procedural function situation. The biggest clue that a multi-statement function is more like a stored procedure than a function is the “BEGIN” and “END” statements that surround the code. If you try to create a multi-statement function without these statements, you’ll get an error – they are very much required. When I used to present on this kind of thing, I even used to call it “The Dangers of BEGIN and END”, and yes, I’ve written about this type of thing before in a similarly-named post over at my old blog. Now how about scalar functions... Suppose we wanted a scalar function to return the count of these. CREATE FUNCTION dbo.FetchSales_scalar(@salespersonid int, @orderyear int) RETURNS int AS BEGIN     RETURN (         SELECT COUNT(*)         FROM Sales.SalesOrderHeader AS o         LEFT JOIN HumanResources.Employee AS e         ON e.EmployeeID = o.SalesPersonID         WHERE o.SalesPersonID = @salespersonid         AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')         AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')     ); END ; GO Notice the evil words? They’re required. Try to remove them, you just get an error. That’s right – any scalar function is procedural, despite the fact that you wrap up a sub-query inside that RETURN statement. It’s as ugly as anything. Hopefully this will change in future versions. Let’s have a look at how this is reflected in an execution plan. Here’s a query, its Actual plan, and its Estimated plan: SELECT e.LoginID, y.year, dbo.FetchSales_scalar(p.SalesPersonID, y.year) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; We see here that the cost of the scalar function is about twice that of the outer query. Nicely, the query optimizer has worked out that it doesn’t need the Employee table, but that’s a bit of a red herring here. There’s actually something way more significant going on. If I look at the properties of that UDF operator, it tells me that the Estimated Subtree Cost is 0.337999. If I just run the query SELECT dbo.FetchSales_scalar(281,2003); we see that the UDF cost is still unchanged. You see, this 0.0337999 is the cost of running the scalar function ONCE. But when we ran that query with the CROSS JOIN in it, we returned quite a few rows. 68 in fact. Could’ve been a lot more, if we’d had more salespeople or more years. And so we come to the biggest problem. This procedure (I don’t want to call it a function) is getting called 68 times – each one between twice as expensive as the outer query. And because it’s calling it in a separate context, there is even more overhead that I haven’t considered here. The cheek of it, to say that the Compute Scalar operator here costs 0%! I know a number of IT projects that could’ve used that kind of costing method, but that’s another story that I’m not going to go into here. Let’s look at a better way. Suppose our scalar function had been implemented as an inline one. Then it could have been expanded out like a sub-query. It could’ve run something like this: SELECT e.LoginID, y.year, (SELECT COUNT(*)     FROM Sales.SalesOrderHeader AS o     LEFT JOIN HumanResources.Employee AS e     ON e.EmployeeID = o.SalesPersonID     WHERE o.SalesPersonID = p.SalesPersonID     AND o.OrderDate >= DATEADD(year,y.year-2000,'20000101')     AND o.OrderDate < DATEADD(year,y.year-2000+1,'20000101')     ) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; Don’t worry too much about the Scan of the SalesOrderHeader underneath a Nested Loop. If you remember from plenty of other posts on the matter, execution plans don’t push the data through. That Scan only runs once. The Index Spool sucks the data out of it and populates a structure that is used to feed the Stream Aggregate. The Index Spool operator gets called 68 times, but the Scan only once (the Number of Executions property demonstrates this). Here, the Query Optimizer has a full picture of what’s being asked, and can make the appropriate decision about how it accesses the data. It can simplify it down properly. To get this kind of behaviour from a function, we need it to be inline. But without inline scalar functions, we need to make our function be table-valued. Luckily, that’s ok. CREATE FUNCTION dbo.FetchSales_inline2(@salespersonid int, @orderyear int) RETURNS table AS RETURN (SELECT COUNT(*) as NumSales     FROM Sales.SalesOrderHeader AS o     LEFT JOIN HumanResources.Employee AS e     ON e.EmployeeID = o.SalesPersonID     WHERE o.SalesPersonID = @salespersonid     AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')     AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); GO But we can’t use this as a scalar. Instead, we need to use it with the APPLY operator. SELECT e.LoginID, y.year, n.NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID OUTER APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n; And now, we get the plan that we want for this query. All we’ve done is tell the function that it’s returning a table instead of a single value, and removed the BEGIN and END statements. We’ve had to name the column being returned, but what we’ve gained is an actual inline simplifiable function. And if we wanted it to return multiple columns, it could do that too. I really consider this function to be superior to the scalar function in every way. It does need to be handled differently in the outer query, but in many ways it’s a more elegant method there too. The function calls can be put amongst the FROM clause, where they can then be used in the WHERE or GROUP BY clauses without fear of calling the function multiple times (another horrible side effect of functions). So please. If you see BEGIN and END in a function, remember it’s not really a function, it’s a procedure. And then fix it. @rob_farley

    Read the article

  • CodePlex Daily Summary for Tuesday, April 10, 2012

    CodePlex Daily Summary for Tuesday, April 10, 2012Popular ReleasesSCCM Client Actions Tool: SCCM Client Actions Tool v1.12: SCCM Client Actions Tool v1.12 is the latest version. It comes with following changes since last version: Improved WMI date conversion to be aware of timezone differences and DST. Fixed new version check. The tool is downloadable as a ZIP file that contains four files: ClientActionsTool.hta – The tool itself. Cmdkey.exe – command line tool for managing cached credentials. This is needed for alternate credentials feature when running the HTA on Windows XP. Cmdkey.exe is natively availab...Dual Browsing: Dual Browser: Please note the following: I setup the address bar temporarily to only accepts http:// .com addresses. Just type in the name of the website excluding: http://, www., and .com; (Ex: for www.youtube.com just type: youtube then click OK). The page splitter can be grabbed by holding down your left mouse button and move left or right. By right clicking on the page background, you can choose to refresh, go back a page and so on. Demo video: http://youtu.be/L7NTFVM3JUYMultiwfn: Multiwfn 2.3.3: Multiwfn 2.3.3Liberty: v3.2.0.1 Release 9th April 2012: Change Log-Fixed -Reach Fixed a bug where the object editor did not work on non-English operating systemsStyleCop+: StyleCop+ 1.8: Built over StyleCop 4.7.17.0 According to http://stylecop.codeplex.com/workitem/7156, it should be the last version which is released without new features and only for compatibility reasons. Do not forget to Unblock the file after downloading (more details) Stay tuned!Path Copy Copy: 10.1: This release addresses the following work items: 11357 11358 11359 This release is a recommended upgrade, especially for users who didn't install the 10.0.1 version.ExtAspNet: ExtAspNet v3.1.3: ExtAspNet - ?? ExtJS ??? ASP.NET 2.0 ???,????? AJAX ?????????? ExtAspNet ????? ExtJS ??? ASP.NET 2.0 ???,????? AJAX ??????????。 ExtAspNet ??????? JavaScript,?? CSS,?? UpdatePanel,?? ViewState,?? WebServices ???????。 ??????: IE 7.0, Firefox 3.6, Chrome 3.0, Opera 10.5, Safari 3.0+ ????:Apache License 2.0 (Apache) ??:http://extasp.net/ ??:http://bbs.extasp.net/ ??:http://extaspnet.codeplex.com/ ??:http://sanshi.cnblogs.com/ ????: +2012-04-08 v3.1.3 -??Language="zh_TW"?JS???BUG(??)。 +?D...Coding4Fun Tools: Coding4Fun.Phone.Toolkit v1.5.5: New Controls ChatBubble ChatBubbleTextBox OpacityToggleButton New Stuff TimeSpan languages added: RU, SK, CS Expose the physics math from TimeSpanPicker Image Stretch now on buttons Bug Fixes Layout fix so RoundToggleButton and RoundButton are exactly the same Fix for ColorPicker when set via code behind ToastPrompt bug fix with OnNavigatedTo Toast now adjusts its layout if the SIP is up Fixed some issues with Expression Blend supportHarness - Internet Explorer Automation: Harness 2.0.3: support the operation fo frameset, frame and iframe Add commands SwitchFrame GetUrl GoBack GoForward Refresh SetTimeout GetTimeout Rename commands GetActiveWindow to GetActiveBrowser SetActiveWindow to SetActiveBrowser FindWindowAll to FindBrowser NewWindow to NewBrowser GetMajorVersion to GetVersionBetter Explorer: Better Explorer 2.0.0.861 Alpha: - fixed new folder button operation not work well in some situations - removed some unnecessary code like subclassing that is not needed anymore - Added option to make Better Exlorer default (at least for WIN+E operations) - Added option to enable file operation replacements (like Terracopy) to work with Better Explorer - Added some basic usability to "Share" button - Other fixesText Designer Outline Text: Version 2 Preview 2: Added Fake 3D demos for C++ MFC, C# Winform and C# WPFLightFarsiDictionary - ??????? ??? ?????/???????: LightFarsiDictionary - v1: LightFarsiDictionary - v1WPF Application Framework (WAF): WPF Application Framework (WAF) 2.5.0.3: Version: 2.5.0.3 (Milestone 3): This release contains the source code of the WPF Application Framework (WAF) and the sample applications. Requirements .NET Framework 4.0 (The package contains a solution file for Visual Studio 2010) The unit test projects require Visual Studio 2010 Professional Changelog Legend: [B] Breaking change; [O] Marked member as obsolete [O] WAF: Mark the StringBuilderExtensions class as obsolete because the AppendInNewLine method can be replaced with string.Jo...GeoMedia PostGIS data server: PostGIS GDO 1.0.1.2: This is a new version of GeoMeda PostGIS data server which supports user rights. It means that only those feature classes, which the current user has rights to select, are visible in GeoMedia. Issues fixed in this release Fixed problem with renaming and deleting feature classes - IMPORTANT! - the gfeatures view must be recreated so that this issue is completely fixed. The attached script "GFeaturesView2.sql" can be used to accomplish this task. Another way is to drop and recreate the metadat...SkyDrive Connector for SharePoint: SkyDrive Connector for SharePoint: Fixed a few bugs pertaining to live authentication Removed dependency on Shared Documents Removed CallBack web part propertyClosedXML - The easy way to OpenXML: ClosedXML 0.65.2: Aside from many bug fixes we now have Conditional Formatting The conditional formatting was sponsored by http://www.bewing.nl (big thanks) New on v0.65.1 Fixed issue when loading conditional formatting with default values for icon sets New on v0.65.2 Fixed issue loading conditional formatting Improved inserts performanceMSBuild Extension Pack: April 2012: Release Blog Post The MSBuild Extension Pack April 2012 release provides a collection of over 435 MSBuild tasks. A high level summary of what the tasks currently cover includes the following: System Items: Active Directory, Certificates, COM+, Console, Date and Time, Drives, Environment Variables, Event Logs, Files and Folders, FTP, GAC, Network, Performance Counters, Registry, Services, Sound Code: Assemblies, AsyncExec, CAB Files, Code Signing, DynamicExecute, File Detokenisation, GUID’...DotNetNuke® Community Edition CMS: 06.01.05: Major Highlights Fixed issue that stopped users from creating vocabularies when the portal ID was not zero Fixed issue that caused modules configured to be displayed on all pages to be added to the wrong container in new pages Fixed page quota restriction issue in the Ribbon Bar Removed restriction that would not allow users to use a dash in page names. Now users can create pages with names like "site-map" Fixed issue that was causing the wrong container to be loaded in modules wh...51Degrees.mobi - Mobile Device Detection and Redirection: 2.1.3.1: One Click Install from NuGet Changes to Version 2.1.3.11. [assembly: AllowPartiallyTrustedCallers] has been added back into the AssemblyInfo.cs file to prevent failures with other assemblies in Medium trust environments. 2. The Lite data embedded into the assembly has been updated to include devices from December 2011. The 42 new RingMark properties will return Unknown if RingMark data is not available. Changes to Version 2.1.2.11Code Changes 1. The project is now licenced under the Mozilla...MVC Controls Toolkit: Mvc Controls Toolkit 2.0.0: Added Support for Mvc4 beta and WebApi The SafeqQuery and HttpSafeQuery IQueryable implementations that works as wrappers aroung any IQueryable to protect it from unwished queries. "Client Side" pager specialized in paging javascript data coming either from a remote data source, or from local data. LinQ like fluent javascript api to build queries either against remote data sources, or against local javascript data, with exactly the same interface. There are 3 different query objects exp...New ProjectsA C++ Websocket Server For realtime interaction with Web clients.: A websocket protocol layer to the Real Time Server library Push Framework. ABS: Assignment 2 of WDT Due date: 20th May 2012C# Garbage Pump: Password Keylogger Evasion: C# DLL for handling password input that is not susceptible to keylogging through a Garbage Pump technique, which pumps random keys, i.e. garbage, out while the user enters in a password. See screenshots for output results.ChangeTrackingDemo: Change Tracking demo application.CRK: My experimental WebsiteDoAnGame3D: d? án game 3dEyes On Train: The Eyes on Train is the application that can take the picture from multiple cameras and then it can monitor where the mini train it's.FRC Robot Simulator: A robot simulator that uses .NET and XNA technology at its core. Although targeted for FRC simulation, it can THEORETICALLY be used for any WPILib projects.GestorFinanceiro: GestorFinanceiro Exemplo de projeto baseado no padrão: Domain Driven Design. Desenvolvido em c#G-Labs: This project will be used to store "labs" projects from our group.HPCloud API: This project allows developers to work with HP's new Openstack based Storage and Compute infrastructures.indexeddb-feed-reader: Feed reader application using Indexed Database APIIT Trick Repository: This project is the source control for all projects, samples and tutorials posted at mshamkhani.blogspot.comJasLib: General-purpose power toolkit for the .NET Framework on desktop Windows computers.karolocommunicator: mój komunikatorekLan Community: Aplikacja sluzaca do komunikowania sie i monitorowania sieci lokalnej.LastFmReminder: (Work in progress) This Silverlight application uses the Last.fm API to get the names of all the artists you haven't listened to since a specified date. The working application is at http://lastfmreminder.atw.hu/ .linewatchSimple: linewatchSimpleLiuyi.network | Liuyi - [Liuyi.network_8.0] Liuyi.network_2.0 Liuyi.network_1.0: Liuyi.network | Liuyi - Liuyi.network_8.0 Liuyi.network_7.0 Liuyi.network_6.0 Liuyi.network_5.0 Liuyi.network_4.0 Liuyi.network_3.0 Liuyi.network_2.0 Liuyi.network_1.0 liuyi .net C# aspx network liuyi.network liuyi.aspx liuyi.C# liuyi.netMogutaro eats files!: Hobby application using HTML5 and File API. You can drag and drop files into the whale (named Mogutaro) 's mouth. moogle: Moogle is an android application developed using Mono as part of a project for a communications class at IIT. The class, COM 380, dealt with the topic of "Humanizing Technology". The application pseudo (but working) app for managing prescription information. In this way, it is mainly meant as a sample android application developed using .NET. For additional information about this application and group effort that led to its creation, please refer to the documentation of this site.ms2011_win32_tcs: ??????ncontrols: LIbrary of ASP.NET controls that works with NHibernateNetGL: Idea is to create a .Net library allows to use OpenGL in managed code. It is in early state, but shaders are working and no garbage collections occur.Orchard Portlets: Building on the work of the Orchard Widgets module, Orchard Portlets allows users to drag the widgets around the Ui without being in the admin screensProject Detroit: OBD-II Manager: A library to parse OBD-II data coming from a vehicle using an ELM323/327 compatible OBD to USB/serial cable. The solution also includes the WPF Instrument Cluster application that was used in the Project Detroit car!Silverlight 5 MarkupExtensions and Other Utilities: This project contains a replacement for certain WPF functionalities in SL5. Currently contains TypeExtension, StaticExtension, MultiBinding (and IMultiValueConverter), ObjectDataProvider, ArrayExtension Currently under development is an ExpressionParser and related converters.SjASMPlusUnreal: SjASM Plus Unreal at last!Source Code: Source CodeSpecflow Example: Some examples with BDD tool SpecflowSuperSocket ClientEngine: Socket client framework wrapping async data receiving, sending and error handlingTestAppMc: TestAppMcTestBBN: Test ProjectTrabalho News FPU: servidor de serviçosVisual FoxPro Professional 2012: Visual FoxPro Professional is a project to extentd Visual FoxPro editor capabilities. This is based on Scintilla Editor control. websocket-japanese-chalkboard: Multi user chalkboard using WebSocketWholemy.MonolithDBF: Monolith is Data Base Format on Double Tree Node, once Node Header for all data in file, opened direct access in file by offset node. First prototype dated on 2007 year.Zinc: Zinc is a utility library for ASP.NET web forms development. It has support for: - utility methods for working easier with controls - CSV exports - HttpModules for dealing with caching and path based rights. - custom controls This library runs on .NET 2.0 and i would like to kee

    Read the article

  • CodePlex Daily Summary for Saturday, March 19, 2011

    CodePlex Daily Summary for Saturday, March 19, 2011Popular ReleasesWPF Inspector: WPF Inspector 0.9.8: New Features in Version 0.9.8 - Much improved Style Explorer - Storing the last window position - Inspector stays operateable when a modal dialog is open - Improved visualization of Resources - Style item in property grid - BugfixesCraig's Utility Library: Craig's Utility Library 2.1: This update contains the following functionality additions: Added Min and Max functions to MathHelper Added code for handling comments to BlogML code. Added WMI based file search ability (at present only searches based on file extension). Added CellularTexture class Added FaultFormation class Added PerlinNoise class Added Akismet helper classes Added Gravatar image link generator Added PipeDelimited class Added FaultFormation class Added FluvialErosion functions to Image c...DirectQ: Release 1.8.7 (RC3): Release Candidate 3 of 1.8.7 fixing many bugs and adding some new functionality.Catel - WPF, Silverlight and WP7 MVVM library: 1.3: Catel history ============= (+) Added (*) Changed (-) Removed (x) Error / bug (fix) For more information about issues or new feature requests, please visit: http://catel.codeplex.com =========== Version 1.3 =========== Release date: ============= 2011/03/18 Added/fixed: ============ (+) Added BindingHelper class to evaluate binding values manually (+) UserControl<TViewModel> now supports master-detail views where the detail view would have a nested UserControl<TViewModel> and no parent ...CBM-Command: Version 2.0 - 2011-03-17 - Final Release: This is the final release of CBM-Command Version 2.0. This version is intended to replace all prior versions you may have downloaded. Please see release notes for prior versions to get comprehensive list of changes. New features since RC1: - (C64) Added double-buffering when displaying the directory in a panel. This eliminates the flickering that users were experiencing when scrolling through long directories. Changes since RC1: - (All Machines) Changed the algorithm for displaying the d...Phalanger - The PHP Language Compiler for the .NET Framework: 2.1 (March 2011) for .NET 4.0: Introducing release of Phalanger 2.1 for .NET 4.0. This release brings big performance boost about 20% in most of the operations. This improvement can be expected also in an overall performance in many PHP applications, e.g. Wordpress. It is the first release that targets .NET Framework 4.0 which allows developers to move the project forward. To migrate your old Phalanger applications from Phalanger 2.0 to 2.1 please follow Migration to 2.1. Installation package also includes basic version o...NodeXL: Network Overview, Discovery and Exploration for Excel: NodeXL Excel Template, version 1.0.1.164: The NodeXL Excel template displays a network graph using edge and vertex lists stored in an Excel 2007 or Excel 2010 workbook. What's NewThis release adds new layout options for groups, makes some minor feature improvements, and fixes a few bugs. See the Complete NodeXL Release History for details. Installation StepsFollow these steps to install and use the template: Download the Zip file. Unzip it into any folder. Use WinZip or a similar program, or just right-click the Zip file in Wi...Leage of Legends Masteries Tool: LoLMasterSave_v1.6.1.274: -Addresses resent LoL update that interfered with the way MasterSave sets / reads masteries - Removed Shift windows since some people experiencing issues If your interested in this function i can provide it as small separate tool.LogExpert: 1.4 build 4092: TabControl: Tooltip on dropdown list shows full path names now New menu item "Lock instance" in Options menu. Only available when "Allow only one instance" is disabled in the settings. "Lock instance" will temporary enable the single instance mode. The locked instance will receive all new launched files Some NullPtrExceptions fixed (e.g. in the settings dialog) Note: The debug build is identical to the release build. But the debug version writes a log file. It also contains line numbers ...Facebook C# SDK: 5.0.6 (BETA): This is seventh BETA release of the version 5 branch of the Facebook C# SDK. Remember this is a BETA build. Some things may change or not work exactly as planned. We are absolutely looking for feedback on this release to help us improve the final 5.X.X release. New in this release: Version 5.0.6 is almost completely backward compatible with 4.2.1 and 5.0.3 (BETA) Bug fixes and helpers to simplify many common scenarios For more information about this release see the following blog posts: F...SQLCE Code Generator: Build 1.0.3: New beta of the SQLCE Code Generator. New features: - Generates an IDataRepository interface that contains the generated repository interfaces that represents each table - Visual Studio 2010 Custom Tool Support Custom Tool: The custom tool is called SQLCECodeGenerator. Write this in the Custom Tool field in the Properties Window of an SDF file included in your project, this should create a code-behind file for the generated data access codeDotNetNuke® Community Edition: 06.00.00 CTP: CTP 1 (Build 155) is firmly focused around our conversion to C#. As many people have noted, this is a significant change to the platform and affects all areas of the product. This is one of the driving factors in why we felt it was important to get this release into your hands as soon as possible. We have already done quite a bit of testing on this feature internally and have fixed a number of issues in this area. We also recognize that there are probably still some more bugs to be found ...Kooboo CMS: Kooboo 3.0 RC: Bug fixes Inline editing toolbar positioning for websites with complicate CSS. Inline editing is turned on by default now for the samplesite template. MongoDB version content query for multiple filters. . Add a new 404 page to guide users to login and create first website. Naming validation for page name and datarule name. Files in this download kooboo_CMS.zip: The Kooboo application files Content_DBProvider.zip: Additional content database implementation of MSSQL,SQLCE, RavenDB ...SQL Monitor - tracking sql server activities: SQL Monitor 3.2: 1. introduce sql color syntax highlighting with http://www.codeproject.com/KB/edit/FastColoredTextBox_.aspxUmbraco CMS: Umbraco 4.7.0: Service release fixing 50+ issues! Getting Started A great place to start is with our Getting Started Guide: Getting Started Guide: http://umbraco.codeplex.com/Project/Download/FileDownload.aspx?DownloadId=197051 Make sure to check the free foundation videos on how to get started building Umbraco sites. They're available from: Introduction for webmasters: http://umbraco.tv/help-and-support/video-tutorials/getting-started Understand the Umbraco concepts: http://umbraco.tv/help-and-support...ProDinner - ASP.NET MVC EF4 Code First DDD jQuery Sample App: first release: ProDinner is an ASP.NET MVC sample application, it uses DDD, EF4 Code First for Data Access, jQuery and MvcProjectAwesome for Web UI, it has Multi-language User Interface Features: CRUD and search operations for entities Multi-Language User Interface upload and crop Images (make thumbnail) for meals pagination using "more results" button very rich and responsive UI (using Mvc Project Awesome) Multiple UI themes (using jQuery UI themes)BEPUphysics: BEPUphysics v0.15.1: Latest binary release. Version HistoryIronRuby: 1.1.3: IronRuby 1.1.3 is a servicing release that keeps on improving compatibility with Ruby 1.9.2 and includes IronRuby integration to Visual Studio 2010. We decided to drop 1.8.6 compatibility mode in all post-1.0 releases. We recommend using IronRuby 1.0 if you need 1.8.6 compatibility. The main purpose of this release is to sync with IronPython 2.7 release, i.e. to keep the Dynamic Language Runtime that both these languages build on top shareable. This release also fixes a few bugs: 5763 Use...SQL Server PowerShell Extensions: 2.3.2.1 Production: Release 2.3.2.1 implements SQLPSX as PowersShell version 2.0 modules. SQLPSX consists of 13 modules with 163 advanced functions, 2 cmdlets and 7 scripts for working with ADO.NET, SMO, Agent, RMO, SSIS, SQL script files, PBM, Performance Counters, SQLProfiler, Oracle and MySQL and using Powershell ISE as a SQL and Oracle query tool. In addition optional backend databases and SQL Server Reporting Services 2008 reports are provided with SQLServer and PBM modules. See readme file for details.IronPython: 2.7: On behalf of the IronPython team, I'm very pleased to announce the release of IronPython 2.7. This release contains all of the language features of Python 2.7, as well as several previously missing modules and numerous bug fixes. IronPython 2.7 also includes built-in Visual Studio support through IronPython Tools for Visual Studio. IronPython 2.7 requires .NET 4.0 or Silverlight 4. To download IronPython 2.7, visit http://ironpython.codeplex.com/releases/view/54498. Any bugs should be report...New ProjectsBRFin: Personal Finances Management SystemCommand Savvy: Command Savvy provides developers with a simple API for quickly creating consistent console applications. It uses conventions and reflection-based auto-wiring to simplify configuration (while allowing for overrides of this default behavior). It is developed in C#.Control Arrays .NET: Ability to create control arrays at design time for standard Windows Forms controls, with all the functionality that existed in VB6, in VB.NET using Visual Studio 2010 and the .NET 4.0 platform.Crystalbyte Horizon: Crystalbyte Horizon is a carrier application written in C# using WPF, licenced under the Ms-PL. Dimos GIS Overlay Project: GIS Project to compute the spatial overlay on the Azure platformEric Fang's SharePoint workflow timer: SharePoint 2010 workflow timer, which can schedule site workflows and list workflowsjaryaan: Jaryaan Makes it easier for infected systems to Force Close Unwanted Process to bring health back to System. Persian Translation is availableMakalu: Track all the problems and issues of your city. Mosscn: mosscn projectNetduino4Fun: Netduino projects for fun. Experiments with the Electronic brick Starter kit from Seeed Studio.NHibernate AutoCriteria: Creating criteria based on...Nina: Nina is an open source asynchronous event-driven networking library.Ploobs Game Engine: Full Game Engine developed in C# and XNA using Deferred Rendering. The principal Features are: Integrated Physic, Artificial Inteligence, Dynamic Lights and Shadow, Lots of Post Effects, Billboards , Extensible Particle System, Vegetation, Materials types, 3D Sound and MUCH MORE!Roadkill - .NET Wiki engine: Roadkill .NET is a lightweight but powerful Wiki engine built on the following foundations: .NET 4.0 jQuery, ASP.NET MVC 3 with Razor, NHibernate, Creole Wiki/Markdown/Mediawiki syntax, SQL ServerService Application Sample: Practical sample of a service application based on known patterns and practices.SharePoint 2010 Custom Login: This is a custom login screen for SharePoint 2010 FBA. Working on a custom authentication.aspx to replace the Windows Authentication / Forms Authentication prompt as well based on URL / Zone.SharePoint People Search Pivot Viewer WebPart: The Silverlight PivotViewer ?ontrol is a new way to display SharePoint 2010 Enterprise Search Results. It's another way to interact with massive amounts of data on the web in ways that are powerful, informative, and valuable. Tested on FAST Search 2010. SharePoint ReGhost: Console tool created for reghosting SharePoint sites after an upgrade.Simesoft: ????????Spruce - MVC frontend for TFS: Spruce is a small ASP.NET MVC 3 (razor) frontend for managing work items in Team Foundation Server 2010. It is influenced by Bitbucket, Fogbugz and other simple to use bug tracking systems.Twavatar: Super simple ASP.NET helper for rendering Twitter avatars / profile images.UCB_GP_B: this is a pilot project which is intented to help the college education.UCB_GP_C: this is a pilot project which is intented to help the college education.UCB_GP_D: this is a pilot project which is intented to help the college education. Ultimate Calculator for windows phone 7: ???????????? ????????????????????????????WP7 Text-to-Speech Tool & Translation Library: Windows Phone Text-to-Speech (wpTTS) produces speech from text strings. wpTTS also provides real-time translation between a select list of languages. (AppID required.)

    Read the article

  • CodePlex Daily Summary for Wednesday, March 09, 2011

    CodePlex Daily Summary for Wednesday, March 09, 2011Popular ReleasesDirectQ: Release 1.8.7 (RC2): More fixes and improvements. Note for multiplayer - you may need to set r_waterwarp to 0 or 2 before connecting to a server, otherwise you will get a "Mod_PointInLeaf: bad model" error and not be able to connect. You can set it back to 1 after you connect, of course. This only came to light after releasing, and will be fixed in the next one.Microsoft All-In-One Code Framework: Visual Studio 2008 Code Samples 2011-03-09: Code samples for Visual Studio 2008myCollections: Version 1.3: New in version 1.3 : Added Editor management for Books Added Amazon API for Books Us, Fr, De Added Amazon Us, Fr, De for Movies Added The MovieDB for Fr and De Added Author for Books Added Editor and Platform for Games Added Amazon Us, De for Games Added Studio for XXX Added Background for XXX Bug fixing with Softonic API Bug fixing with IMDB UI improvement Removed GraceNote Added Amazon Us,Fr, De for Series Added TVDB Fr and De for Series Added Tracks for Musi...Facebook Graph Toolkit: Facebook Graph Toolkit 1.1: Version 1.1 (8 Mar 2011)new Dialog class for redirecting users to Facebook dialogs new Async publishing methods new Check for Extended Permissions option fixed bug: inappropiate condition of redirecting to login in Api class fixed bug: IframeRedirect method not workingpatterns & practices : Composite Services: Composite Services Guidance - CTP2: This is the second CTP of the p&p Composite Service Guidance.Python Tools for Visual Studio: 1.0 Beta 1: Beta 1You can't install IronPython Tools for Visual Studio side-by-side with Python Tools for Visual Studio. A race condition sometimes causes local MPI debugging to miss breakpoints. When MPI jobs on a cluster fail they don’t get cleaned up correctly, which can cause debugging to stall because the associated MPI job is stuck in the queue. The "Threads" view has a race condition which can cause it not to display properly at times. VS2010 shortcuts that are pinned to the taskbar are so...DotNetAge -a lightweight Mvc jQuery CMS: DotNetAge 2: What is new in DotNetAge 2.0 ? Completely update DJME to DJME2, enhance user experience ,more beautiful and more interactively visit DJME project home to lean more about DJME http://www.dotnetage.com/sites/home/djme.html A new widget engine has came! Faster and easiler. Runtime performance enhanced. SEO enhanced. UI Designer enhanced. A new web resources explorer. Page manager enhanced. BlogML supports added that allows you import/export your blog data to/from dotnetage publishi...Kooboo CMS: Kooboo CMS 3.0 Beta: Files in this downloadkooboo_CMS.zip: The kooboo application files Content_DBProvider.zip: Additional content database implementation of MSSQL,SQLCE, RavenDB and MongoDB. Default is XML based database. To use them, copy the related dlls into web root bin folder and remove old content provider dlls. Content provider has the name like "Kooboo.CMS.Content.Persistence.SQLServer.dll" View_Engines.zip: Supports of Razor, webform and NVelocity view engine. Copy the dlls into web root bin folder t...ASP.NET MVC Project Awesome, jQuery Ajax helpers (controls): 1.7.2: A rich set of helpers (controls) that you can use to build highly responsive and interactive Ajax-enabled Web applications. These helpers include Autocomplete, AjaxDropdown, Lookup, Confirm Dialog, Popup Form, Popup and Pager added fullscreen for the popup and popupformIronPython: 2.7 Release Candidate 2: On behalf of the IronPython team, I am pleased to announce IronPython 2.7 Release Candidate 2. The releases contains a few minor bug fixes, including a working webbrowser module. Please see the release notes for 61395 for what was fixed in previous releases.LINQ to Twitter: LINQ to Twitter Beta v2.0.20: Mono 2.8, Silverlight, OAuth, 100% Twitter API coverage, streaming, extensibility via Raw Queries, and added documentation.IIS Tuner: IIS Tuner 1.0: IIS and ASP.NET performance optimization toolMinemapper: Minemapper v0.1.6: Once again supports biomes, thanks to an updated Minecraft Biome Extractor, which added support for the new Minecraft beta v1.3 map format. Updated mcmap to support new biome format.Sandcastle Help File Builder: SHFB v1.9.3.0 Release: This release supports the Sandcastle June 2010 Release (v2.6.10621.1). It includes full support for generating, installing, and removing MS Help Viewer files. This new release is compiled under .NET 4.0, supports Visual Studio 2010 solutions and projects as documentation sources, and adds support for projects targeting the Silverlight Framework. This release uses the Sandcastle Guided Installation package used by Sandcastle Styles. Download and extract to a folder and then run SandcastleI...AutoLoL: AutoLoL v1.6.4: It is now possible to run the clicker anyway when it can't detect the Masteries Window Fixed a critical bug in the open file dialog Removed the resize button Some UI changes 3D camera movement is now more intuitive (Trackball rotation) When an error occurs on the clicker it will attempt to focus AutoLoLYAF.NET (aka Yet Another Forum.NET): v1.9.5.5 RTW: YAF v1.9.5.5 RTM (Date: 3/4/2011 Rev: 4742) Official Discussion Thread here: http://forum.yetanotherforum.net/yaf_postsm47149_v1-9-5-5-RTW--Date-3-4-2011-Rev-4742.aspx Changes in v1.9.5.5 Rev. #4661 - Added "Copy" function to forum administration -- Now instead of having to manually re-enter all the access masks, etc, you can just duplicate an existing forum and modify after the fact. Rev. #4642 - New Setting to Enable/Disable Last Unread posts links Rev. #4641 - Added Arabic Language t...Snippet Designer: Snippet Designer 1.3.1: Snippet Designer 1.3.1 for Visual Studio 2010This is a bug fix release. Change logFixed bug where Snippet Designer would fail if you had the most recent Productivity Power Tools installed Fixed bug where "Export as Snippet" was failing in non-english locales Fixed bug where opening a new .snippet file would fail in non-english localesChiave File Encryption: Chiave 1.0: Final Relase for Chave 1.0 Stable: Application for file encryption and decryption using 512 Bit rijndael encyrption algorithm with simple to use UI. Its written in C# and compiled in .Net version 3.5. It incorporates features of Windows 7 like Jumplists, Taskbar progress and Aero Glass. Now with added support to Windows XP! Change Log from 0.9.2 to 1.0: ==================== Added: > Added Icon Overlay for Windows 7 Taskbar Icon. >Added Thumbnail Toolbar buttons to make the navigation easier...Chirpy - VS Add In For Handling Js, Css, DotLess, and T4 Files: Margogype Chirpy (ver 2.0): Chirpy loves Americans. Chirpy hates Americanos.ASP.NET: Sprite and Image Optimization Preview 3: The ASP.NET Sprite and Image Optimization framework is designed to decrease the amount of time required to request and display a page from a web server by performing a variety of optimizations on the page’s images. This is the third preview of the feature and works with ASP.NET Web Forms 4, ASP.NET MVC 3, and ASP.NET Web Pages (Razor) projects. The binaries are also available via NuGet: AspNetSprites-Core AspNetSprites-WebFormsControl AspNetSprites-MvcAndRazorHelper It includes the foll...New ProjectsA-Inventory: Inventory Management System * Purchase Orders * Sales Orders * Multiple warehouses * Stock Transfers * Financial Transaction Tracking * ReportsAsync Execution Lib: This library simplifies the process of executing code on a different thread and separating the caller from the actual command logic. To do this messages are put into an execution module and the library automatically calls the target message handlers.Bing Wallpaper Downloader: Downloads wallpapers from Bing and displays them as the desktop wallpaper. Based on UI and concepts of Bing4Free.CloudBox: This is a custom storage controller for DropBox. It lets you create multiple DropBox accounts an will then treat them as one large storage. Controller2: Projeto para desenvolvimento de Sistema para o Projeto Integrador do Curso de Análise e Desenvolvimento de Sistemas do CesumarCurso_Virtual_FPSEP: En este proyecto se esta elaborando el sistema para el manejo de un curso virtual que se tiene pensado impartir en la CFE, este curso se esta desarrollando bajo el mando del Ingeniero Earl Amazurrutia Carson y esta dirigido para el personal de protecciones.DBSJ: testEveTools: EveTools is a set of classes to aid in the development of programs that access the EVE Online API. It is written with a very event-driven model; all normally blocking, non-compute-bound workloads will instead run asynchronously, freeing up your program to do as it pleases!GeoIp: .Net MaxMind GeoIP client libraryKieuHungProject: Doan Vien managmentMimoza: ?????? ??? ????? ?????????, ??????? ????? ???????????? ??? ?????? ????????? ???????????? ?? ?????? ??????...mmoss: Medical Marijuana Open Source System. To manage Point-of-sale, inventory, grow and compliance issues related to the sale of MMJNetCassa: .Net Cassandra client library.Neudesic Pulse SDK: The Neudesic Pulse SDK allows developers the ability to quickly and efficiently build solutions that interact with the Neudesic Pulse social framework APIs.Nuget Package Creation and Publishing Wizard: simplifies the creation and publishing of an nuget packagePetscareinlondon: This project is all about pets care.Pool based Batch Processing: A simple framework that allows pool based processing of batches. A new batch is picked up when pools are empty. The framework exposes simple events that allows user to process jobs at the back end (Windows Service).Project Nonnon: Keep-It-Simple Softwares for Win32 MinGW GCC 3.x C Language + Batch Files POSIX-based Base Layer Library Win32 Applications Easy2Compile Easy2Make Easy2Use Python Tools for Visual Studio: Python Tools for Visual Studio adds support for Intellisense, Debugging, Profiling, IPython (.11+), Cluster & Cloud Computing to Visual Studio. It supports both CPython (2.4-3.1) and IronPython (2.7). python_lib: like protobuf,parse xml definition of c++struct,and develop lots of usageRapidMEF: A collection of tools to help developers author and debug applications that use MEF.Reflective: Reflective adds lots of new extension methods related to reflection and Reflection.Emit, to make it easier to build code dynamically at runtime.Remote Desktop Organizer: This is a fun little application that lets you easily manage lots of different Remote Desktops. It allows the user to apply custom alias's and descriptions so that it is easy denote which desktop is which and allows for easy customization and managementSharePoint data population: SharePoint data populationSpriteEditor: Basic sprite editorStock3243254635254325435: 345234324324324324Time Management Application: Based upon Stephen Covey's 7 Habits of Highly Effective People, I was looking for a place to digitally record my time. When I could not find one I liked, I set out to build my own. This also covers several of the coding practices and patterns that I have been putting together.Tuned N: Tuned N is a Playlist.com based media application. Allows listening to playlists via desktop app and allows downloading of tracks in playlists.Winforms BetterBindingSource: A better windows forms (winforms) bindingsource control which enables you to add class based datasources without the hassle of adding datasource files and using the slow wizard to add data sources.WinShutdown: Just a small application to countdown the windows shutdown/restart. When you want the windows shutdown after some time or after some application finishes its work. Please if you have a better project for this purpouse or if you have an update for my code. Let me know.

    Read the article

  • CodePlex Daily Summary for Tuesday, August 21, 2012

    CodePlex Daily Summary for Tuesday, August 21, 2012Popular ReleasesResX Resource Manager: 1.0.0.1 Visual Studio Extension: Fix: truncated version in VSIX manifest leads to permanent update notifications.MFCMAPI: August 2012 Release: Build: 15.0.0.1035 Full release notes at SGriffin's blog. If you just want to run the MFCMAPI or MrMAPI, get the executables. If you want to debug them, get the symbol files and the source. The 64 bit builds will only work on a machine with Outlook 2010 64 bit installed. All other machines should use the 32 bit builds, regardless of the operating system. Facebook BadgeDocument.Editor: 2013.2: Whats new for Document.Editor 2013.2: New save as Html document Improved Traslate support Minor Bug Fix's, improvements and speed upsSharePoint Dynamic Forms: Version 1.0: Version 1.0 of SharePoint Dynamic Forms Includes 1. List Based Rendering 2. Template Based Rendering 2.1 Supports extensive field validation types including String, Date, Comparison, Content Length, regex etc. 2.2 Support for cross field comparison validation. 2.3 Data entry option for a user who doesn’t have write permission to a list. 2.4 Option to extend the web part by overriding form submission event. 2.5 Option to cancel the form submission and provide custom notification message. 2.6 ...Pulse: Pulse Beta 5: Whats new in this release? Well to start with we now have Wallbase.cc Authentication! so you can access favorites or NSFW. This version requires .NET 4.0, you probably already have it, but if you don't it's a free and easy download from Microsoft. Pulse can bet set to start on Windows startup now too. The Wallpaper setter has settings now, so you can change the background color of the desktop and the Picture Position (Tile/Center/Fill/etc...) I've switched to Windows Forms instead of WPF...HydroDesktop - CUAHSI Hydrologic Information System Desktop Application: 1.5.5 Experimental Release: This is HydroDesktop 1.5.5 Experimental Release We are targeting for a 1.5 Stable Release in August 2012. This experimental version has been published for testing. New Features in 1.5 Time Series Data Import Improved performance of table, graph and edit views Support for online sample project packages (sharing data and analyses) More detailed display of time series metadata Improved extension manager (uninstall extensions, choose extension source) Improved attribute table editor (supports fi...Metro Paint: Metro Paint: Download it now , don't forget to give feedback to me at maitreyavyas@live.com or at my facebook page fb.com/maitreyavyas , Hope you enjoy it.MiniTwitter: 1.80: MiniTwitter 1.80 ???? ?? .NET Framework 4.5 ?????? ?? .NET Framework 4.5 ????????????? "&" ??????????????????? ???????????????????????? 2 ??????????? ReTweet ?????????????????、In reply to ?????????????? URL ???????????? ??????????????????????????????Droid Explorer: Droid Explorer 0.8.8.6 Beta: Device images are now pulled from DroidExplorer Cloud Service refined some issues with the usage statistics Added a method to get the first available value from a list of property names DroidExplorer.Configuration no longer depends on DroidExplorer.Core.UI (it is actually the other way now) fix to the bootstraper to only try to delete the SDK if it is a "local" sdk, not an existing. no longer support the "local" sdk, you must now select an existing SDK checks for sdk if it was ins...Path Copy Copy: 11.0.1: Bugfix release that corrects the following issue: 11365 If you are using Path Copy Copy in a network environment and use the UNC path commands, it is recommended that you upgrade to this version.ExtAspNet: ExtAspNet v3.1.9: +2012-08-18 v3.1.9 -??other/addtab.aspx???JS???BoundField??Tooltip???(Dennis_Liu)。 +??Window?GetShowReference???????????????(︶????、????、???、??~)。 -?????JavaScript?????,??????HTML????????。 -??HtmlNodeBuilder????????????????JavaScript??。 -??????WindowField、LinkButton、HyperLink????????????????????????????。 -???????????grid/griddynamiccolumns2.aspx(?????)。 -?????Type??Reset?????,??????????????????(e??)。 -?????????????????????。 -?????????int,short,double??????????(???)。 +?Window????Ge...AcDown????? - AcDown Downloader Framework: AcDown????? v4.0.1: ?? ●AcDown??????????、??、??????。????,????,?????????????????????????。???????????Acfun、????(Bilibili)、??、??、YouTube、??、???、??????、SF????、????????????。 ●??????AcPlay?????,??????、????????????????。 ● AcDown??????????????????,????????????????????????????。 ● AcDown???????C#??,????.NET Framework 2.0??。?????"Acfun?????"。 ????32??64? Windows XP/Vista/7/8 ??:????????Windows XP???,?????????.NET Framework 2.0???(x86),?????"?????????"??? ??????????????,??????????: ??"AcDown?????"????????? ...Fluent Validation for .NET: 3.4: Changes since 3.3: Make ValidationResut.IsValid virtual Add private no-arg ctor to ValidationFailure to help with serialization Add Turkish error messages Work-around for reflection bug in .NET 4.5 that caused VerificationExceptions Assemblies are now unsigned to ease with versioning/upgrades (especially where other frameworks depend on FV) (Note if you need signed assemblies then you can use the following NuGet packages: FluentValidation-signed, FluentValidation.MVC3-signed, FluentV...DotNetNuke® Feedback: 06.02.01: Official Release - 17th August 2012 Please look at the Release Notes file included in the module packages or available on this page as a separate download for a listing of the bug fixes and enhancements found in this version. NOTE: Feedback v 06.02.00 REQUIRES a minimum DotNetNuke framework version of 06.02.00 as well as ASP.Net 3.5 SP1 and MS SQL Server 2005 or 2008 (Express or standard versions). This release brings some enhancements to the module as well as fixing all known bugs. Bug Fi...AssaultCube Reloaded: 2.5.3 Unnamed Fixed: If you are using deltas, download 2.5.2 first, then overwrite with the delta packages. Linux has Ubuntu 11.10 32-bit precompiled binaries and Ubuntu 10.10 64-bit precompiled binaries, but you can compile your own as it also contains the source. If you are using Mac or other operating systems, please wait while we try to package for those OSes. Try to compile it. If it fails, download a virtual machine. The server pack is ready for both Windows and Linux, but you might need to compile your ...Coding4Fun Tools: Coding4Fun.Phone.Toolkit v1.6.1: Bug Fix release Bug Fixes Better support for transparent images IsFrozen respected if not bound to corrected deadlock stateWPF Application Framework (WAF): WPF Application Framework (WAF) 2.5.0.7: Version: 2.5.0.7 (Milestone 7): This release contains the source code of the WPF Application Framework (WAF) and the sample applications. Requirements .NET Framework 4.0 (The package contains a solution file for Visual Studio 2010) The unit test projects require Visual Studio 2010 Professional Changelog Legend: [B] Breaking change; [O] Marked member as obsolete WAF: Add CollectionHelper.GetNextElementOrDefault method. InfoMan: Support creating a new email and saving it in the Send b...myCollections: Version 2.2.3.0: New in this version : Added setup package. Added Amazon Spain for Apps, Books, Games, Movie, Music, Nds and Tvshow. Added TVDB Spain for Tvshow. Added TMDB Spain for Movies. Added Auto rename files from title. Added more filters when adding files (vob,mpls,ifo...) Improve Books author and Music Artist Credits. Rewrite find duplicates for better performance. You can now add Custom link to items. You can now add type directly from the type list using right mouse button. Bug ...Player Framework by Microsoft: Player Framework for Windows 8 Preview 5 (Refresh): Support for Windows 8 and Visual Studio RTM Support for Smooth Streaming SDK beta 2 Support for live playback New bitrate meter and SD/HD indicators Auto smooth streaming track restriction for snapped mode to conserve bandwidth New "Go Live" button and SeekToLive API Support for offset start times Support for Live position unique from end time Support for multiple audio streams (smooth and progressive content) Improved intellisense in JS version NEW TO PREVIEW 5 REFRESH:Req...TFS Workbench: TFS Workbench v2.2.0.10: Compiled installers for TFS Workbench 2.2.0.10 Bug Fix Fixed bug that stopped the change workspace action from working.New ProjectsAdaptive modeling interface: Ami will be a framework to simplify scientific model development. Keywords: Modeling, C#AlphaDogDemo: A simple XNA gameDnf: Dnf??eProjectSem3: beginFluentGUI: A Silverlight library that enables composition of a type safe Graphical User Interface.Frontrader-IB: Financial test applicationHART Analyzer: HART Analyzer is a tool to monitor the HART protocol between field devices and your PC. It used Hart Communication Protocol Lite for the communication.Hybrid.Net - Light-weight GPU Computing for .NET: Hybrid.Net enables .NET developers to harness the power of GPUs for data- and compute-intense applications using the simple well-known construct: Parallel.ForLoggerz- A .Net Error Logging framework: A new and hopefully exciting error logging framework that will integrate nicely into any windows/web application. MakersEngine: World of Warcraft Emulator Compiling SoftwareMetro Paint: This is Metro Paint app which is a modern ui app for Windows 8 . To test it in your system you will need Visual Studio Express for Windows 8. Hope you love it.mojomo, a modular design framework for mojoPortal CMS: Mojomo is a modular, front end design framework for mojoPortal CMS. My CSharp reminders: Several simple code examples in C #, used as reminders for my future development in C #.MYCoding Codes: anyone can tell me about this?MyExample: testMyGit: my git source libraryNWebsec: NWebsec is a security library for ASP.NET applications. It's built on the philosophy that security should be simple and maintainable.Private cloud DMS: Cloud document management system with workflow support. Free essential version is destined for small companies or any user groups.proEx: just a testQuantoSharp: A Financial Quant library implemented in C# completely to showcase the power of mathematics and its application, aimed for educational purposes only.SharePoint SlideShow: Sharepoint SlideShow customizable webpart in sharepoint 2010. Slideshow from SP 2010 picture library using jquery. Please leave comments about the project.Tool army building for battle: Helper application to build army list for Warhammer BattleWinRT XAML Calendar: WinRT XAML Calendar control ported from the Silverlight Toolkit's Calendar control for Silverlight 4.????: ????? ??? ???? ??? ?????? ?????. ???? so.cl? ???? API? ???? ????? ????! ? ????? ???? ???? ????. ???? ????!..

    Read the article

  • Issue with WIC image resizing on ASP.NET MVC 2

    - by Dave
    I am attempting to implement image resizing on user uploads in ASP.NET MVC 2 using a version of the method found: here on asp.net. This works great on my dev machine, but as soon as I put it on my production machine, I start getting the error 'Exception from HRESULT: 0x88982F60' which is supposed to mean that there is an issue decoding the image. However, when I use WICExplorer to open the image, it looks ok. I've also tried this with dozens of images of various sources and still get the error (though possible, I doubt all of them are corrupted). Here is the relevant code (with my debugging statements in there): MVC Controller [Authorize, HttpPost] public ActionResult Upload(string file) { //Check file extension string fx = file.Substring(file.LastIndexOf('.')).ToLowerInvariant(); string key; if (ConfigurationManager.AppSettings["ImageExtensions"].Contains(fx)) { key = Guid.NewGuid().ToString() + fx; } else { return Json("extension not found"); } //Check file size if (Request.ContentLength <= Convert.ToInt32(ConfigurationManager.AppSettings["MinImageSize"]) || Request.ContentLength >= Convert.ToInt32(ConfigurationManager.AppSettings["MaxImageSize"])) { return Json("content length out of bounds: " + Request.ContentLength); } ImageResizerResult irr, irr2; //Check if this image is coming from FF, Chrome or Safari (XHR) HttpPostedFileBase hpf = null; if (Request.Files.Count <= 0) { //Scale and encode image and thumbnail irr = ImageResizer.CreateMaxSizeImage(Request.InputStream); irr2 = ImageResizer.CreateThumbnail(Request.InputStream); } //Or IE else { hpf = Request.Files[0] as HttpPostedFileBase; if (hpf.ContentLength == 0) return Json("hpf.length = 0"); //Scale and encode image and thumbnail irr = ImageResizer.CreateMaxSizeImage(hpf.InputStream); irr2 = ImageResizer.CreateThumbnail(hpf.InputStream); } //Check if image and thumbnail encoded and scaled correctly if (irr == null || irr.output == null || irr2 == null || irr2.output == null) { if (irr != null && irr.output != null) irr.output.Dispose(); if (irr2 != null && irr2.output != null) irr2.output.Dispose(); if(irr == null) return Json("irr null"); if (irr2 == null) return Json("irr2 null"); if (irr.output == null) return Json("irr.output null. irr.error = " + irr.error); if (irr2.output == null) return Json("irr2.output null. irr2.error = " + irr2.error); } if (irr.output.Length > Convert.ToInt32(ConfigurationManager.AppSettings["MaxImageSize"]) || irr2.output.Length > Convert.ToInt32(ConfigurationManager.AppSettings["MaxImageSize"])) { if(irr.output.Length > Convert.ToInt32(ConfigurationManager.AppSettings["MaxImageSize"])) return Json("irr.output.Length > maximage size. irr.output.Length = " + irr.output.Length + ", irr.error = " + irr.error); return Json("irr2.output.Length > maximage size. irr2.output.Length = " + irr2.output.Length + ", irr2.error = " + irr2.error); } //Store scaled and encoded image and thumbnail .... return Json("success"); } The code is always failing when checking if the output stream is null (i.e. irr.output == null is true). ImageResizerResult and ImageResizer public class ImageResizerResult : IDisposable { public MemoryIStream output; public int width; public int height; public string error; public void Dispose() { output.Dispose(); } } public static class ImageResizer { private static Object thislock = new Object(); public static ImageResizerResult CreateMaxSizeImage(Stream input) { uint maxSize = Convert.ToUInt32(ConfigurationManager.AppSettings["MaxImageDimension"]); try { lock (thislock) { // Read the source image var photo = ByteArrayFromStream(input); var factory = (IWICComponentFactory)new WICImagingFactory(); var inputStream = factory.CreateStream(); inputStream.InitializeFromMemory(photo, (uint)photo.Length); var decoder = factory.CreateDecoderFromStream(inputStream, null, WICDecodeOptions.WICDecodeMetadataCacheOnDemand); var frame = decoder.GetFrame(0); // Compute target size uint width, height, outWidth, outHeight; frame.GetSize(out width, out height); if (width > height) { //Check if width is greater than maxSize if (width > maxSize) { outWidth = maxSize; outHeight = height * maxSize / width; } //Width is less than maxSize, so use existing dimensions else { outWidth = width; outHeight = height; } } else { //Check if height is greater than maxSize if (height > maxSize) { outWidth = width * maxSize / height; outHeight = maxSize; } //Height is less than maxSize, so use existing dimensions else { outWidth = width; outHeight = height; } } // Prepare output stream to cache file var outputStream = new MemoryIStream(); // Prepare JPG encoder var encoder = factory.CreateEncoder(Consts.GUID_ContainerFormatJpeg, null); encoder.Initialize(outputStream, WICBitmapEncoderCacheOption.WICBitmapEncoderNoCache); // Prepare output frame IWICBitmapFrameEncode outputFrame; var arg = new IPropertyBag2[1]; encoder.CreateNewFrame(out outputFrame, arg); var propBag = arg[0]; var propertyBagOption = new PROPBAG2[1]; propertyBagOption[0].pstrName = "ImageQuality"; propBag.Write(1, propertyBagOption, new object[] { 0.85F }); outputFrame.Initialize(propBag); outputFrame.SetResolution(96, 96); outputFrame.SetSize(outWidth, outHeight); // Prepare scaler var scaler = factory.CreateBitmapScaler(); scaler.Initialize(frame, outWidth, outHeight, WICBitmapInterpolationMode.WICBitmapInterpolationModeFant); // Write the scaled source to the output frame outputFrame.WriteSource(scaler, new WICRect { X = 0, Y = 0, Width = (int)outWidth, Height = (int)outHeight }); outputFrame.Commit(); encoder.Commit(); return new ImageResizerResult { output = outputStream, height = (int)outHeight, width = (int)outWidth }; } } catch (Exception e) { return new ImageResizerResult { error = "Create maxsizeimage = " + e.Message }; } } } Thoughts on where this is going wrong? Thanks in advance for the effort.

    Read the article

  • creating a 3d plane using Frank Luna's technique

    - by numerical25
    I am creating a 3d plane that lays on the x and z axis. and has hills that extend on the y axis. bulk of the code looks like this float PeaksAndValleys::getHeight(float x, float z)const { return 0.3f*( z*sinf(0.1f*x) + x*cosf(0.1f*z) ); } void PeaksAndValleys::init(ID3D10Device* device, DWORD m, DWORD n, float dx) { md3dDevice = device; mNumRows = m; mNumCols = n; mNumVertices = m*n; mNumFaces = (m-1)*(n-1)*2; // Create the geometry and fill the vertex buffer. std::vector<Vertex> vertices(mNumVertices); float halfWidth = (n-1)*dx*0.5f; float halfDepth = (m-1)*dx*0.5f; for(DWORD i = 0; i < m; ++i) { float z = halfDepth - i*dx; for(DWORD j = 0; j < n; ++j) { float x = -halfWidth + j*dx; // Graph of this function looks like a mountain range. float y = getHeight(x,z); vertices[i*n+j].pos = D3DXVECTOR3(x, y, z); // Color the vertex based on its height. if( y < -10.0f ) vertices[i*n+j].color = BEACH_SAND; else if( y < 5.0f ) vertices[i*n+j].color = LIGHT_YELLOW_GREEN; else if( y < 12.0f ) vertices[i*n+j].color = DARK_YELLOW_GREEN; else if( y < 20.0f ) vertices[i*n+j].color = DARKBROWN; else vertices[i*n+j].color = WHITE; } } D3D10_BUFFER_DESC vbd; vbd.Usage = D3D10_USAGE_IMMUTABLE; vbd.ByteWidth = sizeof(Vertex) * mNumVertices; vbd.BindFlags = D3D10_BIND_VERTEX_BUFFER; vbd.CPUAccessFlags = 0; vbd.MiscFlags = 0; D3D10_SUBRESOURCE_DATA vinitData; vinitData.pSysMem = &vertices[0]; HR(md3dDevice->CreateBuffer(&vbd, &vinitData, &mVB)); // Create the index buffer. The index buffer is fixed, so we only // need to create and set once. std::vector<DWORD> indices(mNumFaces*3); // 3 indices per face // Iterate over each quad and compute indices. int k = 0; for(DWORD i = 0; i < m-1; ++i) { for(DWORD j = 0; j < n-1; ++j) { indices[k] = i*n+j; indices[k+1] = i*n+j+1; indices[k+2] = (i+1)*n+j; indices[k+3] = (i+1)*n+j; indices[k+4] = i*n+j+1; indices[k+5] = (i+1)*n+j+1; k += 6; // next quad } } D3D10_BUFFER_DESC ibd; ibd.Usage = D3D10_USAGE_IMMUTABLE; ibd.ByteWidth = sizeof(DWORD) * mNumFaces*3; ibd.BindFlags = D3D10_BIND_INDEX_BUFFER; ibd.CPUAccessFlags = 0; ibd.MiscFlags = 0; D3D10_SUBRESOURCE_DATA iinitData; iinitData.pSysMem = &indices[0]; HR(md3dDevice->CreateBuffer(&ibd, &iinitData, &mIB)); } My question pretains to the cosf and sinf. I am formiluar with trigonometry and I understand sin, cosine, and tangent. but I am not formiluar with cosf and sinf and what they do. From looking at this example. they have alot to do with finding a y value.

    Read the article

  • How to change Matlab program for solving equation with finite element method?

    - by DSblizzard
    I don't know is this question more related to mathematics or programming and I'm absolute newbie in Matlab. Program FEM_50 applies the finite element method to Laplace's equation -Uxx(x, y) - Uyy(x, y) = F(x, y) in Omega. How to change it to apply FEM to equation -Uxx(x, y) - Uyy(x, y) + U(x, y) = F(x, y)? At this page: http://sc.fsu.edu/~burkardt/m_src/fem_50/fem_50.html additional code files in case you need them. function fem_50 ( ) %% FEM_50 applies the finite element method to Laplace's equation. % % Discussion: % % FEM_50 is a set of MATLAB routines to apply the finite % element method to solving Laplace's equation in an arbitrary % region, using about 50 lines of MATLAB code. % % FEM_50 is partly a demonstration, to show how little it % takes to implement the finite element method (at least using % every possible MATLAB shortcut.) The user supplies datafiles % that specify the geometry of the region and its arrangement % into triangular and quadrilateral elements, and the location % and type of the boundary conditions, which can be any mixture % of Neumann and Dirichlet. % % The unknown state variable U(x,y) is assumed to satisfy % Laplace's equation: % -Uxx(x,y) - Uyy(x,y) = F(x,y) in Omega % with Dirichlet boundary conditions % U(x,y) = U_D(x,y) on Gamma_D % and Neumann boundary conditions on the outward normal derivative: % Un(x,y) = G(x,y) on Gamma_N % If Gamma designates the boundary of the region Omega, % then we presume that % Gamma = Gamma_D + Gamma_N % but the user is free to determine which boundary conditions to % apply. Note, however, that the problem will generally be singular % unless at least one Dirichlet boundary condition is specified. % % The code uses piecewise linear basis functions for triangular elements, % and piecewise isoparametric bilinear basis functions for quadrilateral % elements. % % The user is required to supply a number of data files and MATLAB % functions that specify the location of nodes, the grouping of nodes % into elements, the location and value of boundary conditions, and % the right hand side function in Laplace's equation. Note that the % fact that the geometry is completely up to the user means that % just about any two dimensional region can be handled, with arbitrary % shape, including holes and islands. % clear % % Read the nodal coordinate data file. % load coordinates.dat; % % Read the triangular element data file. % load elements3.dat; % % Read the quadrilateral element data file. % load elements4.dat; % % Read the Neumann boundary condition data file. % I THINK the purpose of the EVAL command is to create an empty NEUMANN array % if no Neumann file is found. % eval ( 'load neumann.dat;', 'neumann=[];' ); % % Read the Dirichlet boundary condition data file. % load dirichlet.dat; A = sparse ( size(coordinates,1), size(coordinates,1) ); b = sparse ( size(coordinates,1), 1 ); % % Assembly. % for j = 1 : size(elements3,1) A(elements3(j,:),elements3(j,:)) = A(elements3(j,:),elements3(j,:)) ... + stima3(coordinates(elements3(j,:),:)); end for j = 1 : size(elements4,1) A(elements4(j,:),elements4(j,:)) = A(elements4(j,:),elements4(j,:)) ... + stima4(coordinates(elements4(j,:),:)); end % % Volume Forces. % for j = 1 : size(elements3,1) b(elements3(j,:)) = b(elements3(j,:)) ... + det( [1,1,1; coordinates(elements3(j,:),:)'] ) * ... f(sum(coordinates(elements3(j,:),:))/3)/6; end for j = 1 : size(elements4,1) b(elements4(j,:)) = b(elements4(j,:)) ... + det([1,1,1; coordinates(elements4(j,1:3),:)'] ) * ... f(sum(coordinates(elements4(j,:),:))/4)/4; end % % Neumann conditions. % if ( ~isempty(neumann) ) for j = 1 : size(neumann,1) b(neumann(j,:)) = b(neumann(j,:)) + ... norm(coordinates(neumann(j,1),:) - coordinates(neumann(j,2),:)) * ... g(sum(coordinates(neumann(j,:),:))/2)/2; end end % % Determine which nodes are associated with Dirichlet conditions. % Assign the corresponding entries of U, and adjust the right hand side. % u = sparse ( size(coordinates,1), 1 ); BoundNodes = unique ( dirichlet ); u(BoundNodes) = u_d ( coordinates(BoundNodes,:) ); b = b - A * u; % % Compute the solution by solving A * U = B for the remaining unknown values of U. % FreeNodes = setdiff ( 1:size(coordinates,1), BoundNodes ); u(FreeNodes) = A(FreeNodes,FreeNodes) \ b(FreeNodes); % % Graphic representation. % show ( elements3, elements4, coordinates, full ( u ) ); return end

    Read the article

  • WP: AesManaged encryption vs. mcrypt_encrypt

    - by invalidusername
    I'm trying to synchronize my encryption and decryption methods between C# and PHP but something seems to be going wrong. In the Windows Phone 7 SDK you can use AESManaged to encrypt your data I use the following method: public static string EncryptA(string dataToEncrypt, string password, string salt) { AesManaged aes = null; MemoryStream memoryStream = null; CryptoStream cryptoStream = null; try { //Generate a Key based on a Password, Salt and HMACSHA1 pseudo-random number generator Rfc2898DeriveBytes rfc2898 = new Rfc2898DeriveBytes(password, Encoding.UTF8.GetBytes(salt)); //Create AES algorithm with 256 bit key and 128-bit block size aes = new AesManaged(); aes.Key = rfc2898.GetBytes(aes.KeySize / 8); aes.IV = new byte[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; // rfc2898.GetBytes(aes.BlockSize / 8); // to check my results against those of PHP var blaat1 = Convert.ToBase64String(aes.Key); var blaat2 = Convert.ToBase64String(aes.IV); //Create Memory and Crypto Streams memoryStream = new MemoryStream(); cryptoStream = new CryptoStream(memoryStream, aes.CreateEncryptor(), CryptoStreamMode.Write); //Encrypt Data byte[] data = Encoding.Unicode.GetBytes(dataToEncrypt); cryptoStream.Write(data, 0, data.Length); cryptoStream.FlushFinalBlock(); //Return Base 64 String string result = Convert.ToBase64String(memoryStream.ToArray()); return result; } finally { if (cryptoStream != null) cryptoStream.Close(); if (memoryStream != null) memoryStream.Close(); if (aes != null) aes.Clear(); } } I solved the problem of generating the Key. The Key and IV are similar as those on the PHP end. But then the final step in the encryption is going wrong. here is my PHP code <?php function pbkdf2($p, $s, $c, $dk_len, $algo = 'sha1') { // experimentally determine h_len for the algorithm in question static $lengths; if (!isset($lengths[$algo])) { $lengths[$algo] = strlen(hash($algo, null, true)); } $h_len = $lengths[$algo]; if ($dk_len > (pow(2, 32) - 1) * $h_len) { return false; // derived key is too long } else { $l = ceil($dk_len / $h_len); // number of derived key blocks to compute $t = null; for ($i = 1; $i <= $l; $i++) { $f = $u = hash_hmac($algo, $s . pack('N', $i), $p, true); // first iterate for ($j = 1; $j < $c; $j++) { $f ^= ($u = hash_hmac($algo, $u, $p, true)); // xor each iterate } $t .= $f; // concatenate blocks of the derived key } return substr($t, 0, $dk_len); // return the derived key of correct length } } $password = 'test'; $salt = 'saltsalt'; $text = "texttoencrypt"; #$iv_size = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_128, MCRYPT_MODE_CBC); #echo $iv_size . '<br/>'; #$iv = mcrypt_create_iv($iv_size, MCRYPT_RAND); #print_r (mcrypt_list_algorithms()); $iv = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"; $key = pbkdf2($password, $salt, 1000, 32); echo 'key: ' . base64_encode($key) . '<br/>'; echo 'iv: ' . base64_encode($iv) . '<br/>'; echo '<br/><br/>'; function addpadding($string, $blocksize = 32){ $len = strlen($string); $pad = $blocksize - ($len % $blocksize); $string .= str_repeat(chr($pad), $pad); return $string; } echo 'text: ' . $text . '<br/>'; echo 'text: ' . addpadding($text) . '<br/>'; // -- works till here $crypttext = mcrypt_encrypt(MCRYPT_RIJNDAEL_256, $key, $text, MCRYPT_MODE_CBC, $iv); echo '1.' . $crypttext . '<br/>'; $crypttext = base64_encode($crypttext); echo '2.' . $crypttext . '<br/>'; $crypttext = mcrypt_encrypt(MCRYPT_RIJNDAEL_256, $key, addpadding($text), MCRYPT_MODE_CBC, $iv); echo '1.' . $crypttext . '<br/>'; $crypttext = base64_encode($crypttext); echo '2.' . $crypttext . '<br/>'; ?> So to point out, the Key and IV look similar on both .NET and PHP, but something seems to be going wrong in the final call when executing mcrypt_encrypt(). The end result, the encrypted string, differs from .NET. Can anybody tell me what i'm doing wrong. As far as i can see everything should be correct. Thank you! EDIT: Additional information on the AESManaged object in .NET Keysize = 256 Mode = CBC Padding = PKCS7

    Read the article

  • What statistics can be maintained for a set of numerical data without iterating?

    - by Dan Tao
    Update Just for future reference, I'm going to list all of the statistics that I'm aware of that can be maintained in a rolling collection, recalculated as an O(1) operation on every addition/removal (this is really how I should've worded the question from the beginning): Obvious Count Sum Mean Max* Min* Median** Less Obvious Variance Standard Deviation Skewness Kurtosis Mode*** Weighted Average Weighted Moving Average**** OK, so to put it more accurately: these are not "all" of the statistics I'm aware of. They're just the ones that I can remember off the top of my head right now. *Can be recalculated in O(1) for additions only, or for additions and removals if the collection is sorted (but in this case, insertion is not O(1)). Removals potentially incur an O(n) recalculation for non-sorted collections. **Recalculated in O(1) for a sorted, indexed collection only. ***Requires a fairly complex data structure to recalculate in O(1). ****This can certainly be achieved in O(1) for additions and removals when the weights are assigned in a linearly descending fashion. In other scenarios, I'm not sure. Original Question Say I maintain a collection of numerical data -- let's say, just a bunch of numbers. For this data, there are loads of calculated values that might be of interest; one example would be the sum. To get the sum of all this data, I could... Option 1: Iterate through the collection, adding all the values: double sum = 0.0; for (int i = 0; i < values.Count; i++) sum += values[i]; Option 2: Maintain the sum, eliminating the need to ever iterate over the collection just to find the sum: void Add(double value) { values.Add(value); sum += value; } void Remove(double value) { values.Remove(value); sum -= value; } EDIT: To put this question in more relatable terms, let's compare the two options above to a (sort of) real-world situation: Suppose I start listing numbers out loud and ask you to keep them in your head. I start by saying, "11, 16, 13, 12." If you've just been remembering the numbers themselves and nothing more, and then I say, "What's the sum?", you'd have to think to yourself, "OK, what's 11 + 16 + 13 + 12?" before responding, "52." If, on the other hand, you had been keeping track of the sum yourself while I was listing the numbers (i.e., when I said, "11" you thought "11", when I said "16", you thought, "27," and so on), you could answer "52" right away. Then if I say, "OK, now forget the number 16," if you've been keeping track of the sum inside your head you can simply take 16 away from 52 and know that the new sum is 36, rather than taking 16 off the list and them summing up 11 + 13 + 12. So my question is, what other calculations, other than the obvious ones like sum and average, are like this? SECOND EDIT: As an arbitrary example of a statistic that (I'm almost certain) does require iteration -- and therefore cannot be maintained as simply as a sum or average -- consider if I asked you, "how many numbers in this collection are divisible by the min?" Let's say the numbers are 5, 15, 19, 20, 21, 25, and 30. The min of this set is 5, which divides into 5, 15, 20, 25, and 30 (but not 19 or 21), so the answer is 5. Now if I remove 5 from the collection and ask the same question, the answer is now 2, since only 15 and 30 are divisible by the new min of 15; but, as far as I can tell, you cannot know this without going through the collection again. So I think this gets to the heart of my question: if we can divide kinds of statistics into these categories, those that are maintainable (my own term, maybe there's a more official one somewhere) versus those that require iteration to compute any time a collection is changed, what are all the maintainable ones? What I am asking about is not strictly the same as an online algorithm (though I sincerely thank those of you who introduced me to that concept). An online algorithm can begin its work without having even seen all of the input data; the maintainable statistics I am seeking will certainly have seen all the data, they just don't need to reiterate through it over and over again whenever it changes.

    Read the article

  • C# Memoization of functions with arbitrary number of arguments

    - by Lirik
    I'm trying to create a memoization interface for functions with arbitrary number of arguments, but I'm failing miserably. The first thing I tried is to define an interface for a function which gets memoized automatically upon execution: class EMAFunction:IFunction { Dictionary<List<object>, List<object>> map; class EMAComparer : IEqualityComparer<List<object>> { private int _multiplier = 97; public bool Equals(List<object> a, List<object> b) { List<object> aVals = (List<object>)a[0]; int aPeriod = (int)a[1]; List<object> bVals = (List<object>)b[0]; int bPeriod = (int)b[1]; return (aVals.Count == bVals.Count) && (aPeriod == bPeriod); } public int GetHashCode(List<object> obj) { // Don't compute hash code on null object. if (obj == null) { return 0; } // Get length. int length = obj.Count; List<object> vals = (List<object>) obj[0]; int period = (int) obj[1]; return (_multiplier * vals.GetHashCode() * period.GetHashCode()) + length;; } } public EMAFunction() { NumParams = 2; Name = "EMA"; map = new Dictionary<List<object>, List<object>>(new EMAComparer()); } #region IFunction Members public int NumParams { get; set; } public string Name { get; set; } public object Execute(List<object> parameters) { if (parameters.Count != NumParams) throw new ArgumentException("The num params doesn't match!"); if (!map.ContainsKey(parameters)) { //map.Add(parameters, List<double> values = new List<double>(); List<object> asObj = (List<object>)parameters[0]; foreach (object val in asObj) { values.Add((double)val); } int period = (int)parameters[1]; asObj.Clear(); List<double> ema = TechFunctions.ExponentialMovingAverage(values, period); foreach (double val in ema) { asObj.Add(val); } map.Add(parameters, asObj); } return map[parameters]; } public void ClearMap() { map.Clear(); } #endregion } Here are my tests of the function: private void MemoizeTest() { DataSet dataSet = DataLoader.LoadData(DataLoader.DataSource.FROM_WEB, 1024); List<String> labels = dataSet.DataLabels; Stopwatch sw = new Stopwatch(); IFunction emaFunc = new EMAFunction(); List<object> parameters = new List<object>(); int numRuns = 1000; long sumTicks = 0; parameters.Add(dataSet.GetValues("open")); parameters.Add(12); // First call for(int i = 0; i < numRuns; ++i) { emaFunc.ClearMap();// remove any memoization mappings sw.Start(); emaFunc.Execute(parameters); sw.Stop(); sumTicks += sw.ElapsedTicks; } Console.WriteLine("Average ticks not-memoized " + (sumTicks/numRuns)); sumTicks = 0; // Repeat call for (int i = 0; i < numRuns; ++i) { sw.Start(); emaFunc.Execute(parameters); sw.Stop(); sumTicks += sw.ElapsedTicks; } Console.WriteLine("Average ticks memoized " + (sumTicks/numRuns)); } The performance is confusing me... I expected the memoized function to be faster, but it didn't work out that way: Average ticks not-memoized 106,182 Average ticks memoized 198,854 I tried doubling the data instances to 2048, but the results were about the same: Average ticks not-memoized 232,579 Average ticks memoized 446,280 I did notice that it was correctly finding the parameters in the map and it going directly to the map, but the performance was still slow... I'm either open for troubleshooting help with this example, or if you have a better solution to the problem then please let me know what it is.

    Read the article

  • How can I further optimize this color difference function?

    - by aLfa
    I have made this function to calculate color differences in the CIE Lab colorspace, but it lacks speed. Since I'm not a Java expert, I wonder if any Java guru around has some tips that can improve the speed here. The code is based on the matlab function mentioned in the comment block. /** * Compute the CIEDE2000 color-difference between the sample color with * CIELab coordinates 'sample' and a standard color with CIELab coordinates * 'std' * * Based on the article: * "The CIEDE2000 Color-Difference Formula: Implementation Notes, * Supplementary Test Data, and Mathematical Observations,", G. Sharma, * W. Wu, E. N. Dalal, submitted to Color Research and Application, * January 2004. * available at http://www.ece.rochester.edu/~gsharma/ciede2000/ */ public static double deltaE2000(double[] lab1, double[] lab2) { double L1 = lab1[0]; double a1 = lab1[1]; double b1 = lab1[2]; double L2 = lab2[0]; double a2 = lab2[1]; double b2 = lab2[2]; // Cab = sqrt(a^2 + b^2) double Cab1 = Math.sqrt(a1 * a1 + b1 * b1); double Cab2 = Math.sqrt(a2 * a2 + b2 * b2); // CabAvg = (Cab1 + Cab2) / 2 double CabAvg = (Cab1 + Cab2) / 2; // G = 1 + (1 - sqrt((CabAvg^7) / (CabAvg^7 + 25^7))) / 2 double CabAvg7 = Math.pow(CabAvg, 7); double G = 1 + (1 - Math.sqrt(CabAvg7 / (CabAvg7 + 6103515625.0))) / 2; // ap = G * a double ap1 = G * a1; double ap2 = G * a2; // Cp = sqrt(ap^2 + b^2) double Cp1 = Math.sqrt(ap1 * ap1 + b1 * b1); double Cp2 = Math.sqrt(ap2 * ap2 + b2 * b2); // CpProd = (Cp1 * Cp2) double CpProd = Cp1 * Cp2; // hp1 = atan2(b1, ap1) double hp1 = Math.atan2(b1, ap1); // ensure hue is between 0 and 2pi if (hp1 < 0) { // hp1 = hp1 + 2pi hp1 += 6.283185307179586476925286766559; } // hp2 = atan2(b2, ap2) double hp2 = Math.atan2(b2, ap2); // ensure hue is between 0 and 2pi if (hp2 < 0) { // hp2 = hp2 + 2pi hp2 += 6.283185307179586476925286766559; } // dL = L2 - L1 double dL = L2 - L1; // dC = Cp2 - Cp1 double dC = Cp2 - Cp1; // computation of hue difference double dhp = 0.0; // set hue difference to zero if the product of chromas is zero if (CpProd != 0) { // dhp = hp2 - hp1 dhp = hp2 - hp1; if (dhp > Math.PI) { // dhp = dhp - 2pi dhp -= 6.283185307179586476925286766559; } else if (dhp < -Math.PI) { // dhp = dhp + 2pi dhp += 6.283185307179586476925286766559; } } // dH = 2 * sqrt(CpProd) * sin(dhp / 2) double dH = 2 * Math.sqrt(CpProd) * Math.sin(dhp / 2); // weighting functions // Lp = (L1 + L2) / 2 - 50 double Lp = (L1 + L2) / 2 - 50; // Cp = (Cp1 + Cp2) / 2 double Cp = (Cp1 + Cp2) / 2; // average hue computation // hp = (hp1 + hp2) / 2 double hp = (hp1 + hp2) / 2; // identify positions for which abs hue diff exceeds 180 degrees if (Math.abs(hp1 - hp2) > Math.PI) { // hp = hp - pi hp -= Math.PI; } // ensure hue is between 0 and 2pi if (hp < 0) { // hp = hp + 2pi hp += 6.283185307179586476925286766559; } // LpSqr = Lp^2 double LpSqr = Lp * Lp; // Sl = 1 + 0.015 * LpSqr / sqrt(20 + LpSqr) double Sl = 1 + 0.015 * LpSqr / Math.sqrt(20 + LpSqr); // Sc = 1 + 0.045 * Cp double Sc = 1 + 0.045 * Cp; // T = 1 - 0.17 * cos(hp - pi / 6) + // + 0.24 * cos(2 * hp) + // + 0.32 * cos(3 * hp + pi / 30) - // - 0.20 * cos(4 * hp - 63 * pi / 180) double hphp = hp + hp; double T = 1 - 0.17 * Math.cos(hp - 0.52359877559829887307710723054658) + 0.24 * Math.cos(hphp) + 0.32 * Math.cos(hphp + hp + 0.10471975511965977461542144610932) - 0.20 * Math.cos(hphp + hphp - 1.0995574287564276334619251841478); // Sh = 1 + 0.015 * Cp * T double Sh = 1 + 0.015 * Cp * T; // deltaThetaRad = (pi / 3) * e^-(36 / (5 * pi) * hp - 11)^2 double powerBase = hp - 4.799655442984406; double deltaThetaRad = 1.0471975511965977461542144610932 * Math.exp(-5.25249016001879 * powerBase * powerBase); // Rc = 2 * sqrt((Cp^7) / (Cp^7 + 25^7)) double Cp7 = Math.pow(Cp, 7); double Rc = 2 * Math.sqrt(Cp7 / (Cp7 + 6103515625.0)); // RT = -sin(delthetarad) * Rc double RT = -Math.sin(deltaThetaRad) * Rc; // de00 = sqrt((dL / Sl)^2 + (dC / Sc)^2 + (dH / Sh)^2 + RT * (dC / Sc) * (dH / Sh)) double dLSl = dL / Sl; double dCSc = dC / Sc; double dHSh = dH / Sh; return Math.sqrt(dLSl * dLSl + dCSc * dCSc + dHSh * dHSh + RT * dCSc * dHSh); }

    Read the article

  • Trouble with copying dictionaries and using deepcopy on an SQLAlchemy ORM object

    - by Az
    Hi there, I'm doing a Simulated Annealing algorithm to optimise a given allocation of students and projects. This is language-agnostic pseudocode from Wikipedia: s ? s0; e ? E(s) // Initial state, energy. sbest ? s; ebest ? e // Initial "best" solution k ? 0 // Energy evaluation count. while k < kmax and e > emax // While time left & not good enough: snew ? neighbour(s) // Pick some neighbour. enew ? E(snew) // Compute its energy. if enew < ebest then // Is this a new best? sbest ? snew; ebest ? enew // Save 'new neighbour' to 'best found'. if P(e, enew, temp(k/kmax)) > random() then // Should we move to it? s ? snew; e ? enew // Yes, change state. k ? k + 1 // One more evaluation done return sbest // Return the best solution found. The following is an adaptation of the technique. My supervisor said the idea is fine in theory. First I pick up some allocation (i.e. an entire dictionary of students and their allocated projects, including the ranks for the projects) from entire set of randomised allocations, copy it and pass it to my function. Let's call this allocation aOld (it is a dictionary). aOld has a weight related to it called wOld. The weighting is described below. The function does the following: Let this allocation, aOld be the best_node From all the students, pick a random number of students and stick in a list Strip (DEALLOCATE) them of their projects ++ reflect the changes for projects (allocated parameter is now False) and lecturers (free up slots if one or more of their projects are no longer allocated) Randomise that list Try assigning (REALLOCATE) everyone in that list projects again Calculate the weight (add up ranks, rank 1 = 1, rank 2 = 2... and no project rank = 101) For this new allocation aNew, if the weight wNew is smaller than the allocation weight wOld I picked up at the beginning, then this is the best_node (as defined by the Simulated Annealing algorithm above). Apply the algorithm to aNew and continue. If wOld < wNew, then apply the algorithm to aOld again and continue. The allocations/data-points are expressed as "nodes" such that a node = (weight, allocation_dict, projects_dict, lecturers_dict) Right now, I can only perform this algorithm once, but I'll need to try for a number N (denoted by kmax in the Wikipedia snippet) and make sure I always have with me, the previous node and the best_node. So that I don't modify my original dictionaries (which I might want to reset to), I've done a shallow copy of the dictionaries. From what I've read in the docs, it seems that it only copies the references and since my dictionaries contain objects, changing the copied dictionary ends up changing the objects anyway. So I tried to use copy.deepcopy().These dictionaries refer to objects that have been mapped with SQLA. Questions: I've been given some solutions to the problems faced but due to my über green-ness with using Python, they all sound rather cryptic to me. Deepcopy isn't playing nicely with SQLA. I've been told thatdeepcopy on ORM objects probably has issues that prevent it from working as you'd expect. Apparently I'd be better off "building copy constructors, i.e. def copy(self): return FooBar(....)." Can someone please explain what that means? I checked and found out that deepcopy has issues because SQLAlchemy places extra information on your objects, i.e. an _sa_instance_state attribute, that I wouldn't want in the copy but is necessary for the object to have. I've been told: "There are ways to manually blow away the old _sa_instance_state and put a new one on the object, but the most straightforward is to make a new object with __init__() and set up the attributes that are significant, instead of doing a full deep copy." What exactly does that mean? Do I create a new, unmapped class similar to the old, mapped one? An alternate solution is that I'd have to "implement __deepcopy__() on your objects and ensure that a new _sa_instance_state is set up, there are functions in sqlalchemy.orm.attributes which can help with that." Once again this is beyond me so could someone kindly explain what it means? A more general question: given the above information are there any suggestions on how I can maintain the information/state for the best_node (which must always persist through my while loop) and the previous_node, if my actual objects (referenced by the dictionaries, therefore the nodes) are changing due to the deallocation/reallocation taking place? That is, without using copy?

    Read the article

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • A Taxonomy of Numerical Methods v1

    - by JoshReuben
    Numerical Analysis – When, What, (but not how) Once you understand the Math & know C++, Numerical Methods are basically blocks of iterative & conditional math code. I found the real trick was seeing the forest for the trees – knowing which method to use for which situation. Its pretty easy to get lost in the details – so I’ve tried to organize these methods in a way that I can quickly look this up. I’ve included links to detailed explanations and to C++ code examples. I’ve tried to classify Numerical methods in the following broad categories: Solving Systems of Linear Equations Solving Non-Linear Equations Iteratively Interpolation Curve Fitting Optimization Numerical Differentiation & Integration Solving ODEs Boundary Problems Solving EigenValue problems Enjoy – I did ! Solving Systems of Linear Equations Overview Solve sets of algebraic equations with x unknowns The set is commonly in matrix form Gauss-Jordan Elimination http://en.wikipedia.org/wiki/Gauss%E2%80%93Jordan_elimination C++: http://www.codekeep.net/snippets/623f1923-e03c-4636-8c92-c9dc7aa0d3c0.aspx Produces solution of the equations & the coefficient matrix Efficient, stable 2 steps: · Forward Elimination – matrix decomposition: reduce set to triangular form (0s below the diagonal) or row echelon form. If degenerate, then there is no solution · Backward Elimination –write the original matrix as the product of ints inverse matrix & its reduced row-echelon matrix à reduce set to row canonical form & use back-substitution to find the solution to the set Elementary ops for matrix decomposition: · Row multiplication · Row switching · Add multiples of rows to other rows Use pivoting to ensure rows are ordered for achieving triangular form LU Decomposition http://en.wikipedia.org/wiki/LU_decomposition C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-lu-decomposition-for-solving.html Represent the matrix as a product of lower & upper triangular matrices A modified version of GJ Elimination Advantage – can easily apply forward & backward elimination to solve triangular matrices Techniques: · Doolittle Method – sets the L matrix diagonal to unity · Crout Method - sets the U matrix diagonal to unity Note: both the L & U matrices share the same unity diagonal & can be stored compactly in the same matrix Gauss-Seidel Iteration http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method C++: http://www.nr.com/forum/showthread.php?t=722 Transform the linear set of equations into a single equation & then use numerical integration (as integration formulas have Sums, it is implemented iteratively). an optimization of Gauss-Jacobi: 1.5 times faster, requires 0.25 iterations to achieve the same tolerance Solving Non-Linear Equations Iteratively find roots of polynomials – there may be 0, 1 or n solutions for an n order polynomial use iterative techniques Iterative methods · used when there are no known analytical techniques · Requires set functions to be continuous & differentiable · Requires an initial seed value – choice is critical to convergence à conduct multiple runs with different starting points & then select best result · Systematic - iterate until diminishing returns, tolerance or max iteration conditions are met · bracketing techniques will always yield convergent solutions, non-bracketing methods may fail to converge Incremental method if a nonlinear function has opposite signs at 2 ends of a small interval x1 & x2, then there is likely to be a solution in their interval – solutions are detected by evaluating a function over interval steps, for a change in sign, adjusting the step size dynamically. Limitations – can miss closely spaced solutions in large intervals, cannot detect degenerate (coinciding) solutions, limited to functions that cross the x-axis, gives false positives for singularities Fixed point method http://en.wikipedia.org/wiki/Fixed-point_iteration C++: http://books.google.co.il/books?id=weYj75E_t6MC&pg=PA79&lpg=PA79&dq=fixed+point+method++c%2B%2B&source=bl&ots=LQ-5P_taoC&sig=lENUUIYBK53tZtTwNfHLy5PEWDk&hl=en&sa=X&ei=wezDUPW1J5DptQaMsIHQCw&redir_esc=y#v=onepage&q=fixed%20point%20method%20%20c%2B%2B&f=false Algebraically rearrange a solution to isolate a variable then apply incremental method Bisection method http://en.wikipedia.org/wiki/Bisection_method C++: http://numericalcomputing.wordpress.com/category/algorithms/ Bracketed - Select an initial interval, keep bisecting it ad midpoint into sub-intervals and then apply incremental method on smaller & smaller intervals – zoom in Adv: unaffected by function gradient à reliable Disadv: slow convergence False Position Method http://en.wikipedia.org/wiki/False_position_method C++: http://www.dreamincode.net/forums/topic/126100-bisection-and-false-position-methods/ Bracketed - Select an initial interval , & use the relative value of function at interval end points to select next sub-intervals (estimate how far between the end points the solution might be & subdivide based on this) Newton-Raphson method http://en.wikipedia.org/wiki/Newton's_method C++: http://www-users.cselabs.umn.edu/classes/Summer-2012/csci1113/index.php?page=./newt3 Also known as Newton's method Convenient, efficient Not bracketed – only a single initial guess is required to start iteration – requires an analytical expression for the first derivative of the function as input. Evaluates the function & its derivative at each step. Can be extended to the Newton MutiRoot method for solving multiple roots Can be easily applied to an of n-coupled set of non-linear equations – conduct a Taylor Series expansion of a function, dropping terms of order n, rewrite as a Jacobian matrix of PDs & convert to simultaneous linear equations !!! Secant Method http://en.wikipedia.org/wiki/Secant_method C++: http://forum.vcoderz.com/showthread.php?p=205230 Unlike N-R, can estimate first derivative from an initial interval (does not require root to be bracketed) instead of inputting it Since derivative is approximated, may converge slower. Is fast in practice as it does not have to evaluate the derivative at each step. Similar implementation to False Positive method Birge-Vieta Method http://mat.iitm.ac.in/home/sryedida/public_html/caimna/transcendental/polynomial%20methods/bv%20method.html C++: http://books.google.co.il/books?id=cL1boM2uyQwC&pg=SA3-PA51&lpg=SA3-PA51&dq=Birge-Vieta+Method+c%2B%2B&source=bl&ots=QZmnDTK3rC&sig=BPNcHHbpR_DKVoZXrLi4nVXD-gg&hl=en&sa=X&ei=R-_DUK2iNIjzsgbE5ID4Dg&redir_esc=y#v=onepage&q=Birge-Vieta%20Method%20c%2B%2B&f=false combines Horner's method of polynomial evaluation (transforming into lesser degree polynomials that are more computationally efficient to process) with Newton-Raphson to provide a computational speed-up Interpolation Overview Construct new data points for as close as possible fit within range of a discrete set of known points (that were obtained via sampling, experimentation) Use Taylor Series Expansion of a function f(x) around a specific value for x Linear Interpolation http://en.wikipedia.org/wiki/Linear_interpolation C++: http://www.hamaluik.com/?p=289 Straight line between 2 points à concatenate interpolants between each pair of data points Bilinear Interpolation http://en.wikipedia.org/wiki/Bilinear_interpolation C++: http://supercomputingblog.com/graphics/coding-bilinear-interpolation/2/ Extension of the linear function for interpolating functions of 2 variables – perform linear interpolation first in 1 direction, then in another. Used in image processing – e.g. texture mapping filter. Uses 4 vertices to interpolate a value within a unit cell. Lagrange Interpolation http://en.wikipedia.org/wiki/Lagrange_polynomial C++: http://www.codecogs.com/code/maths/approximation/interpolation/lagrange.php For polynomials Requires recomputation for all terms for each distinct x value – can only be applied for small number of nodes Numerically unstable Barycentric Interpolation http://epubs.siam.org/doi/pdf/10.1137/S0036144502417715 C++: http://www.gamedev.net/topic/621445-barycentric-coordinates-c-code-check/ Rearrange the terms in the equation of the Legrange interpolation by defining weight functions that are independent of the interpolated value of x Newton Divided Difference Interpolation http://en.wikipedia.org/wiki/Newton_polynomial C++: http://jee-appy.blogspot.co.il/2011/12/newton-divided-difference-interpolation.html Hermite Divided Differences: Interpolation polynomial approximation for a given set of data points in the NR form - divided differences are used to approximately calculate the various differences. For a given set of 3 data points , fit a quadratic interpolant through the data Bracketed functions allow Newton divided differences to be calculated recursively Difference table Cubic Spline Interpolation http://en.wikipedia.org/wiki/Spline_interpolation C++: https://www.marcusbannerman.co.uk/index.php/home/latestarticles/42-articles/96-cubic-spline-class.html Spline is a piecewise polynomial Provides smoothness – for interpolations with significantly varying data Use weighted coefficients to bend the function to be smooth & its 1st & 2nd derivatives are continuous through the edge points in the interval Curve Fitting A generalization of interpolating whereby given data points may contain noise à the curve does not necessarily pass through all the points Least Squares Fit http://en.wikipedia.org/wiki/Least_squares C++: http://www.ccas.ru/mmes/educat/lab04k/02/least-squares.c Residual – difference between observed value & expected value Model function is often chosen as a linear combination of the specified functions Determines: A) The model instance in which the sum of squared residuals has the least value B) param values for which model best fits data Straight Line Fit Linear correlation between independent variable and dependent variable Linear Regression http://en.wikipedia.org/wiki/Linear_regression C++: http://www.oocities.org/david_swaim/cpp/linregc.htm Special case of statistically exact extrapolation Leverage least squares Given a basis function, the sum of the residuals is determined and the corresponding gradient equation is expressed as a set of normal linear equations in matrix form that can be solved (e.g. using LU Decomposition) Can be weighted - Drop the assumption that all errors have the same significance –-> confidence of accuracy is different for each data point. Fit the function closer to points with higher weights Polynomial Fit - use a polynomial basis function Moving Average http://en.wikipedia.org/wiki/Moving_average C++: http://www.codeproject.com/Articles/17860/A-Simple-Moving-Average-Algorithm Used for smoothing (cancel fluctuations to highlight longer-term trends & cycles), time series data analysis, signal processing filters Replace each data point with average of neighbors. Can be simple (SMA), weighted (WMA), exponential (EMA). Lags behind latest data points – extra weight can be given to more recent data points. Weights can decrease arithmetically or exponentially according to distance from point. Parameters: smoothing factor, period, weight basis Optimization Overview Given function with multiple variables, find Min (or max by minimizing –f(x)) Iterative approach Efficient, but not necessarily reliable Conditions: noisy data, constraints, non-linear models Detection via sign of first derivative - Derivative of saddle points will be 0 Local minima Bisection method Similar method for finding a root for a non-linear equation Start with an interval that contains a minimum Golden Search method http://en.wikipedia.org/wiki/Golden_section_search C++: http://www.codecogs.com/code/maths/optimization/golden.php Bisect intervals according to golden ratio 0.618.. Achieves reduction by evaluating a single function instead of 2 Newton-Raphson Method Brent method http://en.wikipedia.org/wiki/Brent's_method C++: http://people.sc.fsu.edu/~jburkardt/cpp_src/brent/brent.cpp Based on quadratic or parabolic interpolation – if the function is smooth & parabolic near to the minimum, then a parabola fitted through any 3 points should approximate the minima – fails when the 3 points are collinear , in which case the denominator is 0 Simplex Method http://en.wikipedia.org/wiki/Simplex_algorithm C++: http://www.codeguru.com/cpp/article.php/c17505/Simplex-Optimization-Algorithm-and-Implemetation-in-C-Programming.htm Find the global minima of any multi-variable function Direct search – no derivatives required At each step it maintains a non-degenerative simplex – a convex hull of n+1 vertices. Obtains the minimum for a function with n variables by evaluating the function at n-1 points, iteratively replacing the point of worst result with the point of best result, shrinking the multidimensional simplex around the best point. Point replacement involves expanding & contracting the simplex near the worst value point to determine a better replacement point Oscillation can be avoided by choosing the 2nd worst result Restart if it gets stuck Parameters: contraction & expansion factors Simulated Annealing http://en.wikipedia.org/wiki/Simulated_annealing C++: http://code.google.com/p/cppsimulatedannealing/ Analogy to heating & cooling metal to strengthen its structure Stochastic method – apply random permutation search for global minima - Avoid entrapment in local minima via hill climbing Heating schedule - Annealing schedule params: temperature, iterations at each temp, temperature delta Cooling schedule – can be linear, step-wise or exponential Differential Evolution http://en.wikipedia.org/wiki/Differential_evolution C++: http://www.amichel.com/de/doc/html/ More advanced stochastic methods analogous to biological processes: Genetic algorithms, evolution strategies Parallel direct search method against multiple discrete or continuous variables Initial population of variable vectors chosen randomly – if weighted difference vector of 2 vectors yields a lower objective function value then it replaces the comparison vector Many params: #parents, #variables, step size, crossover constant etc Convergence is slow – many more function evaluations than simulated annealing Numerical Differentiation Overview 2 approaches to finite difference methods: · A) approximate function via polynomial interpolation then differentiate · B) Taylor series approximation – additionally provides error estimate Finite Difference methods http://en.wikipedia.org/wiki/Finite_difference_method C++: http://www.wpi.edu/Pubs/ETD/Available/etd-051807-164436/unrestricted/EAMPADU.pdf Find differences between high order derivative values - Approximate differential equations by finite differences at evenly spaced data points Based on forward & backward Taylor series expansion of f(x) about x plus or minus multiples of delta h. Forward / backward difference - the sums of the series contains even derivatives and the difference of the series contains odd derivatives – coupled equations that can be solved. Provide an approximation of the derivative within a O(h^2) accuracy There is also central difference & extended central difference which has a O(h^4) accuracy Richardson Extrapolation http://en.wikipedia.org/wiki/Richardson_extrapolation C++: http://mathscoding.blogspot.co.il/2012/02/introduction-richardson-extrapolation.html A sequence acceleration method applied to finite differences Fast convergence, high accuracy O(h^4) Derivatives via Interpolation Cannot apply Finite Difference method to discrete data points at uneven intervals – so need to approximate the derivative of f(x) using the derivative of the interpolant via 3 point Lagrange Interpolation Note: the higher the order of the derivative, the lower the approximation precision Numerical Integration Estimate finite & infinite integrals of functions More accurate procedure than numerical differentiation Use when it is not possible to obtain an integral of a function analytically or when the function is not given, only the data points are Newton Cotes Methods http://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas C++: http://www.siafoo.net/snippet/324 For equally spaced data points Computationally easy – based on local interpolation of n rectangular strip areas that is piecewise fitted to a polynomial to get the sum total area Evaluate the integrand at n+1 evenly spaced points – approximate definite integral by Sum Weights are derived from Lagrange Basis polynomials Leverage Trapezoidal Rule for default 2nd formulas, Simpson 1/3 Rule for substituting 3 point formulas, Simpson 3/8 Rule for 4 point formulas. For 4 point formulas use Bodes Rule. Higher orders obtain more accurate results Trapezoidal Rule uses simple area, Simpsons Rule replaces the integrand f(x) with a quadratic polynomial p(x) that uses the same values as f(x) for its end points, but adds a midpoint Romberg Integration http://en.wikipedia.org/wiki/Romberg's_method C++: http://code.google.com/p/romberg-integration/downloads/detail?name=romberg.cpp&can=2&q= Combines trapezoidal rule with Richardson Extrapolation Evaluates the integrand at equally spaced points The integrand must have continuous derivatives Each R(n,m) extrapolation uses a higher order integrand polynomial replacement rule (zeroth starts with trapezoidal) à a lower triangular matrix set of equation coefficients where the bottom right term has the most accurate approximation. The process continues until the difference between 2 successive diagonal terms becomes sufficiently small. Gaussian Quadrature http://en.wikipedia.org/wiki/Gaussian_quadrature C++: http://www.alglib.net/integration/gaussianquadratures.php Data points are chosen to yield best possible accuracy – requires fewer evaluations Ability to handle singularities, functions that are difficult to evaluate The integrand can include a weighting function determined by a set of orthogonal polynomials. Points & weights are selected so that the integrand yields the exact integral if f(x) is a polynomial of degree <= 2n+1 Techniques (basically different weighting functions): · Gauss-Legendre Integration w(x)=1 · Gauss-Laguerre Integration w(x)=e^-x · Gauss-Hermite Integration w(x)=e^-x^2 · Gauss-Chebyshev Integration w(x)= 1 / Sqrt(1-x^2) Solving ODEs Use when high order differential equations cannot be solved analytically Evaluated under boundary conditions RK for systems – a high order differential equation can always be transformed into a coupled first order system of equations Euler method http://en.wikipedia.org/wiki/Euler_method C++: http://rosettacode.org/wiki/Euler_method First order Runge–Kutta method. Simple recursive method – given an initial value, calculate derivative deltas. Unstable & not very accurate (O(h) error) – not used in practice A first-order method - the local error (truncation error per step) is proportional to the square of the step size, and the global error (error at a given time) is proportional to the step size In evolving solution between data points xn & xn+1, only evaluates derivatives at beginning of interval xn à asymmetric at boundaries Higher order Runge Kutta http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods C++: http://www.dreamincode.net/code/snippet1441.htm 2nd & 4th order RK - Introduces parameterized midpoints for more symmetric solutions à accuracy at higher computational cost Adaptive RK – RK-Fehlberg – estimate the truncation at each integration step & automatically adjust the step size to keep error within prescribed limits. At each step 2 approximations are compared – if in disagreement to a specific accuracy, the step size is reduced Boundary Value Problems Where solution of differential equations are located at 2 different values of the independent variable x à more difficult, because cannot just start at point of initial value – there may not be enough starting conditions available at the end points to produce a unique solution An n-order equation will require n boundary conditions – need to determine the missing n-1 conditions which cause the given conditions at the other boundary to be satisfied Shooting Method http://en.wikipedia.org/wiki/Shooting_method C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-shooting-method-for-solving.html Iteratively guess the missing values for one end & integrate, then inspect the discrepancy with the boundary values of the other end to adjust the estimate Given the starting boundary values u1 & u2 which contain the root u, solve u given the false position method (solving the differential equation as an initial value problem via 4th order RK), then use u to solve the differential equations. Finite Difference Method For linear & non-linear systems Higher order derivatives require more computational steps – some combinations for boundary conditions may not work though Improve the accuracy by increasing the number of mesh points Solving EigenValue Problems An eigenvalue can substitute a matrix when doing matrix multiplication à convert matrix multiplication into a polynomial EigenValue For a given set of equations in matrix form, determine what are the solution eigenvalue & eigenvectors Similar Matrices - have same eigenvalues. Use orthogonal similarity transforms to reduce a matrix to diagonal form from which eigenvalue(s) & eigenvectors can be computed iteratively Jacobi method http://en.wikipedia.org/wiki/Jacobi_method C++: http://people.sc.fsu.edu/~jburkardt/classes/acs2_2008/openmp/jacobi/jacobi.html Robust but Computationally intense – use for small matrices < 10x10 Power Iteration http://en.wikipedia.org/wiki/Power_iteration For any given real symmetric matrix, generate the largest single eigenvalue & its eigenvectors Simplest method – does not compute matrix decomposition à suitable for large, sparse matrices Inverse Iteration Variation of power iteration method – generates the smallest eigenvalue from the inverse matrix Rayleigh Method http://en.wikipedia.org/wiki/Rayleigh's_method_of_dimensional_analysis Variation of power iteration method Rayleigh Quotient Method Variation of inverse iteration method Matrix Tri-diagonalization Method Use householder algorithm to reduce an NxN symmetric matrix to a tridiagonal real symmetric matrix vua N-2 orthogonal transforms     Whats Next Outside of Numerical Methods there are lots of different types of algorithms that I’ve learned over the decades: Data Mining – (I covered this briefly in a previous post: http://geekswithblogs.net/JoshReuben/archive/2007/12/31/ssas-dm-algorithms.aspx ) Search & Sort Routing Problem Solving Logical Theorem Proving Planning Probabilistic Reasoning Machine Learning Solvers (eg MIP) Bioinformatics (Sequence Alignment, Protein Folding) Quant Finance (I read Wilmott’s books – interesting) Sooner or later, I’ll cover the above topics as well.

    Read the article

  • Node.js Adventure - Host Node.js on Windows Azure Worker Role

    - by Shaun
    In my previous post I demonstrated about how to develop and deploy a Node.js application on Windows Azure Web Site (a.k.a. WAWS). WAWS is a new feature in Windows Azure platform. Since it’s low-cost, and it provides IIS and IISNode components so that we can host our Node.js application though Git, FTP and WebMatrix without any configuration and component installation. But sometimes we need to use the Windows Azure Cloud Service (a.k.a. WACS) and host our Node.js on worker role. Below are some benefits of using worker role. - WAWS leverages IIS and IISNode to host Node.js application, which runs in x86 WOW mode. It reduces the performance comparing with x64 in some cases. - WACS worker role does not need IIS, hence there’s no restriction of IIS, such as 8000 concurrent requests limitation. - WACS provides more flexibility and controls to the developers. For example, we can RDP to the virtual machines of our worker role instances. - WACS provides the service configuration features which can be changed when the role is running. - WACS provides more scaling capability than WAWS. In WAWS we can have at most 3 reserved instances per web site while in WACS we can have up to 20 instances in a subscription. - Since when using WACS worker role we starts the node by ourselves in a process, we can control the input, output and error stream. We can also control the version of Node.js.   Run Node.js in Worker Role Node.js can be started by just having its execution file. This means in Windows Azure, we can have a worker role with the “node.exe” and the Node.js source files, then start it in Run method of the worker role entry class. Let’s create a new windows azure project in Visual Studio and add a new worker role. Since we need our worker role execute the “node.exe” with our application code we need to add the “node.exe” into our project. Right click on the worker role project and add an existing item. By default the Node.js will be installed in the “Program Files\nodejs” folder so we can navigate there and add the “node.exe”. Then we need to create the entry code of Node.js. In WAWS the entry file must be named “server.js”, which is because it’s hosted by IIS and IISNode and IISNode only accept “server.js”. But here as we control everything we can choose any files as the entry code. For example, I created a new JavaScript file named “index.js” in project root. Since we created a C# Windows Azure project we cannot create a JavaScript file from the context menu “Add new item”. We have to create a text file, and then rename it to JavaScript extension. After we added these two files we should set their “Copy to Output Directory” property to “Copy Always”, or “Copy if Newer”. Otherwise they will not be involved in the package when deployed. Let’s paste a very simple Node.js code in the “index.js” as below. As you can see I created a web server listening at port 12345. 1: var http = require("http"); 2: var port = 12345; 3:  4: http.createServer(function (req, res) { 5: res.writeHead(200, { "Content-Type": "text/plain" }); 6: res.end("Hello World\n"); 7: }).listen(port); 8:  9: console.log("Server running at port %d", port); Then we need to start “node.exe” with this file when our worker role was started. This can be done in its Run method. I found the Node.js and entry JavaScript file name, and then create a new process to run it. Our worker role will wait for the process to be exited. If everything is OK once our web server was opened the process will be there listening for incoming requests, and should not be terminated. The code in worker role would be like this. 1: public override void Run() 2: { 3: // This is a sample worker implementation. Replace with your logic. 4: Trace.WriteLine("NodejsHost entry point called", "Information"); 5:  6: // retrieve the node.exe and entry node.js source code file name. 7: var node = Environment.ExpandEnvironmentVariables(@"%RoleRoot%\approot\node.exe"); 8: var js = "index.js"; 9:  10: // prepare the process starting of node.exe 11: var info = new ProcessStartInfo(node, js) 12: { 13: CreateNoWindow = false, 14: ErrorDialog = true, 15: WindowStyle = ProcessWindowStyle.Normal, 16: UseShellExecute = false, 17: WorkingDirectory = Environment.ExpandEnvironmentVariables(@"%RoleRoot%\approot") 18: }; 19: Trace.WriteLine(string.Format("{0} {1}", node, js), "Information"); 20:  21: // start the node.exe with entry code and wait for exit 22: var process = Process.Start(info); 23: process.WaitForExit(); 24: } Then we can run it locally. In the computer emulator UI the worker role started and it executed the Node.js, then Node.js windows appeared. Open the browser to verify the website hosted by our worker role. Next let’s deploy it to azure. But we need some additional steps. First, we need to create an input endpoint. By default there’s no endpoint defined in a worker role. So we will open the role property window in Visual Studio, create a new input TCP endpoint to the port we want our website to use. In this case I will use 80. Even though we created a web server we should add a TCP endpoint of the worker role, since Node.js always listen on TCP instead of HTTP. And then changed the “index.js”, let our web server listen on 80. 1: var http = require("http"); 2: var port = 80; 3:  4: http.createServer(function (req, res) { 5: res.writeHead(200, { "Content-Type": "text/plain" }); 6: res.end("Hello World\n"); 7: }).listen(port); 8:  9: console.log("Server running at port %d", port); Then publish it to Windows Azure. And then in browser we can see our Node.js website was running on WACS worker role. We may encounter an error if we tried to run our Node.js website on 80 port at local emulator. This is because the compute emulator registered 80 and map the 80 endpoint to 81. But our Node.js cannot detect this operation. So when it tried to listen on 80 it will failed since 80 have been used.   Use NPM Modules When we are using WAWS to host Node.js, we can simply install modules we need, and then just publish or upload all files to WAWS. But if we are using WACS worker role, we have to do some extra steps to make the modules work. Assuming that we plan to use “express” in our application. Firstly of all we should download and install this module through NPM command. But after the install finished, they are just in the disk but not included in the worker role project. If we deploy the worker role right now the module will not be packaged and uploaded to azure. Hence we need to add them to the project. On solution explorer window click the “Show all files” button, select the “node_modules” folder and in the context menu select “Include In Project”. But that not enough. We also need to make all files in this module to “Copy always” or “Copy if newer”, so that they can be uploaded to azure with the “node.exe” and “index.js”. This is painful step since there might be many files in a module. So I created a small tool which can update a C# project file, make its all items as “Copy always”. The code is very simple. 1: static void Main(string[] args) 2: { 3: if (args.Length < 1) 4: { 5: Console.WriteLine("Usage: copyallalways [project file]"); 6: return; 7: } 8:  9: var proj = args[0]; 10: File.Copy(proj, string.Format("{0}.bak", proj)); 11:  12: var xml = new XmlDocument(); 13: xml.Load(proj); 14: var nsManager = new XmlNamespaceManager(xml.NameTable); 15: nsManager.AddNamespace("pf", "http://schemas.microsoft.com/developer/msbuild/2003"); 16:  17: // add the output setting to copy always 18: var contentNodes = xml.SelectNodes("//pf:Project/pf:ItemGroup/pf:Content", nsManager); 19: UpdateNodes(contentNodes, xml, nsManager); 20: var noneNodes = xml.SelectNodes("//pf:Project/pf:ItemGroup/pf:None", nsManager); 21: UpdateNodes(noneNodes, xml, nsManager); 22: xml.Save(proj); 23:  24: // remove the namespace attributes 25: var content = xml.InnerXml.Replace("<CopyToOutputDirectory xmlns=\"\">", "<CopyToOutputDirectory>"); 26: xml.LoadXml(content); 27: xml.Save(proj); 28: } 29:  30: static void UpdateNodes(XmlNodeList nodes, XmlDocument xml, XmlNamespaceManager nsManager) 31: { 32: foreach (XmlNode node in nodes) 33: { 34: var copyToOutputDirectoryNode = node.SelectSingleNode("pf:CopyToOutputDirectory", nsManager); 35: if (copyToOutputDirectoryNode == null) 36: { 37: var n = xml.CreateNode(XmlNodeType.Element, "CopyToOutputDirectory", null); 38: n.InnerText = "Always"; 39: node.AppendChild(n); 40: } 41: else 42: { 43: if (string.Compare(copyToOutputDirectoryNode.InnerText, "Always", true) != 0) 44: { 45: copyToOutputDirectoryNode.InnerText = "Always"; 46: } 47: } 48: } 49: } Please be careful when use this tool. I created only for demo so do not use it directly in a production environment. Unload the worker role project, execute this tool with the worker role project file name as the command line argument, it will set all items as “Copy always”. Then reload this worker role project. Now let’s change the “index.js” to use express. 1: var express = require("express"); 2: var app = express(); 3:  4: var port = 80; 5:  6: app.configure(function () { 7: }); 8:  9: app.get("/", function (req, res) { 10: res.send("Hello Node.js!"); 11: }); 12:  13: app.get("/User/:id", function (req, res) { 14: var id = req.params.id; 15: res.json({ 16: "id": id, 17: "name": "user " + id, 18: "company": "IGT" 19: }); 20: }); 21:  22: app.listen(port); Finally let’s publish it and have a look in browser.   Use Windows Azure SQL Database We can use Windows Azure SQL Database (a.k.a. WACD) from Node.js as well on worker role hosting. Since we can control the version of Node.js, here we can use x64 version of “node-sqlserver” now. This is better than if we host Node.js on WAWS since it only support x86. Just install the “node-sqlserver” module from NPM, copy the “sqlserver.node” from “Build\Release” folder to “Lib” folder. Include them in worker role project and run my tool to make them to “Copy always”. Finally update the “index.js” to use WASD. 1: var express = require("express"); 2: var sql = require("node-sqlserver"); 3:  4: var connectionString = "Driver={SQL Server Native Client 10.0};Server=tcp:{SERVER NAME}.database.windows.net,1433;Database={DATABASE NAME};Uid={LOGIN}@{SERVER NAME};Pwd={PASSWORD};Encrypt=yes;Connection Timeout=30;"; 5: var port = 80; 6:  7: var app = express(); 8:  9: app.configure(function () { 10: app.use(express.bodyParser()); 11: }); 12:  13: app.get("/", function (req, res) { 14: sql.open(connectionString, function (err, conn) { 15: if (err) { 16: console.log(err); 17: res.send(500, "Cannot open connection."); 18: } 19: else { 20: conn.queryRaw("SELECT * FROM [Resource]", function (err, results) { 21: if (err) { 22: console.log(err); 23: res.send(500, "Cannot retrieve records."); 24: } 25: else { 26: res.json(results); 27: } 28: }); 29: } 30: }); 31: }); 32:  33: app.get("/text/:key/:culture", function (req, res) { 34: sql.open(connectionString, function (err, conn) { 35: if (err) { 36: console.log(err); 37: res.send(500, "Cannot open connection."); 38: } 39: else { 40: var key = req.params.key; 41: var culture = req.params.culture; 42: var command = "SELECT * FROM [Resource] WHERE [Key] = '" + key + "' AND [Culture] = '" + culture + "'"; 43: conn.queryRaw(command, function (err, results) { 44: if (err) { 45: console.log(err); 46: res.send(500, "Cannot retrieve records."); 47: } 48: else { 49: res.json(results); 50: } 51: }); 52: } 53: }); 54: }); 55:  56: app.get("/sproc/:key/:culture", function (req, res) { 57: sql.open(connectionString, function (err, conn) { 58: if (err) { 59: console.log(err); 60: res.send(500, "Cannot open connection."); 61: } 62: else { 63: var key = req.params.key; 64: var culture = req.params.culture; 65: var command = "EXEC GetItem '" + key + "', '" + culture + "'"; 66: conn.queryRaw(command, function (err, results) { 67: if (err) { 68: console.log(err); 69: res.send(500, "Cannot retrieve records."); 70: } 71: else { 72: res.json(results); 73: } 74: }); 75: } 76: }); 77: }); 78:  79: app.post("/new", function (req, res) { 80: var key = req.body.key; 81: var culture = req.body.culture; 82: var val = req.body.val; 83:  84: sql.open(connectionString, function (err, conn) { 85: if (err) { 86: console.log(err); 87: res.send(500, "Cannot open connection."); 88: } 89: else { 90: var command = "INSERT INTO [Resource] VALUES ('" + key + "', '" + culture + "', N'" + val + "')"; 91: conn.queryRaw(command, function (err, results) { 92: if (err) { 93: console.log(err); 94: res.send(500, "Cannot retrieve records."); 95: } 96: else { 97: res.send(200, "Inserted Successful"); 98: } 99: }); 100: } 101: }); 102: }); 103:  104: app.listen(port); Publish to azure and now we can see our Node.js is working with WASD through x64 version “node-sqlserver”.   Summary In this post I demonstrated how to host our Node.js in Windows Azure Cloud Service worker role. By using worker role we can control the version of Node.js, as well as the entry code. And it’s possible to do some pre jobs before the Node.js application started. It also removed the IIS and IISNode limitation. I personally recommended to use worker role as our Node.js hosting. But there are some problem if you use the approach I mentioned here. The first one is, we need to set all JavaScript files and module files as “Copy always” or “Copy if newer” manually. The second one is, in this way we cannot retrieve the cloud service configuration information. For example, we defined the endpoint in worker role property but we also specified the listening port in Node.js hardcoded. It should be changed that our Node.js can retrieve the endpoint. But I can tell you it won’t be working here. In the next post I will describe another way to execute the “node.exe” and Node.js application, so that we can get the cloud service configuration in Node.js. I will also demonstrate how to use Windows Azure Storage from Node.js by using the Windows Azure Node.js SDK.   Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

    Read the article

  • Advanced TSQL Tuning: Why Internals Knowledge Matters

    - by Paul White
    There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes.  Query tuning is not complete as soon as the query returns results quickly in the development or test environments.  In production, your query will compete for memory, CPU, locks, I/O and other resources on the server.  Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data.  In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row.  The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type.  A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL,   CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL,   CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table.  The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results.  This seems slow, considering that the table only has 50,000 rows.  Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time.  Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring.  The script below monitors STATISTICS IO output and the amount of tempdb used by the test query.  We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB.  The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row.  With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first).  This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts.  In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted.  SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed.  Sorts typically require a very large number of comparisons, so this is usually a very effective optimization.  One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself.  SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use.  The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory.  Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant.  The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources.  More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1).  Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB.  The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file.  Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case.  In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms.  If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much.  Why is the data size so much smaller?  The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored.  By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output.  You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead.  SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows.  The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes.  The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page.  There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option.  By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row.  SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage.  You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now.  This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query).  SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant.  No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers.  Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’.  Why?  Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed.  SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages.  This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this?  Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need.  245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory.  Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily?  That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column.  That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal.  I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator.  Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted.  We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need.  The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb.  The small remaining inefficiency is in reading the id column values from the clustered primary key index.  As a clustered index, it contains all the in-row data at its leaf.  The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead.  The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure.  This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only.  This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data.  The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings.  It isn’t about the internal differences between TEXT and MAX data types either.  It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load.  No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance.  Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows.  5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is!  If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb.  Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough.  Tuning for logical reads and adding covering indexes is not enough.  If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on.  The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008.  They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator.  Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

    Read the article

  • Solving embarassingly parallel problems using Python multiprocessing

    - by gotgenes
    How does one use multiprocessing to tackle embarrassingly parallel problems? Embarassingly parallel problems typically consist of three basic parts: Read input data (from a file, database, tcp connection, etc.). Run calculations on the input data, where each calculation is independent of any other calculation. Write results of calculations (to a file, database, tcp connection, etc.). We can parallelize the program in two dimensions: Part 2 can run on multiple cores, since each calculation is independent; order of processing doesn't matter. Each part can run independently. Part 1 can place data on an input queue, part 2 can pull data off the input queue and put results onto an output queue, and part 3 can pull results off the output queue and write them out. This seems a most basic pattern in concurrent programming, but I am still lost in trying to solve it, so let's write a canonical example to illustrate how this is done using multiprocessing. Here is the example problem: Given a CSV file with rows of integers as input, compute their sums. Separate the problem into three parts, which can all run in parallel: Process the input file into raw data (lists/iterables of integers) Calculate the sums of the data, in parallel Output the sums Below is traditional, single-process bound Python program which solves these three tasks: #!/usr/bin/env python # -*- coding: UTF-8 -*- # basicsums.py """A program that reads integer values from a CSV file and writes out their sums to another CSV file. """ import csv import optparse import sys def make_cli_parser(): """Make the command line interface parser.""" usage = "\n\n".join(["python %prog INPUT_CSV OUTPUT_CSV", __doc__, """ ARGUMENTS: INPUT_CSV: an input CSV file with rows of numbers OUTPUT_CSV: an output file that will contain the sums\ """]) cli_parser = optparse.OptionParser(usage) return cli_parser def parse_input_csv(csvfile): """Parses the input CSV and yields tuples with the index of the row as the first element, and the integers of the row as the second element. The index is zero-index based. :Parameters: - `csvfile`: a `csv.reader` instance """ for i, row in enumerate(csvfile): row = [int(entry) for entry in row] yield i, row def sum_rows(rows): """Yields a tuple with the index of each input list of integers as the first element, and the sum of the list of integers as the second element. The index is zero-index based. :Parameters: - `rows`: an iterable of tuples, with the index of the original row as the first element, and a list of integers as the second element """ for i, row in rows: yield i, sum(row) def write_results(csvfile, results): """Writes a series of results to an outfile, where the first column is the index of the original row of data, and the second column is the result of the calculation. The index is zero-index based. :Parameters: - `csvfile`: a `csv.writer` instance to which to write results - `results`: an iterable of tuples, with the index (zero-based) of the original row as the first element, and the calculated result from that row as the second element """ for result_row in results: csvfile.writerow(result_row) def main(argv): cli_parser = make_cli_parser() opts, args = cli_parser.parse_args(argv) if len(args) != 2: cli_parser.error("Please provide an input file and output file.") infile = open(args[0]) in_csvfile = csv.reader(infile) outfile = open(args[1], 'w') out_csvfile = csv.writer(outfile) # gets an iterable of rows that's not yet evaluated input_rows = parse_input_csv(in_csvfile) # sends the rows iterable to sum_rows() for results iterable, but # still not evaluated result_rows = sum_rows(input_rows) # finally evaluation takes place as a chain in write_results() write_results(out_csvfile, result_rows) infile.close() outfile.close() if __name__ == '__main__': main(sys.argv[1:]) Let's take this program and rewrite it to use multiprocessing to parallelize the three parts outlined above. Below is a skeleton of this new, parallelized program, that needs to be fleshed out to address the parts in the comments: #!/usr/bin/env python # -*- coding: UTF-8 -*- # multiproc_sums.py """A program that reads integer values from a CSV file and writes out their sums to another CSV file, using multiple processes if desired. """ import csv import multiprocessing import optparse import sys NUM_PROCS = multiprocessing.cpu_count() def make_cli_parser(): """Make the command line interface parser.""" usage = "\n\n".join(["python %prog INPUT_CSV OUTPUT_CSV", __doc__, """ ARGUMENTS: INPUT_CSV: an input CSV file with rows of numbers OUTPUT_CSV: an output file that will contain the sums\ """]) cli_parser = optparse.OptionParser(usage) cli_parser.add_option('-n', '--numprocs', type='int', default=NUM_PROCS, help="Number of processes to launch [DEFAULT: %default]") return cli_parser def main(argv): cli_parser = make_cli_parser() opts, args = cli_parser.parse_args(argv) if len(args) != 2: cli_parser.error("Please provide an input file and output file.") infile = open(args[0]) in_csvfile = csv.reader(infile) outfile = open(args[1], 'w') out_csvfile = csv.writer(outfile) # Parse the input file and add the parsed data to a queue for # processing, possibly chunking to decrease communication between # processes. # Process the parsed data as soon as any (chunks) appear on the # queue, using as many processes as allotted by the user # (opts.numprocs); place results on a queue for output. # # Terminate processes when the parser stops putting data in the # input queue. # Write the results to disk as soon as they appear on the output # queue. # Ensure all child processes have terminated. # Clean up files. infile.close() outfile.close() if __name__ == '__main__': main(sys.argv[1:]) These pieces of code, as well as another piece of code that can generate example CSV files for testing purposes, can be found on github. I would appreciate any insight here as to how you concurrency gurus would approach this problem. Here are some questions I had when thinking about this problem. Bonus points for addressing any/all: Should I have child processes for reading in the data and placing it into the queue, or can the main process do this without blocking until all input is read? Likewise, should I have a child process for writing the results out from the processed queue, or can the main process do this without having to wait for all the results? Should I use a processes pool for the sum operations? If yes, what method do I call on the pool to get it to start processing the results coming into the input queue, without blocking the input and output processes, too? apply_async()? map_async()? imap()? imap_unordered()? Suppose we didn't need to siphon off the input and output queues as data entered them, but could wait until all input was parsed and all results were calculated (e.g., because we know all the input and output will fit in system memory). Should we change the algorithm in any way (e.g., not run any processes concurrently with I/O)?

    Read the article

  • Python regex to parse text file, get the items in list and count the list

    - by Nemo
    I have a text file which contains some data. I m particularly interested in finding the count of the number of items in v_dims v_dims pattern in my text file looks like this : v_dims={ "Sales", "Product Family", "Sales Organization", "Region", "Sales Area", "Sales office", "Sales Division", "Sales Person", "Sales Channel", "Sales Order Type", "Sales Number", "Sales Person", "Sales Quantity", "Sales Amount" } So I m thinking of getting all the elements in v_dims and dumping them out in a Python list. Then compute the len(mylist) to get the count of the items. The challenge is in getting all the elements of v_dims from my text file and putting them in an empty list. I m particularly interested in items in v_dims in my text file. The text file has data in the form of v_dims pattern i showed in my original post. Some data has nested patterns of v_dims. Thanks. Here's what I have tried and failed. Any help is appreciated. TIA. import re fname = "C:\Users\XXXX\Test.mrk" with open(fname, "r") as fo: content_as_string = fo.read() match = re.findall(r'v_dims={\"(.+?)\"}',content_as_string) Though I have a big text file, Here's a snippet of what's the structure of my text file version "1"; // Computer generated object language file object 'MRKR' "Main" { Data_Type=2, HeaderBlock={ Version_String="6.3 (25)" }, Printer_Info={ Orientation=0, Page_Width=8.50000000, Page_Height=11.00000000, Page_Header="", Page_Footer="", Margin_type=0, Top_Margin=0.50000000, Left_Margin=0.50000000, Bottom_Margin=0.50000000, Right_Margin=0.50000000 }, Marker_Options={ Close_All="TRUE", Hide_Console="FALSE", Console_Left="FALSE", Console_Width=217, Main_Style="Maximized", MDI_Rect={ 0, 0, 892, 1063 } }, Dives={ { Dive="A", Windows={ { View_Index=0, Window_Info={ Window_Rect={ 0, -288, 400, 1008 }, Window_Style="Maximized Front", Window_Name="Theater [Previous Qtr Diveplan-Dive A]" }, Dependent_bool="FALSE", Colset={ Dive_Type="Normal", Dimension_Name="Theater", Action_List={ Actions={ { Action_Type="Select", select_type=5 }, { Action_Type="Select", select_type=0, Key_Names={ "Theater" }, Key_Indexes={ { "AMERICAS" } } }, { Action_Type="Focus", Focus_Rows="True" }, { Action_Type="Dimensions", v_dims={ "Theater", "Product Family", "Division", "Region", "Install at Country Name", "Connect Home Type", "Connect In Type", "SymmConnect Enabled", "Connect Home Refusal Reason", "Sales Order Channel Type", "Maintained By Group", "PS Flag", "Avalanche Flag", "Product Item Family" }, Xtab_Bool="False", Xtab_Flip="False" }, { Action_Type="Select", select_type=5 }, { Action_Type="Select", select_type=0, Key_Names={ "Theater", "Product Family", "Division", "Region", "Install at Country Name", "Connect Home Type", "Connect In Type", "SymmConnect Enabled", "Connect Home Refusal Reason", "Sales Order Channel Type", "Maintained By Group", "PS Flag", "Avalanche Flag" }, Key_Indexes={ { "AMERICAS", "ATMOS", "Latin America CS Division", "37000 CS Region", "Mexico", "", "", "", "", "DIRECT", "EMC", "N", "0" } } } } }, Num_Palette_cols=0, Num_Palette_rows=0 }, Format={ Window_Type="Tabular", Tabular={ Num_row_labels=8 } } } } } }, Widget_Set={ Widget_Layout="Vertical", Go_Button=1, Picklist_Width=0, Sort_Subset_Dimensions="TRUE", Order={ } }, Views={ { Data_Type=1, dbname="Previous Qtr Diveplan", diveline_dbname="Current Qtr Diveplan", logical_name="Current Qtr Diveplan", cols={ { name="Total TSS installs", column_type="Calc[Total TSS installs]", output_type="Number", format_string="." }, { name="TSS Valid Connectivity Records", column_type="Calc[TSS Valid Connectivity Records]", output_type="Number", format_string="." }, { name="% TSS Connectivity Record", column_type="Calc[% TSS Connectivity Record]", output_type="Number" }, { name="TSS Not Applicable", column_type="Calc[TSS Not Applicable]", output_type="Number", format_string="." }, { name="TSS Customer Refusals", column_type="Calc[TSS Customer Refusals]", output_type="Number", format_string="." }, { name="% TSS Refusals", column_type="Calc[% TSS Refusals]", output_type="Number" }, { name="TSS Eligible for Physical Connectivity", column_type="Calc[TSS Eligible for Physical Connectivity]", output_type="Number", format_string="." }, { name="TSS Boxes with Physical Connectivty", column_type="Calc[TSS Boxes with Physical Connectivty]", output_type="Number", format_string="." }, { name="% TSS Physical Connectivity", column_type="Calc[% TSS Physical Connectivity]", output_type="Number" } }, dim_cols={ { name="Model", column_type="Dimension[Model]", output_type="None" }, { name="Model", column_type="Dimension[Model]", output_type="None" }, { name="Connect In Type", column_type="Dimension[Connect In Type]", output_type="None" }, { name="Connect Home Type", column_type="Dimension[Connect Home Type]", output_type="None" }, { name="SymmConnect Enabled", column_type="Dimension[SymmConnect Enabled]", output_type="None" }, { name="Theater", column_type="Dimension[Theater]", output_type="None" }, { name="Division", column_type="Dimension[Division]", output_type="None" }, { name="Region", column_type="Dimension[Region]", output_type="None" }, { name="Sales Order Number", column_type="Dimension[Sales Order Number]", output_type="None" }, { name="Product Item Family", column_type="Dimension[Product Item Family]", output_type="None" }, { name="Item Serial Number", column_type="Dimension[Item Serial Number]", output_type="None" }, { name="Sales Order Deal Number", column_type="Dimension[Sales Order Deal Number]", output_type="None" }, { name="Item Install Date", column_type="Dimension[Item Install Date]", output_type="None" }, { name="SYR Last Dial Home Date", column_type="Dimension[SYR Last Dial Home Date]", output_type="None" }, { name="Maintained By Group", column_type="Dimension[Maintained By Group]", output_type="None" }, { name="PS Flag", column_type="Dimension[PS Flag]", output_type="None" }, { name="Connect Home Refusal Reason", column_type="Dimension[Connect Home Refusal Reason]", output_type="None", col_width=177 }, { name="Cust Name", column_type="Dimension[Cust Name]", output_type="None" }, { name="Sales Order Channel Type", column_type="Dimension[Sales Order Channel Type]", output_type="None" }, { name="Sales Order Type", column_type="Dimension[Sales Order Type]", output_type="None" }, { name="Part Model Key", column_type="Dimension[Part Model Key]", output_type="None" }, { name="Ship Date", column_type="Dimension[Ship Date]", output_type="None" }, { name="Model Number", column_type="Dimension[Model Number]", output_type="None" }, { name="Item Description", column_type="Dimension[Item Description]", output_type="None" }, { name="Customer Classification", column_type="Dimension[Customer Classification]", output_type="None" }, { name="CS Customer Name", column_type="Dimension[CS Customer Name]", output_type="None" }, { name="Install At Customer Number", column_type="Dimension[Install At Customer Number]", output_type="None" }, { name="Install at Country Name", column_type="Dimension[Install at Country Name]", output_type="None" }, { name="TLA Serial Number", column_type="Dimension[TLA Serial Number]", output_type="None" }, { name="Product Version", column_type="Dimension[Product Version]", output_type="None" }, { name="Avalanche Flag", column_type="Dimension[Avalanche Flag]", output_type="None" }, { name="Product Family", column_type="Dimension[Product Family]", output_type="None" }, { name="Project Number", column_type="Dimension[Project Number]", output_type="None" }, { name="PROJECT_STATUS", column_type="Dimension[PROJECT_STATUS]", output_type="None" } }, Available_Columns={ "Total TSS installs", "TSS Valid Connectivity Records", "% TSS Connectivity Record", "TSS Not Applicable", "TSS Customer Refusals", "% TSS Refusals", "TSS Eligible for Physical Connectivity", "TSS Boxes with Physical Connectivty", "% TSS Physical Connectivity", "Total Installs", "All Boxes with Valid Connectivty Record", "% All Connectivity Record", "Overall Refusals", "Overall Refusals %", "All Eligible for Physical Connectivty", "Boxes with Physical Connectivity", "% All with Physical Conectivity" }, Remaining_columns={ { name="Total Installs", column_type="Calc[Total Installs]", output_type="Number", format_string="." }, { name="All Boxes with Valid Connectivty Record", column_type="Calc[All Boxes with Valid Connectivty Record]", output_type="Number", format_string="." }, { name="% All Connectivity Record", column_type="Calc[% All Connectivity Record]", output_type="Number" }, { name="Overall Refusals", column_type="Calc[Overall Refusals]", output_type="Number", format_string="." }, { name="Overall Refusals %", column_type="Calc[Overall Refusals %]", output_type="Number" }, { name="All Eligible for Physical Connectivty", column_type="Calc[All Eligible for Physical Connectivty]", output_type="Number" }, { name="Boxes with Physical Connectivity", column_type="Calc[Boxes with Physical Connectivity]", output_type="Number" }, { name="% All with Physical Conectivity", column_type="Calc[% All with Physical Conectivity]", output_type="Number" } }, calcs={ { name="Total TSS installs", definition="Total[Total TSS installs]", ts_flag="Not TS Calc" }, { name="TSS Valid Connectivity Records", definition="Total[PS Boxes w/ valid connectivity record (1=yes)]", ts_flag="Not TS Calc" }, { name="% TSS Connectivity Record", definition="Total[PS Boxes w/ valid connectivity record (1=yes)] /Total[Total TSS installs]", ts_flag="Not TS Calc" }, { name="TSS Not Applicable", definition="Total[Bozes w/ valid connectivity record (1=yes)]-Total[Boxes Eligible (1=yes)]-Total[TSS Refusals]", ts_flag="Not TS Calc" }, { name="TSS Customer Refusals", definition="Total[TSS Refusals]", ts_flag="Not TS Calc" }, { name="% TSS Refusals", definition="Total[TSS Refusals]/Total[PS Boxes w/ valid connectivity record (1=yes)]", ts_flag="Not TS Calc" }, { name="TSS Eligible for Physical Connectivity", definition="Total[TSS Eligible]-Total[Exception]", ts_flag="Not TS Calc" }, { name="TSS Boxes with Physical Connectivty", definition="Total[PS Physical Connectivity] - Total[PS Physical Connectivity, SymmConnect Enabled=\"Capable not enabled\"]", ts_flag="Not TS Calc" }, { name="% TSS Physical Connectivity", definition="Total[Boxes w/ phys conn]/Total[Boxes Eligible (1=yes)]", ts_flag="Not TS Calc" }, { name="Total Installs", definition="Total[Total Installs]", ts_flag="Not TS Calc" }, { name="All Boxes with Valid Connectivty Record", definition="Total[Bozes w/ valid connectivity record (1=yes)]", ts_flag="Not TS Calc" }, { name="% All Connectivity Record", definition="Total[Bozes w/ valid connectivity record (1=yes)]/Total[Total Installs]", ts_flag="Not TS Calc" }, { name="Overall Refusals", definition="Total[Overall Refusals]", ts_flag="Not TS Calc" }, { name="Overall Refusals %", definition="Total[Overall Refusals]/Total[Bozes w/ valid connectivity record (1=yes)]", ts_flag="Not TS Calc" }, { name="All Eligible for Physical Connectivty", definition="Total[Boxes Eligible (1=yes)]-Total[Exception]", ts_flag="Not TS Calc" }, { name="Boxes with Physical Connectivity", definition="Total[Boxes w/ phys conn]-Total[Boxes w/ phys conn,SymmConnect Enabled=\"Capable not enabled\"]", ts_flag="Not TS Calc" }, { name="% All with Physical Conectivity", definition="Total[Boxes w/ phys conn]/Total[Boxes Eligible (1=yes)]", ts_flag="Not TS Calc" } }, merge_type="consolidate", merge_dbs={ { dbname="connectivityallproducts.mdl", diveline_dbname="/DI_PSREPORTING/connectivityallproducts.mdl" } }, skip_constant_columns="FALSE", categories={ { name="Geography", dimensions={ "Theater", "Division", "Region", "Install at Country Name" } }, { name="Mappings and Flags", dimensions={ "Connect Home Type", "Connect In Type", "SymmConnect Enabled", "Connect Home Refusal Reason", "Sales Order Channel Type", "Maintained By Group", "Customer Installable", "PS Flag", "Top Level Flag", "Avalanche Flag" } }, { name="Product Information", dimensions={ "Product Family", "Product Item Family", "Product Version", "Item Description" } }, { name="Sales Order Info", dimensions={ "Sales Order Deal Number", "Sales Order Number", "Sales Order Type" } }, { name="Dates", dimensions={ "Item Install Date", "Ship Date", "SYR Last Dial Home Date" } }, { name="Details", dimensions={ "Item Serial Number", "TLA Serial Number", "Part Model Key", "Model Number" } }, { name="Customer Infor", dimensions={ "CS Customer Name", "Install At Customer Number", "Customer Classification", "Cust Name" } }, { name="Other Dimensions", dimensions={ "Model" } } }, Maintain_Category_Order="FALSE", popup_info="false" } } };

    Read the article

  • Weblogic 10.0: SAMLSignedObject.verify() failed to validate signature value

    - by joshea
    I've been having this problem for a while and it's driving me nuts. I'm trying to create a client (in C# .NET 2.0) that will use SAML 1.1 to sign on to a WebLogic 10.0 server (i.e., a Single Sign-On scenario, using browser/post profile). The client is on a WinXP machine and the WebLogic server is on a RHEL 5 box. I based my client largely on code in the example here: http://www.codeproject.com/KB/aspnet/DotNetSamlPost.aspx (the source has a section for SAML 1.1). I set up WebLogic based on instructions for SAML Destination Site from here:http://www.oracle.com/technology/pub/articles/dev2arch/2006/12/sso-with-saml4.html I created a certificate using makecert that came with VS 2005. makecert -r -pe -n "CN=whatever" -b 01/01/2010 -e 01/01/2011 -sky exchange whatever.cer -sv whatever.pvk pvk2pfx.exe -pvk whatever.pvk -spc whatever.cer -pfx whatever.pfx Then I installed the .pfx to my personal certificate directory, and installed the .cer into the WebLogic SAML Identity Asserter V2. I read on another site that formatting the response to be readable (ie, adding whitespace) to the response after signing would cause this problem, so I tried various combinations of turning on/off .Indent XMLWriterSettings and turning on/off .PreserveWhiteSpace when loading the XML document, and none of it made any difference. I've printed the SignatureValue both before the message is is encoded/sent and after it arrives/gets decoded, and they are the same. So, to be clear: the Response appears to be formed, encoded, sent, and decoded fine (I see the full Response in the WebLogic logs). WebLogic finds the certificate I want it to use, verifies that a key was supplied, gets the signed info, and then fails to validate the signature. Code: public string createResponse(Dictionary<string, string> attributes){ ResponseType response = new ResponseType(); // Create Response response.ResponseID = "_" + Guid.NewGuid().ToString(); response.MajorVersion = "1"; response.MinorVersion = "1"; response.IssueInstant = System.DateTime.UtcNow; response.Recipient = "http://theWLServer/samlacs/acs"; StatusType status = new StatusType(); status.StatusCode = new StatusCodeType(); status.StatusCode.Value = new XmlQualifiedName("Success", "urn:oasis:names:tc:SAML:1.0:protocol"); response.Status = status; // Create Assertion AssertionType assertionType = CreateSaml11Assertion(attributes); response.Assertion = new AssertionType[] {assertionType}; //Serialize XmlSerializerNamespaces ns = new XmlSerializerNamespaces(); ns.Add("samlp", "urn:oasis:names:tc:SAML:1.0:protocol"); ns.Add("saml", "urn:oasis:names:tc:SAML:1.0:assertion"); XmlSerializer responseSerializer = new XmlSerializer(response.GetType()); StringWriter stringWriter = new StringWriter(); XmlWriterSettings settings = new XmlWriterSettings(); settings.OmitXmlDeclaration = true; settings.Indent = false;//I've tried both ways, for the fun of it settings.Encoding = Encoding.UTF8; XmlWriter responseWriter = XmlTextWriter.Create(stringWriter, settings); responseSerializer.Serialize(responseWriter, response, ns); responseWriter.Close(); string samlString = stringWriter.ToString(); stringWriter.Close(); // Sign the document XmlDocument doc = new XmlDocument(); doc.PreserveWhiteSpace = true; //also tried this both ways to no avail doc.LoadXml(samlString); X509Certificate2 cert = null; X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser); store.Open(OpenFlags.ReadOnly); X509Certificate2Collection coll = store.Certificates.Find(X509FindType.FindBySubjectDistinguishedName, "distName", true); if (coll.Count < 1) { throw new ArgumentException("Unable to locate certificate"); } cert = coll[0]; store.Close(); //this special SignDoc just overrides a function in SignedXml so //it knows to look for ResponseID rather than ID XmlElement signature = SamlHelper.SignDoc( doc, cert, "ResponseID", response.ResponseID); doc.DocumentElement.InsertBefore(signature, doc.DocumentElement.ChildNodes[0]); // Base64Encode and URL Encode byte[] base64EncodedBytes = Encoding.UTF8.GetBytes(doc.OuterXml); string returnValue = System.Convert.ToBase64String( base64EncodedBytes); return returnValue; } private AssertionType CreateSaml11Assertion(Dictionary<string, string> attributes){ AssertionType assertion = new AssertionType(); assertion.AssertionID = "_" + Guid.NewGuid().ToString(); assertion.Issuer = "madeUpValue"; assertion.MajorVersion = "1"; assertion.MinorVersion = "1"; assertion.IssueInstant = System.DateTime.UtcNow; //Not before, not after conditions ConditionsType conditions = new ConditionsType(); conditions.NotBefore = DateTime.UtcNow; conditions.NotBeforeSpecified = true; conditions.NotOnOrAfter = DateTime.UtcNow.AddMinutes(10); conditions.NotOnOrAfterSpecified = true; //Name Identifier to be used in Saml Subject NameIdentifierType nameIdentifier = new NameIdentifierType(); nameIdentifier.NameQualifier = domain.Trim(); nameIdentifier.Value = subject.Trim(); SubjectConfirmationType subjectConfirmation = new SubjectConfirmationType(); subjectConfirmation.ConfirmationMethod = new string[] { "urn:oasis:names:tc:SAML:1.0:cm:bearer" }; // // Create some SAML subject. SubjectType samlSubject = new SubjectType(); AttributeStatementType attrStatement = new AttributeStatementType(); AuthenticationStatementType authStatement = new AuthenticationStatementType(); authStatement.AuthenticationMethod = "urn:oasis:names:tc:SAML:1.0:am:password"; authStatement.AuthenticationInstant = System.DateTime.UtcNow; samlSubject.Items = new object[] { nameIdentifier, subjectConfirmation}; attrStatement.Subject = samlSubject; authStatement.Subject = samlSubject; IPHostEntry ipEntry = Dns.GetHostEntry(System.Environment.MachineName); SubjectLocalityType subjectLocality = new SubjectLocalityType(); subjectLocality.IPAddress = ipEntry.AddressList[0].ToString(); authStatement.SubjectLocality = subjectLocality; attrStatement.Attribute = new AttributeType[attributes.Count]; int i=0; // Create SAML attributes. foreach (KeyValuePair<string, string> attribute in attributes) { AttributeType attr = new AttributeType(); attr.AttributeName = attribute.Key; attr.AttributeNamespace= domain; attr.AttributeValue = new object[] {attribute.Value}; attrStatement.Attribute[i] = attr; i++; } assertion.Conditions = conditions; assertion.Items = new StatementAbstractType[] {authStatement, attrStatement}; return assertion; } private static XmlElement SignDoc(XmlDocument doc, X509Certificate2 cert2, string referenceId, string referenceValue) { // Use our own implementation of SignedXml SamlSignedXml sig = new SamlSignedXml(doc, referenceId); // Add the key to the SignedXml xmlDocument. sig.SigningKey = cert2.PrivateKey; // Create a reference to be signed. Reference reference = new Reference(); reference.Uri= String.Empty; reference.Uri = "#" + referenceValue; // Add an enveloped transformation to the reference. XmlDsigEnvelopedSignatureTransform env = new XmlDsigEnvelopedSignatureTransform(); reference.AddTransform(env); // Add the reference to the SignedXml object. sig.AddReference(reference); // Add an RSAKeyValue KeyInfo (optional; helps recipient find key to validate). KeyInfo keyInfo = new KeyInfo(); keyInfo.AddClause(new KeyInfoX509Data(cert2)); sig.KeyInfo = keyInfo; // Compute the signature. sig.ComputeSignature(); // Get the XML representation of the signature and save // it to an XmlElement object. XmlElement xmlDigitalSignature = sig.GetXml(); return xmlDigitalSignature; } To open the page in my client app, string postData = String.Format("SAMLResponse={0}&APID=ap_00001&TARGET={1}", System.Web.HttpUtility.UrlEncode(builder.buildResponse("http://theWLServer/samlacs/acs",attributes)), "http://desiredURL"); webBrowser.Navigate("http://theWLServer/samlacs/acs", "_self", Encoding.UTF8.GetBytes(postData), "Content-Type: application/x-www-form-urlencoded");

    Read the article

  • Non recursive way to position a genogram in 2D points for x axis. Descendant are below

    - by Nassign
    I currently was tasked to make a genogram for a family consisting of siblings, parents with aunts and uncles with grandparents and greatgrandparents for only blood relatives. My current algorithm is using recursion. but I am wondering how to do it in non recursive way to make it more efficient. it is programmed in c# using graphics to draw on a bitmap. Current algorithm for calculating x position, the y position is by getting the generation number. public void StartCalculatePosition() { // Search the start node (The only node with targetFlg set to true) Person start = null; foreach (Person p in PersonDic.Values) { if (start == null) start = p; if (p.Targetflg) { start = p; break; } } CalcPositionRecurse(start); // Normalize the position (shift all values to positive value) // Get the minimum value (must be negative) // Then offset the position of all marriage and person with that to make it start from zero float minPosition = float.MaxValue; foreach (Person p in PersonDic.Values) { if (minPosition > p.Position) { minPosition = p.Position; } } if (minPosition < 0) { foreach (Person p in PersonDic.Values) { p.Position -= minPosition; } foreach (Marriage m in MarriageList) { m.ParentsPosition -= minPosition; m.ChildrenPosition -= minPosition; } } } /// <summary> /// Calculate position of genogram using recursion /// </summary> /// <param name="psn"></param> private void CalcPositionRecurse(Person psn) { // End the recursion if (psn.BirthMarriage == null || psn.BirthMarriage.Parents.Count == 0) { psn.Position = 0.0f; if (psn.BirthMarriage != null) { psn.BirthMarriage.ParentsPosition = 0.0f; psn.BirthMarriage.ChildrenPosition = 0.0f; } CalculateSiblingPosition(psn); return; } // Left recurse if (psn.Father != null) { CalcPositionRecurse(psn.Father); } // Right recurse if (psn.Mother != null) { CalcPositionRecurse(psn.Mother); } // Merge Position if (psn.Father != null && psn.Mother != null) { AdjustConflict(psn.Father, psn.Mother); // Position person in center of parent psn.Position = (psn.Father.Position + psn.Mother.Position) / 2; psn.BirthMarriage.ParentsPosition = psn.Position; psn.BirthMarriage.ChildrenPosition = psn.Position; } else { // Single mom or single dad if (psn.Father != null) { psn.Position = psn.Father.Position; psn.BirthMarriage.ParentsPosition = psn.Position; psn.BirthMarriage.ChildrenPosition = psn.Position; } else if (psn.Mother != null) { psn.Position = psn.Mother.Position; psn.BirthMarriage.ParentsPosition = psn.Position; psn.BirthMarriage.ChildrenPosition = psn.Position; } else { // Should not happen, checking in start of function } } // Arrange the siblings base on my position (left younger, right older) CalculateSiblingPosition(psn); } private float GetRightBoundaryAncestor(Person psn) { float rPos = psn.Position; // Get the rightmost position among siblings foreach (Person sibling in psn.Siblings) { if (sibling.Position > rPos) { rPos = sibling.Position; } } if (psn.Father != null) { float rFatherPos = GetRightBoundaryAncestor(psn.Father); if (rFatherPos > rPos) { rPos = rFatherPos; } } if (psn.Mother != null) { float rMotherPos = GetRightBoundaryAncestor(psn.Mother); if (rMotherPos > rPos) { rPos = rMotherPos; } } return rPos; } private float GetLeftBoundaryAncestor(Person psn) { float rPos = psn.Position; // Get the rightmost position among siblings foreach (Person sibling in psn.Siblings) { if (sibling.Position < rPos) { rPos = sibling.Position; } } if (psn.Father != null) { float rFatherPos = GetLeftBoundaryAncestor(psn.Father); if (rFatherPos < rPos) { rPos = rFatherPos; } } if (psn.Mother != null) { float rMotherPos = GetLeftBoundaryAncestor(psn.Mother); if (rMotherPos < rPos) { rPos = rMotherPos; } } return rPos; } /// <summary> /// Check if two parent group has conflict and compensate on the conflict /// </summary> /// <param name="leftGroup"></param> /// <param name="rightGroup"></param> public void AdjustConflict(Person leftGroup, Person rightGroup) { float leftMax = GetRightBoundaryAncestor(leftGroup); leftMax += 0.5f; float rightMin = GetLeftBoundaryAncestor(rightGroup); rightMin -= 0.5f; float diff = leftMax - rightMin; if (diff > 0.0f) { float moveHalf = Math.Abs(diff) / 2; RecurseMoveAncestor(leftGroup, 0 - moveHalf); RecurseMoveAncestor(rightGroup, moveHalf); } } /// <summary> /// Recursively move a person and all his/her ancestor /// </summary> /// <param name="psn"></param> /// <param name="moveUnit"></param> public void RecurseMoveAncestor(Person psn, float moveUnit) { psn.Position += moveUnit; foreach (Person siblings in psn.Siblings) { if (siblings.Id != psn.Id) { siblings.Position += moveUnit; } } if (psn.BirthMarriage != null) { psn.BirthMarriage.ChildrenPosition += moveUnit; psn.BirthMarriage.ParentsPosition += moveUnit; } if (psn.Father != null) { RecurseMoveAncestor(psn.Father, moveUnit); } if (psn.Mother != null) { RecurseMoveAncestor(psn.Mother, moveUnit); } } /// <summary> /// Calculate the position of the siblings /// </summary> /// <param name="psn"></param> /// <param name="anchor"></param> public void CalculateSiblingPosition(Person psn) { if (psn.Siblings.Count == 0) { return; } List<Person> sibling = psn.Siblings; int argidx; for (argidx = 0; argidx < sibling.Count; argidx++) { if (sibling[argidx].Id == psn.Id) { break; } } // Compute position for each brother that is younger that person int idx; for (idx = argidx - 1; idx >= 0; idx--) { sibling[idx].Position = sibling[idx + 1].Position - 1; } for (idx = argidx + 1; idx < sibling.Count; idx++) { sibling[idx].Position = sibling[idx - 1].Position + 1; } }

    Read the article

  • Criticize my code, please

    - by Micky
    Hey, I was applying for a position, and they asked me to complete a coding problem for them. I did so and submitted it, but I later found out I was rejected from the position. Anyways, I have an eclectic programming background so I'm not sure if my code is grossly wrong or if I just didn't have the best solution out there. I would like to post my code and get some feedback about it. Before I do, here's a description of a problem: You are given a sorted array of integers, say, {1, 2, 4, 4, 5, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 11, 13 }. Now you are supposed to write a program (in C or C++, but I chose C) that prompts the user for an element to search for. The program will then search for the element. If it is found, then it should return the first index the entry was found at and the number of instances of that element. If the element is not found, then it should return "not found" or something similar. Here's a simple run of it (with the array I just put up): Enter a number to search for: 4 4 was found at index 2. There are 2 instances for 4 in the array. Enter a number to search for: -4. -4 is not in the array. They made a comment that my code should scale well with large arrays (so I wrote up a binary search). Anyways, my code basically runs as follows: Prompts user for input. Then it checks if it is within bounds (bigger than a[0] in the array and smaller than the largest element of the array). If so, then I perform a binary search. If the element is found, then I wrote two while loops. One while loop will count to the left of the element found, and the second while loop will count to the right of the element found. The loops terminate when the adjacent elements do not match with the desired value. EX: 4, 4, 4, 4, 4 The bold 4 is the value the binary search landed on. One loop will check to the left of it, and another loop will check to the right of it. Their sum will be the total number of instances of the the number four. Anyways, I don't know if there are any advanced techniques that I am missing or if I just don't have the CS background and made a big error. Any constructive critiques would be appreciated! #include <stdio.h> #include <stdlib.h> #include <string.h> #include <stddef.h> /* function prototype */ int get_num_of_ints( const int* arr, size_t r, int N, size_t* first, size_t* count ); int main() { int N; /* input variable */ int arr[]={1,1,2,3,3,4,4,4,4,5,5,7,7,7,7,8,8,8,9,11,12,12}; /* array of sorted integers */ size_t r = sizeof(arr)/sizeof(arr[0]); /* right bound */ size_t first; /* first match index */ size_t count; /* total number of matches */ /* prompts the user to enter input */ printf( "\nPlease input the integer you would like to find.\n" ); scanf( "%d", &N ); int a = get_num_of_ints( arr, r, N, &first, &count ); /* If the function returns -1 then the value is not found. Else it is returned */ if( a == -1) printf( "%d has not been found.\n", N ); else if(a >= 0){ printf( "The first matching index is %d.\n", first ); printf( "The total number of instances is %d.\n", count ); } return 0; } /* function definition */ int get_num_of_ints( const int* arr, size_t r, int N, size_t* first, size_t* count ) { int lo=0; /* lower bound for search */ int m=0; /* middle value obtained */ int hi=r-1; /* upper bound for search */ int w=r-1; /* used as a fixed upper bound to calculate the number of right instances of a particular value. */ /* binary search to find if a value exists */ /* first check if the element is out of bounds */ if( N < arr[0] || arr[hi] < N ){ m = -1; } else{ /* binary search to find a value, if it exists, within given parameters */ while(lo <= hi){ m = (hi + lo)/2; if(arr[m] < N) lo = m+1; else if(arr[m] > N) hi = m-1; else if(arr[m]==N){ m=m; break; } } if (lo > hi) /* if it doesn't we assign it -1 */ m = -1; } /* If the value is found, then we compute the left and right instances of it */ if( m >= 0 ){ int j = m-1; /* starting with the first term to the left */ int L = 0; /* total number of left instances */ /* while loop computes total number of left instances */ while( j >= 0 && arr[j] == arr[m] ){ L++; j--; } /* There are six possible outcomes of this. Depending on the outcome, we must assign the first index variable accordingly */ if( j > 0 && L > 0 ) *first=j+1; else if( j==0 && L==0) *first=m; else if( j > 0 && L==0 ) *first=m; else if(j < 0 && L==0 ) *first=m; else if( j < 0 && L > 0 ) *first=0; else if( j=0 && L > 0 ) *first=j+1; int h = m + 1; /* starting with the first term to the right */ int R = 0; /* total number of right instances */ /* while loop computes total number of right instances */ /* we fixed w earlier so that it's value does not change */ while( arr[h]==arr[m] && h <= w ){ R++; h++; } *count = (R + L + 1); /* total number of instances stored as value of count */ return *first; /* first instance index stored here */ } /* if value does not exist, then we return a negative value */ else if( m==-1) return -1; }

    Read the article

  • How Should I Generate Trade Statistics For CouchDB/Rails3 Application?

    - by James
    My Problem: I am trying to developing a web application for currency traders. The application allows traders to enter or upload information about their trades and I want to calculate a wide variety of statistics based on what the user entered. Now, normally I would use a relational database for this, but I have two requirements that don't fit well with a relational database so I am attempting to use couchdb. Those two problems are: 1) Primarily, I have a companion desktop application that users will be able to work with and replicate to the site using couchdb's awesome replication feature and 2) I would like to allow users to be able to define their own custom things to track about trades and generate results based off of what they enter. The schema less nature of couch seems perfect here, but it may end up being harder than it sounds. (I already know couch requires you to define views in advance and such so I was just planning on sticking all the custom attributes in an array and then emitting the array in the view and further processing from there.) What I Am Doing: Right now I am just emitting each trade in couch keyed by each user's system and querying with the key of the system to get an array of trades per system. Simple. I am not using a reduce function currently to calculate any stats because I couldn't figure out how to get everything I need without getting a reduce overflow error. Here is an example of rows that are getting emitted from couch: {"total_rows":134,"offset":0,"rows":[ {"id":"5b1dcd47221e160d8721feee4ccc64be", "key":["80e40ba2fa43589d57ec3f1d19db41e6","2010/05/14 04:32:37 +0000"], null, "doc":{ "_id":"5b1dcd47221e160d8721feee4ccc64be", "_rev":"1-bc9fe763e2637694df47d6f5efb58e5b", "couchrest-type":"Trade", "system":"80e40ba2fa43589d57ec3f1d19db41e6", "pair":"EUR/USD", "direction":"Buy", "entry":12600, "exit":12700, "stop_loss":12500, "profit_target":12700, "status":"Closed", "slug":"101332132375", "custom_tracking": [{"name":"signal", "value":"Pin Bar"}] "updated_at":"2010/05/14 04:32:37 +0000", "created_at":"2010/05/14 04:32:37 +0000", "result":100}} ]} In my rails 3 controller I am basically just populating an array of trades such as the one above and then extracting out the relevant data into smaller arrays that I can compute my statistics on. Here is my show action for the page that I want to display the stats and all the trades: def show @trades = Trade.by_system(:startkey => [@system.id], :endkey => [@system.id, Time.now ]) @trades.each do |trade| if trade.result > 0 @winning_trades << trade.result elsif trade.result < 0 @losing_trades << trade.result else @breakeven_trades << trade.result end if trade.direction == "Buy" @long_trades << trade.result else @short_trades << trade.result end if trade["custom_tracking"] != nil @custom_tracking << {"result" => trade.result, "variables" => trade["custom_tracking"]} end end end I am omitting some other stuff that is going on, but that is the gist of what I am doing. Then I am calculating stuff in the view layer to produce some results: <% winning_long_trades = @long_trades.reject {|trade| trade <= 0 } %> <% winning_short_trades = @short_trades.reject {|trade| trade <= 0 } %> <ul> <li>Total Trades: <%= @trades.count %></li> <li>Winners: <%= @winning_trades.size %></li> <li>Biggest Winner (Pips): <%= @winning_trades.max %></li> <li>Average Win(Pips): <%= @winning_trades.sum/@winning_trades.size %></li> <li>Losers: <%= @losing_trades.size %></li> <li>Biggest Loser (Pips): <%= @losing_trades.min %></li> <li>Average Loss(Pips): <%= @losing_trades.sum/@losing_trades.size %></li> <li>Breakeven Trades: <%= @breakeven_trades.size %></li> <li>Long Trades: <%= @long_trades.size %></li> <li>Winning Long Trades: <%= winning_long_trades.size %></li> <li>Short Trades: <%= @short_trades.size %></li> <li>Winning Short Trades: <%= winning_short_trades.size %></li> <li>Total Pips: <%= @winning_trades.sum + @losing_trades.sum %></li> <li>Win Rate (%): <%= @winning_trades.size/@trades.count.to_f * 100 %></li> </ul> This produces the following results, which aside from a few things is exactly what I want: Total Trades: 134 Winners: 70 Biggest Winner (Pips): 1488 Average Win(Pips): 440 Losers: 58 Biggest Loser (Pips): -516 Average Loss(Pips): -225 Breakeven Trades: 6 Long Trades: 125 Winning Long Trades: 67 Short Trades: 9 Winning Short Trades: 3 Total Pips: 17819 Win Rate (%): 52.23880597014925 What I Am Wondering- Finally The Actual Questions: I am starting to get really skeptical of how well this method will work when a user has 5,000 trades instead of just 134 like in this example. I anticipate most users will only have somewhere under 200 per year, but some users may have a couple thousand trades per year. Probably no more than 5,000 per year. It seems to work ok now, but the page load times are already getting a tad high for my tastes. (About 800ms to generate the page according to rails logs with about a 250ms of that spent in the view layer.) I will end up caching this page I am sure, but I still need the regenerate the page each time a trade is updated and I can't afford to have this be too slow. Sooo..... Is doing something similar here possible with a straight couchdb reduce function? I am assuming handing this off to couch would possibly help with larger data sets. I couldn't figure out how, but I suppose that doesn't mean it isn't possible. If possible, any hints will be helpful. Could I use a list function if a reduce was not available due to reduce constraints? Are couchdb list functions suitable for this type of calculations? Anyone have any idea of whether or not list functions perform well? Any hints what one would look like for the type of calculations I am trying to achieve? I thought about other options such as running the calculations at the time each trade was saved or nightly if I had to and saving the results to a statistics doc that I could then query so that all the processing was done ahead of time. I would like this to be the last resort because then I can't really filter out trades by time periods dynamically like I would really like to. (I want to have a slider that a user can slide to only show trades from that time period using the startkey and endkey in couchdb if I can.) If I should continue running the calculations inside the rails app at the time of the page view, what can I do to improve my current implementation. I am new to rails, couch and programming in general. I am sure that I could be doing something better here. Do I need to create an array for each stat or is there a better way to do that. I guess I just would really like some advice on how to tackle this problem. I want to keep the page generation time minimal since I anticipate these being some of the highest trafficked pages. My gut is that I will need to offload the statistics calculation to either couch or run the stats in advance of when they are called, but I am not sure. Lastly: Like I mentioned above, one of the primary reasons for using couch is to allow users to define their own things to track per trade. Getting the data into couch is no problem, but how would I be able to take the custom_tracking array and find how many winning trades for each named tracking attribute. If anyone can give me any hints to the possibility of doing this that would be great. Thanks a bunch. Would really appreciate any help. Willing to fork out some $$$ if someone wants to take on the problem for me. (Don't know if that is allowed on stack overflow or not.)

    Read the article

  • Bridge or Factory and How

    - by Chris
    I'm trying to learn patterns and I've got a job that is screaming for a pattern, I just know it but I can't figure it out. I know the filter type is something that can be abstracted and possibly bridged. I'M NOT LOOKING FOR A CODE REWRITE JUST SUGGESTIONS. I'm not looking for someone to do my job. I would like to know how patterns could be applied to this example. using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Data; using System.IO; using System.Xml; using System.Text.RegularExpressions; namespace CopyTool { class CopyJob { public enum FilterType { TextFilter, RegExFilter, NoFilter } public FilterType JobFilterType { get; set; } private string _jobName; public string JobName { get { return _jobName; } set { _jobName = value; } } private int currentIndex; public int CurrentIndex { get { return currentIndex; } } private DataSet ds; public int MaxJobs { get { return ds.Tables["Job"].Rows.Count; } } private string _filter; public string Filter { get { return _filter; } set { _filter = value; } } private string _fromFolder; public string FromFolder { get { return _fromFolder; } set { if (Directory.Exists(value)) { _fromFolder = value; } else { throw new DirectoryNotFoundException(String.Format("Folder not found: {0}", value)); } } } private List<string> _toFolders; public List<string> ToFolders { get { return _toFolders; } } public CopyJob() { Initialize(); } private void Initialize() { if (ds == null) { ds = new DataSet(); } ds.ReadXml(Properties.Settings.Default.ConfigLocation); LoadValues(0); } public void Execute() { ExecuteJob(FromFolder, _toFolders, Filter, JobFilterType); } public void ExecuteAll() { string OrigPath; List<string> DestPaths; string FilterText; FilterType FilterWay; foreach (DataRow rw in ds.Tables["Job"].Rows) { OrigPath = rw["FromFolder"].ToString(); FilterText = rw["FilterText"].ToString(); switch (rw["FilterType"].ToString()) { case "TextFilter": FilterWay = FilterType.TextFilter; break; case "RegExFilter": FilterWay = FilterType.RegExFilter; break; default: FilterWay = FilterType.NoFilter; break; } DestPaths = new List<string>(); foreach (DataRow crw in rw.GetChildRows("Job_ToFolder")) { DestPaths.Add(crw["FolderPath"].ToString()); } ExecuteJob(OrigPath, DestPaths, FilterText, FilterWay); } } private void ExecuteJob(string OrigPath, List<string> DestPaths, string FilterText, FilterType FilterWay) { FileInfo[] files; switch (FilterWay) { case FilterType.RegExFilter: files = GetFilesByRegEx(new Regex(FilterText), OrigPath); break; case FilterType.TextFilter: files = GetFilesByFilter(FilterText, OrigPath); break; default: files = new DirectoryInfo(OrigPath).GetFiles(); break; } foreach (string fld in DestPaths) { CopyFiles(files, fld); } } public void MoveToJob(int RecordNumber) { Save(); LoadValues(RecordNumber - 1); } public void AddToFolder(string folderPath) { if (Directory.Exists(folderPath)) { _toFolders.Add(folderPath); } else { throw new DirectoryNotFoundException(String.Format("Folder not found: {0}", folderPath)); } } public void DeleteToFolder(int index) { _toFolders.RemoveAt(index); } public void Save() { DataRow rw = ds.Tables["Job"].Rows[currentIndex]; rw["JobName"] = _jobName; rw["FromFolder"] = _fromFolder; rw["FilterText"] = _filter; switch (JobFilterType) { case FilterType.RegExFilter: rw["FilterType"] = "RegExFilter"; break; case FilterType.TextFilter: rw["FilterType"] = "TextFilter"; break; default: rw["FilterType"] = "NoFilter"; break; } DataRow[] ToFolderRows = ds.Tables["Job"].Rows[currentIndex].GetChildRows("Job_ToFolder"); for (int i = 0; i <= ToFolderRows.GetUpperBound(0); i++) { ToFolderRows[i].Delete(); } foreach (string fld in _toFolders) { DataRow ToFolderRow = ds.Tables["ToFolder"].NewRow(); ToFolderRow["JobId"] = ds.Tables["Job"].Rows[currentIndex]["JobId"]; ToFolderRow["Job_Id"] = ds.Tables["Job"].Rows[currentIndex]["Job_Id"]; ToFolderRow["FolderPath"] = fld; ds.Tables["ToFolder"].Rows.Add(ToFolderRow); } } public void Delete() { ds.Tables["Job"].Rows.RemoveAt(currentIndex); LoadValues(currentIndex++); } public void MoveNext() { Save(); currentIndex++; LoadValues(currentIndex); } public void MovePrevious() { Save(); currentIndex--; LoadValues(currentIndex); } public void MoveFirst() { Save(); LoadValues(0); } public void MoveLast() { Save(); LoadValues(ds.Tables["Job"].Rows.Count - 1); } public void CreateNew() { Save(); int MaxJobId = 0; Int32.TryParse(ds.Tables["Job"].Compute("Max(JobId)", "").ToString(), out MaxJobId); DataRow rw = ds.Tables["Job"].NewRow(); rw["JobId"] = MaxJobId + 1; ds.Tables["Job"].Rows.Add(rw); LoadValues(ds.Tables["Job"].Rows.IndexOf(rw)); } public void Commit() { Save(); ds.WriteXml(Properties.Settings.Default.ConfigLocation); } private void LoadValues(int index) { if (index > ds.Tables["Job"].Rows.Count - 1) { currentIndex = ds.Tables["Job"].Rows.Count - 1; } else if (index < 0) { currentIndex = 0; } else { currentIndex = index; } DataRow rw = ds.Tables["Job"].Rows[currentIndex]; _jobName = rw["JobName"].ToString(); _fromFolder = rw["FromFolder"].ToString(); _filter = rw["FilterText"].ToString(); switch (rw["FilterType"].ToString()) { case "TextFilter": JobFilterType = FilterType.TextFilter; break; case "RegExFilter": JobFilterType = FilterType.RegExFilter; break; default: JobFilterType = FilterType.NoFilter; break; } if (_toFolders == null) _toFolders = new List<string>(); _toFolders.Clear(); foreach (DataRow crw in rw.GetChildRows("Job_ToFolder")) { AddToFolder(crw["FolderPath"].ToString()); } } private static FileInfo[] GetFilesByRegEx(Regex rgx, string locPath) { DirectoryInfo d = new DirectoryInfo(locPath); FileInfo[] fullFileList = d.GetFiles(); List<FileInfo> filteredList = new List<FileInfo>(); foreach (FileInfo fi in fullFileList) { if (rgx.IsMatch(fi.Name)) { filteredList.Add(fi); } } return filteredList.ToArray(); } private static FileInfo[] GetFilesByFilter(string filter, string locPath) { DirectoryInfo d = new DirectoryInfo(locPath); FileInfo[] fi = d.GetFiles(filter); return fi; } private void CopyFiles(FileInfo[] files, string destPath) { foreach (FileInfo fi in files) { bool success = false; int i = 0; string copyToName = fi.Name; string copyToExt = fi.Extension; string copyToNameWithoutExt = Path.GetFileNameWithoutExtension(fi.FullName); while (!success && i < 100) { i++; try { if (File.Exists(Path.Combine(destPath, copyToName))) throw new CopyFileExistsException(); File.Copy(fi.FullName, Path.Combine(destPath, copyToName)); success = true; } catch (CopyFileExistsException ex) { copyToName = String.Format("{0} ({1}){2}", copyToNameWithoutExt, i, copyToExt); } } } } } public class CopyFileExistsException : Exception { public string Message; } }

    Read the article

  • Trouble calling a method from an external class

    - by Bradley Hobbs
    Here is my employee database program: import java.util.*; import java.io.*; import java.io.File; import java.io.FileReader; import java.util.ArrayList; public class P { //Instance Variables private static String empName; private static String wage; private static double wages; private static double salary; private static double numHours; private static double increase; // static ArrayList<String> ARempName = new ArrayList<String>(); // static ArrayList<Double> ARwages = new ArrayList<Double>(); // static ArrayList<Double> ARsalary = new ArrayList<Double>(); static ArrayList<Employee> emp = new ArrayList<Employee>(); public static void main(String[] args) throws Exception { clearScreen(); printMenu(); question(); exit(); } public static void printArrayList(ArrayList<Employee> emp) { for (int i = 0; i < emp.size(); i++){ System.out.println(emp.get(i)); } } public static void clearScreen() { System.out.println("\u001b[H\u001b[2J"); } private static void exit() { System.exit(0); } private static void printMenu() { System.out.println("\t------------------------------------"); System.out.println("\t|Commands: n - New employee |"); System.out.println("\t| c - Compute paychecks |"); System.out.println("\t| r - Raise wages |"); System.out.println("\t| p - Print records |"); System.out.println("\t| d - Download data |"); System.out.println("\t| u - Upload data |"); System.out.println("\t| q - Quit |"); System.out.println("\t------------------------------------"); System.out.println(""); } public static void question() { System.out.print("Enter command: "); Scanner q = new Scanner(System.in); String input = q.nextLine(); input.replaceAll("\\s","").toLowerCase(); boolean valid = (input.equals("n") || input.equals("c") || input.equals("r") || input.equals("p") || input.equals("d") || input.equals("u") || input.equals("q")); if (!valid){ System.out.println("Command was not recognized; please try again."); printMenu(); question(); } else if (input.equals("n")){ System.out.print("Enter the name of new employee: "); Scanner stdin = new Scanner(System.in); empName = stdin.nextLine(); System.out.print("Hourly (h) or salaried (s): "); Scanner stdin2 = new Scanner(System.in); wage = stdin2.nextLine(); wage.replaceAll("\\s","").toLowerCase(); if (!(wage.equals("h") || wage.equals("s"))){ System.out.println("Input was not h or s; please try again"); } else if (wage.equals("h")){ System.out.print("Enter hourly wage: "); Scanner stdin4 = new Scanner(System.in); wages = stdin4.nextDouble(); Employee emp1 = new HourlyEmployee(empName, wages); emp.add(emp1); printMenu(); question();} else if (wage.equals("s")){ System.out.print("Enter annual salary: "); Scanner stdin5 = new Scanner(System.in); salary = stdin5.nextDouble(); Employee emp1 = new SalariedEmployee(empName, salary); printMenu(); question();}} else if (input.equals("c")){ for (int i = 0; i < emp.size(); i++){ System.out.println("Enter number of hours worked by " + emp.get(i) + ":"); } Scanner stdin = new Scanner(System.in); numHours = stdin.nextInt(); System.out.println("Pay: " + emp1.computePay(numHours)); System.out.print("Enter number of hours worked by " + empName); Scanner stdin2 = new Scanner(System.in); numHours = stdin2.nextInt(); System.out.println("Pay: " + emp1.computePay(numHours)); printMenu(); question();} else if (input.equals("r")){ System.out.print("Enter percentage increase: "); Scanner stdin = new Scanner(System.in); increase = stdin.nextDouble(); System.out.println("\nNew Wages"); System.out.println("---------"); // System.out.println(Employee.toString()); printMenu(); question(); } else if (input.equals("p")){ printArrayList(emp); printMenu(); question(); } else if (input.equals("q")){ exit(); } } } Here is one of the class files: public abstract class Employee { private String name; private double wage; protected Employee(String name, double wage){ this.name = name; this.wage = wage; } public String getName() { return name; } public double getWage() { return wage; } public void setName(String name) { this.name = name; } public void setWage(double wage) { this.wage = wage; } public void percent(double wage, double percent) { wage *= percent; } } And here are the errors: P.java:108: cannot find symbol symbol : variable emp1 location: class P System.out.println("Pay: " + emp1.computePay(numHours)); ^ P.java:112: cannot find symbol symbol : variable emp1 location: class P System.out.println("Pay: " + emp1.computePay(numHours)); ^ 2 errors I'm trying to the get paycheck to print out but i'm having trouble with how to call the method. It should take the user inputed numHours and calculate it then print on the paycheck for each employee. Thanks!

    Read the article

< Previous Page | 38 39 40 41 42 43  | Next Page >