Search Results

Search found 192 results on 8 pages for 'iterative'.

Page 8/8 | < Previous Page | 4 5 6 7 8

C#: LINQ vs foreach - Round 1.

- by James Michael Hare

So I was reading Peter Kellner's blog entry on Resharper 5.0 and its LINQ refactoring and thought that was very cool. But that raised a point I had always been curious about in my head -- which is a better choice: manual foreach loops or LINQ? The answer is not really clear-cut. There are two sides to any code cost arguments: performance and maintainability. The first of these is obvious and quantifiable. Given any two pieces of code that perform the same function, you can run them side-by-side and see which piece of code performs better. Unfortunately, this is not always a good measure. Well written assembly language outperforms well written C++ code, but you lose a lot in maintainability which creates a big techncial debt load that is hard to offset as the application ages. In contrast, higher level constructs make the code more brief and easier to understand, hence reducing technical cost. Now, obviously in this case we're not talking two separate languages, we're comparing doing something manually in the language versus using a higher-order set of IEnumerable extensions that are in the System.Linq library. Well, before we discuss any further, let's look at some sample code and the numbers. First, let's take a look at the for loop and the LINQ expression. This is just a simple find comparison: // find implemented via LINQ public static bool FindViaLinq(IEnumerable<int> list, int target) { return list.Any(item => item == target); } // find implemented via standard iteration public static bool FindViaIteration(IEnumerable<int> list, int target) { foreach (var i in list) { if (i == target) { return true; } } return false; } Okay, looking at this from a maintainability point of view, the Linq expression is definitely more concise (8 lines down to 1) and is very readable in intention. You don't have to actually analyze the behavior of the loop to determine what it's doing. So let's take a look at performance metrics from 100,000 iterations of these methods on a List<int> of varying sizes filled with random data. For this test, we fill a target array with 100,000 random integers and then run the exact same pseudo-random targets through both searches. List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower Any 10 26 0.00046 30.00% Iteration 10 20 0.00023 - Any 100 116 0.00201 18.37% Iteration 100 98 0.00118 - Any 1000 1058 0.01853 16.78% Iteration 1000 906 0.01155 - Any 10,000 10,383 0.18189 17.41% Iteration 10,000 8843 0.11362 - Any 100,000 104,004 1.8297 18.27% Iteration 100,000 87,941 1.13163 - The LINQ expression is running about 17% slower for average size collections and worse for smaller collections. Presumably, this is due to the overhead of the state machine used to track the iterators for the yield returns in the LINQ expressions, which seems about right in a tight loop such as this. So what about other LINQ expressions? After all, Any() is one of the more trivial ones. I decided to try the TakeWhile() algorithm using a Count() to get the position stopped like the sample Pete was using in his blog that Resharper refactored for him into LINQ: // Linq form public static int GetTargetPosition1(IEnumerable<int> list, int target) { return list.TakeWhile(item => item != target).Count(); } // traditionally iterative form public static int GetTargetPosition2(IEnumerable<int> list, int target) { int count = 0; foreach (var i in list) { if(i == target) { break; } ++count; } return count; } Once again, the LINQ expression is much shorter, easier to read, and should be easier to maintain over time, reducing the cost of technical debt. So I ran these through the same test data: List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Iteration 100,000 81635 0.81635 - Wow! I expected some overhead due to the state machines iterators produce, but 90% slower? That seems a little heavy to me. So then I thought, well, what if TakeWhile() is not the right tool for the job? The problem is TakeWhile returns each item for processing using yield return, whereas our for-loop really doesn't care about the item beyond using it as a stop condition to evaluate. So what if that back and forth with the iterator state machine is the problem? Well, we can quickly create an (albeit ugly) lambda that uses the Any() along with a count in a closure (if a LINQ guru knows a better way PLEASE let me know!), after all , this is more consistent with what we're trying to do, we're trying to find the first occurence of an item and halt once we find it, we just happen to be counting on the way. This mostly matches Any(). // a new method that uses linq but evaluates the count in a closure. public static int TakeWhileViaLinq2(IEnumerable<int> list, int target) { int count = 0; list.Any(item => { if(item == target) { return true; } ++count; return false; }); return count; } Now how does this one compare? List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Any w/Closure 10 23 0.00023 28% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Any w/Closure 100 116 0.00116 27% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Any w/Closure 1000 1101 0.01101 33% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Any w/Closure 10,000 10802 0.10802 32% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Any w/Closure 100,000 108378 1.08378 33% Iteration 100,000 81635 0.81635 - Much better! It seems that the overhead of TakeAny() returning each item and updating the state in the state machine is drastically reduced by using Any() since Any() iterates forward until it finds the value we're looking for -- for the task we're attempting to do. So the lesson there is, make sure when you use a LINQ expression you're choosing the best expression for the job, because if you're doing more work than you really need, you'll have a slower algorithm. But this is true of any choice of algorithm or collection in general. Even with the Any() with the count in the closure it is still about 30% slower, but let's consider that angle carefully. For a list of 100,000 items, it was the difference between 1.01 ms and 0.82 ms roughly in a List<T>. That's really not that bad at all in the grand scheme of things. Even running at 90% slower with TakeWhile(), for the vast majority of my projects, an extra millisecond to save potential errors in the long term and improve maintainability is a small price to pay. And if your typical list is 1000 items or less we're talking only microseconds worth of difference. It's like they say: 90% of your performance bottlenecks are in 2% of your code, so over-optimizing almost never pays off. So personally, I'll take the LINQ expression wherever I can because they will be easier to read and maintain (thus reducing technical debt) and I can rely on Microsoft's development to have coded and unit tested those algorithm fully for me instead of relying on a developer to code the loop logic correctly. If something's 90% slower, yes, it's worth keeping in mind, but it's really not until you start get magnitudes-of-order slower (10x, 100x, 1000x) that alarm bells should really go off. And if I ever do need that last millisecond of performance? Well then I'll optimize JUST THAT problem spot. To me it's worth it for the readability, speed-to-market, and maintainability.

Read the article
Does Test Driven Development (TDD) improve Quality and Correctness? (Part 1)

- by David V. Corbin

Since the dawn of the computer age, various methodologies have been introduced to improve quality and reduce cost. In this posting, I will by sharing my experiences with Test Driven Development; both its benefits and limitations. To start this topic, we need to agree on what TDD is. The first is to define each of the three words as used in this context. Test - An item or action which measures something in some quantifiable form. Driven - The primary motivation or focus of a series of activities (process) Development - All phases of a software project/product from concept through delivery. The above are very simple definitions that result in the following: "TDD is a process where the primary focus is on measuring and quantifying all aspects of the creation of a (software) product." There are many places where TDD is used outside of software development, even though it is not known by this name. Consider the (conventional) education process that most of us grew up on. The focus was to get the best grades as measured by different tests. Many of these tests measured rote memorization and not understanding of the subject matter. The result of this that many people graduated with high scores but without "quality and correctness" in their ability to utilize the subject matter (of course, the flip side is true where certain people DID understand the material but were not very good at taking this type of test). Returning to software development, let us look at some common scenarios. While these items are generally applicable regardless of platform, language and tools; the remainder of this post will utilize Microsoft Visual Studio and Team Foundation Server (TFS) for examples. It should be realized that everyone does at least some aspect of TDD. At the most rudimentary level, getting a program to compile involves a "pass/fail" measurement (is the syntax valid) that drives their ability to proceed further (run the program). Other developers may create "Unit Tests" in the belief that having a test for every method/property of a class and good code coverage is the goal of TDD. These items may be helpful and even important, but really only address a small aspect of the overall effort. To see TDD in a bigger view, lets identify the various activities that are part of the Software Development LifeCycle. These are going to be presented in a Waterfall style for simplicity, but each item also occurs within Iterative methodologies such as Agile/Scrum. the key ones here are: Requirements Gathering Architecture Design Implementation Quality Assurance Can each of these items be subjected to a process which establishes metrics (quantified metrics) that reflect both the quality and correctness of each item? It should be clear that conventional Unit Tests do not apply to all of these items; at best they can verify that a local aspect (e.g. a Class/Method) of implementation matches the (test writers perspective of) the appropriate design document. So what can we do? For each of area, the goal is to create tests that are quantifiable and durable. The ability to quantify the measurements (beyond a simple pass/fail) is critical to tracking progress(eventually measuring the level of success that has been achieved) and for providing clear information on what items need to be addressed (along with the appropriate time to address them - in varying levels of detail) . Durability is important so that the test can be reapplied (ideally in an automated fashion) over the entire cycle. Returning for a moment back to our "education example", one must also be careful of how the tests are organized and how the measurements are taken. If a test is in a multiple choice format, there is a significant statistical probability that a correct answer might be the result of a random guess. Also, in many situations, having the student simply provide a final answer can obscure many important elements. For example, on a math test, having the student simply provide a numeric answer (rather than showing the methodology) may result in a complete mismatch between the process and the result. It is hard to determine which is worse: The student who makes a simple arithmetric error at one step of a long process (resulting in a wrong answer) or The student who (without providing the "workflow") uses a completely invalid approach, yet still comes up with the right number. The "Wrong Process"/"Right Answer" is probably the single biggest problem in software development. Even very simple items can suffer from this. As an example consider the following code for a "straight line" calculation....Is it correct? (for Integral Points) int Solve(int m, int b, int x) { return m * x + b; } Most people would respond "Yes". But let's take the question one step further... Is it correct for all possible values of m,b,x??? (no fair if you cheated by being focused on the bolded text!) Without additional information regarding constrains on "the possible values of m,b,x" the answer must be NO, there is the risk of overflow/wraparound that will produce an incorrect result! To properly answer this question (i.e. Test the Code), one MUST be able to backtrack from the implementation through the design, and architecture all the way back to the requirements. And the requirement itself must be tested against the stakeholder(s). It is only when the bounding conditions are defined that it is possible to determine if the code is "Correct" and has "Quality". Yet, how many of us (myself included) have written such code without even thinking about it. In many canses we (think we) "know" what the bounds are, and that the code will be correct. As we all know, requirements change, "code reuse" causes implementations to be applied to different scenarios, etc. This leads directly to the types of system failures that plague so many projects. This approach to TDD is much more holistic than ones which start by focusing on the details. The fundamental concepts still apply: Each item should be tested. The test should be defined/implemented before (or concurrent with) the definition/implementation of the actual item. We also add concepts that expand the scope and alter the style by recognizing: There are many things beside "lines of code" that benefit from testing (measuring/evaluating in a formal way) Correctness and Quality can not be solely measured by "correct results" In the future parts, we will examine in greater detail some of the techniques that can be applied to each of these areas....

Read the article
SSIS: Deploying OLAP cubes using C# script tasks and AMO

- by DrJohn

As part of the continuing series on Building dynamic OLAP data marts on-the-fly, this blog entry will focus on how to automate the deployment of OLAP cubes using SQL Server Integration Services (SSIS) and Analysis Services Management Objects (AMO). OLAP cube deployment is usually done using the Analysis Services Deployment Wizard. However, this option was dismissed for a variety of reasons. Firstly, invoking external processes from SSIS is fraught with problems as (a) it is not always possible to ensure SSIS waits for the external program to terminate; (b) we cannot log the outcome properly and (c) it is not always possible to control the server's configuration to ensure the executable works correctly. Another reason for rejecting the Deployment Wizard is that it requires the 'answers' to be written into four XML files. These XML files record the three things we need to change: the name of the server, the name of the OLAP database and the connection string to the data mart. Although it would be reasonably straight forward to change the content of the XML files programmatically, this adds another set of complication and level of obscurity to the overall process. When I first investigated the possibility of using C# to deploy a cube, I was surprised to find that there are no other blog entries about the topic. I can only assume everyone else is happy with the Deployment Wizard! SSIS "forgets" assembly references If you build your script task from scratch, you will have to remember how to overcome one of the major annoyances of working with SSIS script tasks: the forgetful nature of SSIS when it comes to assembly references. Basically, you can go through the process of adding an assembly reference using the Add Reference dialog, but when you close the script window, SSIS "forgets" the assembly reference so the script will not compile. After repeating the operation several times, you will find that SSIS only remembers the assembly reference when you specifically press the Save All icon in the script window. This problem is not unique to the AMO assembly and has certainly been a "feature" since SQL Server 2005, so I am not amazed it is still present in SQL Server 2008 R2! Sample Package So let's take a look at the sample SSIS package I have provided which can be downloaded from here: DeployOlapCubeExample.zip Below is a screenshot after a successful run. Connection Managers The package has three connection managers: AsDatabaseDefinitionFile is a file connection manager pointing to the .asdatabase file you wish to deploy. Note that this can be found in the bin directory of you OLAP database project once you have clicked the "Build" button in Visual Studio TargetOlapServerCS is an Analysis Services connection manager which identifies both the deployment server and the target database name. SourceDataMart is an OLEDB connection manager pointing to the data mart which is to act as the source of data for your cube. This will be used to replace the connection string found in your .asdatabase file Once you have configured the connection managers, the sample should run and deploy your OLAP database in a few seconds. Of course, in a production environment, these connection managers would be associated with package configurations or set at runtime. When you run the sample, you should see that the script logs its activity to the output screen (see screenshot above). If you configure logging for the package, then these messages will also appear in your SSIS logging. Sample Code Walkthrough Next let's walk through the code. The first step is to parse the connection string provided by the TargetOlapServerCS connection manager and obtain the name of both the target OLAP server and also the name of the OLAP database. Note that the target database does not have to exist to be referenced in an AS connection manager, so I am using this as a convenient way to define both properties. We now connect to the server and check for the existence of the OLAP database. If it exists, we drop the database so we can re-deploy. svr.Connect(olapServerName); if (svr.Connected) { // Drop the OLAP database if it already exists Database db = svr.Databases.FindByName(olapDatabaseName); if (db != null) { db.Drop(); } // rest of script } Next we start building the XMLA command that will actually perform the deployment. Basically this is a small chuck of XML which we need to wrap around the large .asdatabase file generated by the Visual Studio build process. // Start generating the main part of the XMLA command XmlDocument xmlaCommand = new XmlDocument(); xmlaCommand.LoadXml(string.Format("<Batch Transaction='false' xmlns='http://schemas.microsoft.com/analysisservices/2003/engine'><Alter AllowCreate='true' ObjectExpansion='ExpandFull'><Object><DatabaseID>{0}</DatabaseID></Object><ObjectDefinition/></Alter></Batch>", olapDatabaseName)); Next we need to merge two XML files which we can do by simply using setting the InnerXml property of the ObjectDefinition node as follows: // load OLAP Database definition from .asdatabase file identified by connection manager XmlDocument olapCubeDef = new XmlDocument(); olapCubeDef.Load(Dts.Connections["AsDatabaseDefinitionFile"].ConnectionString); // merge the two XML files by obtain a reference to the ObjectDefinition node oaRootNode.InnerXml = olapCubeDef.InnerXml; One hurdle I had to overcome was removing detritus from the .asdabase file left by the Visual Studio build. Through an iterative process, I found I needed to remove several nodes as they caused the deployment to fail. The XMLA error message read "Cannot set read-only node: CreatedTimestamp" or similar. In comparing the XMLA generated with by the Deployment Wizard with that generated by my code, these read-only nodes were missing, so clearly I just needed to strip them out. This was easily achieved using XPath to find the relevant XML nodes, of which I show one example below: foreach (XmlNode node in rootNode.SelectNodes("//ns1:CreatedTimestamp", nsManager)) { node.ParentNode.RemoveChild(node); } Now we need to change the database name in both the ID and Name nodes using code such as: XmlNode databaseID = xmlaCommand.SelectSingleNode("//ns1:Database/ns1:ID", nsManager); if (databaseID != null) databaseID.InnerText = olapDatabaseName; Finally we need to change the connection string to point at the relevant data mart. Again this is easily achieved using XPath to search for the relevant nodes and then replace the content of the node with the new name or connection string. XmlNode connectionStringNode = xmlaCommand.SelectSingleNode("//ns1:DataSources/ns1:DataSource/ns1:ConnectionString", nsManager); if (connectionStringNode != null) { connectionStringNode.InnerText = Dts.Connections["SourceDataMart"].ConnectionString; } Finally we need to perform the deployment using the Execute XMLA command and check the returned XmlaResultCollection for errors before setting the Dts.TaskResult. XmlaResultCollection oResults = svr.Execute(xmlaCommand.InnerXml); // check for errors during deployment foreach (Microsoft.AnalysisServices.XmlaResult oResult in oResults) { foreach (Microsoft.AnalysisServices.XmlaMessage oMessage in oResult.Messages) { if ((oMessage.GetType().Name == "XmlaError")) { FireError(oMessage.Description); HadError = true; } } } If you are not familiar with XML programming, all this may all seem a bit daunting, but perceiver as the sample code is pretty short. If you would like the script to process the OLAP database, simply uncomment the lines in the vicinity of Process method. Of course, you can extend the script to perform your own custom processing and to even synchronize the database to a front-end server. Personally, I like to keep the deployment and processing separate as the code can become overly complex for support staff.If you want to know more, come see my session at the forthcoming SQLBits conference.

Read the article
Agilist, Heal Thyself!

- by Dylan Smith

I’ve been meaning to blog about a great experience I had earlier in the year at Prairie Dev Con Calgary. Myself and Steve Rogalsky did a session that we called “Agilist, Heal Thyself!”. We used a format that was new to me, but that Steve had seen used at another conference. What we did was start by asking the audience to give us a list of challenges they had had when adopting agile. We wrote them all down, then had everybody vote on the most interesting ones. Then we split into two groups, and each group was assigned one of the agile challenges. We had 20 minutes to discuss the challenge, and suggest solutions or approaches to improve things. At the end of the 20 minutes, each of the groups gave a brief summary of their discussion and learning's, then we mixed up the groups and repeated with another 2 challenges. The 2 groups I was part of had some really interesting discussions, and suggestions: Unfinished Stories at the end of Sprints The first agile challenge we tackled, was something that every single Scrum team I have worked with has struggled with. What happens when you get to the end of a Sprint, and there are some stories that are only partially completed. The team in question was getting very de-moralized as they felt that every Sprint was a failure as they never had a set of fully completed stories. How do you avoid this? and/or what do you do when it happens? There were 2 pieces of advice that were well received: 1. Try to bring stories to completion before starting new ones. This is advice I give all my Scrum teams. If you have a 3-week sprint, what happens all too often is you get to the end of week 2, and a lot of stories are almost done; but almost none are completely done. This is a Bad Thing. I encourage the teams I work with to only start a new story as a very last resort. If you finish your task look at the stories in progress and see if there’s anything you can do to help before moving onto a new story. In the daily standup, put a focus on seeing what stories got completed yesterday, if a few days go by with none getting completed, be sure this fact is visible to the team and do something about it. Something I’ve been doing recently is introducing WIP (Work In Progress) limits while using Scrum. My current team has 2-week sprints, and we usually have about a dozen or stories in a sprint. We instituted a WIP limit of 4 stories. If 4 stories have been started but not finished then nobody is allowed to start new stories. This made it obvious very quickly that our QA tasks were our bottleneck (we have 4 devs, but only 1.5 testers). The WIP limit forced the developers to start to pickup QA tasks before moving onto the next dev tasks, and we ended our sprints with many more stories completely finished than we did before introducing WIP limits. 2. Rather than using time-boxed sprints, why not just do away with them altogether and go to a continuous flow type approach like KanBan. Limit WIP to keep things under control, but don’t have a fixed time box at the end of which all tasks are supposed to be done. This eliminates the problem almost entirely. At some points in the project (releases) you need to be able to burn down all the half finished stories to get a stable release build, but this probably occurs less often than every sprint, and there are alternative approaches to achieve it using branching strategies rather than forcing your team to try to get to Zero WIP every 2-weeks (e.g. when you are ready for a release, create a new branch for any new stories, but finish all existing stories in the current branch and release it). Trying to Introduce Agile into a team with previous Bad Agile Experiences One of the agile adoption challenges somebody described, was he was in a leadership role on a team he had recently joined – lets call him Dave. This team was currently very waterfall in their ALM process, but they were about to start on a new green-field project. Dave wanted to use this new project as an opportunity to do things the “right way”, using an Agile methodology like Scrum, adopting TDD, automated builds, proper branching strategies, etc. The problem he was facing is everybody else on the team had previously gone through an “Agile Adoption” that was a horrible failure. Dave blamed this failure on the consultant brought in previously to lead this agile transition, but regardless of the reason, the team had very negative feelings towards agile, and was very resistant to trying it out again. Dave possibly had the authority to try to force the team to adopt Agile practices, but we all know that doesn’t work very well. What was Dave to do? Ultimately, the best advice was to question *why* did Dave want to adopt all these various practices. Rather than trying to convince his team that these were the “right way” to run a dev project, and trying to do a Big Bang approach to introducing change. He would be better served by identifying problems the team currently faces, have a discussion with the team to get everybody to agree that specific problems existed, then have an open discussion about ways to address those problems. This way Dave could incrementally introduce agile practices, and he doesn’t even need to identify them as “agile” practices if he doesn’t want to. For example, when we discussed with Dave, he said probably the teams biggest problem was long periods without feedback from users, then finding out too late that the software is not going to meet their needs. Rather than Dave jumping right to introducing Scrum and all it entails, it would be easier to get buy-in from team if he framed it as a discussion of existing problems, and brainstorming possible solutions. And possibly most importantly, don’t try to do massive changes all at once with a team that has not bought-into those changes. Taking an incremental approach has a greater chance of success. I see something similar in my day job all the time too. Clients who for one reason or another claim to not be fans of agile (or not ready for agile yet). But then they go on to ask me to help them get shorter feedback cycles, quicker delivery cycles, iterative development processes, etc. It’s kind of funny at times, sometimes you just need to phrase the suggestions in terms they are using and avoid the word “agile”. PS – I haven’t blogged all that much over the past couple of years, but in an attempt to motivate myself, a few of us have accepted a blogger challenge. There’s 6 of us who have all put some money into a pool, and the agreement is that we each need to blog at least once every 2-weeks. The first 2-week period that we miss we’re eliminated. Last person standing gets the money. So expect at least one blog post every couple of weeks for the near future (I hope!). And check out the blogs of the other 5 people in this blogger challenge: Steve Rogalsky: http://winnipegagilist.blogspot.ca Aaron Kowall: http://www.geekswithblogs.net/caffeinatedgeek Tyler Doerkson: http://blog.tylerdoerksen.com David Alpert: http://www.spinthemoose.com Dave White: http://www.agileramblings.com (note: site not available yet. should be shortly or he owes me some money!)

Read the article
Webcast Q&A: ING on How to Scale Role Management and Compliance

- by Tanu Sood

Thanks to all who attended the live webcast we hosted on ING: Scaling Role Management and Access Certifications to Thousands of Applications on Wed, April 11th. Those of you who couldn’t join us, the webcast replay is now available. Many thanks to our guest speaker, Mark Robison, Enterprise Architect at ING for walking us through ING’s drivers and rationale for the platform approach, the phased implementation strategy, results & metrics, roadmap and recommendations. We greatly appreciate the insight he shared with us all on the deployment synergies between Oracle Identity Manager (OIM) and Oracle Identity Analytics (OIA) to enforce streamlined user and role management and scalable compliance. Mark was also kind enough to walk us through specific solutions features that helped ING manage the problem of role explosion and implement closed loop remediation. Our host speaker, Neil Gandhi, Principal Product Manager, Oracle rounded off the presentation by discussing common use cases and deployment scenarios we see organizations implement to automate user/identity administration and enforce closed-loop scalable compliance. Neil also called out the specific features in Oracle Identity Analytics 11gR1 that cater to expediting and streamlining compliance processes such as access certifications. While we tackled a few questions during the webcast, we have captured the responses to those that we weren’t able to get to here; our sincere thanks to Mark Robison for taking the time to respond to questions specific to ING’s implementation and strategy. Q. Did you include business friendly entitlment descriptions, or is the business seeing application descriptors A. We include very business friendly descriptions. The OIA tool has the facility to allow this. Q. When doing attestation on job change, who is in the workflow to review and confirm that the employee should continue to have access? Is that a best practice? A. The new and old manager are in the workflow. The tool can check for any Separation of Duties (SOD) violations with both having similiar accesses. It may not be a best practice, but it is a reality of doing your old and new job for a transition period on a transfer. Q. What versions of OIM and OIA are being used at ING? A. OIM 11gR1 and OIA 11gR1; the very latest versions available. Q. Are you using an entitlements / role catalog? A. Yes. We use both roles and entitlements. Q. What specific unexpected benefits did the Identity Warehouse provide ING? A. The most unanticipated was to help Legal Hold identify user ID's in the various applications. Other benefits included providing a one stop shop for all aggregated ID information. Q. How fine grained are your application and entitlements? Did OIA, OIM support that level of granularity? A. We have some very fine grained entitlements, but we role this up into approved Roles to allow for easier management. For managing very fine grained entitlements, Oracle offers the Oracle Entitlement Server. We currently do not own this software but are considering it. Q. Do you allow any individual access or is everything truly role based? A. We are a hybrid environment with roles and individual positive and negative entitlements Q. Did you use an Agile methodology like scrum to deliver functionality during your project? A. We started with waterfall, but used an agile approach to provide benefits after the initial implementation Q. How did you handle rolling out the standard ID format to existing users? A. We just used the standard IDs for new users. We have not taken on a project to address the existing nonstandard IDs. Q. To avoid role explosion, how do you deal with apps that require more than a couple of entitlement TYPES? For example, an app may have different levels of access and it may need to know the user's country/state to associate them with particular customers. A. We focus on the functional user and craft the role around their daily job requirements. The role captures the required application entitlements. To keep role explosion down, we use role mining in OIA and also meet and interview the business. It is an iterative process to get role consensus. Q. Great presentation! How many rounds of Certifications has ING performed so far? A. Around 7 quarters and constant certifications on transfer. Q. Did you have executive support from the top down A. Yes The executive support was key to our success. Q. For your cloud instance are you using OIA or OIM as SaaS? A. No. We are just provisioning and deprovisioning to various Cloud providers. (Service Now is an example) Q. How do you ensure a role owner does not get more priviliges as are intended and thus violates another role, e,g, a DBA Roles should not get tor rigt to run somethings as root, as this would affect the root role? A. We have SOD checks. Also all Roles are initially approved by external audit and the role owners have to certify the roles and any changes Q. What is your ratio of employees to roles? A. We are still in process going through our various lines of business, so I do not have a final ratio. From what we have seen, the ratio varies greatly depending on the Line of Business and the diversity of Job Functions. For standardized lines of business such as call centers, the ratio is very good where we can have a single role that covers many employees. For specialized lines of business like treasury, it can be one or two people per role. Q. Is ING using Oracle On Demand service ? A. No Q. Do you have to implement or migrate to OIM in order to get the Identity Warehouse, or can OIA provide the identity warehouse as well if you haven't reached OIM yet? A. No, OIM deployment is not required to implement OIA’s Identity Warehouse but as you heard during the webcast, there are tremendous deployment synergies in deploying both OIA and OIM together. Q. When is the Security Governor product coming out? A. Oracle Security Governor for Healthcare is available today. Hope you enjoyed the webcast and we look forward to having you join us for the next webcast in the Customers Talk: Identity as a Platform webcast series: Toyota: Putting Customers First – Identity Platform as a Business Enabler Wednesday, May 16th at 10 am PST/ 1 pm EST Register Today You can also register for a live event at a city near you where Aberdeen’s Derek Brink will discuss the survey results from the recently published report “Analyzing Platform vs. Point Solution Approach in Identity”. And, you can do a quick (& free) online assessment of your identity programs by benchmarking it against the 160 organizations surveyed in the Aberdeen report, compliments of Oracle. Here’s the slide deck from our ING webcast: ING webcast platform View more presentations from OracleIDM

Read the article
Bug Triage

In this blog post brain dump, I'll attempt to describe the process my team tries to follow when dealing with new bug reports (specifically, code defect reports). This is not official Microsoft policy, just the way we do things… if you do things differently and want to share, you can do so at the bottom in the comments (or on your blog).Feature Triage TeamA subset of the feature crew, the triage team (which has representations from the PM, Dev and QA disciplines), looks at all unassigned bugs at regular intervals. This can be weekly or daily (or other frequency) dependent on which part of the product cycle we are in and what the untriaged bug load looks like. They discuss each bug considering the evidence and make a decision of whether the bug goes from Not Yet Assigned to Assigned (plus the name of the DEV to fix this) or whether it goes from Active to Resolved (which means it gets assigned back to the requestor for closure or further debate if they were not present at the triage meeting). Close to critical milestones, the feature triage team needs to further justify bugs they take to additional higher-level triage teams.Bug Opened = Not Yet AssignedSomeone (typically an SDET from the QA team) creates the bug item (e.g. in TFS), ensuring they populate all the relevant fields including: Title, Description, Repro Steps (including the Actual Result at the end of the steps), attachments of code and/or screenshots, Build number that they observed the issue in, regression details if applicable, how it was found, if a test case exists or needs to be created etc. They also indicate their opinion on the Priority and Severity. The bug status is left as Not Yet Assigned."Issue" versus "Fix for issue"The solution to some bugs is easy to determine, e.g. "bug: the column name is misspelled". Obviously the fix is to correct the spelling – still, the triage team should be explicit and enter the correct spelling in the bug's Description. Note that a bad bug name here would be "bug: fix the spelling of the column" (it describes the solution, rather than the problem).Other solutions are trickier to establish, e.g. "bug: the column header is not accessible (can only be clicked on with the mouse, not reached via keyboard)". What is the correct solution here? The last thing to do is leave this undetermined and just assign it to a developer. The solution has to be entered in the description. Behind this type of a bug usually hides a spec defect or a new feature request.The person opening the bug should focus on describing the issue, rather than the solution. The person indicates what the fix is in their opinion by stating the Expected Result (immediately after stating the Actual Result). If they have a complex suggested solution, that should be split out in a separate part, but the triage team has the final say before assigning it. If the solution is lengthy/complicated to describe, the bug can be assigned to the PM. Note: the strict interpretation suggests that any bug with no clear, obvious solution is always a hole in the spec and should always go to the PM. This also ensures the spec gets updated.Not Yet Assigned - Not Yet Assigned (on someone else's plate)If the bug is observed in our feature, but the cause is actually another team, we change the Area Path (which is the way we identify teams in TFS) and leave it as Not Yet Assigned. The triage team may add more comments as appropriate including potentially changing the repro steps. In some cases, we may even resolve the bug in our area path and open a new bug in the area path of the other team.Even though there is no action on a dev on the team, the bug still needs to be tracked. One way of doing this is to implement some notification system that informs the team when the tracked bug changed status; another way is to occasionally run a global query (against all area paths) for bugs that have been opened by a member of the team and follow up with the current owners for stale bugs.Not Yet Assigned - ResolvedThis state transition can only be made by the Feature Triage Team.0. Sometimes the bug description is not clear and in that case it gets Resolved as More Information Needed, so the original requestor can provide it.After understanding what the bug item is about, the first decision is to determine whether it needs to go to a dev.1. If it is a known bug, it gets resolved as "Duplicate" and linked to the existing bug.2. If it is "By Design" it gets resolved as such, indicating that the triage team does not think this is a bug.3. If the bug does not repro on latest bits, it is resolved as "No Repro"4. The most painful: If it is decided that we cannot fix it for this release it gets resolved as "Postponed" or "Won't Fix". The former is typically due to resources and time constraints, while the latter is due to deciding that it is not important enough to consume our resources in any release (yes, not all bugs must be fixed!). For both cases, there are other factors that contribute to the decision such as: existence of a reasonable workaround, frequency we expect users to encounter the issue, dependencies on other team to offer a solution, whether it breaks a core scenario, whether it prohibits customer feedback on a major feature, is it a regression from a previous release, impact of the fix on other partner teams (e.g. User Education, User Experience, Localization/Globalization), whether this is the right fix, does the fix impact performance goals, and last but not least, severity of bug (e.g. loss of customer data, security threat, crash, hang). The bar for fixing a bug goes up as the release date approaches. The triage team becomes hardnosed about which bugs to take, while the developers are busy resolving assigned bugs thus everyone drives for Zero Bug Bounce (ZBB). ZBB is when you have 0 active bugs older than 48 hours.Not Yet Assigned - AssignedIf the bug is something we decide to fix in this release and the solution is known, then it is assigned to a DEV. This is either the developer that will do the work, or a Lead that can further assign it to one of his developer team based on a load balancing algorithm of their choosing.Sometimes, the triage team needs the dev to do some investigation work before deciding whether to take the fix; similarly, the checkin for the fix may be gated on code review by the triage team. In these cases, these instructions are provided in the comments section of the bug and when the developer is done they notify the triage team for final decision.Additionally, a Priority and Severity (from 0 to 4) has to be entered, e.g. a P0 means "drop anything you are doing and fix this now" whereas a P4 is something you get to after all P0,1,2,3 bugs are fixed.From a testing perspective, if the bug was found through ad-hoc testing or an external team, the decision is made whether test cases should be added to avoid future regressions. This is communicated to the QA team.Assigned - ResolvedWhen the developer receives the bug (they should be checking daily for new bugs on their plate looking at bugs in order of priority and from older to newer) they can send it back to triage if the information is not clear. Otherwise, they investigate the bug, setting the Sub Status to "Investigating"; if they cannot make progress, they set the Sub Status to "Blocked" and discuss this with triage or whoever else can help them get unblocked. Once they are unblocked, they set the Sub Status to "Working on Solution"; once they are code complete they send a code review request, setting the Sub Status to "Fix Available". After the iterative code review process is over and everyone is happy with the fix, the developer checks it in and changes the state of the bug from Active (and Assigned to them) to Resolved (and Assigned to someone else).The developer needs to ensure that when the status is changed to Resolved that it is assigned to a QA person. For example, maybe the PM opened the bug, but it should be a QA person that will verify the fix - the developer needs to manually change the assignee in that case. Typically the QA person will send an email to the original requestor notifying them that the fix is verified.Resolved - ??In all cases above, note that the final state was Resolved. What happens after that? The final step should be Closed. The bug is closed once the QA person verifying the fix is happy with it. If the person is not happy, then they change the state from Resolved to Active, thus sending it back to the developer. If the developer and QA person cannot reach agreement, then triage can be brought into it. An easy way to do that is change the status back to Not Yet Assigned with appropriate comments so the triage team can re-review.It is important to note that only QA can close a bug. That means that if the opener of the bug was a PM, when the bug gets resolved by the dev it may land on the PM's plate and after a quick review, the PM would re-assign to an SDET, which is the only role that can close bugs. One exception to this is if the person that filed the bug is external: in that case, we leave it Resolved and assigned to them and also send them a notification that they need to verify the fix. Another exception is if specialized developer knowledge is needed for verifying the bug fix (e.g. it was a refactoring suggestion bug typically not observable by the user) in which case it is fine to have a developer verify the fix, and ideally a different developer to the one that opened the bug.Other links on bug triageA quick search reveals that others have talked about this subject, e.g. here, here, here, here and here.Your take?If you have other best practices your team uses to deal with incoming bug reports, feel free to share in the comments below or on your blog. Comments about this post welcome at the original blog.

Read the article
Dotfuscator Deep Dive with WP7

- by Bil Simser

I thought I would share some experience with code obfuscation (specifically the Dotfuscator product) and Windows Phone 7 apps. These days twitter is a buzz with black hat and white operations coming out about how the marketplace is insecure and Microsoft failed, blah, blah, blah. So it’s that much more important to protect your intellectual property. You should protect it no matter what when releasing apps into the wild but more so when someone is paying for them. You want to protect the time and effort that went into your code and have some comfort that the casual hacker isn’t going to usurp your next best thing. Enter code obfuscation. Code obfuscation is one tool that can help protect your IP. Basically it goes into your compiled assemblies, rewrites things at an IL level (like renaming methods and classes and hiding logic flow) and rewrites it back so that the assembly or executable is still fully functional but prying eyes using a tool like ILDASM or Reflector can’t see what’s going on. You can read more about code obfuscation here on Wikipedia. A word to the wise. Code obfuscation isn’t 100% secure. More so on the WP7 platform where the OS expects certain things to be as they were meant to be. So don’t expect 100% obfuscation of every class and every method and every property. It’s just not going to happen. What this does do is give you some level of protection but don’t put all your eggs in one basket and call it done. Like I said, this is just one step in the process. There are a few tools out there that provide code obfuscation and support the Windows Phone 7 platform (see links to other tools at the end of this post). One such tool is Dotfuscator from PreEmptive solutions. The thing about Dotfuscator is that they’ve struck a deal with Microsoft to provide a *free* copy of their commercial product for Windows Phone 7. The only drawback is that it only runs until March 31, 2010. However it’s a good place to start and the focus of this article. Getting Started When you fire up Dotfuscator you’re presented with a dialog to start a new project or load a previous one. We’ll start with a new project. You’re then looking at a somewhat blank screen that shows an Input tab (among others) and you’re probably wondering what to do? Click on the folder icon (first one) and browse to where your xap file is. At this point you can save the project and click on the arrow to start the process. Bam! You’re done. Right? Think again. The program did indeed run and create a new version of your xap (doing it’s thing and rewriting back your *obfuscated* assemblies) but let’s take a look at the assembly in Reflector to see the end result. Remember a xap file is really just a glorified zip file (or cab file if you prefer). When you ran Dotfuscator for the first time with the default settings you’ll see it created a new version of your xap in a folder under “My Documents” called “Dotfuscated” (you can configure the output directory in settings). Here’s the new xap file. Since a xap is just a zip, rename it to .cab or .zip or something and open it with your favorite unarchive program (I use WinRar but it doesn’t matter as long as it can unzip files). If you already have the xap file associated with your unarchive tool the rename isn’t needed. Once renamed extract the contents of the xap to your hard drive: Now you’ll have a folder with the contents of the xap file extracted: Double click or load up your assembly (WindowsPhoneDataBoundApplication1.dll in the example) in Reflector and let’s see the results: Hmm. That doesn’t look right. I can see all the methods and the code is all there for my LoadData method I wanted to protect. Product failure. Let’s return it for a refund. Hold your horses. We need to check out the settings in the program first. Remember when we loaded up our xap file. It started us on the Input tab but there was a settings tab before that. Wonder what it does? Here’s the default settings: Renaming Taking a closer look, all of the settings in Feature are disabled. WTF? Yeah, it leaves me scratching my head why an obfuscator by default doesn’t obfuscate. However it’s a simple fix to change these settings. Let’s enable Renaming as it sounds like a good start. Renaming obscures code by renaming methods and fields to names that are not understandable. Great. Run the tool again and go through the process of unzipping the updated xap and let’s take a look in Reflector again at our project. This looks a lot better. Lots of methods named a, b, c, d, etc. That’ll help slow hackers down a bit. What about our logic that we spent days weeks on? Let’s take a look at the LoadData method: What gives? We have renaming enabled but all of our code is still there. If you look through all your methods you’ll find it’s still sitting there out in the open. Control Flow Back to the settings page again. Let’s enable Control Flow now. Control Flow obfuscation synthesizes branching, conditional, and iterative constructs (such as if, for, and while) that produce valid executable logic, but yield non-deterministic semantic results when decompilation is attempted. In other words, the code runs as before, but decompilers cannot reproduce the original code. Do the dance again and let’s see the results in Reflector. Ahh, that’s better. Methods renamed *and* nobody can look at our LoadData method now. Life is good. More than Minimum This is the bare minimum to obfuscate your xap to at least a somewhat comfortable level. However I did find that while this worked in my Hello World demo, it didn’t work on one of my real world apps. I had to do some extra tweaking with that. Below are the screens that I used on one app that worked. I’m not sure what it was about the app that the approach above didn’t work with (maybe the extra assembly?) but it works and I’m happy with it. YMMV. Remember to test your obfuscated app on your device first before submitting to ensure you haven’t obfuscated the obfuscator. settings tab: rename tab: string encryption tab: premark tab: A few final notes Play with the settings and keep bumping up the bar to try to get as much obfuscation as you can. The more the better but remember you can overdo it. Always (always, always, always) deploy your obfuscated xap to your device and test it before submitting to the marketplace. I didn’t and got rejected because I had gone overboard with the obfuscation so the app wouldn’t launch at all. Not everything is going to be obfuscated. Specifically I don’t see a way to obfuscate auto properties and a few other language features. Again, if you crank the settings up you might hide these but I haven’t spent a lot of time optimizing the process. Some people might say to obfuscate your xaml using string encryption but again, test, test, test. Xaml is picky so too much obfuscation (or any) might disable your app or produce odd rendering effets. Remember, obfuscation is not 100% secure! Don’t rely on it as a sole way of protecting your assets. Other Tools Dotfuscator is one just product and isn’t the end-all be-all to obfuscation so check out others below. For example, Crypto can make it so Reflector doesn’t even recognize the app as a .NET one and won’t open it. Others can encrypt resources and Xaml markup files. Here are some other obfuscators that support the Windows Phone 7 platform. Feel free to give them a try and let people know your experience with them! Dotfuscator Windows Phone Edition Crypto Obfuscator for .NET DeepSea Obfuscation

Read the article
Load and Web Performance Testing using Visual Studio Ultimate 2010-Part 2

- by Tarun Arora

Welcome back, in part 1 of Load and Web Performance Testing using Visual Studio 2010 I talked about why Performance Testing the application is important, the test tools available in Visual Studio Ultimate 2010 and various test rig topologies. In this blog post I’ll get into the details of web performance & load tests as well as why it’s important to follow a goal based pattern while performance testing your application. Tools => Options => Test Tools Have you visited the treasures of Visual Studio Menu bar tools => Options => Test Tools lately? The options to enable disable prompts on creating, editing, deleting or running manual/automated tests can be controller from here. The default test project language and default test types created on a new test project creation could be selected/unselected from here. Ever wondered how you can change the default limit of 25 test results, this can again be changed from here. If you record a lot of Web Tests and wish for the web test recorder to start with “that” URL populated, well this again can be specified from here. If you haven’t so far, I would urge you to spend 2 minutes in the test tools options. Test Menu => Ready Steady Test Action! The Test tools are under the Test Menu in Visual Studio, apart from being able to create a new Test and Test List you can also load an existing vsmdi file. You can also manage your test controllers from here. A solution can have one or more test setting files, but there can only be one active test settings file at any time. Again, this selection can be done from here. You can open the various test windows from under the windows option from the test menu. If you open the Test view window you will see that you have the option to group the tests by work items, project, test type, etc. You can set these properties by right clicking a test in the test list and choosing properties from the context menu. So, what is a vsmdi file? vsmdi stands for Visual Studio Test Metadata File. Placed under the Solution Items this file keeps track of the list of unit tests in your solution. If you open the vsmdi file as an xml file you will see a series of Test Links nested with in the list Test List tags along with the Run Configuration tag. When in visual studio you run tests, the IDE looks at the vsmdi file to see what tests need to be run. You also have the option of using the vsmdi file in your team builds to specify which tests need to run as part of the build. Refer here for a walkthrough from a fellow blogger on how to use the vsmdi file in the team builds. Web Performance Test – The Truth! In Visual Studio 2010 “Web Tests” have been renamed to “Web Performance Tests”. Apart from renaming this test type there have been several improvements to this test type in visual studio 2010. I am very active on the MSDN Visual Studio And Load Testing forum and a frequent question from many users is “Do Web Tests support Pages that run JavaScript?” I will start with a little bit of background before answering this question. Web Performance Tests operate at the HTTP Layer, but why? To enable you to generate high loads with a relatively low amount of hardware, Web performance tests are driven at the protocol layer rather than instantiating a browser.The most common source of confusion is that users do not realize Web Performance Tests work at the HTTP layer. The tool adds to that misconception. After all, you record in IE, and when running a Web test you can select which browser to use, and then the result viewer shows the results in a browser window. So that means the tests run through the browser, right? NO! The Web test engine works at the HTTP layer, and does not instantiate a browser. What does that mean? In the diagram below, you can see there are no browsers running when the engine is sending and receiving requests. Does that mean I can’t test pages that use Java script? The best example for java script generating HTTP traffic is AJAX calls. The most common example of browser plugins are Silverlight or Flash. The Web test recorder will record HTTP traffic from AJAX calls and from most (but not all) browser plugins. This means you will still be able to web performance test pages that use java script or plugin and play back the results but the playback engine will not show the java script or plug in results in the ‘browser control’. If you want to test the page behaviour as a result of the java script or plug in consider using Coded UI Tests. This page looks like it failed, when in fact it succeeded! Looking closely at the response, and subsequent requests, it is clear the operation succeeded. As stated above, the reason why the browser control is pasting this message is because java script has been disabled in this control. So, to reiterate, the web performance test recorder: - Sends and receives data at the HTTP layer. - Does NOT run a browser. - Does NOT run java script. - Does NOT host ActiveX controls or plugins. There is a great series of blog posts from Ed Glas, i would highly recommend his blog to any one performing Load/Performance testing through Visual Studio. Demo – Web Performance Test [Demo] - Visual Studio Ultimate 2010: Test Settings and Configuration [Demo]–Visual Studio Ultimate 2010: Web Performance Test In this short video I try and answer the following questions, Why is performance Testing important? How does Visual Studio Help you performance Test your applications? How do i record a web performance test? How do make a web performance test data driven, transaction driven, loop driven, convert to code, add validations? Best practices for recording Web Performance Tests. I have a web performance test, what next? Creating the Web Performance Test was the first step towards load testing your application. Now that we have the base test we can test the page behaviour when N-users access the page. Have you ever had the head of business call you and mention that the marketing team has done a fantastic job and are expecting increased traffic on the web site, can the website survive the weekend with that additional load? This is the perfect opportunity to capacity test your application to see how your website holds up under various levels of load, you can work the results backwards to see how much hardware you may need to scale up your application to survive the weekend. Apart from that it is always a good idea to have some benchmarks around how the application performs under light loads for short duration, under heavy load for long duration and soak test the application run a constant load for a very week or two to record the effects of constant load for really long durations, this is a great way of identifying how your application handles the default IIS application pool reset which by default is configured to once every 25 hours. These bench marks will act as the perfect yard stick to measure performance gains when you start making improvements. BUT there are some best practices! => Goal Based Load Testing Approach Since the subject is vast and there are a lot of things to measure and analyse, … it is very easy to get distracted from the real goal! You can optimize your application once you know where the pain points are. There is no point performing a load test of 5000 users if your intranet application will only have a 100 simultaneous users, it is important to keep focussed on the real goals of the project. So the idea is to have a user story around your load testing scenarios and test realistically. So it is recommended that you follow the below outline, It is an Iterative process, refine your objectives, identify the key scenarios, what is the expected workload, key metrics you want to report, record the web performance tests, simulate load and analyse results. Is your application already deployed in Production? This is great! You can analyse the IIS Logs to understand the user behaviour… But what are IIS LOGS? The IIS logs allow you to record events for each application and Web site on the Web server. You can create separate logs for each of your applications and Web sites. Logging information in IIS goes beyond the scope of the event logging or performance monitoring features provided by Windows. The IIS logs can include information, such as who has visited your site, what the visitor viewed, and when the information was last viewed. You can use the IIS logs to identify any attempts to gain unauthorized access to your Web server. How to configure IIS LOGS? For those Ninjas who already have IIS Logs configured (by the way its on by default) and need a way to analyse the IIS Logs, can use the Windows IIS Utility – Log Parser. Log Parser is a very powerful tool that provides a generic SQL-like language on top of many types of data like IIS Logs, Event Viewer entries, XML files, CSV files, File System and others; and it allows you to export the result of the queries to many output formats such as CSV, XML, SQL Server, Charts and others; and it works well with IIS 5, 6, 7 and 7.5. Frequently used Log Parser queries. Demo – Load Test [Demo]–Visual Studio Ultimate 2010: Load Testing In this short video I try and answer the following questions, - Types of Performance Testing? - Perform Goal driven Load Testing, analyse Test Run Result and Generate a report? Recap A quick recap of what we have covered so far, Thank you for taking the time out and reading this blog post, in part III of this blog series I’ll be getting into the details of Test Result Analysis, Test Result Drill through, Test Report Generation, Test Run Comparison, and the Asp.net Profiler. If you enjoyed the post, remember to subscribe to http://feeds.feedburner.com/TarunArora. Questions/Feedback/Suggestions, etc please leave a comment. See you on in Part III Share this post : CodeProject

Read the article
Solving Big Problems with Oracle R Enterprise, Part I

- by dbayard

Abstract: This blog post will show how we used Oracle R Enterprise to tackle a customer’s big calculation problem across a big data set. Overview: Databases are great for managing large amounts of data in a central place with rigorous enterprise-level controls. R is great for doing advanced computations. Sometimes you need to do advanced computations on large amounts of data, subject to rigorous enterprise-level concerns. This blog post shows how Oracle R Enterprise enables R plus the Oracle Database enabled us to do some pretty sophisticated calculations across 1 million accounts (each with many detailed records) in minutes. The problem: A financial services customer of mine has a need to calculate the historical internal rate of return (IRR) for its customers’ portfolios. This information is needed for customer statements and the online web application. In the past, they had solved this with a home-grown application that pulled trade and account data out of their data warehouse and ran the calculations. But this home-grown application was not able to do this fast enough, plus it was a challenge for them to write and maintain the code that did the IRR calculation. IRR – a problem that R is good at solving: Internal Rate of Return is an interesting calculation in that in most real-world scenarios it is impractical to calculate exactly. Rather, IRR is a calculation where approximation techniques need to be used. In this blog post, we will discuss calculating the “money weighted rate of return” but in the actual customer proof of concept we used R to calculate both money weighted rate of returns and time weighted rate of returns. You can learn more about the money weighted rate of returns here: http://www.wikinvest.com/wiki/Money-weighted_return First Steps- Calculating IRR in R We will start with calculating the IRR in standalone/desktop R. In our second post, we will show how to take this desktop R function, deploy it to an Oracle Database, and make it work at real-world scale. The first step we did was to get some sample data. For a historical IRR calculation, you have a balances and cash flows. In our case, the customer provided us with several accounts worth of sample data in Microsoft Excel. The above figure shows part of the spreadsheet of sample data. The data provides balances and cash flows for a sample account (BMV=beginning market value. FLOW=cash flow in/out of account. EMV=ending market value). Once we had the sample spreadsheet, the next step we did was to read the Excel data into R. This is something that R does well. R offers multiple ways to work with spreadsheet data. For instance, one could save the spreadsheet as a .csv file. In our case, the customer provided a spreadsheet file containing multiple sheets where each sheet provided data for a different sample account. To handle this easily, we took advantage of the RODBC package which allowed us to read the Excel data sheet-by-sheet without having to create individual .csv files. We wrote ourselves a little helper function called getsheet() around the RODBC package. Then we loaded all of the sample accounts into a data.frame called SimpleMWRRData. Writing the IRR function At this point, it was time to write the money weighted rate of return (MWRR) function itself. The definition of MWRR is easily found on the internet or if you are old school you can look in an investment performance text book. In the customer proof, we based our calculations off the ones defined in the The Handbook of Investment Performance: A User’s Guide by David Spaulding since this is the reference book used by the customer. (One of the nice things we found during the course of this proof-of-concept is that by using R to write our IRR functions we could easily incorporate the specific variations and business rules of the customer into the calculation.) The key thing with calculating IRR is the need to solve a complex equation with a numerical approximation technique. For IRR, you need to find the value of the rate of return (r) that sets the Net Present Value of all the flows in and out of the account to zero. With R, we solve this by defining our NPV function: where bmv is the beginning market value, cf is a vector of cash flows, t is a vector of time (relative to the beginning), emv is the ending market value, and tend is the ending time. Since solving for r is a one-dimensional optimization problem, we decided to take advantage of R’s optimize method (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html). The optimize method can be used to find a minimum or maximum; to find the value of r where our npv function is closest to zero, we wrapped our npv function inside the abs function and asked optimize to find the minimum. Here is an example of using optimize: where low and high are scalars that indicate the range to search for an answer. To test this out, we need to set values for bmv, cf, t, emv, tend, low, and high. We will set low and high to some reasonable defaults. For example, this account had a negative 2.2% money weighted rate of return. Enhancing and Packaging the IRR function With numerical approximation methods like optimize, sometimes you will not be able to find an answer with your initial set of inputs. To account for this, our approach was to first try to find an answer for r within a narrow range, then if we did not find an answer, try calling optimize() again with a broader range. See the R help page on optimize() for more details about the search range and its algorithm. At this point, we can now write a simplified version of our MWRR function. (Our real-world version is more sophisticated in that it calculates rate of returns for 5 different time periods [since inception, last quarter, year-to-date, last year, year before last year] in a single invocation. In our actual customer proof, we also defined time-weighted rate of return calculations. The beauty of R is that it was very easy to add these enhancements and additional calculations to our IRR package.)To simplify code deployment, we then created a new package of our IRR functions and sample data. For this blog post, we only need to include our SimpleMWRR function and our SimpleMWRRData sample data. We created the shell of the package by calling: To turn this package skeleton into something usable, at a minimum you need to edit the SimpleMWRR.Rd and SimpleMWRRData.Rd files in the \man subdirectory. In those files, you need to at least provide a value for the “title” section. Once that is done, you can change directory to the IRR directory and type at the command-line: The myIRR package for this blog post (which has both SimpleMWRR source and SimpleMWRRData sample data) is downloadable from here: myIRR package Testing the myIRR package Here is an example of testing our IRR function once it was converted to an installable package: Calculating IRR for All the Accounts So far, we have shown how to calculate IRR for a single account. The real-world issue is how do you calculate IRR for all of the accounts?This is the kind of situation where we can leverage the “Split-Apply-Combine” approach (see http://www.cscs.umich.edu/~crshalizi/weblog/815.html). Given that our sample data can fit in memory, one easy approach is to use R’s “by” function. (Other approaches to Split-Apply-Combine such as plyr can also be used. See http://4dpiecharts.com/2011/12/16/a-quick-primer-on-split-apply-combine-problems/). Here is an example showing the use of “by” to calculate the money weighted rate of return for each account in our sample data set. Recap and Next Steps At this point, you’ve seen the power of R being used to calculate IRR. There were several good things: R could easily work with the spreadsheets of sample data we were given R’s optimize() function provided a nice way to solve for IRR- it was both fast and allowed us to avoid having to code our own iterative approximation algorithm R was a convenient language to express the customer-specific variations, business-rules, and exceptions that often occur in real-world calculations- these could be easily added to our IRR functions The Split-Apply-Combine technique can be used to perform calculations of IRR for multiple accounts at once. However, there are several challenges yet to be conquered at this point in our story: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In our next blog post in this series, we will show you how Oracle R Enterprise solved these challenges.

Read the article
Software development is (mostly) a trade, and what to do about it

- by Jeff

(This is another cross-post from my personal blog. I don’t even remember when I first started to write it, but I feel like my opinion is well enough baked to share.) I've been sitting on this for a long time, particularly as my opinion has changed dramatically over the last few years. That I've encountered more crappy code than maintainable, quality code in my career as a software developer only reinforces what I'm about to say. Software development is just a trade for most, and not a huge academic endeavor. For those of you with computer science degrees readying your pitchforks and collecting your algorithm interview questions, let me explain. This is not an assault on your way of life, and if you've been around, you know I'm right about the quality problem. You also know the HR problem is very real, or we wouldn't be paying top dollar for mediocre developers and importing people from all over the world to fill the jobs we can't fill. I'm going to try and outline what I see as some of the problems, and hopefully offer my views on how to address them. The recruiting problem I think a lot of companies are doing it wrong. Over the years, I've had two kinds of interview experiences. The first, and right, kind of experience involves talking about real life achievements, followed by some variation on white boarding in pseudo-code, drafting some basic system architecture, or even sitting down at a comprooder and pecking out some basic code to tackle a real problem. I can honestly say that I've had a job offer for every interview like this, save for one, because the task was to debug something and they didn't like me asking where to look ("everyone else in the company died in a plane crash"). The other interview experience, the wrong one, involves the classic torture test designed to make the candidate feel stupid and do things they never have, and never will do in their job. First they will question you about obscure academic material you've never seen, or don't care to remember. Then they'll ask you to white board some ridiculous algorithm involving prime numbers or some kind of string manipulation no one would ever do. In fact, if you had to do something like this, you'd Google for a solution instead of waste time on a solved problem. Some will tell you that the academic gauntlet interview is useful to see how people respond to pressure, how they engage in complex logic, etc. That might be true, unless of course you have someone who brushed up on the solutions to the silly puzzles, and they're playing you. But here's the real reason why the second experience is wrong: You're evaluating for things that aren't the job. These might have been useful tactics when you had to hire people to write machine language or C++, but in a world dominated by managed code in C#, or Java, people aren't managing memory or trying to be smarter than the compilers. They're using well known design patterns and techniques to deliver software. More to the point, these puzzle gauntlets don't evaluate things that really matter. They don't get into code design, issues of loose coupling and testability, knowledge of the basics around HTTP, or anything else that relates to building supportable and maintainable software. The first situation, involving real life problems, gives you an immediate idea of how the candidate will work out. One of my favorite experiences as an interviewee was with a guy who literally brought his work from that day and asked me how to deal with his problem. I had to demonstrate how I would design a class, make sure the unit testing coverage was solid, etc. I worked at that company for two years. So stop looking for algorithm puzzle crunchers, because a guy who can crush a Fibonacci sequence might also be a guy who writes a class with 5,000 lines of untestable code. Fashion your interview process on ways to reveal a developer who can write supportable and maintainable code. I would even go so far as to let them use the Google. If they want to cut-and-paste code, pass on them, but if they're looking for context or straight class references, hire them, because they're going to be life-long learners. The contractor problem I doubt anyone has ever worked in a place where contractors weren't used. The use of contractors seems like an obvious way to control costs. You can hire someone for just as long as you need them and then let them go. You can even give them the work that no one else wants to do. In practice, most places I've worked have retained and budgeted for the contractor year-round, meaning that the $90+ per hour they're paying (of which half goes to the person) would have been better spent on a full-time person with a $100k salary and benefits. But it's not even the cost that is an issue. It's the quality of work delivered. The accountability of a contractor is totally transient. They only need to deliver for as long as you keep them around, and chances are they'll never again touch the code. There's no incentive for them to get things right, there's little incentive to understand your system or learn anything. At the risk of making an unfair generalization, craftsmanship doesn't matter to most contractors. The education problem I don't know what they teach in college CS courses. I've believed for most of my adult life that a college degree was an essential part of being successful. Of course I would hold that bias, since I did it, and have the paper to show for it in a box somewhere in the basement. My first clue that maybe this wasn't a fully qualified opinion comes from the fact that I double-majored in journalism and radio/TV, not computer science. Eventually I worked with people who skipped college entirely, many of them at Microsoft. Then I worked with people who had a masters degree who sucked at writing code, next to the high school diploma types that rock it every day. I still think there's a lot to be said for the social development of someone who has the on-campus experience, but for software developers, college might not matter. As I mentioned before, most of us are not writing compilers, and we never will. It's actually surprising to find how many people are self-taught in the art of software development, and that should reveal some interesting truths about how we learn. The first truth is that we learn largely out of necessity. There's something that we want to achieve, so we do what I call just-in-time learning to meet those goals. We acquire knowledge when we need it. So what about the gaps in our knowledge? That's where the most valuable education occurs, via our mentors. They're the people we work next to and the people who write blogs. They are critical to our professional development. They don't need to be an encyclopedia of jargon, but they understand the craft. Even at this stage of my career, I probably can't tell you what SOLID stands for, but you can bet that I practice the principles behind that acronym every day. That comes from experience, augmented by my peers. I'm hell bent on passing that experience to others. Process issues If you're a manager type and don't do much in the way of writing code these days (shame on you for not messing around at least), then your job is to isolate your tradespeople from nonsense, while bringing your business into the realm of modern software development. That doesn't mean you slap up a white board with sticky notes and start calling yourself agile, it means getting all of your stakeholders to understand that frequent delivery of quality software is the best way to deal with change and evolving expectations. It also means that you have to play technical overlord to make sure the education and quality issues are dealt with. That's why I make the crack about sticky notes, because without the right technique being practiced among your code monkeys, you're just a guy with sticky notes. You're asking your business to accept frequent and iterative delivery, now make sure that the folks writing the code can handle the same thing. This means unit testing, the right instrumentation, integration tests, automated builds and deployments... all of the stuff that makes it easy to see when change breaks stuff. The prognosis I strongly believe that education is the most important part of what we do. I'm encouraged by things like The Starter League, and it's the kind of thing I'd love to see more of. I would go as far as to say I'd love to start something like this internally at an existing company. Most of all though, I can't emphasize enough how important it is that we mentor each other and share our knowledge. If you have people on your staff who don't want to learn, fire them. Seriously, get rid of them. A few months working with someone really good, who understands the craftsmanship required to build supportable and maintainable code, will change that person forever and increase their value immeasurably.

Read the article
CodePlex Daily Summary for Sunday, May 27, 2012

CodePlex Daily Summary for Sunday, May 27, 2012Popular ReleasesMS CRM Rich Text box: MS CRM Rich Text box: This release contains the final JavaScript for this plug-in. It is tested and verified. Even if someone is unable to use it, can contact me. Suggestions and bug-notifications are always most welcome.Nivo Slider Web Part SharePoint 2010: Nivo Slider Web Part WSP: Web Part encapsulating nivo slider jquery web part. Download the wsp for one click install. Edit the property of the web part to point to any image library and all done. Web part includes jQuery and nivo jquery library. No configuration is required. This web part is a SharePoint 2010 farm solution. Scope for installation is site collection.iPDC - Free Phasor Data Concentrator: iPDC-v1.3.0: For more info see the iPDC-v1.3.0-Release_Notes document. Changes in iPDC-1.3 : Now iPDC has a centralized file structure. Only a single file for each iPDC and that will store with iPDC-ID. File structure will be explained in release notes document. A setup file for a iPDC will contains the information about: iPDC Server, Connected Source Devices, Connected Destination Devices, and finally configuration frames of sources. Because of this single Setup File previously generated ....Net Code Samples: Code Samples: Code samples (SLNs).Tweetz - Windows Twitter Client Gadget: Tweetz 3.1.5.4: Changed from screen names to regular names in all timelines except search Minor correction in German translation Updated Italian translationSubExtractor: Release 1027: Fix: out-of-memory exception when reading DVDs with very large (over 1GB) cells Fix: AltGr key toggling Italics during OCR Feature: use centered alignment SSA tag for centered text in the upper part of the frame Feature: increased number of subtitle tracks visible in Choose Subtitles step listbox Feature: allow change of palette for entire movieLINQ_Koans: LinqKoans v.02: Cleaned up a bitAutoFixture: 2.11.1: This is an automatically published release created from the latest successful build. Versioning is based on Semantic Versioning. Read more here: http://blog.ploeh.dk/2011/09/06/AutoFixtureGoesContinuousDeliveryWithSemanticVersioning.aspxBlueGem: BlueGem-v1.B Source: Description This is the source of BlueGem -v1.B.XML Schema Documenter: 1.9.4.0: Compatibility fixes for SHFB 1.9.4.0 and Visual Studio 11 Third-Party References Reference Required Used For Sandcastle Help File Builder (version 1.9.4.0) Yes Underlying platform Windows Installer XML (WiX) v3.6 RC0 No Only required to build source code ZXMAK2: Version 2.6.2.1: - fix contended timings for ULA 48/128 early/late - small refactoring for ULA code - fix mistake in timing for CBXX opcodes (thanks to Pegaz for report) - save early/late flag when saving to SZX (works with original ULA's 48/128 only) Use ZXMAK-SPRINTER-2621 package to get emulator with SPRINTER files (select Sprinter model from VM->Settings->Wizard)totalem: version 2012.05.25.1: Beta version speed improvements memory usage improvements smoothness list scrollingJayData - The cross-platform HTML5 data-management library for JavaScript: JayData 1.0 RC1 Refresh 2: JayData is a unified data access library for JavaScript developers to query and update data from different sources like webSQL, indexedDB, OData, Facebook or YQL. See it in action in this 6 minutes video: http://www.youtube.com/watch?v=LlJHgj1y0CU RC1 R2 Release highlights Knockout.js integrationUsing the Knockout.js module, your UI can be automatically refreshed when the data model changes, so you can develop the front-end of your data manager app even faster. Querying 1:N relations in W...Christoc's DotNetNuke Module Development Template: 00.00.08 for DNN6: BEFORE USE YOU need to install the MSBuild Community Tasks available from http://msbuildtasks.tigris.org For best results you should configure your development environment as described in this blog post Then read this latest blog post about customizing and using these custom templates. Installation is simple To use this template place the ZIP (not extracted) file in your My Documents\Visual Studio 2010\Templates\ProjectTemplates\Visual C#\Web OR for VB My Documents\Visual Studio 2010\Te...Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.53: fix issue #18106, where member operators on numeric literals caused the member part to be duplicated when not minifying numeric literals ADD NEW FEATURE: ability to create source map files! The first mapfile format to be supported is the Script# format. Use the new -map filename switch to create map files when building your sources.myManga: Initial Release - Version 1.0.0.1 - BETA: Leave a Review! This is the initial release of myManga. Please report any bugs. NOTE: There is a bug with MangaReader.net where images are reported 403 Forbidden, this is NOT the fault of myManga but, myManga will through an error and will not download the image.BlackJumboDog: Ver5.6.3: 2012.05.22 Ver5.6.3 　(1) HTTP????????、ftp://??????????????????????LogicCircuit: LogicCircuit 2.12.5.22: Logic Circuit - is educational software for designing and simulating logic circuits. Intuitive graphical user interface, allows you to create unrestricted circuit hierarchy with multi bit buses, debug circuits behavior with oscilloscope, and navigate running circuits hierarchy. Changes of this versionThis release is fixing start up issue.Orchard Project: Orchard 1.4.2: This is a service release to address 1.4 and 1.4.1 bugs. Please read our release notes for Orchard 1.4.2: http://docs.orchardproject.net/Documentation/Orchard-1-4-Release-NotesSharePoint Euro 2012 - UEFA European Football Predictor: havivi.euro2012.wsp (1.0): New fetures:View other users predictions Hide/Show background image (web part property) Installing SharePoint Euro 2012 PredictorSharePoint Euro 2012 Predictor has been developed as a SharePoint Sandbox solution to support SharePoint Online (Office 365) Download the solution havivi.euro2012.wsp from the download page: Downloads Upload this solution to your Site Collection via the solutions area. Click on Activate to make the web parts in the solution available for use in the Site C...New ProjectsAdvStopWatch: This is an wpf project to build a stop watchBlueGem: BlueGem is a simple Rich Text Editor. It also has a system resource monitor.Db4o Extensions: Db4o Extensions is a .NET library to ease database routines like creating composite keys, defining deletion behavior, data validation and transparent persistance.Financial Advisor Toolbox: An attempt to build a set of tools that can be used by a Financial Advisor to help automate some of their daily tasks. The initial release will aim to include "client management" to track clients and their financial profile. ImpresionesJL: This is a great project!K-Dock: K-Dock is a basic WPF library built to allow developers to implement a docking system in their applications. Written in Visual Basic. mouse: ignoreopencv: a personal git mirror for opencvOpenLaunch: OpenLaunch is a new way to make your companies' software more popular. By using a simple framework, developers can integrate their software into OpenLaunch by either their existing project, or by a new project. All help on this project is greatly appreciated, and we are looking for a web developer to create a web interface to the store.Smart Setup: Smart Setup is a PowerShell-based deployment program with using a XML document. It is very flexible with supporting multiple extensions and different configurations for different environment. It's smart and light.UusAhi: Teeme jälle ahju!Win7 Style TreeView: A TreeView with win7 styleWindows Phone 7 Feedback Control: Windows Phone user control for sending feedback directly from the applicationWindows Phone SignalR Helper: Windows Phone SignalR Helper makes it easier for WPDevs to leverage SignalR in making connected real-time Windows Phone applications. Various modes of communication between phone and server will be stubbed out - like real-time Mapping, Chat, Stocks, Game Scores, Object Sync etc.; Re-use or extend to make it work for your own needs. Iterative feature addition planned. Hope this helps!

Read the article
Is Fast Enumeration messing with my text output?

- by Dan Ray

Here I am iterating through an array of NSDictionary objects (inside the parsed JSON response of the EXCELLENT MapQuest directions API). I want to build up an HTML string to put into a UIWebView. My code says: for (NSDictionary *leg in legs ) { NSString *thisLeg = [NSString stringWithFormat:@"<br>%@ - %@", [leg valueForKey:@"narrative"], [leg valueForKey:@"distance"]]; NSLog(@"This leg's string is %@", thisLeg); [directionsOutput appendString:thisLeg]; } The content of directionsOutput (which is an NSMutableString) contains ALL the values for [leg valueForKey:@"narrative"], wrapped up in parens, followed by a hyphen, followed by all the parenthesized values for [leg valueForKey:@"distance"]. So I put in that NSLog call... and I get the same thing there! It appears that the for() is somehow batching up our output values as we iterate, and putting out the output only once. How do I make it not do this but instead do what I actually want, which is an iterative output as I iterate? Here's what NSLog gets. Yes, I know I need to figure out NSNumberFormatter. ;-) This leg's string is ( "Start out going NORTH on INFINITE LOOP.", "Turn LEFT to stay on INFINITE LOOP.", "Turn RIGHT onto N DE ANZA BLVD.", "Merge onto I-280 S toward SAN JOSE.", "Merge onto CA-87 S via EXIT 3A.", "Take the exit on the LEFT.", "Merge onto CA-85 S via EXIT 1A on the LEFT toward GILROY.", "Merge onto US-101 S via EXIT 1A on the LEFT toward LOS ANGELES.", "Take the CA-152 E/10TH ST exit, EXIT 356.", "Turn LEFT onto CA-152/E 10TH ST/PACHECO PASS HWY. Continue to follow CA-152/PACHECO PASS HWY.", "Turn SLIGHT RIGHT.", "Turn SLIGHT RIGHT onto PACHECO PASS HWY/CA-152 E. Continue to follow CA-152 E.", "Merge onto I-5 S toward LOS ANGELES.", "Take the CA-46 exit, EXIT 278, toward LOST HILLS/WASCO.", "Turn LEFT onto CA-46/PASO ROBLES HWY. Continue to follow CA-46.", "Merge onto CA-99 S toward BAKERSFIELD.", "Merge onto CA-58 E via EXIT 24 toward TEHACHAPI/MOJAVE.", "Merge onto I-15 N via the exit on the LEFT toward I-40/LAS VEGAS.", "Keep RIGHT to take I-40 E via EXIT 140A toward NEEDLES (Passing through ARIZONA, NEW MEXICO, TEXAS, OKLAHOMA, and ARKANSAS, then crossing into TENNESSEE).", "Merge onto I-40 E via EXIT 12C on the LEFT toward NASHVILLE (Crossing into NORTH CAROLINA).", "Merge onto I-40 BR E/US-421 S via EXIT 188 on the LEFT toward WINSTON-SALEM.", "Take the CLOVERDALE AVE exit, EXIT 4.", "Turn LEFT onto CLOVERDALE AVE SW.", "Turn SLIGHT LEFT onto N HAWTHORNE RD.", "Turn RIGHT onto W NORTHWEST BLVD.", "1047 W NORTHWEST BLVD is on the LEFT." ) - ( 0.0020000000949949026, 0.07800000160932541, 0.14000000059604645, 7.827000141143799, 5.0329999923706055, 0.15299999713897705, 5.050000190734863, 20.871000289916992, 0.3050000071525574, 2.802999973297119, 0.10199999809265137, 37.78000259399414, 124.50700378417969, 0.3970000147819519, 25.264001846313477, 20.475000381469727, 125.8580093383789, 4.538000106811523, 1693.0350341796875, 628.8970336914062, 3.7990000247955322, 0.19099999964237213, 0.4099999964237213, 0.257999986410141, 0.5170000195503235, 0 )

Read the article
Faster method for Matrix vector product for large matrix in C or C++ for use in GMRES

- by user35959

I have a large, dense matrix A, and I aim to find the solution to the linear system Ax=b using an iterative method (in MATLAB was the plan using its built in GMRES). For more than 10,000 rows, this is too much for my computer to store in memory, but I know that the entries in A are constructed by two known vectors x and y of length N and the entries satisfy: A(i,j) = .5*(x[i]-x[j])^2+([y[i]-y[j])^2 * log(x[i]-x[j])^2+([y[i]-y[j]^2). MATLAB's GMRES command accepts as input a function call that can compute the matrix vector product A*x, which allows me to handle larger matrices than I can store in memory. To write the matrix-vecotr product function, I first tried this in matlab by going row by row and using some vectorization, but I avoid spawning the entire array A (since it would be too large). This was fairly slow unfortnately in my application for GMRES. My plan was to write a mex file for MATLAB to, which is in C, and ideally should be significantly faster than the matlab code. I'm rather new to C, so this went rather poorly and my naive attempt at writing the code in C was slower than my partially vectorized attempt in Matlab. #include <math.h> #include "mex.h" void Aproduct(double *x, double *ctrs_x, double *ctrs_y, double *b, mwSize n) { mwSize i; mwSize j; double val; for (i=0; i<n; i++) { for (j=0; j<i; j++) { val = pow(ctrs_x[i]-ctrs_x[j],2)+pow(ctrs_y[i]-ctrs_y[j],2); b[i] = b[i] + .5* val * log(val) * x[j]; } for (j=i+1; j<n; j++) { val = pow(ctrs_x[i]-ctrs_x[j],2)+pow(ctrs_y[i]-ctrs_y[j],2); b[i] = b[i] + .5* val * log(val) * x[j]; } } } The above is the computational portion of the code for the matlab mex file (which is slightly modified C, if I understand correctly). Please note that I skip the case i=j, since in that case the variable val will be a 0*log(0), which should be interpreted as 0 for me, so I just skip it. Is there a more efficient or faster way to write this? When I call this C function via the mex file in matlab, it is quite slow, slower even than the matlab method I used. This surprises me since I suspected that C code should be much faster than matlab. The alternative matlab method which is partially vectorized that I am comparing it with is function Ax = Aprod(x,ctrs) n = length(x); Ax = zeros(n,1); for j=1:(n-3) v = .5*((ctrs(j,1)-ctrs(:,1)).^2+(ctrs(j,2)-ctrs(:,2)).^2).*log((ctrs(j,1)-ctrs(:,1)).^2+(ctrs(j,2)-ctrs(:,2)).^2); v(j)=0; Ax(j) = dot(v,x(1:n-3); end (the n-3 is because there is actually 3 extra components, but they are dealt with separately,so I excluded that code). This is partly vectorized and only needs one for loop, so it makes some sense that it is faster. However, I was hoping I could go even faster with C+mex file. Any suggestions or help would be greatly appreciated! Thanks! EDIT: I should be more clear. I am open to any faster method that can help me use GMRES to invert this matrix that I am interested in, which requires a faster way of doing the matrix vector product without explicitly loading the array into memory. Thanks!

Read the article
Java code optimization leads to numerical inaccuracies and errors

- by rano

I'm trying to implement a version of the Fuzzy C-Means algorithm in Java and I'm trying to do some optimization by computing just once everything that can be computed just once. This is an iterative algorithm and regarding the updating of a matrix, the clusters x pixels membership matrix U, this is the update rule I want to optimize: where the x are the element of a matrix X (pixels x features) and v belongs to the matrix V (clusters x features). And m is a parameter that ranges from 1.1 to infinity. The distance used is the euclidean norm. If I had to implement this formula in a banal way I'd do: for(int i = 0; i < X.length; i++) { int count = 0; for(int j = 0; j < V.length; j++) { double num = D[i][j]; double sumTerms = 0; for(int k = 0; k < V.length; k++) { double thisDistance = D[i][k]; sumTerms += Math.pow(num / thisDistance, (1.0 / (m - 1.0))); } U[i][j] = (float) (1f / sumTerms); } } In this way some optimization is already done, I precomputed all the possible squared distances between X and V and stored them in a matrix D but that is not enough, since I'm cycling througn the elements of V two times resulting in two nested loops. Looking at the formula the numerator of the fraction is independent of the sum so I can compute numerator and denominator independently and the denominator can be computed just once for each pixel. So I came to a solution like this: int nClusters = V.length; double exp = (1.0 / (m - 1.0)); for(int i = 0; i < X.length; i++) { int count = 0; for(int j = 0; j < nClusters; j++) { double distance = D[i][j]; double denominator = D[i][nClusters]; double numerator = Math.pow(distance, exp); U[i][j] = (float) (1f / (numerator * denominator)); } } Where I precomputed the denominator into an additional column of the matrix D while I was computing the distances: for (int i = 0; i < X.length; i++) { for (int j = 0; j < V.length; j++) { double sum = 0; for (int k = 0; k < nDims; k++) { final double d = X[i][k] - V[j][k]; sum += d * d; } D[i][j] = sum; D[i][B.length] += Math.pow(1 / D[i][j], exp); } } By doing so I encounter numerical differences between the 'banal' computation and the second one that leads to different numerical value in U (not in the first iterates but soon enough). I guess that the problem is that exponentiate very small numbers to high values (the elements of U can range from 0.0 to 1.0 and exp , for m = 1.1, is 10) leads to ver y small values, whereas by dividing the numerator and the denominator and THEN exponentiating the result seems to be better numerically. The problem is it involves much more operations. Am I doing something wrong? Is there a possible solution to get both the code optimized and numerically stable? Any suggestion or criticism will be appreciated.

Read the article
Generating strongly biased radom numbers for tests

- by nobody

I want to run tests with randomized inputs and need to generate 'sensible' random numbers, that is, numbers that match good enough to pass the tested function's preconditions, but hopefully wreak havoc deeper inside its code. math.random() (I'm using Lua) produces uniformly distributed random numbers. Scaling these up will give far more big numbers than small numbers, and there will be very few integers. I would like to skew the random numbers (or generate new ones using the old function as a randomness source) in a way that strongly favors 'simple' numbers, but will still cover the whole range, I.e. extending up to positive/negative infinity (or ±1e309 for double). This means: numbers up to, say, ten should be most common, integers should be more common than fractions, numbers ending in 0.5 should be the most common fractions, followed by 0.25 and 0.75; then 0.125, and so on. A different description: Fix a base probability x such that probabilities will sum to one and define the probability of a number n as xk where k is the generation in which n is constructed as a surreal number1. That assigns x to 0, x2 to -1 and +1, x3 to -2, -1/2, +1/2 and +2, and so on. This gives a nice description of something close to what I want (it skews a bit too much), but is near-unusable for computing random numbers. The resulting distribution is nowhere continuous (it's fractal!), I'm not sure how to determine the base probability x (I think for infinite precision it would be zero), and computing numbers based on this by iteration is awfully slow (spending near-infinite time to construct large numbers). Does anyone know of a simple approximation that, given a uniformly distributed randomness source, produces random numbers very roughly distributed as described above? I would like to run thousands of randomized tests, quantity/speed is more important than quality. Still, better numbers mean less inputs get rejected. Lua has a JIT, so performance can't be reasonably predicted. Jumps based on randomness will break every prediction, and many calls to math.random() will be slow, too. This means a closed formula will be better than an iterative or recursive one. 1 Wikipedia has an article on surreal numbers, with a nice picture. A surreal number is a pair of two surreal numbers, i.e. x := {n|m}, and its value is the number in the middle of the pair, i.e. (for finite numbers) {n|m} = (n+m)/2 (as rational). If one side of the pair is empty, that's interpreted as increment (or decrement, if right is empty) by one. If both sides are empty, that's zero. Initially, there are no numbers, so the only number one can build is 0 := { | }. In generation two one can build numbers {0| } =: 1 and { |0} =: -1, in three we get {1| } =: 2, {|1} =: -2, {0|1} =: 1/2 and {-1|0} =: -1/2 (plus some more complex representations of known numbers, e.g. {-1|1} ? 0). Note that e.g. 1/3 is never generated by finite numbers because it is an infinite fraction – the same goes for floats, 1/3 is never represented exactly.

Read the article
The Product Owner

- by Robert May

In a previous post, I outlined the rules of Scrum. This post details one of those rules. Picking a most important part of Scrum is difficult. All of the rules are required, but if there were one rule that is “more” required that every other rule, its having a good Product Owner. Simply put, the Product Owner can make or break the project. Duties of the Product Owner A Product Owner has many duties and responsibilities. I’ll talk about each of these duties in detail below. A Product Owner: Discovers and records stories for the backlog. Prioritizes stories in the Product Backlog, Release Backlog and Iteration Backlog. Determines Release dates and Iteration Dates. Develops story details and helps the team understand those details. Helps QA to develop acceptance tests. Interact with the Customer to make sure that the product is meeting the customer’s needs. Discovers and Records Stories for the Backlog When I do Scrum, I always use User Stories as the means for capturing functionality that’s required in the system. Some people will use Use Cases, but the same rule applies. The Product Owner has the ultimate responsibility for figuring out what functionality will be in the system. Many different mechanisms for capturing this input can be used. User interviews are great, but all sources should be considered, including talking with Customer Support types. Often, they hear what users are struggling with the most and are a great source for stories that can make the application easier to use. Care should be taken when soliciting user stories from technical types such as programmers and the people that manage them. They will almost always give stories that are very technical in nature and may not have a direct benefit for the end user. Stories are about adding value to the company. If the stories don’t have direct benefit to the end user, the Product Owner should question whether or not the story should be implemented. In general, technical stories should be included as tasks in User Stories. Technical stories are often needed, but the ultimate value to the user is in user based functionality, so technical stories should be considered nothing more than overhead in providing that user functionality. Until the iteration prior to development, stories should be nothing more than short, one line placeholders. An exercise called Story Planning can be used to brainstorm and come up with stories. I’ll save the description of this activity for another blog post. For more information on User Stories, please read the book User Stories Applied by Mike Cohn. Prioritizes Stories in the Product Backlog, Release Backlog and Iteration Backlog Prioritization of stories is one of the most difficult tasks that a Product Owner must do. A key concept of Scrum done right is the need to have the team working from a single set of prioritized stories. If the team does not have a single set of prioritized stories, Scrum will likely fail at your organization. The Product Owner is the ONLY person who has the responsibility to prioritize that list. The Product Owner must be very diplomatic and sincerely listen to the people around him so that he can get the priorities correct. Just listening will still not yield the proper priorities. Care must also be taken to ensure that Return on Investment is also considered. Ultimately, determining which stories give the most value to the company for the least cost is the most important factor in determining priorities. Product Owners should be willing to look at cold, hard numbers to determine the order for stories. Even when many people want a feature, if that features is costly to develop, it may not have as high of a return on investment as features that are cheaper, but not as popular. The act of prioritization often causes conflict in an environment. Customer Service thinks that feature X is the most important, because it will stop people from calling. Operations thinks that feature Y is the most important, because it will stop servers from crashing. Developers think that feature Z is most important because it will make writing software much easier for them. All of these are useful goals, but the team can have only one list of items, and each item must have a priority that is different from all other stories. The Product Owner will determine which feature gives the best return on investment and the other features will have to wait their turn, which means that someone will not have their top priority feature implemented first. A weak Product Owner will refuse to do prioritization. I’ve heard from multiple Product Owners the following phrase, “Well, it’s all got to be done, so what does it matter what order we do it in?” If your product owner is using this phrase, you need a new Product Owner. Order is VERY important. In Scrum, every release is potentially shippable. If the wrong priority items are developed, then the value added in each release isn’t what it should be. Additionally, the Product Owner with this mindset doesn’t understand Agile. A product is NEVER finished, until the company has decided that it is no longer a going concern and they are no longer going to sell the product. Therefore, prioritization isn’t an event, its something that continues every day. The logical extension of the phrase “It’s all got to be done” is that you will never ship your product, since a product is never “done.” Once stories have been prioritized, assigning them to the Release Backlog and the Iteration Backlog becomes relatively simple. The top priority items are copied into the respective backlogs in order and the task is complete. The team does have the right to shuffle things around a little in the iteration backlog. For example, they may determine that working on story C with story A is appropriate because they’re related, even though story B is technically a higher priority than story C. Or they may decide that story B is too big to complete in the time available after Story A has tasks created, so they’ll work on Story C since it’s smaller. They can’t, however, go deep into the backlog to pick stories to implement. The team and the Product Owner should work together to determine what’s best for the company. Prioritization is time consuming, but its one of the most important things a Product Owner does. Determines Release Dates and Iteration Dates Product owners are responsible for determining release dates for a product. A common misconception that Product Owners have is that every “release” needs to correspond with an actual release to customers. This is not the case. In general, releases should be no more than 3 months long. You may decide to release the product to the customers, and many companies do release the product to customers, but it may also be an internal release. If a release date is too far away, developers will fall into the trap of not feeling a sense of urgency. The date is far enough away that they don’t need to give the release their full attention. Additionally, important tasks, such as performance tuning, regression testing, user documentation, and release preparation, will not happen regularly, making them much more difficult and time consuming to do. The more frequently you do these tasks, the easier they are to accomplish. The Product Owner will be a key participant in determining whether or not a release should be sent out to the customers. The determination should be made on whether or not the features contained in the release are valuable enough and complete enough that the customers will see real value in the release. Often, some features will take more than three months to get them to a state where they qualify for a release or need additional supporting features to be released. The product owner has the right to make this determination. In addition to release dates, the Product Owner also will help determine iteration dates. In general, an iteration length should be chosen and the team should follow that iteration length for an extended period of time. If the iteration length is changed every iteration, you’re not doing Scrum. Iteration lengths help the team and company get into a rhythm of developing quality software. Iterations should be somewhere between 2 and 4 weeks in length. Any shorter, and significant software will likely not be developed. Any longer, and the team won’t feel urgency and planning will become very difficult. Iterations may not be extended during the iteration. Companies where Scrum isn’t really followed will often use this as a strategy to complete all stories. They don’t want to face the harsh reality of what their true performance is, and looking good is more important than seeking visibility and improving the process and team. Companies like this typically don’t allow failure. This is unhealthy. Failure is part of life and unless we learn from it, we can’t improve. I would much rather see a team push out stories to the next iteration and then have healthy discussions about why they failed rather than extend the iteration and not deal with the core problems. If iteration length varies, retrospectives become more difficult. For example, evaluating the performance of the team’s estimation efforts becomes much more difficult if the iteration length varies. Also, the team must have a velocity measurement. If the iteration length varies, measuring velocity becomes impossible and upper management no longer will have the ability to evaluate the teams performance. People external to the team will no longer have the ability to determine when key features are likely to be developed. Variable iterations cause the entire company to fail and likely cause Scrum to fail at an organization. Develops Story Details and Helps the Team Understand Those Details A key concept in Scrum is that the stories are nothing more than a placeholder for a conversation. Stories should be nothing more than short, one line statements about the functionality. The team will then converse with the Product Owner about the details about that story. The product owner needs to have a very good idea about what the details of the story are and needs to be able to help the team understand those details. Too often, we see this requirement as being translated into the need for comprehensive documentation about the story, including old fashioned requirements documentation. The team should only develop the documentation that is required and should not develop documentation that is only created because their is a process to do so. In general, what we see that works best is the iteration before a team starts development work on a story, the Product Owner, with other appropriate business analysts, will develop the details of that story. They’ll figure out what business rules are required, potentially make paper prototypes or other light weight mock-ups, and they seek to understand the story and what is implied. Note that the time allowed for this task is deliberately short. The Product Owner only has a single iteration to develop all of the stories for the next iteration. If more than one iteration is used, I’ve found that teams will end up with Big Design Up Front and traditional requirements documents. This is a waste of time, since the team will need to then have discussions with the Product Owner to figure out what the requirements document says. Instead of this, skip making the pretty pictures and detailing the nuances of the requirements and build only what is minimally needed by the team to do development. If something comes up during development, you can address it at that time and figure out what you want to do. The goal is to keep things as light weight as possible so that everyone can move as quickly as possible. Helps QA to Develop Acceptance Tests In Scrum, no story can be counted until it is accepted by QA. Because of this, acceptance tests are very important to the team. In general, acceptance tests need to be developed prior to the iteration or at the very beginning of the iteration so that the team can make sure that the tasks that they develop will fulfill the acceptance criteria. The Product Owner will help the team, including QA, understand what will make the story acceptable. Note that the Product Owner needs to be careful about specifying that the feature will work “Perfectly” at the end of the iteration. In general, features are developed a little bit at a time, so only the bit that is being developed should be considered as necessary for acceptance. A weak Product Owner will make statements like “Do it right the first time.” Not only are these statements damaging to the team (like they would try to do it WRONG the first time . . .), they’re also ignoring the iterative nature of Scrum. Additionally, a weak product owner will seek to add scope in the acceptance testing. For example, they will refuse to determine acceptance at the beginning of the iteration, and then, after the team has planned and committed to the iteration, they will expand scope by defining acceptance. This often causes the team to miss the iteration because scope that wasn’t planned on is included. There are ways that the team can mitigate this problem. For example, include extra “Product Owner” time to deal with the uncertainty that you know will be introduced by the Product Owner. This will slow the perceived velocity of the team and is not ideal, since they’ll be doing more work than they get credit for. Interact with the Customer to Make Sure that the Product is Meeting the Customer’s Needs Once development is complete, what the team has worked on should be put in front of real live people to see if it meets the needs of the customer. One of the great things about Agile is that if something doesn’t work, we can revisit it in a future iteration! This frees up the team to make the best decision now and know that if that decision proves to be incorrect, the team can revisit it and change that decision. Features are about adding value to the customer, so if the customer doesn’t find them useful, then having the team make tweaks is valuable. In general, most software will be 80 to 90 percent “right” after the initial round and only minor tweaks are required. If proper coding standards are followed, these tweaks are usually minor and easy to accomplish. Product Owners that are doing a good job will encourage real users to see and use the software, since they know that they are trying to add value to the customer. Poor product owners will think that they know the answers already, that their customers are silly and do stupid things and that they don’t need customer input. If you have a product owner that is afraid to show the team’s work to real customers, you probably need a different product owner. Up Next, “Who Makes a Good Product Owner.” Followed by, “Messing with the Team.” Technorati Tags: Scrum,Product Owner

Read the article
Find the set of largest contiguous rectangles to cover multiple areas

- by joelpt

I'm working on a tool called Quickfort for the game Dwarf Fortress. Quickfort turns spreadsheets in csv/xls format into a series of commands for Dwarf Fortress to carry out in order to plot a "blueprint" within the game. I am currently trying to optimally solve an area-plotting problem for the 2.0 release of this tool. Consider the following "blueprint" which defines plotting commands for a 2-dimensional grid. Each cell in the grid should either be dug out ("d"), channeled ("c"), or left unplotted ("."). Any number of distinct plotting commands might be present in actual usage. . d . d c c d d d d c c . d d d . c d d d d d c . d . d d c To minimize the number of instructions that need to be sent to Dwarf Fortress, I would like to find the set of largest contiguous rectangles that can be formed to completely cover, or "plot", all of the plottable cells. To be valid, all of a given rectangle's cells must contain the same command. This is a faster approach than Quickfort 1.0 took: plotting every cell individually as a 1x1 rectangle. This video shows the performance difference between the two versions. For the above blueprint, the solution looks like this: . 9 . 0 3 2 8 1 1 1 3 2 . 1 1 1 . 2 7 1 1 1 4 2 . 6 . 5 4 2 Each same-numbered rectangle above denotes a contiguous rectangle. The largest rectangles take precedence over smaller rectangles that could also be formed in their areas. The order of the numbering/rectangles is unimportant. My current approach is iterative. In each iteration, I build a list of the largest rectangles that could be formed from each of the grid's plottable cells by extending in all 4 directions from the cell. After sorting the list largest first, I begin with the largest rectangle found, mark its underlying cells as "plotted", and record the rectangle in a list. Before plotting each rectangle, its underlying cells are checked to ensure they are not yet plotted (overlapping a previous plot). We then start again, finding the largest remaining rectangles that can be formed and plotting them until all cells have been plotted as part of some rectangle. I consider this approach slightly more optimized than a dumb brute-force search, but I am wasting a lot of cycles (re)calculating cells' largest rectangles and checking underlying cells' states. Currently, this rectangle-discovery routine takes the lion's share of the total runtime of the tool, especially for large blueprints. I have sacrificed some accuracy for the sake of speed by only considering rectangles from cells which appear to form a rectangle's corner (determined using some neighboring-cell heuristics which aren't always correct). As a result of this 'optimization', my current code doesn't actually generate the above solution correctly, but it's close enough. More broadly, I consider the goal of largest-rectangles-first to be a "good enough" approach for this application. However I observe that if the goal is instead to find the minimum set (fewest number) of rectangles to completely cover multiple areas, the solution would look like this instead: . 3 . 5 6 8 1 3 4 5 6 8 . 3 4 5 . 8 2 3 4 5 7 8 . 3 . 5 7 8 This second goal actually represents a more optimal solution to the problem, as fewer rectangles usually means fewer commands sent to Dwarf Fortress. However, this approach strikes me as closer to NP-Hard, based on my limited math knowledge. Watch the video if you'd like to better understand the overall strategy; I have not addressed other aspects of Quickfort's process, such as finding the shortest cursor-path that plots all rectangles. Possibly there is a solution to this problem that coherently combines these multiple strategies. Help of any form would be appreciated.

Read the article

Search Results

Search found 192 results on 8 pages for 'iterative'.

Page 8/8 | < Previous Page | 4 5 6 7 8

- by James Michael Hare

- by David V. Corbin

- by DrJohn

- by Dylan Smith

- by Tanu Sood

- by Bil Simser

- by Tarun Arora

- by dbayard

- by Jeff

- by Dan Ray

- by user35959

- by rano

- by nobody

- by Robert May

- by joelpt

< Previous Page | 4 5 6 7 8