Search Results

Search found 10556 results on 423 pages for 'practical approach'.

Page 117/423 | < Previous Page | 113 114 115 116 117 118 119 120 121 122 123 124 | Next Page >

Should I use a class in this: Reading a XML file using lxml.

- by PulpFiction

Hi everyone. This question is in continuation to my previous question, in which I asked about passing around an ElementTree. I need to read the XML files only and to solve this, I decided to create a global ElementTree and then parse it wherever required. My question is: Is this an acceptable practice? I heard global variables are bad. If I don't make it global, I was suggested to make a class. But do I really need to create a class? What benefits would I have from that approach. Note that I would be handling only one ElementTree instance per run, the operations are read-only. If I don't use a class, how and where do I declare that ElementTree so that it available globally? (Note that I would be importing this module) Please answer this question in the respect that I am a beginner to development, and at this stage I can't figure out whether to use a class or just go with the functional style programming approach.

Read the article
How to make a piece of WPF content take up the entire application window

- by Bojin Li

I'm working on an application that contains a number of content areas. I want to implement a behavior such that in response to user input, any of these content areas can be toggled to fit the entire application window, and optionally back to its original position again. I experimented with several approaches and none of them seem optimal for me. Here's what I tried to do: Use the ClipToBoundsProperty on the content I want to make "Full Screen": Doesn't work because only the CanvasPanel seems to fully respect this property. The application need to be localized so I would really like to avoid the CanvasPanel. Use a Grid and collapse the other content areas, such that only the one I want to see is visible, hence taking up the entire screen: This will probably work but doesn't seem easy to implement nor maintain. The "Full Screen" content area could be several levels deep, for example residing inside a Tabcontrol, so I would have to hide the tab headers too etc. Reconstruct the content area in a separate view and display it while hiding the rest: Seems easy enough to do with DataTemplates and my ViewModel objects, but any GUI/View only states are not preserved using this approach. Somehow "lift" the GUI/View I want to "Full Screen" into the separate view and display it while hiding the rest: I don't know how to do this or even if this is possible. Anyway if anyone knows a better approach I would love to know about it. Thanks a lot!

Read the article
SQL queries to determine all values that would satisfy an arbitrary query

- by jasterm007

I'm trying to figure out how to efficiently run a set of queries that will provide a new table of all values that would return results for an arbitrary query. Say my table has a schema like: id name age city What is an efficient way to list all values that would return results for an arbitrary query, say "NOT city=X AND age BETWEEN Y and Z"? My naive approach for this would be to use a script and recurse through all possible combinations of {city, age, age} and see which SELECTs return more than 0 results, but that seems incredibly inefficient. I've also tried building large joins on {city, age, age} as well and basically using that table as an argument list to the query, but that quickly becomes an impossibility for queries on many columns. For simple conjunctive equality queries, i.e. "name=X and age=Y", this is much simpler, as I can do something like SELECT name, age, count(*) AS count FROM main GROUP BY name, age HAVING count > 0 But I'm having difficulty coming up with a general approach for anything more complicated than that. Any pointers in the right direction would be most helpful, thanks.

Read the article
Passing huge amounts of data as an hexadecimal (0x123AB...) parameter of a clr stored procedure in s

- by user193655

I post this question has followup of This question, since the thread is not recieving more answers. I'm trying to understand if it is possible to pass as a parameter of a CLR stored procedure a large amount of data as "0x5352532F...". This is to avoid to send the data directly to the CLR stored procedure, instead of sending ti to a temporary DB field and from there passing it as varbinary(max) parmeter to the CLR stored procedure. I have a triple question: 1) is it possible, if yes how? Let's say i want to pass a pdf file to the CLR stored procedure (not the path, the full bits that make up the file). Something like: exec MyCLRStoredProcs.dbo.insertfile @file_remote_path ='c:\temp\test_file.txt' , @file_contents=0x4D5A90000300000004000.... --(this long list is the file content) where insertfile is a stored proc that writes to the server path (at file_remote_path) the binary data I pass as (file_contents). 2) is it there corruption risk of adopting this approach (or it is the same approach that sql server uses behind the scenes)? 3) how to convert the content of a file into the "0x23423..." hexadecimal representation

Read the article
Is the below thread pool implementation correct(C#3.0)

- by Newbie

Hi Experts, For the first time ever I have implemented thread pooling and I found it to be working. But I am not very sure about the way I have done is the appropriate way it is supposed to be. Would you people mind in spending some valuable time to check and let me know if my approach is correct or not? If you people find that the approach is incorrect , could you please help me out in writing the correct version. I have basicaly read How to use thread pool and based on what ever I have understood I have developed the below program as per my need public class Calculation { #region Private variable declaration ManualResetEvent[] factorManualResetEvent = null; #endregion public void Compute() { factorManualResetEvent = new ManualResetEvent[2]; for (int i = 0; i < 2; i++){ factorManualResetEvent[i] = new ManualResetEvent(false); ThreadPool.QueueUserWorkItem(ThreadPoolCallback, i);} //Wait for all the threads to complete WaitHandle.WaitAll(factorManualResetEvent); //Proceed with the next task(s) NEXT_TASK_TO_BE_EXECUTED(); } #region Private Methods // Wrapper method for use with thread pool. public void ThreadPoolCallback(Object threadContext) { int threadIndex = (int)threadContext; Method1(); Method2(); factorManualResetEvent[threadIndex].Set(); } private void Method1 () { //Code of method 1} private void Method2 () { //Code of method 2 } #endregion }

Read the article
List of values as keys for a Map

- by thr

I have lists of variable length where each item can be one of four unique, that I need to use as keys for another object in a map. Assume that each value can be either 0, 1, 2 or 3 (it's not integer in my real code, but a lot easier to explain this way) so a few examples of key lists could be: [1, 0, 2, 3] [3, 2, 1] [1, 0, 0, 1, 1, 3] [2, 3, 1, 1, 2] [1, 2] So, to re-iterate: each item in the list can be either 0, 1, 2 or 3 and there can be any number of items in a list. My first approach was to try to hash the contents of the array, using the built in GetHashCode() in .NET to combine the hash of each element. But since this would return an int I would have to deal with collisions manually (two equal int values are identical to a Dictionary). So my second approach was to use a quad tree, breaking down each item in the list into a Node that has four pointers (one for each possible value) to the next four possible values (with the root node representing [], an empty list), inserting [1, 0, 2] => Foo, [1, 3] => Bar and [1, 0] => Baz into this tree would look like this: Grey nodes nodes being unused pointers/nodes. Though I worry about the performance of this setup, but there will be no need to deal with hash collisions and the tree won't become to deep (there will mostly be lists with 2-6 items stored, rarely over 6). Is there some other magic way to store items with lists of values as keys that I have missed?

Read the article
Field to display Previous 30 Day Total

- by whytheq

I've got this table: CREATE TABLE #Data1 ( [Market] VARCHAR(100) NOT NULL, [Operator] VARCHAR(100) NOT NULL, [Date] DATETIME NOT NULL, [Measure] VARCHAR(100) NOT NULL, [Amount] NUMERIC(36,10) NOT NULL, --new calculated fields [DailyAvg_30days] NUMERIC(38,6) NULL DEFAULT 0 ) I've populated all the fields apart from DailyAvg_30days. This field needs to show the total for the preceding 30 days e.g. 1. if Date for a particular record is 2nd Dec then it will be the total for the period 3rd Nov - 2nd Dec inclusive. 2. if Date for a particular record is 1st Dec then it will be the total for the period 2nd Nov - 1st Dec inclusive. My attempt to try to find these totals before updating the table is as follows: SELECT a.[Market], a.[Operator], a.[Date], a.[Measure], a.[Amount], [DailyAvg_30days] = SUM(b.[Amount]) FROM #Data1 a INNER JOIN #Data1 b ON a.[Market] = b.[Market] AND a.[Operator] = b.[Operator] AND a.[Measure] = b.[Measure] AND a.[Date] >= b.[Date]-30 AND a.[Date] <= b.[Date] GROUP BY a.[Market], a.[Operator], a.[Date], a.[Measure], a.[Amount] ORDER BY 1,2,4,3 Is this a valid approach or do I need to approach this from a different angle?

Read the article
find the difference between two very large list

- by user157195

I have two large list(could be a hundred million items), the source of each list can be either from a database table or a flat file. both lists are of comparable sizes, both unsorted. I need to find the difference between them. so I have 3 scenarios: 1. List1 is a database table(assume each row simply have one item(key) that is a string), List2 is a large file. 2. Both lists are from 2 db tables. 3. both lists are from two files. in case 2, I plan to use: select a.item from MyTable a where a.item not in (select b.item form MyTable b) this clearly is inefficient, is there a better way? Another approach is: I plan to sort each list, and then walk down both of them to find the diff. If the list is from a file, I have to read it into a db table first, then use db sorting to output the list. Is the run time complexity still O(nlogn) in db sorting? either approach is a pain and seems would be very slow when the list involved has hundreds of millions of items. any suggestions?

Read the article
Loading jQuery Consistently in a .NET Web App

- by Rick Strahl

One thing that frequently comes up in discussions when using jQuery is how to best load the jQuery library (as well as other commonly used and updated libraries) in a Web application. Specifically the issue is the one of versioning and making sure that you can easily update and switch versions of script files with application wide settings in one place and having your script usage reflect those settings in the entire application on all pages that use the script. Although I use jQuery as an example here, the same concepts can be applied to any script library - for example in my Web libraries I use the same approach for jQuery.ui and my own internal jQuery support library. The concepts used here can be applied both in WebForms and MVC. Loading jQuery Properly From CDN Before we look at a generic way to load jQuery via some server logic, let me first point out my preferred way to embed jQuery into the page. I use the Google CDN to load jQuery and then use a fallback URL to handle the offline or no Internet connection scenario. Why use a CDN? CDN links tend to be loaded more quickly since they are very likely to be cached in user's browsers already as jQuery CDN is used by many, many sites on the Web. Using a CDN also removes load from your Web server and puts the load bearing on the CDN provider - in this case Google - rather than on your Web site. On the downside, CDN links gives the provider (Google, Microsoft) yet another way to track users through their Web usage. Here's how I use jQuery CDN plus a fallback link on my WebLog for example: <!DOCTYPE HTML> <html> <head> <script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script> <script> if (typeof (jQuery) == 'undefined') document.write(unescape("%3Cscript " + "src='/Weblog/wwSC.axd?r=Westwind.Web.Controls.Resources.jquery.js' %3E%3C/script%3E")); </script> <title>Rick Strahl's Web Log</title> ... </head> You can see that the CDN is referenced first, followed by a small script block that checks to see whether jQuery was loaded (jQuery object exists). If it didn't load another script reference is added to the document dynamically pointing to a backup URL. In this case my backup URL points at a WebResource in my Westwind.Web assembly, but the URL can also be local script like src="/scripts/jquery.min.js". Important: Use the proper Protocol/Scheme for for CDN Urls [updated based on comments] If you're using a CDN to load an external script resource you should always make sure that the script is loaded with the same protocol as the parent page to avoid mixed content warnings by the browser. You don't want to load a script link to an http:// resource when you're on an https:// page. The easiest way to use this is by using a protocol relative URL: <script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script> which is an easy way to load resources from other domains. This URL syntax will automatically use the parent page's protocol (or more correctly scheme). As long as the remote domains support both http:// and https:// access this should work. BTW this also works in CSS (with some limitations) and links. BTW, I didn't know about this until it was pointed out in the comments. This is a very useful feature for many things - ah the benefits of my blog to myself :-) Version Numbers When you use a CDN you notice that you have to reference a specific version of jQuery. When using local files you may not have to do this as you can rename your private copy of jQuery.js, but for CDN the references are always versioned. The version number is of course very important to ensure you getting the version you have tested with, but it's also important to the provider because it ensures that cached content is always correct. If an existing file was updated the updates might take a very long time to get past the locally cached content and won't refresh properly. The version number ensures you get the right version and not some cached content that has been changed but not updated in your cache. On the other hand version numbers also mean that once you decide to use a new version of the script you now have to change all your script references in your pages. Depending on whether you use some sort of master/layout page or not this may or may not be easy in your application. Even if you do use master/layout pages, chances are that you probably have a few of them and at the very least all of those have to be updated for the scripts. If you use individual pages for all content this issue then spreads to all of your pages. Search and Replace in Files will do the trick, but it's still something that's easy to forget and worry about. Personaly I think it makes sense to have a single place where you can specify common script libraries that you want to load and more importantly which versions thereof and where they are loaded from. Loading Scripts via Server Code Script loading has always been important to me and as long as I can remember I've always built some custom script loading routines into my Web frameworks. WebForms makes this fairly easy because it has a reasonably useful script manager (ClientScriptManager and the ScriptManager) which allow injecting script into the page easily from anywhere in the Page cycle. What's nice about these components is that they allow scripts to be injected by controls so components can wrap up complex script/resource dependencies more easily without having to require long lists of CSS/Scripts/Image includes. In MVC or pure script driven applications like Razor WebPages the process is more raw, requiring you to embed script references in the right place. But its also more immediate - it lets you know exactly which versions of scripts to use because you have to manually embed them. In WebForms with different controls loading resources this often can get confusing because it's quite possible to load multiple versions of the same script library into a page, the results of which are less than optimal… In this post I look a simple routine that embeds jQuery into the page based on a few application wide configuration settings. It returns only a string of the script tags that can be manually embedded into a Page template. It's a small function that merely a string of the script tags shown at the begging of this post along with some options on how that string is comprised. You'll be able to specify in one place which version loads and then all places where the help function is used will automatically reflect this selection. Options allow specification of the jQuery CDN Url, the fallback Url and where jQuery should be loaded from (script folder, Resource or CDN in my case). While this is specific to jQuery you can apply this to other resources as well. For example I use a similar approach with jQuery.ui as well using practically the same semantics. Providing Resources in ControlResources In my Westwind.Web Web utility library I have a class called ControlResources which is responsible for holding resource Urls, resource IDs and string contants that reference those resource IDs. The library also provides a few helper methods for loading common scriptscripts into a Web page. There are specific versions for WebForms which use the ClientScriptManager/ScriptManager and script link methods that can be used in any .NET technology that can embed an expression into the output template (or code for that matter). The ControlResources class contains mostly static content - references to resources mostly. But it also contains a few static properties that configure script loading: A Script LoadMode (CDN, Resource, or script url) A default CDN Url A fallback url They are static properties in the ControlResources class: public class ControlResources { /// <summary> /// Determines what location jQuery is loaded from /// </summary> public static JQueryLoadModes jQueryLoadMode = JQueryLoadModes.ContentDeliveryNetwork; /// <summary> /// jQuery CDN Url on Google /// </summary> public static string jQueryCdnUrl = "//ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"; /// <summary> /// jQuery CDN Url on Google /// </summary> public static string jQueryUiCdnUrl = "//ajax.googleapis.com/ajax/libs/jqueryui/1.8.16/jquery-ui.min.js"; /// <summary> /// jQuery UI fallback Url if CDN is unavailable or WebResource is used /// Note: The file needs to exist and hold the minimized version of jQuery ui /// </summary> public static string jQueryUiLocalFallbackUrl = "~/scripts/jquery-ui.min.js"; } These static properties are fixed values that can be changed at application startup to reflect your preferences. Since they're static they are application wide settings and respected across the entire Web application running. It's best to set these default in Application_Init or similar startup code if you need to change them for your application: protected void Application_Start(object sender, EventArgs e) { // Force jQuery to be loaded off Google Content Network ControlResources.jQueryLoadMode = JQueryLoadModes.ContentDeliveryNetwork; // Allow overriding of the Cdn url ControlResources.jQueryCdnUrl = "http://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js"; // Route to our own internal handler App.OnApplicationStart(); } With these basic settings in place you can then embed expressions into a page easily. In WebForms use: <!DOCTYPE html> <html> <head runat="server"> <%= ControlResources.jQueryLink() %> <script src="scripts/ww.jquery.min.js"></script> </head> In Razor use: <!DOCTYPE html> <html> <head> @Html.Raw(ControlResources.jQueryLink()) <script src="scripts/ww.jquery.min.js"></script> </head> Note that in Razor you need to use @Html.Raw() to force the string NOT to escape. Razor by default escapes string results and this ensures that the HTML content is properly expanded as raw HTML text. Both the WebForms and Razor output produce: <!DOCTYPE html> <html> <head> <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js" type="text/javascript"></script> <script type="text/javascript"> if (typeof (jQuery) == 'undefined') document.write(unescape("%3Cscript src='/WestWindWebToolkitWeb/WebResource.axd?d=-b6oWzgbpGb8uTaHDrCMv59VSmGhilZP5_T_B8anpGx7X-PmW_1eu1KoHDvox-XHqA1EEb-Tl2YAP3bBeebGN65tv-7-yAimtG4ZnoWH633pExpJor8Qp1aKbk-KQWSoNfRC7rQJHXVP4tC0reYzVw2&t=634535391996872492' type='text/javascript'%3E%3C/script%3E"));</script> <script src="scripts/ww.jquery.min.js"></script> </head> which produces the desired effect for both CDN load and fallback URL. The implementation of jQueryLink is pretty basic of course: /// <summary> /// Inserts a script link to load jQuery into the page based on the jQueryLoadModes settings /// of this class. Default load is by CDN plus WebResource fallback /// </summary> /// <param name="url"> /// An optional explicit URL to load jQuery from. Url is resolved. /// When specified no fallback is applied /// </param> /// <returns>full script tag and fallback script for jQuery to load</returns> public static string jQueryLink(JQueryLoadModes jQueryLoadMode = JQueryLoadModes.Default, string url = null) { string jQueryUrl = string.Empty; string fallbackScript = string.Empty; if (jQueryLoadMode == JQueryLoadModes.Default) jQueryLoadMode = ControlResources.jQueryLoadMode; if (!string.IsNullOrEmpty(url)) jQueryUrl = WebUtils.ResolveUrl(url); else if (jQueryLoadMode == JQueryLoadModes.WebResource) { Page page = new Page(); jQueryUrl = page.ClientScript.GetWebResourceUrl(typeof(ControlResources), ControlResources.JQUERY_SCRIPT_RESOURCE); } else if (jQueryLoadMode == JQueryLoadModes.ContentDeliveryNetwork) { jQueryUrl = ControlResources.jQueryCdnUrl; if (!string.IsNullOrEmpty(jQueryCdnUrl)) { // check if jquery loaded - if it didn't we're not online and use WebResource fallbackScript = @"<script type=""text/javascript"">if (typeof(jQuery) == 'undefined') document.write(unescape(""%3Cscript src='{0}' type='text/javascript'%3E%3C/script%3E""));</script>"; fallbackScript = string.Format(fallbackScript, WebUtils.ResolveUrl(ControlResources.jQueryCdnFallbackUrl)); } } string output = "<script src=\"" + jQueryUrl + "\" type=\"text/javascript\"></script>"; // add in the CDN fallback script code if (!string.IsNullOrEmpty(fallbackScript)) output += "\r\n" + fallbackScript + "\r\n"; return output; } There's one dependency here on WebUtils.ResolveUrl() which resolves Urls without access to a Page/Control (another one of those features that should be in the runtime, not in the WebForms or MVC engine). You can see there's only a little bit of logic in this code that deals with potentially different load modes. I can load scripts from a Url, WebResources or - my preferred way - from CDN. Based on the static settings the scripts to embed are composed to be returned as simple string <script> tag(s). I find this extremely useful especially when I'm not connected to the internet so that I can quickly swap in a local jQuery resource instead of loading from CDN. While CDN loading with the fallback works it can be a bit slow as the CDN is probed first before the fallback kicks in. Switching quickly in one place makes this trivial. It also makes it very easy once a new version of jQuery rolls around to move up to the new version and ensure that all pages are using the new version immediately. I'm not trying to make this out as 'the' definite way to load your resources, but rather provide it here as a pointer so you can maybe apply your own logic to determine where scripts come from and how they load. You could even automate this some more by using configuration settings or reading the locations/preferences out of some sort of data/metadata store that can be dynamically updated instead via recompilation. FWIW, I use a very similar approach for loading jQuery UI and my own ww.jquery library - the same concept can be applied to any kind of script you might be loading from different locations. Hopefully some of you find this a useful addition to your toolset. Resources Google CDN for jQuery Full ControlResources Source Code ControlResource Documentation Westwind.Web NuGet This method is part of the Westwind.Web library of the West Wind Web Toolkit or you can grab the Web library from NuGet and add to your Visual Studio project. This package includes a host of Web related utilities and script support features. © Rick Strahl, West Wind Technologies, 2005-2011Posted in ASP.NET jQuery Tweet (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
.NET HTML Sanitation for rich HTML Input

- by Rick Strahl

Recently I was working on updating a legacy application to MVC 4 that included free form text input. When I set up the new site my initial approach was to not allow any rich HTML input, only simple text formatting that would respect a few simple HTML commands for bold, lists etc. and automatically handles line break processing for new lines and paragraphs. This is typical for what I do with most multi-line text input in my apps and it works very well with very little development effort involved. Then the client sprung another note: Oh by the way we have a bunch of customers (real estate agents) who need to post complete HTML documents. Oh uh! There goes the simple theory. After some discussion and pleading on my part (<snicker>) to try and avoid this type of raw HTML input because of potential XSS issues, the client decided to go ahead and allow raw HTML input anyway. There has been lots of discussions on this subject on StackOverFlow (and here and here) but to after reading through some of the solutions I didn't really find anything that would work even closely for what I needed. Specifically we need to be able to allow just about any HTML markup, with the exception of script code. Remote CSS and Images need to be loaded, links need to work and so. While the 'legit' HTML posted by these agents is basic in nature it does span most of the full gamut of HTML (4). Most of the solutions XSS prevention/sanitizer solutions I found were way to aggressive and rendered the posted output unusable mostly because they tend to strip any externally loaded content. In short I needed a custom solution. I thought the best solution to this would be to use an HTML parser - in this case the Html Agility Pack - and then to run through all the HTML markup provided and remove any of the blacklisted tags and a number of attributes that are prone to JavaScript injection. There's much discussion on whether to use blacklists vs. whitelists in the discussions mentioned above, but I found that whitelists can make sense in simple scenarios where you might allow manual HTML input, but when you need to allow a larger array of HTML functionality a blacklist is probably easier to manage as the vast majority of elements and attributes could be allowed. Also white listing gets a bit more complex with HTML5 and the new proliferation of new HTML tags and most new tags generally don't affect XSS issues directly. Pure whitelisting based on elements and attributes also doesn't capture many edge cases (see some of the XSS cheat sheets listed below) so even with a white list, custom logic is still required to handle many of those edge cases. The Microsoft Web Protection Library (AntiXSS) My first thought was to check out the Microsoft AntiXSS library. Microsoft has an HTML Encoding and Sanitation library in the Microsoft Web Protection Library (formerly AntiXSS Library) on CodePlex, which provides stricter functions for whitelist encoding and sanitation. Initially I thought the Sanitation class and its static members would do the trick for me,but I found that this library is way too restrictive for my needs. Specifically the Sanitation class strips out images and links which rendered the full HTML from our real estate clients completely useless. I didn't spend much time with it, but apparently I'm not alone if feeling this library is not really useful without some way to configure operation. To give you an example of what didn't work for me with the library here's a small and simple HTML fragment that includes script, img and anchor tags. I would expect the script to be stripped and everything else to be left intact. Here's the original HTML:var value = "<b>Here</b> <script>alert('hello')</script> we go. Visit the " + "<a href='http://west-wind.com'>West Wind</a> site. " + "<img src='http://west-wind.com/images/new.gif' /> " ; and the code to sanitize it with the AntiXSS Sanitize class:@Html.Raw(Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(value)) This produced a not so useful sanitized string: Here we go. Visit the <a>West Wind</a> site. While it removed the <script> tag (good) it also removed the href from the link and the image tag altogether (bad). In some situations this might be useful, but for most tasks I doubt this is the desired behavior. While links can contain javascript: references and images can 'broadcast' information to a server, without configuration to tell the library what to restrict this becomes useless to me. I couldn't find any way to customize the white list, nor is there code available in this 'open source' library on CodePlex. Using Html Agility Pack for HTML Parsing The WPL library wasn't going to cut it. After doing a bit of research I decided the best approach for a custom solution would be to use an HTML parser and inspect the HTML fragment/document I'm trying to import. I've used the HTML Agility Pack before for a number of apps where I needed an HTML parser without requiring an instance of a full browser like the Internet Explorer Application object which is inadequate in Web apps. In case you haven't checked out the Html Agility Pack before, it's a powerful HTML parser library that you can use from your .NET code. It provides a simple, parsable HTML DOM model to full HTML documents or HTML fragments that let you walk through each of the elements in your document. If you've used the HTML or XML DOM in a browser before you'll feel right at home with the Agility Pack. Blacklist based HTML Parsing to strip XSS Code For my purposes of HTML sanitation, the process involved is to walk the HTML document one element at a time and then check each element and attribute against a blacklist. There's quite a bit of argument of what's better: A whitelist of allowed items or a blacklist of denied items. While whitelists tend to be more secure, they also require a lot more configuration. In the case of HTML5 a whitelist could be very extensive. For what I need, I only want to ensure that no JavaScript is executed, so a blacklist includes the obvious <script> tag plus any tag that allows loading of external content including <iframe>, <object>, <embed> and <link> etc. <form> is also excluded to avoid posting content to a different location. I also disallow <head> and <meta> tags in particular for my case, since I'm only allowing posting of HTML fragments. There is also some internal logic to exclude some attributes or attributes that include references to JavaScript or CSS expressions. The default tag blacklist reflects my use case, but is customizable and can be added to. Here's my HtmlSanitizer implementation:using System.Collections.Generic; using System.IO; using System.Xml; using HtmlAgilityPack; namespace Westwind.Web.Utilities { public class HtmlSanitizer { public HashSet<string> BlackList = new HashSet<string>() { { "script" }, { "iframe" }, { "form" }, { "object" }, { "embed" }, { "link" }, { "head" }, { "meta" } }; /// <summary> /// Cleans up an HTML string and removes HTML tags in blacklist /// </summary> /// <param name="html"></param> /// <returns></returns> public static string SanitizeHtml(string html, params string[] blackList) { var sanitizer = new HtmlSanitizer(); if (blackList != null && blackList.Length > 0) { sanitizer.BlackList.Clear(); foreach (string item in blackList) sanitizer.BlackList.Add(item); } return sanitizer.Sanitize(html); } /// <summary> /// Cleans up an HTML string by removing elements /// on the blacklist and all elements that start /// with onXXX . /// </summary> /// <param name="html"></param> /// <returns></returns> public string Sanitize(string html) { var doc = new HtmlDocument(); doc.LoadHtml(html); SanitizeHtmlNode(doc.DocumentNode); //return doc.DocumentNode.WriteTo(); string output = null; // Use an XmlTextWriter to create self-closing tags using (StringWriter sw = new StringWriter()) { XmlWriter writer = new XmlTextWriter(sw); doc.DocumentNode.WriteTo(writer); output = sw.ToString(); // strip off XML doc header if (!string.IsNullOrEmpty(output)) { int at = output.IndexOf("?>"); output = output.Substring(at + 2); } writer.Close(); } doc = null; return output; } private void SanitizeHtmlNode(HtmlNode node) { if (node.NodeType == HtmlNodeType.Element) { // check for blacklist items and remove if (BlackList.Contains(node.Name)) { node.Remove(); return; } // remove CSS Expressions and embedded script links if (node.Name == "style") { if (string.IsNullOrEmpty(node.InnerText)) { if (node.InnerHtml.Contains("expression") || node.InnerHtml.Contains("javascript:")) node.ParentNode.RemoveChild(node); } } // remove script attributes if (node.HasAttributes) { for (int i = node.Attributes.Count - 1; i >= 0; i--) { HtmlAttribute currentAttribute = node.Attributes[i]; var attr = currentAttribute.Name.ToLower(); var val = currentAttribute.Value.ToLower(); span style="background: white; color: green">// remove event handlers if (attr.StartsWith("on")) node.Attributes.Remove(currentAttribute); // remove script links else if ( //(attr == "href" || attr== "src" || attr == "dynsrc" || attr == "lowsrc") && val != null && val.Contains("javascript:")) node.Attributes.Remove(currentAttribute); // Remove CSS Expressions else if (attr == "style" && val != null && val.Contains("expression") || val.Contains("javascript:") || val.Contains("vbscript:")) node.Attributes.Remove(currentAttribute); } } } // Look through child nodes recursively if (node.HasChildNodes) { for (int i = node.ChildNodes.Count - 1; i >= 0; i--) { SanitizeHtmlNode(node.ChildNodes[i]); } } } } } Please note: Use this as a starting point only for your own parsing and review the code for your specific use case! If your needs are less lenient than mine were you can you can make this much stricter by not allowing src and href attributes or CSS links if your HTML doesn't allow it. You can also check links for external URLs and disallow those - lots of options. The code is simple enough to make it easy to extend to fit your use cases more specifically. It's also quite easy to make this code work using a WhiteList approach if you want to go that route. The code above is semi-generic for allowing full featured HTML fragments that only disallow script related content. The Sanitize method walks through each node of the document and then recursively drills into all of its children until the entire document has been traversed. Note that the code here uses an XmlTextWriter to write output - this is done to preserve XHTML style self-closing tags which are otherwise left as non-self-closing tags. The sanitizer code scans for blacklist elements and removes those elements not allowed. Note that the blacklist is configurable either in the instance class as a property or in the static method via the string parameter list. Additionally the code goes through each element's attributes and looks for a host of rules gleaned from some of the XSS cheat sheets listed at the end of the post. Clearly there are a lot more XSS vulnerabilities, but a lot of them apply to ancient browsers (IE6 and versions of Netscape) - many of these glaring holes (like CSS expressions - WTF IE?) have been removed in modern browsers. What a Pain To be honest this is NOT a piece of code that I wanted to write. I think building anything related to XSS is better left to people who have far more knowledge of the topic than I do. Unfortunately, I was unable to find a tool that worked even closely for me, or even provided a working base. For the project I was working on I had no choice and I'm sharing the code here merely as a base line to start with and potentially expand on for specific needs. It's sad that Microsoft Web Protection Library is currently such a train wreck - this is really something that should come from Microsoft as the systems vendor or possibly a third party that provides security tools. Luckily for my application we are dealing with a authenticated and validated users so the user base is fairly well known, and relatively small - this is not a wide open Internet application that's directly public facing. As I mentioned earlier in the post, if I had my way I would simply not allow this type of raw HTML input in the first place, and instead rely on a more controlled HTML input mechanism like MarkDown or even a good HTML Edit control that can provide some limits on what types of input are allowed. Alas in this case I was overridden and we had to go forward and allow *any* raw HTML posted. Sometimes I really feel sad that it's come this far - how many good applications and tools have been thwarted by fear of XSS (or worse) attacks? So many things that could be done *if* we had a more secure browser experience and didn't have to deal with every little script twerp trying to hack into Web pages and obscure browser bugs. So much time wasted building secure apps, so much time wasted by others trying to hack apps… We're a funny species - no other species manages to waste as much time, effort and resources as we humans do :-) Resources Code on GitHub Html Agility Pack XSS Cheat Sheet XSS Prevention Cheat Sheet Microsoft Web Protection Library (AntiXss) StackOverflow Links: http://stackoverflow.com/questions/341872/html-sanitizer-for-net http://blog.stackoverflow.com/2008/06/safe-html-and-xss/ http://code.google.com/p/subsonicforums/source/browse/trunk/SubSonic.Forums.Data/HtmlScrubber.cs?r=61© Rick Strahl, West Wind Technologies, 2005-2012Posted in Security HTML ASP.NET JavaScript Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Inheritance Mapping Strategies with Entity Framework Code First CTP5 Part 1: Table per Hierarchy (TPH)

- by mortezam

A simple strategy for mapping classes to database tables might be “one table for every entity persistent class.” This approach sounds simple enough and, indeed, works well until we encounter inheritance. Inheritance is such a visible structural mismatch between the object-oriented and relational worlds because object-oriented systems model both “is a” and “has a” relationships. SQL-based models provide only "has a" relationships between entities; SQL database management systems don’t support type inheritance—and even when it’s available, it’s usually proprietary or incomplete. There are three different approaches to representing an inheritance hierarchy: Table per Hierarchy (TPH): Enable polymorphism by denormalizing the SQL schema, and utilize a type discriminator column that holds type information. Table per Type (TPT): Represent "is a" (inheritance) relationships as "has a" (foreign key) relationships. Table per Concrete class (TPC): Discard polymorphism and inheritance relationships completely from the SQL schema.I will explain each of these strategies in a series of posts and this one is dedicated to TPH. In this series we'll deeply dig into each of these strategies and will learn about "why" to choose them as well as "how" to implement them. Hopefully it will give you a better idea about which strategy to choose in a particular scenario. Inheritance Mapping with Entity Framework Code FirstAll of the inheritance mapping strategies that we discuss in this series will be implemented by EF Code First CTP5. The CTP5 build of the new EF Code First library has been released by ADO.NET team earlier this month. EF Code-First enables a pretty powerful code-centric development workflow for working with data. I’m a big fan of the EF Code First approach, and I’m pretty excited about a lot of productivity and power that it brings. When it comes to inheritance mapping, not only Code First fully supports all the strategies but also gives you ultimate flexibility to work with domain models that involves inheritance. The fluent API for inheritance mapping in CTP5 has been improved a lot and now it's more intuitive and concise in compare to CTP4. A Note For Those Who Follow Other Entity Framework ApproachesIf you are following EF's "Database First" or "Model First" approaches, I still recommend to read this series since although the implementation is Code First specific but the explanations around each of the strategies is perfectly applied to all approaches be it Code First or others. A Note For Those Who are New to Entity Framework and Code-FirstIf you choose to learn EF you've chosen well. If you choose to learn EF with Code First you've done even better. To get started, you can find a great walkthrough by Scott Guthrie here and another one by ADO.NET team here. In this post, I assume you already setup your machine to do Code First development and also that you are familiar with Code First fundamentals and basic concepts. You might also want to check out my other posts on EF Code First like Complex Types and Shared Primary Key Associations. A Top Down Development ScenarioThese posts take a top-down approach; it assumes that you’re starting with a domain model and trying to derive a new SQL schema. Therefore, we start with an existing domain model, implement it in C# and then let Code First create the database schema for us. However, the mapping strategies described are just as relevant if you’re working bottom up, starting with existing database tables. I’ll show some tricks along the way that help you dealing with nonperfect table layouts. Let’s start with the mapping of entity inheritance. -- The Domain ModelIn our domain model, we have a BillingDetail base class which is abstract (note the italic font on the UML class diagram below). We do allow various billing types and represent them as subclasses of BillingDetail class. As for now, we support CreditCard and BankAccount: Implement the Object Model with Code First As always, we start with the POCO classes. Note that in our DbContext, I only define one DbSet for the base class which is BillingDetail. Code First will find the other classes in the hierarchy based on Reachability Convention. public abstract class BillingDetail { public int BillingDetailId { get; set; } public string Owner { get; set; } public string Number { get; set; } } public class BankAccount : BillingDetail { public string BankName { get; set; } public string Swift { get; set; } } public class CreditCard : BillingDetail { public int CardType { get; set; } public string ExpiryMonth { get; set; } public string ExpiryYear { get; set; } } public class InheritanceMappingContext : DbContext { public DbSet<BillingDetail> BillingDetails { get; set; } } This object model is all that is needed to enable inheritance with Code First. If you put this in your application you would be able to immediately start working with the database and do CRUD operations. Before going into details about how EF Code First maps this object model to the database, we need to learn about one of the core concepts of inheritance mapping: polymorphic and non-polymorphic queries. Polymorphic Queries LINQ to Entities and EntitySQL, as object-oriented query languages, both support polymorphic queries—that is, queries for instances of a class and all instances of its subclasses, respectively. For example, consider the following query: IQueryable<BillingDetail> linqQuery = from b in context.BillingDetails select b; List<BillingDetail> billingDetails = linqQuery.ToList(); Or the same query in EntitySQL: string eSqlQuery = @"SELECT VAlUE b FROM BillingDetails AS b"; ObjectQuery<BillingDetail> objectQuery = ((IObjectContextAdapter)context).ObjectContext .CreateQuery<BillingDetail>(eSqlQuery); List<BillingDetail> billingDetails = objectQuery.ToList(); linqQuery and eSqlQuery are both polymorphic and return a list of objects of the type BillingDetail, which is an abstract class but the actual concrete objects in the list are of the subtypes of BillingDetail: CreditCard and BankAccount. Non-polymorphic QueriesAll LINQ to Entities and EntitySQL queries are polymorphic which return not only instances of the specific entity class to which it refers, but all subclasses of that class as well. On the other hand, Non-polymorphic queries are queries whose polymorphism is restricted and only returns instances of a particular subclass. In LINQ to Entities, this can be specified by using OfType<T>() Method. For example, the following query returns only instances of BankAccount: IQueryable<BankAccount> query = from b in context.BillingDetails.OfType<BankAccount>() select b; EntitySQL has OFTYPE operator that does the same thing: string eSqlQuery = @"SELECT VAlUE b FROM OFTYPE(BillingDetails, Model.BankAccount) AS b"; In fact, the above query with OFTYPE operator is a short form of the following query expression that uses TREAT and IS OF operators: string eSqlQuery = @"SELECT VAlUE TREAT(b as Model.BankAccount) FROM BillingDetails AS b WHERE b IS OF(Model.BankAccount)"; (Note that in the above query, Model.BankAccount is the fully qualified name for BankAccount class. You need to change "Model" with your own namespace name.) Table per Class Hierarchy (TPH)An entire class hierarchy can be mapped to a single table. This table includes columns for all properties of all classes in the hierarchy. The concrete subclass represented by a particular row is identified by the value of a type discriminator column. You don’t have to do anything special in Code First to enable TPH. It's the default inheritance mapping strategy: This mapping strategy is a winner in terms of both performance and simplicity. It’s the best-performing way to represent polymorphism—both polymorphic and nonpolymorphic queries perform well—and it’s even easy to implement by hand. Ad-hoc reporting is possible without complex joins or unions. Schema evolution is straightforward. Discriminator Column As you can see in the DB schema above, Code First has to add a special column to distinguish between persistent classes: the discriminator. This isn’t a property of the persistent class in our object model; it’s used internally by EF Code First. By default, the column name is "Discriminator", and its type is string. The values defaults to the persistent class names —in this case, “BankAccount” or “CreditCard”. EF Code First automatically sets and retrieves the discriminator values. TPH Requires Properties in SubClasses to be Nullable in the Database TPH has one major problem: Columns for properties declared by subclasses will be nullable in the database. For example, Code First created an (INT, NULL) column to map CardType property in CreditCard class. However, in a typical mapping scenario, Code First always creates an (INT, NOT NULL) column in the database for an int property in persistent class. But in this case, since BankAccount instance won’t have a CardType property, the CardType field must be NULL for that row so Code First creates an (INT, NULL) instead. If your subclasses each define several non-nullable properties, the loss of NOT NULL constraints may be a serious problem from the point of view of data integrity. TPH Violates the Third Normal FormAnother important issue is normalization. We’ve created functional dependencies between nonkey columns, violating the third normal form. Basically, the value of Discriminator column determines the corresponding values of the columns that belong to the subclasses (e.g. BankName) but Discriminator is not part of the primary key for the table. As always, denormalization for performance can be misleading, because it sacrifices long-term stability, maintainability, and the integrity of data for immediate gains that may be also achieved by proper optimization of the SQL execution plans (in other words, ask your DBA). Generated SQL QueryLet's take a look at the SQL statements that EF Code First sends to the database when we write queries in LINQ to Entities or EntitySQL. For example, the polymorphic query for BillingDetails that you saw, generates the following SQL statement: SELECT [Extent1].[Discriminator] AS [Discriminator], [Extent1].[BillingDetailId] AS [BillingDetailId], [Extent1].[Owner] AS [Owner], [Extent1].[Number] AS [Number], [Extent1].[BankName] AS [BankName], [Extent1].[Swift] AS [Swift], [Extent1].[CardType] AS [CardType], [Extent1].[ExpiryMonth] AS [ExpiryMonth], [Extent1].[ExpiryYear] AS [ExpiryYear] FROM [dbo].[BillingDetails] AS [Extent1] WHERE [Extent1].[Discriminator] IN ('BankAccount','CreditCard') Or the non-polymorphic query for the BankAccount subclass generates this SQL statement: SELECT [Extent1].[BillingDetailId] AS [BillingDetailId], [Extent1].[Owner] AS [Owner], [Extent1].[Number] AS [Number], [Extent1].[BankName] AS [BankName], [Extent1].[Swift] AS [Swift] FROM [dbo].[BillingDetails] AS [Extent1] WHERE [Extent1].[Discriminator] = 'BankAccount' Note how Code First adds a restriction on the discriminator column and also how it only selects those columns that belong to BankAccount entity. Change Discriminator Column Data Type and Values With Fluent API Sometimes, especially in legacy schemas, you need to override the conventions for the discriminator column so that Code First can work with the schema. The following fluent API code will change the discriminator column name to "BillingDetailType" and the values to "BA" and "CC" for BankAccount and CreditCard respectively: protected override void OnModelCreating(System.Data.Entity.ModelConfiguration.ModelBuilder modelBuilder) { modelBuilder.Entity<BillingDetail>() .Map<BankAccount>(m => m.Requires("BillingDetailType").HasValue("BA")) .Map<CreditCard>(m => m.Requires("BillingDetailType").HasValue("CC")); } Also, changing the data type of discriminator column is interesting. In the above code, we passed strings to HasValue method but this method has been defined to accepts a type of object: public void HasValue(object value); Therefore, if for example we pass a value of type int to it then Code First not only use our desired values (i.e. 1 & 2) in the discriminator column but also changes the column type to be (INT, NOT NULL): modelBuilder.Entity<BillingDetail>() .Map<BankAccount>(m => m.Requires("BillingDetailType").HasValue(1)) .Map<CreditCard>(m => m.Requires("BillingDetailType").HasValue(2)); SummaryIn this post we learned about Table per Hierarchy as the default mapping strategy in Code First. The disadvantages of the TPH strategy may be too serious for your design—after all, denormalized schemas can become a major burden in the long run. Your DBA may not like it at all. In the next post, we will learn about Table per Type (TPT) strategy that doesn’t expose you to this problem. References ADO.NET team blog Java Persistence with Hibernate book a { text-decoration: none; } a:visited { color: Blue; } .title { padding-bottom: 5px; font-family: Segoe UI; font-size: 11pt; font-weight: bold; padding-top: 15px; } .code, .typeName { font-family: consolas; } .typeName { color: #2b91af; } .padTop5 { padding-top: 5px; } .padTop10 { padding-top: 10px; } p.MsoNormal { margin-top: 0in; margin-right: 0in; margin-bottom: 10.0pt; margin-left: 0in; line-height: 115%; font-size: 11.0pt; font-family: "Calibri" , "sans-serif"; }

Read the article
Guide to reduce TFS database growth using the Test Attachment Cleaner

- by terje

Recently there has been several reports on TFS databases growing too fast and growing too big. Notable this has been observed when one has started to use more features of the Testing system. Also, the TFS 2010 handles test results differently from TFS 2008, and this leads to more data stored in the TFS databases. As a consequence of this there has been released some tools to remove unneeded data in the database, and also some fixes to correct for bugs which has been found and corrected during this process. Further some preventive practices and maintenance rules should be adopted. A lot of people have blogged about this, among these are: Anu’s very important blog post here describes both the problem and solutions to handle it. She describes both the Test Attachment Cleaner tool, and also some QFE/CU releases to fix some underlying bugs which prevented the tool from being fully effective. Brian Harry’s blog post here describes the problem too This forum thread describes the problem with some solution hints. Ravi Shanker’s blog post here describes best practices on solving this (TBP) Grant Holidays blogpost here describes strategies to use the Test Attachment Cleaner both to detect space problems and how to rectify them. The problem can be divided into the following areas: Publishing of test results from builds Publishing of manual test results and their attachments in particular Publishing of deployment binaries for use during a test run Bugs in SQL server preventing total cleanup of data (All the published data above is published into the TFS database as attachments.) The test results will include all data being collected during the run. Some of this data can grow rather large, like IntelliTrace logs and video recordings. Also the pushing of binaries which happen for automated test runs, including tests run during a build using code coverage which will include all the files in the deployment folder, contributes a lot to the size of the attached data. In order to handle this systematically, I have set up a 3-stage process: Find out if you have a database space issue Set up your TFS server to minimize potential database issues If you have the “problem”, clean up the database and otherwise keep it clean Analyze the data Are your database( s) growing ? Are unused test results growing out of proportion ? To find out about this you need to query your TFS database for some of the information, and use the Test Attachment Cleaner (TAC) to obtain some more detailed information. If you don’t have too many databases you can use the SQL Server reports from within the Management Studio to analyze the database and table sizes. Or, you can use a set of queries . I find queries often faster to use because I can tweak them the way I want them. But be aware that these queries are non-documented and non-supported and may change when the product team wants to change them. If you have multiple Project Collections, find out which might have problems: (Disclaimer: The queries below work on TFS 2010. They will not work on Dev-11, since the table structure have been changed. I will try to update them for Dev-11 when it is released.) Open a SQL Management Studio session onto the SQL Server where you have your TFS Databases. Use the query below to find the Project Collection databases and their sizes, in descending size order. use master select DB_NAME(database_id) AS DBName, (size/128) SizeInMB FROM sys.master_files where type=0 and substring(db_name(database_id),1,4)='Tfs_' and DB_NAME(database_id)<>'Tfs_Configuration' order by size desc Doing this on one of our SQL servers gives the following results: It is pretty easy to see on which collection to start the work Find out which tables are possibly too large Keep a special watch out for the Tfs_Attachment table. Use the script at the bottom of Grant’s blog to find the table sizes in descending size order. In our case we got this result: From Grant’s blog we learnt that the tbl_Content is in the Version Control category, so the major only big issue we have here is the tbl_AttachmentContent. Find out which team projects have possibly too large attachments In order to use the TAC to find and eventually delete attachment data we need to find out which team projects have these attachments. The team project is a required parameter to the TAC. Use the following query to find this, replace the collection database name with whatever applies in your case: use Tfs_DefaultCollection select p.projectname, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by p.projectname order by sum(a.compressedlength) desc In our case we got this result (had to remove some names), out of more than 100 team projects accumulated over quite some years: As can be seen here it is pretty obvious the “Byggtjeneste – Projects” are the main team project to take care of, with the ones on lines 2-4 as the next ones. Check which attachment types takes up the most space It can be nice to know which attachment types takes up the space, so run the following query: use Tfs_DefaultCollection select a.attachmenttype, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by a.attachmenttype order by sum(a.compressedlength) desc We then got this result: From this it is pretty obvious that the problem here is the binary files, as also mentioned in Anu’s blog. Check which file types, by their extension, takes up the most space Run the following query use Tfs_DefaultCollection select SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999)as Extension, sum(compressedlength)/1024 as SizeInKB from tbl_Attachment group by SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999) order by sum(compressedlength) desc This gives a result like this: Now you should have collected enough information to tell you what to do – if you got to do something, and some of the information you need in order to set up your TAC settings file, both for a cleanup and for scheduled maintenance later. Get your TFS server and environment properly set up Even if you have got the problem or if have yet not got the problem, you should ensure the TFS server is set up so that the risk of getting into this problem is minimized. To ensure this you should install the following set of updates and components. The assumption is that your TFS Server is at SP1 level. Install the QFE for KB2608743 – which also contains detailed instructions on its use, download from here. The QFE changes the default settings to not upload deployed binaries, which are used in automated test runs. Binaries will still be uploaded if: Code coverage is enabled in the test settings. You change the UploadDeploymentItem to true in the testsettings file. Be aware that this might be reset back to false by another user which haven't installed this QFE. The hotfix should be installed to The build servers (the build agents) The machine hosting the Test Controller Local development computers (Visual Studio) Local test computers (MTM) It is not required to install it to the TFS Server, test agents or the build controller – it has no effect on these programs. If you use the SQL Server 2008 R2 you should also install the CU 10 (or later). This CU fixes a potential problem of hanging “ghost” files. This seems to happen only in certain trigger situations, but to ensure it doesn’t bite you, it is better to make sure this CU is installed. There is no such CU for SQL Server 2008 pre-R2 Work around: If you suspect hanging ghost files, they can be – with some mental effort, deduced from the ghost counters using the following SQL query: use master SELECT DB_NAME(database_id) as 'database',OBJECT_NAME(object_id) as 'objectname', index_type_desc,ghost_record_count,version_ghost_record_count,record_count,avg_record_size_in_bytes FROM sys.dm_db_index_physical_stats (DB_ID(N'<DatabaseName>'), OBJECT_ID(N'<TableName>'), NULL, NULL , 'DETAILED') The problem is a stalled ghost cleanup process. Restarting the SQL server after having stopped all components that depends on it, like the TFS Server and SPS services – that is all applications that connect to the SQL server. Then restart the SQL server, and finally start up all dependent processes again. (I would guess a complete server reboot would do the trick too.) After this the ghost cleanup process will run properly again. The fix will come in the next CU cycle for SQL Server R2 SP1. The R2 pre-SP1 and R2 SP1 have separate maintenance cycles, and are maintained individually. Each have its own set of CU’s. When it comes I will add the link here to that CU. The "hanging ghost file” issue came up after one have run the TAC, and deleted enourmes amount of data. The SQL Server can get into this hanging state (without the QFE) in certain cases due to this. And of course, install and set up the Test Attachment Cleaner command line power tool. This should be done following some guidelines from Ravi Shanker: “When you run TAC, ensure that you are deleting small chunks of data at regular intervals (say run TAC every night at 3AM to delete data that is between age 730 to 731 days) – this will ensure that small amounts of data are being deleted and SQL ghosted record cleanup can catch up with the number of deletes performed. “ This rule minimizes the risk of the ghosted hang problem to occur, and further makes it easier for the SQL server ghosting process to work smoothly. “Run DBCC SHRINKDB post the ghosted records are cleaned up to physically reclaim the space on the file system” This is the last step in a 3 step process of removing SQL server data. First they are logically deleted. Then they are cleaned out by the ghosting process, and finally removed using the shrinkdb command. Cleaning out the attachments The TAC is run from the command line using a set of parameters and controlled by a settingsfile. The parameters point out a server uri including the team project collection and also point at a specific team project. So in order to run this for multiple team projects regularly one has to set up a script to run the TAC multiple times, once for each team project. When you install the TAC there is a very useful readme file in the same directory. When the deployment binaries are published to the TFS server, ALL items are published up from the deployment folder. That often means much more files than you would assume are necessary. This is a brute force technique. It works, but you need to take care when cleaning up. Grant has shown how their settings file looks in his blog post, removing all attachments older than 180 days , as long as there are no active workitems connected to them. This setting can be useful to clean out all items, both in a clean-up once operation, and in a general There are two scenarios we need to consider: Cleaning up an existing overgrown database Maintaining a server to avoid an overgrown database using scheduled TAC 1. Cleaning up a database which has grown too big due to these attachments. This job is a “Once” job. We do this once and then move on to make sure it won’t happen again, by taking the actions in 2) below. In this scenario you should only consider the large files. Your goal should be to simply reduce the size, and don’t bother about the smaller stuff. That can be left a scheduled TAC cleanup ( 2 below). Here you can use a very general settings file, and just remove the large attachments, or you can choose to remove any old items. Grant’s settings file is an example of the last one. A settings file to remove only large attachments could look like this:  <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> </Attachment> </DeletionCriteria> Or like this: If you want only to remove dll’s and pdb’s about that size, add an Extensions-section. Without that section, all extensions will be deleted.  <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="dll" /> <Include value="pdb" /> </Extensions> </Attachment> </DeletionCriteria> Before you start up your scheduled maintenance, you should clear out all older items. 2. Scheduled maintenance using the TAC If you run a schedule every night, and remove old items, and also remove them in small batches. It is important to run this often, like every night, in order to keep the number of deleted items low. That way the SQL ghost process works better. One approach could be to delete all items older than some number of days, let’s say 180 days. This could be combined with restricting it to keep attachments with active or resolved bugs. Doing this every night ensures that only small amounts of data is deleted.  <DeletionCriteria> <TestRun> <AgeInDays OlderThan="180" /> </TestRun> <Attachment /> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved"/> </LinkedBugs> </DeletionCriteria> In my experience there are projects which are left with active or resolved workitems, akthough no further work is done. It can be wise to have a cleanup process with no restrictions on linked bugs at all. Note that you then have to remove the whole LinkedBugs section. A approach which could work better here is to do a two step approach, use the schedule above to with no LinkedBugs as a sweeper cleaning task taking away all data older than you could care about. Then have another scheduled TAC task to take out more specifically attachments that you are not likely to use. This task could be much more specific, and based on your analysis clean out what you know is troublesome data.  <DeletionCriteria> <TestRun > <AgeInDays OlderThan="30" /> </TestRun> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="iTrace"/> <Include value="dll"/> <Include value="pdb"/> <Include value="wmv"/> </Extensions> </Attachment> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved" /> </LinkedBugs> </DeletionCriteria> The readme document for the TAC says that it recognizes “internal” extensions, but it does recognize any extension. To run the tool do the following command: tcmpt attachmentcleanup /collection:your_tfs_collection_url /teamproject:your_team_project /settingsfile:path_to_settingsfile /outputfile:%temp%/teamproject.tcmpt.log /mode:delete Shrinking the database You could run a shrink database command after the TAC has run in cases where there are a lot of data being deleted. In this case you SHOULD do it, to free up all that space. But, after the shrink operation you should do a rebuild indexes, since the shrink operation will leave the database in a very fragmented state, which will reduce performance. Note that you need to rebuild indexes, reorganizing is not enough. For smaller amounts of data you should NOT shrink the database, since the data will be reused by the SQL server when it need to add more records. In fact, it is regarded as a bad practice to shrink the database regularly. So on a daily maintenance schedule you should NOT shrink the database. To shrink the database you do a DBCC SHRINKDATABASE command, and then follow up with a DBCC INDEXDEFRAG afterwards. I find the easiest way to do this is to create a SQL Maintenance plan including the Shrink Database Task and the Rebuild Index Task and just execute it when you need to do this.

Read the article
Abstracting functionality

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/08/22/abstracting-functionality.aspxWhat is more important than data? Functionality. Yes, I strongly believe we should switch to a functionality over data mindset in programming. Or actually switch back to it. Focus on functionality Functionality once was at the core of software development. Back when algorithms were the first thing you heard about in CS classes. Sure, data structures, too, were important - but always from the point of view of algorithms. (Niklaus Wirth gave one of his books the title “Algorithms + Data Structures” instead of “Data Structures + Algorithms” for a reason.) The reason for the focus on functionality? Firstly, because software was and is about doing stuff. Secondly because sufficient performance was hard to achieve, and only thirdly memory efficiency. But then hardware became more powerful. That gave rise to a new mindset: object orientation. And with it functionality was devalued. Data took over its place as the most important aspect. Now discussions revolved around structures motivated by data relationships. (John Beidler gave his book the title “Data Structures and Algorithms: An Object Oriented Approach” instead of the other way around for a reason.) Sure, this data could be embellished with functionality. But nevertheless functionality was second. When you look at (domain) object models what you mostly find is (domain) data object models. The common object oriented approach is: data aka structure over functionality. This is true even for the most modern modeling approaches like Domain Driven Design. Look at the literature and what you find is recommendations on how to get data structures right: aggregates, entities, value objects. I´m not saying this is what object orientation was invented for. But I´m saying that´s what I happen to see across many teams now some 25 years after object orientation became mainstream through C++, Delphi, and Java. But why should we switch back? Because software development cannot become truly agile with a data focus. The reason for that lies in what customers need first: functionality, behavior, operations. To be clear, that´s not why software is built. The purpose of software is to be more efficient than the alternative. Money mainly is spent to get a certain level of quality (e.g. performance, scalability, security etc.). But without functionality being present, there is nothing to work on the quality of. What customers want is functionality of a certain quality. ASAP. And tomorrow new functionality needs to be added, existing functionality needs to be changed, and quality needs to be increased. No customer ever wanted data or structures. Of course data should be processed. Data is there, data gets generated, transformed, stored. But how the data is structured for this to happen efficiently is of no concern to the customer. Ask a customer (or user) whether she likes the data structured this way or that way. She´ll say, “I don´t care.” But ask a customer (or user) whether he likes the functionality and its quality this way or that way. He´ll say, “I like it” (or “I don´t like it”). Build software incrementally From this very natural focus of customers and users on functionality and its quality follows we should develop software incrementally. That´s what Agility is about. Deliver small increments quickly and often to get frequent feedback. That way less waste is produced, and learning can take place much easier (on the side of the customer as well as on the side of developers). An increment is some added functionality or quality of functionality.[1] So as it turns out, Agility is about functionality over whatever. But software developers’ thinking is still stuck in the object oriented mindset of whatever over functionality. Bummer. I guess that (at least partly) explains why Agility always hits a glass ceiling in projects. It´s a clash of mindsets, of cultures. Driving software development by demanding small increases in functionality runs against thinking about software as growing (data) structures sprinkled with functionality. (Excuse me, if this sounds a bit broad-brush. But you get my point.) The need for abstraction In the end there need to be data structures. Of course. Small and large ones. The phrase functionality over data does not deny that. It´s not functionality instead of data or something. It´s just over, i.e. functionality should be thought of first. It´s a tad more important. It´s what the customer wants. That´s why we need a way to design functionality. Small and large. We need to be able to think about functionality before implementing it. We need to be able to reason about it among team members. We need to be able to communicate our mental models of functionality not just by speaking about them, but also on paper. Otherwise reasoning about it does not scale. We learned thinking about functionality in the small using flow charts, Nassi-Shneiderman diagrams, pseudo code, or UML sequence diagrams. That´s nice and well. But it does not scale. You can use these tools to describe manageable algorithms. But it does not work for the functionality triggered by pressing the “1-Click Order” on an amazon product page for example. There are several reasons for that, I´d say. Firstly, the level of abstraction over code is negligible. It´s essentially non-existent. Drawing a flow chart or writing pseudo code or writing actual code is very, very much alike. All these tools are about control flow like code is.[2] In addition all tools are computationally complete. They are about logic which is expressions and especially control statements. Whatever you code in Java you can fully (!) describe using a flow chart. And then there is no data. They are about control flow and leave out the data altogether. Thus data mostly is assumed to be global. That´s shooting yourself in the foot, as I hope you agree. Even if it´s functionality over data that does not mean “don´t think about data”. Right to the contrary! Functionality only makes sense with regard to data. So data needs to be in the picture right from the start - but it must not dominate the thinking. The above tools fail on this. Bottom line: So far we´re unable to reason in a scalable and abstract manner about functionality. That´s why programmers are so driven to start coding once they are presented with a problem. Programming languages are the only tool they´ve learned to use to reason about functional solutions. Or, well, there might be exceptions. Mathematical notation and SQL may have come to your mind already. Indeed they are tools on a higher level of abstraction than flow charts etc. That´s because they are declarative and not computationally complete. They leave out details - in order to deliver higher efficiency in devising overall solutions. We can easily reason about functionality using mathematics and SQL. That´s great. Except for that they are domain specific languages. They are not general purpose. (And they don´t scale either, I´d say.) Bummer. So to be more precise we need a scalable general purpose tool on a higher than code level of abstraction not neglecting data. Enter: Flow Design. Abstracting functionality using data flows I believe the solution to the problem of abstracting functionality lies in switching from control flow to data flow. Data flow very naturally is not about logic details anymore. There are no expressions and no control statements anymore. There are not even statements anymore. Data flow is declarative by nature. With data flow we get rid of all the limiting traits of former approaches to modeling functionality. In addition, nomen est omen, data flows include data in the functionality picture. With data flows, data is visibly flowing from processing step to processing step. Control is not flowing. Control is wherever it´s needed to process data coming in. That´s a crucial difference and needs some rewiring in your head to be fully appreciated.[2] Since data flows are declarative they are not the right tool to describe algorithms, though, I´d say. With them you don´t design functionality on a low level. During design data flow processing steps are black boxes. They get fleshed out during coding. Data flow design thus is more coarse grained than flow chart design. It starts on a higher level of abstraction - but then is not limited. By nesting data flows indefinitely you can design functionality of any size, without losing sight of your data. Data flows scale very well during design. They can be used on any level of granularity. And they can easily be depicted. Communicating designs using data flows is easy and scales well, too. The result of functional design using data flows is not algorithms (too low level), but processes. Think of data flows as descriptions of industrial production lines. Data as material runs through a number of processing steps to be analyzed, enhances, transformed. On the top level of a data flow design might be just one processing step, e.g. “execute 1-click order”. But below that are arbitrary levels of flows with smaller and smaller steps. That´s not layering as in “layered architecture”, though. Rather it´s a stratified design à la Abelson/Sussman. Refining data flows is not your grandpa´s functional decomposition. That was rooted in control flows. Refining data flows does not suffer from the limits of functional decomposition against which object orientation was supposed to be an antidote. Summary I´ve been working exclusively with data flows for functional design for the past 4 years. It has changed my life as a programmer. What once was difficult is now easy. And, no, I´m not using Clojure or F#. And I´m not a async/parallel execution buff. Designing the functionality of increments using data flows works great with teams. It produces design documentation which can easily be translated into code - in which then the smallest data flow processing steps have to be fleshed out - which is comparatively easy. Using a systematic translation approach code can mirror the data flow design. That way later on the design can easily be reproduced from the code if need be. And finally, data flow designs play well with object orientation. They are a great starting point for class design. But that´s a story for another day. To me data flow design simply is one of the missing links of systematic lightweight software design. There are also other artifacts software development can produce to get feedback, e.g. process descriptions, test cases. But customers can be delighted more easily with code based increments in functionality. ? No, I´m not talking about the endless possibilities this opens for parallel processing. Data flows are useful independently of multi-core processors and Actor-based designs. That´s my whole point here. Data flows are good for reasoning and evolvability. So forget about any special frameworks you might need to reap benefits from data flows. None are necessary. Translating data flow designs even into plain of Java is possible. ?

Read the article
64-bit Archives Needed

- by user9154181

A little over a year ago, we received a question from someone who was trying to build software on Solaris. He was getting errors from the ar command when creating an archive. At that time, the ar command on Solaris was a 32-bit command. There was more than 2GB of data, and the ar command was hitting the file size limit for a 32-bit process that doesn't use the largefile APIs. Even in 2011, 2GB is a very large amount of code, so we had not heard this one before. Most of our toolchain was extended to handle 64-bit sized data back in the 1990's, but archives were not changed, presumably because there was no perceived need for it. Since then of course, programs have continued to get larger, and in 2010, the time had finally come to investigate the issue and find a way to provide for larger archives. As part of that process, I had to do a deep dive into the archive format, and also do some Unix archeology. I'm going to record what I learned here, to document what Solaris does, and in the hope that it might help someone else trying to solve the same problem for their platform. Archive Format Details Archives are hardly cutting edge technology. They are still used of course, but their basic form hasn't changed in decades. Other than to fix a bug, which is rare, we don't tend to touch that code much. The archive file format is described in /usr/include/ar.h, and I won't repeat the details here. Instead, here is a rough overview of the archive file format, implemented by System V Release 4 (SVR4) Unix systems such as Solaris: Every archive starts with a "magic number". This is a sequence of 8 characters: "!<arch>\n". The magic number is followed by 1 or more members. A member starts with a fixed header, defined by the ar_hdr structure in/usr/include/ar.h. Immediately following the header comes the data for the member. Members must be padded at the end with newline characters so that they have even length. The requirement to pad members to an even length is a dead giveaway as to the age of the archive format. It tells you that this format dates from the 1970's, and more specifically from the era of 16-bit systems such as the PDP-11 that Unix was originally developed on. A 32-bit system would have required 4 bytes, and 64-bit systems such as we use today would probably have required 8 bytes. 2 byte alignment is a poor choice for ELF object archive members. 32-bit objects require 4 byte alignment, and 64-bit objects require 64-bit alignment. The link-editor uses mmap() to process archives, and if the members have the wrong alignment, we have to slide (copy) them to the correct alignment before we can access the ELF data structures inside. The archive format requires 2 byte padding, but it doesn't prohibit more. The Solaris ar command takes advantage of this, and pads ELF object members to 8 byte boundaries. Anything else is padded to 2 as required by the format. The archive header (ar_hdr) represents all numeric values using an ASCII text representation rather than as binary integers. This means that an archive that contains only text members can be viewed using tools such as cat, more, or a text editor. The original designers of this format clearly thought that archives would be used for many file types, and not just for objects. Things didn't turn out that way of course — nearly all archives contain relocatable objects for a single operating system and machine, and are used primarily as input to the link-editor (ld). Archives can have special members that are created by the ar command rather than being supplied by the user. These special members are all distinguished by having a name that starts with the slash (/) character. This is an unambiguous marker that says that the user could not have supplied it. The reason for this is that regular archive members are given the plain name of the file that was inserted to create them, and any path components are stripped off. Slash is the delimiter character used by Unix to separate path components, and as such cannot occur within a plain file name. The ar command hides the special members from you when you list the contents of an archive, so most users don't know that they exist. There are only two possible special members: A symbol table that maps ELF symbols to the object archive member that provides it, and a string table used to hold member names that exceed 15 characters. The '/' convention for tagging special members provides room for adding more such members should the need arise. As I will discuss below, we took advantage of this fact to add an alternate 64-bit symbol table special member which is used in archives that are larger than 4GB. When an archive contains ELF object members, the ar command builds a special archive member known as the symbol table that maps all ELF symbols in the object to the archive member that provides it. The link-editor uses this symbol table to determine which symbols are provided by the objects in that archive. If an archive has a symbol table, it will always be the first member in the archive, immediately following the magic number. Unlike member headers, symbol tables do use binary integers to represent offsets. These integers are always stored in big-endian format, even on a little endian host such as x86. The archive header (ar_hdr) provides 15 characters for representing the member name. If any member has a name that is longer than this, then the real name is written into a special archive member called the string table, and the member's name field instead contains a slash (/) character followed by a decimal representation of the offset of the real name within the string table. The string table is required to precede all normal archive members, so it will be the second member if the archive contains a symbol table, and the first member otherwise. The archive format is not designed to make finding a given member easy. Such operations move through the archive from front to back examining each member in turn, and run in O(n) time. This would be bad if archives were commonly used in that manner, but in general, they are not. Typically, the ar command is used to build an new archive from scratch, inserting all the objects in one operation, and then the link-editor accesses the members in the archive in constant time by using the offsets provided by the symbol table. Both of these operations are reasonably efficient. However, listing the contents of a large archive with the ar command can be rather slow. Factors That Limit Solaris Archive Size As is often the case, there was more than one limiting factor preventing Solaris archives from growing beyond the 32-bit limits of 2GB (32-bit signed) and 4GB (32-bit unsigned). These limits are listed in the order they are hit as archive size grows, so the earlier ones mask those that follow. The original Solaris archive file format can handle sizes up to 4GB without issue. However, the ar command was delivered as a 32-bit executable that did not use the largefile APIs. As such, the ar command itself could not create a file larger than 2GB. One can solve this by building ar with the largefile APIs which would allow it to reach 4GB, but a simpler and better answer is to deliver a 64-bit ar, which has the ability to scale well past 4GB. Symbol table offsets are stored as 32-bit big-endian binary integers, which limits the maximum archive size to 4GB. To get around this limit requires a different symbol table format, or an extension mechanism to the current one, similar in nature to the way member names longer than 15 characters are handled in member headers. The size field in the archive member header (ar_hdr) is an ASCII string capable of representing a 32-bit unsigned value. This places a 4GB size limit on the size of any individual member in an archive. In considering format extensions to get past these limits, it is important to remember that very few archives will require the ability to scale past 4GB for many years. The old format, while no beauty, continues to be sufficient for its purpose. This argues for a backward compatible fix that allows newer versions of Solaris to produce archives that are compatible with older versions of the system unless the size of the archive exceeds 4GB. Archive Format Differences Among Unix Variants While considering how to extend Solaris archives to scale to 64-bits, I wanted to know how similar archives from other Unix systems are to those produced by Solaris, and whether they had already solved the 64-bit issue. I've successfully moved archives between different Unix systems before with good luck, so I knew that there was some commonality. If it turned out that there was already a viable defacto standard for 64-bit archives, it would obviously be better to adopt that rather than invent something new. The archive file format is not formally standardized. However, the ar command and archive format were part of the original Unix from Bell Labs. Other systems started with that format, extending it in various often incompatible ways, but usually with the same common shared core. Most of these systems use the same magic number to identify their archives, despite the fact that their archives are not always fully compatible with each other. It is often true that archives can be copied between different Unix variants, and if the member names are short enough, the ar command from one system can often read archives produced on another. In practice, it is rare to find an archive containing anything other than objects for a single operating system and machine type. Such an archive is only of use on the type of system that created it, and is only used on that system. This is probably why cross platform compatibility of archives between Unix variants has never been an issue. Otherwise, the use of the same magic number in archives with incompatible formats would be a problem. I was able to find information for a number of Unix variants, described below. These can be divided roughly into three tribes, SVR4 Unix, BSD Unix, and IBM AIX. Solaris is a SVR4 Unix, and its archives are completely compatible with those from the other members of that group (GNU/Linux, HP-UX, and SGI IRIX). AIX AIX is an exception to rule that Unix archive formats are all based on the original Bell labs Unix format. It appears that AIX supports 2 formats (small and big), both of which differ in fundamental ways from other Unix systems: These formats use a different magic number than the standard one used by Solaris and other Unix variants. They include support for removing archive members from a file without reallocating the file, marking dead areas as unused, and reusing them when new archive items are inserted. They have a special table of contents member (File Member Header) which lets you find out everything that's in the archive without having to actually traverse the entire file. Their symbol table members are quite similar to those from other systems though. Their member headers are doubly linked, containing offsets to both the previous and next members. Of the Unix systems described here, AIX has the only format I saw that will have reasonable insert/delete performance for really large archives. Everyone else has O(n) performance, and are going to be slow to use with large archives. BSD BSD has gone through 4 versions of archive format, which are described in their manpage. They use the same member header as SVR4, but their symbol table format is different, and their scheme for long member names puts the name directly after the member header rather than into a string table. GNU/Linux The GNU toolchain uses the SVR4 format, and is compatible with Solaris. HP-UX HP-UX seems to follow the SVR4 model, and is compatible with Solaris. IRIX IRIX has 32 and 64-bit archives. The 32-bit format is the standard SVR4 format, and is compatible with Solaris. The 64-bit format is the same, except that the symbol table uses 64-bit integers. IRIX assumes that an archive contains objects of a single ELFCLASS/MACHINE, and any archive containing ELFCLASS64 objects receives a 64-bit symbol table. Although they only use it for 64-bit objects, nothing in the archive format limits it to ELFCLASS64. It would be perfectly valid to produce a 64-bit symbol table in an archive containing 32-bit objects, text files, or anything else. Tru64 Unix (Digital/Compaq/HP) Tru64 Unix uses a format much like ours, but their symbol table is a hash table, making specific symbol lookup much faster. The Solaris link-editor uses archives by examining the entire symbol table looking for unsatisfied symbols for the link, and not by looking up individual symbols, so there would be no benefit to Solaris from such a hash table. The Tru64 ld must use a different approach in which the hash table pays off for them. Widening the existing SVR4 archive symbol tables rather than inventing something new is the simplest path forward. There is ample precedent for this approach in the ELF world. When ELF was extended to support 64-bit objects, the approach was largely to take the existing data structures, and define 64-bit versions of them. We called the old set ELF32, and the new set ELF64. My guess is that there was no need to widen the archive format at that time, but had there been, it seems obvious that this is how it would have been done. The Implementation of 64-bit Solaris Archives As mentioned earlier, there was no desire to improve the fundamental nature of archives. They have always had O(n) insert/delete behavior, and for the most part it hasn't mattered. AIX made efforts to improve this, but those efforts did not find widespread adoption. For the purposes of link-editing, which is essentially the only thing that archives are used for, the existing format is adequate, and issues of backward compatibility trump the desire to do something technically better. Widening the existing symbol table format to 64-bits is therefore the obvious way to proceed. For Solaris 11, I implemented that, and I also updated the ar command so that a 64-bit version is run by default. This eliminates the 2 most significant limits to archive size, leaving only the limit on an individual archive member. We only generate a 64-bit symbol table if the archive exceeds 4GB, or when the new -S option to the ar command is used. This maximizes backward compatibility, as an archive produced by Solaris 11 is highly likely to be less than 4GB in size, and will therefore employ the same format understood by older versions of the system. The main reason for the existence of the -S option is to allow us to test the 64-bit format without having to construct huge archives to do so. I don't believe it will find much use outside of that. Other than the new ability to create and use extremely large archives, this change is largely invisible to the end user. When reading an archive, the ar command will transparently accept either form of symbol table. Similarly, the ELF library (libelf) has been updated to understand either format. Users of libelf (such as the link-editor ld) do not need to be modified to use the new format, because these changes are encapsulated behind the existing functions provided by libelf. As mentioned above, this work did not lift the limit on the maximum size of an individual archive member. That limit remains fixed at 4GB for now. This is not because we think objects will never get that large, for the history of computing says otherwise. Rather, this is based on an estimation that single relocatable objects of that size will not appear for a decade or two. A lot can change in that time, and it is better not to overengineer things by writing code that will sit and rot for years without being used. It is not too soon however to have a plan for that eventuality. When the time comes when this limit needs to be lifted, I believe that there is a simple solution that is consistent with the existing format. The archive member header size field is an ASCII string, like the name, and as such, the overflow scheme used for long names can also be used to handle the size. The size string would be placed into the archive string table, and its offset in the string table would then be written into the archive header size field using the same format "/ddd" used for overflowed names.

Read the article
PASS: Bylaw Changes

- by Bill Graziano

While you’re reading this, a post should be going up on the PASS blog on the plans to change our bylaws. You should be able to find our old bylaws, our proposed bylaws and a red-lined version of the changes. We plan to listen to feedback until March 31st. At that point we’ll decide whether to vote on these changes or take other action. The executive summary is that we’re adding a restriction to prevent more than two people from the same company on the Board and eliminating the Board’s Officer Appointment Committee to have Officers directly elected by the Board. This second change better matches how officer elections have been conducted in the past. The Gritty Details Our scope was to change bylaws to match how PASS actually works and tackle a limited set of issues. Changing the bylaws is hard. We’ve been working on these changes since the March board meeting last year. At that meeting we met and talked through the issues we wanted to address. In years past the Board has tried to come up with language and then we’ve discussed and negotiated to get to the result. In March, we gave HQ guidance on what we wanted and asked them to come up with a starting point. Hannes worked on building us an initial set of changes that we could work our way through. Discussing changes like this over email is difficult wasn’t very productive. We do a much better job on this at the in-person Board meetings. Unfortunately there are only 2 or 3 of those a year. In August we met in Nashville and spent time discussing the changes. That was also the day after we released the slate for the 2010 election. The discussion around that colored what we talked about in terms of these changes. We talked very briefly at the Summit and again reviewed and revised the changes at the Board meeting in January. This is the result of those changes and discussions. We made numerous small changes to clean up language and make wording more clear. We also made two big changes. Director Employment Restrictions The first is that only two people from the same company can serve on the Board at the same time. The actual language in section VI.3 reads: A maximum of two (2) Directors who are employed by, or who are joint owners or partners in, the same for-profit venture, company, organization, or other legal entity, may concurrently serve on the PASS Board of Directors at any time. The definition of “employed” is at the sole discretion of the Board. And what a mess this turns out to be in practice. Our membership is a hodgepodge of interlocking relationships. Let’s say three Board members get together and start a blog service for SQL Server bloggers. It’s technically for-profit. Let’s assume it makes $8 in the first year. Does that trigger this clause? (Technically yes.) We had a horrible time trying to write language that covered everything. All the sample bylaws that we found were just as vague as this. That led to the third clause in this section. The first sentence reads: The Board of Directors reserves the right, strictly on a case-by-case basis, to overrule the requirements of Section VI.3 by majority decision for any single Director’s conflict of employment. We needed some way to handle the trivial issues and exercise some judgment. It seems like a public vote is the best way. This discloses the relationship and gets each Board member on record on the issue. In practice I think this clause will rarely be used. I think this entire section will only be invoked for actual employment issues and not for small side projects. In either case we have the mechanisms in place to handle it in a public, transparent way. That’s the first and third clauses. The second clause says that if your situation changes and you fall afoul of this restriction you need to notify the Board. The clause further states that if this new job means a Board members violates the “two-per-company” rule the Board may request their resignation. The Board can also allow the person to continue serving with a majority vote. I think this will also take some judgment. Consider a person switching jobs that leads to three people from the same company. I’m very likely to ask for someone to resign if all three are two weeks into a two year term. I’m unlikely to ask anyone to resign if one is two weeks away from ending their term. In either case, the decision will be a public vote that we can be held accountable for. One concern that was raised was whether this would affect someone choosing to accept a job. I think that’s a choice for them to make. PASS is clearly stating its intent that only two directors from any one organization should serve at any time. Once these bylaws are approved, this policy should not come as a surprise to any potential or current Board members considering a job change. This clause isn’t perfect. The biggest hole is business relationships that aren’t defined above. Let’s say that two employees from company “X” serve on the Board. What happens if I accept a full-time consulting contract with that company? Let’s assume I’m working directly for one of the two existing Board members. That doesn’t violate section VI.3. But I think it’s clearly the kind of relationship we’d like to prevent. Unfortunately that was even harder to write than what we have now. I fully expect that in the next revision of the bylaws we’ll address this. It just didn’t make it into this one. Officer Elections The officer election process received a slightly different rewrite. Our goal was to codify in the bylaws the actual process we used to elect the officers. The officers are the President, Executive Vice-President (EVP) and Vice-President of Marketing. The Immediate Past President (IPP) is also an officer but isn’t elected. The IPP serves in that role for two years after completing their term as President. We do that for continuity’s sake. Some organizations have a President-elect that serves for one or two years. The group that founded PASS chose to have an IPP. When I started on the Board, the Nominating Committee (NomCom) selected the slate for the at-large directors and the slate for the officers. There was always one candidate for each officer position. It wasn’t really an election so much as the NomCom decided who the next person would be for each officer position. Behind the scenes the Board worked to select the best people for the role. In June 2009 that process was changed to bring it line with what actually happens. An Officer Appointment Committee was created that was a subset of the Board. That committee would take time to interview the candidates and present a slate to the Board for approval. The majority vote of the Board would determine the officers for the next two years. In practice the Board itself interviewed the candidates and conducted the elections. That means it was time to change the bylaws again. Section VII.2 and VII.3 spell out the process used to select the officers. We use the phrase “Officer Appointment” to separate it from the Director election but the end result is that the Board elects the officers. Section VII.3 starts: Officers shall be appointed bi-annually by a majority of all the voting members of the Board of Directors. Everything else revolves around that sentence. We use the word appoint but they truly are elected. There are details in the bylaws for term limits, minimum requirements for President (1 prior term as an officer), tie breakers and filling vacancies. In practice we will have an election for President, then an election for EVP and then an election for VP Marketing. That means that losing candidates will be able to fall down the ladder and run for the next open position. Another point to note is that officers aren’t at-large directors. That means if a current sitting officer loses all three elections they are off the Board. Having Board member votes public will help with the transparency of this approach. This process has a number of positive and negatives. The biggest concern I expect to hear is that our members don’t directly choose the officers. I’m going to try and list all the positives and negatives of this approach. Many non-profits value continuity and are slower to change than a business. On the plus side this promotes that. On the negative side this promotes that. If we change too slowly the members complain that we aren’t responsive. If we change too quickly we make mistakes and fail at various things. We’ve been criticized for both of those lately so I’m not entirely sure where to draw the line. My rough assumption to this point is that we’re going too slow on governance and too quickly on becoming “more than a Summit.” This approach creates competition in the officer elections. If you are an at-large director there is no consequence to losing an election. If you are an officer the only way to stay on the Board is to win an officer election or an at-large election. If you are an officer and lose an election you can always run for the next office down. This makes it very easy for multiple people to contest an election. There is value in a person moving through the officer positions up to the Presidency. Having the Board select the officers promotes this. The down side is that it takes a LOT of time to get to the Presidency. We’ve had good people struggle with burnout. We’ve had lots of discussion around this. The process as we’ve described it here makes it possible for someone to move quickly through the ranks but doesn’t prevent people from working their way up through each role. We talked long and hard about having the officers elected by the members. We had a self-imposed deadline to complete these changes prior to elections this summer. The other challenge was that our original goal was to make the bylaws reflect our actual process rather than create a new one. I believe we accomplished this goal. We ran out of time to consider this option in the detail it needs. Having member elections for officers needs a number of problems solved. We would need a way for candidates to fall through the election. This is what promotes competition. Without this few people would risk an election and we’ll be back to one candidate per slot. We need to do this without having multiple elections. We may be able to copy what other organizations are doing but I was surprised at how little I could find on other organizations. We also need a way for people that lose an officer election to win an at-large election. Otherwise we’ll have very little competition for officers. This brings me to an area that I think we as a Board haven’t done a good job. We haven’t built a strong process to tell you who is doing a good job and who isn’t. This is a double-edged sword. I don’t want to highlight Board members that are failing. That’s not a good way to get people to volunteer and run for the Board. But I also need a way let the members make an informed choice about who is doing a good job and would make a good officer. Encouraging Board members to blog, publishing minutes and making votes public helps in that regard but isn’t the final answer. I don’t know what the final answer is yet. I do know that the Board members themselves are uniquely positioned to know which other Board members are doing good work. They know who speaks up in meetings, who works to build consensus, who has good ideas and who works with the members. What I Could Do Better I’ve learned a lot writing this about how we communicated with our members. The next time we revise the bylaws I’d do a few things differently. The biggest change would be to provide better documentation. The March 2009 minutes provide a very detailed look into what changes we wanted to make to the bylaws. Looking back, I’m a little surprised at how closely they matched our final changes and covered the various arguments. If you just read those you’d get 90% of what we eventually changed. Nearly everything else was just details around implementation. I’d also consider publishing a scope document defining exactly what we were doing any why. I think it really helped that we had a limited, defined goal in mind. I don’t think we did a good job communicating that goal outside the meeting minutes though. That said, I wish I’d blogged more after the August and January meeting. I think it would have helped more people to know that this change was coming and to be ready for it. Conclusion These changes address two big concerns that the Board had. First, it prevents a single organization from dominating the Board. Second, it codifies and clearly spells out how officers are elected. This is the process that was previously followed but it was somewhat murky. These changes bring clarity to this and clearly explain the process the Board will follow. We’re going to listen to feedback until March 31st. At that time we’ll decide whether to approve these changes. I’m also assuming that we’ll start another round of changes in the next year or two. Are there other issues in the bylaws that we should tackle in the future?

Read the article
CodePlex Daily Summary for Monday, January 24, 2011

CodePlex Daily Summary for Monday, January 24, 2011Popular ReleasesDeveloper Guidance - Onboarding Windows Phone 7: Fuel Tracker Source: Release NotesThis project is almost complete. We'll be making small updates to the code and documentation over the next week or so. We look forward to your feedback. This documentation and accompanying sample application will get you started creating a complete application for Windows Phone 7. You will learn about common developer issues in the context of a simple fuel-tracking application named Fuel Tracker. Some of the tasks that you will learn include the following: Creating a UI that...Password Generator: 2.2: Parallel password generation Password strength calculation ( Same method used by Microsoft here : https://www.microsoft.com/protect/fraud/passwords/checker.aspx ) Minor code refactoringProcess Enactment Tool Framework: PET 1.2.3: Bug fixing release. Introduced installer. General Upgraded to Visual Studio 2010 Created installer PET Wizard Fixed log bug if PET is placed in read-only folder SharePoint Tool Provider Fixed provider crash in case SharePoint.dll is not availableVisual Studio 2010 Architecture Tooling Guidance: Spanish - Architecture Guidance: Francisco Fagas http://geeks.ms/blogs/ffagas, Microsoft Most Valuable Professional (MVP), localized the Visual Studio 2010 Quick Reference Guidance for the Spanish communities, based on http://vsarchitectureguide.codeplex.com/releases/view/47828. Release Notes The guidance is available in a xps-only (default) or complete package. The complete package contains the files in xps, pdf and Office 2007 formats. 2011-01-24 Publish version 1.0 of the Spanish localized bits.ASP.NET MVC Project Awesome, jQuery Ajax helpers (controls): 1.6.2: A rich set of helpers (controls) that you can use to build highly responsive and interactive Ajax-enabled Web applications. These helpers include Autocomplete, AjaxDropdown, Lookup, Confirm Dialog, Popup Form, Popup and Pager the html generation has been optimized, the html page size is much smaller nowSSH.NET Library: 2011.1.24: No new features in this release, only fixes to issues reported by users: Fixes Fix SFTP path resolution to allow for client to work with glibc and BSD based servers. Close handle when file download completes Fix reusing command object to execute different commands.Facebook Graph Toolkit: Facebook Graph Toolkit 0.6: new Facebook Graph objects: Application, Page, Post, Comment Improved Intellisense documentation new Graph Api connections: albums, photos, posts, feed, home, friends JSON Toolkit upgraded to version 0.9 (beta release) with bug fixes and new features bug fixed: error when handling empty JSON arrays bug fixed: error when handling JSON array with square or large brackets in the message bug fixed: error when handling JSON obejcts with double quotation in the message bug fixed: erro...Microsoft All-In-One Code Framework: Visual Studio 2008 Code Samples 2011-01-23: Code samples for Visual Studio 2008BloodSim: BloodSim - 1.4.0.0: This version requires an update for WControls.dll. - Removed option to use Old Rune Strike - Fixed an issue that was causing Ratings to not properly update when running Progressive simulations - Ability data is now loaded from an XML file in the BloodSim directory that is user editable. This data will be reloaded each time a fresh simulation is run. - Added toggle for showing Graph window. When unchecked, output data will instead be saved to a text file in the BloodSim directory based on the...MVVM Light Toolkit: MVVM Light Toolkit V3 SP1 (3): Instructions for installation: http://www.galasoft.ch/mvvm/installing/manually/ Includes the hotfix templates for Windows Phone 7 development. This is only relevant if you didn't already install the hotfix described at http://blog.galasoft.ch/archive/2010/07/22/mvvm-light-hotfix-for-windows-phone-7-developer-tools-beta.aspx.Community Forums NNTP bridge: Community Forums NNTP Bridge V42: Release of the Community Forums NNTP Bridge to access the social and anwsers MS forums with a single, open source NNTP bridge. This release has added some features / bugfixes: Bugfix: Decoding of Subject now also supports multi-line subjects (occurs only if you have very long subjects with non-ASCII characters)Minecraft Tools: Minecraft Topographical Survey 1.3: MTS requires version 4 of the .NET Framework - you must download it from Microsoft if you have not previously installed it. This version of MTS adds automatic block list updates, so MTS will recognize blocks added in game updates properly rather than drawing them in bright pink. New in this version of MTS: Support for all new blocks added since the Halloween update Auto-update of blockcolors.xml to support future game updates A splash screen that shows while the program searches for upd...StyleCop for ReSharper: StyleCop for ReSharper 5.1.14996.000: New Features: ============= This release is just compiled against the latest release of JetBrains ReSharper 5.1.1766.4 Previous release: A considerable amount of work has gone into this release: Huge focus on performance around the violation scanning subsystem: - caching added to reduce IO operations around reading and merging of settings files - caching added to reduce creation of expensive objects Users should notice condsiderable perf boost and a decrease in memory usage. Bug Fixes...jqGrid ASP.Net MVC Control: Version 1.2.0.0: jqGrid 3.8 support jquery 1.4 support New and exciting features Many bugfixes Complete separation from the jquery, & jqgrid codeMediaScout: MediaScout 3.0 Preview 4: Update ReleaseCoding4Fun Tools: Coding4Fun.Phone.Toolkit v1: Coding4Fun.Phone.Toolkit v1MFCMAPI: January 2011 Release: Build: 6.0.0.1024 Full release notes at SGriffin's blog. If you just want to run the tool, get the executable. If you want to debug it, get the symbol file and the source. The 64 bit build will only work on a machine with Outlook 2010 64 bit installed. All other machines should use the 32 bit build, regardless of the operating system. Facebook BadgeAutoLoL: AutoLoL v1.5.4: Added champion: Renekton Removed automatic file association Fix: The recent files combobox didn't always open a file when an item was selected Fix: Removing a recently opened file caused an errorDotNetNuke® Community Edition: 05.06.01: Major Highlights Fixed issue to remove preCondition checks when upgrading to .Net 4.0 Fixed issue where some valid domains were failing email validation checks. Fixed issue where editing Host menu page settings assigns the page to a Portal. Fixed issue which caused XHTML validation problems in 5.6.0 Fixed issue where an aspx page in any subfolder was inaccessible. Fixed issue where Config.Touch method signature had an unintentional breaking change in 5.6.0 Fixed issue which caused...iTracker Asp.Net Starter Kit: Version 3.0.0: This is the inital release of the version 3.0.0 Visual Studio 2010 (.Net 4.0) remake of the ITracker application. I connsider this a working, stable application but since there are still some features missing to make it "complete" I'm leaving it listed as a "beta" release. I am hoping to make it feature complete for v3.1.0 but anything is possible.New ProjectsAIMP2 .NET Interop Plugin: AIMP2 .NET Interop Plugin allows to write plugin for AIMP2 with managed code (C#, VB and so on).asp.net c# website: hot millions project is an asp.net c# sql express website where you can login and book and order your menu items. This project is built in Visual studio 2008 with asp.net c# sql express editionAutomatic Build Updater: A small tool with scripts to show the recent build of your program on your homepage or blog. It works fully automatically and is easy to use. Works with each IDE which supports Build-Events.BurakoNET: Burako OnLineDelNet: Tool created in C # to automate the development process. With this tool, you do not need to spend time with the creation of classes of connection, data access layer, view or update data with Windows Forms. Everything is here, ready.Droid Builder: Droid Builder is a WPF based application that can help Android ROM maker to create customized ROM. The basic functions of Droid Build are: 1. Deodex 2. Packaging 3. Signing In future, Droid Builder will provide more function on Android ROM customization.eatables: Eatables 2011Food information system: Food information system makes it easier for content providers to deliver information about food producers, contents and information about quality, recipes, allergy and other considerations to the consumers. ForceHttps: Very simple httpModule to ensure that all http requests are https. 1. Drop the dll into your website's bin directory 2. Update httpModules section of your web.config to reference the dll 3. Make sure your Web Server can handle SSL requests for the URL ALL http requests will be redirected as https.Frolic Game Engine for .NET: The Frolic game engine is an attempt at creating a retained style game engine that is agnostic to the rendering technology, allowing for easier inplementations of rendering engines for various graphics layers.Import VCF to MS Outlook: simple tools to import many *.vcf files to default contact in MS OutlookLe blog à TiK: Le blog à TiK is a blog engine written for the Windows Azure platformLearning CPP: Learning CPP.MinConsole: A minimilastic console application with configurable logging and debuggingMineSweeperChallenge: C# programming exercise. Simulates a given number of minesweeper games using a given ISweeper implementation (or use the step by step mode to study how the ISweeper implementations work). Create your own implementation and see how it compares to other implementations.MvcCss: MvcCss brings the convention-based simplicity of MVC to CSS.Natur Map Editor: Natur allows you to easily create and manage maps in 3D using the Xna framework. As you build your map, you can enter simulation mode at any time; this will activate bounds, physics, and any game logic such as animations and triggers.OneGames Open Framework: Projeto open-source da OneGames de desenvolvimento de jogos 2D com a engine SpriteCraft(tm).QQFindX: QQFindXsadd.practical.approach: SADD workshop is a project testorage to demonstrate practical approach of sadd while Lessons Learned site was developing. ?????? ???? ?????????? ??????????? ??? ?? ???????? ?????????? SADD ??? ???????? ????? "????????? ??????", ??????????? ????????? ??????????????? ?????? ?????.Seesmic Feed Reader: A simple plugin for Seesmic Desktop 2 that allows you to read syndication feeds (Rss 2.0 and Atom 1.0) inside Seesmic.Simple Farseer physics platform: Sample described on rabidlion.comSIN Generator: Social Insurance Number (SIN) Generator is a utility designed to generate SIN numbers to testing application software. It uses SIN generation algorithm provided on fakenamegenerator.com It is developed in C# and is meant for application testing ONLY!UsingLib: A library of automatically removed utilities: 1.Changing cursor to hourglass in Windows Forms 2.Logging It's developed in C#WP7Contrib: WP7 Contrib is a set of components to help build WP7 Apps. It can be plugged into MVVM Light or used as separate components in your App. Our goal is to provide a set of tools and patterns that help WP7 developers.

Read the article
How to simulate inner join on very large files in java (without running out of memory)

- by Constantin

I am trying to simulate SQL joins using java and very large text files (INNER, RIGHT OUTER and LEFT OUTER). The files have already been sorted using an external sort routine. The issue I have is I am trying to find the most efficient way to deal with the INNER join part of the algorithm. Right now I am using two Lists to store the lines that have the same key and iterate through the set of lines in the right file once for every line in the left file (provided the keys still match). In other words, the join key is not unique in each file so would need to account for the Cartesian product situations ... left_01, 1 left_02, 1 right_01, 1 right_02, 1 right_03, 1 left_01 joins to right_01 using key 1 left_01 joins to right_02 using key 1 left_01 joins to right_03 using key 1 left_02 joins to right_01 using key 1 left_02 joins to right_02 using key 1 left_02 joins to right_03 using key 1 My concern is one of memory. I will run out of memory if i use the approach below but still want the inner join part to work fairly quickly. What is the best approach to deal with the INNER join part keeping in mind that these files may potentially be huge public class Joiner { private void join(BufferedReader left, BufferedReader right, BufferedWriter output) throws Throwable { BufferedReader _left = left; BufferedReader _right = right; BufferedWriter _output = output; Record _leftRecord; Record _rightRecord; _leftRecord = read(_left); _rightRecord = read(_right); while( _leftRecord != null && _rightRecord != null ) { if( _leftRecord.getKey() < _rightRecord.getKey() ) { write(_output, _leftRecord, null); _leftRecord = read(_left); } else if( _leftRecord.getKey() > _rightRecord.getKey() ) { write(_output, null, _rightRecord); _rightRecord = read(_right); } else { List<Record> leftList = new ArrayList<Record>(); List<Record> rightList = new ArrayList<Record>(); _leftRecord = readRecords(leftList, _leftRecord, _left); _rightRecord = readRecords(rightList, _rightRecord, _right); for( Record equalKeyLeftRecord : leftList ){ for( Record equalKeyRightRecord : rightList ){ write(_output, equalKeyLeftRecord, equalKeyRightRecord); } } } } if( _leftRecord != null ) { write(_output, _leftRecord, null); _leftRecord = read(_left); while(_leftRecord != null) { write(_output, _leftRecord, null); _leftRecord = read(_left); } } else { if( _rightRecord != null ) { write(_output, null, _rightRecord); _rightRecord = read(_right); while(_rightRecord != null) { write(_output, null, _rightRecord); _rightRecord = read(_right); } } } _left.close(); _right.close(); _output.flush(); _output.close(); } private Record read(BufferedReader reader) throws Throwable { Record record = null; String data = reader.readLine(); if( data != null ) { record = new Record(data.split("\t")); } return record; } private Record readRecords(List<Record> list, Record record, BufferedReader reader) throws Throwable { int key = record.getKey(); list.add(record); record = read(reader); while( record != null && record.getKey() == key) { list.add(record); record = read(reader); } return record; } private void write(BufferedWriter writer, Record left, Record right) throws Throwable { String leftKey = (left == null ? "null" : Integer.toString(left.getKey())); String leftData = (left == null ? "null" : left.getData()); String rightKey = (right == null ? "null" : Integer.toString(right.getKey())); String rightData = (right == null ? "null" : right.getData()); writer.write("[" + leftKey + "][" + leftData + "][" + rightKey + "][" + rightData + "]\n"); } public static void main(String[] args) { try { BufferedReader leftReader = new BufferedReader(new FileReader("LEFT.DAT")); BufferedReader rightReader = new BufferedReader(new FileReader("RIGHT.DAT")); BufferedWriter output = new BufferedWriter(new FileWriter("OUTPUT.DAT")); Joiner joiner = new Joiner(); joiner.join(leftReader, rightReader, output); } catch (Throwable e) { e.printStackTrace(); } } } After applying the ideas from the proposed answer, I changed the loop to this private void join(RandomAccessFile left, RandomAccessFile right, BufferedWriter output) throws Throwable { long _pointer = 0; RandomAccessFile _left = left; RandomAccessFile _right = right; BufferedWriter _output = output; Record _leftRecord; Record _rightRecord; _leftRecord = read(_left); _rightRecord = read(_right); while( _leftRecord != null && _rightRecord != null ) { if( _leftRecord.getKey() < _rightRecord.getKey() ) { write(_output, _leftRecord, null); _leftRecord = read(_left); } else if( _leftRecord.getKey() > _rightRecord.getKey() ) { write(_output, null, _rightRecord); _pointer = _right.getFilePointer(); _rightRecord = read(_right); } else { long _tempPointer = 0; int key = _leftRecord.getKey(); while( _leftRecord != null && _leftRecord.getKey() == key ) { _right.seek(_pointer); _rightRecord = read(_right); while( _rightRecord != null && _rightRecord.getKey() == key ) { write(_output, _leftRecord, _rightRecord ); _tempPointer = _right.getFilePointer(); _rightRecord = read(_right); } _leftRecord = read(_left); } _pointer = _tempPointer; } } if( _leftRecord != null ) { write(_output, _leftRecord, null); _leftRecord = read(_left); while(_leftRecord != null) { write(_output, _leftRecord, null); _leftRecord = read(_left); } } else { if( _rightRecord != null ) { write(_output, null, _rightRecord); _rightRecord = read(_right); while(_rightRecord != null) { write(_output, null, _rightRecord); _rightRecord = read(_right); } } } _left.close(); _right.close(); _output.flush(); _output.close(); } UPDATE While this approach worked, it was terribly slow and so I have modified this to create files as buffers and this works very well. Here is the update ... private long getMaxBufferedLines(File file) throws Throwable { long freeBytes = Runtime.getRuntime().freeMemory() / 2; return (freeBytes / (file.length() / getLineCount(file))); } private void join(File left, File right, File output, JoinType joinType) throws Throwable { BufferedReader leftFile = new BufferedReader(new FileReader(left)); BufferedReader rightFile = new BufferedReader(new FileReader(right)); BufferedWriter outputFile = new BufferedWriter(new FileWriter(output)); long maxBufferedLines = getMaxBufferedLines(right); Record leftRecord; Record rightRecord; leftRecord = read(leftFile); rightRecord = read(rightFile); while( leftRecord != null && rightRecord != null ) { if( leftRecord.getKey().compareTo(rightRecord.getKey()) < 0) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); } else if( leftRecord.getKey().compareTo(rightRecord.getKey()) > 0 ) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); } else if( leftRecord.getKey().compareTo(rightRecord.getKey()) == 0 ) { String key = leftRecord.getKey(); List<File> rightRecordFileList = new ArrayList<File>(); List<Record> rightRecordList = new ArrayList<Record>(); rightRecordList.add(rightRecord); rightRecord = consume(key, rightFile, rightRecordList, rightRecordFileList, maxBufferedLines); while( leftRecord != null && leftRecord.getKey().compareTo(key) == 0 ) { processRightRecords(outputFile, leftRecord, rightRecordFileList, rightRecordList, joinType); leftRecord = read(leftFile); } // need a dispose for deleting files in list } else { throw new Exception("DATA IS NOT SORTED"); } } if( leftRecord != null ) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); while(leftRecord != null) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); } } else { if( rightRecord != null ) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); while(rightRecord != null) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); } } } leftFile.close(); rightFile.close(); outputFile.flush(); outputFile.close(); } public void processRightRecords(BufferedWriter outputFile, Record leftRecord, List<File> rightFiles, List<Record> rightRecords, JoinType joinType) throws Throwable { for(File rightFile : rightFiles) { BufferedReader rightReader = new BufferedReader(new FileReader(rightFile)); Record rightRecord = read(rightReader); while(rightRecord != null){ if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.RightOuterJoin || joinType == JoinType.FullOuterJoin || joinType == JoinType.InnerJoin ) { write(outputFile, leftRecord, rightRecord); } rightRecord = read(rightReader); } rightReader.close(); } for(Record rightRecord : rightRecords) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.RightOuterJoin || joinType == JoinType.FullOuterJoin || joinType == JoinType.InnerJoin ) { write(outputFile, leftRecord, rightRecord); } } } /** * consume all records having key (either to a single list or multiple files) each file will * store a buffer full of data. The right record returned represents the outside flow (key is * already positioned to next one or null) so we can't use this record in below while loop or * within this block in general when comparing current key. The trick is to keep consuming * from a List. When it becomes empty, re-fill it from the next file until all files have * been consumed (and the last node in the list is read). The next outside iteration will be * ready to be processed (either it will be null or it points to the next biggest key * @throws Throwable * */ private Record consume(String key, BufferedReader reader, List<Record> records, List<File> files, long bufferMaxRecordLines ) throws Throwable { boolean processComplete = false; Record record = records.get(records.size() - 1); while(!processComplete){ long recordCount = records.size(); if( record.getKey().compareTo(key) == 0 ){ record = read(reader); while( record != null && record.getKey().compareTo(key) == 0 && recordCount < bufferMaxRecordLines ) { records.add(record); recordCount++; record = read(reader); } } processComplete = true; // if record is null, we are done if( record != null ) { // if the key has changed, we are done if( record.getKey().compareTo(key) == 0 ) { // Same key means we have exhausted the buffer. // Dump entire buffer into a file. The list of file // pointers will keep track of the files ... processComplete = false; dumpBufferToFile(records, files); records.clear(); records.add(record); } } } return record; } /** * Dump all records in List of Record objects to a file. Then, add that * file to List of File objects * * NEED TO PLACE A LIMIT ON NUMBER OF FILE POINTERS (check size of file list) * * @param records * @param files * @throws Throwable */ private void dumpBufferToFile(List<Record> records, List<File> files) throws Throwable { String prefix = "joiner_" + files.size() + 1; String suffix = ".dat"; File file = File.createTempFile(prefix, suffix, new File("cache")); BufferedWriter writer = new BufferedWriter(new FileWriter(file)); for( Record record : records ) { writer.write( record.dump() ); } files.add(file); writer.flush(); writer.close(); }

Read the article
CodePlex Daily Summary for Tuesday, November 27, 2012

CodePlex Daily Summary for Tuesday, November 27, 2012Popular ReleasesCodeGen Code Generator: CodeGen 4.2.6: IMPORTANT: If you are using CodeGen in conjunction with Symphony Framework then it is important that you do not upgrade to this version of CodeGen until you also upgrade to Symphony Framework V2.1.0.0. Changes in this release include: The CodeGen installation on Windows now supports upgrading from a previously installed version. We use Windows Installers "major upgrade" mechanism, which essentially performs an automatic uninstall of a previous version before installing the new version. The ea...SimpleRest Integrated Pipeline: Beta: Beta version of the integrated REST pipeline.Kooboo CMS: Kooboo CMS 3.3.0: New features: Dropdown/Radio/Checkbox Lists no longer references the userkey. Instead they refer to the UUID field for input value. You can now delete, export, import content from database in the site settings. Labels can now be imported and exported. You can now set the required password strength and maximum number of incorrect login attempts. Child sites can inherit plugins from its parent sites. The view parameter can be changed through the page_context.current value. Addition of c...Facebook Windows 8 Sample: Facebook Windows 8 Sample: The current drop holds two versions of the sample: A basic version that uses a Facebook application to list the content of facebook page. A full version including the use of Bing Maps sdk for positioning the restaurant in a map, and showing how to get there. See Developing a Windows Store App to learn how to use the Bing Maps AJAX Control to add Bing Maps to your Windows Store app.Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.75: Fix over-aggressive removal of local-variable assignments in return statements. Can't remove them if there are any inner-scope references. add settings properties/switches for V3 source map source root value, and a flag to indicate whether to add an XSSI-busting header to the map file.Distributed Publish/Subscribe (Pub/Sub) Event System: Distributed Pub Sub Event System Version 3.0: Important Wsp 3.0 is NOT backward compatible with Wsp 2.1. Prerequisites You need to install the Microsoft Visual C++ 2010 Redistributable Package. You can find it at: x64 http://www.microsoft.com/download/en/details.aspx?id=14632x86 http://www.microsoft.com/download/en/details.aspx?id=5555 Wsp now uses Rx (Reactive Extensions) and .Net 4.0 3.0 Enhancements I changed the topology from a hierarchy to peer-to-peer groups. This should provide much greater scalability and more fault-resi...sb0t v.5: sb0t 500 alpha 5: Keep those bug reports coming. :)Redmine Reports: Redmine Reports V 1.0.7: added new sample report added new DateRange feature for report generation (see issue Tracker ID 15372) updated to latest MySql (6.6.4.0)datajs - JavaScript Library for data-centric web applications: datajs version 1.1.0: datajs is a cross-browser and UI agnostic JavaScript library that enables data-centric web applications with the following features: OData client that enables CRUD operations including batching and metadata support using both ATOM and JSON payloads. Single store abstraction that provides a common API on top of HTML5 local storage technologies. Data cache component that allows reading data ranges from a collection and storing them locally to reduce the number of network requests. Changes...Team Foundation Server Administration Tool: 2.2: TFS Administration Tool 2.2 supports the Team Foundation Server 2012 Object Model. Visual Studio 2012 or Team Explorer 2012 must be installed before you can install this tool. You can download and install Team Explorer 2012 from http://aka.ms/TeamExplorer2012. There are no functional changes between the previous release (2.1) and this release.Coding Guidelines for C# 3.0, C# 4.0 and C# 5.0: Coding Guidelines for CSharp 3.0, 4.0 and 5.0: See Change History for a detailed list of modifications.Math.NET Numerics: Math.NET Numerics v2.3.0: Portable Library Build: Adds support for WP8 (.Net 4.0 and higher, SL5, WP8 and .NET for Windows Store apps) New: portable build also for F# extensions (.Net 4.5, SL5 and .NET for Windows Store apps) NuGet: portable builds are now included in the main packages, no more need for special portable packages Linear Algebra: Continued major storage rework, in this release focusing on vectors (previous release was on matrices) Thin QR decomposition (in addition to existing full QR) Static Cr...ExtJS based ASP.NET 2.0 Controls: FineUI v3.2.1: +2012-11-25 v3.2.1 +????????。 -MenuCheckBox?CheckedChanged??????，??????????。 -???????window.IDS??????????????。 -?????（??TabCollection，ControlBaseCollection）???，????????????????。 +Grid??。 -??SelectAllRows??。 -??PageItems??，?????????????，?????、??、?????。 -????grid/gridpageitems.aspx、grid/gridpageitemsrowexpander.aspx、grid/gridpageitems_pagesize.aspx。 -???????????????????。 -??ExpandAllRowExpanders??，?????????????????（grid/gridrowexpanderexpandall2.aspx）。 -??????ExpandRowExpande...VidCoder: 1.4.9 Beta: Updated HandBrake core to SVN 5079. Fixed crashes when encoding DVDs with title gaps.ZXing.Net: ZXing.Net 0.10.0.0: On the way to a release 1.0 the API should be stable now with this version. sync with rev. 2521 of the java version windows phone 8 assemblies improvements and fixesBlackJumboDog: Ver5.7.3: 2012.11.24 Ver5.7.3 (1)SMTP???????、?????????、??????????????????????? (2)?????????、?????????????????????????? (3)DNS???????CNAME????CNAME????????????????? (4)DNS????????????TTL???????? (5)???????????????????????、?????????????????? (6)???????????????????????????????Liberty: v3.4.3.0 Release 23rd November 2012: Change Log -Added -H4 A dialog which gives further instructions when attempting to open a "Halo 4 Data" file -H4 Added a short note to the weapon editor stating that dropping your weapons will cap their ammo -Reach Edit the world's gravity -Reach Fine invincibility controls in the object editor -Reach Edit object velocity -Reach Change the teams of AI bipeds and vehicles -Reach Enable/disable fall damage on the biped editor screen -Reach Make AIs deaf and/or blind in the objec...Umbraco CMS: Umbraco 4.11.0: NugetNuGet BlogRead the release blog post for 4.11.0. Whats new50 bugfixes (see the issue tracker for a complete list) Read the documentation for the MVC bits. Breaking changesGetPropertyValue now returns an object, not a string (only affects upgrades from 4.10.x to 4.11.0) NoteIf you need Courier use the release candidate (as of build 26). The code editor has been greatly improved, but is sometimes problematic in Internet Explorer 9 and lower. Previously it was just disabled for IE and...Audio Pitch & Shift: Audio Pitch And Shift 5.1.0.3: Fixed supported files list on open dialog (added .pls and .m3u) Impulse Media Player splash message (can be disabled anyway)WiX Toolset: WiX v3.7 RC: WiX v3.7 RC (3.7.1119.0) provides feature complete Bundle update and reference tracking plus several bug fixes. For more information see Rob's blog post about the release: http://robmensching.com/blog/posts/2012/11/20/WiX-v3.7-Release-Candidate-availableNew Projects20121126: ??????SVN????。AHMobe: Testing deployments to AppHarborArborium: A versatile, tree-based data structure to store or exchange data and metadata efficiently (in binary format). Written in pure C#.Ballenato: Mobile app salidas.Blend Assets Manager (Blasm): Blend Assets Manager (Blasm) is a simple application to allow users of Expression Blend adding custom and third-party controls to Blend Assets tab.Cornell Class Explorer: Cornell University Courses of Study Windows 8 AppCustom Cursor for Metro APP XAML Based: It's a porting program from Nielsen for win 8 Metro Style app XAML based. original: http://www.sharpgis.net/post/2011/05/09/Custom-Cursors-in-Silverlight.aspxDataAnalyzer: Application for analyzing protocols and other binary dataDebugWriterTextBox: This is modified TextBox which can catch up Debug.Write() and display log. Also it can write log data to file - all you need is to set up file name!Dewin: Solution th? nghi?m cho chuong trình Dewin, dùng d? th?c hi?n co b?n v? hu?ng d?i tu?ngEducação no Trânsito: Sistema para Educação para o Trânsito – Um Ambiente para o Aprendizado tem como principal função gerir conteúdos de educação no trânsitoEjemplos alabra: Ejemplos para el blog http://www.alejandrolabra.comEnterprise MVC Music Store: The Enterprise Music Store takes the MvcMusicStore sample from ASP.Net and adds Dependency Injection, Unit Tests and a more maintainable architecture. ERC - Easy Redirect Converter: Tool to convert websites using .htaccess to maintain redirects on sites that do not support it. Great for moving from Apache to IIS Facebook Windows 8 Sample: This sample shows a way to work with Facebook APIs by using the Open Graph API in a Windows Store App. Gruyas: Plataforma educativa online Os like.G's Syndication Pocket: G's Syndication Pocket is simple RSS Aggregate application. This is suitable for .NET Compact Framework. I checked it on Sharp's W-ZERO3.HTML to OpenXML (PHP Script): H2OXML : HTML to OpenXML Converter is a simple PHP script which take HTML code and transform it into OpenXML Code. (for Docx) Indexr: Indexr is an open source project, where I am trying to share how to build a web application with Strong Architecture, Manageable, High Performance, EffectivelyjQuery Expandable Menu for SharePoint 2010: Packaged as a site collection feature, this component transforms the SharePoint 2010 standard navigation menu into an expandable-collapsible menu, using jQuery.K-Vizinhos: K-VizinhosLa Ranisima: La Ranisima is an open source "Space Invaders" alike game totally written in DHTML (JavaScript, CSS and HTML) that uses keyboard. This cross-platform and cross-browser game was tested under BeOS, Linux, NetBSD, OpenBSD, FreeBSD, Windows and others.lambdaCommandBuilderData: ???????????????????，????????????，???????。 ??????test??，?????????。LeyRay: Quick view and merge Doc and Pdf filesMessegeBox RightToLeft Lib: This is really simple lib project for use RTL in MessegeBox class. This just for short code and default option for RTL.MineFlagger: MineFlagger is a mine clearing game modeled after Microsoft’s Minesweeper. In addition to standard play, MineFlagger incorporates an AI for fun and training.MineSweeperChallenge: C# programming exercise. Simulates a given number of minesweeper games using a given ISweeper implementation (or use the step by step mode to study how the ISweeper implementations work). Create your own implementation and see how it compares to other implementations.MVVM Light Plus: Multiplatform MVVM Framework.Nethouse MVP: Nethouse MVP is a framework incorporating MVP base presenters, views and interfaces. Neznayka: Static Ruleset for VS2010 Database Edition. Noctl Library: Noctl is a C# library which contains tools to improve production time. It supports .net 4.5, Windows Phone 8 and Windows Store applications.Oridea.Data.Fetching: Oridea.Data.Fetching is a class library that consists of a few wrappers over the LINQ's IQueryable interface narrowing its scope to fetching, ordering, and paging operations. It is designed for use in the implementations of the Repository pattern.p301: Old project.PluggEd: To be continuedProject13241127: papaProject13271127: papareading control for windows phone: reading control for windows phoneRSUtility: RS Utility is a C# application that interacts with a SQL Server Reporting Services (SSRS) web service to manage items on a report server.sadd.practical.approach: SADD workshop is a project testorage to demonstrate practical approach of sadd while Lessons Learned site was developing. ?????? ???? ?????????? ??????????? ??? ?? ???????? ?????????? SADD ??? ???????? ????? "????????? ??????", ??????????? ????????? ??????????????? ?????? ?????.SimpleRest Integrated Pipeline: Simple light weight integrated extensible REST pipeline that developers can use to very quickly and reliably create REST services. Influenced by WebApi 0..6.0.0Substitute - Variable Substitution Utility for Config Management: Variable Substitution Utility for Configuration Management (FinalBuilder etc)Synq Placement Management Console: A client-service system for dynamic application deployment which integrates directly into active directory. This is the management console.Task Manager Student Project: Task manager or tmanager is a student ASP.NET MVC 4.0 project for Software Engineering classes. It is a web site, where you can planning your tasks.Tempus Fugate: YAFA for tracking music ratings and statistics. End state of these tools, utilities, plug-ins, and services will be to allow merging of statistic data between the various services and repositories in which users store their musical preferences. Examples of musical preferences include rating information and play history. Examples of services include last.fm, Facebook, iLike, Windows Media Player, and iTunes. tysyjsj: afaprojectWASM: Simple ASP.MVC windows azure storage manager with SQL Azure query window.Weather Slice: Use the German weather sevice Wetter.com for a weather forecast gadget Wettervorhersage von Wetter.comwwtfly: 111YBOT_Field_Control_2013: YBOT 2013 Field Control Software. Used to control the Youth BOT game field. Score the games and track field actions. The field currently uses 1-wire.??_iwtfly: ??_iwtfly

Read the article
Selling Federal Enterprise Architecture (EA)

- by TedMcLaughlan

Selling Federal Enterprise Architecture A taxonomy of subject areas, from which to develop a prioritized marketing and communications plan to evangelize EA activities within and among US Federal Government organizations and constituents. Any and all feedback is appreciated, particularly in developing and extending this discussion as a tool for use – more information and details are also available. "Selling" the discipline of Enterprise Architecture (EA) in the Federal Government (particularly in non-DoD agencies) is difficult, notwithstanding the general availability and use of the Federal Enterprise Architecture Framework (FEAF) for some time now, and the relatively mature use of the reference models in the OMB Capital Planning and Investment (CPIC) cycles. EA in the Federal Government also tends to be a very esoteric and hard to decipher conversation – early apologies to those who agree to continue reading this somewhat lengthy article. Alignment to the FEAF and OMB compliance mandates is long underway across the Federal Departments and Agencies (and visible via tools like PortfolioStat and ITDashboard.gov – but there is still a gap between the top-down compliance directives and enablement programs, and the bottom-up awareness and effective use of EA for either IT investment management or actual mission effectiveness. "EA isn't getting deep enough penetration into programs, components, sub-agencies, etc.", verified a panelist at the most recent EA Government Conference in DC. Newer guidance from OMB may be especially difficult to handle, where bottom-up input can't be accurately aligned, analyzed and reported via standardized EA discipline at the Agency level – for example in addressing the new (for FY13) Exhibit 53D "Agency IT Reductions and Reinvestments" and the information required for "Cloud Computing Alternatives Evaluation" (supporting the new Exhibit 53C, "Agency Cloud Computing Portfolio"). Therefore, EA must be "sold" directly to the communities that matter, from a coordinated, proactive messaging perspective that takes BOTH the Program-level value drivers AND the broader Agency mission and IT maturity context into consideration. Selling EA means persuading others to take additional time and possibly assign additional resources, for a mix of direct and indirect benefits – many of which aren't likely to be realized in the short-term. This means there's probably little current, allocated budget to work with; ergo the challenge of trying to sell an "unfunded mandate". Also, the concept of "Enterprise" in large Departments like Homeland Security tends to cross all kinds of organizational boundaries – as Richard Spires recently indicated by commenting that "...organizational boundaries still trump functional similarities. Most people understand what we're trying to do internally, and at a high level they get it. The problem, of course, is when you get down to them and their system and the fact that you're going to be touching them...there's always that fear factor," Spires said. It is quite clear to the Federal IT Investment community that for EA to meet its objective, understandable, relevant value must be measured and reported using a repeatable method – as described by GAO's recent report "Enterprise Architecture Value Needs To Be Measured and Reported". What's not clear is the method or guidance to sell this value. In fact, the current GAO "Framework for Assessing and Improving Enterprise Architecture Management (Version 2.0)", a.k.a. the "EAMMF", does not include words like "sell", "persuade", "market", etc., except in reference ("within Core Element 19: Organization business owner and CXO representatives are actively engaged in architecture development") to a brief section in the CIO Council's 2001 "Practical Guide to Federal Enterprise Architecture", entitled "3.3.1. Develop an EA Marketing Strategy and Communications Plan." Furthermore, Core Element 19 of the EAMMF is advised to be applied in "Stage 3: Developing Initial EA Versions". This kind of EA sales campaign truly should start much earlier in the maturity progress, i.e. in Stages 0 or 1. So, what are the understandable, relevant benefits (or value) to sell, that can find an agreeable, participatory audience, and can pave the way towards success of a longer-term, funded set of EA mechanisms that can be methodically measured and reported? Pragmatic benefits from a useful EA that can help overcome the fear of change? And how should they be sold? Following is a brief taxonomy (it's a taxonomy, to help organize SME support) of benefit-related subjects that might make the most sense, in creating the messages and organizing an initial "engagement plan" for evangelizing EA "from within". An EA "Sales Taxonomy" of sorts. We're not boiling the ocean here; the subjects that are included are ones that currently appear to be urgently relevant to the current Federal IT Investment landscape. Note that successful dialogue in these topics is directly usable as input or guidance for actually developing early-stage, "Fit-for-Purpose" (a DoDAF term) Enterprise Architecture artifacts, as prescribed by common methods found in most EA methodologies, including FEAF, TOGAF, DoDAF and our own Oracle Enterprise Architecture Framework (OEAF). The taxonomy below is organized by (1) Target Community, (2) Benefit or Value, and (3) EA Program Facet - as in: "Let's talk to (1: Community Member) about how and why (3: EA Facet) the EA program can help with (2: Benefit/Value)". Once the initial discussion targets and subjects are approved (that can be measured and reported), a "marketing and communications plan" can be created. A working example follows the Taxonomy. Enterprise Architecture Sales Taxonomy Draft, Summary Version 1. Community 1.1. Budgeted Programs or Portfolios Communities of Purpose (CoPR) 1.1.1. Program/System Owners (Senior Execs) Creating or Executing Acquisition Plans 1.1.2. Program/System Owners Facing Strategic Change 1.1.2.1. Mandated 1.1.2.2. Expected/Anticipated 1.1.3. Program Managers - Creating Employee Performance Plans 1.1.4. CO/COTRs – Creating Contractor Performance Plans, or evaluating Value Engineering Change Proposals (VECP) 1.2. Governance & Communications Communities of Practice (CoP) 1.2.1. Policy Owners 1.2.1.1. OCFO 1.2.1.1.1. Budget/Procurement Office 1.2.1.1.2. Strategic Planning 1.2.1.2. OCIO 1.2.1.2.1. IT Management 1.2.1.2.2. IT Operations 1.2.1.2.3. Information Assurance (Cyber Security) 1.2.1.2.4. IT Innovation 1.2.1.3. Information-Sharing/ Process Collaboration (i.e. policies and procedures regarding Partners, Agreements) 1.2.2. Governing IT Council/SME Peers (i.e. an "Architects Council") 1.2.2.1. Enterprise Architects (assumes others exist; also assumes EA participants aren't buried solely within the CIO shop) 1.2.2.2. Domain, Enclave, Segment Architects – i.e. the right affinity group for a "shared services" EA structure (per the EAMMF), which may be classified as Federated, Segmented, Service-Oriented, or Extended 1.2.2.3. External Oversight/Constraints 1.2.2.3.1. GAO/OIG & Legal 1.2.2.3.2. Industry Standards 1.2.2.3.3. Official public notification, response 1.2.3. Mission Constituents Participant & Analyst Community of Interest (CoI) 1.2.3.1. Mission Operators/Users 1.2.3.2. Public Constituents 1.2.3.3. Industry Advisory Groups, Stakeholders 1.2.3.4. Media 2. Benefit/Value (Note the actual benefits may not be discretely attributable to EA alone; EA is a very collaborative, cross-cutting discipline.) 2.1. Program Costs – EA enables sound decisions regarding... 2.1.1. Cost Avoidance – a TCO theme 2.1.2. Sequencing – alignment of capability delivery 2.1.3. Budget Instability – a Federal reality 2.2. Investment Capital – EA illuminates new investment resources via... 2.2.1. Value Engineering – contractor-driven cost savings on existing budgets, direct or collateral 2.2.2. Reuse – reuse of investments between programs can result in savings, chargeback models; avoiding duplication 2.2.3. License Refactoring – IT license & support models may not reflect actual or intended usage 2.3. Contextual Knowledge – EA enables informed decisions by revealing... 2.3.1. Common Operating Picture (COP) – i.e. cross-program impacts and synergy, relative to context 2.3.2. Expertise & Skill – who truly should be involved in architectural decisions, both business and IT 2.3.3. Influence – the impact of politics and relationships can be examined 2.3.4. Disruptive Technologies – new technologies may reduce costs or mitigate risk in unanticipated ways 2.3.5. What-If Scenarios – can become much more refined, current, verifiable; basis for Target Architectures 2.4. Mission Performance – EA enables beneficial decision results regarding... 2.4.1. IT Performance and Optimization – towards 100% effective, available resource utilization 2.4.2. IT Stability – towards 100%, real-time uptime 2.4.3. Agility – responding to rapid changes in mission 2.4.4. Outcomes –measures of mission success, KPIs – vs. only "Outputs" 2.4.5. Constraints – appropriate response to constraints 2.4.6. Personnel Performance – better line-of-sight through performance plans to mission outcome 2.5. Mission Risk Mitigation – EA mitigates decision risks in terms of... 2.5.1. Compliance – all the right boxes are checked 2.5.2. Dependencies –cross-agency, segment, government 2.5.3. Transparency – risks, impact and resource utilization are illuminated quickly, comprehensively 2.5.4. Threats and Vulnerabilities – current, realistic awareness and profiles 2.5.5. Consequences – realization of risk can be mapped as a series of consequences, from earlier decisions or new decisions required for current issues 2.5.5.1. Unanticipated – illuminating signals of future or non-symmetric risk; helping to "future-proof" 2.5.5.2. Anticipated – discovering the level of impact that matters 3. EA Program Facet (What parts of the EA can and should be communicated, using business or mission terms?) 3.1. Architecture Models – the visual tools to be created and used 3.1.1. Operating Architecture – the Business Operating Model/Architecture elements of the EA truly drive all other elements, plus expose communication channels 3.1.2. Use Of – how can the EA models be used, and how are they populated, from a reasonable, pragmatic yet compliant perspective? What are the core/minimal models required? What's the relationship of these models, with existing system models? 3.1.3. Scope – what level of granularity within the models, and what level of abstraction across the models, is likely to be most effective and useful? 3.2. Traceability – the maturity, status, completeness of the tools 3.2.1. Status – what in fact is the degree of maturity across the integrated EA model and other relevant governance models, and who may already be benefiting from it? 3.2.2. Visibility – how does the EA visibly and effectively prove IT investment performance goals are being reached, with positive mission outcome? 3.3. Governance – what's the interaction, participation method; how are the tools used? 3.3.1. Contributions – how is the EA program informed, accept submissions, collect data? Who are the experts? 3.3.2. Review – how is the EA validated, against what criteria? Taxonomy Usage Example: 1. To speak with: a. ...a particular set of System Owners Facing Strategic Change, via mandate (like the "Cloud First" mandate); about... b. ...how the EA program's visible and easily accessible Infrastructure Reference Model (i.e. "IRM" or "TRM"), if updated more completely with current system data, can... c. ...help shed light on ways to mitigate risks and avoid future costs associated with NOT leveraging potentially-available shared services across the enterprise... 2. ....the following Marketing & Communications (Sales) Plan can be constructed: a. Create an easy-to-read "Consequence Model" that illustrates how adoption of a cloud capability (like elastic operational storage) can enable rapid and durable compliance with the mandate – using EA traceability. Traceability might be from the IRM to the ARM (that identifies reusable services invoking the elastic storage), and then to the PRM with performance measures (such as % utilization of purchased storage allocation) included in the OMB Exhibits; and b. Schedule a meeting with the Program Owners, timed during their Acquisition Strategy meetings in response to the mandate, to use the "Consequence Model" for advising them to organize a rapid and relevant RFI solicitation for this cloud capability (regarding alternatives for sourcing elastic operational storage); and c. Schedule a series of short "Discovery" meetings with the system architecture leads (as agreed by the Program Owners), to further populate/validate the "As-Is" models and frame the "To Be" models (via scenarios), to better inform the RFI, obtain the best feedback from the vendor community, and provide potential value for and avoid impact to all other programs and systems. --end example -- Note that communications with the intended audience should take a page out of the standard "Search Engine Optimization" (SEO) playbook, using keywords and phrases relating to "value" and "outcome" vs. "compliance" and "output". Searches in email boxes, internal and external search engines for phrases like "cost avoidance strategies", "mission performance metrics" and "innovation funding" should yield messages and content from the EA team. This targeted, informed, practical sales approach should result in additional buy-in and participation, additional EA information contribution and model validation, development of more SMEs and quick "proof points" (with real-life testing) to bolster the case for EA. The proof point here is a successful, timely procurement that satisfies not only the external mandate and external oversight review, but also meets internal EA compliance/conformance goals and therefore is more transparently useful across the community. In short, if sold effectively, the EA will perform and be recognized. EA won’t therefore be used only for compliance, but also (according to a validated, stated purpose) to directly influence decisions and outcomes. The opinions, views and analysis expressed in this document are those of the author and do not necessarily reflect the views of Oracle.

Read the article
How to install SQL Server 2005 Configuration Manager without installing SQL Server Management Studio

- by Arnold Zokas

Hi, I need to configure SQL Server aliases on a public-facing production server. To do that, I need to install SQL Server Configuration Manager. I was not able to find a standalone installer for that, so I am having to install SQL Server 2005 Client Components. This approach is not ideal as we don't want to have SSMS on an public-facing production server. Is there a way to install SQL Server 2005 Configuration Manager without installing SQL Server Management Studio? Thanks, Arnold

Read the article
ERR_INCOMPLETE_CHUNKED_ENCODING apache 2.4

- by Bujanca Mihai

I upgraded my Ubuntu server to 14.04 and Apache 2.4.7. Now my images don't load and console yields net::ERR_INCOMPLETE_CHUNKED_ENCODING. Also, I can sometimes see some of the images load for a little while (1 sec max) and then they disappear. .htaccess RewriteEngine On # Serve the favicon file from img folder RewriteCond %{REQUEST_URI} ^/favicon.ico$ RewriteRule ^(.*)$ /img/$1 [NC,L] # Redirect HTTP traffic to WWW subdomain RewriteCond %{HTTPS} off [NC] RewriteCond %{HTTP_HOST} !^www\. [NC] RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L] # Redirect HTTPS traffic to WWW subdomain RewriteCond %{HTTPS} on [NC] RewriteCond %{HTTP_HOST} !^www\. [NC] RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L] # Auto Versioning rules RewriteCond %{REQUEST_FILENAME} !-s RewriteRule ^(.*)\.[\d]+\.(css|js)$ $1.$2 [L] # Default Zend rewrite rules RewriteCond %{REQUEST_FILENAME} -s [OR] RewriteCond %{REQUEST_FILENAME} -l [OR] RewriteCond %{REQUEST_FILENAME} -d RewriteRule ^.*$ - [NC,L] RewriteRule ^.*$ index.php [NC,L] VHost <VirtualHost *:80> ServerAdmin admin@localhost ServerName localhost DocumentRoot /home/mihai/ARTD/www/public/website # Omit this in production environment SetEnv APPLICATION_ENV local <Directory /home/mihai/ARTD/www/public/website > Options Indexes FollowSymLinks MultiViews AllowOverride All #Order deny,allow #Allow from all Require all granted </Directory> <IfModule mod_php5.c> php_value memory_limit 128M php_value upload_max_filesize 20M php_value post_max_size 20M </IfModule> ErrorLog /var/log/apache2/ARTD-error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog /var/log/apache2/ARTD-access.log combined </VirtualHost> <IfModule mod_ssl.c> <VirtualHost *:443> ServerAdmin admin@localhost ServerName localhost DocumentRoot /home/mihai/ARTD/www/public/website # Omit this in production environment SetEnv APPLICATION_ENV local <Directory /home/mihai/ARTD/www/public/website > Options Indexes FollowSymLinks MultiViews AllowOverride All #Order deny,allow #Allow from all Require all granted </Directory> <IfModule mod_php5.c> php_value memory_limit 128M php_value upload_max_filesize 20M php_value post_max_size 20M </IfModule> ErrorLog /var/log/apache2/ARTD-ssl-error.log # Possible values include: debug, info, notice, warn, error, crit, # alert, emerg. LogLevel warn CustomLog /var/log/apache2/ARTD.log combined # SSL Engine Switch: # Enable/Disable SSL for this virtual host. SSLEngine on # A self-signed (snakeoil) certificate can be created by installing # the ssl-cert package. See # /usr/share/doc/apache2.2-common/README.Debian.gz for more info. # If both key and certificate are stored in the same file, only the # SSLCertificateFile directive is needed. SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key # Server Certificate Chain: # Point SSLCertificateChainFile at a file containing the # concatenation of PEM encoded CA certificates which form the # certificate chain for the server certificate. Alternatively # the referenced file can be the same as SSLCertificateFile # when the CA certificates are directly appended to the server # certificate for convinience. #SSLCertificateChainFile /etc/apache2/ssl.crt/server-ca.crt # Certificate Authority (CA): # Set the CA certificate verification path where to find CA # certificates for client authentication or alternatively one # huge file containing all of them (file must be PEM encoded) # Note: Inside SSLCACertificatePath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCACertificatePath /etc/ssl/certs/ #SSLCACertificateFile /etc/apache2/ssl.crt/ca-bundle.crt # Certificate Revocation Lists (CRL): # Set the CA revocation path where to find CA CRLs for client # authentication or alternatively one huge file containing all # of them (file must be PEM encoded) # Note: Inside SSLCARevocationPath you need hash symlinks # to point to the certificate files. Use the provided # Makefile to update the hash symlinks after changes. #SSLCARevocationPath /etc/apache2/ssl.crl/ #SSLCARevocationFile /etc/apache2/ssl.crl/ca-bundle.crl # Client Authentication (Type): # Client certificate verification type and depth. Types are # none, optional, require and optional_no_ca. Depth is a # number which specifies how deeply to verify the certificate # issuer chain before deciding the certificate is not valid. #SSLVerifyClient require #SSLVerifyDepth 10 # Access Control: # With SSLRequire you can do per-directory access control based # on arbitrary complex boolean expressions containing server # variable checks and other lookup directives. The syntax is a # mixture between C and Perl. See the mod_ssl documentation # for more details. #<Location /> #SSLRequire ( %{SSL_CIPHER} !~ m/^(EXP|NULL)/ \ # and %{SSL_CLIENT_S_DN_O} eq "Snake Oil, Ltd." \ # and %{SSL_CLIENT_S_DN_OU} in {"Staff", "CA", "Dev"} \ # and %{TIME_WDAY} >= 1 and %{TIME_WDAY} <= 5 \ # and %{TIME_HOUR} >= 8 and %{TIME_HOUR} <= 20 ) \ # or %{REMOTE_ADDR} =~ m/^192\.76\.162\.[0-9]+$/ #</Location> # SSL Engine Options: # Set various options for the SSL engine. # o FakeBasicAuth: # Translate the client X.509 into a Basic Authorisation. This means that # the standard Auth/DBMAuth methods can be used for access control. The # user name is the `one line' version of the client's X.509 certificate. # Note that no password is obtained from the user. Every entry in the user # file needs this password: `xxj31ZMTZzkVA'. # o ExportCertData: # This exports two additional environment variables: SSL_CLIENT_CERT and # SSL_SERVER_CERT. These contain the PEM-encoded certificates of the # server (always existing) and the client (only existing when client # authentication is used). This can be used to import the certificates # into CGI scripts. # o StdEnvVars: # This exports the standard SSL/TLS related `SSL_*' environment variables. # Per default this exportation is switched off for performance reasons, # because the extraction step is an expensive operation and is usually # useless for serving static content. So one usually enables the # exportation for CGI and SSI requests only. # o StrictRequire: # This denies access when "SSLRequireSSL" or "SSLRequire" applied even # under a "Satisfy any" situation, i.e. when it applies access is denied # and no other module can change it. # o OptRenegotiate: # This enables optimized SSL connection renegotiation handling when SSL # directives are used in per-directory context. #SSLOptions +FakeBasicAuth +ExportCertData +StrictRequire #<FilesMatch "\.(cgi|shtml|phtml|php)$"> # SSLOptions +StdEnvVars #</FilesMatch> # SSL Protocol Adjustments: # The safe and default but still SSL/TLS standard compliant shutdown # approach is that mod_ssl sends the close notify alert but doesn't wait for # the close notify alert from client. When you need a different shutdown # approach you can use one of the following variables: # o ssl-unclean-shutdown: # This forces an unclean shutdown when the connection is closed, i.e. no # SSL close notify alert is send or allowed to received. This violates # the SSL/TLS standard but is needed for some brain-dead browsers. Use # this when you receive I/O errors because of the standard approach where # mod_ssl sends the close notify alert. # o ssl-accurate-shutdown: # This forces an accurate shutdown when the connection is closed, i.e. a # SSL close notify alert is send and mod_ssl waits for the close notify # alert of the client. This is 100% SSL/TLS standard compliant, but in # practice often causes hanging connections with brain-dead browsers. Use # this only for browsers where you know that their SSL implementation # works correctly. # Notice: Most problems of broken clients are also related to the HTTP # keep-alive facility, so you usually additionally want to disable # keep-alive for those clients, too. Use variable "nokeepalive" for this. # Similarly, one has to force some clients to use HTTP/1.0 to workaround # their broken HTTP/1.1 implementation. Use variables "downgrade-1.0" and # "force-response-1.0" for this. #BrowserMatch ".*MSIE.*" \ # nokeepalive ssl-unclean-shutdown \ # downgrade-1.0 force-response-1.0 </VirtualHost> </IfModule> logs Apache/2.4.7 (Ubuntu) PHP/5.5.9-1ubuntu4.3 OpenSSL/1.0.1f (internal dummy connection) 127.0.0.1 - - [25/Aug/2014:13:09:53 +0300] "GET /img/header/top-nav-separator.png HTTP/1.1" 200 462 "https://localhost/art" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.132 Safari/537.36"

Read the article
Where can I find the yum repos files for centos 4.7 i386?

- by Peter Kahn

Does anyone know where I can find the i386 centos 4.7 repos files that would normally sit in /etc/yum.repos.d. Alas, it looks like someone copied the 5.5 edition over to a 4.7 system. I can setup a new VM, install 4.7 and extract the files from that system (but I was hoping for a faster approach. Please let me know if you know where these files live on the net. I'm off to RPMfind to see what I can locate. Thanks Peter

Read the article
Logging Virus Definition Updates for MS Security Essentials in The Security Event Log

- by Steve

I would like to log a security in event in Windows 7 whenever the Microsoft Security Essentials 2 virus definition files are updates, deleted, or changed. I was expecting to do this with an Audit setting on one of the MS Security Essentials folders but I wasn't sure which one and how to avoid getting swamped with messages. What folder or files should I audit to track definition updates (or corruption) in the security events or is there a better approach?

Read the article
Installing Windows 2003 over remote DRAC 4

- by Delli

I'm trying to install windows 2003 over a DRAC 4 console session by using a ISO file. But it keeps freezing during the GUI installation then when login back Windows Setup says some files not found and setup won't continue. What's the best approach to install Windows 2003 over a DRAC 4 console session?

Read the article
How stable is Cherokee Web Server?

- by KRTac

I just installed Cherokee and gave it a try. I'm pretty impressed with it. The configuration of the server is certainly a new approach and I must say that I generaly like it (surprisingly). Do you have any experience with it? Is it reliable?

Read the article

Search Results

Search found 10556 results on 423 pages for 'practical approach'.

Page 117/423 | < Previous Page | 113 114 115 116 117 118 119 120 121 122 123 124 | Next Page >

- by PulpFiction

- by Bojin Li

- by jasterm007

- by user193655

- by Newbie

- by thr

- by whytheq

- by user157195

- by Rick Strahl

- by Rick Strahl

- by mortezam

- by terje

- by Ralf Westphal

- by user9154181

- by Bill Graziano

- by Constantin

- by TedMcLaughlan

- by Arnold Zokas

- by Bujanca Mihai

- by Peter Kahn

- by Steve

- by Delli

- by KRTac

< Previous Page | 113 114 115 116 117 118 119 120 121 122 123 124 | Next Page >