letters - Page 48 - Developer IT

Understanding Request Validation in ASP.NET MVC 3

- by imran_ku07

Introduction: A fact that you must always remember "never ever trust user inputs". An application that trusts user inputs may be easily vulnerable to XSS, XSRF, SQL Injection, etc attacks. XSS and XSRF are very dangerous attacks. So to mitigate these attacks ASP.NET introduced request validation in ASP.NET 1.1. During request validation, ASP.NET will throw HttpRequestValidationException: 'A potentially dangerous XXX value was detected from the client', if he found, < followed by an exclamation(like <!) or < followed by the letters a through z(like <s) or & followed by a pound sign(like &#123) as a part of query string, posted form and cookie collection. In ASP.NET 4.0, request validation becomes extensible. This means that you can extend request validation. Also in ASP.NET 4.0, by default request validation is enabled before the BeginRequest phase of an HTTP request. ASP.NET MVC 3 moves one step further by making request validation granular. This allows you to disable request validation for some properties of a model while maintaining request validation for all other cases. In this article I will show you the use of request validation in ASP.NET MVC 3. Then I will briefly explain the internal working of granular request validation. Description: First of all create a new ASP.NET MVC 3 application. Then create a simple model class called MyModel, public class MyModel { public string Prop1 { get; set; } public string Prop2 { get; set; } } Then just update the index action method as follows, public ActionResult Index(MyModel p) { return View(); } Now just run this application. You will find that everything works just fine. Now just append this query string ?Prop1=<s to the url of this application, you will get the HttpRequestValidationException exception. Now just decorate the Index action method with [ValidateInputAttribute(false)], [ValidateInput(false)] public ActionResult Index(MyModel p) { return View(); } Run this application again with same query string. You will find that your application run without any unhandled exception. Up to now, there is nothing new in ASP.NET MVC 3 because ValidateInputAttribute was present in the previous versions of ASP.NET MVC. Any problem with this approach? Yes there is a problem with this approach. The problem is that now users can send html for both Prop1 and Prop2 properties and a lot of developers are not aware of it. This means that now everyone can send html with both parameters(e.g, ?Prop1=<s&Prop2=<s). So ValidateInput attribute does not gives you the guarantee that your application is safe to XSS or XSRF. This is the reason why ASP.NET MVC team introduced granular request validation in ASP.NET MVC 3. Let's see this feature. Remove [ValidateInputAttribute(false)] on Index action and update MyModel class as follows, public class MyModel { [AllowHtml] public string Prop1 { get; set; } public string Prop2 { get; set; } } Note that AllowHtml attribute is only decorated on Prop1 property. Run this application again with ?Prop1=<s query string. You will find that your application run just fine. Run this application again with ?Prop1=<s&Prop2=<s query string, you will get HttpRequestValidationException exception. This shows that the granular request validation in ASP.NET MVC 3 only allows users to send html for properties decorated with AllowHtml attribute. Sometimes you may need to access Request.QueryString or Request.Form directly. You may change your code as follows, [ValidateInput(false)] public ActionResult Index() { var prop1 = Request.QueryString["Prop1"]; return View(); } Run this application again, you will get the HttpRequestValidationException exception again even you have [ValidateInput(false)] on your Index action. The reason is that Request flags are still not set to unvalidate. I will explain this later. For making this work you need to use Unvalidated extension method, public ActionResult Index() { var q = Request.Unvalidated().QueryString; var prop1 = q["Prop1"]; return View(); } Unvalidated extension method is defined in System.Web.Helpers namespace . So you need to add using System.Web.Helpers; in this class file. Run this application again, your application run just fine. There you have it. If you are not curious to know the internal working of granular request validation then you can skip next paragraphs completely. If you are interested then carry on reading. Create a new ASP.NET MVC 2 application, then open global.asax.cs file and the following lines, protected void Application_BeginRequest() { var q = Request.QueryString; } Then make the Index action method as, [ValidateInput(false)] public ActionResult Index(string id) { return View(); } Please note that the Index action method contains a parameter and this action method is decorated with [ValidateInput(false)]. Run this application again, but now with ?id=<s query string, you will get HttpRequestValidationException exception at Application_BeginRequest method. Now just add the following entry in web.config, <httpRuntime requestValidationMode="2.0"/> Now run this application again. This time your application will run just fine. Now just see the following quote from ASP.NET 4 Breaking Changes, In ASP.NET 4, by default, request validation is enabled for all requests, because it is enabled before the BeginRequest phase of an HTTP request. As a result, request validation applies to requests for all ASP.NET resources, not just .aspx page requests. This includes requests such as Web service calls and custom HTTP handlers. Request validation is also active when custom HTTP modules are reading the contents of an HTTP request. This clearly state that request validation is enabled before the BeginRequest phase of an HTTP request. For understanding what does enabled means here, we need to see HttpRequest.ValidateInput, HttpRequest.QueryString and HttpRequest.Form methods/properties in System.Web assembly. Here is the implementation of HttpRequest.ValidateInput, HttpRequest.QueryString and HttpRequest.Form methods/properties in System.Web assembly, public NameValueCollection Form { get { if (this._form == null) { this._form = new HttpValueCollection(); if (this._wr != null) { this.FillInFormCollection(); } this._form.MakeReadOnly(); } if (this._flags[2]) { this._flags.Clear(2); this.ValidateNameValueCollection(this._form, RequestValidationSource.Form); } return this._form; } } public NameValueCollection QueryString { get { if (this._queryString == null) { this._queryString = new HttpValueCollection(); if (this._wr != null) { this.FillInQueryStringCollection(); } this._queryString.MakeReadOnly(); } if (this._flags[1]) { this._flags.Clear(1); this.ValidateNameValueCollection(this._queryString, RequestValidationSource.QueryString); } return this._queryString; } } public void ValidateInput() { if (!this._flags[0x8000]) { this._flags.Set(0x8000); this._flags.Set(1); this._flags.Set(2); this._flags.Set(4); this._flags.Set(0x40); this._flags.Set(0x80); this._flags.Set(0x100); this._flags.Set(0x200); this._flags.Set(8); } } The above code indicates that HttpRequest.QueryString and HttpRequest.Form will only validate the querystring and form collection if certain flags are set. These flags are automatically set if you call HttpRequest.ValidateInput method. Now run the above application again(don't forget to append ?id=<s query string in the url) with the same settings(i.e, requestValidationMode="2.0" setting in web.config and Application_BeginRequest method in global.asax.cs), your application will run just fine. Now just update the Application_BeginRequest method as, protected void Application_BeginRequest() { Request.ValidateInput(); var q = Request.QueryString; } Note that I am calling Request.ValidateInput method prior to use Request.QueryString property. ValidateInput method will internally set certain flags(discussed above). These flags will then tells the Request.QueryString (and Request.Form) property that validate the query string(or form) when user call Request.QueryString(or Request.Form) property. So running this application again with ?id=<s query string will throw HttpRequestValidationException exception. Now I hope it is clear to you that what does requestValidationMode do. It just tells the ASP.NET that not invoke the Request.ValidateInput method internally before the BeginRequest phase of an HTTP request if requestValidationMode is set to a value less than 4.0 in web.config. Here is the implementation of HttpRequest.ValidateInputIfRequiredByConfig method which will prove this statement(Don't be confused with HttpRequest and Request. Request is the property of HttpRequest class), internal void ValidateInputIfRequiredByConfig() { ............................................................... ............................................................... ............................................................... ............................................................... if (httpRuntime.RequestValidationMode >= VersionUtil.Framework40) { this.ValidateInput(); } } Hopefully the above discussion will clear you how requestValidationMode works in ASP.NET 4. It is also interesting to note that both HttpRequest.QueryString and HttpRequest.Form only throws the exception when you access them first time. Any subsequent access to HttpRequest.QueryString and HttpRequest.Form will not throw any exception. Continuing with the above example, just update Application_BeginRequest method in global.asax.cs file as, protected void Application_BeginRequest() { try { var q = Request.QueryString; var f = Request.Form; } catch//swallow this exception { } var q1 = Request.QueryString; var f1 = Request.Form; } Without setting requestValidationMode to 2.0 and without decorating ValidateInput attribute on Index action, your application will work just fine because both HttpRequest.QueryString and HttpRequest.Form will clear their flags after reading HttpRequest.QueryString and HttpRequest.Form for the first time(see the implementation of HttpRequest.QueryString and HttpRequest.Form above). Now let's see ASP.NET MVC 3 granular request validation internal working. First of all we need to see type of HttpRequest.QueryString and HttpRequest.Form properties. Both HttpRequest.QueryString and HttpRequest.Form properties are of type NameValueCollection which is inherited from the NameObjectCollectionBase class. NameObjectCollectionBase class contains _entriesArray, _entriesTable, NameObjectEntry.Key and NameObjectEntry.Value fields which granular request validation uses internally. In addition granular request validation also uses _queryString, _form and _flags fields, ValidateString method and the Indexer of HttpRequest class. Let's see when and how granular request validation uses these fields. Create a new ASP.NET MVC 3 application. Then put a breakpoint at Application_BeginRequest method and another breakpoint at HomeController.Index method. Now just run this application. When the break point inside Application_BeginRequest method hits then add the following expression in quick watch window, System.Web.HttpContext.Current.Request.QueryString. You will see the following screen, Now Press F5 so that the second breakpoint inside HomeController.Index method hits. When the second breakpoint hits then add the following expression in quick watch window again, System.Web.HttpContext.Current.Request.QueryString. You will see the following screen, First screen shows that _entriesTable field is of type System.Collections.Hashtable and _entriesArray field is of type System.Collections.ArrayList during the BeginRequest phase of the HTTP request. While the second screen shows that _entriesTable type is changed to Microsoft.Web.Infrastructure.DynamicValidationHelper.LazilyValidatingHashtable and _entriesArray type is changed to Microsoft.Web.Infrastructure.DynamicValidationHelper.LazilyValidatingArrayList during executing the Index action method. In addition to these members, ASP.NET MVC 3 also perform some operation on _flags, _form, _queryString and other members of HttpRuntime class internally. This shows that ASP.NET MVC 3 performing some operation on the members of HttpRequest class for making granular request validation possible. Both LazilyValidatingArrayList and LazilyValidatingHashtable classes are defined in the Microsoft.Web.Infrastructure assembly. You may wonder why their name starts with Lazily. The fact is that now with ASP.NET MVC 3, request validation will be performed lazily. In simple words, Microsoft.Web.Infrastructure assembly is now taking the responsibility for request validation from System.Web assembly. See the below screens. The first screen depicting HttpRequestValidationException exception in ASP.NET MVC 2 application while the second screen showing HttpRequestValidationException exception in ASP.NET MVC 3 application. In MVC 2: In MVC 3: The stack trace of the second screenshot shows that Microsoft.Web.Infrastructure assembly (instead of System.Web assembly) is now performing request validation in ASP.NET MVC 3. Now you may ask: where Microsoft.Web.Infrastructure assembly is performing some operation on the members of HttpRequest class. There are at least two places where the Microsoft.Web.Infrastructure assembly performing some operation , Microsoft.Web.Infrastructure.DynamicValidationHelper.GranularValidationReflectionUtil.GetInstance method and Microsoft.Web.Infrastructure.DynamicValidationHelper.ValidationUtility.CollectionReplacer.ReplaceCollection method, Here is the implementation of these methods, private static GranularValidationReflectionUtil GetInstance() { try { if (DynamicValidationShimReflectionUtil.Instance != null) { return null; } GranularValidationReflectionUtil util = new GranularValidationReflectionUtil(); Type containingType = typeof(NameObjectCollectionBase); string fieldName = "_entriesArray"; bool isStatic = false; Type fieldType = typeof(ArrayList); FieldInfo fieldInfo = CommonReflectionUtil.FindField(containingType, fieldName, isStatic, fieldType); util._del_get_NameObjectCollectionBase_entriesArray = MakeFieldGetterFunc<NameObjectCollectionBase, ArrayList>(fieldInfo); util._del_set_NameObjectCollectionBase_entriesArray = MakeFieldSetterFunc<NameObjectCollectionBase, ArrayList>(fieldInfo); Type type6 = typeof(NameObjectCollectionBase); string str2 = "_entriesTable"; bool flag2 = false; Type type7 = typeof(Hashtable); FieldInfo info2 = CommonReflectionUtil.FindField(type6, str2, flag2, type7); util._del_get_NameObjectCollectionBase_entriesTable = MakeFieldGetterFunc<NameObjectCollectionBase, Hashtable>(info2); util._del_set_NameObjectCollectionBase_entriesTable = MakeFieldSetterFunc<NameObjectCollectionBase, Hashtable>(info2); Type targetType = CommonAssemblies.System.GetType("System.Collections.Specialized.NameObjectCollectionBase+NameObjectEntry"); Type type8 = targetType; string str3 = "Key"; bool flag3 = false; Type type9 = typeof(string); FieldInfo info3 = CommonReflectionUtil.FindField(type8, str3, flag3, type9); util._del_get_NameObjectEntry_Key = MakeFieldGetterFunc<string>(targetType, info3); Type type10 = targetType; string str4 = "Value"; bool flag4 = false; Type type11 = typeof(object); FieldInfo info4 = CommonReflectionUtil.FindField(type10, str4, flag4, type11); util._del_get_NameObjectEntry_Value = MakeFieldGetterFunc<object>(targetType, info4); util._del_set_NameObjectEntry_Value = MakeFieldSetterFunc(targetType, info4); Type type12 = typeof(HttpRequest); string methodName = "ValidateString"; bool flag5 = false; Type[] argumentTypes = new Type[] { typeof(string), typeof(string), typeof(RequestValidationSource) }; Type returnType = typeof(void); MethodInfo methodInfo = CommonReflectionUtil.FindMethod(type12, methodName, flag5, argumentTypes, returnType); util._del_validateStringCallback = CommonReflectionUtil.MakeFastCreateDelegate<HttpRequest, ValidateStringCallback>(methodInfo); Type type = CommonAssemblies.SystemWeb.GetType("System.Web.HttpValueCollection"); util._del_HttpValueCollection_ctor = CommonReflectionUtil.MakeFastNewObject<Func<NameValueCollection>>(type); Type type14 = typeof(HttpRequest); string str6 = "_form"; bool flag6 = false; Type type15 = type; FieldInfo info6 = CommonReflectionUtil.FindField(type14, str6, flag6, type15); util._del_get_HttpRequest_form = MakeFieldGetterFunc<HttpRequest, NameValueCollection>(info6); util._del_set_HttpRequest_form = MakeFieldSetterFunc(typeof(HttpRequest), info6); Type type16 = typeof(HttpRequest); string str7 = "_queryString"; bool flag7 = false; Type type17 = type; FieldInfo info7 = CommonReflectionUtil.FindField(type16, str7, flag7, type17); util._del_get_HttpRequest_queryString = MakeFieldGetterFunc<HttpRequest, NameValueCollection>(info7); util._del_set_HttpRequest_queryString = MakeFieldSetterFunc(typeof(HttpRequest), info7); Type type3 = CommonAssemblies.SystemWeb.GetType("System.Web.Util.SimpleBitVector32"); Type type18 = typeof(HttpRequest); string str8 = "_flags"; bool flag8 = false; Type type19 = type3; FieldInfo flagsFieldInfo = CommonReflectionUtil.FindField(type18, str8, flag8, type19); Type type20 = type3; string str9 = "get_Item"; bool flag9 = false; Type[] typeArray4 = new Type[] { typeof(int) }; Type type21 = typeof(bool); MethodInfo itemGetter = CommonReflectionUtil.FindMethod(type20, str9, flag9, typeArray4, type21); Type type22 = type3; string str10 = "set_Item"; bool flag10 = false; Type[] typeArray6 = new Type[] { typeof(int), typeof(bool) }; Type type23 = typeof(void); MethodInfo itemSetter = CommonReflectionUtil.FindMethod(type22, str10, flag10, typeArray6, type23); MakeRequestValidationFlagsAccessors(flagsFieldInfo, itemGetter, itemSetter, out util._del_BitVector32_get_Item, out util._del_BitVector32_set_Item); return util; } catch { return null; } } private static void ReplaceCollection(HttpContext context, FieldAccessor<NameValueCollection> fieldAccessor, Func<NameValueCollection> propertyAccessor, Action<NameValueCollection> storeInUnvalidatedCollection, RequestValidationSource validationSource, ValidationSourceFlag validationSourceFlag) { NameValueCollection originalBackingCollection; ValidateStringCallback validateString; SimpleValidateStringCallback simpleValidateString; Func<NameValueCollection> getActualCollection; Action<NameValueCollection> makeCollectionLazy; HttpRequest request = context.Request; Func<bool> getValidationFlag = delegate { return _reflectionUtil.GetRequestValidationFlag(request, validationSourceFlag); }; Func<bool> func = delegate { return !getValidationFlag(); }; Action<bool> setValidationFlag = delegate (bool value) { _reflectionUtil.SetRequestValidationFlag(request, validationSourceFlag, value); }; if ((fieldAccessor.Value != null) && func()) { storeInUnvalidatedCollection(fieldAccessor.Value); } else { originalBackingCollection = fieldAccessor.Value; validateString = _reflectionUtil.MakeValidateStringCallback(context.Request); simpleValidateString = delegate (string value, string key) { if (((key == null) || !key.StartsWith("__", StringComparison.Ordinal)) && !string.IsNullOrEmpty(value)) { validateString(value, key, validationSource); } }; getActualCollection = delegate { fieldAccessor.Value = originalBackingCollection; bool flag = getValidationFlag(); setValidationFlag(false); NameValueCollection col = propertyAccessor(); setValidationFlag(flag); storeInUnvalidatedCollection(new NameValueCollection(col)); return col; }; makeCollectionLazy = delegate (NameValueCollection col) { simpleValidateString(col[null], null); LazilyValidatingArrayList array = new LazilyValidatingArrayList(_reflectionUtil.GetNameObjectCollectionEntriesArray(col), simpleValidateString); _reflectionUtil.SetNameObjectCollectionEntriesArray(col, array); LazilyValidatingHashtable table = new LazilyValidatingHashtable(_reflectionUtil.GetNameObjectCollectionEntriesTable(col), simpleValidateString); _reflectionUtil.SetNameObjectCollectionEntriesTable(col, table); }; Func<bool> hasValidationFired = func; Action disableValidation = delegate { setValidationFlag(false); }; Func<int> fillInActualFormContents = delegate { NameValueCollection values = getActualCollection(); makeCollectionLazy(values); return values.Count; }; DeferredCountArrayList list = new DeferredCountArrayList(hasValidationFired, disableValidation, fillInActualFormContents); NameValueCollection target = _reflectionUtil.NewHttpValueCollection(); _reflectionUtil.SetNameObjectCollectionEntriesArray(target, list); fieldAccessor.Value = target; } } Hopefully the above code will help you to understand the internal working of granular request validation. It is also important to note that Microsoft.Web.Infrastructure assembly invokes HttpRequest.ValidateInput method internally. For further understanding please see Microsoft.Web.Infrastructure assembly code. Finally you may ask: at which stage ASP NET MVC 3 will invoke these methods. You will find this answer by looking at the following method source, Unvalidated extension method for HttpRequest class defined in System.Web.Helpers.Validation class. System.Web.Mvc.MvcHandler.ProcessRequestInit method. System.Web.Mvc.ControllerActionInvoker.ValidateRequest method. System.Web.WebPages.WebPageHttpHandler.ProcessRequestInternal method. Summary: ASP.NET helps in preventing XSS attack using a feature called request validation. In this article, I showed you how you can use granular request validation in ASP.NET MVC 3. I explain you the internal working of granular request validation. Hope you will enjoy this article too. SyntaxHighlighter.all()

Read the article

Squid + Dans Guardian (simple configuration)

- by The Digital Ninja

I just built a new proxy server and compiled the latest versions of squid and dansguardian. We use basic authentication to select what users are allowed outside of our network. It seems squid is working just fine and accepts my username and password and lets me out. But if i connect to dans guardian, it prompts for username and password and then displays a message saying my username is not allowed to access the internet. Its pulling my username for the error message so i know it knows who i am. The part i get confused on is i thought that part was handled all by squid, and squid is working flawlessly. Can someone please double check my config files and tell me if i'm missing something or there is some new option i must set to get this to work. dansguardian.conf # Web Access Denied Reporting (does not affect logging) # # -1 = log, but do not block - Stealth mode # 0 = just say 'Access Denied' # 1 = report why but not what denied phrase # 2 = report fully # 3 = use HTML template file (accessdeniedaddress ignored) - recommended # reportinglevel = 3 # Language dir where languages are stored for internationalisation. # The HTML template within this dir is only used when reportinglevel # is set to 3. When used, DansGuardian will display the HTML file instead of # using the perl cgi script. This option is faster, cleaner # and easier to customise the access denied page. # The language file is used no matter what setting however. # languagedir = '/etc/dansguardian/languages' # language to use from languagedir. language = 'ukenglish' # Logging Settings # # 0 = none 1 = just denied 2 = all text based 3 = all requests loglevel = 3 # Log Exception Hits # Log if an exception (user, ip, URL, phrase) is matched and so # the page gets let through. Can be useful for diagnosing # why a site gets through the filter. on | off logexceptionhits = on # Log File Format # 1 = DansGuardian format 2 = CSV-style format # 3 = Squid Log File Format 4 = Tab delimited logfileformat = 1 # Log file location # # Defines the log directory and filename. #loglocation = '/var/log/dansguardian/access.log' # Network Settings # # the IP that DansGuardian listens on. If left blank DansGuardian will # listen on all IPs. That would include all NICs, loopback, modem, etc. # Normally you would have your firewall protecting this, but if you want # you can limit it to only 1 IP. Yes only one. filterip = # the port that DansGuardian listens to. filterport = 8080 # the ip of the proxy (default is the loopback - i.e. this server) proxyip = 127.0.0.1 # the port DansGuardian connects to proxy on proxyport = 3128 # accessdeniedaddress is the address of your web server to which the cgi # dansguardian reporting script was copied # Do NOT change from the default if you are not using the cgi. # accessdeniedaddress = 'http://YOURSERVER.YOURDOMAIN/cgi-bin/dansguardian.pl' # Non standard delimiter (only used with accessdeniedaddress) # Default is enabled but to go back to the original standard mode dissable it. nonstandarddelimiter = on # Banned image replacement # Images that are banned due to domain/url/etc reasons including those # in the adverts blacklists can be replaced by an image. This will, # for example, hide images from advert sites and remove broken image # icons from banned domains. # 0 = off # 1 = on (default) usecustombannedimage = 1 custombannedimagefile = '/etc/dansguardian/transparent1x1.gif' # Filter groups options # filtergroups sets the number of filter groups. A filter group is a set of content # filtering options you can apply to a group of users. The value must be 1 or more. # DansGuardian will automatically look for dansguardianfN.conf where N is the filter # group. To assign users to groups use the filtergroupslist option. All users default # to filter group 1. You must have some sort of authentication to be able to map users # to a group. The more filter groups the more copies of the lists will be in RAM so # use as few as possible. filtergroups = 1 filtergroupslist = '/etc/dansguardian/filtergroupslist' # Authentication files location bannediplist = '/etc/dansguardian/bannediplist' exceptioniplist = '/etc/dansguardian/exceptioniplist' banneduserlist = '/etc/dansguardian/banneduserlist' exceptionuserlist = '/etc/dansguardian/exceptionuserlist' # Show weighted phrases found # If enabled then the phrases found that made up the total which excedes # the naughtyness limit will be logged and, if the reporting level is # high enough, reported. on | off showweightedfound = on # Weighted phrase mode # There are 3 possible modes of operation: # 0 = off = do not use the weighted phrase feature. # 1 = on, normal = normal weighted phrase operation. # 2 = on, singular = each weighted phrase found only counts once on a page. # weightedphrasemode = 2 # Positive result caching for text URLs # Caches good pages so they don't need to be scanned again # 0 = off (recommended for ISPs with users with disimilar browsing) # 1000 = recommended for most users # 5000 = suggested max upper limit urlcachenumber = # # Age before they are stale and should be ignored in seconds # 0 = never # 900 = recommended = 15 mins urlcacheage = # Smart and Raw phrase content filtering options # Smart is where the multiple spaces and HTML are removed before phrase filtering # Raw is where the raw HTML including meta tags are phrase filtered # CPU usage can be effectively halved by using setting 0 or 1 # 0 = raw only # 1 = smart only # 2 = both (default) phrasefiltermode = 2 # Lower casing options # When a document is scanned the uppercase letters are converted to lower case # in order to compare them with the phrases. However this can break Big5 and # other 16-bit texts. If needed preserve the case. As of version 2.7.0 accented # characters are supported. # 0 = force lower case (default) # 1 = do not change case preservecase = 0 # Hex decoding options # When a document is scanned it can optionally convert %XX to chars. # If you find documents are getting past the phrase filtering due to encoding # then enable. However this can break Big5 and other 16-bit texts. # 0 = disabled (default) # 1 = enabled hexdecodecontent = 0 # Force Quick Search rather than DFA search algorithm # The current DFA implementation is not totally 16-bit character compatible # but is used by default as it handles large phrase lists much faster. # If you wish to use a large number of 16-bit character phrases then # enable this option. # 0 = off (default) # 1 = on (Big5 compatible) forcequicksearch = 0 # Reverse lookups for banned site and URLs. # If set to on, DansGuardian will look up the forward DNS for an IP URL # address and search for both in the banned site and URL lists. This would # prevent a user from simply entering the IP for a banned address. # It will reduce searching speed somewhat so unless you have a local caching # DNS server, leave it off and use the Blanket IP Block option in the # bannedsitelist file instead. reverseaddresslookups = off # Reverse lookups for banned and exception IP lists. # If set to on, DansGuardian will look up the forward DNS for the IP # of the connecting computer. This means you can put in hostnames in # the exceptioniplist and bannediplist. # It will reduce searching speed somewhat so unless you have a local DNS server, # leave it off. reverseclientiplookups = off # Build bannedsitelist and bannedurllist cache files. # This will compare the date stamp of the list file with the date stamp of # the cache file and will recreate as needed. # If a bsl or bul .processed file exists, then that will be used instead. # It will increase process start speed by 300%. On slow computers this will # be significant. Fast computers do not need this option. on | off createlistcachefiles = on # POST protection (web upload and forms) # does not block forms without any file upload, i.e. this is just for # blocking or limiting uploads # measured in kibibytes after MIME encoding and header bumph # use 0 for a complete block # use higher (e.g. 512 = 512Kbytes) for limiting # use -1 for no blocking #maxuploadsize = 512 #maxuploadsize = 0 maxuploadsize = -1 # Max content filter page size # Sometimes web servers label binary files as text which can be very # large which causes a huge drain on memory and cpu resources. # To counter this, you can limit the size of the document to be # filtered and get it to just pass it straight through. # This setting also applies to content regular expression modification. # The size is in Kibibytes - eg 2048 = 2Mb # use 0 for no limit maxcontentfiltersize = # Username identification methods (used in logging) # You can have as many methods as you want and not just one. The first one # will be used then if no username is found, the next will be used. # * proxyauth is for when basic proxy authentication is used (no good for # transparent proxying). # * ntlm is for when the proxy supports the MS NTLM authentication # protocol. (Only works with IE5.5 sp1 and later). **NOT IMPLEMENTED** # * ident is for when the others don't work. It will contact the computer # that the connection came from and try to connect to an identd server # and query it for the user owner of the connection. usernameidmethodproxyauth = on usernameidmethodntlm = off # **NOT IMPLEMENTED** usernameidmethodident = off # Preemptive banning - this means that if you have proxy auth enabled and a user accesses # a site banned by URL for example they will be denied straight away without a request # for their user and pass. This has the effect of requiring the user to visit a clean # site first before it knows who they are and thus maybe an admin user. # This is how DansGuardian has always worked but in some situations it is less than # ideal. So you can optionally disable it. Default is on. # As a side effect disabling this makes AD image replacement work better as the mime # type is know. preemptivebanning = on # Misc settings # if on it adds an X-Forwarded-For: <clientip> to the HTTP request # header. This may help solve some problem sites that need to know the # source ip. on | off forwardedfor = on # if on it uses the X-Forwarded-For: <clientip> to determine the client # IP. This is for when you have squid between the clients and DansGuardian. # Warning - headers are easily spoofed. on | off usexforwardedfor = off # if on it logs some debug info regarding fork()ing and accept()ing which # can usually be ignored. These are logged by syslog. It is safe to leave # it on or off logconnectionhandlingerrors = on # Fork pool options # sets the maximum number of processes to sporn to handle the incomming # connections. Max value usually 250 depending on OS. # On large sites you might want to try 180. maxchildren = 180 # sets the minimum number of processes to sporn to handle the incomming connections. # On large sites you might want to try 32. minchildren = 32 # sets the minimum number of processes to be kept ready to handle connections. # On large sites you might want to try 8. minsparechildren = 8 # sets the minimum number of processes to sporn when it runs out # On large sites you might want to try 10. preforkchildren = 10 # sets the maximum number of processes to have doing nothing. # When this many are spare it will cull some of them. # On large sites you might want to try 64. maxsparechildren = 64 # sets the maximum age of a child process before it croaks it. # This is the number of connections they handle before exiting. # On large sites you might want to try 10000. maxagechildren = 5000 # Process options # (Change these only if you really know what you are doing). # These options allow you to run multiple instances of DansGuardian on a single machine. # Remember to edit the log file path above also if that is your intention. # IPC filename # # Defines IPC server directory and filename used to communicate with the log process. ipcfilename = '/tmp/.dguardianipc' # URL list IPC filename # # Defines URL list IPC server directory and filename used to communicate with the URL # cache process. urlipcfilename = '/tmp/.dguardianurlipc' # PID filename # # Defines process id directory and filename. #pidfilename = '/var/run/dansguardian.pid' # Disable daemoning # If enabled the process will not fork into the background. # It is not usually advantageous to do this. # on|off ( defaults to off ) nodaemon = off # Disable logging process # on|off ( defaults to off ) nologger = off # Daemon runas user and group # This is the user that DansGuardian runs as. Normally the user/group nobody. # Uncomment to use. Defaults to the user set at compile time. # daemonuser = 'nobody' # daemongroup = 'nobody' # Soft restart # When on this disables the forced killing off all processes in the process group. # This is not to be confused with the -g run time option - they are not related. # on|off ( defaults to off ) softrestart = off maxcontentramcachescansize = 2000 maxcontentfilecachescansize = 20000 downloadmanager = '/etc/dansguardian/downloadmanagers/default.conf' authplugin = '/etc/dansguardian/authplugins/proxy-basic.conf' Squid.conf http_port 3128 hierarchy_stoplist cgi-bin ? acl QUERY urlpath_regex cgi-bin \? cache deny QUERY acl apache rep_header Server ^Apache #broken_vary_encoding allow apache access_log /squid/var/logs/access.log squid hosts_file /etc/hosts auth_param basic program /squid/libexec/ncsa_auth /squid/etc/userbasic.auth auth_param basic children 5 auth_param basic realm proxy auth_param basic credentialsttl 2 hours auth_param basic casesensitive off refresh_pattern ^ftp: 1440 20% 10080 refresh_pattern ^gopher: 1440 0% 1440 refresh_pattern . 0 20% 4320 acl NoAuthNec src <HIDDEN FOR SECURITY> acl BrkRm src <HIDDEN FOR SECURITY> acl Dials src <HIDDEN FOR SECURITY> acl Comps src <HIDDEN FOR SECURITY> acl whsws dstdom_regex -i .opensuse.org .novell.com .suse.com mirror.mcs.an1.gov mirrors.kernerl.org www.suse.de suse.mirrors.tds.net mirrros.usc.edu ftp.ale.org suse.cs.utah.edu mirrors.usc.edu mirror.usc.an1.gov linux.nssl.noaa.gov noaa.gov .kernel.org ftp.ale.org ftp.gwdg.de .medibuntu.org mirrors.xmission.com .canonical.com .ubuntu. acl opensites dstdom_regex -i .mbsbooks.com .bowker.com .usps.com .usps.gov .ups.com .fedex.com go.microsoft.com .microsoft.com .apple.com toolbar.msn.com .contacts.msn.com update.services.openoffice.org fms2.pointroll.speedera.net services.wmdrm.windowsmedia.com windowsupdate.com .adobe.com .symantec.com .vitalbook.com vxn1.datawire.net vxn.datawire.net download.lavasoft.de .download.lavasoft.com .lavasoft.com updates.ls-servers.com .canadapost. .myyellow.com minirick symantecliveupdate.com wm.overdrive.com www.overdrive.com productactivation.one.microsoft.com www.update.microsoft.com testdrive.whoson.com www.columbia.k12.mo.us banners.wunderground.com .kofax.com .gotomeeting.com tools.google.com .dl.google.com .cache.googlevideo.com .gpdl.google.com .clients.google.com cache.pack.google.com kh.google.com maps.google.com auth.keyhole.com .contacts.msn.com .hrblock.com .taxcut.com .merchantadvantage.com .jtv.com .malwarebytes.org www.google-analytics.com dcs.support.xerox.com .dhl.com .webtrendslive.com javadl-esd.sun.com javadl-alt.sun.com .excelsior.edu .dhlglobalmail.com .nessus.org .foxitsoftware.com foxit.vo.llnwd.net installshield.com .mindjet.com .mediascouter.com media.us.elsevierhealth.com .xplana.com .govtrack.us sa.tulsacc.edu .omniture.com fpdownload.macromedia.com webservices.amazon.com acl password proxy_auth REQUIRED acl all src all acl manager proto cache_object acl localhost src 127.0.0.1/255.255.255.255 acl to_localhost dst 127.0.0.0/8 acl SSL_ports port 443 563 631 2001 2005 8731 9001 9080 10000 acl Safe_ports port 80 # http acl Safe_ports port 21 # ftp acl Safe_ports port # https, snews 443 563 acl Safe_ports port 70 # gopher acl Safe_ports port 210 # wais acl Safe_ports port # unregistered ports 1936-65535 acl Safe_ports port 280 # http-mgmt acl Safe_ports port 488 # gss-http acl Safe_ports port 10000 acl Safe_ports port 631 acl Safe_ports port 901 # SWAT acl purge method PURGE acl CONNECT method CONNECT acl UTubeUsers proxy_auth "/squid/etc/utubeusers.list" acl RestrictUTube dstdom_regex -i youtube.com acl RestrictFacebook dstdom_regex -i facebook.com acl FacebookUsers proxy_auth "/squid/etc/facebookusers.list" acl BuemerKEC src 10.10.128.0/24 acl MBSsortnet src 10.10.128.0/26 acl MSNExplorer browser -i MSN acl Printers src <HIDDEN FOR SECURITY> acl SpecialFolks src <HIDDEN FOR SECURITY> # streaming download acl fails rep_mime_type ^.*mms.* acl fails rep_mime_type ^.*ms-hdr.* acl fails rep_mime_type ^.*x-fcs.* acl fails rep_mime_type ^.*x-ms-asf.* acl fails2 urlpath_regex dvrplayer mediastream mms:// acl fails2 urlpath_regex \.asf$ \.afx$ \.flv$ \.swf$ acl deny_rep_mime_flashvideo rep_mime_type -i video/flv acl deny_rep_mime_shockwave rep_mime_type -i ^application/x-shockwave-flash$ acl x-type req_mime_type -i ^application/octet-stream$ acl x-type req_mime_type -i application/octet-stream acl x-type req_mime_type -i ^application/x-mplayer2$ acl x-type req_mime_type -i application/x-mplayer2 acl x-type req_mime_type -i ^application/x-oleobject$ acl x-type req_mime_type -i application/x-oleobject acl x-type req_mime_type -i application/x-pncmd acl x-type req_mime_type -i ^video/x-ms-asf$ acl x-type2 rep_mime_type -i ^application/octet-stream$ acl x-type2 rep_mime_type -i application/octet-stream acl x-type2 rep_mime_type -i ^application/x-mplayer2$ acl x-type2 rep_mime_type -i application/x-mplayer2 acl x-type2 rep_mime_type -i ^application/x-oleobject$ acl x-type2 rep_mime_type -i application/x-oleobject acl x-type2 rep_mime_type -i application/x-pncmd acl x-type2 rep_mime_type -i ^video/x-ms-asf$ acl RestrictHulu dstdom_regex -i hulu.com acl broken dstdomain cms.montgomerycollege.edu events.columbiamochamber.com members.columbiamochamber.com public.genexusserver.com acl RestrictVimeo dstdom_regex -i vimeo.com acl http_port port 80 #http_reply_access deny deny_rep_mime_flashvideo #http_reply_access deny deny_rep_mime_shockwave #streaming files #http_access deny fails #http_reply_access deny fails #http_access deny fails2 #http_reply_access deny fails2 #http_access deny x-type #http_reply_access deny x-type #http_access deny x-type2 #http_reply_access deny x-type2 follow_x_forwarded_for allow localhost acl_uses_indirect_client on log_uses_indirect_client on http_access allow manager localhost http_access deny manager http_access allow purge localhost http_access deny purge http_access allow SpecialFolks http_access deny CONNECT !SSL_ports http_access allow whsws http_access allow opensites http_access deny BuemerKEC !MBSsortnet http_access deny BrkRm RestrictUTube RestrictFacebook RestrictVimeo http_access allow RestrictUTube UTubeUsers http_access deny RestrictUTube http_access allow RestrictFacebook FacebookUsers http_access deny RestrictFacebook http_access deny RestrictHulu http_access allow NoAuthNec http_access allow BrkRm http_access allow FacebookUsers RestrictVimeo http_access deny RestrictVimeo http_access allow Comps http_access allow Dials http_access allow Printers http_access allow password http_access deny !Safe_ports http_access deny SSL_ports !CONNECT http_access allow http_port http_access deny all http_reply_access allow all icp_access allow all access_log /squid/var/logs/access.log squid visible_hostname proxy.site.com forwarded_for off coredump_dir /squid/cache/ #header_access Accept-Encoding deny broken #acl snmppublic snmp_community mysecretcommunity #snmp_port 3401 #snmp_access allow snmppublic all cache_mem 3 GB #acl snmppublic snmp_community mbssquid #snmp_port 3401 #snmp_access allow snmppublic all

Read the article

Tip/Trick: Fix Common SEO Problems Using the URL Rewrite Extension

- by ScottGu

Search engine optimization (SEO) is important for any publically facing web-site. A large % of traffic to sites now comes directly from search engines, and improving your site’s search relevancy will lead to more users visiting your site from search engine queries. This can directly or indirectly increase the money you make through your site. This blog post covers how you can use the free Microsoft URL Rewrite Extension to fix a bunch of common SEO problems that your site might have. It takes less than 15 minutes (and no code changes) to apply 4 simple URL Rewrite rules to your site, and in doing so cause search engines to drive more visitors and traffic to your site. The techniques below work equally well with both ASP.NET Web Forms and ASP.NET MVC based sites. They also works with all versions of ASP.NET (and even work with non-ASP.NET content). [In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu] Measuring the SEO of your website with the Microsoft SEO Toolkit A few months ago I blogged about the free SEO Toolkit that we’ve shipped. This useful tool enables you to automatically crawl/scan your site for SEO correctness, and it then flags any SEO issues it finds. I highly recommend downloading and using the tool against any public site you work on. It makes it easy to spot SEO issues you might have in your site, and pinpoint ways to optimize it further. Below is a simple example of a report I ran against one of my sites (www.scottgu.com) prior to applying the URL Rewrite rules I’ll cover later in this blog post: Search Relevancy and URL Splitting Two of the important things that search engines evaluate when assessing your site’s “search relevancy” are: How many other sites link to your content. Search engines assume that if a lot of people around the web are linking to your content, then it is likely useful and so weight it higher in relevancy. The uniqueness of the content it finds on your site. If search engines find that the content is duplicated in multiple places around the Internet (or on multiple URLs on your site) then it is likely to drop the relevancy of the content. One of the things you want to be very careful to avoid when building public facing sites is to not allow different URLs to retrieve the same content within your site. Doing so will hurt with both of the situations above. In particular, allowing external sites to link to the same content with multiple URLs will cause your link-count and page-ranking to be split up across those different URLs (and so give you a smaller page rank than what it would otherwise be if it was just one URL). Not allowing external sites to link to you in different ways sounds easy in theory – but you might wonder what exactly this means in practice and how you avoid it. 4 Really Common SEO Problems Your Sites Might Have Below are 4 really common scenarios that can cause your site to inadvertently expose multiple URLs for the same content. When this happens external sites linking to yours will end up splitting their page links across multiple URLs - and as a result cause you to have a lower page ranking with search engines than you deserve. SEO Problem #1: Default Document IIS (and other web servers) supports the concept of a “default document”. This allows you to avoid having to explicitly specify the page you want to serve at either the root of the web-site/application, or within a sub-directory. This is convenient – but means that by default this content is available via two different publically exposed URLs (which is bad). For example: http://scottgu.com/ http://scottgu.com/default.aspx SEO Problem #2: Different URL Casings Web developers often don’t realize URLs are case sensitive to search engines on the web. This means that search engines will treat the following links as two completely different URLs: http://scottgu.com/Albums.aspx http://scottgu.com/albums.aspx SEO Problem #3: Trailing Slashes Consider the below two URLs – they might look the same at first, but they are subtly different. The trailing slash creates yet another situation that causes search engines to treat the URLs as different and so split search rankings: http://scottgu.com http://scottgu.com/ SEO Problem #4: Canonical Host Names Sometimes sites support scenarios where they support a web-site with both a leading “www” hostname prefix as well as just the hostname itself. This causes search engines to treat the URLs as different and split search rankling: http://scottgu.com/albums.aspx/ http://www.scottgu.com/albums.aspx/ How to Easily Fix these SEO Problems in 10 minutes (or less) using IIS Rewrite If you haven’t been careful when coding your sites, chances are you are suffering from one (or more) of the above SEO problems. Addressing these issues will improve your search engine relevancy ranking and drive more traffic to your site. The “good news” is that fixing the above 4 issues is really easy using the URL Rewrite Extension. This is a completely free Microsoft extension available for IIS 7.x (on Windows Server 2008, Windows Server 2008 R2, Windows 7 and Windows Vista). The great thing about using the IIS Rewrite extension is that it allows you to fix the above problems *without* having to change any code within your applications. You can easily install the URL Rewrite Extension in under 3 minutes using the Microsoft Web Platform Installer (a free tool we ship that automates setting up web servers and development machines). Just click the green “Install Now” button on the URL Rewrite Spotlight page to install it on your Windows Server 2008, Windows 7 or Windows Vista machine: Once installed you’ll find that a new “URL Rewrite” icon is available within the IIS 7 Admin Tool: Double-clicking the icon will open up the URL Rewrite admin panel – which will display the list of URL Rewrite rules configured for a particular application or site: Notice that our rewrite rule list above is currently empty (which is the default when you first install the extension). We can click the “Add Rule…” link button in the top-right of the panel to add and enable new URL Rewriting logic for our site. Scenario 1: Handling Default Document Scenarios One of the SEO problems I discussed earlier in this post was the scenario where the “default document” feature of IIS causes you to inadvertently expose two URLs for the same content on your site. For example: http://scottgu.com/ http://scottgu.com/default.aspx We can fix this by adding a new IIS Rewrite rule that automatically redirects anyone who navigates to the second URL to instead go to the first one. We will setup the HTTP redirect to be a “permanent redirect” – which will indicate to search engines that they should follow the redirect and use the new URL they are redirected to as the identifier of the content they retrieve. Let’s look at how we can create such a rule. We’ll begin by clicking the “Add Rule” link in the screenshot above. This will cause the below dialog to display: We’ll select the “Blank Rule” template within the “Inbound rules” section to create a new custom URL Rewriting rule. This will display an empty pane like below: Don’t worry – setting up the above rule is easy. The following 4 steps explain how to do so: Step 1: Name the Rule Our first step will be to name the rule we are creating. Naming it with a descriptive name will make it easier to find and understand later. Let’s name this rule our “Default Document URL Rewrite” rule: Step 2: Setup the Regular Expression that Matches this Rule Our second step will be to specify a regular expression filter that will cause this rule to execute when an incoming URL matches the regex pattern. Don’t worry if you aren’t good with regular expressions - I suck at them too. The trick is to know someone who is good at them or copy/paste them from a web-site. Below we are going to specify the following regular expression as our pattern rule: (.*?)/?Default\.aspx$ This pattern will match any URL string that ends with Default.aspx. The "(.*?)" matches any preceding character zero or more times. The "/?" part says to match the slash symbol zero or one times. The "$" symbol at the end will ensure that the pattern will only match strings that end with Default.aspx. Combining all these regex elements allows this rule to work not only for the root of your web site (e.g. http://scottgu.com/default.aspx) but also for any application or subdirectory within the site (e.g. http://scottgu.com/photos/default.aspx. Because the “ignore case” checkbox is selected it will match both “Default.aspx” as well as “default.aspx” within the URL. One nice feature built-into the rule editor is a “Test pattern” button that you can click to bring up a dialog that allows you to test out a few URLs with the rule you are configuring: Above I've added a “products/default.aspx” URL and clicked the “Test” button. This will give me immediate feedback on whether the rule will execute for it. Step 3: Setup a Permanent Redirect Action We’ll then setup an action to occur when our regular expression pattern matches the incoming URL: In the dialog above I’ve changed the “Action Type” drop down to be a “Redirect” action. The “Redirect Type” will be a HTTP 301 Permanent redirect – which means search engines will follow it. I’ve also set the “Redirect URL” property to be: {R:1}/ This indicates that we want to redirect the web client requesting the original URL to a new URL that has the originally requested URL path - minus the "Default.aspx" in it. For example, requests for http://scottgu.com/default.aspx will be redirected to http://scottgu.com/, and requests for http://scottgu.com/photos/default.aspx will be redirected to http://scottgu.com/photos/ The "{R:N}" regex construct, where N >= 0, is called a back-reference and N is the back-reference index. In the case of our pattern "(.*?)/?Default\.aspx$", if the input URL is "products/Default.aspx" then {R:0} will contain "products/Default.aspx" and {R:1} will contain "products". We are going to use this {R:1}/ value to be the URL we redirect users to. Step 4: Apply and Save the Rule Our final step is to click the “Apply” button in the top right hand of the IIS admin tool – which will cause the tool to persist the URL Rewrite rule into our application’s root web.config file (under a <system.webServer/rewrite> configuration section): <configuration> <system.webServer> <rewrite> <rules> <rule name="Default Document" stopProcessing="true"> <match url="(.*?)/?Default\.aspx$" /> <action type="Redirect" url="{R:1}/" /> </rule> </rules> </rewrite> </system.webServer> </configuration> Because IIS 7.x and ASP.NET share the same web.config files, you can actually just copy/paste the above code into your web.config files using Visual Studio and skip the need to run the admin tool entirely. This also makes adding/deploying URL Rewrite rules with your ASP.NET applications really easy. Step 5: Try the Rule Out Now that we’ve saved the rule, let’s try it out on our site. Try the following two URLs on my site: http://scottgu.com/ http://scottgu.com/default.aspx Notice that the second URL automatically redirects to the first one. Because it is a permanent redirect, search engines will follow the URL and should update the page ranking of http://scottgu.com to include links to http://scottgu.com/default.aspx as well. Scenario 2: Different URL Casing Another common SEO problem I discussed earlier in this post is that URLs are case sensitive to search engines on the web. This means that search engines will treat the following links as two completely different URLs: http://scottgu.com/Albums.aspx http://scottgu.com/albums.aspx We can fix this by adding a new IIS Rewrite rule that automatically redirects anyone who navigates to the first URL to instead go to the second (all lower-case) one. Like before, we will setup the HTTP redirect to be a “permanent redirect” – which will indicate to search engines that they should follow the redirect and use the new URL they are redirected to as the identifier of the content they retrieve. To create such a rule we’ll click the “Add Rule” link in the URL Rewrite admin tool again. This will cause the “Add Rule” dialog to appear again: Unlike the previous scenario (where we created a “Blank Rule”), with this scenario we can take advantage of a built-in “Enforce lowercase URLs” rule template. When we click the “ok” button we’ll see the following dialog which asks us if we want to create a rule that enforces the use of lowercase letters in URLs: When we click the “Yes” button we’ll get a pre-written rule that automatically performs a permanent redirect if an incoming URL has upper-case characters in it – and automatically send users to a lower-case version of the URL: We can click the “Apply” button to use this rule “as-is” and have it apply to all incoming URLs to our site. Because my www.scottgu.com site uses ASP.NET Web Forms, I’m going to make one small change to the rule we generated above – which is to add a condition that will ensure that URLs to ASP.NET’s built-in “WebResource.axd” handler are excluded from our case-sensitivity URL Rewrite logic. URLs to the WebResource.axd handler will only come from server-controls emitted from my pages – and will never be linked to from external sites. While my site will continue to function fine if we redirect these URLs to automatically be lower-case – doing so isn’t necessary and will add an extra HTTP redirect to many of my pages. The good news is that adding a condition that prevents my URL Rewriting rule from happening with certain URLs is easy. We simply need to expand the “Conditions” section of the form above We can then click the “Add” button to add a condition clause. This will bring up the “Add Condition” dialog: Above I’ve entered {URL} as the Condition input – and said that this rule should only execute if the URL does not match a regex pattern which contains the string “WebResource.axd”. This will ensure that WebResource.axd URLs to my site will be allowed to execute just fine without having the URL be re-written to be all lower-case. Note: If you have static resources (like references to .jpg, .css, and .js files) within your site that currently use upper-case characters you’ll probably want to add additional condition filter clauses so that URLs to them also don’t get redirected to be lower-case (just add rules for patterns like .jpg, .gif, .js, etc). Your site will continue to work fine if these URLs get redirected to be lower case (meaning the site won’t break) – but it will cause an extra HTTP redirect to happen on your site for URLs that don’t need to be redirected for SEO reasons. So setting up a condition clause makes sense to add. When I click the “ok” button above and apply our lower-case rewriting rule the admin tool will save the following additional rule to our web.config file: <configuration> <system.webServer> <rewrite> <rules> <rule name="Default Document" stopProcessing="true"> <match url="(.*?)/?Default\.aspx$" /> <action type="Redirect" url="{R:1}/" /> </rule> <rule name="Lower Case URLs" stopProcessing="true"> <match url="[A-Z]" ignoreCase="false" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{ToLower:{URL}}" /> </rule> </rules> </rewrite> </system.webServer> </configuration> Try the Rule Out Now that we’ve saved the rule, let’s try it out on our site. Try the following two URLs on my site: http://scottgu.com/Albums.aspx http://scottgu.com/albums.aspx Notice that the first URL (which has a capital “A”) automatically does a redirect to a lower-case version of the URL. Scenario 3: Trailing Slashes Another common SEO problem I discussed earlier in this post is the scenario of trailing slashes within URLs. The trailing slash creates yet another situation that causes search engines to treat the URLs as different and so split search rankings: http://scottgu.com http://scottgu.com/ We can fix this by adding a new IIS Rewrite rule that automatically redirects anyone who navigates to the first URL (that does not have a trailing slash) to instead go to the second one that does. Like before, we will setup the HTTP redirect to be a “permanent redirect” – which will indicate to search engines that they should follow the redirect and use the new URL they are redirected to as the identifier of the content they retrieve. To create such a rule we’ll click the “Add Rule” link in the URL Rewrite admin tool again. This will cause the “Add Rule” dialog to appear again: The URL Rewrite admin tool has a built-in “Append or remove the trailing slash symbol” rule template. When we select it and click the “ok” button we’ll see the following dialog which asks us if we want to create a rule that automatically redirects users to a URL with a trailing slash if one isn’t present: Like within our previous lower-casing rewrite rule we’ll add one additional condition clause that will exclude WebResource.axd URLs from being processed by this rule. This will avoid an unnecessary redirect for happening for those URLs. When we click the “OK” button we’ll get a pre-written rule that automatically performs a permanent redirect if the URL doesn’t have a trailing slash – and if the URL is not processed by either a directory or a file. This will save the following additional rule to our web.config file: <configuration> <system.webServer> <rewrite> <rules> <rule name="Default Document" stopProcessing="true"> <match url="(.*?)/?Default\.aspx$" /> <action type="Redirect" url="{R:1}/" /> </rule> <rule name="Lower Case URLs" stopProcessing="true"> <match url="[A-Z]" ignoreCase="false" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{ToLower:{URL}}" /> </rule> <rule name="Trailing Slash" stopProcessing="true"> <match url="(.*[^/])$" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" /> <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" /> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{R:1}/" /> </rule> </rules> </rewrite> </system.webServer> </configuration> Try the Rule Out Now that we’ve saved the rule, let’s try it out on our site. Try the following two URLs on my site: http://scottgu.com http://scottgu.com/ Notice that the first URL (which has no trailing slash) automatically does a redirect to a URL with the trailing slash. Because it is a permanent redirect, search engines will follow the URL and update the page ranking. Scenario 4: Canonical Host Names The final SEO problem I discussed earlier are scenarios where a site works with both a leading “www” hostname prefix as well as just the hostname itself. This causes search engines to treat the URLs as different and split search rankling: http://www.scottgu.com/albums.aspx http://scottgu.com/albums.aspx We can fix this by adding a new IIS Rewrite rule that automatically redirects anyone who navigates to the first URL (that has a www prefix) to instead go to the second URL. Like before, we will setup the HTTP redirect to be a “permanent redirect” – which will indicate to search engines that they should follow the redirect and use the new URL they are redirected to as the identifier of the content they retrieve. To create such a rule we’ll click the “Add Rule” link in the URL Rewrite admin tool again. This will cause the “Add Rule” dialog to appear again: The URL Rewrite admin tool has a built-in “Canonical domain name” rule template. When we select it and click the “ok” button we’ll see the following dialog which asks us if we want to create a redirect rule that automatically redirects users to a primary host name URL: Above I’m entering the primary URL address I want to expose to the web: scottgu.com. When we click the “OK” button we’ll get a pre-written rule that automatically performs a permanent redirect if the URL has another leading domain name prefix. This will save the following additional rule to our web.config file: <configuration> <system.webServer> <rewrite> <rules> <rule name="Cannonical Hostname"> <match url="(.*)" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{HTTP_HOST}" pattern="^scottgu\.com$" negate="true" /> </conditions> <action type="Redirect" url="http://scottgu.com/{R:1}" /> </rule> <rule name="Default Document" stopProcessing="true"> <match url="(.*?)/?Default\.aspx$" /> <action type="Redirect" url="{R:1}/" /> </rule> <rule name="Lower Case URLs" stopProcessing="true"> <match url="[A-Z]" ignoreCase="false" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{ToLower:{URL}}" /> </rule> <rule name="Trailing Slash" stopProcessing="true"> <match url="(.*[^/])$" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" /> <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" /> <add input="{URL}" pattern="WebResource.axd" negate="true" /> </conditions> <action type="Redirect" url="{R:1}/" /> </rule> </rules> </rewrite> </system.webServer> </configuration> Try the Rule Out Now that we’ve saved the rule, let’s try it out on our site. Try the following two URLs on my site: http://www.scottgu.com/albums.aspx http://scottgu.com/albums.aspx Notice that the first URL (which has the “www” prefix) now automatically does a redirect to the second URL which does not have the www prefix. Because it is a permanent redirect, search engines will follow the URL and update the page ranking. 4 Simple Rules for Improved SEO The above 4 rules are pretty easy to setup and should take less than 15 minutes to configure on existing sites you already have. The beauty of using a solution like the URL Rewrite Extension is that you can take advantage of it without having to change code within your web-site – and without having to break any existing links already pointing at your site. Users who follow existing links will be automatically redirected to the new URLs you wish to publish. And search engines will start to give your site a higher search relevancy ranking – which will list your site higher in search results and drive more traffic to it. Customizing your URL Rewriting rules further is easy to-do either by editing the web.config file directly, or alternatively, just double click the URL Rewrite icon within the IIS 7.x admin tool and it will list all the active rules for your web-site or application: Clicking any of the rules above will open the rules editor back up and allow you to tweak/customize/save them further. Summary Measuring and improving SEO is something every developer building a public-facing web-site needs to think about and focus on. If you haven’t already, download and use the SEO Toolkit to analyze the SEO of your sites today. New URL Routing features in ASP.NET MVC and ASP.NET Web Forms 4 make it much easier to build applications that have more control over the URLs that are published. Tools like the URL Rewrite Extension that I’ve talked about in this blog post make it much easier to improve the URLs that are published from sites you already have built today – without requiring you to change a lot of code. The URL Rewrite Extension provides a bunch of additional great capabilities – far beyond just SEO - as well. I’ll be covering these additional capabilities more in future blog posts. Hope this helps, Scott

Read the article

The Incremental Architect’s Napkin - #5 - Design functions for extensibility and readability

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/08/24/the-incremental-architectrsquos-napkin---5---design-functions-for.aspx The functionality of programs is entered via Entry Points. So what we´re talking about when designing software is a bunch of functions handling the requests represented by and flowing in through those Entry Points. Designing software thus consists of at least three phases: Analyzing the requirements to find the Entry Points and their signatures Designing the functionality to be executed when those Entry Points get triggered Implementing the functionality according to the design aka coding I presume, you´re familiar with phase 1 in some way. And I guess you´re proficient in implementing functionality in some programming language. But in my experience developers in general are not experienced in going through an explicit phase 2. “Designing functionality? What´s that supposed to mean?” you might already have thought. Here´s my definition: To design functionality (or functional design for short) means thinking about… well, functions. You find a solution for what´s supposed to happen when an Entry Point gets triggered in terms of functions. A conceptual solution that is, because those functions only exist in your head (or on paper) during this phase. But you may have guess that, because it´s “design” not “coding”. And here is, what functional design is not: It´s not about logic. Logic is expressions (e.g. +, -, && etc.) and control statements (e.g. if, switch, for, while etc.). Also I consider calling external APIs as logic. It´s equally basic. It´s what code needs to do in order to deliver some functionality or quality. Logic is what´s doing that needs to be done by software. Transformations are either done through expressions or API-calls. And then there is alternative control flow depending on the result of some expression. Basically it´s just jumps in Assembler, sometimes to go forward (if, switch), sometimes to go backward (for, while, do). But calling your own function is not logic. It´s not necessary to produce any outcome. Functionality is not enhanced by adding functions (subroutine calls) to your code. Nor is quality increased by adding functions. No performance gain, no higher scalability etc. through functions. Functions are not relevant to functionality. Strange, isn´t it. What they are important for is security of investment. By introducing functions into our code we can become more productive (re-use) and can increase evolvability (higher unterstandability, easier to keep code consistent). That´s no small feat, however. Evolvable code can hardly be overestimated. That´s why to me functional design is so important. It´s at the core of software development. To sum this up: Functional design is on a level of abstraction above (!) logical design or algorithmic design. Functional design is only done until you get to a point where each function is so simple you are very confident you can easily code it. Functional design an logical design (which mostly is coding, but can also be done using pseudo code or flow charts) are complementary. Software needs both. If you start coding right away you end up in a tangled mess very quickly. Then you need back out through refactoring. Functional design on the other hand is bloodless without actual code. It´s just a theory with no experiments to prove it. But how to do functional design? An example of functional design Let´s assume a program to de-duplicate strings. The user enters a number of strings separated by commas, e.g. a, b, a, c, d, b, e, c, a. And the program is supposed to clear this list of all doubles, e.g. a, b, c, d, e. There is only one Entry Point to this program: the user triggers the de-duplication by starting the program with the string list on the command line C:\>deduplicate "a, b, a, c, d, b, e, c, a" a, b, c, d, e …or by clicking on a GUI button. This leads to the Entry Point function to get called. It´s the program´s main function in case of the batch version or a button click event handler in the GUI version. That´s the physical Entry Point so to speak. It´s inevitable. What then happens is a three step process: Transform the input data from the user into a request. Call the request handler. Transform the output of the request handler into a tangible result for the user. Or to phrase it a bit more generally: Accept input. Transform input into output. Present output. This does not mean any of these steps requires a lot of effort. Maybe it´s just one line of code to accomplish it. Nevertheless it´s a distinct step in doing the processing behind an Entry Point. Call it an aspect or a responsibility - and you will realize it most likely deserves a function of its own to satisfy the Single Responsibility Principle (SRP). Interestingly the above list of steps is already functional design. There is no logic, but nevertheless the solution is described - albeit on a higher level of abstraction than you might have done yourself. But it´s still on a meta-level. The application to the domain at hand is easy, though: Accept string list from command line De-duplicate Present de-duplicated strings on standard output And this concrete list of processing steps can easily be transformed into code:static void Main(string[] args) { var input = Accept_string_list(args); var output = Deduplicate(input); Present_deduplicated_string_list(output); } Instead of a big problem there are three much smaller problems now. If you think each of those is trivial to implement, then go for it. You can stop the functional design at this point. But maybe, just maybe, you´re not so sure how to go about with the de-duplication for example. Then just implement what´s easy right now, e.g.private static string Accept_string_list(string[] args) { return args[0]; } private static void Present_deduplicated_string_list( string[] output) { var line = string.Join(", ", output); Console.WriteLine(line); } Accept_string_list() contains logic in the form of an API-call. Present_deduplicated_string_list() contains logic in the form of an expression and an API-call. And then repeat the functional design for the remaining processing step. What´s left is the domain logic: de-duplicating a list of strings. How should that be done? Without any logic at our disposal during functional design you´re left with just functions. So which functions could make up the de-duplication? Here´s a suggestion: De-duplicate Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Processing step 2 obviously was the core of the solution. That´s where real creativity was needed. That´s the core of the domain. But now after this refinement the implementation of each step is easy again:private static string[] Parse_string_list(string input) { return input.Split(',') .Select(s => s.Trim()) .ToArray(); } private static Dictionary<string,object> Compile_unique_strings(string[] strings) { return strings.Aggregate( new Dictionary<string, object>(), (agg, s) => { agg[s] = null; return agg; }); } private static string[] Serialize_unique_strings( Dictionary<string,object> dict) { return dict.Keys.ToArray(); } With these three additional functions Main() now looks like this:static void Main(string[] args) { var input = Accept_string_list(args); var strings = Parse_string_list(input); var dict = Compile_unique_strings(strings); var output = Serialize_unique_strings(dict); Present_deduplicated_string_list(output); } I think that´s very understandable code: just read it from top to bottom and you know how the solution to the problem works. It´s a mirror image of the initial design: Accept string list from command line Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Present de-duplicated strings on standard output You can even re-generate the design by just looking at the code. Code and functional design thus are always in sync - if you follow some simple rules. But about that later. And as a bonus: all the functions making up the process are small - which means easy to understand, too. So much for an initial concrete example. Now it´s time for some theory. Because there is method to this madness ;-) The above has only scratched the surface. Introducing Flow Design Functional design starts with a given function, the Entry Point. Its goal is to describe the behavior of the program when the Entry Point is triggered using a process, not an algorithm. An algorithm consists of logic, a process on the other hand consists just of steps or stages. Each processing step transforms input into output or a side effect. Also it might access resources, e.g. a printer, a database, or just memory. Processing steps thus can rely on state of some sort. This is different from Functional Programming, where functions are supposed to not be stateful and not cause side effects.[1] In its simplest form a process can be written as a bullet point list of steps, e.g. Get data from user Output result to user Transform data Parse data Map result for output Such a compilation of steps - possibly on different levels of abstraction - often is the first artifact of functional design. It can be generated by a team in an initial design brainstorming. Next comes ordering the steps. What should happen first, what next etc.? Get data from user Parse data Transform data Map result for output Output result to user That´s great for a start into functional design. It´s better than starting to code right away on a given function using TDD. Please get me right: TDD is a valuable practice. But it can be unnecessarily hard if the scope of a functionn is too large. But how do you know beforehand without investing some thinking? And how to do this thinking in a systematic fashion? My recommendation: For any given function you´re supposed to implement first do a functional design. Then, once you´re confident you know the processing steps - which are pretty small - refine and code them using TDD. You´ll see that´s much, much easier - and leads to cleaner code right away. For more information on this approach I call “Informed TDD” read my book of the same title. Thinking before coding is smart. And writing down the solution as a bunch of functions possibly is the simplest thing you can do, I´d say. It´s more according to the KISS (Keep It Simple, Stupid) principle than returning constants or other trivial stuff TDD development often is started with. So far so good. A simple ordered list of processing steps will do to start with functional design. As shown in the above example such steps can easily be translated into functions. Moving from design to coding thus is simple. However, such a list does not scale. Processing is not always that simple to be captured in a list. And then the list is just text. Again. Like code. That means the design is lacking visuality. Textual representations need more parsing by your brain than visual representations. Plus they are limited in their “dimensionality”: text just has one dimension, it´s sequential. Alternatives and parallelism are hard to encode in text. In addition the functional design using numbered lists lacks data. It´s not visible what´s the input, output, and state of the processing steps. That´s why functional design should be done using a lightweight visual notation. No tool is necessary to draw such designs. Use pen and paper; a flipchart, a whiteboard, or even a napkin is sufficient. Visualizing processes The building block of the functional design notation is a functional unit. I mostly draw it like this: Something is done, it´s clear what goes in, it´s clear what comes out, and it´s clear what the processing step requires in terms of state or hardware. Whenever input flows into a functional unit it gets processed and output is produced and/or a side effect occurs. Flowing data is the driver of something happening. That´s why I call this approach to functional design Flow Design. It´s about data flow instead of control flow. Control flow like in algorithms is of no concern to functional design. Thinking about control flow simply is too low level. Once you start with control flow you easily get bogged down by tons of details. That´s what you want to avoid during design. Design is supposed to be quick, broad brush, abstract. It should give overview. But what about all the details? As Robert C. Martin rightly said: “Programming is abot detail”. Detail is a matter of code. Once you start coding the processing steps you designed you can worry about all the detail you want. Functional design does not eliminate all the nitty gritty. It just postpones tackling them. To me that´s also an example of the SRP. Function design has the responsibility to come up with a solution to a problem posed by a single function (Entry Point). And later coding has the responsibility to implement the solution down to the last detail (i.e. statement, API-call). TDD unfortunately mixes both responsibilities. It´s just coding - and thereby trying to find detailed implementations (green phase) plus getting the design right (refactoring). To me that´s one reason why TDD has failed to deliver on its promise for many developers. Using functional units as building blocks of functional design processes can be depicted very easily. Here´s the initial process for the example problem: For each processing step draw a functional unit and label it. Choose a verb or an “action phrase” as a label, not a noun. Functional design is about activities, not state or structure. Then make the output of an upstream step the input of a downstream step. Finally think about the data that should flow between the functional units. Write the data above the arrows connecting the functional units in the direction of the data flow. Enclose the data description in brackets. That way you can clearly see if all flows have already been specified. Empty brackets mean “no data is flowing”, but nevertheless a signal is sent. A name like “list” or “strings” in brackets describes the data content. Use lower case labels for that purpose. A name starting with an upper case letter like “String” or “Customer” on the other hand signifies a data type. If you like, you also can combine descriptions with data types by separating them with a colon, e.g. (list:string) or (strings:string[]). But these are just suggestions from my practice with Flow Design. You can do it differently, if you like. Just be sure to be consistent. Flows wired-up in this manner I call one-dimensional (1D). Each functional unit just has one input and/or one output. A functional unit without an output is possible. It´s like a black hole sucking up input without producing any output. Instead it produces side effects. A functional unit without an input, though, does make much sense. When should it start to work? What´s the trigger? That´s why in the above process even the first processing step has an input. If you like, view such 1D-flows as pipelines. Data is flowing through them from left to right. But as you can see, it´s not always the same data. It get´s transformed along its passage: (args) becomes a (list) which is turned into (strings). The Principle of Mutual Oblivion A very characteristic trait of flows put together from function units is: no functional units knows another one. They are all completely independent of each other. Functional units don´t know where their input is coming from (or even when it´s gonna arrive). They just specify a range of values they can process. And they promise a certain behavior upon input arriving. Also they don´t know where their output is going. They just produce it in their own time independent of other functional units. That means at least conceptually all functional units work in parallel. Functional units don´t know their “deployment context”. They now nothing about the overall flow they are place in. They are just consuming input from some upstream, and producing output for some downstream. That makes functional units very easy to test. At least as long as they don´t depend on state or resources. I call this the Principle of Mutual Oblivion (PoMO). Functional units are oblivious of others as well as an overall context/purpose. They are just parts of a whole focused on a single responsibility. How the whole is built, how a larger goal is achieved, is of no concern to the single functional units. By building software in such a manner, functional design interestingly follows nature. Nature´s building blocks for organisms also follow the PoMO. The cells forming your body do not know each other. Take a nerve cell “controlling” a muscle cell for example:[2] The nerve cell does not know anything about muscle cells, let alone the specific muscel cell it is “attached to”. Likewise the muscle cell does not know anything about nerve cells, let a lone a specific nerve cell “attached to” it. Saying “the nerve cell is controlling the muscle cell” thus only makes sense when viewing both from the outside. “Control” is a concept of the whole, not of its parts. Control is created by wiring-up parts in a certain way. Both cells are mutually oblivious. Both just follow a contract. One produces Acetylcholine (ACh) as output, the other consumes ACh as input. Where the ACh is going, where it´s coming from neither cell cares about. Million years of evolution have led to this kind of division of labor. And million years of evolution have produced organism designs (DNA) which lead to the production of these different cell types (and many others) and also to their co-location. The result: the overall behavior of an organism. How and why this happened in nature is a mystery. For our software, though, it´s clear: functional and quality requirements needs to be fulfilled. So we as developers have to become “intelligent designers” of “software cells” which we put together to form a “software organism” which responds in satisfying ways to triggers from it´s environment. My bet is: If nature gets complex organisms working by following the PoMO, who are we to not apply this recipe for success to our much simpler “machines”? So my rule is: Wherever there is functionality to be delivered, because there is a clear Entry Point into software, design the functionality like nature would do it. Build it from mutually oblivious functional units. That´s what Flow Design is about. In that way it´s even universal, I´d say. Its notation can also be applied to biology: Never mind labeling the functional units with nouns. That´s ok in Flow Design. You´ll do that occassionally for functional units on a higher level of abstraction or when their purpose is close to hardware. Getting a cockroach to roam your bedroom takes 1,000,000 nerve cells (neurons). Getting the de-duplication program to do its job just takes 5 “software cells” (functional units). Both, though, follow the same basic principle. Translating functional units into code Moving from functional design to code is no rocket science. In fact it´s straightforward. There are two simple rules: Translate an input port to a function. Translate an output port either to a return statement in that function or to a function pointer visible to that function. The simplest translation of a functional unit is a function. That´s what you saw in the above example. Functions are mutually oblivious. That why Functional Programming likes them so much. It makes them composable. Which is the reason, nature works according to the PoMO. Let´s be clear about one thing: There is no dependency injection in nature. For all of an organism´s complexity no DI container is used. Behavior is the result of smooth cooperation between mutually oblivious building blocks. Functions will often be the adequate translation for the functional units in your designs. But not always. Take for example the case, where a processing step should not always produce an output. Maybe the purpose is to filter input. Here the functional unit consumes words and produces words. But it does not pass along every word flowing in. Some words are swallowed. Think of a spell checker. It probably should not check acronyms for correctness. There are too many of them. Or words with no more than two letters. Such words are called “stop words”. In the above picture the optionality of the output is signified by the astrisk outside the brackets. It means: Any number of (word) data items can flow from the functional unit for each input data item. It might be none or one or even more. This I call a stream of data. Such behavior cannot be translated into a function where output is generated with return. Because a function always needs to return a value. So the output port is translated into a function pointer or continuation which gets passed to the subroutine when called:[3]void filter_stop_words( string word, Action<string> onNoStopWord) { if (...check if not a stop word...) onNoStopWord(word); } If you want to be nitpicky you might call such a function pointer parameter an injection. And technically you´re right. Conceptually, though, it´s not an injection. Because the subroutine is not functionally dependent on the continuation. Firstly continuations are procedures, i.e. subroutines without a return type. Remember: Flow Design is about unidirectional data flow. Secondly the name of the formal parameter is chosen in a way as to not assume anything about downstream processing steps. onNoStopWord describes a situation (or event) within the functional unit only. Translating output ports into function pointers helps keeping functional units mutually oblivious in cases where output is optional or produced asynchronically. Either pass the function pointer to the function upon call. Or make it global by putting it on the encompassing class. Then it´s called an event. In C# that´s even an explicit feature.class Filter { public void filter_stop_words( string word) { if (...check if not a stop word...) onNoStopWord(word); } public event Action<string> onNoStopWord; } When to use a continuation and when to use an event dependens on how a functional unit is used in flows and how it´s packed together with others into classes. You´ll see examples further down the Flow Design road. Another example of 1D functional design Let´s see Flow Design once more in action using the visual notation. How about the famous word wrap kata? Robert C. Martin has posted a much cited solution including an extensive reasoning behind his TDD approach. So maybe you want to compare it to Flow Design. The function signature given is:string WordWrap(string text, int maxLineLength) {...} That´s not an Entry Point since we don´t see an application with an environment and users. Nevertheless it´s a function which is supposed to provide a certain functionality. The text passed in has to be reformatted. The input is a single line of arbitrary length consisting of words separated by spaces. The output should consist of one or more lines of a maximum length specified. If a word is longer than a the maximum line length it can be split in multiple parts each fitting in a line. Flow Design Let´s start by brainstorming the process to accomplish the feat of reformatting the text. What´s needed? Words need to be assembled into lines Words need to be extracted from the input text The resulting lines need to be assembled into the output text Words too long to fit in a line need to be split Does sound about right? I guess so. And it shows a kind of priority. Long words are a special case. So maybe there is a hint for an incremental design here. First let´s tackle “average words” (words not longer than a line). Here´s the Flow Design for this increment: The the first three bullet points turned into functional units with explicit data added. As the signature requires a text is transformed into another text. See the input of the first functional unit and the output of the last functional unit. In between no text flows, but words and lines. That´s good to see because thereby the domain is clearly represented in the design. The requirements are talking about words and lines and here they are. But note the asterisk! It´s not outside the brackets but inside. That means it´s not a stream of words or lines, but lists or sequences. For each text a sequence of words is output. For each sequence of words a sequence of lines is produced. The asterisk is used to abstract from the concrete implementation. Like with streams. Whether the list of words gets implemented as an array or an IEnumerable is not important during design. It´s an implementation detail. Does any processing step require further refinement? I don´t think so. They all look pretty “atomic” to me. And if not… I can always backtrack and refine a process step using functional design later once I´ve gained more insight into a sub-problem. Implementation The implementation is straightforward as you can imagine. The processing steps can all be translated into functions. Each can be tested easily and separately. Each has a focused responsibility. And the process flow becomes just a sequence of function calls: Easy to understand. It clearly states how word wrapping works - on a high level of abstraction. And it´s easy to evolve as you´ll see. Flow Design - Increment 2 So far only texts consisting of “average words” are wrapped correctly. Words not fitting in a line will result in lines too long. Wrapping long words is a feature of the requested functionality. Whether it´s there or not makes a difference to the user. To quickly get feedback I decided to first implement a solution without this feature. But now it´s time to add it to deliver the full scope. Fortunately Flow Design automatically leads to code following the Open Closed Principle (OCP). It´s easy to extend it - instead of changing well tested code. How´s that possible? Flow Design allows for extension of functionality by inserting functional units into the flow. That way existing functional units need not be changed. The data flow arrow between functional units is a natural extension point. No need to resort to the Strategy Pattern. No need to think ahead where extions might need to be made in the future. I just “phase in” the remaining processing step: Since neither Extract words nor Reformat know of their environment neither needs to be touched due to the “detour”. The new processing step accepts the output of the existing upstream step and produces data compatible with the existing downstream step. Implementation - Increment 2 A trivial implementation checking the assumption if this works does not do anything to split long words. The input is just passed on: Note how clean WordWrap() stays. The solution is easy to understand. A developer looking at this code sometime in the future, when a new feature needs to be build in, quickly sees how long words are dealt with. Compare this to Robert C. Martin´s solution:[4] How does this solution handle long words? Long words are not even part of the domain language present in the code. At least I need considerable time to understand the approach. Admittedly the Flow Design solution with the full implementation of long word splitting is longer than Robert C. Martin´s. At least it seems. Because his solution does not cover all the “word wrap situations” the Flow Design solution handles. Some lines would need to be added to be on par, I guess. But even then… Is a difference in LOC that important as long as it´s in the same ball park? I value understandability and openness for extension higher than saving on the last line of code. Simplicity is not just less code, it´s also clarity in design. But don´t take my word for it. Try Flow Design on larger problems and compare for yourself. What´s the easier, more straightforward way to clean code? And keep in mind: You ain´t seen all yet ;-) There´s more to Flow Design than described in this chapter. In closing I hope I was able to give you a impression of functional design that makes you hungry for more. To me it´s an inevitable step in software development. Jumping from requirements to code does not scale. And it leads to dirty code all to quickly. Some thought should be invested first. Where there is a clear Entry Point visible, it´s functionality should be designed using data flows. Because with data flows abstraction is possible. For more background on why that´s necessary read my blog article here. For now let me point out to you - if you haven´t already noticed - that Flow Design is a general purpose declarative language. It´s “programming by intention” (Shalloway et al.). Just write down how you think the solution should work on a high level of abstraction. This breaks down a large problem in smaller problems. And by following the PoMO the solutions to those smaller problems are independent of each other. So they are easy to test. Or you could even think about getting them implemented in parallel by different team members. Flow Design not only increases evolvability, but also helps becoming more productive. All team members can participate in functional design. This goes beyon collective code ownership. We´re talking collective design/architecture ownership. Because with Flow Design there is a common visual language to talk about functional design - which is the foundation for all other design activities. PS: If you like what you read, consider getting my ebook “The Incremental Architekt´s Napkin”. It´s where I compile all the articles in this series for easier reading. I like the strictness of Function Programming - but I also find it quite hard to live by. And it certainly is not what millions of programmers are used to. Also to me it seems, the real world is full of state and side effects. So why give them such a bad image? That´s why functional design takes a more pragmatic approach. State and side effects are ok for processing steps - but be sure to follow the SRP. Don´t put too much of it into a single processing step. ? Image taken from www.physioweb.org ? My code samples are written in C#. C# sports typed function pointers called delegates. Action is such a function pointer type matching functions with signature void someName(T t). Other languages provide similar ways to work with functions as first class citizens - even Java now in version 8. I trust you find a way to map this detail of my translation to your favorite programming language. I know it works for Java, C++, Ruby, JavaScript, Python, Go. And if you´re using a Functional Programming language it´s of course a no brainer. ? Taken from his blog post “The Craftsman 62, The Dark Path”. ?

Read the article

Control layout using graphviz twopi

- by vy32

I am trying to draw a graph showing search prefixes using twopi. I have a simple input file and am getting this output: (full image) Here is the input file: digraph search { // ordering=out; // color=blue; // rank=same; // overlap=scale; rankdir=LR; root=root; ranksep=1.25; overlap=true; "root"; a [color=none,fontsize=12]; b [color=none,fontsize=12]; c [color=none,fontsize=12]; d [color=none,fontsize=12]; e [color=none,fontsize=12]; f [color=none,fontsize=12]; #g [color=none,fontsize=12]; h [color=none,fontsize=12]; i [color=none,fontsize=12]; j [color=none,fontsize=12]; k [color=none,fontsize=12]; l [color=none,fontsize=12]; m [color=none,fontsize=12]; n [color=none,fontsize=12]; o [color=none,fontsize=12]; p [color=none,fontsize=12]; q [color=none,fontsize=12]; r [color=none,fontsize=12]; s [color=none,fontsize=12]; t [color=none,fontsize=12]; u [color=none,fontsize=12]; v [color=none,fontsize=12]; w [color=none,fontsize=12]; x [color=none,fontsize=12]; y [color=none,fontsize=12]; #ga [color=none,fontsize=12]; gb [color=none,fontsize=12]; gc [color=none,fontsize=12]; gd [color=none,fontsize=12]; ge [color=none,fontsize=12]; gf [color=none,fontsize=12]; gg [color=none,fontsize=12]; gh [color=none,fontsize=12]; gi [color=none,fontsize=12]; gj [color=none,fontsize=12]; gk [color=none,fontsize=12]; gl [color=none,fontsize=12]; gm [color=none,fontsize=12]; gn [color=none,fontsize=12]; go [color=none,fontsize=12]; gp [color=none,fontsize=12]; gq [color=none,fontsize=12]; gr [color=none,fontsize=12]; gs [color=none,fontsize=12]; gt [color=none,fontsize=12]; gu [color=none,fontsize=12]; gv [color=none,fontsize=12]; gw [color=none,fontsize=12]; gx [color=none,fontsize=12]; gy [color=none,fontsize=12]; gaa [color=none,fontsize=12]; gab [color=none,fontsize=12]; gac [color=none,fontsize=12]; gad [color=none,fontsize=12]; gae [color=none,fontsize=12]; gaf [color=none,fontsize=12]; gag [color=none,fontsize=12]; gah [color=none,fontsize=12]; gai [color=none,fontsize=12]; gaj [color=none,fontsize=12]; gak [color=none,fontsize=12]; gal [color=none,fontsize=12]; gam [color=none,fontsize=12]; gan [color=none,fontsize=12]; gao [color=none,fontsize=12]; gap [color=none,fontsize=12]; gaq [color=none,fontsize=12]; #gaz [color=none,fontsize=12]; gas [color=none,fontsize=12]; gat [color=none,fontsize=12]; gau [color=none,fontsize=12]; gav [color=none,fontsize=12]; gaw [color=none,fontsize=12]; gax [color=none,fontsize=12]; gay [color=none,fontsize=12]; gaza [color=none,fontsize=12]; gazb [color=none,fontsize=12]; gazc [color=none,fontsize=12]; gazd [color=none,fontsize=12]; gaze [color=none,fontsize=12]; #gazf [color=none,fontsize=12]; gazg [color=none,fontsize=12]; gazh [color=none,fontsize=12]; gazi [color=none,fontsize=12]; gazj [color=none,fontsize=12]; gazk [color=none,fontsize=12]; gazl [color=none,fontsize=12]; gazm [color=none,fontsize=12]; gazn [color=none,fontsize=12]; gazo [color=none,fontsize=12]; gazp [color=none,fontsize=12]; gazq [color=none,fontsize=12]; gazr [color=none,fontsize=12]; gazs [color=none,fontsize=12]; gazt [color=none,fontsize=12]; gazu [color=none,fontsize=12]; gazv [color=none,fontsize=12]; gazw [color=none,fontsize=12]; gazx [color=none,fontsize=12]; gazy [color=none,fontsize=12]; root -> a [minlen=2]; root -> b [minlen=2]; root -> c [minlen=2]; root -> d [minlen=2]; root -> e [minlen=2]; root -> f [minlen=2]; root -> g [minlen=2]; root -> h [minlen=2]; root -> i [minlen=2]; root -> j [minlen=2]; root -> k [minlen=2]; root -> l [minlen=2]; root -> m [minlen=2]; root -> n [minlen=2]; root -> o [minlen=2]; root -> p [minlen=2]; root -> q [minlen=2]; root -> r [minlen=2]; root -> s [minlen=20]; root -> t [minlen=2]; root -> u [minlen=2]; root -> v [minlen=2]; root -> w [minlen=2]; root -> x [minlen=2]; root -> y [minlen=2]; root -> 0 [minlen=2]; root -> 1 [minlen=2]; root -> 2 [minlen=2]; root -> 3 [minlen=2]; root -> 4 [minlen=2]; root -> 5 [minlen=2]; root -> 6 [minlen=2]; root -> 7 [minlen=2]; root -> 8 [minlen=2]; root -> 9 [minlen=2]; root -> "." [minlen=2]; g -> ga ; g -> gb ; g -> gc ; g -> gd ; g -> ge ; g -> gf ; g -> gg ; g -> gh ; g -> gi ; g -> gj ; g -> gk ; g -> gl ; g -> gm ; g -> gn ; g -> go ; g -> gp ; g -> gq ; g -> gr ; g -> gs ; g -> gt ; g -> gu ; g -> gv ; g -> gw ; g -> gx ; g -> gy ; ga -> gaa ; ga -> gab ; ga -> gac ; ga -> gad ; ga -> gae ; ga -> gaf ; ga -> gag ; ga -> gah ; ga -> gai ; ga -> gaj ; ga -> gak ; ga -> gal ; ga -> gam ; ga -> gan ; ga -> gao ; ga -> gap ; ga -> gaq ; ga -> gaz ; ga -> gas ; ga -> gat ; ga -> gau ; ga -> gav ; ga -> gaw ; ga -> gax ; ga -> gay ; gaz -> gaza ; gaz -> gazb ; gaz -> gazc ; gaz -> gazd ; gaz -> gaze ; gaz -> gazf ; gaz -> gazg ; gaz -> gazh ; gaz -> gazi ; gaz -> gazj ; gaz -> gazk ; gaz -> gazl ; gaz -> gazm ; gaz -> gazn ; gaz -> gazo ; gaz -> gazp ; gaz -> gazq ; gaz -> gazr ; gaz -> gazs ; gaz -> gazt ; gaz -> gazu ; gaz -> gazv ; gaz -> gazw ; gaz -> gazx ; gaz -> gazy ; gazo -> "Blue Tuesday" ; "Blue Tuesday" [ fontsize=10]; // Layout engines: circo dot fdp neato nop nop1 nop2 osage patchwork sfdp twopi } This output is generated with: twopi -os1.png -Tpng s1.dot I'm posting here because the printout is pretty dreadful. All of the nodes hung of "gaz" are overlapping; I've tried specifying nodesep and it is simply ignored. I would like to see the lines from root to the single letters further apart, but again, I can't control that. This seems to be a bug in twopi. The documentation says it should clearly follow these directives, but it doesn't seem to. My questions: Is there any way to make twopi behave? Failing that, is there a better layout engine to use? Thanks.

Read the article

Convert Java program to C

- by imicrothinking

I need a bit of guidance with writing a C program...a bit of quick background as to my level, I've programmed in Java previously, but this is my first time programming in C, and we've been tasked to translate a word count program from Java to C that consists of the following: Read a file from memory Count the words in the file For each occurrence of a unique word, keep a word counter variable Print out the top ten most frequent words and their corresponding occurrences Here's the source program in Java: package lab0; import java.io.File; import java.io.FileReader; import java.util.ArrayList; import java.util.Calendar; import java.util.Collections; public class WordCount { private ArrayList<WordCountNode> outputlist = null; public WordCount(){ this.outputlist = new ArrayList<WordCountNode>(); } /** * Read the file into memory. * * @param filename name of the file. * @return content of the file. * @throws Exception if the file is too large or other file related exception. */ public char[] readFile(String filename) throws Exception{ char [] result = null; File file = new File(filename); long size = file.length(); if (size > Integer.MAX_VALUE){ throw new Exception("File is too large"); } result = new char[(int)size]; FileReader reader = new FileReader(file); int len, offset = 0, size2read = (int)size; while(size2read > 0){ len = reader.read(result, offset, size2read); if(len == -1) break; size2read -= len; offset += len; } return result; } /** * Make article word by word. * * @param article the content of file to be counted. * @return string contains only letters and "'". */ private enum SPLIT_STATE {IN_WORD, NOT_IN_WORD}; /** * Go through article, find all the words and add to output list * with their count. * * @param article the content of the file to be counted. * @return words in the file and their counts. */ public ArrayList<WordCountNode> countWords(char[] article){ SPLIT_STATE state = SPLIT_STATE.NOT_IN_WORD; if(null == article) return null; char curr_ltr; int curr_start = 0; for(int i = 0; i < article.length; i++){ curr_ltr = Character.toUpperCase( article[i]); if(state == SPLIT_STATE.IN_WORD){ article[i] = curr_ltr; if ((curr_ltr < 'A' || curr_ltr > 'Z') && curr_ltr != '\'') { article[i] = ' '; //printf("\nthe word is %s\n\n",curr_start); if(i - curr_start < 0){ System.out.println("i = " + i + " curr_start = " + curr_start); } addWord(new String(article, curr_start, i-curr_start)); state = SPLIT_STATE.NOT_IN_WORD; } }else{ if (curr_ltr >= 'A' && curr_ltr <= 'Z') { curr_start = i; article[i] = curr_ltr; state = SPLIT_STATE.IN_WORD; } } } return outputlist; } /** * Add the word to output list. */ public void addWord(String word){ int pos = dobsearch(word); if(pos >= outputlist.size()){ outputlist.add(new WordCountNode(1L, word)); }else{ WordCountNode tmp = outputlist.get(pos); if(tmp.getWord().compareTo(word) == 0){ tmp.setCount(tmp.getCount() + 1); }else{ outputlist.add(pos, new WordCountNode(1L, word)); } } } /** * Search the output list and return the position to put word. * @param word is the word to be put into output list. * @return position in the output list to insert the word. */ public int dobsearch(String word){ int cmp, high = outputlist.size(), low = -1, next; // Binary search the array to find the key while (high - low > 1) { next = (high + low) / 2; // all in upper case cmp = word.compareTo((outputlist.get(next)).getWord()); if (cmp == 0) return next; else if (cmp < 0) high = next; else low = next; } return high; } public static void main(String args[]){ // handle input if (args.length == 0){ System.out.println("USAGE: WordCount <filename> [Top # of results to display]\n"); System.exit(1); } String filename = args[0]; int dispnum; try{ dispnum = Integer.parseInt(args[1]); }catch(Exception e){ dispnum = 10; } long start_time = Calendar.getInstance().getTimeInMillis(); WordCount wordcount = new WordCount(); System.out.println("Wordcount: Running..."); // read file char[] input = null; try { input = wordcount.readFile(filename); } catch (Exception e) { // TODO Auto-generated catch block e.printStackTrace(); System.exit(1); } // count all word ArrayList<WordCountNode> result = wordcount.countWords(input); long end_time = Calendar.getInstance().getTimeInMillis(); System.out.println("wordcount: completed " + (end_time - start_time)/1000000 + "." + (end_time - start_time)%1000000 + "(s)"); System.out.println("wordsort: running ..."); start_time = Calendar.getInstance().getTimeInMillis(); Collections.sort(result); end_time = Calendar.getInstance().getTimeInMillis(); System.out.println("wordsort: completed " + (end_time - start_time)/1000000 + "." + (end_time - start_time)%1000000 + "(s)"); Collections.reverse(result); System.out.println("\nresults (TOP "+ dispnum +" from "+ result.size() +"):\n" ); // print out result String str ; for (int i = 0; i < result.size() && i < dispnum; i++){ if(result.get(i).getWord().length() > 15) str = result.get(i).getWord().substring(0, 14); else str = result.get(i).getWord(); System.out.println(str + " - " + result.get(i).getCount()); } } public class WordCountNode implements Comparable{ private String word; private long count; public WordCountNode(long count, String word){ this.count = count; this.word = word; } public String getWord() { return word; } public void setWord(String word) { this.word = word; } public long getCount() { return count; } public void setCount(long count) { this.count = count; } public int compareTo(Object arg0) { // TODO Auto-generated method stub WordCountNode obj = (WordCountNode)arg0; if( count - obj.getCount() < 0) return -1; else if( count - obj.getCount() == 0) return 0; else return 1; } } } Here's my attempt (so far) in C: #include <stdio.h> #include <stdlib.h> #include <stdbool.h> #include <string.h> // Read in a file FILE *readFile (char filename[]) { FILE *inputFile; inputFile = fopen (filename, "r"); if (inputFile == NULL) { printf ("File could not be opened.\n"); exit (EXIT_FAILURE); } return inputFile; } // Return number of words in an array int wordCount (FILE *filePointer, char filename[]) {//, char *words[]) { // count words int count = 0; char temp; while ((temp = getc(filePointer)) != EOF) { //printf ("%c", temp); if ((temp == ' ' || temp == '\n') && (temp != '\'')) count++; } count += 1; // counting method uses space AFTER last character in word - the last space // of the last character isn't counted - off by one error // close file fclose (filePointer); return count; } // Print out the frequencies of the 10 most frequent words in the console int main (int argc, char *argv[]) { /* Step 1: Read in file and check for errors */ FILE *filePointer; filePointer = readFile (argv[1]); /* Step 2: Do a word count to prep for array size */ int count = wordCount (filePointer, argv[1]); printf ("Number of words is: %i\n", count); /* Step 3: Create a 2D array to store words in the file */ // open file to reset marker to beginning of file filePointer = fopen (argv[1], "r"); // store words in character array (each element in array = consecutive word) char allWords[count][100]; // 100 is an arbitrary size - max length of word int i,j; char temp; for (i = 0; i < count; i++) { for (j = 0; j < 100; j++) { // labels are used with goto statements, not loops in C temp = getc(filePointer); if ((temp == ' ' || temp == '\n' || temp == EOF) && (temp != '\'') ) { allWords[i][j] = '\0'; break; } else { allWords[i][j] = temp; } printf ("%c", allWords[i][j]); } printf ("\n"); } // close file fclose (filePointer); /* Step 4: Use a simple selection sort algorithm to sort 2D char array */ // PStep 1: Compare two char arrays, and if // (a) c1 > c2, return 2 // (b) c1 == c2, return 1 // (c) c1 < c2, return 0 qsort(allWords, count, sizeof(char[][]), pstrcmp); /* int k = 0, l = 0, m = 0; char currentMax, comparedElement; int max; // the largest element in the current 2D array int elementToSort = 0; // elementToSort determines the element to swap with starting from the left // Outer a iterates through number of swaps needed for (k = 0; k < count - 1; k++) { // times of swaps max = k; // max element set to k // Inner b iterates through successive elements to fish out the largest element for (m = k + 1; m < count - k; m++) { currentMax = allWords[k][l]; comparedElement = allWords[m][l]; // Inner c iterates through successive chars to set the max vars to the largest for (l = 0; (currentMax != '\0' || comparedElement != '\0'); l++) { if (currentMax > comparedElement) break; else if (currentMax < comparedElement) { max = m; currentMax = allWords[m][l]; break; } else if (currentMax == comparedElement) continue; } } // After max (count and string) is determined, perform swap with temp variable char swapTemp[1][20]; int y = 0; do { swapTemp[0][y] = allWords[elementToSort][y]; allWords[elementToSort][y] = allWords[max][y]; allWords[max][y] = swapTemp[0][y]; } while (swapTemp[0][y++] != '\0'); elementToSort++; } */ int a, b; for (a = 0; a < count; a++) { for (b = 0; (temp = allWords[a][b]) != '\0'; b++) { printf ("%c", temp); } printf ("\n"); } // Copy rows to different array and print results /* char arrayCopy [count][20]; int ac, ad; char tempa; for (ac = 0; ac < count; ac++) { for (ad = 0; (tempa = allWords[ac][ad]) != '\0'; ad++) { arrayCopy[ac][ad] = tempa; printf("%c", arrayCopy[ac][ad]); } printf("\n"); } */ /* Step 5: Create two additional arrays: (a) One in which each element contains unique words from char array (b) One which holds the count for the corresponding word in the other array */ /* Step 6: Sort the count array in decreasing order, and print the corresponding array element as well as word count in the console */ return 0; } // Perform housekeeping tasks like freeing up memory and closing file I'm really stuck on the selection sort algorithm. I'm currently using 2D arrays to represent strings, and that worked out fine, but when it came to sorting, using three level nested loops didn't seem to work, I tried to use qsort instead, but I don't fully understand that function as well. Constructive feedback and criticism greatly welcome (...and needed)!

Developer IT