negligible - Page 5 - Developer IT

Chrome does not re-draw properly on Windows 8

- by Akshat Mittal

There are a lot of problems with Chrome (24.0.1312.14 beta || But all this happened before update also) on Windows 8. Problems and explanations are listed below: Google Chrome re-draw time: When I switch tabs, the window retains the content of the previous tab and displays that even if I move my mouse, if only refreshes (re-draws) when there is a change on the webpage (like on hover) or I do a select all (or scroll). One thing to note is that the hover and select happens on the real page and not the retained image-like thing of the older webpage. Chrome is slow and laggy: Websites such as Facebook and Twitter (and more) have gone extremely laggy on Chrome (Win 8). When I was using Windows 7, I never experienced a lag or something. Also when using HTML-5 Websites, the transition (the -webkit-transition in CSS) goes extremely slow at times. Plugins Crash: Plugins like Flash Player, Shockwave Player, and more that are in-built into Chrome Crashes a lot, even when doing simple tasks like playing YouTube Videos, displaying ads or something. Chrome Crashes: Chrome has crashed over 100 times in the past month. Google Chrome just crashes randomly or I don't know the reason. Random Page crashes: Chrome results chrome://crash/(Copy-Paste this in address bar) on random pages even when the page is just loaded, I understand that this can happen on heavy HTML5 or JS websites but what about HTML only websites? Computer Freeze: Chrome sometimes, randomly, freezes my computer. Freeze in the sense, none of the other apps are also working. It's like the whole system freezes, I can not even switch to other apps. I am sure that this is because of Chrome since this happens only when Chrome is active. Most of the things above happens on Super User also, Super User never had any problem when using Chrome on Windows 7. UPDATE 1: @magicandre1981 Commented for trying to disable Hardware Acceleration. I tried it, it somewhat solved the problem but din't fix it. I am still experiencing all the above issues but less frequently (maybe because Chrome Restarted Completely) UPDATE 2: @avirk asked me to try a Stable Version of Chrome and Firefox, I din't experience any lag in Firefox, a little (negligible) lag in Chrome 22 (Maybe because its a new copy of Chrome, I haven't used it much). UPDATE 3: @NothingsImpossible said that He is also experiencing the same problem on Windows Server 2008! This seams to be a major issue now. He also said that GPU load is also high at the same time! Even I saw the same thing. UPDATE 4: Recently, Chrome updated to v24 Stable (I am using stable from a long time now). I was experiencing this problem a lot less in Chrome 23, but this is back in Chrome 24. Seams like Chrome 24 is the most affected from this bug, as this same problem was high in Chrome 24 beta also. UPDATE 5: Chrome was updated to v25 Stable. This problem is 99% Gone, it is still there in 1% of the cases. One such example is when I leave chrome inactive for a while with a few tabs open, the tabs go black and no activity can get them back to active state. If I open a new tab, the new tab is OK but the others are still black, I need to close all those tabs. UPDATE 6: Chrome updated to v27 Stable channel, this problem is nearly gone. This does happen occasionally, but not as frequent as in earlier versions of Chrome. UPDATE 7: I am on Chrome v35.0.1916.114 Stable, Windows 8.1 Pro Update 1. Some of the other problems appears to be back. Chrome is slow and laggy again. Re-draw time is getting worse. Is anybody else experiencing such issues? Does anybody have a solution to any of these?

Read the article

Of C# Iterators and Performance

- by James Michael Hare

Some of you reading this will be wondering, "what is an iterator" and think I'm locked in the world of C++. Nope, I'm talking C# iterators. No, not enumerators, iterators. So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well. Iterators have been around for a bit now, and there are still a bunch of people who don't know what they are or what they do. I don't know how many times at work I've had a code review on my code and have someone ask me, "what's that yield word do?" Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> -- I'll post some of the fun ones in a later post. Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration? So, to begin, let's look at a couple of methods. This is a new (albeit contrived) method called Every(...). The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first). So Every(2) would return items 0, 2, 4, 6, etc. Now, if you wanted to write this in the traditional way, you may come up with something like this: public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval) { List<T> newList = new List<T>(); int count = 0; foreach (var i in list) { if ((count++ % interval) == 0) { newList.Add(i); } } return newList; } So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item. Pretty straight forward. The problem? Well, Every<T>(...) will construct a list containing every nth item whether or not you care. What happens if you were searching this result for a certain item and find that item after five tries? You would have generated the rest of the list for nothing. Enter iterators. This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for. This can be very handy if the evaluation itself is expensive or if there's a fair chance you'll never want to fully evaluate a list. We see this all the time in Linq, where many expressions are chained together to do complex processing on a list. This would be very expensive if each of these expressions evaluated their entire possible result set on call. Let's look at the same example function, this time using an iterator: public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval) { int count = 0; foreach (var i in list) { if ((count++ % interval) == 0) { yield return i; } } } Notice it does not create a new return value explicitly, the only evidence of a return is the "yield return" statement. What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends. Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time. It doesn't seem like a big deal, does it? But keep in mind the key point here: it only returns items as they are requested. Thus if there's a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit. This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained. For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10. We could write that as: List<int> someList = new List<int>(); // fill list here someList.Every(10).TakeWhile(i => i <= 10); Now is the difference more apparent? If we use the first form of Every that makes a copy of the list. It's going to copy the entire list whether we will need those items or not, that can be costly! With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated. So, sounds neat eh? But what's the cost is what you're probably wondering. So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things. Now, iteration isn't free. If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead: Copy vs Iterator on 100% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 5 Copy 5 5 5 Iterator 5 50 50 Copy 28 50 50 Iterator 27 500 500 Copy 227 500 500 Iterator 247 5000 5000 Copy 2266 5000 5000 Iterator 2444 50,000 50,000 Copy 24,443 50,000 50,000 Iterator 24,719 500,000 500,000 Copy 250,024 500,000 500,000 Iterator 251,521 Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists. In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price. However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect. However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead. Let's look at what happens if you stop looking after 80% of the list: Copy vs Iterator on 80% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 4 Copy 5 5 4 Iterator 5 50 40 Copy 27 50 40 Iterator 23 500 400 Copy 215 500 400 Iterator 200 5000 4000 Copy 2099 5000 4000 Iterator 1962 50,000 40,000 Copy 22,385 50,000 40,000 Iterator 19,599 500,000 400,000 Copy 236,427 500,000 400,000 Iterator 196,010 Notice that the iterator form is now operating quite a bit faster. But the savings really add up if you stop on average at 50% (which most searches would typically do): Copy vs Iterator on 50% of Collection (10,000 iterations) Collection Size Num Iterated Type Total ms 5 2 Copy 5 5 2 Iterator 4 50 25 Copy 25 50 25 Iterator 16 500 250 Copy 188 500 250 Iterator 126 5000 2500 Copy 1854 5000 2500 Iterator 1226 50,000 25,000 Copy 19,839 50,000 25,000 Iterator 12,233 500,000 250,000 Copy 208,667 500,000 250,000 Iterator 122,336 Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time. And this is only for one level deep. If we are using this in a chain of query expressions it only adds to the savings. So my recommendation? If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator. The cost if they iterate the whole thing does not add much at all -- and if they consume only partially, you reap some really good performance gains. Next time I'll discuss some of my favorite extensions I've created to make development life a little easier and maintainability a little better.

Read the article

Is SHA-1 secure for password storage?

- by Tgr

Some people throw around remarks like "SHA-1 is broken" a lot, so I'm trying to understand what exactly that means. Let's assume I have a database of SHA-1 password hashes, and an attacker whith a state of the art SHA-1 breaking algorithm and a botnet with 100,000 machines gets access to it. (Having control over 100k home computers would mean they can do about 10^15 operations per second.) How much time would they need to find out the password of any one user? find out the password of a given user? find out the password of all users? find a way to log in as one of the users? find a way to log in as a specific user? How does that change if the passwords are salted? Does the method of salting (prefix, postfix, both, or something more complicated like xor-ing) matter? Here is my current understanding, after some googling. Please correct in the answers if I misunderstood something. If there is no salt, a rainbow attack will immediately find all passwords (except extremely long ones). If there is a sufficiently long random salt, the most effective way to find out the passwords is a brute force or dictionary attack. Neither collision nor preimage attacks are any help in finding out the actual password, so cryptographic attacks against SHA-1 are no help here. It doesn't even matter much what algorithm is used - one could even use MD5 or MD4 and the passwords would be just as safe (there is a slight difference because computing a SHA-1 hash is slower). To evaluate how safe "just as safe" is, let's assume that a single sha1 run takes 1000 operations and passwords contain uppercase, lowercase and digits (that is, 60 characters). That means the attacker can test 1015*60*60*24 / 1000 ~= 1017 potential password a day. For a brute force attack, that would mean testing all passwords up to 9 characters in 3 hours, up to 10 characters in a week, up to 11 characters in a year. (It takes 60 times as much for every additional character.) A dictionary attack is much, much faster (even an attacker with a single computer could pull it off in hours), but only finds weak passwords. To log in as a user, the attacker does not need to find out the exact password; it is enough to find a string that results in the same hash. This is called a first preimage attack. As far as I could find, there are no preimage attacks against SHA-1. (A bruteforce attack would take 2160 operations, which means our theoretical attacker would need 1030 years to pull it off. Limits of theoretical possibility are around 260 operations, at which the attack would take a few years.) There are preimage attacks against reduced versions of SHA-1 with negligible effect (for the reduced SHA-1 which uses 44 steps instead of 80, attack time is down from 2160 operations to 2157). There are collision attacks against SHA-1 which are well within theoretical possibility (the best I found brings the time down from 280 to 252), but those are useless against password hashes, even without salting. In short, storing passwords with SHA-1 seems perfectly safe. Did I miss something?

Read the article

Why is SQLite3 using covering indices instead of the indices I created?

- by Geoff

I have an extremely large database (contacts has ~3 billion entries, people has ~280 million entries, and the other tables have a negligible number of entries). Most other queries I've run are really fast. However, I've encountered a more complicated query that's really slow. I'm wondering if there's any way to speed this up. First of all, here is my schema: CREATE TABLE activities (id INTEGER PRIMARY KEY, name TEXT NOT NULL); CREATE TABLE contacts ( id INTEGER PRIMARY KEY, person1_id INTEGER NOT NULL, person2_id INTEGER NOT NULL, duration REAL NOT NULL, -- hours activity_id INTEGER NOT NULL -- FOREIGN_KEY(person1_id) REFERENCES people(id), -- FOREIGN_KEY(person2_id) REFERENCES people(id) ); CREATE TABLE people ( id INTEGER PRIMARY KEY, state_id INTEGER NOT NULL, county_id INTEGER NOT NULL, age INTEGER NOT NULL, gender TEXT NOT NULL, -- M or F income INTEGER NOT NULL -- FOREIGN_KEY(state_id) REFERENCES states(id) ); CREATE TABLE states ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, abbreviation TEXT NOT NULL ); CREATE INDEX activities_name_index on activities(name); CREATE INDEX contacts_activity_id_index on contacts(activity_id); CREATE INDEX contacts_duration_index on contacts(duration); CREATE INDEX contacts_person1_id_index on contacts(person1_id); CREATE INDEX contacts_person2_id_index on contacts(person2_id); CREATE INDEX people_age_index on people(age); CREATE INDEX people_county_id_index on people(county_id); CREATE INDEX people_gender_index on people(gender); CREATE INDEX people_income_index on people(income); CREATE INDEX people_state_id_index on people(state_id); CREATE INDEX states_abbreviation_index on states(abbreviation); CREATE INDEX states_name_index on states(name); Note that I've created an index on every column in the database. I don't care about the size of the database; speed is all I care about. Here's an example of a query that, as expected, runs almost instantly: SELECT count(*) FROM people, states WHERE people.state_id=states.id and states.abbreviation='IA'; Here's the troublesome query: SELECT * FROM contacts WHERE rowid IN (SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Kansas' INTERSECT SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person2_id=people.id AND people.state_id=states.id AND states.name='Missouri'); Now, what I think would happen is that each subquery would use each relevant index I've created to speed this up. However, when I show the query plan, I see this: sqlite> EXPLAIN QUERY PLAN SELECT * FROM contacts WHERE rowid IN (SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Kansas' INTERSECT SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person2_id=people.id AND people.state_id=states.id AND states.name='Missouri'); 0|0|0|SEARCH TABLE contacts USING INTEGER PRIMARY KEY (rowid=?) (~25 rows) 0|0|0|EXECUTE LIST SUBQUERY 1 2|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows) 2|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows) 2|2|0|SEARCH TABLE contacts USING COVERING INDEX contacts_person1_id_index (person1_id=?) (~12 rows) 3|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows) 3|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows) 3|2|0|SEARCH TABLE contacts USING COVERING INDEX contacts_person2_id_index (person2_id=?) (~12 rows) 1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (INTERSECT) In fact, if I show the query plan for the first query I posted, I get this: sqlite> EXPLAIN QUERY PLAN SELECT count(*) FROM people, states WHERE people.state_id=states.id and states.abbreviation='IA'; 0|0|1|SEARCH TABLE states USING COVERING INDEX states_abbreviation_index (abbreviation=?) (~1 rows) 0|1|0|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows) Why is SQLite using covering indices instead of the indices I created? Shouldn't the search in the people table be able to happen in log(n) time given state_id which in turn is found in log(n) time?

Read the article

URL Rewrite – Multiple domains under one site. Part II

- by OWScott

I believe I have it … I’ve been meaning to put together the ultimate outgoing rule for hosting multiple domains under one site. I finally sat down this week and setup a few test cases, and created one rule to rule them all. In Part I of this two part series, I covered the incoming rule necessary to host a site in a subfolder of a website, while making it appear as if it’s in the root of the site. Part II won’t work without applying Part I first, so if you haven’t read it, I encourage you to read it now. However, the incoming rule by itself doesn’t address everything. Here’s the problem … Let’s say that we host www.site2.com in a subfolder called site2, off of masterdomain.com. This is the same example I used in Part I. Using an incoming rewrite rule, we are able to make a request to www.site2.com even though the site is really in the /site2 folder. The gotcha comes with any type of path that ASP.NET generates (I’m sure other scripting technologies could do the same too). ASP.NET thinks that the path to the root of the site is /site2, but the URL is /. See the issue? If ASP.NET generates a path or a redirect for us, it will always add /site2 to the URL. That results in a path that looks something like www.site2.com/site2. In Part I, I mentioned that you should add a condition where “{PATH_INFO} ‘does not match’ /site2”. That allows www.site2.com/site2 and www.site2.com to both function the same. This allows the site to always work, but if you want to hide /site2 in the URL, you need to take it one step further. One way to address this is in your code. Ultimately this is the best bet. Ruslan Yakushev has a great article on a few considerations that you can address in code. I recommend giving that serious consideration. Additionally, if you have upgraded to ASP.NET 3.5 SP1 or greater, it takes care of some of the references automatically for you. However, what if you inherit an existing application? Or you can’t easily go through your existing site and make the code changes? If this applies to you, read on. That’s where URL Rewrite 2.0 comes in. With URL Rewrite 2.0, you can create an outgoing rule that will remove the /site2 before the page is sent back to the user. This means that you can take an existing application, host it in a subfolder of your site, and ensure that the URL never reveals that it’s in a subfolder. Performance Considerations Performance overhead is something to be mindful of. These outbound rules aren’t simply changing the server variables. The first rule I’ll cover below needs to parse the HTML body and pull out the path (i.e. /site2) on the way through. This will add overhead, possibly significant if you have large pages and a busy site. In other words, your mileage may vary and you may need to test to see the impact that these rules have. Don’t worry too much though. For many sites, the performance impact is negligible. So, how do we do it? Creating the Outgoing Rule There are really two things to keep in mind. First, ASP.NET applications frequently generate a URL that adds the /site2 back into the URL. In addition to URLs, they can be in form elements, img elements and the like. The goal is to find all of those situations and rewrite it on the way out. Let’s call this the ‘URL problem’. Second, and similarly, ASP.NET can send a LOCATION redirect that causes a redirect back to another page. Again, ASP.NET isn’t aware of the different URL and it will add the /site2 to the redirect. Form Authentication is a good example on when this occurs. Try to password protect a site running from a subfolder using forms auth and you’ll quickly find that the URL becomes www.site2.com/site2 again. Let’s term this the ‘redirect problem’. Solving the URL Problem – Outgoing Rule #1 Let’s create a rule that removes the /site2 from any URL. We want to remove it from relative URLs like /site2/something, or absolute URLs like http://www.site2.com/site2/something. Most URLs that ASP.NET creates will be relative URLs, but I figure that there may be some applications that piece together a full URL, so we might as well expect that situation. Let’s get started. First, create a new outbound rule. You can create the rule within the /site2 folder which will reduce the performance impact of the rule. Just a reminder that incoming rules for this situation won’t work in a subfolder … but outgoing rules will. Give it a name that makes sense to you, for example “Outgoing – URL paths”. Precondition. If you place the rule in the subfolder, it will only run for that site and folder, so there isn’t need for a precondition. Run it for all requests. If you place it in the root of the site, you may want to create a precondition for HTTP_HOST = ^(www\.)?site2\.com$. For the Match section, there are a few things to consider. For performance reasons, it’s best to match the least amount of elements that you need to accomplish the task. For my test cases, I just needed to rewrite the <a /> tag, but you may need to rewrite any number of HTML elements. Note that as long as you have the exclude /site2 rule in your incoming rule as I described in Part I, some elements that don’t show their URL—like your images—will work without removing the /site2 from them. That reduces the processing needed for this rule. Leave the “matching scope” at “Response” and choose the elements that you want to change. Set the pattern to “^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)”. Make sure to replace ‘site2’ with your subfolder name in both places. Yes, I realize this is a pretty messy looking rule, but it handles a few situations. This rule will handle the following situations correctly: Original Rewritten using {R:1}{R:2} http://www.site2.com/site2/default.aspx http://www.site2.com/default.aspx http://www.site2.com/folder1/site2/default.aspx Won’t rewrite since it’s a sub-sub folder /site2/default.aspx /default.aspx site2/default.aspx /default.aspx /folder1/site2/default.aspx Won’t rewrite since it’s a sub-sub folder. For the conditions section, you can leave that be. Finally, for the rule, set the Action Type to “Rewrite” and set the Value to “{R:1}{R:2}”. The {R:1} and {R:2} are back references to the sections within parentheses. In other words, in http://domain.com/site2/something, {R:1} will be http://domain.com and {R:2} will be /something. If you view your rule from your web.config file (or applicationHost.config if it’s a global rule), it should look like this: <rule name="Outgoing - URL paths" enabled="true"> <match filterByTags="A" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> Solving the Redirect Problem Outgoing Rule #2 The second issue that we can run into is with a client-side redirect. This is triggered by a LOCATION response header that is sent to the client. Forms authentication is a common example. To reproduce this, password protect your subfolder and watch how it redirects and adds the subfolder path back in. Notice in my test case the extra paths: http://site2.com/site2/login.aspx?ReturnUrl=%2fsite2%2fdefault.aspx I want to remove /site2 from both the URL and the ReturnUrl querystring value. For semi-readability, let’s do this in 2 separate rules, one for the URL and one for the querystring. Create a second rule. As with the previous rule, it can be created in the /site2 subfolder. In the URL Rewrite wizard, select Outbound rules –> “Blank Rule”. Fill in the following information: Name response_location URL Precondition Don’t set Match: Matching Scope Server Variable Match: Variable Name RESPONSE_LOCATION Match: Pattern ^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*) Conditions Don’t set Action Type Rewrite Action Properties {R:1}{R:2} It should end up like so: <rule name="response_location URL"> <match serverVariable="RESPONSE_LOCATION" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> Outgoing Rule #3 Outgoing Rule #2 only takes care of the URL path, and not the querystring path. Let’s create one final rule to take care of the path in the querystring to ensure that ReturnUrl=%2fsite2%2fdefault.aspx gets rewritten to ReturnUrl=%2fdefault.aspx. The %2f is the HTML encoding for forward slash (/). Create a rule like the previous one, but with the following settings: Name response_location querystring Precondition Don’t set Match: Matching Scope Server Variable Match: Variable Name RESPONSE_LOCATION Match: Pattern (.*)%2fsite2(.*) Conditions Don’t set Action Type Rewrite Action Properties {R:1}{R:2} The config should look like this: <rule name="response_location querystring"> <match serverVariable="RESPONSE_LOCATION" pattern="(.*)%2fsite2(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> It’s possible to squeeze the last two rules into one, but it gets kind of confusing so I felt that it’s better to show it as two separate rules. Summary With the rules covered in these two parts, we’re able to have a site in a subfolder and make it appear as if it’s in the root of the site. Not only that, we can overcome automatic redirecting that is caused by ASP.NET, other scripting technologies, and especially existing applications. Following is an example of the incoming and outgoing rules necessary for a site called www.site2.com hosted in a subfolder called /site2. Remember that the outgoing rules can be placed in the /site2 folder instead of the in the root of the site. <rewrite> <rules> <rule name="site2.com in a subfolder" enabled="true" stopProcessing="true"> <match url=".*" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{HTTP_HOST}" pattern="^(www\.)?site2\.com$" /> <add input="{PATH_INFO}" pattern="^/site2($|/)" negate="true" /> </conditions> <action type="Rewrite" url="/site2/{R:0}" /> </rule> </rules> <outboundRules> <rule name="Outgoing - URL paths" enabled="true"> <match filterByTags="A" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> <rule name="response_location URL"> <match serverVariable="RESPONSE_LOCATION" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> <rule name="response_location querystring"> <match serverVariable="RESPONSE_LOCATION" pattern="(.*)%2fsite2(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> </outboundRules> </rewrite> If you run into any situations that aren’t caught by these rules, please let me know so I can update this to be as complete as possible. Happy URL Rewriting!

Read the article

Abstracting functionality

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/08/22/abstracting-functionality.aspxWhat is more important than data? Functionality. Yes, I strongly believe we should switch to a functionality over data mindset in programming. Or actually switch back to it. Focus on functionality Functionality once was at the core of software development. Back when algorithms were the first thing you heard about in CS classes. Sure, data structures, too, were important - but always from the point of view of algorithms. (Niklaus Wirth gave one of his books the title “Algorithms + Data Structures” instead of “Data Structures + Algorithms” for a reason.) The reason for the focus on functionality? Firstly, because software was and is about doing stuff. Secondly because sufficient performance was hard to achieve, and only thirdly memory efficiency. But then hardware became more powerful. That gave rise to a new mindset: object orientation. And with it functionality was devalued. Data took over its place as the most important aspect. Now discussions revolved around structures motivated by data relationships. (John Beidler gave his book the title “Data Structures and Algorithms: An Object Oriented Approach” instead of the other way around for a reason.) Sure, this data could be embellished with functionality. But nevertheless functionality was second. When you look at (domain) object models what you mostly find is (domain) data object models. The common object oriented approach is: data aka structure over functionality. This is true even for the most modern modeling approaches like Domain Driven Design. Look at the literature and what you find is recommendations on how to get data structures right: aggregates, entities, value objects. I´m not saying this is what object orientation was invented for. But I´m saying that´s what I happen to see across many teams now some 25 years after object orientation became mainstream through C++, Delphi, and Java. But why should we switch back? Because software development cannot become truly agile with a data focus. The reason for that lies in what customers need first: functionality, behavior, operations. To be clear, that´s not why software is built. The purpose of software is to be more efficient than the alternative. Money mainly is spent to get a certain level of quality (e.g. performance, scalability, security etc.). But without functionality being present, there is nothing to work on the quality of. What customers want is functionality of a certain quality. ASAP. And tomorrow new functionality needs to be added, existing functionality needs to be changed, and quality needs to be increased. No customer ever wanted data or structures. Of course data should be processed. Data is there, data gets generated, transformed, stored. But how the data is structured for this to happen efficiently is of no concern to the customer. Ask a customer (or user) whether she likes the data structured this way or that way. She´ll say, “I don´t care.” But ask a customer (or user) whether he likes the functionality and its quality this way or that way. He´ll say, “I like it” (or “I don´t like it”). Build software incrementally From this very natural focus of customers and users on functionality and its quality follows we should develop software incrementally. That´s what Agility is about. Deliver small increments quickly and often to get frequent feedback. That way less waste is produced, and learning can take place much easier (on the side of the customer as well as on the side of developers). An increment is some added functionality or quality of functionality.[1] So as it turns out, Agility is about functionality over whatever. But software developers’ thinking is still stuck in the object oriented mindset of whatever over functionality. Bummer. I guess that (at least partly) explains why Agility always hits a glass ceiling in projects. It´s a clash of mindsets, of cultures. Driving software development by demanding small increases in functionality runs against thinking about software as growing (data) structures sprinkled with functionality. (Excuse me, if this sounds a bit broad-brush. But you get my point.) The need for abstraction In the end there need to be data structures. Of course. Small and large ones. The phrase functionality over data does not deny that. It´s not functionality instead of data or something. It´s just over, i.e. functionality should be thought of first. It´s a tad more important. It´s what the customer wants. That´s why we need a way to design functionality. Small and large. We need to be able to think about functionality before implementing it. We need to be able to reason about it among team members. We need to be able to communicate our mental models of functionality not just by speaking about them, but also on paper. Otherwise reasoning about it does not scale. We learned thinking about functionality in the small using flow charts, Nassi-Shneiderman diagrams, pseudo code, or UML sequence diagrams. That´s nice and well. But it does not scale. You can use these tools to describe manageable algorithms. But it does not work for the functionality triggered by pressing the “1-Click Order” on an amazon product page for example. There are several reasons for that, I´d say. Firstly, the level of abstraction over code is negligible. It´s essentially non-existent. Drawing a flow chart or writing pseudo code or writing actual code is very, very much alike. All these tools are about control flow like code is.[2] In addition all tools are computationally complete. They are about logic which is expressions and especially control statements. Whatever you code in Java you can fully (!) describe using a flow chart. And then there is no data. They are about control flow and leave out the data altogether. Thus data mostly is assumed to be global. That´s shooting yourself in the foot, as I hope you agree. Even if it´s functionality over data that does not mean “don´t think about data”. Right to the contrary! Functionality only makes sense with regard to data. So data needs to be in the picture right from the start - but it must not dominate the thinking. The above tools fail on this. Bottom line: So far we´re unable to reason in a scalable and abstract manner about functionality. That´s why programmers are so driven to start coding once they are presented with a problem. Programming languages are the only tool they´ve learned to use to reason about functional solutions. Or, well, there might be exceptions. Mathematical notation and SQL may have come to your mind already. Indeed they are tools on a higher level of abstraction than flow charts etc. That´s because they are declarative and not computationally complete. They leave out details - in order to deliver higher efficiency in devising overall solutions. We can easily reason about functionality using mathematics and SQL. That´s great. Except for that they are domain specific languages. They are not general purpose. (And they don´t scale either, I´d say.) Bummer. So to be more precise we need a scalable general purpose tool on a higher than code level of abstraction not neglecting data. Enter: Flow Design. Abstracting functionality using data flows I believe the solution to the problem of abstracting functionality lies in switching from control flow to data flow. Data flow very naturally is not about logic details anymore. There are no expressions and no control statements anymore. There are not even statements anymore. Data flow is declarative by nature. With data flow we get rid of all the limiting traits of former approaches to modeling functionality. In addition, nomen est omen, data flows include data in the functionality picture. With data flows, data is visibly flowing from processing step to processing step. Control is not flowing. Control is wherever it´s needed to process data coming in. That´s a crucial difference and needs some rewiring in your head to be fully appreciated.[2] Since data flows are declarative they are not the right tool to describe algorithms, though, I´d say. With them you don´t design functionality on a low level. During design data flow processing steps are black boxes. They get fleshed out during coding. Data flow design thus is more coarse grained than flow chart design. It starts on a higher level of abstraction - but then is not limited. By nesting data flows indefinitely you can design functionality of any size, without losing sight of your data. Data flows scale very well during design. They can be used on any level of granularity. And they can easily be depicted. Communicating designs using data flows is easy and scales well, too. The result of functional design using data flows is not algorithms (too low level), but processes. Think of data flows as descriptions of industrial production lines. Data as material runs through a number of processing steps to be analyzed, enhances, transformed. On the top level of a data flow design might be just one processing step, e.g. “execute 1-click order”. But below that are arbitrary levels of flows with smaller and smaller steps. That´s not layering as in “layered architecture”, though. Rather it´s a stratified design à la Abelson/Sussman. Refining data flows is not your grandpa´s functional decomposition. That was rooted in control flows. Refining data flows does not suffer from the limits of functional decomposition against which object orientation was supposed to be an antidote. Summary I´ve been working exclusively with data flows for functional design for the past 4 years. It has changed my life as a programmer. What once was difficult is now easy. And, no, I´m not using Clojure or F#. And I´m not a async/parallel execution buff. Designing the functionality of increments using data flows works great with teams. It produces design documentation which can easily be translated into code - in which then the smallest data flow processing steps have to be fleshed out - which is comparatively easy. Using a systematic translation approach code can mirror the data flow design. That way later on the design can easily be reproduced from the code if need be. And finally, data flow designs play well with object orientation. They are a great starting point for class design. But that´s a story for another day. To me data flow design simply is one of the missing links of systematic lightweight software design. There are also other artifacts software development can produce to get feedback, e.g. process descriptions, test cases. But customers can be delighted more easily with code based increments in functionality. ? No, I´m not talking about the endless possibilities this opens for parallel processing. Data flows are useful independently of multi-core processors and Actor-based designs. That´s my whole point here. Data flows are good for reasoning and evolvability. So forget about any special frameworks you might need to reap benefits from data flows. None are necessary. Translating data flow designs even into plain of Java is possible. ?

Read the article

Implementing an async "read all currently available data from stream" operation

- by Jon

I recently provided an answer to this question: C# - Realtime console output redirection. As often happens, explaining stuff (here "stuff" was how I tackled a similar problem) leads you to greater understanding and/or, as is the case here, "oops" moments. I realized that my solution, as implemented, has a bug. The bug has little practical importance, but it has an extremely large importance to me as a developer: I can't rest easy knowing that my code has the potential to blow up. Squashing the bug is the purpose of this question. I apologize for the long intro, so let's get dirty. I wanted to build a class that allows me to receive input from a console's standard output Stream. Console output streams are of type FileStream; the implementation can cast to that, if needed. There is also an associated StreamReader already present to leverage. There is only one thing I need to implement in this class to achieve my desired functionality: an async "read all the data available this moment" operation. Reading to the end of the stream is not viable because the stream will not end unless the process closes the console output handle, and it will not do that because it is interactive and expecting input before continuing. I will be using that hypothetical async operation to implement event-based notification, which will be more convenient for my callers. The public interface of the class is this: public class ConsoleAutomator { public event EventHandler<ConsoleOutputReadEventArgs> StandardOutputRead; public void StartSendingEvents(); public void StopSendingEvents(); } StartSendingEvents and StopSendingEvents do what they advertise; for the purposes of this discussion, we can assume that events are always being sent without loss of generality. The class uses these two fields internally: protected readonly StringBuilder inputAccumulator = new StringBuilder(); protected readonly byte[] buffer = new byte[256]; The functionality of the class is implemented in the methods below. To get the ball rolling: public void StartSendingEvents(); { this.stopAutomation = false; this.BeginReadAsync(); } To read data out of the Stream without blocking, and also without requiring a carriage return char, BeginRead is called: protected void BeginReadAsync() { if (!this.stopAutomation) { this.StandardOutput.BaseStream.BeginRead( this.buffer, 0, this.buffer.Length, this.ReadHappened, null); } } The challenging part: BeginRead requires using a buffer. This means that when reading from the stream, it is possible that the bytes available to read ("incoming chunk") are larger than the buffer. Remember that the goal here is to read all of the chunk and call event subscribers exactly once for each chunk. To this end, if the buffer is full after EndRead, we don't send its contents to subscribers immediately but instead append them to a StringBuilder. The contents of the StringBuilder are only sent back whenever there is no more to read from the stream. private void ReadHappened(IAsyncResult asyncResult) { var bytesRead = this.StandardOutput.BaseStream.EndRead(asyncResult); if (bytesRead == 0) { this.OnAutomationStopped(); return; } var input = this.StandardOutput.CurrentEncoding.GetString( this.buffer, 0, bytesRead); this.inputAccumulator.Append(input); if (bytesRead < this.buffer.Length) { this.OnInputRead(); // only send back if we 're sure we got it all } this.BeginReadAsync(); // continue "looping" with BeginRead } After any read which is not enough to fill the buffer (in which case we know that there was no more data to be read during the last read operation), all accumulated data is sent to the subscribers: private void OnInputRead() { var handler = this.StandardOutputRead; if (handler == null) { return; } handler(this, new ConsoleOutputReadEventArgs(this.inputAccumulator.ToString())); this.inputAccumulator.Clear(); } (I know that as long as there are no subscribers the data gets accumulated forever. This is a deliberate decision). The good This scheme works almost perfectly: Async functionality without spawning any threads Very convenient to the calling code (just subscribe to an event) Never more than one event for each time data is available to be read Is almost agnostic to the buffer size The bad That last almost is a very big one. Consider what happens when there is an incoming chunk with length exactly equal to the size of the buffer. The chunk will be read and buffered, but the event will not be triggered. This will be followed up by a BeginRead that expects to find more data belonging to the current chunk in order to send it back all in one piece, but... there will be no more data in the stream. In fact, as long as data is put into the stream in chunks with length exactly equal to the buffer size, the data will be buffered and the event will never be triggered. This scenario may be highly unlikely to occur in practice, especially since we can pick any number for the buffer size, but the problem is there. Solution? Unfortunately, after checking the available methods on FileStream and StreamReader, I can't find anything which lets me peek into the stream while also allowing async methods to be used on it. One "solution" would be to have a thread wait on a ManualResetEvent after the "buffer filled" condition is detected. If the event is not signaled (by the async callback) in a small amount of time, then more data from the stream will not be forthcoming and the data accumulated so far should be sent to subscribers. However, this introduces the need for another thread, requires thread synchronization, and is plain inelegant. Specifying a timeout for BeginRead would also suffice (call back into my code every now and then so I can check if there's data to be sent back; most of the time there will not be anything to do, so I expect the performance hit to be negligible). But it looks like timeouts are not supported in FileStream. Since I imagine that async calls with timeouts are an option in bare Win32, another approach might be to PInvoke the hell out of the problem. But this is also undesirable as it will introduce complexity and simply be a pain to code. Is there an elegant way to get around the problem? Thanks for being patient enough to read all of this. Update: I definitely did not communicate the scenario well in my initial writeup. I have since revised the writeup quite a bit, but to be extra sure: The question is about how to implement an async "read all the data available this moment" operation. My apologies to the people who took the time to read and answer without me making my intent clear enough.

Read the article

What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

- by Alex Dunlop

Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

Read the article

Implementing a robust async stream reader

- by Jon

I recently provided an answer to this question: C# - Realtime console output redirection. As often happens, explaining stuff (here "stuff" was how I tackled a similar problem) leads you to greater understanding and/or, as is the case here, "oops" moments. I realized that my solution, as implemented, has a bug. The bug has little practical importance, but it has an extremely large importance to me as a developer: I can't rest easy knowing that my code has the potential to blow up. Squashing the bug is the purpose of this question. I apologize for the long intro, so let's get dirty. I wanted to build a class that allows me to receive input from a Stream in an event-based manner. The stream, in my scenario, is guaranteed to be a FileStream and there is also an associated StreamReader already present to leverage. The public interface of the class is this: public class MyStreamManager { public event EventHandler<ConsoleOutputReadEventArgs> StandardOutputRead; public void StartSendingEvents(); public void StopSendingEvents(); } Obviously this specific scenario has to do with a console's standard output, but that is a detail and does not play an important role. StartSendingEvents and StopSendingEvents do what they advertise; for the purposes of this discussion, we can assume that events are always being sent without loss of generality. The class uses these two fields internally: protected readonly StringBuilder inputAccumulator = new StringBuilder(); protected readonly byte[] buffer = new byte[256]; The functionality of the class is implemented in the methods below. To get the ball rolling: public void StartSendingEvents(); { this.stopAutomation = false; this.BeginReadAsync(); } To read data out of the Stream without blocking, and also without requiring a carriage return char, BeginRead is called: protected void BeginReadAsync() { if (!this.stopAutomation) { this.StandardOutput.BaseStream.BeginRead( this.buffer, 0, this.buffer.Length, this.ReadHappened, null); } } The challenging part: BeginRead requires using a buffer. This means that when reading from the stream, it is possible that the bytes available to read ("incoming chunk") are larger than the buffer. Since we are only handing off data from the stream to a consumer, and that consumer may well have inside knowledge about the size and/or format of these chunks, I want to call event subscribers exactly once for each chunk. Otherwise the abstraction breaks down and the subscribers have to buffer the incoming data and reconstruct the chunks themselves using said knowledge. This is much less convenient to the calling code, and detracts from the usefulness of my class. To this end, if the buffer is full after EndRead, we don't send its contents to subscribers immediately but instead append them to a StringBuilder. The contents of the StringBuilder are only sent back whenever there is no more to read from the stream (thus preserving the chunks). private void ReadHappened(IAsyncResult asyncResult) { var bytesRead = this.StandardOutput.BaseStream.EndRead(asyncResult); if (bytesRead == 0) { this.OnAutomationStopped(); return; } var input = this.StandardOutput.CurrentEncoding.GetString( this.buffer, 0, bytesRead); this.inputAccumulator.Append(input); if (bytesRead < this.buffer.Length) { this.OnInputRead(); // only send back if we 're sure we got it all } this.BeginReadAsync(); // continue "looping" with BeginRead } After any read which is not enough to fill the buffer, all accumulated data is sent to the subscribers: private void OnInputRead() { var handler = this.StandardOutputRead; if (handler == null) { return; } handler(this, new ConsoleOutputReadEventArgs(this.inputAccumulator.ToString())); this.inputAccumulator.Clear(); } (I know that as long as there are no subscribers the data gets accumulated forever. This is a deliberate decision). The good This scheme works almost perfectly: Async functionality without spawning any threads Very convenient to the calling code (just subscribe to an event) Maintains the "chunkiness" of the data; this allows the calling code to use inside knowledge of the data without doing any extra work Is almost agnostic to the buffer size (it will work correctly with any size buffer irrespective of the data being read) The bad That last almost is a very big one. Consider what happens when there is an incoming chunk with length exactly equal to the size of the buffer. The chunk will be read and buffered, but the event will not be triggered. This will be followed up by a BeginRead that expects to find more data belonging to the current chunk in order to send it back all in one piece, but... there will be no more data in the stream. In fact, as long as data is put into the stream in chunks with length exactly equal to the buffer size, the data will be buffered and the event will never be triggered. This scenario may be highly unlikely to occur in practice, especially since we can pick any number for the buffer size, but the problem is there. Solution? Unfortunately, after checking the available methods on FileStream and StreamReader, I can't find anything which lets me peek into the stream while also allowing async methods to be used on it. One "solution" would be to have a thread wait on a ManualResetEvent after the "buffer filled" condition is detected. If the event is not signaled (by the async callback) in a small amount of time, then more data from the stream will not be forthcoming and the data accumulated so far should be sent to subscribers. However, this introduces the need for another thread, requires thread synchronization, and is plain inelegant. Specifying a timeout for BeginRead would also suffice (call back into my code every now and then so I can check if there's data to be sent back; most of the time there will not be anything to do, so I expect the performance hit to be negligible). But it looks like timeouts are not supported in FileStream. Since I imagine that async calls with timeouts are an option in bare Win32, another approach might be to PInvoke the hell out of the problem. But this is also undesirable as it will introduce complexity and simply be a pain to code. Is there an elegant way to get around the problem? Thanks for being patient enough to read all of this.

Read the article

Implementing a robust async stream reader for a console

- by Jon

I recently provided an answer to this question: C# - Realtime console output redirection. As often happens, explaining stuff (here "stuff" was how I tackled a similar problem) leads you to greater understanding and/or, as is the case here, "oops" moments. I realized that my solution, as implemented, has a bug. The bug has little practical importance, but it has an extremely large importance to me as a developer: I can't rest easy knowing that my code has the potential to blow up. Squashing the bug is the purpose of this question. I apologize for the long intro, so let's get dirty. I wanted to build a class that allows me to receive input from a Stream in an event-based manner. The stream, in my scenario, is guaranteed to be a FileStream and there is also an associated StreamReader already present to leverage. The public interface of the class is this: public class MyStreamManager { public event EventHandler<ConsoleOutputReadEventArgs> StandardOutputRead; public void StartSendingEvents(); public void StopSendingEvents(); } Obviously this specific scenario has to do with a console's standard output. StartSendingEvents and StopSendingEvents do what they advertise; for the purposes of this discussion, we can assume that events are always being sent without loss of generality. The class uses these two fields internally: protected readonly StringBuilder inputAccumulator = new StringBuilder(); protected readonly byte[] buffer = new byte[256]; The functionality of the class is implemented in the methods below. To get the ball rolling: public void StartSendingEvents(); { this.stopAutomation = false; this.BeginReadAsync(); } To read data out of the Stream without blocking, and also without requiring a carriage return char, BeginRead is called: protected void BeginReadAsync() { if (!this.stopAutomation) { this.StandardOutput.BaseStream.BeginRead( this.buffer, 0, this.buffer.Length, this.ReadHappened, null); } } The challenging part: BeginRead requires using a buffer. This means that when reading from the stream, it is possible that the bytes available to read ("incoming chunk") are larger than the buffer. Since we are only handing off data from the stream to a consumer, and that consumer may well have inside knowledge about the size and/or format of these chunks, I want to call event subscribers exactly once for each chunk. Otherwise the abstraction breaks down and the subscribers have to buffer the incoming data and reconstruct the chunks themselves using said knowledge. This is much less convenient to the calling code, and detracts from the usefulness of my class. Edit: There are comments below correctly stating that since the data is coming from a stream, there is absolutely nothing that the receiver can infer about the structure of the data unless it is fully prepared to parse it. What I am trying to do here is leverage the "flush the output" "structure" that the owner of the console imparts while writing on it. I am prepared to assume (better: allow my caller to have the option to assume) that the OS will pass me the data written between two flushes of the stream in exactly one piece. To this end, if the buffer is full after EndRead, we don't send its contents to subscribers immediately but instead append them to a StringBuilder. The contents of the StringBuilder are only sent back whenever there is no more to read from the stream (thus preserving the chunks). private void ReadHappened(IAsyncResult asyncResult) { var bytesRead = this.StandardOutput.BaseStream.EndRead(asyncResult); if (bytesRead == 0) { this.OnAutomationStopped(); return; } var input = this.StandardOutput.CurrentEncoding.GetString( this.buffer, 0, bytesRead); this.inputAccumulator.Append(input); if (bytesRead < this.buffer.Length) { this.OnInputRead(); // only send back if we 're sure we got it all } this.BeginReadAsync(); // continue "looping" with BeginRead } After any read which is not enough to fill the buffer, all accumulated data is sent to the subscribers: private void OnInputRead() { var handler = this.StandardOutputRead; if (handler == null) { return; } handler(this, new ConsoleOutputReadEventArgs(this.inputAccumulator.ToString())); this.inputAccumulator.Clear(); } (I know that as long as there are no subscribers the data gets accumulated forever. This is a deliberate decision). The good This scheme works almost perfectly: Async functionality without spawning any threads Very convenient to the calling code (just subscribe to an event) Maintains the "chunkiness" of the data; this allows the calling code to use inside knowledge of the data without doing any extra work Is almost agnostic to the buffer size (it will work correctly with any size buffer irrespective of the data being read) The bad That last almost is a very big one. Consider what happens when there is an incoming chunk with length exactly equal to the size of the buffer. The chunk will be read and buffered, but the event will not be triggered. This will be followed up by a BeginRead that expects to find more data belonging to the current chunk in order to send it back all in one piece, but... there will be no more data in the stream. In fact, as long as data is put into the stream in chunks with length exactly equal to the buffer size, the data will be buffered and the event will never be triggered. This scenario may be highly unlikely to occur in practice, especially since we can pick any number for the buffer size, but the problem is there. Solution? Unfortunately, after checking the available methods on FileStream and StreamReader, I can't find anything which lets me peek into the stream while also allowing async methods to be used on it. One "solution" would be to have a thread wait on a ManualResetEvent after the "buffer filled" condition is detected. If the event is not signaled (by the async callback) in a small amount of time, then more data from the stream will not be forthcoming and the data accumulated so far should be sent to subscribers. However, this introduces the need for another thread, requires thread synchronization, and is plain inelegant. Specifying a timeout for BeginRead would also suffice (call back into my code every now and then so I can check if there's data to be sent back; most of the time there will not be anything to do, so I expect the performance hit to be negligible). But it looks like timeouts are not supported in FileStream. Since I imagine that async calls with timeouts are an option in bare Win32, another approach might be to PInvoke the hell out of the problem. But this is also undesirable as it will introduce complexity and simply be a pain to code. Is there an elegant way to get around the problem? Thanks for being patient enough to read all of this.

Developer IT