Search Results

Search found 5423 results on 217 pages for 'care industry'.

Page 57/217 | < Previous Page | 53 54 55 56 57 58 59 60 61 62 63 64  | Next Page >

  • Guide to reduce TFS database growth using the Test Attachment Cleaner

    - by terje
    Recently there has been several reports on TFS databases growing too fast and growing too big.  Notable this has been observed when one has started to use more features of the Testing system.  Also, the TFS 2010 handles test results differently from TFS 2008, and this leads to more data stored in the TFS databases. As a consequence of this there has been released some tools to remove unneeded data in the database, and also some fixes to correct for bugs which has been found and corrected during this process.  Further some preventive practices and maintenance rules should be adopted. A lot of people have blogged about this, among these are: Anu’s very important blog post here describes both the problem and solutions to handle it.  She describes both the Test Attachment Cleaner tool, and also some QFE/CU releases to fix some underlying bugs which prevented the tool from being fully effective. Brian Harry’s blog post here describes the problem too This forum thread describes the problem with some solution hints. Ravi Shanker’s blog post here describes best practices on solving this (TBP) Grant Holidays blogpost here describes strategies to use the Test Attachment Cleaner both to detect space problems and how to rectify them.   The problem can be divided into the following areas: Publishing of test results from builds Publishing of manual test results and their attachments in particular Publishing of deployment binaries for use during a test run Bugs in SQL server preventing total cleanup of data (All the published data above is published into the TFS database as attachments.) The test results will include all data being collected during the run.  Some of this data can grow rather large, like IntelliTrace logs and video recordings.   Also the pushing of binaries which happen for automated test runs, including tests run during a build using code coverage which will include all the files in the deployment folder, contributes a lot to the size of the attached data.   In order to handle this systematically, I have set up a 3-stage process: Find out if you have a database space issue Set up your TFS server to minimize potential database issues If you have the “problem”, clean up the database and otherwise keep it clean   Analyze the data Are your database( s) growing ?  Are unused test results growing out of proportion ? To find out about this you need to query your TFS database for some of the information, and use the Test Attachment Cleaner (TAC) to obtain some  more detailed information. If you don’t have too many databases you can use the SQL Server reports from within the Management Studio to analyze the database and table sizes. Or, you can use a set of queries . I find queries often faster to use because I can tweak them the way I want them.  But be aware that these queries are non-documented and non-supported and may change when the product team wants to change them. If you have multiple Project Collections, find out which might have problems: (Disclaimer: The queries below work on TFS 2010. They will not work on Dev-11, since the table structure have been changed.  I will try to update them for Dev-11 when it is released.) Open a SQL Management Studio session onto the SQL Server where you have your TFS Databases. Use the query below to find the Project Collection databases and their sizes, in descending size order.  use master select DB_NAME(database_id) AS DBName, (size/128) SizeInMB FROM sys.master_files where type=0 and substring(db_name(database_id),1,4)='Tfs_' and DB_NAME(database_id)<>'Tfs_Configuration' order by size desc Doing this on one of our SQL servers gives the following results: It is pretty easy to see on which collection to start the work   Find out which tables are possibly too large Keep a special watch out for the Tfs_Attachment table. Use the script at the bottom of Grant’s blog to find the table sizes in descending size order. In our case we got this result: From Grant’s blog we learnt that the tbl_Content is in the Version Control category, so the major only big issue we have here is the tbl_AttachmentContent.   Find out which team projects have possibly too large attachments In order to use the TAC to find and eventually delete attachment data we need to find out which team projects have these attachments. The team project is a required parameter to the TAC. Use the following query to find this, replace the collection database name with whatever applies in your case:   use Tfs_DefaultCollection select p.projectname, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by p.projectname order by sum(a.compressedlength) desc In our case we got this result (had to remove some names), out of more than 100 team projects accumulated over quite some years: As can be seen here it is pretty obvious the “Byggtjeneste – Projects” are the main team project to take care of, with the ones on lines 2-4 as the next ones.  Check which attachment types takes up the most space It can be nice to know which attachment types takes up the space, so run the following query: use Tfs_DefaultCollection select a.attachmenttype, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by a.attachmenttype order by sum(a.compressedlength) desc We then got this result: From this it is pretty obvious that the problem here is the binary files, as also mentioned in Anu’s blog. Check which file types, by their extension, takes up the most space Run the following query use Tfs_DefaultCollection select SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999)as Extension, sum(compressedlength)/1024 as SizeInKB from tbl_Attachment group by SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999) order by sum(compressedlength) desc This gives a result like this:   Now you should have collected enough information to tell you what to do – if you got to do something, and some of the information you need in order to set up your TAC settings file, both for a cleanup and for scheduled maintenance later.    Get your TFS server and environment properly set up Even if you have got the problem or if have yet not got the problem, you should ensure the TFS server is set up so that the risk of getting into this problem is minimized.  To ensure this you should install the following set of updates and components. The assumption is that your TFS Server is at SP1 level. Install the QFE for KB2608743 – which also contains detailed instructions on its use, download from here. The QFE changes the default settings to not upload deployed binaries, which are used in automated test runs. Binaries will still be uploaded if: Code coverage is enabled in the test settings. You change the UploadDeploymentItem to true in the testsettings file. Be aware that this might be reset back to false by another user which haven't installed this QFE. The hotfix should be installed to The build servers (the build agents) The machine hosting the Test Controller Local development computers (Visual Studio) Local test computers (MTM) It is not required to install it to the TFS Server, test agents or the build controller – it has no effect on these programs. If you use the SQL Server 2008 R2 you should also install the CU 10 (or later).  This CU fixes a potential problem of hanging “ghost” files.  This seems to happen only in certain trigger situations, but to ensure it doesn’t bite you, it is better to make sure this CU is installed. There is no such CU for SQL Server 2008 pre-R2 Work around:  If you suspect hanging ghost files, they can be – with some mental effort, deduced from the ghost counters using the following SQL query: use master SELECT DB_NAME(database_id) as 'database',OBJECT_NAME(object_id) as 'objectname', index_type_desc,ghost_record_count,version_ghost_record_count,record_count,avg_record_size_in_bytes FROM sys.dm_db_index_physical_stats (DB_ID(N'<DatabaseName>'), OBJECT_ID(N'<TableName>'), NULL, NULL , 'DETAILED') The problem is a stalled ghost cleanup process.  Restarting the SQL server after having stopped all components that depends on it, like the TFS Server and SPS services – that is all applications that connect to the SQL server. Then restart the SQL server, and finally start up all dependent processes again.  (I would guess a complete server reboot would do the trick too.) After this the ghost cleanup process will run properly again. The fix will come in the next CU cycle for SQL Server R2 SP1.  The R2 pre-SP1 and R2 SP1 have separate maintenance cycles, and are maintained individually. Each have its own set of CU’s. When it comes I will add the link here to that CU. The "hanging ghost file” issue came up after one have run the TAC, and deleted enourmes amount of data.  The SQL Server can get into this hanging state (without the QFE) in certain cases due to this. And of course, install and set up the Test Attachment Cleaner command line power tool.  This should be done following some guidelines from Ravi Shanker: “When you run TAC, ensure that you are deleting small chunks of data at regular intervals (say run TAC every night at 3AM to delete data that is between age 730 to 731 days) – this will ensure that small amounts of data are being deleted and SQL ghosted record cleanup can catch up with the number of deletes performed. “ This rule minimizes the risk of the ghosted hang problem to occur, and further makes it easier for the SQL server ghosting process to work smoothly. “Run DBCC SHRINKDB post the ghosted records are cleaned up to physically reclaim the space on the file system” This is the last step in a 3 step process of removing SQL server data. First they are logically deleted. Then they are cleaned out by the ghosting process, and finally removed using the shrinkdb command. Cleaning out the attachments The TAC is run from the command line using a set of parameters and controlled by a settingsfile.  The parameters point out a server uri including the team project collection and also point at a specific team project. So in order to run this for multiple team projects regularly one has to set up a script to run the TAC multiple times, once for each team project.  When you install the TAC there is a very useful readme file in the same directory. When the deployment binaries are published to the TFS server, ALL items are published up from the deployment folder. That often means much more files than you would assume are necessary. This is a brute force technique. It works, but you need to take care when cleaning up. Grant has shown how their settings file looks in his blog post, removing all attachments older than 180 days , as long as there are no active workitems connected to them. This setting can be useful to clean out all items, both in a clean-up once operation, and in a general There are two scenarios we need to consider: Cleaning up an existing overgrown database Maintaining a server to avoid an overgrown database using scheduled TAC   1. Cleaning up a database which has grown too big due to these attachments. This job is a “Once” job.  We do this once and then move on to make sure it won’t happen again, by taking the actions in 2) below.  In this scenario you should only consider the large files. Your goal should be to simply reduce the size, and don’t bother about  the smaller stuff. That can be left a scheduled TAC cleanup ( 2 below). Here you can use a very general settings file, and just remove the large attachments, or you can choose to remove any old items.  Grant’s settings file is an example of the last one.  A settings file to remove only large attachments could look like this: <!-- Scenario : Remove large files --> <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> </Attachment> </DeletionCriteria> Or like this: If you want only to remove dll’s and pdb’s about that size, add an Extensions-section.  Without that section, all extensions will be deleted. <!-- Scenario : Remove large files of type dll's and pdb's --> <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="dll" /> <Include value="pdb" /> </Extensions> </Attachment> </DeletionCriteria> Before you start up your scheduled maintenance, you should clear out all older items. 2. Scheduled maintenance using the TAC If you run a schedule every night, and remove old items, and also remove them in small batches.  It is important to run this often, like every night, in order to keep the number of deleted items low. That way the SQL ghost process works better. One approach could be to delete all items older than some number of days, let’s say 180 days. This could be combined with restricting it to keep attachments with active or resolved bugs.  Doing this every night ensures that only small amounts of data is deleted. <!-- Scenario : Remove old items except if they have active or resolved bugs --> <DeletionCriteria> <TestRun> <AgeInDays OlderThan="180" /> </TestRun> <Attachment /> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved"/> </LinkedBugs> </DeletionCriteria> In my experience there are projects which are left with active or resolved workitems, akthough no further work is done.  It can be wise to have a cleanup process with no restrictions on linked bugs at all. Note that you then have to remove the whole LinkedBugs section. A approach which could work better here is to do a two step approach, use the schedule above to with no LinkedBugs as a sweeper cleaning task taking away all data older than you could care about.  Then have another scheduled TAC task to take out more specifically attachments that you are not likely to use. This task could be much more specific, and based on your analysis clean out what you know is troublesome data. <!-- Scenario : Remove specific files early --> <DeletionCriteria> <TestRun > <AgeInDays OlderThan="30" /> </TestRun> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="iTrace"/> <Include value="dll"/> <Include value="pdb"/> <Include value="wmv"/> </Extensions> </Attachment> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved" /> </LinkedBugs> </DeletionCriteria> The readme document for the TAC says that it recognizes “internal” extensions, but it does recognize any extension. To run the tool do the following command: tcmpt attachmentcleanup /collection:your_tfs_collection_url /teamproject:your_team_project /settingsfile:path_to_settingsfile /outputfile:%temp%/teamproject.tcmpt.log /mode:delete   Shrinking the database You could run a shrink database command after the TAC has run in cases where there are a lot of data being deleted.  In this case you SHOULD do it, to free up all that space.  But, after the shrink operation you should do a rebuild indexes, since the shrink operation will leave the database in a very fragmented state, which will reduce performance. Note that you need to rebuild indexes, reorganizing is not enough. For smaller amounts of data you should NOT shrink the database, since the data will be reused by the SQL server when it need to add more records.  In fact, it is regarded as a bad practice to shrink the database regularly.  So on a daily maintenance schedule you should NOT shrink the database. To shrink the database you do a DBCC SHRINKDATABASE command, and then follow up with a DBCC INDEXDEFRAG afterwards.  I find the easiest way to do this is to create a SQL Maintenance plan including the Shrink Database Task and the Rebuild Index Task and just execute it when you need to do this.

    Read the article

  • URL Rewrite – Multiple domains under one site. Part II

    - by OWScott
    I believe I have it … I’ve been meaning to put together the ultimate outgoing rule for hosting multiple domains under one site.  I finally sat down this week and setup a few test cases, and created one rule to rule them all.  In Part I of this two part series, I covered the incoming rule necessary to host a site in a subfolder of a website, while making it appear as if it’s in the root of the site.  Part II won’t work without applying Part I first, so if you haven’t read it, I encourage you to read it now. However, the incoming rule by itself doesn’t address everything.  Here’s the problem … Let’s say that we host www.site2.com in a subfolder called site2, off of masterdomain.com.  This is the same example I used in Part I.   Using an incoming rewrite rule, we are able to make a request to www.site2.com even though the site is really in the /site2 folder.  The gotcha comes with any type of path that ASP.NET generates (I’m sure other scripting technologies could do the same too).  ASP.NET thinks that the path to the root of the site is /site2, but the URL is /.  See the issue?  If ASP.NET generates a path or a redirect for us, it will always add /site2 to the URL.  That results in a path that looks something like www.site2.com/site2.  In Part I, I mentioned that you should add a condition where “{PATH_INFO} ‘does not match’ /site2”.  That allows www.site2.com/site2 and www.site2.com to both function the same.  This allows the site to always work, but if you want to hide /site2 in the URL, you need to take it one step further. One way to address this is in your code.  Ultimately this is the best bet.  Ruslan Yakushev has a great article on a few considerations that you can address in code.  I recommend giving that serious consideration.  Additionally, if you have upgraded to ASP.NET 3.5 SP1 or greater, it takes care of some of the references automatically for you. However, what if you inherit an existing application?  Or you can’t easily go through your existing site and make the code changes?  If this applies to you, read on. That’s where URL Rewrite 2.0 comes in.  With URL Rewrite 2.0, you can create an outgoing rule that will remove the /site2 before the page is sent back to the user.  This means that you can take an existing application, host it in a subfolder of your site, and ensure that the URL never reveals that it’s in a subfolder. Performance Considerations Performance overhead is something to be mindful of.  These outbound rules aren’t simply changing the server variables.  The first rule I’ll cover below needs to parse the HTML body and pull out the path (i.e. /site2) on the way through.  This will add overhead, possibly significant if you have large pages and a busy site.  In other words, your mileage may vary and you may need to test to see the impact that these rules have.  Don’t worry too much though.  For many sites, the performance impact is negligible. So, how do we do it? Creating the Outgoing Rule There are really two things to keep in mind.  First, ASP.NET applications frequently generate a URL that adds the /site2 back into the URL.  In addition to URLs, they can be in form elements, img elements and the like.  The goal is to find all of those situations and rewrite it on the way out.  Let’s call this the ‘URL problem’. Second, and similarly, ASP.NET can send a LOCATION redirect that causes a redirect back to another page.  Again, ASP.NET isn’t aware of the different URL and it will add the /site2 to the redirect.  Form Authentication is a good example on when this occurs.  Try to password protect a site running from a subfolder using forms auth and you’ll quickly find that the URL becomes www.site2.com/site2 again.  Let’s term this the ‘redirect problem’. Solving the URL Problem – Outgoing Rule #1 Let’s create a rule that removes the /site2 from any URL.  We want to remove it from relative URLs like /site2/something, or absolute URLs like http://www.site2.com/site2/something.  Most URLs that ASP.NET creates will be relative URLs, but I figure that there may be some applications that piece together a full URL, so we might as well expect that situation. Let’s get started.  First, create a new outbound rule.  You can create the rule within the /site2 folder which will reduce the performance impact of the rule.  Just a reminder that incoming rules for this situation won’t work in a subfolder … but outgoing rules will. Give it a name that makes sense to you, for example “Outgoing – URL paths”. Precondition.  If you place the rule in the subfolder, it will only run for that site and folder, so there isn’t need for a precondition.  Run it for all requests.  If you place it in the root of the site, you may want to create a precondition for HTTP_HOST = ^(www\.)?site2\.com$. For the Match section, there are a few things to consider.  For performance reasons, it’s best to match the least amount of elements that you need to accomplish the task.  For my test cases, I just needed to rewrite the <a /> tag, but you may need to rewrite any number of HTML elements.  Note that as long as you have the exclude /site2 rule in your incoming rule as I described in Part I, some elements that don’t show their URL—like your images—will work without removing the /site2 from them.  That reduces the processing needed for this rule. Leave the “matching scope” at “Response” and choose the elements that you want to change. Set the pattern to “^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)”.  Make sure to replace ‘site2’ with your subfolder name in both places.  Yes, I realize this is a pretty messy looking rule, but it handles a few situations.  This rule will handle the following situations correctly: Original Rewritten using {R:1}{R:2} http://www.site2.com/site2/default.aspx http://www.site2.com/default.aspx http://www.site2.com/folder1/site2/default.aspx Won’t rewrite since it’s a sub-sub folder /site2/default.aspx /default.aspx site2/default.aspx /default.aspx /folder1/site2/default.aspx Won’t rewrite since it’s a sub-sub folder. For the conditions section, you can leave that be. Finally, for the rule, set the Action Type to “Rewrite” and set the Value to “{R:1}{R:2}”.  The {R:1} and {R:2} are back references to the sections within parentheses.  In other words, in http://domain.com/site2/something, {R:1} will be http://domain.com and {R:2} will be /something. If you view your rule from your web.config file (or applicationHost.config if it’s a global rule), it should look like this: <rule name="Outgoing - URL paths" enabled="true"> <match filterByTags="A" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> Solving the Redirect Problem Outgoing Rule #2 The second issue that we can run into is with a client-side redirect.  This is triggered by a LOCATION response header that is sent to the client.  Forms authentication is a common example.  To reproduce this, password protect your subfolder and watch how it redirects and adds the subfolder path back in. Notice in my test case the extra paths: http://site2.com/site2/login.aspx?ReturnUrl=%2fsite2%2fdefault.aspx I want to remove /site2 from both the URL and the ReturnUrl querystring value.  For semi-readability, let’s do this in 2 separate rules, one for the URL and one for the querystring. Create a second rule.  As with the previous rule, it can be created in the /site2 subfolder.  In the URL Rewrite wizard, select Outbound rules –> “Blank Rule”. Fill in the following information: Name response_location URL Precondition Don’t set Match: Matching Scope Server Variable Match: Variable Name RESPONSE_LOCATION Match: Pattern ^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*) Conditions Don’t set Action Type Rewrite Action Properties {R:1}{R:2} It should end up like so: <rule name="response_location URL"> <match serverVariable="RESPONSE_LOCATION" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> Outgoing Rule #3 Outgoing Rule #2 only takes care of the URL path, and not the querystring path.  Let’s create one final rule to take care of the path in the querystring to ensure that ReturnUrl=%2fsite2%2fdefault.aspx gets rewritten to ReturnUrl=%2fdefault.aspx. The %2f is the HTML encoding for forward slash (/). Create a rule like the previous one, but with the following settings: Name response_location querystring Precondition Don’t set Match: Matching Scope Server Variable Match: Variable Name RESPONSE_LOCATION Match: Pattern (.*)%2fsite2(.*) Conditions Don’t set Action Type Rewrite Action Properties {R:1}{R:2} The config should look like this: <rule name="response_location querystring"> <match serverVariable="RESPONSE_LOCATION" pattern="(.*)%2fsite2(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> It’s possible to squeeze the last two rules into one, but it gets kind of confusing so I felt that it’s better to show it as two separate rules. Summary With the rules covered in these two parts, we’re able to have a site in a subfolder and make it appear as if it’s in the root of the site.  Not only that, we can overcome automatic redirecting that is caused by ASP.NET, other scripting technologies, and especially existing applications. Following is an example of the incoming and outgoing rules necessary for a site called www.site2.com hosted in a subfolder called /site2.  Remember that the outgoing rules can be placed in the /site2 folder instead of the in the root of the site. <rewrite> <rules> <rule name="site2.com in a subfolder" enabled="true" stopProcessing="true"> <match url=".*" /> <conditions logicalGrouping="MatchAll" trackAllCaptures="false"> <add input="{HTTP_HOST}" pattern="^(www\.)?site2\.com$" /> <add input="{PATH_INFO}" pattern="^/site2($|/)" negate="true" /> </conditions> <action type="Rewrite" url="/site2/{R:0}" /> </rule> </rules> <outboundRules> <rule name="Outgoing - URL paths" enabled="true"> <match filterByTags="A" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> <rule name="response_location URL"> <match serverVariable="RESPONSE_LOCATION" pattern="^(?:site2|(.*//[_a-zA-Z0-9-\.]*)?/site2)(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> <rule name="response_location querystring"> <match serverVariable="RESPONSE_LOCATION" pattern="(.*)%2fsite2(.*)" /> <action type="Rewrite" value="{R:1}{R:2}" /> </rule> </outboundRules> </rewrite> If you run into any situations that aren’t caught by these rules, please let me know so I can update this to be as complete as possible. Happy URL Rewriting!

    Read the article

  • C#/.NET Little Wonders: The Useful But Overlooked Sets

    - by James Michael Hare
    Once again we consider some of the lesser known classes and keywords of C#.  Today we will be looking at two set implementations in the System.Collections.Generic namespace: HashSet<T> and SortedSet<T>.  Even though most people think of sets as mathematical constructs, they are actually very useful classes that can be used to help make your application more performant if used appropriately. A Background From Math In mathematical terms, a set is an unordered collection of unique items.  In other words, the set {2,3,5} is identical to the set {3,5,2}.  In addition, the set {2, 2, 4, 1} would be invalid because it would have a duplicate item (2).  In addition, you can perform set arithmetic on sets such as: Intersections: The intersection of two sets is the collection of elements common to both.  Example: The intersection of {1,2,5} and {2,4,9} is the set {2}. Unions: The union of two sets is the collection of unique items present in either or both set.  Example: The union of {1,2,5} and {2,4,9} is {1,2,4,5,9}. Differences: The difference of two sets is the removal of all items from the first set that are common between the sets.  Example: The difference of {1,2,5} and {2,4,9} is {1,5}. Supersets: One set is a superset of a second set if it contains all elements that are in the second set. Example: The set {1,2,5} is a superset of {1,5}. Subsets: One set is a subset of a second set if all the elements of that set are contained in the first set. Example: The set {1,5} is a subset of {1,2,5}. If We’re Not Doing Math, Why Do We Care? Now, you may be thinking: why bother with the set classes in C# if you have no need for mathematical set manipulation?  The answer is simple: they are extremely efficient ways to determine ownership in a collection. For example, let’s say you are designing an order system that tracks the price of a particular equity, and once it reaches a certain point will trigger an order.  Now, since there’s tens of thousands of equities on the markets, you don’t want to track market data for every ticker as that would be a waste of time and processing power for symbols you don’t have orders for.  Thus, we just want to subscribe to the stock symbol for an equity order only if it is a symbol we are not already subscribed to. Every time a new order comes in, we will check the list of subscriptions to see if the new order’s stock symbol is in that list.  If it is, great, we already have that market data feed!  If not, then and only then should we subscribe to the feed for that symbol. So far so good, we have a collection of symbols and we want to see if a symbol is present in that collection and if not, add it.  This really is the essence of set processing, but for the sake of comparison, let’s say you do a list instead: 1: // class that handles are order processing service 2: public sealed class OrderProcessor 3: { 4: // contains list of all symbols we are currently subscribed to 5: private readonly List<string> _subscriptions = new List<string>(); 6:  7: ... 8: } Now whenever you are adding a new order, it would look something like: 1: public PlaceOrderResponse PlaceOrder(Order newOrder) 2: { 3: // do some validation, of course... 4:  5: // check to see if already subscribed, if not add a subscription 6: if (!_subscriptions.Contains(newOrder.Symbol)) 7: { 8: // add the symbol to the list 9: _subscriptions.Add(newOrder.Symbol); 10: 11: // do whatever magic is needed to start a subscription for the symbol 12: } 13:  14: // place the order logic! 15: } What’s wrong with this?  In short: performance!  Finding an item inside a List<T> is a linear - O(n) – operation, which is not a very performant way to find if an item exists in a collection. (I used to teach algorithms and data structures in my spare time at a local university, and when you began talking about big-O notation you could immediately begin to see eyes glossing over as if it was pure, useless theory that would not apply in the real world, but I did and still do believe it is something worth understanding well to make the best choices in computer science). Let’s think about this: a linear operation means that as the number of items increases, the time that it takes to perform the operation tends to increase in a linear fashion.  Put crudely, this means if you double the collection size, you might expect the operation to take something like the order of twice as long.  Linear operations tend to be bad for performance because they mean that to perform some operation on a collection, you must potentially “visit” every item in the collection.  Consider finding an item in a List<T>: if you want to see if the list has an item, you must potentially check every item in the list before you find it or determine it’s not found. Now, we could of course sort our list and then perform a binary search on it, but sorting is typically a linear-logarithmic complexity – O(n * log n) - and could involve temporary storage.  So performing a sort after each add would probably add more time.  As an alternative, we could use a SortedList<TKey, TValue> which sorts the list on every Add(), but this has a similar level of complexity to move the items and also requires a key and value, and in our case the key is the value. This is why sets tend to be the best choice for this type of processing: they don’t rely on separate keys and values for ordering – so they save space – and they typically don’t care about ordering – so they tend to be extremely performant.  The .NET BCL (Base Class Library) has had the HashSet<T> since .NET 3.5, but at that time it did not implement the ISet<T> interface.  As of .NET 4.0, HashSet<T> implements ISet<T> and a new set, the SortedSet<T> was added that gives you a set with ordering. HashSet<T> – For Unordered Storage of Sets When used right, HashSet<T> is a beautiful collection, you can think of it as a simplified Dictionary<T,T>.  That is, a Dictionary where the TKey and TValue refer to the same object.  This is really an oversimplification, but logically it makes sense.  I’ve actually seen people code a Dictionary<T,T> where they store the same thing in the key and the value, and that’s just inefficient because of the extra storage to hold both the key and the value. As it’s name implies, the HashSet<T> uses a hashing algorithm to find the items in the set, which means it does take up some additional space, but it has lightning fast lookups!  Compare the times below between HashSet<T> and List<T>: Operation HashSet<T> List<T> Add() O(1) O(1) at end O(n) in middle Remove() O(1) O(n) Contains() O(1) O(n)   Now, these times are amortized and represent the typical case.  In the very worst case, the operations could be linear if they involve a resizing of the collection – but this is true for both the List and HashSet so that’s a less of an issue when comparing the two. The key thing to note is that in the general case, HashSet is constant time for adds, removes, and contains!  This means that no matter how large the collection is, it takes roughly the exact same amount of time to find an item or determine if it’s not in the collection.  Compare this to the List where almost any add or remove must rearrange potentially all the elements!  And to find an item in the list (if unsorted) you must search every item in the List. So as you can see, if you want to create an unordered collection and have very fast lookup and manipulation, the HashSet is a great collection. And since HashSet<T> implements ICollection<T> and IEnumerable<T>, it supports nearly all the same basic operations as the List<T> and can use the System.Linq extension methods as well. All we have to do to switch from a List<T> to a HashSet<T>  is change our declaration.  Since List and HashSet support many of the same members, chances are we won’t need to change much else. 1: public sealed class OrderProcessor 2: { 3: private readonly HashSet<string> _subscriptions = new HashSet<string>(); 4:  5: // ... 6:  7: public PlaceOrderResponse PlaceOrder(Order newOrder) 8: { 9: // do some validation, of course... 10: 11: // check to see if already subscribed, if not add a subscription 12: if (!_subscriptions.Contains(newOrder.Symbol)) 13: { 14: // add the symbol to the list 15: _subscriptions.Add(newOrder.Symbol); 16: 17: // do whatever magic is needed to start a subscription for the symbol 18: } 19: 20: // place the order logic! 21: } 22:  23: // ... 24: } 25: Notice, we didn’t change any code other than the declaration for _subscriptions to be a HashSet<T>.  Thus, we can pick up the performance improvements in this case with minimal code changes. SortedSet<T> – Ordered Storage of Sets Just like HashSet<T> is logically similar to Dictionary<T,T>, the SortedSet<T> is logically similar to the SortedDictionary<T,T>. The SortedSet can be used when you want to do set operations on a collection, but you want to maintain that collection in sorted order.  Now, this is not necessarily mathematically relevant, but if your collection needs do include order, this is the set to use. So the SortedSet seems to be implemented as a binary tree (possibly a red-black tree) internally.  Since binary trees are dynamic structures and non-contiguous (unlike List and SortedList) this means that inserts and deletes do not involve rearranging elements, or changing the linking of the nodes.  There is some overhead in keeping the nodes in order, but it is much smaller than a contiguous storage collection like a List<T>.  Let’s compare the three: Operation HashSet<T> SortedSet<T> List<T> Add() O(1) O(log n) O(1) at end O(n) in middle Remove() O(1) O(log n) O(n) Contains() O(1) O(log n) O(n)   The MSDN documentation seems to indicate that operations on SortedSet are O(1), but this seems to be inconsistent with its implementation and seems to be a documentation error.  There’s actually a separate MSDN document (here) on SortedSet that indicates that it is, in fact, logarithmic in complexity.  Let’s put it in layman’s terms: logarithmic means you can double the collection size and typically you only add a single extra “visit” to an item in the collection.  Take that in contrast to List<T>’s linear operation where if you double the size of the collection you double the “visits” to items in the collection.  This is very good performance!  It’s still not as performant as HashSet<T> where it always just visits one item (amortized), but for the addition of sorting this is a good thing. Consider the following table, now this is just illustrative data of the relative complexities, but it’s enough to get the point: Collection Size O(1) Visits O(log n) Visits O(n) Visits 1 1 1 1 10 1 4 10 100 1 7 100 1000 1 10 1000   Notice that the logarithmic – O(log n) – visit count goes up very slowly compare to the linear – O(n) – visit count.  This is because since the list is sorted, it can do one check in the middle of the list, determine which half of the collection the data is in, and discard the other half (binary search).  So, if you need your set to be sorted, you can use the SortedSet<T> just like the HashSet<T> and gain sorting for a small performance hit, but it’s still faster than a List<T>. Unique Set Operations Now, if you do want to perform more set-like operations, both implementations of ISet<T> support the following, which play back towards the mathematical set operations described before: IntersectWith() – Performs the set intersection of two sets.  Modifies the current set so that it only contains elements also in the second set. UnionWith() – Performs a set union of two sets.  Modifies the current set so it contains all elements present both in the current set and the second set. ExceptWith() – Performs a set difference of two sets.  Modifies the current set so that it removes all elements present in the second set. IsSupersetOf() – Checks if the current set is a superset of the second set. IsSubsetOf() – Checks if the current set is a subset of the second set. For more information on the set operations themselves, see the MSDN description of ISet<T> (here). What Sets Don’t Do Don’t get me wrong, sets are not silver bullets.  You don’t really want to use a set when you want separate key to value lookups, that’s what the IDictionary implementations are best for. Also sets don’t store temporal add-order.  That is, if you are adding items to the end of a list all the time, your list is ordered in terms of when items were added to it.  This is something the sets don’t do naturally (though you could use a SortedSet with an IComparer with a DateTime but that’s overkill) but List<T> can. Also, List<T> allows indexing which is a blazingly fast way to iterate through items in the collection.  Iterating over all the items in a List<T> is generally much, much faster than iterating over a set. Summary Sets are an excellent tool for maintaining a lookup table where the item is both the key and the value.  In addition, if you have need for the mathematical set operations, the C# sets support those as well.  The HashSet<T> is the set of choice if you want the fastest possible lookups but don’t care about order.  In contrast the SortedSet<T> will give you a sorted collection at a slight reduction in performance.   Technorati Tags: C#,.Net,Little Wonders,BlackRabbitCoder,ISet,HashSet,SortedSet

    Read the article

  • Is the Core i5 Processor from Intel like the Celerons of yesteryear?

    - by Chris
    The title pretty much says it. I know that the Core i7's are Quad Core and Hyper-threaded (so 4 cores, and 8 logical), and the Core i5's are Quad Core as well but not Hyper-threaded, does this really make a difference? Or are the only people who are going to care are the ones who CPU intensive operations? I'm a developer, so I'm more concerned about hard drive speed most times than CPU speed. Any thoughts?

    Read the article

  • Buying used MacBook, what to look at?

    - by Wojtek
    I'm planning to buy used MacBook, I'm looking for more recent models, but detailed specification is not important. I would like to know what should I especially look at when buying used MacBook? What are common flaws in those models (manufactured 2009/2010)? I don't care about minor damages, like scratches, but would like to know what symptoms of repair/previous damage/possible failures in the future are?

    Read the article

  • Reality behind wireless security - the weakness of encrypting

    - by Cawas
    I welcome better key-wording here, both on tags and title, and I'll add more links as soon as possible. For some years I'm trying to conceive a wireless environment that I'd setup anywhere and advise for everyone, including from big enterprises to small home networks of 1 machine. I've always had the feeling using any kind of the so called "wireless security" methods is actually a bad design. I'm talking mostly about encrypting and pass-phrasing (which are actually two different concepts), since I won't even considering hiding SSID and mac filtering. I understand it's a natural way of thinking. With cable networking nobody can access the network unless they have access to the physical cable, so you're "secure" in the physical way. In a way, encrypting is for wireless what walling (building walls) is for the cables. And giving pass-phrases is adding a door with a key. But the cabling without encryption is also insecure. Someone just need to plugin and get your data! And while I can see the use for encrypting data, I don't think it's a security measure in wireless networks. As I said elsewhere, I believe we should encrypt only sensitive data regardless of wires. And passwords should be added to the users, always, not to wifi. For securing files, truly, best solution is backup. Sure all that doesn't happen that often, but I won't consider the most situations where people just don't care. I think there are enough situations where people actually care on using passwords on their OS users, so let's go with that in mind. For being able to break the walls or the door someone will need proper equipment such as a hammer or a master key of some kind. Same is true for breaking the wireless walls in the analogy. But, I'd say true data security is at another place. I keep promoting the Fonera concept as an instance. It opens up a free wifi port, if you choose so, and anyone can connect to the internet through that, without having any access to your LAN. It also uses a QoS which will never let your bandwidth drop from that public usage. That's security, and it's open. And who doesn't want to be able to use internet freely anywhere you can find wifi spots? I have 3G myself, but that's beyond the point here. If I have a wifi at home I want to let people freely use it for internet as to not be an hypocrite and even guests can easily access my files, just for reading access, so I don't need to keep setting up encryption and pass-phrases that are not whole compatible. I'll probably be bashed for promoting the non-usage of WPA 2 with AES or whatever, but I wanted to know from more experienced (super) users out there: what do you think? Is there really a need for encryption to have true wireless security?

    Read the article

  • Autologin on Ubuntu Server

    - by hekevintran
    I have a machine running Ubuntu Server. It has only a command-line interface. How can I make the system login with a specific user automatically (I don't want to type the username/password). I know that this is insecure and I don't care.

    Read the article

  • Hardware RAID Controller Support for SSD TRIM

    - by dss539
    Do any hardware RAID controllers available today support TRIM? If not, do any manufacturers have target dates for supporting TRIM? Should I even care about TRIM for SSDs installed in performance-sensitive workstations? Before you suggest it, yes software RAID would sidestep the issue, but my requirements do not allow software RAID. edit: The answer appears to be "no RAID controllers support TRIM" at the current date.

    Read the article

  • mysql broke; how to save some of the table?

    - by user1048138
    For some reason, my mysql cant connect any more. Im running 3 wordpress websites and I need to save the tables. Thats what I really really really really care about... here is the problem: root@dev:/var/log/mysql# mysql ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) Same error is display when logging in with phpmyadmin.. All of the websites display this error Error establishing a database connection

    Read the article

  • chroot on OSX as a different OS

    - by ekaqu
    I was wondering if anyone has been able to use chroot on OSX to run another OS (ubuntu, centos). I know that they are very different, but almost everything I want to use this for wouldn't care about anything at the level of the kernel, so was hoping there would be a way to do this without using a VM. Based off my google searches, I see this question is asked, but no real answer other than "try a VM". Would really like to do this without a VM though.

    Read the article

  • blogging without having to manually upload pictures

    - by bguiz
    Hi, I want to blog on a Wordpress account, without having to upload pictures manually. I'm looking for a program that allows you to edit text + simple formatting + pictures, and lets you publish it to your blog in one step, taking care of uploading both the text and the pictures. Does anyone know of such a program? Thanks.

    Read the article

  • What happens when a server has too many requests?

    - by eSKay
    I am wondering why websites crash at all. If a server has too many requests, it might queue up the request in its waiting lists and serve it when all the earlier requests have been served. That means that the request for the website will be taken care of, although it may take some more time than expected. Then, how do websites crash due to server overload?

    Read the article

  • Old Visioneer 5800 on Windows 7 64-bit?

    - by dsimcha
    Does anyone know of a way to get an old Visioneer 5800 scanner that's supposed to work with nothing past Windows XP SP1 to work on Windows 7 64-bit? I don't care about all the bells and whistles, just the basic features. Is there any kind of generic TWAIN interface that can be used?

    Read the article

  • How does too many requests make a server crash?

    - by eSKay
    I am wondering why websites crash at all. If a server has too many requests, it might queue up the request in its waiting lists and serve it when all the earlier requests have been served. That means that the request for the website will be taken care of, although it may take some more time than expected. Then, how do websites crash due to server overload?

    Read the article

  • Upgrade Postgres 8.3 to 8.4 using Ports (on FreeBSD)

    - by vpetersson
    Hi fellow geeks, I'm trying to upgrade PostgreSQL 8.3 to 8.4 using Ports on FreeBSD (7.2). I found a few good guides, such as this one, but they won't work. Upon running 'portupgrade -o databases/postgresql84-client postgresql-client' I get nothing. Not even an error. (I dropped the -f, since I already have the latest 8.3-client installed.) Anyone care to offer a solution?

    Read the article

  • Excluding specific file types from a security audit in windows server 2008

    - by Mozez
    Hi, I am looking for a way to exclude specific file types from being logged in the security audits. I have a folder being audited for deletion events and the majority of logged events are .tmp files (such as a temp Word file that is automatically deleted when the app is closed) which I do not care about. Would anyone know of a way to exclude these types of files from being logged? Thanks in advance for any comments.

    Read the article

  • Why does newline come before space in the output of hexdump?

    - by ??????? ???????????
    Printing these characters in the "Canonical" format gives the output that I expect, while the default format throws me off. $ echo " " |hexdump # Reversed? 0000000 0a20 0000002 $ echo -n " " |hexdump # Ok, fair enough. 0000000 0020 $ echo " " |hexdump -C # Canonical 00000000 20 0a | .| 00000002 With a different string, such as "123" the output is even more confusing: $ echo "123" |hexdump 0000000 3231 0a33 0000004 The output here does not seem "reversed", but rather shuffled. Would anyone care to explain (briefly) what is going on here?

    Read the article

  • How do I remove the numerous Genius Playlists from my iPad?

    - by spilth
    I’m using the Music app on my iPad 1 with iOS 5.1.1. For some reason, over time my iPad’s Playlists have been populated with more and more Genius Playlists making it extremely sluggish and almost impossible to access my own playlists. As of right now there are 16 copies for an “Adult Contemporary Rock Mix”. Multiple copies exist for a large number of genres. How do I get these off my iPad so I can get to the playlists I actually care about?

    Read the article

  • How does too many requests make a server crash?

    - by eSKay
    I am wondering why websites crash at all. If a server has too many requests, it might queue up the request in its waiting lists and serve it when all the earlier requests have been served. That means that the request for the website will be taken care of, although it may take some more time than expected. Then, how do websites crash due to server overload?

    Read the article

  • best tomcat hosting servrice provider for jsp

    - by akshay
    I want to host my website build using jsp/java.I am looking for a good host which offer following features. unlimited bandwidth (to support large traffic.I dont want to run slowely) low price` good customer care suport that can help me in deploying in case of any problems I am running tight on budget.As i am a university passout. `

    Read the article

< Previous Page | 53 54 55 56 57 58 59 60 61 62 63 64  | Next Page >