Search Results

Search found 41338 results on 1654 pages for 'used'.

Page 86/1654 | < Previous Page | 82 83 84 85 86 87 88 89 90 91 92 93 | Next Page >

The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
RIF PRD: Presentation syntax issues

- by Charles Young

Over Christmas I got to play a bit with the W3C RIF PRD and came across a few issues which I thought I would record for posterity. Specifically, I was working on a grammar for the presentation syntax using a GLR grammar parser tool (I was using the current CTP of ‘M’ (MGrammer) and Intellipad – I do so hope the MS guys don’t kill off M and Intellipad now they have dropped the other parts of SQL Server Modelling). I realise that the presentation syntax is non-normative and that any issues with it do not therefore compromise the standard. However, presentation syntax is useful in its own right, and it would be great to iron out any issues in a future revision of the standard. The main issues are actually not to do with the grammar at all, but rather with the ‘running example’ in the RIF PRD recommendation. I started with the code provided in Example 9.1. There are several discrepancies when compared with the EBNF rules documented in the standard. Broadly the problems can be categorised as follows: · Parenthesis mismatch – the wrong number of parentheses are used in various places. For example, in GoldRule, the RHS of the rule (the ‘Then’) is nested in the LHS (‘the If’). In NewCustomerAndWidgetRule, the RHS is orphaned from the LHS. Together with additional incorrect parenthesis, this leads to orphanage of UnknownStatusRule from the entire Document. · Invalid use of parenthesis in ‘Forall’ constructs. Parenthesis should not be used to enclose formulae. Removal of the invalid parenthesis gave me a feeling of inconsistency when comparing formulae in Forall to formulae in If. The use of parenthesis is not actually inconsistent in these two context, but in an If construct it ‘feels’ as if you are enclosing formulae in parenthesis in a LISP-like fashion. In reality, the parenthesis is simply being used to group subordinate syntax elements. The fact that an If construct can contain only a single formula as an immediate child adds to this feeling of inconsistency. · Invalid representation of compact URIs (CURIEs) in the context of Frame productions. In several places the URIs are not qualified with a namespace prefix (‘ex1:’). This conflicts with the definition of CURIEs in the RIF Datatypes and Built-Ins 1.0 document. Here are the productions: CURIE ::= PNAME_LN | PNAME_NS PNAME_LN ::= PNAME_NS PN_LOCAL PNAME_NS ::= PN_PREFIX? ':' PN_LOCAL ::= ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)? PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040] PN_CHARS_U ::= PN_CHARS_BASE | '_' PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)? The more I look at CURIEs, the more my head hurts! The RIF specification allows prefixes and colons without local names, which surprised me. However, the CURIE Syntax 1.0 working group note specifically states that this form is supported…and then promptly provides a syntactic definition that seems to preclude it! However, on (much) deeper inspection, it appears that ‘ex1:’ (for example) is allowed, but would really represent a ‘fragment’ of the ‘reference’, rather than a prefix! Ouch! This is so completely ambiguous that it surely calls into question the whole CURIE specification. In any case, RIF does not allow local names without a prefix. · Missing ‘External’ specifiers for built-in functions and predicates. The EBNF specification enforces this for terms within frames, but does not appear to enforce (what I believe is) the correct use of External on built-in predicates. In any case, the running example only specifies ‘External’ once on the predicate in UnknownStatusRule. External() is required in several other places. · The List used on the LHS of UnknownStatusRule is comma-delimited. This is not supported by the EBNF definition. Similarly, the argument list of pred:list-contains is illegally comma-delimited. · Unnecessary use of conjunction around a single formula in DiscountRule. This is strictly legal in the EBNF, but redundant. All the above issues concern the presentation syntax used in the running example. There are a few minor issues with the grammar itself. Note that Michael Kiefer stated in his paper “Rule Interchange Format: The Framework” that: “The presentation syntax of RIF … is an abstract syntax and, as such, it omits certain details that might be important for unambiguous parsing.” · The grammar cannot differentiate unambiguously between strategies and priorities on groups. A processor is forced to resolve this by detecting the use of IRIs and integers. This could easily be fixed in the grammar. · The grammar cannot unambiguously parse the ‘->’ operator in frames. Specifically, ‘-’ characters are allowed in PN_LOCAL names and hence a parser cannot determine if ‘status->’ is (‘status’ ‘->’) or (‘status-’ ‘>’). One way to fix this is to amend the PN_LOCAL production as follows: PN_LOCAL ::= ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* ((PN_CHARS)-('-')))? However, unilaterally changing the definition of this production, which is defined in the SPARQL Query Language for RDF specification, makes me uncomfortable. · I assume that the presentation syntax is case-sensitive. I couldn’t find this stated anywhere in the documentation, but function/predicate names do appear to be documented as being case-sensitive. · The EBNF does not specify whitespace handling. A couple of productions (RULE and ACTION_BLOCK) are crafted to enforce the use of whitespace. This is not necessary. It seems inconsistent with the rest of the specification and can cause parsing issues. In addition, the Const production exhibits whitespaces issues. The intention may have been to disallow the use of whitespace around ‘^^’, but any direct implementation of the EBNF will probably allow whitespace between ‘^^’ and the SYMSPACE. Of course, I am being a little nit-picking about all this. On the whole, the EBNF translated very smoothly and directly to ‘M’ (MGrammar) and proved to be fairly complete. I have encountered far worse issues when translating other EBNF specifications into usable grammars. I can’t imagine there would be any difficulty in implementing the same grammar in Antlr, COCO/R, gppg, XText, Bison, etc. A general observation, which repeats a point made above, is that the use of parenthesis in the presentation syntax can feel inconsistent and un-intuitive. It isn’t actually inconsistent, but I think the presentation syntax could be improved by adopting braces, rather than parenthesis, to delimit subordinate syntax elements in a similar way to so many programming languages. The familiarity of braces would communicate the structure of the syntax more clearly to people like me. If braces were adopted, parentheses could be retained around ‘var (frame | ‘new()’) constructs in action blocks. This use of parenthesis feels very LISP-like, and I think that this is my issue. It’s as if the presentation syntax represents the deformed love-child of LISP and C. In some places (specifically, action blocks), parenthesis is used in a LISP-like fashion. In other places it is used like braces in C. I find this quite confusing. Here is a corrected version of the running example (Example 9.1) in compliant presentation syntax: Document( Prefix( ex1 <http://example.com/2009/prd2> ) (* ex1:CheckoutRuleset *) Group rif:forwardChaining ( (* ex1:GoldRule *) Group 10 ( Forall ?customer such that And(?customer # ex1:Customer ?customer[ex1:status->"Silver"]) (Forall ?shoppingCart such that ?customer[ex1:shoppingCart->?shoppingCart] (If Exists ?value (And(?shoppingCart[ex1:value->?value] External(pred:numeric-greater-than-or-equal(?value 2000)))) Then Do(Modify(?customer[ex1:status->"Gold"]))))) (* ex1:DiscountRule *) Group ( Forall ?customer such that ?customer # ex1:Customer (If Or( ?customer[ex1:status->"Silver"] ?customer[ex1:status->"Gold"]) Then Do ((?s ?customer[ex1:shoppingCart-> ?s]) (?v ?s[ex1:value->?v]) Modify(?s [ex1:value->External(func:numeric-multiply (?v 0.95))])))) (* ex1:NewCustomerAndWidgetRule *) Group ( Forall ?customer such that And(?customer # ex1:Customer ?customer[ex1:status->"New"] ) (If Exists ?shoppingCart ?item (And(?customer[ex1:shoppingCart->?shoppingCart] ?shoppingCart[ex1:containsItem->?item] ?item # ex1:Widget ) ) Then Do( (?s ?customer[ex1:shoppingCart->?s]) (?val ?s[ex1:value->?val]) (?voucher ?customer[ex1:voucher->?voucher]) Retract(?customer[ex1:voucher->?voucher]) Retract(?voucher) Modify(?s[ex1:value->External(func:numeric-multiply(?val 0.90))])))) (* ex1:UnknownStatusRule *) Group ( Forall ?customer such that ?customer # ex1:Customer (If Not(Exists ?status (And(?customer[ex1:status->?status] External(pred:list-contains(List("New" "Bronze" "Silver" "Gold") ?status)) ))) Then Do( Execute(act:print(External(func:concat("New customer: " ?customer)))) Assert(?customer[ex1:status->"New"])))) ) ) I hope that helps someone out there :-)

Read the article
first install for windows eight.....da beta

- by raysmithequip

The W8 preview is now installed and I am enjoying it. I remember the learning curve of my first unix machine back in the eighties, this ain't that.It is normal for me to do the first os install with a keyboard and low end monitor...you never know what you'll encounter out in the field. The OS took like a fish to water. I used a low end INTEL motherboard dp55w I gathered on the cheap, an 1157 i5 from the used bin a pair of 6 gig ddr3 sticks, a rosewell 550 watt power supply a cheap used twenty buck sub 200g wd sata drive, a half working dvd burner and an asus fanless nvidia vid card, not a great one but Sub 50.00 on newey eggey...I did have to hunt the ms forums for a key and of course to activate the thing, if dos would of needed this outmoded ritual, we would still be on cpm and osborne would be a household name, of course little do people know that this ritual was common as far back as the seventies on att unix installs....not, but it was possible, I used to joke about when I ran a bbs, what hell would of been wrought had dos 3.2 machines been required to dial into my bbs to send fido mail to ms and wait for an acknowledgement. All in all the thing was pushing a seven on the ms richter scale, not including the vid card, sadly it came in at just a tad over three....I wanted to evaluate it for a possible replacement on critical machines that in the past went down due to a vid card fan failure....you have no idea what a customer thinks when you show them a failed vid card fan..."you mean that little plastic piece of junk caused all this!!??!!!"...yea man. Some production machines don't need any sort of vid, I will at least keep it on the maybe list for those, MTBF is a very important factor, some big box stores should put percentage of failure rate within 24 month estimates on the outside of the carton for sure. And a warning that the power supplies are already at their limit. Let's face it, today even 550w can be iffy.A few neat eye candy improvements over the earlier windows is nice, the metro screen is nice, anyone who has used a newer phone recently will intuitively drag their fingers across the screen....lot of good that was with no mouse or touch screen though. Lucky me, I have been using windows since day one, I still have a copy of win 2.0 (and every other version) for no good reason. Still the old ix collection of disks is much larger, recompiling any kernal is another silly ritual, same machine, different day, same recompile...argh. Rh is my all time fav, mandrake was always missing something, like it rewrote the init file or something, novell is ok as long as you stay on the beaten path and of course ubuntu normally recompiles with the same errors consistantly....makes life easy that way....no errors on windows eight, just a screen that did not match the installed hardware, natuarally I alt tabbed right out of it, then hit the flag key to find the start menu....no start button. I miss the start button already. Keyboard cowboy funnin and I was browsing the harddrive, nothing stunning there, I like that, means I can find stuff. Only I can't find what I want, the start button....the start menu is that first screen for touch tablets. No biggie for useruser, that is where they will want to be, I can see that. Admins won't want to be there, it is easy enough to get the control panel a bazzilion other ways though, just not the start button. (see a pattern here?). Personally, from the keyboard I find it fun to hit the carets along the location bar at the top of the explorer screen with tabs and arrows and choose SHOW ALL CONTROL PANEL ITEMS, or thereabouts. Bottom line, I love seven and I'll love eight even more!...very happy I did not have to follow the normal rule of thumb (a customer watching me build a system and asking questions said "oh I get it, so every piece you put in there is basically a hundred bucks, right?)...ok, sure, pretty much, more or less, well, ya dude. It will be WAY past october till I get a real touch screen but I did pick up a pair of cheap tatungs so I can try the NEW main start screen, I parse a lot of folders and have a vision of how a pair of touch screens will be easier than landing a rover on mars. Ok. fine, they are way smallish, and I don't expect multitouch to work but we are talking a few percent of a new 21 inch viewsonic touch screen. Will this OS be a game changer? I don't know. Bottom line with all the pads and droids in the world, it is more of a catch up move at first glance. Not something ms is used to. An app store? I can see ms's motivation, the others have it. I gather there will not be gadgets there, go ahead and see what ms did to the once populated gadget page...go ahead, google gadgets and take a gander, used to hundreds of gadgets, they are already gone. They replaced gadgets? sort of, I'll drop that, it's a bit of a sore point for me. More of interest was what happened when I downloaded stuff off codeplex and some other normal programs that I like, like orbitron, top o' my list!!...cardware it is...anyways, click on the exe, get a screen, normal for windows, this one indicated that I was not running a normal windows program and had a button for exit the install, naw, I hit details, a hidden run program anyways came into view....great, my path to the normal windows has detected a program tha.....yea ok, acl is on, fine, moving along I got orbitron installed in record time and was tracking the iss on the newest Microsoft OS, beta of course, felt like the first time I setup bsd all those year ago...FUN!!...I suppose I gotta start to think about budgeting for the real os when it comes out in october, by then I should have a rasberry pi and be done with fedora remixed. Of course that sounds like fun too!! I would use this OS on a tablet or phone. I don't like the idea of being hearded to an app store, don't like that on anything, we are americans and want real choices not marketed hype, lest you are younger with opm (other peoples money). This os would be neat on a zune, but I suspect the zune is a gonner, I am rooting for microsoft, after all their default password is not admin anymore, nor alpine, it's blank. Others force a password, my first fawn password was so long I could not even log into it with the password in front of me, who the heck uses %$# anyways, and if I was writing a brute force attack what the heck kinda impasse is that anyways at .00001 microseconds of a code execution cycle (just a non qualified number, not a real clock speed)....AI is where it will be before too long, MS is on that path, perhaps soon someone will sit down and write an app for the kinect that watches your eyes while you scan the new main start screen, clicking on the big E icon when you blink.....boy is that going to be fun!!!! sure. Blink,dammit,blink,dammit...... OPM no doubt.I like windows eight, we are moving forwards, better keep a close eye on ubuntu. The real clinch comes when open source becomes paid source......don't blink, I already see plenty of very expensive 'ix apps, some even in app stores already. more to come.......

Read the article
DTracing TCP congestion control

- by user12820842

In a previous post, I showed how we can use DTrace to probe TCP receive and send window events. TCP receive and send windows are in effect both about flow-controlling how much data can be received - the receive window reflects how much data the local TCP is prepared to receive, while the send window simply reflects the size of the receive window of the peer TCP. Both then represent flow control as imposed by the receiver. However, consider that without the sender imposing flow control, and a slow link to a peer, TCP will simply fill up it's window with sent segments. Dealing with multiple TCP implementations filling their peer TCP's receive windows in this manner, busy intermediate routers may drop some of these segments, leading to timeout and retransmission, which may again lead to drops. This is termed congestion, and TCP has multiple congestion control strategies. We can see that in this example, we need to have some way of adjusting how much data we send depending on how quickly we receive acknowledgement - if we get ACKs quickly, we can safely send more segments, but if acknowledgements come slowly, we should proceed with more caution. More generally, we need to implement flow control on the send side also. Slow Start and Congestion Avoidance From RFC2581, let's examine the relevant variables: "The congestion window (cwnd) is a sender-side limit on the amount of data the sender can transmit into the network before receiving an acknowledgment (ACK). Another state variable, the slow start threshold (ssthresh), is used to determine whether the slow start or congestion avoidance algorithm is used to control data transmission" Slow start is used to probe the network's ability to handle transmission bursts both when a connection is first created and when retransmission timers fire. The latter case is important, as the fact that we have effectively lost TCP data acts as a motivator for re-probing how much data the network can handle from the sending TCP. The congestion window (cwnd) is initialized to a relatively small value, generally a low multiple of the sending maximum segment size. When slow start kicks in, we will only send that number of bytes before waiting for acknowledgement. When acknowledgements are received, the congestion window is increased in size until cwnd reaches the slow start threshold ssthresh value. For most congestion control algorithms the window increases exponentially under slow start, assuming we receive acknowledgements. We send 1 segment, receive an ACK, increase the cwnd by 1 MSS to 2*MSS, send 2 segments, receive 2 ACKs, increase the cwnd by 2*MSS to 4*MSS, send 4 segments etc. When the congestion window exceeds the slow start threshold, congestion avoidance is used instead of slow start. During congestion avoidance, the congestion window is generally updated by one MSS for each round-trip-time as opposed to each ACK, and so cwnd growth is linear instead of exponential (we may receive multiple ACKs within a single RTT). This continues until congestion is detected. If a retransmit timer fires, congestion is assumed and the ssthresh value is reset. It is reset to a fraction of the number of bytes outstanding (unacknowledged) in the network. At the same time the congestion window is reset to a single max segment size. Thus, we initiate slow start until we start receiving acknowledgements again, at which point we can eventually flip over to congestion avoidance when cwnd ssthresh. Congestion control algorithms differ most in how they handle the other indication of congestion - duplicate ACKs. A duplicate ACK is a strong indication that data has been lost, since they often come from a receiver explicitly asking for a retransmission. In some cases, a duplicate ACK may be generated at the receiver as a result of packets arriving out-of-order, so it is sensible to wait for multiple duplicate ACKs before assuming packet loss rather than out-of-order delivery. This is termed fast retransmit (i.e. retransmit without waiting for the retransmission timer to expire). Note that on Oracle Solaris 11, the congestion control method used can be customized. See here for more details. In general, 3 or more duplicate ACKs indicate packet loss and should trigger fast retransmit . It's best not to revert to slow start in this case, as the fact that the receiver knew it was missing data suggests it has received data with a higher sequence number, so we know traffic is still flowing. Falling back to slow start would be excessive therefore, so fast recovery is used instead. Observing slow start and congestion avoidance The following script counts TCP segments sent when under slow start (cwnd ssthresh). #!/usr/sbin/dtrace -s #pragma D option quiet tcp:::connect-request / start[args[1]-cs_cid] == 0/ { start[args[1]-cs_cid] = 1; } tcp:::send / start[args[1]-cs_cid] == 1 && args[3]-tcps_cwnd tcps_cwnd_ssthresh / { @c["Slow start", args[2]-ip_daddr, args[4]-tcp_dport] = count(); } tcp:::send / start[args[1]-cs_cid] == 1 && args[3]-tcps_cwnd args[3]-tcps_cwnd_ssthresh / { @c["Congestion avoidance", args[2]-ip_daddr, args[4]-tcp_dport] = count(); } As we can see the script only works on connections initiated since it is started (using the start[] associative array with the connection ID as index to set whether it's a new connection (start[cid] = 1). From there we simply differentiate send events where cwnd ssthresh (congestion avoidance). Here's the output taken when I accessed a YouTube video (where rport is 80) and from an FTP session where I put a large file onto a remote system. # dtrace -s tcp_slow_start.d ^C ALGORITHM RADDR RPORT #SEG Slow start 10.153.125.222 20 6 Slow start 138.3.237.7 80 14 Slow start 10.153.125.222 21 18 Congestion avoidance 10.153.125.222 20 1164 We see that in the case of the YouTube video, slow start was exclusively used. Most of the segments we sent in that case were likely ACKs. Compare this case - where 14 segments were sent using slow start - to the FTP case, where only 6 segments were sent before we switched to congestion avoidance for 1164 segments. In the case of the FTP session, the FTP data on port 20 was predominantly sent with congestion avoidance in operation, while the FTP session relied exclusively on slow start. For the default congestion control algorithm - "newreno" - on Solaris 11, slow start will increase the cwnd by 1 MSS for every acknowledgement received, and by 1 MSS for each RTT in congestion avoidance mode. Different pluggable congestion control algorithms operate slightly differently. For example "highspeed" will update the slow start cwnd by the number of bytes ACKed rather than the MSS. And to finish, here's a neat oneliner to visually display the distribution of congestion window values for all TCP connections to a given remote port using a quantization. In this example, only port 80 is in use and we see the majority of cwnd values for that port are in the 4096-8191 range. # dtrace -n 'tcp:::send { @q[args[4]-tcp_dport] = quantize(args[3]-tcps_cwnd); }' dtrace: description 'tcp:::send ' matched 10 probes ^C 80 value ------------- Distribution ------------- count -1 | 0 0 |@@@@@@ 5 1 | 0 2 | 0 4 | 0 8 | 0 16 | 0 32 | 0 64 | 0 128 | 0 256 | 0 512 | 0 1024 | 0 2048 |@@@@@@@@@ 8 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 23 8192 | 0

Read the article
Why do we (really) program to interfaces?

- by Kyle Burns

One of the earliest lessons I was taught in Enterprise development was "always program against an interface". This was back in the VB6 days and I quickly learned that no code would be allowed to move to the QA server unless my business objects and data access objects each are defined as an interface and have a matching implementation class. Why? "It's more reusable" was one answer. "It doesn't tie you to a specific implementation" a slightly more knowing answer. And let's not forget the discussion ending "it's a standard". The problem with these responses was that senior people didn't really understand the reason we were doing the things we were doing and because of that, we were entirely unable to realize the intent behind the practice - we simply used interfaces and had a bunch of extra code to maintain to show for it. It wasn't until a few years later that I finally heard the term "Inversion of Control". Simply put, "Inversion of Control" takes the creation of objects that used to be within the control (and therefore a responsibility of) of your component and moves it to some outside force. For example, consider the following code which follows the old "always program against an interface" rule in the manner of many corporate development shops: 1: ICatalog catalog = new Catalog(); 2: Category[] categories = catalog.GetCategories(); In this example, I met the requirement of the rule by declaring the variable as ICatalog, but I didn't hit "it doesn't tie you to a specific implementation" because I explicitly created an instance of the concrete Catalog object. If I want to test the functionality of the code I just wrote I have to have an environment in which Catalog can be created along with any of the resources upon which it depends (e.g. configuration files, database connections, etc) in order to test my functionality. That's a lot of setup work and one of the things that I think ultimately discourages real buy-in of unit testing in many development shops. So how do I test my code without needing Catalog to work? A very primitive approach I've seen is to change the line the instantiates catalog to read: 1: ICatalog catalog = new FakeCatalog(); once the test is run and passes, the code is switched back to the real thing. This obviously poses a huge risk for introducing test code into production and in my opinion is worse than just keeping the dependency and its associated setup work. Another popular approach is to make use of Factory methods which use an object whose "job" is to know how to obtain a valid instance of the object. Using this approach, the code may look something like this: 1: ICatalog catalog = CatalogFactory.GetCatalog(); The code inside the factory is responsible for deciding "what kind" of catalog is needed. This is a far better approach than the previous one, but it does make projects grow considerably because now in addition to the interface, the real implementation, and the fake implementation(s) for testing you have added a minimum of one factory (or at least a factory method) for each of your interfaces. Once again, developers say "that's too complicated and has me writing a bunch of useless code" and quietly slip back into just creating a new Catalog and chalking any test failures up to "it will probably work on the server". This is where software intended specifically to facilitate Inversion of Control comes into play. There are many libraries that take on the Inversion of Control responsibilities in .Net and most of them have many pros and cons. From this point forward I'll discuss concepts from the standpoint of the Unity framework produced by Microsoft's Patterns and Practices team. I'm primarily focusing on this library because it questions about it inspired this posting. At Unity's core and that of most any IoC framework is a catalog or registry of components. This registry can be configured either through code or using the application's configuration file and in the most simple terms says "interface X maps to concrete implementation Y". It can get much more complicated, but I want to keep things at the "what does it do" level instead of "how does it do it". The object that exposes most of the Unity functionality is the UnityContainer. This object exposes methods to configure the catalog as well as the Resolve<T> method which is used to obtain an instance of the type represented by T. When using the Resolve<T> method, Unity does not necessarily have to just "new up" the requested object, but also can track dependencies of that object and ensure that the entire dependency chain is satisfied. There are three basic ways that I have seen Unity used within projects. Those are through classes directly using the Unity container, classes requiring injection of dependencies, and classes making use of the Service Locator pattern. The first usage of Unity is when classes are aware of the Unity container and directly call its Resolve method whenever they need the services advertised by an interface. The up side of this approach is that IoC is utilized, but the down side is that every class has to be aware that Unity is being used and tied directly to that implementation. Many developers don't like the idea of as close a tie to specific IoC implementation as is represented by using Unity within all of your classes and for the most part I agree that this isn't a good idea. As an alternative, classes can be designed for Dependency Injection. Dependency Injection is where a force outside the class itself manipulates the object to provide implementations of the interfaces that the class needs to interact with the outside world. This is typically done either through constructor injection where the object has a constructor that accepts an instance of each interface it requires or through property setters accepting the service providers. When using dependency, I lean toward the use of constructor injection because I view the constructor as being a much better way to "discover" what is required for the instance to be ready for use. During resolution, Unity looks for an injection constructor and will attempt to resolve instances of each interface required by the constructor, throwing an exception of unable to meet the advertised needs of the class. The up side of this approach is that the needs of the class are very clearly advertised and the class is unaware of which IoC container (if any) is being used. The down side of this approach is that you're required to maintain the objects passed to the constructor as instance variables throughout the life of your object and that objects which coordinate with many external services require a lot of additional constructor arguments (this gets ugly and may indicate a need for refactoring). The final way that I've seen and used Unity is to make use of the ServiceLocator pattern, of which the Patterns and Practices team has also provided a Unity-compatible implementation. When using the ServiceLocator, your class calls ServiceLocator.Retrieve in places where it would have called Resolve on the Unity container. Like using Unity directly, it does tie you directly to the ServiceLocator implementation and makes your code aware that dependency injection is taking place, but it does have the up side of giving you the freedom to swap out the underlying IoC container if necessary. I'm not hugely concerned with hiding IoC entirely from the class (I view this as a "nice to have"), so the single biggest problem that I see with the ServiceLocator approach is that it provides no way to proactively advertise needs in the way that constructor injection does, allowing more opportunity for difficult to track runtime errors. This blog entry has not been intended in any way to be a definitive work on IoC, but rather as something to spur thought about why we program to interfaces and some ways to reach the intended value of the practice instead of having it just complicate your code. I hope that it helps somebody begin or continue a journey away from being a "Cargo Cult Programmer".

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
Troubleshooting High-CPU Utilization for SQL Server

- by Susantha Bathige

The objective of this FAQ is to outline the basic steps in troubleshooting high CPU utilization on a server hosting a SQL Server instance. The first and the most common step if you suspect high CPU utilization (or are alerted for it) is to login to the physical server and check the Windows Task Manager. The Performance tab will show the high utilization as shown below: Next, we need to determine which process is responsible for the high CPU consumption. The Processes tab of the Task Manager will show this information: Note that to see all processes you should select Show processes from all user. In this case, SQL Server (sqlserver.exe) is consuming 99% of the CPU (a normal benchmark for max CPU utilization is about 50-60%). Next we examine the scheduler data. Scheduler is a component of SQLOS which evenly distributes load amongst CPUs. The query below returns the important columns for CPU troubleshooting. Note – if your server is under severe stress and you are unable to login to SSMS, you can use another machine’s SSMS to login to the server through DAC – Dedicated Administrator Connection (see http://msdn.microsoft.com/en-us/library/ms189595.aspx for details on using DAC) SELECT scheduler_id ,cpu_id ,status ,runnable_tasks_count ,active_workers_count ,load_factor ,yield_count FROM sys.dm_os_schedulers WHERE scheduler_id See below for the BOL definitions for the above columns. scheduler_id – ID of the scheduler. All schedulers that are used to run regular queries have ID numbers less than 1048576. Those schedulers that have IDs greater than or equal to 1048576 are used internally by SQL Server, such as the dedicated administrator connection scheduler. cpu_id – ID of the CPU with which this scheduler is associated. status – Indicates the status of the scheduler. runnable_tasks_count – Number of workers, with tasks assigned to them that are waiting to be scheduled on the runnable queue. active_workers_count – Number of workers that are active. An active worker is never preemptive, must have an associated task, and is either running, runnable, or suspended. current_tasks_count - Number of current tasks that are associated with this scheduler. load_factor – Internal value that indicates the perceived load on this scheduler. yield_count – Internal value that is used to indicate progress on this scheduler. Now to interpret the above data. There are four schedulers and each assigned to a different CPU. All the CPUs are ready to accept user queries as they all are ONLINE. There are 294 active tasks in the output as per the current_tasks_count column. This count indicates how many activities currently associated with the schedulers. When a task is complete, this number is decremented. The 294 is quite a high figure and indicates all four schedulers are extremely busy. When a task is enqueued, the load_factor value is incremented. This value is used to determine whether a new task should be put on this scheduler or another scheduler. The new task will be allocated to less loaded scheduler by SQLOS. The very high value of this column indicates all the schedulers have a high load. There are 268 runnable tasks which mean all these tasks are assigned a worker and waiting to be scheduled on the runnable queue. The next step is to identify which queries are demanding a lot of CPU time. The below query is useful for this purpose (note, in its current form, it only shows the top 10 records). SELECT TOP 10 st.text ,st.dbid ,st.objectid ,qs.total_worker_time ,qs.last_worker_time ,qp.query_plan FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp ORDER BY qs.total_worker_time DESC This query as total_worker_time as the measure of CPU load and is in descending order of the total_worker_time to show the most expensive queries and their plans at the top: Note the BOL definitions for the important columns: total_worker_time - Total amount of CPU time, in microseconds, that was consumed by executions of this plan since it was compiled. last_worker_time - CPU time, in microseconds, that was consumed the last time the plan was executed. I re-ran the same query again after few seconds and was returned the below output. After few seconds the SP dbo.TestProc1 is shown in fourth place and once again the last_worker_time is the highest. This means the procedure TestProc1 consumes a CPU time continuously each time it executes. In this case, the primary cause for high CPU utilization was a stored procedure. You can view the execution plan by clicking on query_plan column to investigate why this is causing a high CPU load. I have used SQL Server 2008 (SP1) to test all the queries used in this article.

Read the article
Errors while updating Ubuntu 13.10

- by santiago

When I execute the updater it always throws the same error saying: Failed to download repository information. Check your internet connection. And when I run the sudo apt-get update I get these lines at the end: W: Failed to fetch cdrom://Ubuntu 13.10 Saucy Salamander - Release i386 (20131016.1)/dists/saucy/main/binary-i386/Packages Please use apt-cdrom to make this CD-ROM recognized by APT. apt-get update cannot be used to add new CD-ROMs W: Failed to fetch cdrom://Ubuntu 13.10 Saucy Salamander - Release i386 (20131016.1)/dists/saucy/restricted/binary-i386/Packages Please use apt-cdrom to make this CD-ROM recognized by APT. apt-get update cannot be used to add new CD-ROMs W: Failed to fetch http://ppa.launchpad.net/tiheum/equinox/ubuntu/dists/saucy/main/source/Sources 404 Not Found W: Failed to fetch http://ppa.launchpad.net/tiheum/equinox/ubuntu/dists/saucy/main/binary-i386/Packages 404 Not Found E: Some index files failed to download. They have been ignored, or old ones used instead

Read the article
dsl modem problem in ubuntu 11.10

- by Misterious

Earlier I used Windows 7 and the modem used to work absolutely fine. Even when it disconnected it used to reconnect automatically without any trouble. But then I installed Ubuntu 11.10 on dual boot and set up the modem connection properly but the modem now disconnects much more frequently(eariler it disconnected once in 5 hours or so and after ubuntu in 5 minutes!!). Also once disconnected it does not reconnect even when i have checked the connect automatically button. I have to restart the system to reconnect it. Also then I clean installed windows and modem works perfectly fine again. What is the reason for this and how can I solve this? I really want to use ubuntu but due to this problem I cant. Sorry for my poor English.

Read the article
How to instrument existing ASP.NET application?

- by jkohlhepp

We have several highly complex ASP.NET web applications that are used internally by hundreds of users. We are trying to figure out which areas of the applications to invest in to improve functionality, but we aren't sure which screens/features are more heavily used. So, ideally, I'd like to find a way to add a layer of instrumentation to the applications that gathers metrics on which buttons are being clicked, which text boxes are being used, etc. Are there any products / open source apps out there that will do this sort of instrumentation for ASP.NET? Obviously I could do it myself manually by going into the code and injecting logging statements everywhere but this would be a significant amount of work that will be hard to accomplish.

Read the article
Ubuntu One Installation error - W:Failed to fetch cdrom://Ubunto 12.04.2 LTS_Precise

- by Chuck J

W:Failed to fetch cdrom://Ubunto 12.04.2 LTS_Precise Pangolin_-Release amd64(20130213)/dists/precise/main/binary-i386/Packages Please use apt-cdrom to make this CD-ROM recognized by APT.apt-get update cannont be used to add new CD-ROMs, W:Failed to fetch cdrom://Ubuntu 12.04.02 LTS_Precise Pangolin_-Release amd64(201302213)/dists/precise/restriced binary-i386/Packages Please use apt-cdrom to make this CD-ROM recognized by APT.-get update connot be used to add new CD-ROMs, E:Some index files failed to download they have been ignored, or old ones used instead. Yes my laptop CD-ROM is not working, and I ASSUME that has something to do with this install not working. I don't want to have to fix my CD-ROM drive to get this to install, and my BIOS does not support disabling it... Any idea's?

Read the article
C++ Pointers: Number of levels of Indirection

- by A B

In a C++ program that doesn't contain legacy C code, is there a guideline regarding the maximum number of levels of indirection that should be used in the source code? I know that in C (as opposed to C++), some programmers have used pointers to pointers for a multiple dimension array, but for the case of arrays, there are data structures in C++ that can be used to avoid the pointers to pointers. Are users who still create pointers to pointers (or more than this) trying to use pointers to pointers only for performance ETC. reasons? I have tried NOT to use any more than a pointer to a pointer, only in the case that a pointer needed modification; does anyone have any other official or unofficial guidelines or rules regarding the number of levels of indirection?

Read the article
What is so good about Linux? [closed]

- by Chris Bridgett

Self-explanatory question title. I've only ever used Windows OS's (except Mac OSX at friends etc, years ago occasionally) and when diving into the world of programming, Linux is a name that is coming up every other tutorial or article. All my web hosts run Linux and a lot of programming literature covers how to go about various tasks on Linux as well as Windows, but other than the odd raving I've read years ago about Linux being less resource-intensive, I haven't really given it much thought. Any article I read about Linux and whether it should be used for... 'regular' use, it's shunned since any windows applications I'm familiar with will usually require the windows API and there's no end of 'hacking' to get various programs working on Linux. As far as I understand a GUI is optional on Linux too? This all sounds very noobish I'm sure, but we all start somewhere, so: What is Linux good for? What should Linux be used for?

Read the article
Man pages not finding entry

- by Mike

So, I'm not sure what is going on with my system (ubuntu 12.04), but my man pages do not seem to be working. I try man gcc and get the following response No manual entry for gcc See 'man 7 undocumented' for help when manual pages are not available. However I see the man entry in /usr/share/man/man1/gcc.1.gz Here is what my /etc/manpath.config file looks like # manpath.config # # This file is used by the man-db package to configure the man and cat paths. # It is also used to provide a manpath for those without one by examining # their PATH environment variable. For details see the manpath(5) man page. # # Lines beginning with `#' are comments and are ignored. Any combination of # tabs or spaces may be used as `whitespace' separators. # # There are three mappings allowed in this file: # -------------------------------------------------------- # MANDATORY_MANPATH manpath_element # MANPATH_MAP path_element manpath_element # MANDB_MAP global_manpath [relative_catpath] #--------------------------------------------------------- # every automatically generated MANPATH includes these fields # #MANDATORY_MANPATH /usr/src/pvm3/man # MANDATORY_MANPATH /usr/man MANDATORY_MANPATH /usr/share/man MANDATORY_MANPATH /usr/local/share/man #--------------------------------------------------------- # set up PATH to MANPATH mapping # ie. what man tree holds man pages for what binary directory. # # *PATH* -> *MANPATH* # MANPATH_MAP /bin /usr/share/man MANPATH_MAP /usr/bin /usr/share/man MANPATH_MAP /sbin /usr/share/man MANPATH_MAP /usr/sbin /usr/share/man MANPATH_MAP /usr/local/bin /usr/local/man MANPATH_MAP /usr/local/bin /usr/local/share/man MANPATH_MAP /usr/local/sbin /usr/local/man MANPATH_MAP /usr/local/sbin /usr/local/share/man MANPATH_MAP /usr/X11R6/bin /usr/X11R6/man MANPATH_MAP /usr/bin/X11 /usr/X11R6/man MANPATH_MAP /usr/games /usr/share/man MANPATH_MAP /opt/bin /opt/man MANPATH_MAP /opt/sbin /opt/man #--------------------------------------------------------- # For a manpath element to be treated as a system manpath (as most of those # above should normally be), it must be mentioned below. Each line may have # an optional extra string indicating the catpath associated with the # manpath. If no catpath string is used, the catpath will default to the # given manpath. # # You *must* provide all system manpaths, including manpaths for alternate # operating systems, locale specific manpaths, and combinations of both, if # they exist, otherwise the permissions of the user running man/mandb will # be used to manipulate the manual pages. Also, mandb will not initialise # the database cache for any manpaths not mentioned below unless explicitly # requested to do so. # # In a per-user configuration file, this directive only controls the # location of catpaths and the creation of database caches; it has no effect # on privileges. # # Any manpaths that are subdirectories of other manpaths must be mentioned # *before* the containing manpath. E.g. /usr/man/preformat must be listed # before /usr/man. # # *MANPATH* -> *CATPATH* # MANDB_MAP /usr/man /var/cache/man/fsstnd MANDB_MAP /usr/share/man /var/cache/man MANDB_MAP /usr/local/man /var/cache/man/oldlocal MANDB_MAP /usr/local/share/man /var/cache/man/local MANDB_MAP /usr/X11R6/man /var/cache/man/X11R6 MANDB_MAP /opt/man /var/cache/man/opt # #--------------------------------------------------------- # Program definitions. These are commented out by default as the value # of the definition is already the default. To change: uncomment a # definition and modify it. # #DEFINE pager pager -s #DEFINE cat cat #DEFINE tr tr '\255\267\264\327' '\055\157\047\170' #DEFINE grep grep #DEFINE troff groff -mandoc #DEFINE nroff nroff -mandoc #DEFINE eqn eqn #DEFINE neqn neqn #DEFINE tbl tbl #DEFINE col col #DEFINE vgrind vgrind #DEFINE refer refer #DEFINE grap grap #DEFINE pic pic -S # #DEFINE compressor gzip -c7 #--------------------------------------------------------- # Misc definitions: same as program definitions above. # #DEFINE whatis_grep_flags -i #DEFINE apropos_grep_flags -iEw #DEFINE apropos_regex_grep_flags -iE #--------------------------------------------------------- # Section names. Manual sections will be searched in the order listed here; # the default is 1, n, l, 8, 3, 0, 2, 5, 4, 9, 6, 7. Multiple SECTION # directives may be given for clarity, and will be concatenated together in # the expected way. # If a particular extension is not in this list (say, 1mh), it will be # displayed with the rest of the section it belongs to. The effect of this # is that you only need to explicitly list extensions if you want to force a # particular order. Sections with extensions should usually be adjacent to # their main section (e.g. "1 1mh 8 ..."). # SECTION 1 n l 8 3 2 3posix 3pm 3perl 5 4 9 6 7 # #--------------------------------------------------------- # Range of terminal widths permitted when displaying cat pages. If the # terminal falls outside this range, cat pages will not be created (if # missing) or displayed. # #MINCATWIDTH 80 #MAXCATWIDTH 80 # # If CATWIDTH is set to a non-zero number, cat pages will always be # formatted for a terminal of the given width, regardless of the width of # the terminal actually being used. This should generally be within the # range set by MINCATWIDTH and MAXCATWIDTH. # #CATWIDTH 0 # #--------------------------------------------------------- # Flags. # NOCACHE keeps man from creating cat pages. #NOCACHE Thanks for any help (p.s. even 'man man' fails) Edit: When I run ls -l /usr/share/man/man1/gcc* I get the following output lrwxrwxrwx 1 root root 12 May 27 15:41 /usr/share/man/man1/gcc.1.gz -> gcc-4.6.1.gz -rw-r--r-- 1 root root 217776 Apr 15 17:34 /usr/share/man/man1/gcc-4.6.1.gz

Read the article
Is there any difference between storing textures and baked lighting for environment meshes?

- by Ben Hymers

I assume that when texturing environments, one or several textures will be used, and the UVs of the environment geometry will likely overlap on these textures, so that e.g. a tiling brick texture can be used by many parts of the environment, rather than UV unwrapping the entire thing, and having several areas of the texture be identical. If my assumption is wrong, please let me know! Now, when thinking about baking lighting, clearly this can't be done the same way - lighting in general will be unique to every face so the environment must be UV unwrapped without overlap, and lighting must be baked onto unique areas of one or several textures, to give each surface its own texture space to store its lighting. My questions are: Have I got this wrong? If so, how? Isn't baking lighting going to use a lot of texture space? Will the geometry need two UV sets, one used for the colour/normal texture and one for the lighting texture? Anything else you'd like to add? :)

Read the article
How do I change the default .htm file icon?

- by Michael Clayton

I really enjoy the look of UBUNTU. The only thing that I want to change is the default icon used for .html (.htm) files. I want to use the icon /usr/lib/firefox/browser/icons/mozicon128.png instead. I do not want to change any other visual element. Is there a practical way to accomplish this small change? edit: @Mitch, I've used assogiate in the past and although I was able to change the icon used for .mht files, I could not get it to change the .htm icon. @Anwar Shah, thanks for the information. I wish that it would work for me. Running 13.10 x86, after I do the copy of the icons, in the folders are a bunch of links to .svg files not actual graphics files. It does not appear that the second copy actually does anything on my system.

Read the article
C#/.NET Little Wonders: The Generic Func Delegates

- by James Michael Hare

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Back in one of my three original “Little Wonders” Trilogy of posts, I had listed generic delegates as one of the Little Wonders of .NET. Later, someone posted a comment saying said that they would love more detail on the generic delegates and their uses, since my original entry just scratched the surface of them. Last week, I began our look at some of the handy generic delegates built into .NET with a description of delegates in general, and the Action family of delegates. For this week, I’ll launch into a look at the Func family of generic delegates and how they can be used to support generic, reusable algorithms and classes. Quick Delegate Recap Delegates are similar to function pointers in C++ in that they allow you to store a reference to a method. They can store references to either static or instance methods, and can actually be used to chain several methods together in one delegate. Delegates are very type-safe and can be satisfied with any standard method, anonymous method, or a lambda expression. They can also be null as well (refers to no method), so care should be taken to make sure that the delegate is not null before you invoke it. Delegates are defined using the keyword delegate, where the delegate’s type name is placed where you would typically place the method name: 1: // This delegate matches any method that takes string, returns nothing 2: public delegate void Log(string message); This delegate defines a delegate type named Log that can be used to store references to any method(s) that satisfies its signature (whether instance, static, lambda expression, etc.). Delegate instances then can be assigned zero (null) or more methods using the operator = which replaces the existing delegate chain, or by using the operator += which adds a method to the end of a delegate chain: 1: // creates a delegate instance named currentLogger defaulted to Console.WriteLine (static method) 2: Log currentLogger = Console.Out.WriteLine; 3: 4: // invokes the delegate, which writes to the console out 5: currentLogger("Hi Standard Out!"); 6: 7: // append a delegate to Console.Error.WriteLine to go to std error 8: currentLogger += Console.Error.WriteLine; 9: 10: // invokes the delegate chain and writes message to std out and std err 11: currentLogger("Hi Standard Out and Error!"); While delegates give us a lot of power, it can be cumbersome to re-create fairly standard delegate definitions repeatedly, for this purpose the generic delegates were introduced in various stages in .NET. These support various method types with particular signatures. Note: a caveat with generic delegates is that while they can support multiple parameters, they do not match methods that contains ref or out parameters. If you want to a delegate to represent methods that takes ref or out parameters, you will need to create a custom delegate. We’ve got the Func… delegates Just like it’s cousin, the Action delegate family, the Func delegate family gives us a lot of power to use generic delegates to make classes and algorithms more generic. Using them keeps us from having to define a new delegate type when need to make a class or algorithm generic. Remember that the point of the Action delegate family was to be able to perform an “action” on an item, with no return results. Thus Action delegates can be used to represent most methods that take 0 to 16 arguments but return void. You can assign a method The Func delegate family was introduced in .NET 3.5 with the advent of LINQ, and gives us the power to define a function that can be called on 0 to 16 arguments and returns a result. Thus, the main difference between Action and Func, from a delegate perspective, is that Actions return nothing, but Funcs return a result. The Func family of delegates have signatures as follows: Func<TResult> – matches a method that takes no arguments, and returns value of type TResult. Func<T, TResult> – matches a method that takes an argument of type T, and returns value of type TResult. Func<T1, T2, TResult> – matches a method that takes arguments of type T1 and T2, and returns value of type TResult. Func<T1, T2, …, TResult> – and so on up to 16 arguments, and returns value of type TResult. These are handy because they quickly allow you to be able to specify that a method or class you design will perform a function to produce a result as long as the method you specify meets the signature. For example, let’s say you were designing a generic aggregator, and you wanted to allow the user to define how the values will be aggregated into the result (i.e. Sum, Min, Max, etc…). To do this, we would ask the user of our class to pass in a method that would take the current total, the next value, and produce a new total. A class like this could look like: 1: public sealed class Aggregator<TValue, TResult> 2: { 3: // holds method that takes previous result, combines with next value, creates new result 4: private Func<TResult, TValue, TResult> _aggregationMethod; 5: 6: // gets or sets the current result of aggregation 7: public TResult Result { get; private set; } 8: 9: // construct the aggregator given the method to use to aggregate values 10: public Aggregator(Func<TResult, TValue, TResult> aggregationMethod = null) 11: { 12: if (aggregationMethod == null) throw new ArgumentNullException("aggregationMethod"); 13: 14: _aggregationMethod = aggregationMethod; 15: } 16: 17: // method to add next value 18: public void Aggregate(TValue nextValue) 19: { 20: // performs the aggregation method function on the current result and next and sets to current result 21: Result = _aggregationMethod(Result, nextValue); 22: } 23: } Of course, LINQ already has an Aggregate extension method, but that works on a sequence of IEnumerable<T>, whereas this is designed to work more with aggregating single results over time (such as keeping track of a max response time for a service). We could then use this generic aggregator to find the sum of a series of values over time, or the max of a series of values over time (among other things): 1: // creates an aggregator that adds the next to the total to sum the values 2: var sumAggregator = new Aggregator<int, int>((total, next) => total + next); 3: 4: // creates an aggregator (using static method) that returns the max of previous result and next 5: var maxAggregator = new Aggregator<int, int>(Math.Max); So, if we were timing the response time of a web method every time it was called, we could pass that response time to both of these aggregators to get an idea of the total time spent in that web method, and the max time spent in any one call to the web method: 1: // total will be 13 and max 13 2: int responseTime = 13; 3: sumAggregator.Aggregate(responseTime); 4: maxAggregator.Aggregate(responseTime); 5: 6: // total will be 20 and max still 13 7: responseTime = 7; 8: sumAggregator.Aggregate(responseTime); 9: maxAggregator.Aggregate(responseTime); 10: 11: // total will be 40 and max now 20 12: responseTime = 20; 13: sumAggregator.Aggregate(responseTime); 14: maxAggregator.Aggregate(responseTime); The Func delegate family is useful for making generic algorithms and classes, and in particular allows the caller of the method or user of the class to specify a function to be performed in order to generate a result. What is the result of a Func delegate chain? If you remember, we said earlier that you can assign multiple methods to a delegate by using the += operator to chain them. So how does this affect delegates such as Func that return a value, when applied to something like the code below? 1: Func<int, int, int> combo = null; 2: 3: // What if we wanted to aggregate the sum and max together? 4: combo += (total, next) => total + next; 5: combo += Math.Max; 6: 7: // what is the result? 8: var comboAggregator = new Aggregator<int, int>(combo); Well, in .NET if you chain multiple methods in a delegate, they will all get invoked, but the result of the delegate is the result of the last method invoked in the chain. Thus, this aggregator would always result in the Math.Max() result. The other chained method (the sum) gets executed first, but it’s result is thrown away: 1: // result is 13 2: int responseTime = 13; 3: comboAggregator.Aggregate(responseTime); 4: 5: // result is still 13 6: responseTime = 7; 7: comboAggregator.Aggregate(responseTime); 8: 9: // result is now 20 10: responseTime = 20; 11: comboAggregator.Aggregate(responseTime); So remember, you can chain multiple Func (or other delegates that return values) together, but if you do so you will only get the last executed result. Func delegates and co-variance/contra-variance in .NET 4.0 Just like the Action delegate, as of .NET 4.0, the Func delegate family is contra-variant on its arguments. In addition, it is co-variant on its return type. To support this, in .NET 4.0 the signatures of the Func delegates changed to: Func<out TResult> – matches a method that takes no arguments, and returns value of type TResult (or a more derived type). Func<in T, out TResult> – matches a method that takes an argument of type T (or a less derived type), and returns value of type TResult(or a more derived type). Func<in T1, in T2, out TResult> – matches a method that takes arguments of type T1 and T2 (or less derived types), and returns value of type TResult (or a more derived type). Func<in T1, in T2, …, out TResult> – and so on up to 16 arguments, and returns value of type TResult (or a more derived type). Notice the addition of the in and out keywords before each of the generic type placeholders. As we saw last week, the in keyword is used to specify that a generic type can be contra-variant -- it can match the given type or a type that is less derived. However, the out keyword, is used to specify that a generic type can be co-variant -- it can match the given type or a type that is more derived. On contra-variance, if you are saying you need an function that will accept a string, you can just as easily give it an function that accepts an object. In other words, if you say “give me an function that will process dogs”, I could pass you a method that will process any animal, because all dogs are animals. On the co-variance side, if you are saying you need a function that returns an object, you can just as easily pass it a function that returns a string because any string returned from the given method can be accepted by a delegate expecting an object result, since string is more derived. Once again, in other words, if you say “give me a method that creates an animal”, I can pass you a method that will create a dog, because all dogs are animals. It really all makes sense, you can pass a more specific thing to a less specific parameter, and you can return a more specific thing as a less specific result. In other words, pay attention to the direction the item travels (parameters go in, results come out). Keeping that in mind, you can always pass more specific things in and return more specific things out. For example, in the code below, we have a method that takes a Func<object> to generate an object, but we can pass it a Func<string> because the return type of object can obviously accept a return value of string as well: 1: // since Func<object> is co-variant, this will access Func<string>, etc... 2: public static string Sequence(int count, Func<object> generator) 3: { 4: var builder = new StringBuilder(); 5: 6: for (int i=0; i<count; i++) 7: { 8: object value = generator(); 9: builder.Append(value); 10: } 11: 12: return builder.ToString(); 13: } Even though the method above takes a Func<object>, we can pass a Func<string> because the TResult type placeholder is co-variant and accepts types that are more derived as well: 1: // delegate that's typed to return string. 2: Func<string> stringGenerator = () => DateTime.Now.ToString(); 3: 4: // This will work in .NET 4.0, but not in previous versions 5: Sequence(100, stringGenerator); Previous versions of .NET implemented some forms of co-variance and contra-variance before, but .NET 4.0 goes one step further and allows you to pass or assign an Func<A, BResult> to a Func<Y, ZResult> as long as A is less derived (or same) as Y, and BResult is more derived (or same) as ZResult. Sidebar: The Func and the Predicate A method that takes one argument and returns a bool is generally thought of as a predicate. Predicates are used to examine an item and determine whether that item satisfies a particular condition. Predicates are typically unary, but you may also have binary and other predicates as well. Predicates are often used to filter results, such as in the LINQ Where() extension method: 1: var numbers = new[] { 1, 2, 4, 13, 8, 10, 27 }; 2: 3: // call Where() using a predicate which determines if the number is even 4: var evens = numbers.Where(num => num % 2 == 0); As of .NET 3.5, predicates are typically represented as Func<T, bool> where T is the type of the item to examine. Previous to .NET 3.5, there was a Predicate<T> type that tended to be used (which we’ll discuss next week) and is still supported, but most developers recommend using Func<T, bool> now, as it prevents confusion with overloads that accept unary predicates and binary predicates, etc.: 1: // this seems more confusing as an overload set, because of Predicate vs Func 2: public static SomeMethod(Predicate<int> unaryPredicate) { } 3: public static SomeMethod(Func<int, int, bool> binaryPredicate) { } 4: 5: // this seems more consistent as an overload set, since just uses Func 6: public static SomeMethod(Func<int, bool> unaryPredicate) { } 7: public static SomeMethod(Func<int, int, bool> binaryPredicate) { } Also, even though Predicate<T> and Func<T, bool> match the same signatures, they are separate types! Thus you cannot assign a Predicate<T> instance to a Func<T, bool> instance and vice versa: 1: // the same method, lambda expression, etc can be assigned to both 2: Predicate<int> isEven = i => (i % 2) == 0; 3: Func<int, bool> alsoIsEven = i => (i % 2) == 0; 4: 5: // but the delegate instances cannot be directly assigned, strongly typed! 6: // ERROR: cannot convert type... 7: isEven = alsoIsEven; 8: 9: // however, you can assign by wrapping in a new instance: 10: isEven = new Predicate<int>(alsoIsEven); 11: alsoIsEven = new Func<int, bool>(isEven); So, the general advice that seems to come from most developers is that Predicate<T> is still supported, but we should use Func<T, bool> for consistency in .NET 3.5 and above. Sidebar: Func as a Generator for Unit Testing One area of difficulty in unit testing can be unit testing code that is based on time of day. We’d still want to unit test our code to make sure the logic is accurate, but we don’t want the results of our unit tests to be dependent on the time they are run. One way (of many) around this is to create an internal generator that will produce the “current” time of day. This would default to returning result from DateTime.Now (or some other method), but we could inject specific times for our unit testing. Generators are typically methods that return (generate) a value for use in a class/method. For example, say we are creating a CacheItem<T> class that represents an item in the cache, and we want to make sure the item shows as expired if the age is more than 30 seconds. Such a class could look like: 1: // responsible for maintaining an item of type T in the cache 2: public sealed class CacheItem<T> 3: { 4: // helper method that returns the current time 5: private static Func<DateTime> _timeGenerator = () => DateTime.Now; 6: 7: // allows internal access to the time generator 8: internal static Func<DateTime> TimeGenerator 9: { 10: get { return _timeGenerator; } 11: set { _timeGenerator = value; } 12: } 13: 14: // time the item was cached 15: public DateTime CachedTime { get; private set; } 16: 17: // the item cached 18: public T Value { get; private set; } 19: 20: // item is expired if older than 30 seconds 21: public bool IsExpired 22: { 23: get { return _timeGenerator() - CachedTime > TimeSpan.FromSeconds(30.0); } 24: } 25: 26: // creates the new cached item, setting cached time to "current" time 27: public CacheItem(T value) 28: { 29: Value = value; 30: CachedTime = _timeGenerator(); 31: } 32: } Then, we can use this construct to unit test our CacheItem<T> without any time dependencies: 1: var baseTime = DateTime.Now; 2: 3: // start with current time stored above (so doesn't drift) 4: CacheItem<int>.TimeGenerator = () => baseTime; 5: 6: var target = new CacheItem<int>(13); 7: 8: // now add 15 seconds, should still be non-expired 9: CacheItem<int>.TimeGenerator = () => baseTime.AddSeconds(15); 10: 11: Assert.IsFalse(target.IsExpired); 12: 13: // now add 31 seconds, should now be expired 14: CacheItem<int>.TimeGenerator = () => baseTime.AddSeconds(31); 15: 16: Assert.IsTrue(target.IsExpired); Now we can unit test for 1 second before, 1 second after, 1 millisecond before, 1 day after, etc. Func delegates can be a handy tool for this type of value generation to support more testable code. Summary Generic delegates give us a lot of power to make truly generic algorithms and classes. The Func family of delegates is a great way to be able to specify functions to calculate a result based on 0-16 arguments. Stay tuned in the weeks that follow for other generic delegates in the .NET Framework! Tweet Technorati Tags: .NET, C#, CSharp, Little Wonders, Generics, Func, Delegates

Read the article
Visual programming for serious software

- by Gerenuk

Are visual program control flow diagrams and languages which support that used for larger serious programs? Why not? They seem like a nice overview of the code. In the thread What software programming languages were used by the Soviet Union's space program? a visual language is mentioned (Drakon) and I wondered why such approaches aren't used more often? Is there nothing a visual control flow representation (I don't mean class diagrams etc.) which are 1-to-1 with code can help compared to typing in letters in an editor?

Read the article
Tips for adapting Date table to Power View forecasting #powerview #powerbi

- by Marco Russo (SQLBI)

During the keynote of the PASS Business Analytics Conference, Amir Netz presented the new forecasting capabilities in Power View for Office 365. I immediately tried the new feature (which was immediately available, a welcome surprise in a Microsoft announcement for a new release) and I had several issues trying to use existing data models. The forecasting has a few requirements that are not compatible with the “best practices” commonly used for a calendar table until this announcement. For example, if you have a Year-Month-Day hierarchy and you want to display a line chart aggregating data at the month level, you use a column containing month and year as a string (e.g. May 2014) sorted by a numeric column (such as 201405). Such a column cannot be used in the x-axis of a line chart for forecasting, because you need a date or numeric column. There are also other requirements and I wrote the article Prepare Data for Power View Forecasting in Power BI on SQLBI, describing how to create columns that can be used with the new forecasting capabilities in Power View for Office 365.

Read the article
Importing Debian Bugs to Launchpad

- by noddy

While reading about how the bugs from debian are imported to launchpad, I came across a blueprint https://blueprints.launchpad.net/launchpad/+spec/debian-bug-import which was used initially to import bugs from debian. I cannot find the script that was used to import them or the logic that was used. Did the people import all the bugs from debian or did they filter the bugs. And how are the bugs presently imported from debian to launchpad. I came across a script in launchpad which imports bugs from debian given certain bug numbers but I wanted to know whether there is some automation that exists for importing relevant debian bugs to launchpad. Thanks.

Read the article
DSL modem problem in 11.10

- by Misterious

Earlier I used Windows 7 and the modem used to work absolutely fine. Even when it got disconnected it used to reconnect automatically without any trouble. But then I installed Ubuntu 11.10 on dual-boot and set up the modem connection properly but the modem now disconnects much more frequently (earlier it disconnected once in 5 hours or so and after Ubuntu in 5 minutes!). Also once disconnected, it does not reconnect even when I have checked the "Connect Automatically" button. I have to restart the system to reconnect it. I the clean installed Windows and modem works perfectly fine again. What is the reason for this and how can I solve this? I really want to use Ubuntu but due to this problem I can't.

Read the article
Organization & Architecture UNISA Studies – Chap 5

- by MarkPearl

Learning Outcomes Describe the operation of a memory cell Explain the difference between DRAM and SRAM Discuss the different types of ROM Explain the concepts of a hard failure and a soft error respectively Describe SDRAM organization Semiconductor Main Memory The two traditional forms of RAM used in computers are DRAM and SRAM DRAM (Dynamic RAM) Divided into two technologies… Dynamic Static Dynamic RAM is made with cells that store data as charge on capacitors. The presence or absence of charge in a capacitor is interpreted as a binary 1 or 0. Because capacitors have natural tendency to discharge, dynamic RAM requires periodic charge refreshing to maintain data storage. The term dynamic refers to the tendency of the stored charge to leak away, even with power continuously applied. Although the DRAM cell is used to store a single bit (0 or 1), it is essentially an analogue device. The capacitor can store any charge value within a range, a threshold value determines whether the charge is interpreted as a 1 or 0. SRAM (Static RAM) SRAM is a digital device that uses the same logic elements used in the processor. In SRAM, binary values are stored using traditional flip flop logic configurations. SRAM will hold its data as along as power is supplied to it. Unlike DRAM, no refresh is required to retain data. SRAM vs. DRAM DRAM is simpler and smaller than SRAM. Thus it is more dense and less expensive than SRAM. The cost of the refreshing circuitry for DRAM needs to be considered, but if the machine requires a large amount of memory, DRAM turns out to be cheaper than SRAM. SRAMS are somewhat faster than DRAM, thus SRAM is generally used for cache memory and DRAM is used for main memory. Types of ROM Read Only Memory (ROM) contains a permanent pattern of data that cannot be changed. ROM is non volatile meaning no power source is required to maintain the bit values in memory. While it is possible to read a ROM, it is not possible to write new data into it. An important application of ROM is microprogramming, other applications include library subroutines for frequently wanted functions, System programs, Function tables. A ROM is created like any other integrated circuit chip, with the data actually wired into the chip as part of the fabrication process. To reduce costs of fabrication, we have PROMS. PROMS are… Written only once Non-volatile Written after fabrication Another variation of ROM is the read-mostly memory, which is useful for applications in which read operations are far more frequent than write operations, but for which non volatile storage is required. There are three common forms of read-mostly memory, namely… EPROM EEPROM Flash memory Error Correction Semiconductor memory is subject to errors, which can be classed into two categories… Hard failure – Permanent physical defect so that the memory cell or cells cannot reliably store data Soft failure – Random error that alters the contents of one or more memory cells without damaging the memory (common cause includes power supply issues, etc.) Most modern main memory systems include logic for both detecting and correcting errors. Error detection works as follows… When data is to be read into memory, a calculation is performed on the data to produce a code Both the code and the data are stored When the previously stored word is read out, the code is used to detect and possibly correct errors The error checking provides one of 3 possible results… No errors are detected – the fetched data bits are sent out An error is detected, and it is possible to correct the error. The data bits plus error correction bits are fed into a corrector, which produces a corrected set of bits to be sent out An error is detected, but it is not possible to correct it. This condition is reported Hamming Code See wiki for detailed explanation. We will probably need to know how to do a hemming code – refer to the textbook (pg. 188 – 189) Advanced DRAM organization One of the most critical system bottlenecks when using high-performance processors is the interface to main memory. This interface is the most important pathway in the entire computer system. The basic building block of main memory remains the DRAM chip. In recent years a number of enhancements to the basic DRAM architecture have been explored, and some of these are now on the market including… SDRAM (Synchronous DRAM) DDR-DRAM RDRAM SDRAM (Synchronous DRAM) SDRAM exchanges data with the processor synchronized to an external clock signal and running at the full speed of the processor/memory bus without imposing wait states. SDRAM employs a burst mode to eliminate the address setup time and row and column line precharge time after the first access In burst mode a series of data bits can be clocked out rapidly after the first bit has been accessed SDRAM has a multiple bank internal architecture that improves opportunities for on chip parallelism SDRAM performs best when it is transferring large blocks of data serially There is now an enhanced version of SDRAM known as double data rate SDRAM or DDR-SDRAM that overcomes the once-per-cycle limitation of SDRAM

Read the article
Do real developers use UML and other CASE tools?

- by Avi

I'm a CS student, currently a junior, and in one of my classes this semester they have us studying all sorts of UML diagramming methods. Among others, we've touched on Petri nets, DFD diagrams, sequence diagrams, use case diagrams, collaboration diagrams, Jackson System Development diagrams, entity-relation diagrams, and more. I've worked on more than a few professional projects over the years and never encountered anyone who used these systems to any great degree (other than a general class diagram or a diagram of the tables in a database). I was just wondering if I could query the hive mind to see if this is true in your work experience too. Have you used these models at all and found them to be as important as they tell us students they are? Or is all this stuff just academic ivory-tower crap that people in the real world hardly ever touch? Which of these systems have you found to be effective and useful? Are there specific kinds of scenarios that they are more intended to be used in than what the typical software developer encounters?

Read the article
Brief explanation for executables in a GNU/Clang Toolchain?

- by ZhangChn

I roughly understand that cc, ld and other parts are called in a certain sequence according to schemes like Makefiles etc. Some of those commands are used to generate those configs and Makefiles. And some other tools are used to deal with libraries. But what are other parts used for? How are they called in this process? Which tool would use various parser generators? Which part is optional? Why? Is there a brief summary get these explained on how the tools in a GNU or LLVM/Clang toolchain are organised and called in a C/C++ project building? Thanks in advance. EDIT: Here is a list of executables for Clang/LLVM on Mac OS X: ar clang dsymutil gperf libtool nmedit rpcgen unwinddump as clang++ dwarfdump gprof lorder otool segedit vgrind asa cmpdylib dyldinfo indent m4 pagestuff size what bison codesign_allocate flex install_name_tool mig ranlib strip yacc c++ ctags flex++ ld mkdep rebase unifdef cc ctf_insert gm4 lex nm redo_prebinding unifdefall

Read the article

Search Results

Search found 41338 results on 1654 pages for 'used'.

Page 86/1654 | < Previous Page | 82 83 84 85 86 87 88 89 90 91 92 93 | Next Page >

- by Rob Farley

- by Rob Farley

- by Charles Young

- by raysmithequip

- by user12820842

- by Kyle Burns

- by user12620111

- by Susantha Bathige

- by santiago

- by Misterious

- by jkohlhepp

- by Chuck J

- by A B

- by Chris Bridgett

- by Mike

- by Ben Hymers

- by Michael Clayton

- by James Michael Hare

- by Gerenuk

- by Marco Russo (SQLBI)

- by noddy

- by Misterious

- by MarkPearl

- by Avi

- by ZhangChn

< Previous Page | 82 83 84 85 86 87 88 89 90 91 92 93 | Next Page >