optimizing - Page 26 - Developer IT

Can my loop be optimized any more? (C++)

- by Sagekilla

Below is one of my inner loops that's run several thousand times, with input sizes of 20 - 1000 or more. Is there anything I can do to help squeeze any more performance out of this? I'm not looking to move this code to something like using tree codes (Barnes-Hut), but towards optimizing the actual calculations happening inside, since the same calculations occur in the Barnes-Hut algorithm. Any help is appreciated! typedef double real; struct Particle { Vector pos, vel, acc, jerk; Vector oldPos, oldVel, oldAcc, oldJerk; real mass; }; class Vector { private: real vec[3]; public: // Operators defined here }; real Gravity::interact(Particle *p, size_t numParticles) { PROFILE_FUNC(); real tau_q = 1e300; for (size_t i = 0; i < numParticles; i++) { p[i].jerk = 0; p[i].acc = 0; } for (size_t i = 0; i < numParticles; i++) { for (size_t j = i+1; j < numParticles; j++) { Vector r = p[j].pos - p[i].pos; Vector v = p[j].vel - p[i].vel; real r2 = lengthsq(r); real v2 = lengthsq(v); // Calculate inverse of |r|^3 real r3i = Constants::G * pow(r2, -1.5); // da = r / |r|^3 // dj = (v / |r|^3 - 3 * (r . v) * r / |r|^5 Vector da = r * r3i; Vector dj = (v - r * (3 * dot(r, v) / r2)) * r3i; // Calculate new acceleration and jerk p[i].acc += da * p[j].mass; p[i].jerk += dj * p[j].mass; p[j].acc -= da * p[i].mass; p[j].jerk -= dj * p[i].mass; // Collision estimation // Metric 1) tau = |r|^2 / |a(j) - a(i)| // Metric 2) tau = |r|^4 / |v|^4 real mij = p[i].mass + p[j].mass; real tau_est_q1 = r2 / (lengthsq(da) * mij * mij); real tau_est_q2 = (r2*r2) / (v2*v2); if (tau_est_q1 < tau_q) tau_q = tau_est_q1; if (tau_est_q2 < tau_q) tau_q = tau_est_q2; } } return sqrt(sqrt(tau_q)); }

Read the article

Precise explanation of JavaScript <-> DOM circular reference issue

- by Joey Adams

One of the touted advantages of jQuery.data versus raw expando properties (arbitrary attributes you can assign to DOM nodes) is that jQuery.data is "safe from circular references and therefore free from memory leaks". An article from Google titled "Optimizing JavaScript code" goes into more detail: The most common memory leaks for web applications involve circular references between the JavaScript script engine and the browsers' C++ objects' implementing the DOM (e.g. between the JavaScript script engine and Internet Explorer's COM infrastructure, or between the JavaScript engine and Firefox XPCOM infrastructure). It lists two examples of circular reference patterns: DOM element → event handler → closure scope → DOM DOM element → via expando → intermediary object → DOM element However, if a reference cycle between a DOM node and a JavaScript object produces a memory leak, doesn't this mean that any non-trivial event handler (e.g. onclick) will produce such a leak? I don't see how it's even possible for an event handler to avoid a reference cycle, because the way I see it: The DOM element references the event handler. The event handler references the DOM (either directly or indirectly). In any case, it's almost impossible to avoid referencing window in any interesting event handler, short of writing a setInterval loop that reads actions from a global queue. Can someone provide a precise explanation of the JavaScript ↔ DOM circular reference problem? Things I'd like clarified: What browsers are effected? A comment in the jQuery source specifically mentions IE6-7, but the Google article suggests Firefox is also affected. Are expando properties and event handlers somehow different concerning memory leaks? Or are both of these code snippets susceptible to the same kind of memory leak? // Create an expando that references to its own element. var elem = document.getElementById('foo'); elem.myself = elem; // Create an event handler that references its own element. var elem = document.getElementById('foo'); elem.onclick = function() { elem.style.display = 'none'; }; If a page leaks memory due to a circular reference, does the leak persist until the entire browser application is closed, or is the memory freed when the window/tab is closed?

Read the article

Scalable Database Tagging Schema

- by Longpoke

EDIT: To people building tagging systems. Don't read this. It is not what you are looking for. I asked this when I wasn't aware that RDBMS all have their own optimization methods, just use a simple many to many scheme. I have a posting system that has millions of posts. Each post can have an infinite number of tags associated with it. Users can create tags which have notes, date created, owner, etc. A tag is almost like a post itself, because people can post notes about the tag. Each tag association has an owner and date, so we can see who added the tag and when. My question is how can I implement this? It has to be fast searching posts by tag, or tags by post. Also, users can add tags to posts by typing the name into a field, kind of like the google search bar, it has to fill in the rest of the tag name for you. I have 3 solutions at the moment, but not sure which is the best, or if there is a better way. Note that I'm not showing the layout of notes since it will be trivial once I get a proper solution for tags. Method 1. Linked list tagId in post points to a linked list in tag_assoc, the application must traverse the list until flink=0 post: id, content, ownerId, date, tagId, notesId tag_assoc: id, tagId, ownerId, flink tag: id, name, notesId Method 2. Denormalization tags is simply a VARCHAR or TEXT field containing a tab delimited array of tagId:ownerId. It cannot be a fixed size. post: id, content, ownerId, date, tags, notesId tag: id, name, notesId Method 3. Toxi (from: http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html, also same thing here: http://stackoverflow.com/questions/20856/how-do-you-recommend-implementing-tags-or-tagging) post: id, content, ownerId, date, notesId tag_assoc: ownerId, tagId, postId tag: id, name, notesId Method 3 raises the question, how fast will it be to iterate through every single row in tag_assoc? Methods 1 and 2 should be fast for returning tags by post, but for posts by tag, another lookup table must be made. The last thing I have to worry about is optimizing searching tags by name, I have not worked that out yet. I made an ASCII diagram here: http://pastebin.com/f1c4e0e53

Read the article

SEO Help with Pages Indexed by Google

- by Joe Majewski

I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all. Let's take a look at this page, for example: http://www.joemajewski.com/wow/profile.php?id=3 I created my own CMS, and this is simply a breakdown of user id #3's statistics, which I noticed is indexed by Google, although it shouldn't be. I understand that it takes some time before Google's results reflect accurately on my site's content, but this has been improperly indexed for nearly six months now. Here are the precautions that I have taken: My robots.txt file has a line like this: Disallow: /wow/profile.php* When running the url through Google Webmaster Tools, it indicates that I did, indeed, correctly create the disallow command. It did state, however, that a page that doesn't get crawled may still get displayed in the search results if it's being linked to. Thus, I took one more precaution. In the source code I included the following meta data: <meta name="robots" content="noindex,follow" /> I am assuming that follow means to use the page when calculating PageRank, etc, and the noindex tells Google to not display the page in the search results. This page, profile.php, is used to take the $_GET['id'] and find the corresponding registered user. It displays a bit of information about that user, but is in no way relevant enough to warrant a display in the search results, so that is why I am trying to stop Google from indexing it. This is not the only page Google is indexing that I would like removed. I also have a WordPress blog, and there are many category pages, tag pages, and archive pages that I would like removed, and am doing the same procedures to attempt to remove them. Can someone explain how to get pages removed from Google's search results, and possibly some criteria that should help determine what types of pages that I don't want indexed. In terms of my WordPress blog, the only pages that I truly want indexed are my articles. Everything else I have tried to block, with little luck from Google. Can someone also explain why it's bad to have pages indexed that don't provide any new or relevant content, such as pages for WordPress tags or categories, which are clearly never going to receive traffic from Google. Thanks!

Read the article

Is this implementation truely tail-recursive?

- by CFP

Hello everyone! I've come up with the following code to compute in a tail-recursive way the result of an expression such as 3 4 * 1 + cos 8 * (aka 8*cos(1+(3*4))) The code is in OCaml. I'm using a list refto emulate a stack. type token = Num of float | Fun of (float->float) | Op of (float->float->float);; let pop l = let top = (List.hd !l) in l := List.tl (!l); top;; let push x l = l := (x::!l);; let empty l = (l = []);; let pile = ref [];; let eval data = let stack = ref data in let rec _eval cont = match (pop stack) with | Num(n) -> cont n; | Fun(f) -> _eval (fun x -> cont (f x)); | Op(op) -> _eval (fun x -> cont (op x (_eval (fun y->y)))); in _eval (fun x->x) ;; eval [Fun(fun x -> x**2.); Op(fun x y -> x+.y); Num(1.); Num(3.)];; I've used continuations to ensure tail-recursion, but since my stack implements some sort of a tree, and therefore provides quite a bad interface to what should be handled as a disjoint union type, the call to my function to evaluate the left branch with an identity continuation somehow irks a little. Yet it's working perfectly, but I have the feeling than in calling the _eval (fun y->y) bit, there must be something wrong happening, since it doesn't seem that this call can replace the previous one in the stack structure... Am I misunderstanding something here? I mean, I understand that with only the first call to _eval there wouldn't be any problem optimizing the calls, but here it seems to me that evaluation the _eval (fun y->y) will require to be stacked up, and therefore will fill the stack, possibly leading to an overflow... Thanks!

Read the article

Optimize slow ranking query

- by Juan Pablo Califano

I need to optimize a query for a ranking that is taking forever (the query itself works, but I know it's awful and I've just tried it with a good number of records and it gives a timeout). I'll briefly explain the model. I have 3 tables: player, team and player_team. I have players, that can belong to a team. Obvious as it sounds, players are stored in the player table and teams in team. In my app, each player can switch teams at any time, and a log has to be mantained. However, a player is considered to belong to only one team at a given time. The current team of a player is the last one he's joined. The structure of player and team is not relevant, I think. I have an id column PK in each. In player_team I have: id (PK) player_id (FK -> player.id) team_id (FK -> team.id) Now, each team is assigned a point for each player that has joined. So, now, I want to get a ranking of the first N teams with the biggest number of players. My first idea was to get first the current players from player_team (that is one record top for each player; this record must be the player's current team). I failed to find a simple way to do it (tried GROUP BY player_team.player_id HAVING player_team.id = MAX(player_team.id), but that didn't cut it. I tried a number of querys that didn't work, but managed to get this working. SELECT COUNT(*) AS total, pt.team_id, p.facebook_uid AS owner_uid, t.color FROM player_team pt JOIN player p ON (p.id = pt.player_id) JOIN team t ON (t.id = pt.team_id) WHERE pt.id IN ( SELECT max(J.id) FROM player_team J GROUP BY J.player_id ) GROUP BY pt.team_id ORDER BY total DESC LIMIT 50 As I said, it works but looks very bad and performs worse, so I'm sure there must be a better way to go. Anyone has any ideas for optimizing this? I'm using mysql, by the way. Thanks in advance Adding the explain. (Sorry, not sure how to format it properly) id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t ALL PRIMARY NULL NULL NULL 5000 Using temporary; Using filesort 1 PRIMARY pt ref FKplayer_pt77082,FKplayer_pt265938,new_index FKplayer_pt77082 4 t.id 30 Using where 1 PRIMARY p eq_ref PRIMARY PRIMARY 4 pt.player_id 1 2 DEPENDENT SUBQUERY J index NULL new_index 8 NULL 150000 Using index

Read the article

MySQL search for user and their roles

- by Jenkz

I am re-writing the SQL which lets a user search for any other user on our site and also shows their roles. An an example, roles can be "Writer", "Editor", "Publisher". Each role links a User to a Publication. Users can take multiple roles within multiple publications. Example table setup: "users" : user_id, firstname, lastname "publications" : publication_id, name "link_writers" : user_id, publication_id "link_editors" : user_id, publication_id Current psuedo SQL: SELECT * FROM ( (SELECT user_id FROM users WHERE firstname LIKE '%Jenkz%') UNION (SELECT user_id FROM users WHERE lastname LIKE '%Jenkz%') ) AS dt JOIN (ROLES STATEMENT) AS roles ON roles.user_id = dt.user_id At the moment my roles statement is: SELECT dt2.user_id, dt2.publication_id, dt.role FROM ( (SELECT 'writer' AS role, link_writers.user_id, link_writers.publication_id FROM link_writers) UNION (SELECT 'editor' AS role, link_editors.user_id, link_editors.publication_id FROM link_editors) ) AS dt2 The reason for wrapping the roles statement in UNION clauses is that some roles are more complex and require a table join to find the publication_id and user_id. As an example "publishers" might be linked accross two tables "link_publishers": user_id, publisher_group_id "link_publisher_groups": publisher_group_id, publication_id So in that instance, the query forming part of my UNION would be: SELECT 'publisher' AS role, link_publishers.user_id, link_publisher_groups.publication_id FROM link_publishers JOIN link_publisher_groups ON lpg.group_id = lp.group_id I'm pretty confident that my table setup is good (I was warned off the one-table-for-all system when researching the layout). My problem is that there are now 100,000 rows in the users table and upto 70,000 rows in each of the link tables. Initial lookup in the users table is fast, but the joining really slows things down. How can I only join on the relevant roles? -------------------------- EDIT ---------------------------------- Explain above (open in a new window to see full resolution). The bottom bit in red, is the "WHERE firstname LIKE '%Jenkz%'" the third row searches WHERE CONCAT(firstname, ' ', lastname) LIKE '%Jenkz%'. Hence the large row count, but I think this is unavoidable, unless there is a way to put an index accross concatenated fields? The green bit at the top just shows the total rows scanned from the ROLES STATEMENT. You can then see each individual UNION clause (#6 - #12) which all show a large number of rows. Some of the indexes are normal, some are unique. It seems that MySQL isn't optimizing to use the dt.user_id as a comparison for the internal of the UNION statements. Is there any way to force this behaviour? Please note that my real setup is not publications and writers but "webmasters", "players", "teams" etc.

Read the article

Making an efficient algorithm

- by James P.

Here's my recent submission for the FB programming contest (qualifying round only requires to upload program output so source code doesn't matter). The objective is to find two squares that add up to a given value. I've left it as it is as an example. It does the job but is too slow for my liking. Here's the points that are obviously eating up time: List of squares is being recalculated for each call of getNumOfDoubleSquares(). This could be precalculated or extended when needed. Both squares are being checked for when it is only necessary to check for one (complements). There might be a more efficient way than a double-nested loop to find pairs. Other suggestions? Besides this particular problem, what do you look for when optimizing an algorithm? public static int getNumOfDoubleSquares( Integer target ){ int num = 0; ArrayList<Integer> squares = new ArrayList<Integer>(); ArrayList<Integer> found = new ArrayList<Integer>(); int squareValue = 0; for( int j=0; squareValue<=target; j++ ){ squares.add(j, squareValue); squareValue = (int)Math.pow(j+1,2); } int squareSum = 0; System.out.println( "Target=" + target ); for( int i = 0; i < squares.size(); i++ ){ int square1 = squares.get(i); for( int j = 0; j < squares.size(); j++ ){ int square2 = squares.get(j); squareSum = square1 + square2; if( squareSum == target && !found.contains( square1 ) && !found.contains( square2 ) ){ found.add(square1); found.add(square2); System.out.println( "Found !" + square1 +"+"+ square2 +"="+ squareSum); num++; } } } return num; }

Read the article

JQuery performance issue (Or just bad CODING!)

- by ferronrsmith

Read the article

Have I pushed the limits of my current VPS or is there room for optimization?

- by JRameau

I am currently on a mediatemple DV server (basic) 512mb dedicated ram, this is a CentOS based VPS with Plesk and Virtuozzo. My experience with it from day 1 has been bad and I only could sooth my server issues with several caching "Band-aids," but my sites are not as small as they were a year ago either so the issues have worsen. I have 3 Drupal installs running on separate (plesk) domains, 1 of those drupal installs is a multisite, that consists of 5-6 sites 2 of those sites are bringing in actual traffic. Those caching "Band-aids" I mentioned are APC, which seemed to help alot initially, and Drupal's Boost, which is considered a poorman's Varnish, it makes all my pages static for anonymous users. Last 30day combined estimate on Google Ananlytics: 90k visitors 260k pageviews. Issue: alot of downtime, I am continually checking if my sites are up, and lately I have been finding it down more than 3 times daily. Restarting Apache will bring it back up, for some time. I have google search every error message and looked up ways to optimize my DV server, and I am beyond stump what is my next move. Is this server bad, have I hit a impossibly low restriction such as the 12mb kernel memory barrier (kmemsize), is it on my end, do I need to optimize some more? *I have provided as much information as I can below, any help or suggestions given will be appreciated Common Error messages I see in the log: [error] (12)Cannot allocate memory: fork: Unable to fork new process [error] make_obcallback: could not import mod_python.apache.\n Traceback (most recent call last): File "/usr/lib/python2.4/site-packages/mod_python/apache.py", line 21, in ? import traceback File "/usr/lib/python2.4/traceback.py", line 3, in ? import linecache ImportError: No module named linecache [error] python_handler: no interpreter callback found. [warn-phpd] mmap cache can't open /var/www/vhosts/***/httpdocs/*** - Too many open files in system (pid ***) [alert] Child 8125 returned a Fatal error... Apache is exiting! [emerg] (43)Identifier removed: couldn't grab the accept mutex [emerg] (22)Invalid argument: couldn't release the accept mutex cat /proc/user_beancounters: Version: 2.5 uid resource held maxheld barrier limit failcnt 41548: kmemsize 4582652 5306699 12288832 13517715 21105036 lockedpages 0 0 600 600 0 privvmpages 38151 42676 229036 249036 0 shmpages 16274 16274 17237 17237 2 dummy 0 0 0 0 0 numproc 43 46 300 300 0 physpages 27260 29528 0 2147483647 0 vmguarpages 0 0 131072 2147483647 0 oomguarpages 27270 29538 131072 2147483647 0 numtcpsock 21 29 300 300 0 numflock 8 8 480 528 0 numpty 1 1 30 30 0 numsiginfo 0 1 1024 1024 0 tcpsndbuf 648440 675272 2867477 4096277 1711499 tcprcvbuf 301620 359716 2867477 4096277 0 othersockbuf 4472 4472 1433738 2662538 0 dgramrcvbuf 0 0 1433738 1433738 0 numothersock 12 12 300 300 0 dcachesize 0 0 2684271 2764800 0 numfile 3447 3496 6300 6300 3872 dummy 0 0 0 0 0 dummy 0 0 0 0 0 dummy 0 0 0 0 0 numiptent 14 14 200 200 0 TOP: (In January the load avg was really high 3-10, I was able to bring it down where it is currently is by giving APC more memory play around with) top - 16:46:07 up 2:13, 1 user, load average: 0.34, 0.20, 0.20 Tasks: 40 total, 2 running, 37 sleeping, 0 stopped, 1 zombie Cpu(s): 0.3% us, 0.1% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si Mem: 916144k total, 156668k used, 759476k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached MySQLTuner: (after optimizing every table and repairing any table with overage I got the fragmented count down to 86) [--] Data in MyISAM tables: 285M (Tables: 1105) [!!] Total fragmented tables: 86 [--] Up for: 2h 44m 38s (409K q [41.421 qps], 6K conn, TX: 1B, RX: 174M) [--] Reads / Writes: 79% / 21% [--] Total buffers: 58.0M global + 2.7M per thread (100 max threads) [!!] Query cache prunes per day: 675307 [!!] Temporary tables created on disk: 35% (7K on disk / 20K total)

Read the article

The hidden cost of interrupting knowledge workers

- by Piet

The November issue of pragpub has an interesting article on interruptions. The article is written by Brian Tarbox, who also mentions the article on his blog. I like the subtitle: ‘Simple Strategies for Avoiding Dumping Your Mental Stack’. Brian talks about the effective cost of interrupting a ‘knowledge worker’, often with trivial questions or distractions. In the eyes of the interruptor, the interruption only costs the time the interrupted had to listen to the question and give an answer. However, depending on what the interrupted was doing at the time, getting fully immersed in their task again might take up to 15-20 minutes. Enough interruptions might even cause a knowledge worker to mentally call it a day. According to this article interruptions can consume about 28% of a knowledge worker’s time, translating in a $588 billion loss for US companies each year. Looking for a new developer to join your team? Ever thought about optimizing your team’s environment and the way they work instead? Making non knowledge workers aware You can’t. Well, I haven’t succeeded yet. And believe me: I’ve tried. When you’ve got a simple way to really increase your productivity (’give me 2 hours of uninterrupted time a day’) it wouldn’t be right not to tell your boss or team-leader about it. The problem is: only productive knowledge workers seem to understand this. People who don’t fall into this category just seem to think you’re joking, being arrogant or anti-social when you tell them the interruptions can really have an impact on your productivity. Also, knowledge workers often work in a very concentrated mental state which is described here as: It is the same mindfulness as ecstatic lovemaking, the merging of two into a fluidly harmonious one. The hallmark of flow is a feeling of spontaneous joy, even rapture, while performing a task. Yes, coding can be addictive and if you’re interrupting a programmer at the wrong moment, you’re effectively bringing down a junkie from his high in just a few seconds. This can result in seemingly arrogant, almost aggressive reactions. How to make people aware of the production-cost they’re inflicting: I’ve been often pondering that question myself. The article suggests that solutions based on that question never seem to work. To be honest: I’ve never even been able to find a half decent solution for this question. People who are not in this situations just don’t understand the issue, no matter how you try to explain it. Fun (?) thing I’ve noticed: Programmers or IT people in general who don’t get this are often the kind of people who just don’t get anything done. Interrupt handling (interruption management?) IRL Have non-urgent questions handled in a non-interruptive way It helps a bit to educate people into using non-interruptive ways to ask questions: “duh, I have no idea, but I’m a bit busy here now could you put it in an email so I don’t forget?”. Eventually, a considerable amount of people will skip interrupting you and just send an email right away. Some stubborn-headed people however will continue to just interrupt you, saying “you’re 10 meters from my desk, why can’t we just talk?”. Just remember to disable your email notifications, it can be hard to resist opening your email client when you know a new email just arrived. Use Do Not Disturb signals When working in a group of programmers, often the unofficial sign you can only be interrupted for something important is to put on headphones. And when the environment is quiet enough, often people aren’t even listening to music. Otherwise music can help to block the indirect distractions (someone else talking on the phone or tapping their feet). You might get a “they’re all just surfing and listening to music”-reaction from outsiders though. Peopleware talks about a team where the no-interruption sign was placing a shawl on the desk. If I remember correctly, I am unable to locate my copy of this really excellent must-read book. If you have all standardized on the same IM tool, maybe that tool has a ‘do not disturb’ setting. Also some phone-systems have a ‘DND’ (do not disturb) setting. Hide Brian offers a number of good suggestions, some obvious like: hide away somewhere they can’t find you. Not sure how long it’ll be till someone thinks you’re just taking a nap somewhere though. Also, this often isn’t possible or your boss might not understand this. And if you really get caught taking a nap, make sure to explain that your were powernapping. Counter-act interruptions Another suggestion he offers is when you’re being interrupted to just hold up your hand, blocking the interruption, and at least giving you time to finish your sentence or your block/line of code. The last suggestion works more as a way to make it obvious to the interruptor that they really are interrupting your work and to offload some of the cost on the interruptor. In practice, this can also helps you cool down a bit so you don’t start saying nasty things to the interruptor. Unfortunately I’ve sometimes been confronted with people who just ignore this signal and keep talking, as if they’re sure that whatever they’ve got to say is really worth listening to and without a doubt more important than anything you might be doing. This behaviour usually leaves me speechless (not good when someone just asked a question). I’ve noticed that these people are usually also the first to complain when being interrupted themselves. They’re generally not very liked as colleagues, so try not to imitate their behaviour. TDD as a way to minimize recovery time I don’t like Test Driven Development. Mainly for only one reason: It interrupts flow. At least, that’s what it does for me, but maybe I’m just not grown used to TDD yet. BUT a positive effect TDD has on me when I have to work in an interruptive environment and can’t really get into the ‘flow’ (also supposedly called ‘the zone’ by software developers, although I’ve never heard it 1st hand), TDD helps me to concentrate on the tasks at hand and helps me to get back at work after an interruption. I feel when using TDD, I can get by without the need for being totally ‘in’ the project and I can be reasonably productive without obtaining ‘flow’. Do you have a suggestion on how to make people aware of the concept of ‘flow’ and the cost of interruptions? (without looking like an arrogant ass or a weirdo)

Read the article

SQL SERVER – Weekly Series – Memory Lane – #051

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Explanation and Understanding NOT NULL Constraint NOT NULL is integrity CONSTRAINT. It does not allow creating of the row where column contains NULL value. Most discussed questions about NULL is what is NULL? I will not go in depth analysis it. Simply put NULL is unknown or missing data. When NULL is present in database columns, it can affect the integrity of the database. I really do not prefer NULL in the database unless they are absolutely necessary. Three T-SQL Script to Create Primary Keys on Table I have always enjoyed writing about three topics Constraint and Keys, Backup and Restore and Datetime Functions. Primary Keys constraints prevent duplicate values for columns and provides a unique identifier to each column, as well it creates clustered index on the columns. 2008 Get Numeric Value From Alpha Numeric String – UDF for Get Numeric Numbers Only SQL is great with String operations. Many times, I use T-SQL to do my string operation. Let us see User Defined Function, which I wrote a few days ago, which will return only Numeric values from Alpha Numeric values. Introduction and Example of UNION and UNION ALL It is very much interesting when I get requests from blog reader to re-write my previous articles. I have received few requests to rewrite my article SQL SERVER – Union vs. Union All – Which is better for performance? with examples. I request you to read my previous article first to understand what is the concept and read this article to understand the same concept with an example. Downgrade Database for Previous Version The main questions is how they can downgrade the from SQL Server 2005 to SQL Server 2000? The answer is : Not Possible. Get Common Records From Two Tables Without Using Join Following is my scenario, Suppose Table 1 and Table 2 has same column e.g. Column1 Following is the query, 1. Select column1,column2 From Table1 2. Select column1 From Table2 I want to find common records from these tables, but I don’t want to use the Join clause because for that I need to specify the column name for Join condition. Will you help me to get common records without using Join condition? I am using SQL Server 2005. Retrieve – Select Only Date Part From DateTime – Best Practice – Part 2 A year ago I wrote a post about SQL SERVER – Retrieve – Select Only Date Part From DateTime – Best Practice where I have discussed two different methods of getting the date part from datetime. Introduction to CLR – Simple Example of CLR Stored Procedure CLR is an abbreviation of Common Language Runtime. In SQL Server 2005 and later version of it database objects can be created which are created in CLR. Stored Procedures, Functions, Triggers can be coded in CLR. CLR is faster than T-SQL in many cases. CLR is mainly used to accomplish tasks which are not possible by T-SQL or can use lots of resources. The CLR can be usually implemented where there is an intense string operation, thread management or iteration methods which can be complicated for T-SQL. Implementing CLR provides more security to the Extended Stored Procedure. 2009 Comic Slow Query – SQL Joke Before Presentation After Presentation Enable Automatic Statistic Update on Database In one of the recent projects, I found out that despite putting good indexes and optimizing the query, I could not achieve an optimized performance and I still received an unoptimized response from the SQL Server. On examination, I figured out that the culprit was statistics. The database that I was trying to optimize had auto update of the statistics was disabled. Recently Executed T-SQL Query Please refer to blog post query to recently executed T-SQL query on database. Change Collation of Database Column – T-SQL Script – Consolidating Collations – Extention Script At some time in your DBA career, you may find yourself in a position when you sit back and realize that your database collations have somehow run amuck, or are faced with the ever annoying CANNOT RESOLVE COLLATION message when trying to join data of varying collation settings. 2010 Visiting Alma Mater – Delivering Session on Database Performance and Career – Nirma Institute of Technology Everyone always dreams of visiting their school and college, where they have studied once. It is a great feeling to see the college once again – where you have spent the wonderful golden years of your time. College time is filled with studies, education, emotions and several plans to build a future. I consider myself fortunate as I got the opportunity to study at some of the best places in the world. Change Column DataTypes There are times when I feel like writing that I am a day older in SQL Server. In fact, there are many who are looking for a solution that is simple enough. Have you ever searched online for something very simple. I often do and enjoy doing things which are straight forward and easy to change. 2011 Three DMVs – sys.dm_server_memory_dumps – sys.dm_server_services – sys.dm_server_registry In this blog post we will see three new DMVs which are introduced in Denali. The DMVs are very simple and there is not much to describe them. So here is the simple game. I will be asking a question back to you after seeing the result of the each of the DMV and you help me to complete this blog post. A Simple Quiz – T-SQL Brain Trick If you have some time, I strongly suggest you try this quiz out as it is for sure twists your brain. 2012 List All The Column With Specific Data Types in Database 5 years ago I wrote script SQL SERVER – 2005 – List All The Column With Specific Data Types, when I read it again, it is very much relevant and I liked it. This is one of the script which every developer would like to keep it handy. I have upgraded the script bit more. I have included few additional information which I believe I should have added from the beginning. It is difficult to visualize the final script when we are writing it first time. Find First Non-Numeric Character from String The function PATINDEX exists for quite a long time in SQL Server but I hardly see it being used. Well, at least I use it and I am comfortable using it. Here is a simple script which I use when I have to identify first non-numeric character. Finding Different ColumnName From Almost Identitical Tables Well here is the interesting example of how we can use sys.column catalogue views and get the details of the newly added column. I have previously written about EXCEPT over here which is very similar to MINUS of Oracle. Storing Data and Files in Cloud – Dropbox – Personal Technology Tip I thought long and hard about doing a Personal Technology Tips series for this blog. I have so many tips I’d like to share. I am on my computer almost all day, every day, so I have a treasure trove of interesting tidbits I like to share if given the chance. The only thing holding me back – which tip to share first? The first tip obviously has the weight of seeming like the most important. But this would mean choosing amongst my favorite tricks and shortcuts. This is a hard task. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article

Simple way of converting server side objects into client side using JSON serialization for asp.net websites

- by anil.kasalanati

Introduction:- With the growth of Web2.0 and the need for faster user experience the spotlight has shifted onto javascript based applications built using REST pattern or asp.net AJAX Pagerequest manager. And when we are working with javascript wouldn’t it be much better if we could create objects in an OOAD way and easily push it to the client side. Following are the reasons why you would push the server side objects onto client side - Easy availability of the complex object. - Use C# compiler and rick intellisense to create and maintain the objects but use them in the javascript. You could run code analysis etc. - Reduce the number of calls we make to the server side by loading data on the pageload. I would like to explain about the 3rd point because that proved to be highly beneficial to me when I was fixing the performance issues of a major website. There could be a scenario where in you be making multiple AJAX based webrequestmanager calls in order to get the same response in a single page. This happens in the case of widget based framework when all the widgets are independent but they need some common information available in the framework to load the data. So instead of making n multiple calls we could load the data needed during pageload. The above picture shows the scenario where in all the widgets need the common information and then call GetData webservice on the server side. Ofcourse the result can be cached on the client side but a better solution would be to avoid the call completely. In order to do that we need to JSONSerialize the content and send it in the DOM. Example:- I have developed a simple application to demonstrate the idea and I would explaining that in detail here. The class called SimpleClass would be sent as serialized JSON to the client side . And this inherits from the base class which has the implementation for the GetJSONString method. You can create a single base class and all the object which need to be pushed to the client side can inherit from that class. The important thing to note is that the class should be annotated with DataContract attribute and the methods should have the Data Member attribute. This is needed by the .Net DataContractSerializer and this follows the opt-in mode so if you want to send an attribute to the client side then you need to annotate the DataMember attribute. So if I didn’t want to send the Result I would simple remove the DataMember attribute. This is default WCF/.Net 3.5 stuff but it provides the flexibility of have a fullfledged object on the server side but sending a smaller object to the client side. Sometimes you may hide some values due to security constraints. And thing you will notice is that I have marked the class as Serializable so that it can be stored in the Session and used in webfarm deployment scenarios. Following is the implementation of the base class – This implements the default DataContractJsonSerializer and for more information or customization refer to following blogs – http://softcero.blogspot.com/2010/03/optimizing-net-json-serializing-and-ii.html http://weblogs.asp.net/gunnarpeipman/archive/2010/12/28/asp-net-serializing-and-deserializing-json-objects.aspx The next part is pretty simple, I just need to inject this object into the aspx page. And in the aspx markup I have the following line – <script type="text/javascript"> var data =(<%=SimpleClassJSON %>); alert(data.ResultText); </script> This will output the content as JSON into the variable data and this can be any element in the DOM. And you can verify the element by checking data in the Firebug console. Design Consideration – If you have a lot of javascripts then you need to think about using Script # and you can write javascript in C#. Refer to Nikhil’s blog – http://projects.nikhilk.net/ScriptSharp Ensure that you are taking security into consideration while exposing server side objects on to client side. I have seen application exposing passwords, secret key so it is not a good practice. The application can be tested using the following url – http://techconsulting.vpscustomer.com/Samples/JsonTest.aspx The source code is available at http://techconsulting.vpscustomer.com/Source/HistoryTest.zip

Read the article

MySQL Cluster 7.2: Over 8x Higher Performance than Cluster 7.1

- by Mat Keep

0 0 1 893 5092 Homework 42 11 5974 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} Summary The scalability enhancements delivered by extensions to multi-threaded data nodes enables MySQL Cluster 7.2 to deliver over 8x higher performance than the previous MySQL Cluster 7.1 release on a recent benchmark What’s New in MySQL Cluster 7.2 MySQL Cluster 7.2 was released as GA (Generally Available) in February 2012, delivering many enhancements to performance on complex queries, new NoSQL Key / Value API, cross-data center replication and ease-of-use. These enhancements are summarized in the Figure below, and detailed in the MySQL Cluster New Features whitepaper Figure 1: Next Generation Web Services, Cross Data Center Replication and Ease-of-Use Once of the key enhancements delivered in MySQL Cluster 7.2 is extensions made to the multi-threading processes of the data nodes. Multi-Threaded Data Node Extensions The MySQL Cluster 7.2 data node is now functionally divided into seven thread types: 1) Local Data Manager threads (ldm). Note – these are sometimes also called LQH threads. 2) Transaction Coordinator threads (tc) 3) Asynchronous Replication threads (rep) 4) Schema Management threads (main) 5) Network receiver threads (recv) 6) Network send threads (send) 7) IO threads Each of these thread types are discussed in more detail below. MySQL Cluster 7.2 increases the maximum number of LDM threads from 4 to 16. The LDM contains the actual data, which means that when using 16 threads the data is more heavily partitioned (this is automatic in MySQL Cluster). Each LDM thread maintains its own set of data partitions, index partitions and REDO log. The number of LDM partitions per data node is not dynamically configurable, but it is possible, however, to map more than one partition onto each LDM thread, providing flexibility in modifying the number of LDM threads. The TC domain stores the state of in-flight transactions. This means that every new transaction can easily be assigned to a new TC thread. Testing has shown that in most cases 1 TC thread per 2 LDM threads is sufficient, and in many cases even 1 TC thread per 4 LDM threads is also acceptable. Testing also demonstrated that in some instances where the workload needed to sustain very high update loads it is necessary to configure 3 to 4 TC threads per 4 LDM threads. In the previous MySQL Cluster 7.1 release, only one TC thread was available. This limit has been increased to 16 TC threads in MySQL Cluster 7.2. The TC domain also manages the Adaptive Query Localization functionality introduced in MySQL Cluster 7.2 that significantly enhanced complex query performance by pushing JOIN operations down to the data nodes. Asynchronous Replication was separated into its own thread with the release of MySQL Cluster 7.1, and has not been modified in the latest 7.2 release. To scale the number of TC threads, it was necessary to separate the Schema Management domain from the TC domain. The schema management thread has little load, so is implemented with a single thread. The Network receiver domain was bound to 1 thread in MySQL Cluster 7.1. With the increase of threads in MySQL Cluster 7.2 it is also necessary to increase the number of recv threads to 8. This enables each receive thread to service one or more sockets used to communicate with other nodes the Cluster. The Network send thread is a new thread type introduced in MySQL Cluster 7.2. Previously other threads handled the sending operations themselves, which can provide for lower latency. To achieve highest throughput however, it has been necessary to create dedicated send threads, of which 8 can be configured. It is still possible to configure MySQL Cluster 7.2 to a legacy mode that does not use any of the send threads – useful for those workloads that are most sensitive to latency. The IO Thread is the final thread type and there have been no changes to this domain in MySQL Cluster 7.2. Multiple IO threads were already available, which could be configured to either one thread per open file, or to a fixed number of IO threads that handle the IO traffic. Except when using compression on disk, the IO threads typically have a very light load. Benchmarking the Scalability Enhancements The scalability enhancements discussed above have made it possible to scale CPU usage of each data node to more than 5x of that possible in MySQL Cluster 7.1. In addition, a number of bottlenecks have been removed, making it possible to scale data node performance by even more than 5x. Figure 2: MySQL Cluster 7.2 Delivers 8.4x Higher Performance than 7.1 The flexAsynch benchmark was used to compare MySQL Cluster 7.2 performance to 7.1 across an 8-node Intel Xeon x5670-based cluster of dual socket commodity servers (6 cores each). As the results demonstrate, MySQL Cluster 7.2 delivers over 8x higher performance per data nodes than MySQL Cluster 7.1. More details of this and other benchmarks will be published in a new whitepaper – coming soon, so stay tuned! In a following blog post, I’ll provide recommendations on optimum thread configurations for different types of server processor. You can also learn more from the Best Practices Guide to Optimizing Performance of MySQL Cluster Conclusion MySQL Cluster has achieved a range of impressive benchmark results, and set in context with the previous 7.1 release, is able to deliver over 8x higher performance per node. As a result, the multi-threaded data node extensions not only serve to increase performance of MySQL Cluster, they also enable users to achieve significantly improved levels of utilization from current and future generations of massively multi-core, multi-thread processor designs.

Read the article

C#: LINQ vs foreach - Round 1.

- by James Michael Hare

So I was reading Peter Kellner's blog entry on Resharper 5.0 and its LINQ refactoring and thought that was very cool. But that raised a point I had always been curious about in my head -- which is a better choice: manual foreach loops or LINQ? The answer is not really clear-cut. There are two sides to any code cost arguments: performance and maintainability. The first of these is obvious and quantifiable. Given any two pieces of code that perform the same function, you can run them side-by-side and see which piece of code performs better. Unfortunately, this is not always a good measure. Well written assembly language outperforms well written C++ code, but you lose a lot in maintainability which creates a big techncial debt load that is hard to offset as the application ages. In contrast, higher level constructs make the code more brief and easier to understand, hence reducing technical cost. Now, obviously in this case we're not talking two separate languages, we're comparing doing something manually in the language versus using a higher-order set of IEnumerable extensions that are in the System.Linq library. Well, before we discuss any further, let's look at some sample code and the numbers. First, let's take a look at the for loop and the LINQ expression. This is just a simple find comparison: // find implemented via LINQ public static bool FindViaLinq(IEnumerable<int> list, int target) { return list.Any(item => item == target); } // find implemented via standard iteration public static bool FindViaIteration(IEnumerable<int> list, int target) { foreach (var i in list) { if (i == target) { return true; } } return false; } Okay, looking at this from a maintainability point of view, the Linq expression is definitely more concise (8 lines down to 1) and is very readable in intention. You don't have to actually analyze the behavior of the loop to determine what it's doing. So let's take a look at performance metrics from 100,000 iterations of these methods on a List<int> of varying sizes filled with random data. For this test, we fill a target array with 100,000 random integers and then run the exact same pseudo-random targets through both searches. List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower Any 10 26 0.00046 30.00% Iteration 10 20 0.00023 - Any 100 116 0.00201 18.37% Iteration 100 98 0.00118 - Any 1000 1058 0.01853 16.78% Iteration 1000 906 0.01155 - Any 10,000 10,383 0.18189 17.41% Iteration 10,000 8843 0.11362 - Any 100,000 104,004 1.8297 18.27% Iteration 100,000 87,941 1.13163 - The LINQ expression is running about 17% slower for average size collections and worse for smaller collections. Presumably, this is due to the overhead of the state machine used to track the iterators for the yield returns in the LINQ expressions, which seems about right in a tight loop such as this. So what about other LINQ expressions? After all, Any() is one of the more trivial ones. I decided to try the TakeWhile() algorithm using a Count() to get the position stopped like the sample Pete was using in his blog that Resharper refactored for him into LINQ: // Linq form public static int GetTargetPosition1(IEnumerable<int> list, int target) { return list.TakeWhile(item => item != target).Count(); } // traditionally iterative form public static int GetTargetPosition2(IEnumerable<int> list, int target) { int count = 0; foreach (var i in list) { if(i == target) { break; } ++count; } return count; } Once again, the LINQ expression is much shorter, easier to read, and should be easier to maintain over time, reducing the cost of technical debt. So I ran these through the same test data: List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Iteration 100,000 81635 0.81635 - Wow! I expected some overhead due to the state machines iterators produce, but 90% slower? That seems a little heavy to me. So then I thought, well, what if TakeWhile() is not the right tool for the job? The problem is TakeWhile returns each item for processing using yield return, whereas our for-loop really doesn't care about the item beyond using it as a stop condition to evaluate. So what if that back and forth with the iterator state machine is the problem? Well, we can quickly create an (albeit ugly) lambda that uses the Any() along with a count in a closure (if a LINQ guru knows a better way PLEASE let me know!), after all , this is more consistent with what we're trying to do, we're trying to find the first occurence of an item and halt once we find it, we just happen to be counting on the way. This mostly matches Any(). // a new method that uses linq but evaluates the count in a closure. public static int TakeWhileViaLinq2(IEnumerable<int> list, int target) { int count = 0; list.Any(item => { if(item == target) { return true; } ++count; return false; }); return count; } Now how does this one compare? List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Any w/Closure 10 23 0.00023 28% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Any w/Closure 100 116 0.00116 27% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Any w/Closure 1000 1101 0.01101 33% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Any w/Closure 10,000 10802 0.10802 32% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Any w/Closure 100,000 108378 1.08378 33% Iteration 100,000 81635 0.81635 - Much better! It seems that the overhead of TakeAny() returning each item and updating the state in the state machine is drastically reduced by using Any() since Any() iterates forward until it finds the value we're looking for -- for the task we're attempting to do. So the lesson there is, make sure when you use a LINQ expression you're choosing the best expression for the job, because if you're doing more work than you really need, you'll have a slower algorithm. But this is true of any choice of algorithm or collection in general. Even with the Any() with the count in the closure it is still about 30% slower, but let's consider that angle carefully. For a list of 100,000 items, it was the difference between 1.01 ms and 0.82 ms roughly in a List<T>. That's really not that bad at all in the grand scheme of things. Even running at 90% slower with TakeWhile(), for the vast majority of my projects, an extra millisecond to save potential errors in the long term and improve maintainability is a small price to pay. And if your typical list is 1000 items or less we're talking only microseconds worth of difference. It's like they say: 90% of your performance bottlenecks are in 2% of your code, so over-optimizing almost never pays off. So personally, I'll take the LINQ expression wherever I can because they will be easier to read and maintain (thus reducing technical debt) and I can rely on Microsoft's development to have coded and unit tested those algorithm fully for me instead of relying on a developer to code the loop logic correctly. If something's 90% slower, yes, it's worth keeping in mind, but it's really not until you start get magnitudes-of-order slower (10x, 100x, 1000x) that alarm bells should really go off. And if I ever do need that last millisecond of performance? Well then I'll optimize JUST THAT problem spot. To me it's worth it for the readability, speed-to-market, and maintainability.

Read the article

The Shift: how Orchard painlessly shifted to document storage, and how it’ll affect you

- by Bertrand Le Roy

We’ve known it all along. The storage for Orchard content items would be much more efficient using a document database than a relational one. Orchard content items are composed of parts that serialize naturally into infoset kinds of documents. Storing them as relational data like we’ve done so far was unnatural and requires the data for a single item to span multiple tables, related through 1-1 relationships. This means lots of joins in queries, and a great potential for Select N+1 problems. Document databases, unfortunately, are still a tough sell in many places that prefer the more familiar relational model. Being able to x-copy Orchard to hosters has also been a basic constraint in the design of Orchard. Combine those with the necessity at the time to run in medium trust, and with license compatibility issues, and you’ll find yourself with very few reasonable choices. So we went, a little reluctantly, for relational SQL stores, with the dream of one day transitioning to document storage. We have played for a while with the idea of building our own document storage on top of SQL databases, and Sébastien implemented something more than decent along those lines, but we had a better way all along that we didn’t notice until recently… In Orchard, there are fields, which are named properties that you can add dynamically to a content part. Because they are so dynamic, we have been storing them as XML into a column on the main content item table. This infoset storage and its associated API are fairly generic, but were only used for fields. The breakthrough was when Sébastien realized how this existing storage could give us the advantages of document storage with minimal changes, while continuing to use relational databases as the substrate. public bool CommercialPrices { get { return this.Retrieve(p => p.CommercialPrices); } set { this.Store(p => p.CommercialPrices, value); } } This code is very compact and efficient because the API can infer from the expression what the type and name of the property are. It is then able to do the proper conversions for you. For this code to work in a content part, there is no need for a record at all. This is particularly nice for site settings: one query on one table and you get everything you need. This shows how the existing infoset solves the data storage problem, but you still need to query. Well, for those properties that need to be filtered and sorted on, you can still use the current record-based relational system. This of course continues to work. We do however provide APIs that make it trivial to store into both record properties and the infoset storage in one operation: public double Price { get { return Retrieve(r => r.Price); } set { Store(r => r.Price, value); } } This code looks strikingly similar to the non-record case above. The difference is that it will manage both the infoset and the record-based storages. The call to the Store method will send the data in both places, keeping them in sync. The call to the Retrieve method does something even cooler: if the property you’re looking for exists in the infoset, it will return it, but if it doesn’t, it will automatically look into the record for it. And if that wasn’t cool enough, it will take that value from the record and store it into the infoset for the next time it’s required. This means that your data will start automagically migrating to infoset storage just by virtue of using the code above instead of the usual: public double Price { get { return Record.Price; } set { Record.Price = value; } } As your users browse the site, it will get faster and faster as Select N+1 issues will optimize themselves away. If you preferred, you could still have explicit migration code, but it really shouldn’t be necessary most of the time. If you do already have code using QueryHints to mitigate Select N+1 issues, you might want to reconsider those, as with the new system, you’ll want to avoid joins that you don’t need for filtering or sorting, further optimizing your queries. There are some rare cases where the storage of the property must be handled differently. Check out this string[] property on SearchSettingsPart for example: public string[] SearchedFields { get { return (Retrieve<string>("SearchedFields") ?? "") .Split(new[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries); } set { Store("SearchedFields", String.Join(", ", value)); } } The array of strings is transformed by the property accessors into and from a comma-separated list stored in a string. The Retrieve and Store overloads used in this case are lower-level versions that explicitly specify the type and name of the attribute to retrieve or store. You may be wondering what this means for code or operations that look directly at the database tables instead of going through the new infoset APIs. Even if there is a record, the infoset version of the property will win if it exists, so it is necessary to keep the infoset up-to-date. It’s not very complicated, but definitely something to keep in mind. Here is what a product record looks like in Nwazet.Commerce for example: And here is the same data in the infoset: The infoset is stored in Orchard_Framework_ContentItemRecord or Orchard_Framework_ContentItemVersionRecord, depending on whether the content type is versionable or not. A good way to find what you’re looking for is to inspect the record table first, as it’s usually easier to read, and then get the item record of the same id. Here is the detailed XML document for this product: <Data> <ProductPart Inventory="40" Price="18" Sku="pi-camera-box" OutOfStockMessage="" AllowBackOrder="false" Weight="0.2" Size="" ShippingCost="null" IsDigital="false" /> <ProductAttributesPart Attributes="" /> <AutoroutePart DisplayAlias="camera-box" /> <TitlePart Title="Nwazet Pi Camera Box" /> <BodyPart Text="[...]" /> <CommonPart CreatedUtc="2013-09-10T00:39:00Z" PublishedUtc="2013-09-14T01:07:47Z" /> </Data> The data is neatly organized under each part. It is easy to see how that document is all you need to know about that content item, all in one table. If you want to modify that data directly in the database, you should be careful to do it in both the record table and the infoset in the content item record. In this configuration, the record is now nothing more than an index, and will only be used for sorting and filtering. Of course, it’s perfectly fine to mix record-backed properties and record-less properties on the same part. It really depends what you think must be sorted and filtered on. In turn, this potentially simplifies migrations considerably. So here it is, the great shift of Orchard to document storage, something that Orchard has been designed for all along, and that we were able to implement with a satisfying and surprising economy of resources. Expect this code to make its way into the 1.8 version of Orchard when that’s available.

Read the article

Design for complex ATG applications

- by Glen Borkowski

Overview Needless to say, some ATG applications are more complex than others. Some ATG applications support a single site, single language, single catalog, single currency, have a single development staff, single business team, and a relatively simple business model. The real complex applications have to support multiple sites, multiple languages, multiple catalogs, multiple currencies, a couple different development teams, multiple business teams, and a highly complex business model (and processes to go along with it). While it's still important to implement a proper design for simple applications, it's absolutely critical to do this for the complex applications. Why? It's all about time and money. If you are unable to manage your complex applications in an efficient manner, the cost of managing it will increase dramatically as will the time to get things done (time to market). On the positive side, your competition is most likely in the same situation, so you just need to be more efficient than they are. This article is intended to discuss a number of key areas to think about when designing complex applications on ATG. Some of this can get fairly technical, so it may help to get some background first. You can get enough of the required background information from this post. After reading that, come back here and follow along. Application Design Of all the various types of ATG applications out there, the most complex tend to be the ones in the telecommunications industry - especially the ones which operate in multiple countries. To get started, let's assume that we are talking about an application like that. One that has these properties: Operates in multiple countries - must support multiple sites, catalogs, languages, and currencies The organization is fairly loosely-coupled - single brand, but different businesses across different countries There is some common functionality across all sites in all countries There is some common functionality across different sites within the same country Sites within a single country may have some unique functionality - relative to other sites in the same country Complex product catalog (mostly in terms of bundles, eligibility, and compatibility) At this point, I'll assume you have read through the required reading and have a decent understanding of how ATG modules work... Code / configuration - assemble into modules When it comes to defining your modules for a complex application, there are a number of goals: Divide functionality between the modules in a way that maps to your business Group common functionality 'further down in the stack of modules' Provide a good balance between shared resources and autonomy for countries / sites Now I'll describe a high level approach to how you could accomplish those goals... Let's start from the bottom and work our way up. At the very bottom, you have the modules that ship with ATG - the 'out of the box' stuff. You want to make sure that you are leveraging all the modules that make sense in order to get the most value from ATG as possible - and less stuff you'll have to write yourself. On top of the ATG modules, you should create what we'll refer to as the Corporate Foundation Module described as follows: Sits directly on top of ATG modules Used by all applications across all countries and sites - this is the foundation for everyone Contains everything that is common across all countries / all sites Once established and settled, will change less frequently than other 'higher' modules Encapsulates as many enterprise-wide integrations as possible Will provide means of code sharing therefore less development / testing - faster time to market Contains a 'reference' web application (described below) The next layer up could be multiple modules for each country (you could replace this with region if that makes more sense). We'll define those modules as follows: Sits on top of the corporate foundation module Contains what is unique to all sites in a given country Responsible for managing any resource bundles for this country (to handle multiple languages) Overrides / replaces corporate integration points with any country-specific ones Finally, we will define what should be a fairly 'thin' (in terms of functionality) set of modules for each site as follows: Sits on top of the country it resides in module Contains what is unique for a given site within a given country Will mostly contain configuration, but could also define some unique functionality as well Contains one or more web applications The graphic below should help to indicate how these modules fit together: Web applications As described in the previous section, there are many opportunities for sharing (minimizing costs) as it relates to the code and configuration aspects of ATG modules. Web applications are also contained within ATG modules, however, sharing web applications can be a bit more difficult because this is what the end customer actually sees, and since each site may have some degree of unique look & feel, sharing becomes more challenging. One approach that can help is to define a 'reference' web application at the corporate foundation layer to act as a solid starting point for each site. Here's a description of the 'reference' web application: Contains minimal / sample reference styling as this will mostly be addressed at the site level web app Focus on functionality - ensure that core functionality is revealed via this web application Each individual site can use this as a starting point There may be multiple types of web apps (i.e. B2C, B2B, etc) There are some techniques to share web application assets - i.e. multiple web applications, defined in the web.xml, and it's worth investigating, but is out of scope here. Reference infrastructure In this complex environment, it is assumed that there is not a single infrastructure for all countries and all sites. It's more likely that different countries (or regions) could have their own solution for infrastructure. In this case, it will be advantageous to define a reference infrastructure which contains all the hardware and software that make up the core environment. Specifications and diagrams should be created to outline what this reference infrastructure looks like, as well as it's baseline cost and the incremental cost to scale up with volume. Having some consistency in terms of infrastructure will save time and money as new countries / sites come online. Here are some properties of the reference infrastructure: Standardized approach to setup of hardware Type and number of servers Defines application server, operating system, database, etc... - including vendor and specific versions Consistent naming conventions Provides a consistent base of terminology and understanding across environments Defines which ATG services run on which servers Production Staging BCC / Preview Each site can change as required to meet scale requirements Governance / organization It should be no surprise that the complex application we're talking about is backed by an equally complex organization. One of the more challenging aspects of efficiently managing a series of complex applications is to ensure the proper level of governance and organization. Here are some ideas and goals to work towards: Establish a committee to make enterprise-wide decisions that affect all sites Representation should be evenly distributed Should have a clear communication procedure Focus on high level business goals Evaluation of feature / function gaps and how that relates to ATG release schedule / roadmap Determine when to upgrade & ensure value will be realized Determine how to manage various levels of modules Who is responsible for maintaining corporate / country / site layers Determine a procedure for controlling what goes in the corporate foundation module Standardize on source code control, database, hardware, OS versions, J2EE app servers, development procedures, etc only use tested / proven versions - this is something that should be centralized so that every country / site does not have to worry about compatibility between versions Create a innovation team Quickly develop new features, perform proof of concepts All teams can benefit from their findings Summary At this point, it should be clear why the topics above (design, governance, organization, etc) are critical to being able to efficiently manage a complex application. To summarize, it's all about competitive advantage... You will need to reduce costs and improve time to market with the goal of providing a better experience for your end customers. You can reduce cost by reducing development time, time allocated to testing (don't have to test the corporate foundation module over and over again - do it once), and optimizing operations. With an efficient design, you can improve your time to market and your business will be more flexible and agile. Over time, you'll find that you're becoming more focused on offering functionality that is new to the market (creativity) and this will be rewarded - you're now a leader. In addition to the above, you'll realize soft benefits as well. Your staff will be operating in a culture based on sharing. You'll want to reward efforts to improve and enhance the foundation as this will benefit everyone. This culture will inspire innovation, which can only lend itself to your competitive advantage.

Read the article

Master Data

- by david.butler(at)oracle.com

Let's take a deeper look at what we mean when we talk about 'Master' data. In its most general sense, master data is data that exists in more than one operational application. These are the applications that automate business processes. These applications require significant amounts of data to function correctly. This includes data about the objects that are involved in transactions, as well as the transaction data itself. For example, when a customer buys a product, the transaction is managed by a sales application. The objects of the transaction are the Customer and the Product. The transactional data is the time, place, price, discount, payment methods, etc. used at the point of sale. Many thousands of transactional data attributes are needed within the application. These important data elements are local to the applications and have no bearing on other applications. Harmonization and synchronization across applications is not necessary. The Customer and Product objects of the transaction also have a large number of attributes. Customer for example, includes hierarchies, hierarchical and matrixed relationships, contacts, classifications, preferences, accounts, identifiers, profiles, and addresses galore for 'ship to', 'mail to'; 'service at'; etc. Dozens of attributes exist for individuals, hundreds for organizations, and thousands for products. This data has meaning beyond any particular application. It exists in many applications and drives the vital cross application enterprise business processes. These are the processes that define and differentiate the organization. At every decision point, information about the objects of the process determines the direction of the process flow. This is the nature of the data that exists in more than one application, and this is why we call it 'master data'. Let me elaborate. Parties Oracle has developed a party schema to model all participants in your daily business operations. It models people, organizations, groups, customers, contacts, employees, and suppliers. It models their accounts, locations, classifications, and preferences. And most importantly, it models the vast array of hierarchical and matrixed relationships that exist between all the participants in your real world operations. The model logically separates people and organizations from their relationships and accounts. This separation creates flexibility unmatched in the industry and accounts for the fact that the Oracle schema for Customers, Suppliers, and Accounts is a true superset of the wide variety of commercial and homegrown customer models in existence. Sites Sites are places where business is conducted. They can be addresses, clusters such as retail malls, locations within a cluster, floors within a building, places where meters are located, rooms on floors, etc. Fully understanding all attributes of a site is key to many business processes. Attributes such as 'noise abatement policy' at a point of delivery, or the size of an oven in a business kitchen drive day-to-day activities such as delivery schedules or food promotions. Typically this kind of data is siloed in departments and scattered across applications and spreadsheets. This leads to conflicting information and poor operational efficiencies. Oracle's Global Single Schema can hold all site attributes in one place and enables a single version of authoritative site information across the enterprise. Products and Services The Oracle Global Single Schema also includes a number of entities that define the products and services a company creates and offers for sale. Key entities include Items organized into Catalogs and Price Lists. The Catalog structures provide for the ability to capture different views of a product such as engineering, manufacturing, and service which are based on a unified product model. As a result, designers, manufacturing engineers, purchasers and partners can work simultaneously on a common product definition. The Catalog schema allows for unlimited attributes, combines them into meaningful groups, and maps them to catalog categories to track these different types of information. The model also maps an unlimited number of functional structures for each item. For example, multiple Bills of Material (BOMs) can be constructed representing requirements BOM, features BOM, and packaging BOM for an item. The Catalog model also supports hierarchical information about each item and all standard Global Data Synchronization attributes. Business Processes Utilizing Linked Data Entities Each business entity codified into a centralized master data environment significantly improves the efficiency of the automated business processes that use the consolidated data. When all the key business entities used by an organization's process are so consolidated, the advantages are multiplied. The primary reason for business process breakdowns (i.e. data errors across application boundaries) is eliminated. All processes are positively impacted and business process automation is itself automated. I like to use the "Call to Resolution" business process as an example to help illustrate this important point. It involves call center applications, service applications, RMA applications, transportation applications, inventory applications, etc. Customer, Site, Product and Supplier master data must all be correct and consistent across these applications. What's more, the data relationships between customer and product, and product and suppliers must be right. This is the minimum quality needed to insure the business process flows without error. But that is not the end of the story. Critical master data attributes such as customer loyalty, profitability, credit worthiness, and propensity to buy can optimize the call center point of contact component of the process. Critical product information such as alternative parts or equivalent products can optimize the resolution selected by the process. A comprehensive understanding of the 'service at' location can help insure multiple trips are avoided in the process. Full supplier information on reliability, delivery delays, and potential alternates can prevent supplier exceptions and play a significant role in optimizing the process. In other words, these master data attributes enable the optimization of the "Call to Resolution" enterprise business process. Master data supports and guides business process flows. Thus the phrase 'Master Data' is indeed appropriate. MDM is the software that houses, manages, and governs the master data that resides in all applications and controls the enterprise business processes. A complete master data solution takes a data model that holds fully attributed master data entities and their inter-relationships. Oracle has this model. Oracle, with its deep understanding of application data is the logical choice for managing all your master data within the enterprise whether or not your organization actually runs any Oracle Applications.

Read the article

The Great Divorce

- by BlackRabbitCoder

I have a confession to make: I've been in an abusive relationship for more than 17 years now. Yes, I am not ashamed to admit it, but I'm finally doing something about it. I met her in college, she was new and sexy and amazingly fast -- and I'd never met anything like her before. Her style and her power captivated me and I couldn't wait to learn more about her. I took a chance on her, and though I learned a lot from her -- and will always be grateful for my time with her -- I think it's time to move on. Her name was C++, and she so outshone my previous love, C, that any thoughts of going back evaporated in the heat of this new romance. She promised me she'd be gentle and not hurt me the way C did. She promised me she'd clean-up after herself better than C did. She promised me she'd be less enigmatic and easier to keep happy than C was. But I was deceived. Oh sure, as far as truth goes, it wasn't a complete lie. To some extent she was more fun, more powerful, safer, and easier to maintain. But it just wasn't good enough -- or at least it's not good enough now. I loved C++, some part of me still does, it's my first-love of programming languages and I recognize its raw power, its blazing speed, and its improvements over its predecessor. But with today's hardware, at speeds we could only dream to conceive of twenty years ago, that need for speed -- at the cost of all else -- has died, and that has left my feelings for C++ moribund. If I ever need to write an operating system or a device driver, then I might need that speed. But 99% of the time I don't. I'm a business-type programmer and chances are 90% of you are too, and even the ones who need speed at all costs may be surprised by how much you sacrifice for that. That's not to say that I don't want my software to perform, and it's not to say that in the business world we don't care about speed or that our job is somehow less difficult or technical. There's many times we write programs to handle millions of real-time updates or handle thousands of financial transactions or tracking trading algorithms where every second counts. But if I choose to write my code in C++ purely for speed chances are I'll never notice the speed increase -- and equally true chances are it will be far more prone to crash and far less easy to maintain. Nearly without fail, it's the macro-optimizations you need, not the micro-optimizations. If I choose to write a O(n2) algorithm when I could have used a O(n) algorithm -- that can kill me. If I choose to go to the database to load a piece of unchanging data every time instead of caching it on first load -- that too can kill me. And if I cross the network multiple times for pieces of data instead of getting it all at once -- yes that can also kill me. But choosing an overly powerful and dangerous mid-level language to squeeze out every last drop of performance will realistically not make stock orders process any faster, and more likely than not open up the system to more risk of crashes and resource leaks. And that's when my love for C++ began to die. When I noticed that I didn't need that speed anymore. That that speed was really kind of a lie. Sure, I can be super efficient and pack bits in a byte instead of using separate boolean values. Sure, I can use an unsigned char instead of an int. But in the grand scheme of things it doesn't matter as much as you think it does. The key is maintainability, and that's where C++ failed me. I like to tell the other developers I work with that there's two levels of correctness in coding: Is it immediately correct? Will it stay correct? That is, you can hack together any piece of code and make it correct to satisfy a task at hand, but if a new developer can't come in tomorrow and make a fairly significant change to it without jeopardizing that correctness, it won't stay correct. Some people laugh at me when I say I now prefer maintainability over speed. But that is exactly the point. If you focus solely on speed you tend to produce code that is much harder to maintain over the long hall, and that's a load of technical debt most shops can't afford to carry and end up completely scrapping code before it's time. When good code is written well for maintainability, though, it can be correct both now and in the future. And you know the best part is? My new love is nearly as fast as C++, and in some cases even faster -- and better than that, I know C# will treat me right. Her creators have poured hundreds of thousands of hours of time into making her the sexy beast she is today. They made her easy to understand and not an enigmatic mess. They made her consistent and not moody and amorphous. And they made her perform as fast as I care to go by optimizing her both at compile time and a run-time. Her code is so elegant and easy on the eyes that I'm not worried where she will run to or what she'll pull behind my back. She is powerful enough to handle all my tasks, fast enough to execute them with blazing speed, maintainable enough so that I can rely on even fairly new peers to modify my work, and rich enough to allow me to satisfy any need. C# doesn't ask me to clean up her messes! She cleans up after herself and she tries to make my life easier for me by taking on most of those optimization tasks C++ asked me to take upon myself. Now, there are many of you who would say that I am the cause of my own grief, that it was my fault C++ didn't behave because I didn't pay enough attention to her. That I alone caused the pain she inflicted on me. And to some extent, you have a point. But she was so high maintenance, requiring me to know every twist and turn of her vast and unrestrained power that any wrong term or bout of forgetfulness was met with painful reminders that she wasn't going to watch my back when I made a mistake. But C#, she loves me when I'm good, and she loves me when I'm bad, and together we make beautiful code that is both fast and safe. So that's why I'm leaving C++ behind. She says she's changing for me, but I have no interest in what C++0x may bring. Oh, I'll still keep in touch, and maybe I'll see her now and again when she brings her problems to my door and asks for some attention -- for I always have a soft spot for her, you see. But she's out of my house now. I have three kids and a dog and a cat, and all require me to clean up after them, why should I have to clean up after my programming language as well?

Read the article

Rapid Evolution of Society & Technology

- by Michael Snow

We caught up with Brian Solis on the phone the other day and Christie Flanagan had a chance to chat with him and learn a bit more about him and some of the concepts he'll be addressing in our Social Business Thought Leaders Webcast on Thursday 12/13/12. «--- Interview with Brian Solis Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Calibri","sans-serif"; mso-ascii- mso-ascii-theme-font:minor-latin; mso-fareast- mso-fareast-theme-font:minor-latin; mso-hansi- mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Be sure and register for this week's webcast ---» ------------------- Guest post by Brian Solis. Reposted (Borrowed) from his posting of May 24, 2012 Dear [insert business name], what’s your promise? - Brian Solis You say you want to get closer to customers, but your actions are different than your words. You say you want to “surprise and delight” customers, but your product development teams are too busy building against a roadmap without consideration of the 5th P of marketing…people. Your employees are your number one asset, however the infrastructure of the organization has turned once optimistic and ambitious intrapreneurs into complacent cogs or worse, your greatest detractors. You question the adoption of disruptive technology by your internal champions yet you’ve not tried to find the value for yourself. You’re a change agent and you truly wish to bring about change, but you’ve not invested time or resources to answer “why” in your endeavors to become a connected or social business. If we are to truly change, we must find purpose. We must uncover the essence of our business and the value it delivers to traditional and connected consumers. We must rethink the spirit of today’s embrace and clearly articulate how transformation is going to improve customer and employee experiences and relationships now and over time. Without doing so, any attempts at evolution will be thwarted by reality. In an era of Digital Darwinism, no business is too big to fail or too small to succeed. These are undisciplined times which require alternative approaches to recognize and pursue new opportunities. But everything begins with acknowledging the 360 view of the world that you see today is actually a filtered view of managed and efficient convenience. Today, many organizations that were once inspired by innovation and engagement have fallen into a process of marketing, operationalizing, managing, and optimizing. That might have worked for the better part of the last century, but for the next 10 years and beyond, new vision, leadership and supporting business models will be written to move businesses from rigid frameworks to adaptive and agile entities. I believe that today’s executives will undergo a great test; a test of character, vision, intention, and universal leadership. It starts with a simple, but essential question…what is your promise? Notice, I didn’t ask about your brand promise. Nor did I ask for you to cite your mission and vision statements. This is much more than value propositions or manufactured marketing language designed to hook audiences and stakeholders. I asked for your promise to me as your consumer, stakeholder, and partner. This isn’t about B2B or B2C, but instead, people to people, person to person. It is this promise that will breathe new life into an organization that on the outside, could be misdiagnosed as catatonic by those who are disrupting your markets. A promise, for example, is meant to inspire. It creates alignment. It serves as the foundation for your vision, mission, and all business strategies and it must come from the top to mean anything. For without it, we cannot genuinely voice what it is we stand for or stand behind. Think for a moment about the definition of community. It’s easy to confuse a workplace or a market where everyone simply shares common characteristics. However, a community in this day and age is much more than belonging to something, it’s about doing something together that makes belonging matter The next few years will force a divide where companies are separated by intention as measured by actions and words. But, becoming a social business is not enough. Becoming more authentic and transparent doesn’t serve as a mantra for a renaissance. A promise is the ink that inscribes the spirit of the relationship between you and me. A promise serves as the words that influence change from within and change beyond the halls of our business. It is the foundation for a renewed embrace, one that must then find its way to every aspect of the organization. It’s the difference between a social business and an adaptive business. While an adaptive business can also be social, it is the culture of the organization that strives to not just use technology to extend current philosophies or processes into new domains, but instead give rise to a new culture where striving for relevance is among its goals. The tools and networks simply become enablers of a greater mission You are reading this because you believe in something more than what you’re doing today. While you fight for change within your organization, remember to aim for a higher purpose. Organizations that strive for innovation, imagination, and relevance will outperform those that do not. Part of your job is to lead a missionary push that unites the groundswell with a top down cascade. Change will only happen because you and other internal champions see what others can’t and will do what other won’t. It takes resolve. It takes the ability to translate new opportunities into business value. And, it takes courage. “This is a very noisy world, so we have to be very clear what we want them to know about us”-Steve Jobs ----------------------------------------------------------------- So -- where do you begin to evaluate the kind of experience you are delivering for your customers, partners, and employees? Take a look at this White Paper: Creating a Successful and Meaningful Customer Experience on the Web and then have a cup of coffee while you listen to the sage advice of Guy Kawasaki in a short video below. An interview with Guy Kawasaki on Maximizing Social Media Channels

Read the article

ANTS CLR and Memory Profiler In Depth Review (Part 2 of 2 – Memory Profiler)

- by ToStringTheory

One of the things that people might not know about me, is my obsession to make my code as efficient as possible. Many people might not realize how much of a task or undertaking that this might be, but it is surely a task as monumental as climbing Mount Everest, except this time it is a challenge for the mind… In trying to make code efficient, there are many different factors that play a part – size of project or solution, tiers, language used, experience and training of the programmer, technologies used, maintainability of the code – the list can go on for quite some time. I spend quite a bit of time when developing trying to determine what is the best way to implement a feature to accomplish the efficiency that I look to achieve. One program that I have recently come to learn about – Red Gate ANTS Performance (CLR) and Memory profiler gives me tools to accomplish that job more efficiently as well. In this review, I am going to cover some of the features of the ANTS memory profiler set by compiling some hideous example code to test against. Notice As a member of the Geeks With Blogs Influencers program, one of the perks is the ability to review products, in exchange for a free license to the program. I have not let this affect my opinions of the product in any way, and Red Gate nor Geeks With Blogs has tried to influence my opinion regarding this product in any way. Introduction – Part 2 In my last post, I reviewed the feature packed Red Gate ANTS Performance Profiler. Separate from the Red Gate Performance Profiler is the Red Gate ANTS Memory Profiler – a simple, easy to use utility for checking how your application is handling memory management… A tool that I wish I had had many times in the past. This post will be focusing on the ANTS Memory Profiler and its tool set. The memory profiler has a large assortment of features just like the Performance Profiler, with the new session looking nearly exactly alike: ANTS Memory Profiler Memory profiling is not something that I have to do very often… In the past, the few cases I’ve had to find a memory leak in an application I have usually just had to trace the code of the operations being performed to look for oddities… Sadly, I have come across more undisposed/non-using’ed IDisposable objects, usually from ADO.Net than I would like to ever see. Support is not fun, however using ANTS Memory Profiler makes this task easier. For this round of testing, I am going to use the same code from my previous example, using the WPF application. This time, I will choose the ‘Profile Memory’ option from the ANTS menu in Visual Studio, which launches the solution in its currently configured state/start-up project, and then launches the ANTS Memory Profiler to help. It prepopulates all of the fields with the current project information, and all I have to do is select the ‘Start Profiling’ option. When the window comes up, it is actually quite barren, just giving ideas on how to work the profiler. You start by getting to the point in your application that you want to profile, and then taking a ‘Memory Snapshot’. This performs a full garbage collection, and snapshots the managed heap. Using the same WPF app as before, I will go ahead and take a snapshot now. As you can see, ANTS is already giving me lots of information regarding the snapshot, however this is just a snapshot. The whole point of the profiler is to perform an action, usually one where a memory problem is being noticed, and then take another snapshot and perform a diff between them to see what has changed. I am going to go ahead and generate 5000 primes, and then take another snapshot: As you can see, ANTS is already giving me a lot of new information about this snapshot compared to the last. Information such as difference in memory usage, fragmentation, class usage, etc… If you take more snapshots, you can use the dropdown at the top to set your actual comparison snapshots. If you beneath the timeline, you will see a breadcrumb trail showing how best to approach profiling memory using ANTS. When you first do the comparison, you start on the Summary screen. You can either use the charts at the bottom, or switch to the class list screen to get to the next step. Here is the class list screen: As you can see, it lists information about all of the instances between the snapshots, as well as at the bottom giving you a way to filter by telling ANTS what your problem is. I am going to go ahead and select the Int16[] to look at the Instance Categorizer Using the instance categorizer, you can travel backwards to see where all of the instances are coming from. It may be hard to see in this image, but hopefully the lightbox (click on it) will help: I can see that all of these instances are rooted to the application through the UI TextBlock control. This image will probably be even harder to see, however using the ‘Instance Retention Graph’, you can trace an objects memory inheritance up the chain to see its roots as well. This is a simple example, as this is simply a known element. Usually you would be profiling an actual problem, and comparing those differences. I know in the past, I have spotted a problem where a new context was created per page load, and it was rooted into the application through an event. As the application began to grow, performance and reliability problems started to emerge. A tool like this would have been a great way to identify the problem quickly. Overview Overall, I think that the Red Gate ANTS Memory Profiler is a great utility for debugging those pesky leaks. 3 Biggest Pros: Easy to use interface with lots of options for configuring profiling session Intuitive and helpful interface for drilling down from summary, to instance, to root graphs ANTS provides an API for controlling the profiler. Not many options, but still helpful. 2 Biggest Cons: Inability to automatically snapshot the memory by interval Lack of complete integration with Visual Studio via an extension panel Ratings Ease of Use (9/10) – I really do believe that they have brought simplicity to the once difficult task of memory profiling. I especially liked how it stepped you further into the drilldown by directing you towards the best options. Effectiveness (10/10) – I believe that the profiler does EXACTLY what it purports to do. Features (7/10) – A really great set of features all around in the application, however, I would like to see some ability for automatically triggering snapshots based on intervals or framework level items such as events. Customer Service (10/10) – My entire experience with Red Gate personnel has been nothing but good. their people are friendly, helpful, and happy! UI / UX (9/10) – The interface is very easy to get around, and all of the options are easy to find. With a little bit of poking around, you’ll be optimizing Hello World in no time flat! Overall (9/10) – Overall, I am happy with the Memory Profiler and its features, as well as with the service I received when working with the Red Gate personnel. Thank you for reading up to here, or skipping ahead – I told you it would be shorter! Please, if you do try the product, drop me a message and let me know what you think! I would love to hear any opinions you may have on the product. Code Feel free to download the code I used above – download via DropBox

Read the article

What You Need to Know About Windows 8.1

- by Chris Hoffman

Windows 8.1 is available to everyone starting today, October 19. The latest version of Windows improves on Windows 8 in every way. It’s a big upgrade, whether you use the desktop or new touch-optimized interface. The latest version of Windows has been dubbed “an apology” by some — it’s definitely more at home on a desktop PC than Windows 8 was. However, it also offers a more fleshed out and mature tablet experience. How to Get Windows 8.1 For Windows 8 users, Windows 8.1 is completely free. It will be available as a download from the Windows Store — that’s the “Store” app in the Modern, tiled interface. Assuming upgrading to the final version will be just like upgrading to the preview version, you’ll likely see a “Get Windows 8.1″ pop-up that will take you to the Windows Store and guide you through the download process. You’ll also be able to download ISO images of Windows 8.1, so can perform a clean install to upgrade. On any new computer, you can just install Windows 8.1 without going through Windows 8. New computers will start to ship with Windows 8.1 and boxed copies of Windows 8 will be replaced by boxed copies of Windows 8.1. If you’re using Windows 7 or a previous version of Windows, the update won’t be free. Getting Windows 8.1 will cost you the same amount as a full copy of Windows 8 — $120 for the standard version. If you’re an average Windows 7 user, you’re likely better off waiting until you buy a new PC with Windows 8.1 included rather than spend this amount of money to upgrade. Improvements for Desktop Users Some have dubbed Windows 8.1 “an apology” from Microsoft, although you certainly won’t see Microsoft referring to it this way. Either way, Steven Sinofsky, who presided over Windows 8′s development, left the company shortly after Windows 8 was released. Coincidentally, Windows 8.1 contains many features that Steven Sinofsky and Microsoft refused to implement. Windows 8.1 offers the following big improvements for desktop users: Boot to Desktop: You can now log in directly to the desktop, skipping the tiled interface entirely. Disable Top-Left and Top-Right Hot Corners: The app switcher and charms bar won’t appear when you move your mouse to the top-left or top-right corners of the screen if you enable this option. No more intrusions into the desktop. The Start Button Returns: Windows 8.1 brings back an always-present Start button on the desktop taskbar, dramatically improving discoverability for new Windows 8 users and providing a bigger mouse target for remote desktops and virtual machines. Crucially, the Start menu isn’t back — clicking this button will open the full-screen Modern interface. Start menu replacements will continue to function on Windows 8.1, offering more traditional Start menus. Show All Apps By Default: Luckily, you can hide the Start screen and its tiles almost entirely. Windows 8.1 can be configured to show a full-screen list of all your installed apps when you click the Start button, with desktop apps prioritized. The only real difference is that the Start menu is now a full-screen interface. Shut Down or Restart From Start Button: You can now right-click the Start button to access Shut down, Restart, and other power options in just as many clicks as you could on Windows 7. Shared Start Screen and Desktop Backgrounds; Windows 8 limited you to just a few Steven Sinofsky-approved background images for your Start screen, but Windows 8.1 allows you to use your desktop background on the Start screen. This can make the transition between the Start screen and desktop much less jarring. The tiles or shortcuts appear to be floating above the desktop rather than off in their own separate universe. Unified Search: Unified search is back, so you can start typing and search your programs, settings, and files all at once — no more awkwardly clicking between different categories when trying to open a Control Panel screen or search for a file. These all add up to a big improvement when using Windows 8.1 on the desktop. Microsoft is being much more flexible — the Start menu is full screen, but Microsoft has relented on so many other things and you’d never have to see a tile if you didn’t want to. For more information, read our guide to optimizing Windows 8.1 for a desktop PC. These are just the improvements specifically for desktop users. Windows 8.1 includes other useful features for everyone, such as deep SkyDrive integration that allows you to store your files in the cloud without installing any additional sync programs. Improvements for Touch Users If you have a Windows 8 or Windows RT tablet or another touch-based device you use the interface formerly known as Metro on, you’ll see many other noticeable improvements. Windows 8′s new interface was half-baked when it launched, but it’s now much more capable and mature. App Updates: Windows 8′s included apps were extremely limited in many cases. For example, Internet Explorer 10 could only display ten tabs at a time and the Mail app was a barren experience devoid of features. In Windows 8.1, some apps — like Xbox Music — have been redesigned from scratch, Internet Explorer allows you to display a tab bar on-screen all the time, while apps like Mail have accumulated quite a few useful features. The Windows Store app has been entirely redesigned and is less awkward to browse. Snap Improvements: Windows 8′s Snap feature was a toy, allowing you to snap one app to a small sidebar at one side of your screen while another app consumed most of your screen. Windows 8.1 allows you to snap two apps side-by-side, seeing each app’s full interface at once. On larger displays, you can even snap three or four apps at once. Windows 8′s ability to use multiple apps at once on a tablet is compelling and unmatched by iPads and Android tablets. You can also snap two of the same apps side-by-side — to view two web pages at once, for example. More Comprehensive PC Settings: Windows 8.1 offers a more comprehensive PC settings app, allowing you to change most system settings in a touch-optimized interface. You shouldn’t have to use the desktop Control Panel on a tablet anymore — or at least not as often. Touch-Optimized File Browsing: Microsoft’s SkyDrive app allows you to browse files on your local PC, finally offering a built-in, touch-optimized way to manage files without using the desktop. Help & Tips: Windows 8.1 includes a Help+Tips app that will help guide new users through its new interface, something Microsoft stubbornly refused to add during development. There’s still no “Modern” version of Microsoft Office apps (aside from OneNote), so you’ll still have to head to use desktop Office apps on tablets. It’s not perfect, but the Modern interface doesn’t feel anywhere near as immature anymore. Read our in-depth look at the ways Microsoft’s Modern interface, formerly known as Metro, is improved in Windows 8.1 for more information. In summary, Windows 8.1 is what Windows 8 should have been. All of these improvements are on top of the many great desktop features, security improvements, and all-around battery life and performance optimizations that appeared in Windows 8. If you’re still using Windows 7 and are happy with it, there’s probably no reason to race out and buy a copy of Windows 8.1 at the rather high price of $120. But, if you’re using Windows 8, it’s a big upgrade no matter what you’re doing. If you buy a new PC and it comes with Windows 8.1, you’re getting a much more flexible and comfortable experience. If you’re holding off on buying a new computer because you don’t want Windows 8, give Windows 8.1 a try — yes, it’s different, but Microsoft has compromised on the desktop while making a lot of improvements to the new interface. You just might find that Windows 8.1 is now a worthwhile upgrade, even if you only want to use the desktop.

Read the article

Heaps of Trouble?

- by Paul White NZ

If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material. In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all. The problem is, predictably, that performance is not very good. The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts. In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found. Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here. The Example We’ll use data from the AdventureWorks database, copied to temporary unindexed tables. A script to create these structures is shown below: CREATE TABLE #Custs ( CustomerID INTEGER NOT NULL, TerritoryID INTEGER NULL, CustomerType NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #Prods ( ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, Name NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #OrdHeader ( SalesOrderID INTEGER NOT NULL, OrderDate DATETIME NOT NULL, SalesOrderNumber NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, CustomerID INTEGER NOT NULL, ); GO CREATE TABLE #OrdDetail ( SalesOrderID INTEGER NOT NULL, OrderQty SMALLINT NOT NULL, LineTotal NUMERIC(38,6) NOT NULL, ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, ); GO INSERT #Custs ( CustomerID, TerritoryID, CustomerType ) SELECT C.CustomerID, C.TerritoryID, C.CustomerType FROM AdventureWorks.Sales.Customer C WITH (TABLOCK); GO INSERT #Prods ( ProductMainID, ProductSubID, ProductSubSubID, Name ) SELECT P.ProductID, P.ProductID, P.ProductID, P.Name FROM AdventureWorks.Production.Product P WITH (TABLOCK); GO INSERT #OrdHeader ( SalesOrderID, OrderDate, SalesOrderNumber, CustomerID ) SELECT H.SalesOrderID, H.OrderDate, H.SalesOrderNumber, H.CustomerID FROM AdventureWorks.Sales.SalesOrderHeader H WITH (TABLOCK); GO INSERT #OrdDetail ( SalesOrderID, OrderQty, LineTotal, ProductMainID, ProductSubID, ProductSubSubID ) SELECT D.SalesOrderID, D.OrderQty, D.LineTotal, D.ProductID, D.ProductID, D.ProductID FROM AdventureWorks.Sales.SalesOrderDetail D WITH (TABLOCK); The query itself is a simple join of the four tables: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #OrdDetail D ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID JOIN #OrdHeader H ON D.SalesOrderID = H.SalesOrderID JOIN #Custs C ON H.CustomerID = C.CustomerID ORDER BY P.ProductMainID ASC OPTION (RECOMPILE, MAXDOP 1); Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings). The estimated query plan produced for the test query looks like this (click to enlarge): The Problem The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan. The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate. Every join in the plan shown above estimates that it will produce just a single row as output. Brad covers the factors that lead to the low estimates in his post. In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows. It should not surprise you that this has rather dire consequences for the remainder of the query plan. In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables. Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each. The query takes somewhere in the region of twenty minutes to run to completion on my development machine. A Solution At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere. We can force the exclusive use of hash joins in several ways, the two most common being join and query hints. A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the bottom of the query. The difference is that using join hints also forces the order of the join, whereas the query hint gives the optimizer freedom to reorder the joins at its discretion. Adding the OPTION (HASH JOIN) hint results in this estimated plan: That produces the correct output in around seven seconds, which is quite an improvement! As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there. (We can improve the hashing solution a bit – I’ll come back to that later on). Faster Nested Loops It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set. The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B). Armed with this tremendous new insight, we can rewrite the join predicates like so: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #OrdDetail D JOIN #OrdHeader H ON D.SalesOrderID >= H.SalesOrderID AND D.SalesOrderID <= H.SalesOrderID JOIN #Custs C ON H.CustomerID >= C.CustomerID AND H.CustomerID <= C.CustomerID JOIN #Prods P ON P.ProductMainID >= D.ProductMainID AND P.ProductMainID <= D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER); I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear. The new estimated execution plan is: This new query runs in under 2 seconds. Why Is It Faster? The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools. If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly. An eager index spool consumes all rows from the table it sits above, and builds a index suitable for the join to seek on. Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID. The term ‘eager’ means that the spool consumes all of its input rows when it starts up. The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing. The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index. From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table. A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table. The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself. The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation. Nested loops join preserves the order of rows on its outer input, so sorting early is safe. (Hash joins do not preserve order in this way, of course). The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input. It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other. The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation. It’s presence in a plan is often an indication that a useful index is missing. I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system. Improving the Hash Join I promised I would return to the solution that uses hash joins. You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins. The answer, again, is down to the poor information available to the optimizer. Let’s look at the hash join plan again: Two of the hash joins have single-row estimates on their build inputs. SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory. This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk. The join process then continues, and may again run out of memory. This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow. The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion. You can see this for yourself by tracing the Hash Warning event using the Profiler tool. The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this). Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan. Ok, so now we understand the problems, what can we do to fix it? We can address the hash spilling by forcing a different order for the joins: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #Custs C JOIN #OrdHeader H ON H.CustomerID = C.CustomerID JOIN #OrdDetail D ON D.SalesOrderID = H.SalesOrderID ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (MAXDOP 1, HASH JOIN, FORCE ORDER); With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs. The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk. Even so, the query runs to completion in three or four seconds. That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery. Final Thoughts SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information. We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics. I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world. It’s there primarily for its educational and entertainment value. I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index. Be sure to read Brad’s original post for more details. My grateful thanks to him for granting permission to reuse some of his material. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

Read the article

C#: String Concatenation vs Format vs StringBuilder

- by James Michael Hare

I was looking through my groups’ C# coding standards the other day and there were a couple of legacy items in there that caught my eye. They had been passed down from committee to committee so many times that no one even thought to second guess and try them for a long time. It’s yet another example of how micro-optimizations can often get the best of us and cause us to write code that is not as maintainable as it could be for the sake of squeezing an extra ounce of performance out of our software. So the two standards in question were these, in paraphrase: Prefer StringBuilder or string.Format() to string concatenation. Prefer string.Equals() with case-insensitive option to string.ToUpper().Equals(). Now some of you may already know what my results are going to show, as these items have been compared before on many blogs, but I think it’s always worth repeating and trying these yourself. So let’s dig in. The first test was a pretty standard one. When concattenating strings, what is the best choice: StringBuilder, string concattenation, or string.Format()? So before we being I read in a number of iterations from the console and a length of each string to generate. Then I generate that many random strings of the given length and an array to hold the results. Why am I so keen to keep the results? Because I want to be able to snapshot the memory and don’t want garbage collection to collect the strings, hence the array to keep hold of them. I also didn’t want the random strings to be part of the allocation, so I pre-allocate them and the array up front before the snapshot. So in the code snippets below: num – Number of iterations. strings – Array of randomly generated strings. results – Array to hold the results of the concatenation tests. timer – A System.Diagnostics.Stopwatch() instance to time code execution. start – Beginning memory size. stop – Ending memory size. after – Memory size after final GC. So first, let’s look at the concatenation loop: 1: // build num strings using concattenation. 2: for (int i = 0; i < num; i++) 3: { 4: results[i] = "This is test #" + i + " with a result of " + strings[i]; 5: } Pretty standard, right? Next for string.Format(): 1: // build strings using string.Format() 2: for (int i = 0; i < num; i++) 3: { 4: results[i] = string.Format("This is test #{0} with a result of {1}", i, strings[i]); 5: } Finally, StringBuilder: 1: // build strings using StringBuilder 2: for (int i = 0; i < num; i++) 3: { 4: var builder = new StringBuilder(); 5: builder.Append("This is test #"); 6: builder.Append(i); 7: builder.Append(" with a result of "); 8: builder.Append(strings[i]); 9: results[i] = builder.ToString(); 10: } So I take each of these loops, and time them by using a block like this: 1: // get the total amount of memory used, true tells it to run GC first. 2: start = System.GC.GetTotalMemory(true); 3: 4: // restart the timer 5: timer.Reset(); 6: timer.Start(); 7: 8: // *** code to time and measure goes here. *** 9: 10: // get the current amount of memory, stop the timer, then get memory after GC. 11: stop = System.GC.GetTotalMemory(false); 12: timer.Stop(); 13: other = System.GC.GetTotalMemory(true); So let’s look at what happens when I run each of these blocks through the timer and memory check at 500,000 iterations: 1: Operator + - Time: 547, Memory: 56104540/55595960 - 500000 2: string.Format() - Time: 749, Memory: 57295812/55595960 - 500000 3: StringBuilder - Time: 608, Memory: 55312888/55595960 – 500000 Egad! string.Format brings up the rear and + triumphs, well, at least in terms of speed. The concat burns more memory than StringBuilder but less than string.Format(). This shows two main things: StringBuilder is not always the panacea many think it is. The difference between any of the three is miniscule! The second point is extremely important! You will often here people who will grasp at results and say, “look, operator + is 10% faster than StringBuilder so always use StringBuilder.” Statements like this are a disservice and often misleading. For example, if I had a good guess at what the size of the string would be, I could have preallocated my StringBuffer like so: 1: for (int i = 0; i < num; i++) 2: { 3: // pre-declare StringBuilder to have 100 char buffer. 4: var builder = new StringBuilder(100); 5: builder.Append("This is test #"); 6: builder.Append(i); 7: builder.Append(" with a result of "); 8: builder.Append(strings[i]); 9: results[i] = builder.ToString(); 10: } Now let’s look at the times: 1: Operator + - Time: 551, Memory: 56104412/55595960 - 500000 2: string.Format() - Time: 753, Memory: 57296484/55595960 - 500000 3: StringBuilder - Time: 525, Memory: 59779156/55595960 - 500000 Whoa! All of the sudden StringBuilder is back on top again! But notice, it takes more memory now. This makes perfect sense if you examine the IL behind the scenes. Whenever you do a string concat (+) in your code, it examines the lengths of the arguments and creates a StringBuilder behind the scenes of the appropriate size for you. But even IF we know the approximate size of our StringBuilder, look how much less readable it is! That’s why I feel you should always take into account both readability and performance. After all, consider all these timings are over 500,000 iterations. That’s at best 0.0004 ms difference per call which is neglidgable at best. The key is to pick the best tool for the job. What do I mean? Consider these awesome words of wisdom: Concatenate (+) is best at concatenating. StringBuilder is best when you need to building. Format is best at formatting. Totally Earth-shattering, right! But if you consider it carefully, it actually has a lot of beauty in it’s simplicity. Remember, there is no magic bullet. If one of these always beat the others we’d only have one and not three choices. The fact is, the concattenation operator (+) has been optimized for speed and looks the cleanest for joining together a known set of strings in the simplest manner possible. StringBuilder, on the other hand, excels when you need to build a string of inderterminant length. Use it in those times when you are looping till you hit a stop condition and building a result and it won’t steer you wrong. String.Format seems to be the looser from the stats, but consider which of these is more readable. Yes, ignore the fact that you could do this with ToString() on a DateTime. 1: // build a date via concatenation 2: var date1 = (month < 10 ? string.Empty : "0") + month + '/' 3: + (day < 10 ? string.Empty : "0") + '/' + year; 4: 5: // build a date via string builder 6: var builder = new StringBuilder(10); 7: if (month < 10) builder.Append('0'); 8: builder.Append(month); 9: builder.Append('/'); 10: if (day < 10) builder.Append('0'); 11: builder.Append(day); 12: builder.Append('/'); 13: builder.Append(year); 14: var date2 = builder.ToString(); 15: 16: // build a date via string.Format 17: var date3 = string.Format("{0:00}/{1:00}/{2:0000}", month, day, year); 18: So the strength in string.Format is that it makes constructing a formatted string easy to read. Yes, it’s slower, but look at how much more elegant it is to do zero-padding and anything else string.Format does. So my lesson is, don’t look for the silver bullet! Choose the best tool. Micro-optimization almost always bites you in the end because you’re sacrificing readability for performance, which is almost exactly the wrong choice 90% of the time. I love the rules of optimization. They’ve been stated before in many forms, but here’s how I always remember them: For Beginners: Do not optimize. For Experts: Do not optimize yet. It’s so true. Most of the time on today’s modern hardware, a micro-second optimization at the sake of readability will net you nothing because it won’t be your bottleneck. Code for readability, choose the best tool for the job which will usually be the most readable and maintainable as well. Then, and only then, if you need that extra performance boost after profiling your code and exhausting all other options… then you can start to think about optimizing.

Read the article

Dotfuscator Deep Dive with WP7

- by Bil Simser

I thought I would share some experience with code obfuscation (specifically the Dotfuscator product) and Windows Phone 7 apps. These days twitter is a buzz with black hat and white operations coming out about how the marketplace is insecure and Microsoft failed, blah, blah, blah. So it’s that much more important to protect your intellectual property. You should protect it no matter what when releasing apps into the wild but more so when someone is paying for them. You want to protect the time and effort that went into your code and have some comfort that the casual hacker isn’t going to usurp your next best thing. Enter code obfuscation. Code obfuscation is one tool that can help protect your IP. Basically it goes into your compiled assemblies, rewrites things at an IL level (like renaming methods and classes and hiding logic flow) and rewrites it back so that the assembly or executable is still fully functional but prying eyes using a tool like ILDASM or Reflector can’t see what’s going on. You can read more about code obfuscation here on Wikipedia. A word to the wise. Code obfuscation isn’t 100% secure. More so on the WP7 platform where the OS expects certain things to be as they were meant to be. So don’t expect 100% obfuscation of every class and every method and every property. It’s just not going to happen. What this does do is give you some level of protection but don’t put all your eggs in one basket and call it done. Like I said, this is just one step in the process. There are a few tools out there that provide code obfuscation and support the Windows Phone 7 platform (see links to other tools at the end of this post). One such tool is Dotfuscator from PreEmptive solutions. The thing about Dotfuscator is that they’ve struck a deal with Microsoft to provide a *free* copy of their commercial product for Windows Phone 7. The only drawback is that it only runs until March 31, 2010. However it’s a good place to start and the focus of this article. Getting Started When you fire up Dotfuscator you’re presented with a dialog to start a new project or load a previous one. We’ll start with a new project. You’re then looking at a somewhat blank screen that shows an Input tab (among others) and you’re probably wondering what to do? Click on the folder icon (first one) and browse to where your xap file is. At this point you can save the project and click on the arrow to start the process. Bam! You’re done. Right? Think again. The program did indeed run and create a new version of your xap (doing it’s thing and rewriting back your *obfuscated* assemblies) but let’s take a look at the assembly in Reflector to see the end result. Remember a xap file is really just a glorified zip file (or cab file if you prefer). When you ran Dotfuscator for the first time with the default settings you’ll see it created a new version of your xap in a folder under “My Documents” called “Dotfuscated” (you can configure the output directory in settings). Here’s the new xap file. Since a xap is just a zip, rename it to .cab or .zip or something and open it with your favorite unarchive program (I use WinRar but it doesn’t matter as long as it can unzip files). If you already have the xap file associated with your unarchive tool the rename isn’t needed. Once renamed extract the contents of the xap to your hard drive: Now you’ll have a folder with the contents of the xap file extracted: Double click or load up your assembly (WindowsPhoneDataBoundApplication1.dll in the example) in Reflector and let’s see the results: Hmm. That doesn’t look right. I can see all the methods and the code is all there for my LoadData method I wanted to protect. Product failure. Let’s return it for a refund. Hold your horses. We need to check out the settings in the program first. Remember when we loaded up our xap file. It started us on the Input tab but there was a settings tab before that. Wonder what it does? Here’s the default settings: Renaming Taking a closer look, all of the settings in Feature are disabled. WTF? Yeah, it leaves me scratching my head why an obfuscator by default doesn’t obfuscate. However it’s a simple fix to change these settings. Let’s enable Renaming as it sounds like a good start. Renaming obscures code by renaming methods and fields to names that are not understandable. Great. Run the tool again and go through the process of unzipping the updated xap and let’s take a look in Reflector again at our project. This looks a lot better. Lots of methods named a, b, c, d, etc. That’ll help slow hackers down a bit. What about our logic that we spent days weeks on? Let’s take a look at the LoadData method: What gives? We have renaming enabled but all of our code is still there. If you look through all your methods you’ll find it’s still sitting there out in the open. Control Flow Back to the settings page again. Let’s enable Control Flow now. Control Flow obfuscation synthesizes branching, conditional, and iterative constructs (such as if, for, and while) that produce valid executable logic, but yield non-deterministic semantic results when decompilation is attempted. In other words, the code runs as before, but decompilers cannot reproduce the original code. Do the dance again and let’s see the results in Reflector. Ahh, that’s better. Methods renamed *and* nobody can look at our LoadData method now. Life is good. More than Minimum This is the bare minimum to obfuscate your xap to at least a somewhat comfortable level. However I did find that while this worked in my Hello World demo, it didn’t work on one of my real world apps. I had to do some extra tweaking with that. Below are the screens that I used on one app that worked. I’m not sure what it was about the app that the approach above didn’t work with (maybe the extra assembly?) but it works and I’m happy with it. YMMV. Remember to test your obfuscated app on your device first before submitting to ensure you haven’t obfuscated the obfuscator. settings tab: rename tab: string encryption tab: premark tab: A few final notes Play with the settings and keep bumping up the bar to try to get as much obfuscation as you can. The more the better but remember you can overdo it. Always (always, always, always) deploy your obfuscated xap to your device and test it before submitting to the marketplace. I didn’t and got rejected because I had gone overboard with the obfuscation so the app wouldn’t launch at all. Not everything is going to be obfuscated. Specifically I don’t see a way to obfuscate auto properties and a few other language features. Again, if you crank the settings up you might hide these but I haven’t spent a lot of time optimizing the process. Some people might say to obfuscate your xaml using string encryption but again, test, test, test. Xaml is picky so too much obfuscation (or any) might disable your app or produce odd rendering effets. Remember, obfuscation is not 100% secure! Don’t rely on it as a sole way of protecting your assets. Other Tools Dotfuscator is one just product and isn’t the end-all be-all to obfuscation so check out others below. For example, Crypto can make it so Reflector doesn’t even recognize the app as a .NET one and won’t open it. Others can encrypt resources and Xaml markup files. Here are some other obfuscators that support the Windows Phone 7 platform. Feel free to give them a try and let people know your experience with them! Dotfuscator Windows Phone Edition Crypto Obfuscator for .NET DeepSea Obfuscation

Search Results

Search found 671 results on 27 pages for 'optimizing'.

Page 26/27 | < Previous Page | 22 23 24 25 26 27 | Next Page >

- by Sagekilla

- by Joey Adams

- by Longpoke

- by Joe Majewski

- by CFP

- by Juan Pablo Califano

- by Jenkz

- by James P.

- by ferronrsmith

- by JRameau

- by Piet

- by Pinal Dave

- by anil.kasalanati

- by Mat Keep

- by James Michael Hare

- by Bertrand Le Roy

- by Glen Borkowski

- by david.butler(at)oracle.com

- by BlackRabbitCoder

- by Michael Snow

- by ToStringTheory

- by Chris Hoffman

- by Paul White NZ

- by James Michael Hare

- by Bil Simser

< Previous Page | 22 23 24 25 26 27 | Next Page >