rate limiting - Page 40

Writing a spell checker similar to "did you mean"

- by user888734

I'm hoping to write a spellchecker for search queries in a web application - not unlike Google's "Did you mean?" The algorithm will be loosely based on this: http://catalog.ldc.upenn.edu/LDC2006T13 In short, it generates correction candidates and scores them on how often they appear (along with adjacent words in the search query) in an enormous dataset of known n-grams - Google Web 1T - which contains well over 1 billion 5-grams. I'm not using the Web 1T dataset, but building my n-gram sets from my own documents - about 200k docs, and I'm estimating tens or hundreds of millions of n-grams will be generated. This kind of process is pushing the limits of my understanding of basic computing performance - can I simply load my n-grams into memory in a hashtable or dictionary when the app starts? Is the only limiting factor the amount of memory on the machine? Or am I barking up the wrong tree? Perhaps putting all my n-grams in a graph database with some sort of tree query optimisation? Could that ever be fast enough?

Read the article

Results stored in a session - good idea?

- by Nick

To give a bit of background, lets say it's a generic results page, which is paginated so there are X results per page. Generally to do this, I have two queries on the page: to get the total number of results to get the results, limiting by the correct page's resultset However, recently I've been trying to cut down on the queries the site is making, and I thought one way to do this would be to only do the query if any parameters to the page have changed (except of course the page number)? This would then cache all the result id's in a session, which can be sliced when I need to return the correct resultset for that page. I was trying to look around the net to see if there are downsides of this method, but I've found very little information about it. Has anyone done this before? Is it a good idea?

Read the article

Are Web Safe Colors Still Relevant?

- by VxJasonxV

I still remember one of my high school teachers lecturing us about the "web safe colors". A set of 216-256 colors that you should confine your designs to use, and nothing else besides them. Last I knew, Photoshop still has the "web safe" yield icon[1] on it's color picker. Are web safe colors still a concern? Outside of the obvious application (accessibility, legacy software versions, etc.), how much consideration should I give to limiting my color choice for my general audience? [1] Or was it the cube? I never remember.

Read the article

Why does the use of interface-based programming appear to be limited to behaviour?

- by Carnotaurus

I have been doing a little thinking about inheritance vs. realization vs. composition. I am not about to post the whole detail here. So I was wondering, when are not talking about supporting unit testing: Why does interface-based programming seem to focus upon the grouping of common behaviour, e.g., IPettable (for an animal), IEditable (for a user control), ISubmitable (for a form), etc. Why does the use of interface-based programming appear to be limited to behaviour when we could pragmatically group not so much on behaviour but on commonsense physical similarities which could have nothing to do with behaviour? It is not that there is some limiting feature within OOP; so how come?

Read the article

How will I know when my company is ready to receive an investment? [migrated]

- by gunshor

How will I know when my company is ready to receive an investment? I am starting a company and have bootstrapped it so far. I have produced four versions of the demo. The first fully-working version is underway. Getting this to a beta phase product will require capital, which requires an investment, which requires an investor, which requires I stop working on the product and go out and talk to people about it. The last time I raised money from investors, it took a while but I was successful. I don't want it to take a while. I want it to be brain dead simple for an investor to understand the value so that I can optimize the time I spend with the product. Is my logic flawed? What is the best way to approach raising money, while limiting both my time and risk? Thanks.

Read the article

Does an inventory limit in an MMORPG make sense?

- by Philipp

I am currently developing a simple 2d MMORPG. My current focus is the inventory system. I am currently wondering if I should implement a limit on what a player character can carry. Either in form of a maximum weight, a limited number of inventory slots, or a combination of both. Almost every MMORPG I ever played limits inventory space. But plausibility aside, is this really necessary from a gameplay point of view? Maybe it would in fact improve the game experience when I just let the players carry as much stuff as they want. tl;dr: What is the game development rationale behind limiting carrying capacity of player characters?

Read the article

What is an achievable way of setting content budgets (e.g. polygon count) for level content in a 3D title?

- by MrCranky

In answering this question for swquinn, the answer raised a more pertinent question that I'd like to hear answers to. I'll post our own strategy (promise I won't accept it as the answer), but I'd like to hear others. Specifically: how do you go about setting a sensible budget for your content team. Usually one of the very first questions asked in a development is: what's our polygon budget? Of course, these days it's rare that vertex/poly count alone is the limiting factor, instead shader complexity, fill-rate, lighting complexity, all come into play. What the content team want are some hard numbers / limits to work to such that they have a reasonable expectation that their content, once it actually gets into the engine, will not be too heavy. Given that 'it depends' isn't a particularly useful answer, I'd like to hear a strategy that allows me to give them workable limits without being a) misleading, or b) wrong.

Read the article

what does calling ´this´ outside of a jquery plugin refer to

- by Richard

Hi, I am using the liveTwitter plugin The problem is that I need to stop the plugin from hitting the Twitter api. According to the documentation I need to do this $("#tab1 .container_twitter_status").each(function(){ this.twitter.stop(); }); Already, the each does not make sense on an id and what does this refer to? Anyway, I get an undefined error. I will paste the plugin code and hope it makes sense to somebody MY only problem thusfar with this plugin is that I need to be able to stop it. thanks in advance, Richard /* * jQuery LiveTwitter 1.5.0 * - Live updating Twitter plugin for jQuery * * Copyright (c) 2009-2010 Inge Jørgensen (elektronaut.no) * Licensed under the MIT license (MIT-LICENSE.txt) * * $Date: 2010/05/30$ */ /* * Usage example: * $("#twitterSearch").liveTwitter('bacon', {limit: 10, rate: 15000}); */ (function($){ if(!$.fn.reverse){ $.fn.reverse = function() { return this.pushStack(this.get().reverse(), arguments); }; } $.fn.liveTwitter = function(query, options, callback){ var domNode = this; $(this).each(function(){ var settings = {}; // Handle changing of options if(this.twitter) { settings = jQuery.extend(this.twitter.settings, options); this.twitter.settings = settings; if(query) { this.twitter.query = query; } this.twitter.limit = settings.limit; this.twitter.mode = settings.mode; if(this.twitter.interval){ this.twitter.refresh(); } if(callback){ this.twitter.callback = callback; } // ..or create a new twitter object } else { // Extend settings with the defaults settings = jQuery.extend({ mode: 'search', // Mode, valid options are: 'search', 'user_timeline' rate: 15000, // Refresh rate in ms limit: 10, // Limit number of results refresh: true }, options); // Default setting for showAuthor if not provided if(typeof settings.showAuthor == "undefined"){ settings.showAuthor = (settings.mode == 'user_timeline') ? false : true; } // Set up a dummy function for the Twitter API callback if(!window.twitter_callback){ window.twitter_callback = function(){return true;}; } this.twitter = { settings: settings, query: query, limit: settings.limit, mode: settings.mode, interval: false, container: this, lastTimeStamp: 0, callback: callback, // Convert the time stamp to a more human readable format relativeTime: function(timeString){ var parsedDate = Date.parse(timeString); var delta = (Date.parse(Date()) - parsedDate) / 1000; var r = ''; if (delta < 60) { r = delta + ' seconds ago'; } else if(delta < 120) { r = 'a minute ago'; } else if(delta < (45*60)) { r = (parseInt(delta / 60, 10)).toString() + ' minutes ago'; } else if(delta < (90*60)) { r = 'an hour ago'; } else if(delta < (24*60*60)) { r = '' + (parseInt(delta / 3600, 10)).toString() + ' hours ago'; } else if(delta < (48*60*60)) { r = 'a day ago'; } else { r = (parseInt(delta / 86400, 10)).toString() + ' days ago'; } return r; }, // Update the timestamps in realtime refreshTime: function() { var twitter = this; $(twitter.container).find('span.time').each(function(){ $(this).html(twitter.relativeTime(this.timeStamp)); }); }, // Handle reloading refresh: function(initialize){ var twitter = this; if(this.settings.refresh || initialize) { var url = ''; var params = {}; if(twitter.mode == 'search'){ params.q = this.query; if(this.settings.geocode){ params.geocode = this.settings.geocode; } if(this.settings.lang){ params.lang = this.settings.lang; } if(this.settings.rpp){ params.rpp = this.settings.rpp; } else { params.rpp = this.settings.limit; } // Convert params to string var paramsString = []; for(var param in params){ if(params.hasOwnProperty(param)){ paramsString[paramsString.length] = param + '=' + encodeURIComponent(params[param]); } } paramsString = paramsString.join("&"); url = "http://search.twitter.com/search.json?"+paramsString+"&callback=?"; } else if(twitter.mode == 'user_timeline') { url = "http://api.twitter.com/1/statuses/user_timeline/"+encodeURIComponent(this.query)+".json?count="+twitter.limit+"&callback=?"; } else if(twitter.mode == 'list') { var username = encodeURIComponent(this.query.user); var listname = encodeURIComponent(this.query.list); url = "http://api.twitter.com/1/"+username+"/lists/"+listname+"/statuses.json?per_page="+twitter.limit+"&callback=?"; } $.getJSON(url, function(json) { var results = null; if(twitter.mode == 'search'){ results = json.results; } else { results = json; } var newTweets = 0; $(results).reverse().each(function(){ var screen_name = ''; var profile_image_url = ''; if(twitter.mode == 'search') { screen_name = this.from_user; profile_image_url = this.profile_image_url; created_at_date = this.created_at; } else { screen_name = this.user.screen_name; profile_image_url = this.user.profile_image_url; // Fix for IE created_at_date = this.created_at.replace(/^(\w+)\s(\w+)\s(\d+)(.*)(\s\d+)$/, "$1, $3 $2$5$4"); } var userInfo = this.user; var linkified_text = this.text.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/, function(m) { return m.link(m); }); linkified_text = linkified_text.replace(/@[A-Za-z0-9_]+/g, function(u){return u.link('http://twitter.com/'+u.replace(/^@/,''));}); linkified_text = linkified_text.replace(/#[A-Za-z0-9_\-]+/g, function(u){return u.link('http://search.twitter.com/search?q='+u.replace(/^#/,'%23'));}); if(!twitter.settings.filter || twitter.settings.filter(this)) { if(Date.parse(created_at_date) > twitter.lastTimeStamp) { newTweets += 1; var tweetHTML = '<div class="tweet tweet-'+this.id+'">'; if(twitter.settings.showAuthor) { tweetHTML += '<img width="24" height="24" src="'+profile_image_url+'" />' + '<p class="text"><span class="username"><a href="http://twitter.com/'+screen_name+'">'+screen_name+'</a>:</span> '; } else { tweetHTML += '<p class="text"> '; } tweetHTML += linkified_text + ' <span class="time">'+twitter.relativeTime(created_at_date)+'</span>' + '</p>' + '</div>'; $(twitter.container).prepend(tweetHTML); var timeStamp = created_at_date; $(twitter.container).find('span.time:first').each(function(){ this.timeStamp = timeStamp; }); if(!initialize) { $(twitter.container).find('.tweet-'+this.id).hide().fadeIn(); } twitter.lastTimeStamp = Date.parse(created_at_date); } } }); if(newTweets > 0) { // Limit number of entries $(twitter.container).find('div.tweet:gt('+(twitter.limit-1)+')').remove(); // Run callback if(twitter.callback){ twitter.callback(domNode, newTweets); } // Trigger event $(domNode).trigger('tweets'); } }); } }, start: function(){ var twitter = this; if(!this.interval){ this.interval = setInterval(function(){twitter.refresh();}, twitter.settings.rate); this.refresh(true); } }, stop: function(){ if(this.interval){ clearInterval(this.interval); this.interval = false; } } }; var twitter = this.twitter; this.timeInterval = setInterval(function(){twitter.refreshTime();}, 5000); this.twitter.start(); } }); return this; }; })(jQuery);

Read the article

How can I use Perl regular expressions to parse XML data?

- by Luke

I have a pretty long piece of XML that I want to parse. I want to remove everything except for the subclass-code and city. So that I am left with something like the example below. EXAMPLE TEST SUBCLASS|MIAMI CODE <?xml version="1.0" standalone="no"?> <web-export> <run-date>06/01/2010 <pub-code>TEST <ad-type>TEST <cat-code>Real Estate</cat-code> <class-code>TEST</class-code> <subclass-code>TEST SUBCLASS</subclass-code> <placement-description></placement-description> <position-description>Town House</position-description> <subclass3-code></subclass3-code> <subclass4-code></subclass4-code> <ad-number>0000284708-01</ad-number> <start-date>05/28/2010</start-date> <end-date>06/09/2010</end-date> <line-count>6</line-count> <run-count>13</run-count> <customer-type>Private Party</customer-type> <account-number>100099237</account-number> <account-name>DOE, JOHN</account-name> <addr-1>207 CLARENCE STREET</addr-1> <addr-2> </addr-2> <city>MIAMI</city> <state>FL</state> <postal-code>02910</postal-code> <country>USA</country> <phone-number>4014612880</phone-number> <fax-number></fax-number> <url-addr> </url-addr> <email-addr>[email protected]</email-addr> <pay-flag>N</pay-flag> <ad-description>DEANESTATES2BEDS2BATHSAPPLIANCED</ad-description> <order-source>Import</order-source> <order-status>Live</order-status> <payor-acct>100099237</payor-acct> <agency-flag>N</agency-flag> <rate-note></rate-note> <ad-content> MIAMI/Dean Estates: 2 beds, 2 baths. Applianced. Central air. Carpets. Laundry. 2 decks. Pool. Parking. Close to everything.No smoking. No utilities. $1275 mo. 401-578-1501. </ad-content> </ad-type> </pub-code> </run-date> </web-export> PERL So what I want to do is open an existing file read the contents then use regular expressions to eliminate the unnecessary XML tags. open(READFILE, "FILENAME"); while(<READFILE>) { $_ =~ s/<\?xml version="(.*)" standalone="(.*)"\?>\n.*//g; $_ =~ s/<subclass-code>//g; $_ =~ s/<\/subclass-code>\n.*/|/g; $_ =~ s/(.*)PJ RER Houses /PJ RER Houses/g; $_ =~ s/\G //g; $_ =~ s/<city>//g; $_ =~ s/<\/city>\n.*//g; $_ =~ s/<(\/?)web-export>(.*)\n.*//g; $_ =~ s/<(\/?)run-date>(.*)\n.*//g; $_ =~ s/<(\/?)pub-code>(.*)\n.*//g; $_ =~ s/<(\/?)ad-type>(.*)\n.*//g; $_ =~ s/<(\/?)cat-code>(.*)<(\/?)cat-code>\n.*//g; $_ =~ s/<(\/?)class-code>(.*)<(\/?)class-code>\n.*//g; $_ =~ s/<(\/?)placement-description>(.*)<(\/?)placement-description>\n.*//g; $_ =~ s/<(\/?)position-description>(.*)<(\/?)position-description>\n.*//g; $_ =~ s/<(\/?)subclass3-code>(.*)<(\/?)subclass3-code>\n.*//g; $_ =~ s/<(\/?)subclass4-code>(.*)<(\/?)subclass4-code>\n.*//g; $_ =~ s/<(\/?)ad-number>(.*)<(\/?)ad-number>\n.*//g; $_ =~ s/<(\/?)start-date>(.*)<(\/?)start-date>\n.*//g; $_ =~ s/<(\/?)end-date>(.*)<(\/?)end-date>\n.*//g; $_ =~ s/<(\/?)line-count>(.*)<(\/?)line-count>\n.*//g; $_ =~ s/<(\/?)run-count>(.*)<(\/?)run-count>\n.*//g; $_ =~ s/<(\/?)customer-type>(.*)<(\/?)customer-type>\n.*//g; $_ =~ s/<(\/?)account-number>(.*)<(\/?)account-number>\n.*//g; $_ =~ s/<(\/?)account-name>(.*)<(\/?)account-name>\n.*//g; $_ =~ s/<(\/?)addr-1>(.*)<(\/?)addr-1>\n.*//g; $_ =~ s/<(\/?)addr-2>(.*)<(\/?)addr-2>\n.*//g; $_ =~ s/<(\/?)state>(.*)<(\/?)state>\n.*//g; $_ =~ s/<(\/?)postal-code>(.*)<(\/?)postal-code>\n.*//g; $_ =~ s/<(\/?)country>(.*)<(\/?)country>\n.*//g; $_ =~ s/<(\/?)phone-number>(.*)<(\/?)phone-number>\n.*//g; $_ =~ s/<(\/?)fax-number>(.*)<(\/?)fax-number>\n.*//g; $_ =~ s/<(\/?)url-addr>(.*)<(\/?)url-addr>\n.*//g; $_ =~ s/<(\/?)email-addr>(.*)<(\/?)email-addr>\n.*//g; $_ =~ s/<(\/?)pay-flag>(.*)<(\/?)pay-flag>\n.*//g; $_ =~ s/<(\/?)ad-description>(.*)<(\/?)ad-description>\n.*//g; $_ =~ s/<(\/?)order-source>(.*)<(\/?)order-source>\n.*//g; $_ =~ s/<(\/?)order-status>(.*)<(\/?)order-status>\n.*//g; $_ =~ s/<(\/?)payor-acct>(.*)<(\/?)payor-acct>\n.*//g; $_ =~ s/<(\/?)agency-flag>(.*)<(\/?)agency-flag>\n.*//g; $_ =~ s/<(\/?)rate-note>(.*)<(\/?)rate-note>\n.*//g; $_ =~ s/<ad-content>(.*)\n.*//g; $_ =~ s/\t(.*)\n.*//g; $_ =~ s/<\/ad-content>(.*)\n.*//g; } close( READFILE1 ); Is there an easier way of doing this? I don't want to use any modules. I know that it might make this easier but the file I am reading has a lot of data in it.

Read the article

Problems with real-valued input deep belief networks (of RBMs)

- by Junier

I am trying to recreate the results reported in Reducing the dimensionality of data with neural networks of autoencoding the olivetti face dataset with an adapted version of the MNIST digits matlab code, but am having some difficulty. It seems that no matter how much tweaking I do on the number of epochs, rates, or momentum the stacked RBMs are entering the fine-tuning stage with a large amount of error and consequently fail to improve much at the fine-tuning stage. I am also experiencing a similar problem on another real-valued dataset. For the first layer I am using a RBM with a smaller learning rate (as described in the paper) and with negdata = poshidstates*vishid' + repmat(visbiases,numcases,1); I'm fairly confident I am following the instructions found in the supporting material but I cannot achieve the correct errors. Is there something I am missing? See the code I'm using for real-valued visible unit RBMs below, and for the whole deep training. The rest of the code can be found here. rbmvislinear.m: epsilonw = 0.001; % Learning rate for weights epsilonvb = 0.001; % Learning rate for biases of visible units epsilonhb = 0.001; % Learning rate for biases of hidden units weightcost = 0.0002; initialmomentum = 0.5; finalmomentum = 0.9; [numcases numdims numbatches]=size(batchdata); if restart ==1, restart=0; epoch=1; % Initializing symmetric weights and biases. vishid = 0.1*randn(numdims, numhid); hidbiases = zeros(1,numhid); visbiases = zeros(1,numdims); poshidprobs = zeros(numcases,numhid); neghidprobs = zeros(numcases,numhid); posprods = zeros(numdims,numhid); negprods = zeros(numdims,numhid); vishidinc = zeros(numdims,numhid); hidbiasinc = zeros(1,numhid); visbiasinc = zeros(1,numdims); sigmainc = zeros(1,numhid); batchposhidprobs=zeros(numcases,numhid,numbatches); end for epoch = epoch:maxepoch, fprintf(1,'epoch %d\r',epoch); errsum=0; for batch = 1:numbatches, if (mod(batch,100)==0) fprintf(1,' %d ',batch); end %%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% data = batchdata(:,:,batch); poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1))); batchposhidprobs(:,:,batch)=poshidprobs; posprods = data' * poshidprobs; poshidact = sum(poshidprobs); posvisact = sum(data); %%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% poshidstates = poshidprobs > rand(numcases,numhid); %%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% negdata = poshidstates*vishid' + repmat(visbiases,numcases,1);% + randn(numcases,numdims) if not using mean neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1))); negprods = negdata'*neghidprobs; neghidact = sum(neghidprobs); negvisact = sum(negdata); %%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% err= sum(sum( (data-negdata).^2 )); errsum = err + errsum; if epoch>5, momentum=finalmomentum; else momentum=initialmomentum; end; %%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% vishidinc = momentum*vishidinc + ... epsilonw*( (posprods-negprods)/numcases - weightcost*vishid); visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact); hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact); vishid = vishid + vishidinc; visbiases = visbiases + visbiasinc; hidbiases = hidbiases + hidbiasinc; %%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% end fprintf(1, '\nepoch %4i error %f \n', epoch, errsum); end dofacedeepauto.m: clear all close all maxepoch=200; %In the Science paper we use maxepoch=50, but it works just fine. numhid=2000; numpen=1000; numpen2=500; numopen=30; fprintf(1,'Pretraining a deep autoencoder. \n'); fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch); load fdata %makeFaceData; [numcases numdims numbatches]=size(batchdata); fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid); restart=1; rbmvislinear; hidrecbiases=hidbiases; save mnistvh vishid hidrecbiases visbiases; maxepoch=50; fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen); batchdata=batchposhidprobs; numhid=numpen; restart=1; rbm; hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases; save mnisthp hidpen penrecbiases hidgenbiases; fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2); batchdata=batchposhidprobs; numhid=numpen2; restart=1; rbm; hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases; save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2; fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen); batchdata=batchposhidprobs; numhid=numopen; restart=1; rbmhidlinear; hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases; save mnistpo hidtop toprecbiases topgenbiases; backpropface; Thanks for your time

Read the article

Problems with real-valued deep belief networks (of RBMs)

- by Junier

I am trying to recreate the results reported in Reducing the dimensionality of data with neural networks of autoencoding the olivetti face dataset with an adapted version of the MNIST digits matlab code, but am having some difficulty. It seems that no matter how much tweaking I do on the number of epochs, rates, or momentum the stacked RBMs are entering the fine-tuning stage with a large amount of error and consequently fail to improve much at the fine-tuning stage. I am also experiencing a similar problem on another real-valued dataset. For the first layer I am using a RBM with a smaller learning rate (as described in the paper) and with negdata = poshidstates*vishid' + repmat(visbiases,numcases,1); I'm fairly confident I am following the instructions found in the supporting material but I cannot achieve the correct errors. Is there something I am missing? See the code I'm using for real-valued visible unit RBMs below, and for the whole deep training. The rest of the code can be found here. rbmvislinear.m: epsilonw = 0.001; % Learning rate for weights epsilonvb = 0.001; % Learning rate for biases of visible units epsilonhb = 0.001; % Learning rate for biases of hidden units weightcost = 0.0002; initialmomentum = 0.5; finalmomentum = 0.9; [numcases numdims numbatches]=size(batchdata); if restart ==1, restart=0; epoch=1; % Initializing symmetric weights and biases. vishid = 0.1*randn(numdims, numhid); hidbiases = zeros(1,numhid); visbiases = zeros(1,numdims); poshidprobs = zeros(numcases,numhid); neghidprobs = zeros(numcases,numhid); posprods = zeros(numdims,numhid); negprods = zeros(numdims,numhid); vishidinc = zeros(numdims,numhid); hidbiasinc = zeros(1,numhid); visbiasinc = zeros(1,numdims); sigmainc = zeros(1,numhid); batchposhidprobs=zeros(numcases,numhid,numbatches); end for epoch = epoch:maxepoch, fprintf(1,'epoch %d\r',epoch); errsum=0; for batch = 1:numbatches, if (mod(batch,100)==0) fprintf(1,' %d ',batch); end %%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% data = batchdata(:,:,batch); poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1))); batchposhidprobs(:,:,batch)=poshidprobs; posprods = data' * poshidprobs; poshidact = sum(poshidprobs); posvisact = sum(data); %%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% poshidstates = poshidprobs > rand(numcases,numhid); %%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% negdata = poshidstates*vishid' + repmat(visbiases,numcases,1);% + randn(numcases,numdims) if not using mean neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1))); negprods = negdata'*neghidprobs; neghidact = sum(neghidprobs); negvisact = sum(negdata); %%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% err= sum(sum( (data-negdata).^2 )); errsum = err + errsum; if epoch>5, momentum=finalmomentum; else momentum=initialmomentum; end; %%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% vishidinc = momentum*vishidinc + ... epsilonw*( (posprods-negprods)/numcases - weightcost*vishid); visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact); hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact); vishid = vishid + vishidinc; visbiases = visbiases + visbiasinc; hidbiases = hidbiases + hidbiasinc; %%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% end fprintf(1, '\nepoch %4i error %f \n', epoch, errsum); end dofacedeepauto.m: clear all close all maxepoch=200; %In the Science paper we use maxepoch=50, but it works just fine. numhid=2000; numpen=1000; numpen2=500; numopen=30; fprintf(1,'Pretraining a deep autoencoder. \n'); fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch); load fdata %makeFaceData; [numcases numdims numbatches]=size(batchdata); fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid); restart=1; rbmvislinear; hidrecbiases=hidbiases; save mnistvh vishid hidrecbiases visbiases; maxepoch=50; fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen); batchdata=batchposhidprobs; numhid=numpen; restart=1; rbm; hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases; save mnisthp hidpen penrecbiases hidgenbiases; fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2); batchdata=batchposhidprobs; numhid=numpen2; restart=1; rbm; hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases; save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2; fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen); batchdata=batchposhidprobs; numhid=numopen; restart=1; rbmhidlinear; hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases; save mnistpo hidtop toprecbiases topgenbiases; backpropface; Thanks for your time

Read the article

256 Windows Azure Worker Roles, Windows Kinect and a 90's Text-Based Ray-Tracer

- by Alan Smith

For a couple of years I have been demoing a simple render farm hosted in Windows Azure using worker roles and the Azure Storage service. At the start of the presentation I deploy an Azure application that uses 16 worker roles to render a 1,500 frame 3D ray-traced animation. At the end of the presentation, when the animation was complete, I would play the animation delete the Azure deployment. The standing joke with the audience was that it was that it was a “$2 demo”, as the compute charges for running the 16 instances for an hour was $1.92, factor in the bandwidth charges and it’s a couple of dollars. The point of the demo is that it highlights one of the great benefits of cloud computing, you pay for what you use, and if you need massive compute power for a short period of time using Windows Azure can work out very cost effective. The “$2 demo” was great for presenting at user groups and conferences in that it could be deployed to Azure, used to render an animation, and then removed in a one hour session. I have always had the idea of doing something a bit more impressive with the demo, and scaling it from a “$2 demo” to a “$30 demo”. The challenge was to create a visually appealing animation in high definition format and keep the demo time down to one hour. This article will take a run through how I achieved this. Ray Tracing Ray tracing, a technique for generating high quality photorealistic images, gained popularity in the 90’s with companies like Pixar creating feature length computer animations, and also the emergence of shareware text-based ray tracers that could run on a home PC. In order to render a ray traced image, the ray of light that would pass from the view point must be tracked until it intersects with an object. At the intersection, the color, reflectiveness, transparency, and refractive index of the object are used to calculate if the ray will be reflected or refracted. Each pixel may require thousands of calculations to determine what color it will be in the rendered image. Pin-Board Toys Having very little artistic talent and a basic understanding of maths I decided to focus on an animation that could be modeled fairly easily and would look visually impressive. I’ve always liked the pin-board desktop toys that become popular in the 80’s and when I was working as a 3D animator back in the 90’s I always had the idea of creating a 3D ray-traced animation of a pin-board, but never found the energy to do it. Even if I had a go at it, the render time to produce an animation that would look respectable on a 486 would have been measured in months. PolyRay Back in 1995 I landed my first real job, after spending three years being a beach-ski-climbing-paragliding-bum, and was employed to create 3D ray-traced animations for a CD-ROM that school kids would use to learn physics. I had got into the strange and wonderful world of text-based ray tracing, and was using a shareware ray-tracer called PolyRay. PolyRay takes a text file describing a scene as input and, after a few hours processing on a 486, produced a high quality ray-traced image. The following is an example of a basic PolyRay scene file. background Midnight_Blue static define matte surface { ambient 0.1 diffuse 0.7 } define matte_white texture { matte { color white } } define matte_black texture { matte { color dark_slate_gray } } define position_cylindrical 3 define lookup_sawtooth 1 define light_wood <0.6, 0.24, 0.1> define median_wood <0.3, 0.12, 0.03> define dark_wood <0.05, 0.01, 0.005> define wooden texture { noise surface { ambient 0.2 diffuse 0.7 specular white, 0.5 microfacet Reitz 10 position_fn position_cylindrical position_scale 1 lookup_fn lookup_sawtooth octaves 1 turbulence 1 color_map( [0.0, 0.2, light_wood, light_wood] [0.2, 0.3, light_wood, median_wood] [0.3, 0.4, median_wood, light_wood] [0.4, 0.7, light_wood, light_wood] [0.7, 0.8, light_wood, median_wood] [0.8, 0.9, median_wood, light_wood] [0.9, 1.0, light_wood, dark_wood]) } } define glass texture { surface { ambient 0 diffuse 0 specular 0.2 reflection white, 0.1 transmission white, 1, 1.5 }} define shiny surface { ambient 0.1 diffuse 0.6 specular white, 0.6 microfacet Phong 7 } define steely_blue texture { shiny { color black } } define chrome texture { surface { color white ambient 0.0 diffuse 0.2 specular 0.4 microfacet Phong 10 reflection 0.8 } } viewpoint { from <4.000, -1.000, 1.000> at <0.000, 0.000, 0.000> up <0, 1, 0> angle 60 resolution 640, 480 aspect 1.6 image_format 0 } light <-10, 30, 20> light <-10, 30, -20> object { disc <0, -2, 0>, <0, 1, 0>, 30 wooden } object { sphere <0.000, 0.000, 0.000>, 1.00 chrome } object { cylinder <0.000, 0.000, 0.000>, <0.000, 0.000, -4.000>, 0.50 chrome } After setting up the background and defining colors and textures, the viewpoint is specified. The “camera” is located at a point in 3D space, and it looks towards another point. The angle, image resolution, and aspect ratio are specified. Two lights are present in the image at defined coordinates. The three objects in the image are a wooden disc to represent a table top, and a sphere and cylinder that intersect to form a pin that will be used for the pin board toy in the final animation. When the image is rendered, the following image is produced. The pins are modeled with a chrome surface, so they reflect the environment around them. Note that the scale of the pin shaft is not correct, this will be fixed later. Modeling the Pin Board The frame of the pin-board is made up of three boxes, and six cylinders, the front box is modeled using a clear, slightly reflective solid, with the same refractive index of glass. The other shapes are modeled as metal. object { box <-5.5, -1.5, 1>, <5.5, 5.5, 1.2> glass } object { box <-5.5, -1.5, -0.04>, <5.5, 5.5, -0.09> steely_blue } object { box <-5.5, -1.5, -0.52>, <5.5, 5.5, -0.59> steely_blue } object { cylinder <-5.2, -1.2, 1.4>, <-5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, -1.2, 1.4>, <5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <-5.2, 5.2, 1.4>, <-5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, 5.2, 1.4>, <5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <0, -1.2, 1.4>, <0, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <0, 5.2, 1.4>, <0, 5.2, -0.74>, 0.2 steely_blue } In order to create the matrix of pins that make up the pin board I used a basic console application with a few nested loops to create two intersecting matrixes of pins, which models the layout used in the pin boards. The resulting image is shown below. The pin board contains 11,481 pins, with the scene file containing 23,709 lines of code. For the complete animation 2,000 scene files will be created, which is over 47 million lines of code. Each pin in the pin-board will slide out a specific distance when an object is pressed into the back of the board. This is easily modeled by setting the Z coordinate of the pin to a specific value. In order to set all of the pins in the pin-board to the correct position, a bitmap image can be used. The position of the pin can be set based on the color of the pixel at the appropriate position in the image. When the Windows Azure logo is used to set the Z coordinate of the pins, the following image is generated. The challenge now was to make a cool animation. The Azure Logo is fine, but it is static. Using a normal video to animate the pins would not work; the colors in the video would not be the same as the depth of the objects from the camera. In order to simulate the pin board accurately a series of frames from a depth camera could be used. Windows Kinect The Kenect controllers for the X-Box 360 and Windows feature a depth camera. The Kinect SDK for Windows provides a programming interface for Kenect, providing easy access for .NET developers to the Kinect sensors. The Kinect Explorer provided with the Kinect SDK is a great starting point for exploring Kinect from a developers perspective. Both the X-Box 360 Kinect and the Windows Kinect will work with the Kinect SDK, the Windows Kinect is required for commercial applications, but the X-Box Kinect can be used for hobby projects. The Windows Kinect has the advantage of providing a mode to allow depth capture with objects closer to the camera, which makes for a more accurate depth image for setting the pin positions. Creating a Depth Field Animation The depth field animation used to set the positions of the pin in the pin board was created using a modified version of the Kinect Explorer sample application. In order to simulate the pin board accurately, a small section of the depth range from the depth sensor will be used. Any part of the object in front of the depth range will result in a white pixel; anything behind the depth range will be black. Within the depth range the pixels in the image will be set to RGB values from 0,0,0 to 255,255,255. A screen shot of the modified Kinect Explorer application is shown below. The Kinect Explorer sample application was modified to include slider controls that are used to set the depth range that forms the image from the depth stream. This allows the fine tuning of the depth image that is required for simulating the position of the pins in the pin board. The Kinect Explorer was also modified to record a series of images from the depth camera and save them as a sequence JPEG files that will be used to animate the pins in the animation the Start and Stop buttons are used to start and stop the image recording. En example of one of the depth images is shown below. Once a series of 2,000 depth images has been captured, the task of creating the animation can begin. Rendering a Test Frame In order to test the creation of frames and get an approximation of the time required to render each frame a test frame was rendered on-premise using PolyRay. The output of the rendering process is shown below. The test frame contained 23,629 primitive shapes, most of which are the spheres and cylinders that are used for the 11,800 or so pins in the pin board. The 1280x720 image contains 921,600 pixels, but as anti-aliasing was used the number of rays that were calculated was 4,235,777, with 3,478,754,073 object boundaries checked. The test frame of the pin board with the depth field image applied is shown below. The tracing time for the test frame was 4 minutes 27 seconds, which means rendering the2,000 frames in the animation would take over 148 hours, or a little over 6 days. Although this is much faster that an old 486, waiting almost a week to see the results of an animation would make it challenging for animators to create, view, and refine their animations. It would be much better if the animation could be rendered in less than one hour. Windows Azure Worker Roles The cost of creating an on-premise render farm to render animations increases in proportion to the number of servers. The table below shows the cost of servers for creating a render farm, assuming a cost of $500 per server. Number of Servers Cost 1 $500 16 $8,000 256 $128,000 As well as the cost of the servers, there would be additional costs for networking, racks etc. Hosting an environment of 256 servers on-premise would require a server room with cooling, and some pretty hefty power cabling. The Windows Azure compute services provide worker roles, which are ideal for performing processor intensive compute tasks. With the scalability available in Windows Azure a job that takes 256 hours to complete could be perfumed using different numbers of worker roles. The time and cost of using 1, 16 or 256 worker roles is shown below. Number of Worker Roles Render Time Cost 1 256 hours $30.72 16 16 hours $30.72 256 1 hour $30.72 Using worker roles in Windows Azure provides the same cost for the 256 hour job, irrespective of the number of worker roles used. Provided the compute task can be broken down into many small units, and the worker role compute power can be used effectively, it makes sense to scale the application so that the task is completed quickly, making the results available in a timely fashion. The task of rendering 2,000 frames in an animation is one that can easily be broken down into 2,000 individual pieces, which can be performed by a number of worker roles. Creating a Render Farm in Windows Azure The architecture of the render farm is shown in the following diagram. The render farm is a hybrid application with the following components: · On-Premise o Windows Kinect – Used combined with the Kinect Explorer to create a stream of depth images. o Animation Creator – This application uses the depth images from the Kinect sensor to create scene description files for PolyRay. These files are then uploaded to the jobs blob container, and job messages added to the jobs queue. o Process Monitor – This application queries the role instance lifecycle table and displays statistics about the render farm environment and render process. o Image Downloader – This application polls the image queue and downloads the rendered animation files once they are complete. · Windows Azure o Azure Storage – Queues and blobs are used for the scene description files and completed frames. A table is used to store the statistics about the rendering environment. The architecture of each worker role is shown below. The worker role is configured to use local storage, which provides file storage on the worker role instance that can be use by the applications to render the image and transform the format of the image. The service definition for the worker role with the local storage configuration highlighted is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="CloudRay" > <WorkerRole name="CloudRayWorkerRole" vmsize="Small"> <Imports> </Imports> <ConfigurationSettings> <Setting name="DataConnectionString" /> </ConfigurationSettings> <LocalResources> <LocalStorage name="RayFolder" cleanOnRoleRecycle="true" /> </LocalResources> </WorkerRole> </ServiceDefinition> The two executable programs, PolyRay.exe and DTA.exe are included in the Azure project, with Copy Always set as the property. PolyRay will take the scene description file and render it to a Truevision TGA file. As the TGA format has not seen much use since the mid 90’s it is converted to a JPG image using Dave's Targa Animator, another shareware application from the 90’s. Each worker roll will use the following process to render the animation frames. 1. The worker process polls the job queue, if a job is available the scene description file is downloaded from blob storage to local storage. 2. PolyRay.exe is started in a process with the appropriate command line arguments to render the image as a TGA file. 3. DTA.exe is started in a process with the appropriate command line arguments convert the TGA file to a JPG file. 4. The JPG file is uploaded from local storage to the images blob container. 5. A message is placed on the images queue to indicate a new image is available for download. 6. The job message is deleted from the job queue. 7. The role instance lifecycle table is updated with statistics on the number of frames rendered by the worker role instance, and the CPU time used. The code for this is shown below. public override void Run() { // Set environment variables string polyRayPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), PolyRayLocation); string dtaPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), DTALocation); LocalResource rayStorage = RoleEnvironment.GetLocalResource("RayFolder"); string localStorageRootPath = rayStorage.RootPath; JobQueue jobQueue = new JobQueue("renderjobs"); JobQueue downloadQueue = new JobQueue("renderimagedownloadjobs"); CloudRayBlob sceneBlob = new CloudRayBlob("scenes"); CloudRayBlob imageBlob = new CloudRayBlob("images"); RoleLifecycleDataSource roleLifecycleDataSource = new RoleLifecycleDataSource(); Frames = 0; while (true) { // Get the render job from the queue CloudQueueMessage jobMsg = jobQueue.Get(); if (jobMsg != null) { // Get the file details string sceneFile = jobMsg.AsString; string tgaFile = sceneFile.Replace(".pi", ".tga"); string jpgFile = sceneFile.Replace(".pi", ".jpg"); string sceneFilePath = Path.Combine(localStorageRootPath, sceneFile); string tgaFilePath = Path.Combine(localStorageRootPath, tgaFile); string jpgFilePath = Path.Combine(localStorageRootPath, jpgFile); // Copy the scene file to local storage sceneBlob.DownloadFile(sceneFilePath); // Run the ray tracer. string polyrayArguments = string.Format("\"{0}\" -o \"{1}\" -a 2", sceneFilePath, tgaFilePath); Process polyRayProcess = new Process(); polyRayProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), polyRayPath); polyRayProcess.StartInfo.Arguments = polyrayArguments; polyRayProcess.Start(); polyRayProcess.WaitForExit(); // Convert the image string dtaArguments = string.Format(" {0} /FJ /P{1}", tgaFilePath, Path.GetDirectoryName (jpgFilePath)); Process dtaProcess = new Process(); dtaProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), dtaPath); dtaProcess.StartInfo.Arguments = dtaArguments; dtaProcess.Start(); dtaProcess.WaitForExit(); // Upload the image to blob storage imageBlob.UploadFile(jpgFilePath); // Add a download job. downloadQueue.Add(jpgFile); // Delete the render job message jobQueue.Delete(jobMsg); Frames++; } else { Thread.Sleep(1000); } // Log the worker role activity. roleLifecycleDataSource.Alive ("CloudRayWorker", RoleLifecycleDataSource.RoleLifecycleId, Frames); } } Monitoring Worker Role Instance Lifecycle In order to get more accurate statistics about the lifecycle of the worker role instances used to render the animation data was tracked in an Azure storage table. The following class was used to track the worker role lifecycles in Azure storage. public class RoleLifecycle : TableServiceEntity { public string ServerName { get; set; } public string Status { get; set; } public DateTime StartTime { get; set; } public DateTime EndTime { get; set; } public long SecondsRunning { get; set; } public DateTime LastActiveTime { get; set; } public int Frames { get; set; } public string Comment { get; set; } public RoleLifecycle() { } public RoleLifecycle(string roleName) { PartitionKey = roleName; RowKey = Utils.GetAscendingRowKey(); Status = "Started"; StartTime = DateTime.UtcNow; LastActiveTime = StartTime; EndTime = StartTime; SecondsRunning = 0; Frames = 0; } } A new instance of this class is created and added to the storage table when the role starts. It is then updated each time the worker renders a frame to record the total number of frames rendered and the total processing time. These statistics are used be the monitoring application to determine the effectiveness of use of resources in the render farm. Rendering the Animation The Azure solution was deployed to Windows Azure with the service configuration set to 16 worker role instances. This allows for the application to be tested in the cloud environment, and the performance of the application determined. When I demo the application at conferences and user groups I often start with 16 instances, and then scale up the application to the full 256 instances. The configuration to run 16 instances is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="16" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> About six minutes after deploying the application the first worker roles become active and start to render the first frames of the animation. The CloudRay Monitor application displays an icon for each worker role instance, with a number indicating the number of frames that the worker role has rendered. The statistics on the left show the number of active worker roles and statistics about the render process. The render time is the time since the first worker role became active; the CPU time is the total amount of processing time used by all worker role instances to render the frames. Five minutes after the first worker role became active the last of the 16 worker roles activated. By this time the first seven worker roles had each rendered one frame of the animation. With 16 worker roles u and running it can be seen that one hour and 45 minutes CPU time has been used to render 32 frames with a render time of just under 10 minutes. At this rate it would take over 10 hours to render the 2,000 frames of the full animation. In order to complete the animation in under an hour more processing power will be required. Scaling the render farm from 16 instances to 256 instances is easy using the new management portal. The slider is set to 256 instances, and the configuration saved. We do not need to re-deploy the application, and the 16 instances that are up and running will not be affected. Alternatively, the configuration file for the Azure service could be modified to specify 256 instances. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="256" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> Six minutes after the new configuration has been applied 75 new worker roles have activated and are processing their first frames. Five minutes later the full configuration of 256 worker roles is up and running. We can see that the average rate of frame rendering has increased from 3 to 12 frames per minute, and that over 17 hours of CPU time has been utilized in 23 minutes. In this test the time to provision 140 worker roles was about 11 minutes, which works out at about one every five seconds. We are now half way through the rendering, with 1,000 frames complete. This has utilized just under three days of CPU time in a little over 35 minutes. The animation is now complete, with 2,000 frames rendered in a little over 52 minutes. The CPU time used by the 256 worker roles is 6 days, 7 hours and 22 minutes with an average frame rate of 38 frames per minute. The rendering of the last 1,000 frames took 16 minutes 27 seconds, which works out at a rendering rate of 60 frames per minute. The frame counts in the server instances indicate that the use of a queue to distribute the workload has been very effective in distributing the load across the 256 worker role instances. The first 16 instances that were deployed first have rendered between 11 and 13 frames each, whilst the 240 instances that were added when the application was scaled have rendered between 6 and 9 frames each. Completed Animation I’ve uploaded the completed animation to YouTube, a low resolution preview is shown below. Pin Board Animation Created using Windows Kinect and 256 Windows Azure Worker Roles The animation can be viewed in 1280x720 resolution at the following link: http://www.youtube.com/watch?v=n5jy6bvSxWc Effective Use of Resources According to the CloudRay monitor statistics the animation took 6 days, 7 hours and 22 minutes CPU to render, this works out at 152 hours of compute time, rounded up to the nearest hour. As the usage for the worker role instances are billed for the full hour, it may have been possible to render the animation using fewer than 256 worker roles. When deciding the optimal usage of resources, the time required to provision and start the worker roles must also be considered. In the demo I started with 16 worker roles, and then scaled the application to 256 worker roles. It would have been more optimal to start the application with maybe 200 worker roles, and utilized the full hour that I was being billed for. This would, however, have prevented showing the ease of scalability of the application. The new management portal displays the CPU usage across the worker roles in the deployment. The average CPU usage across all instances is 93.27%, with over 99% used when all the instances are up and running. This shows that the worker role resources are being used very effectively. Grid Computing Scenarios Although I am using this scenario for a hobby project, there are many scenarios where a large amount of compute power is required for a short period of time. Windows Azure provides a great platform for developing these types of grid computing applications, and can work out very cost effective. · Windows Azure can provide massive compute power, on demand, in a matter of minutes. · The use of queues to manage the load balancing of jobs between role instances is a simple and effective solution. · Using a cloud-computing platform like Windows Azure allows proof-of-concept scenarios to be tested and evaluated on a very low budget. · No charges for inbound data transfer makes the uploading of large data sets to Windows Azure Storage services cost effective. (Transaction charges still apply.) Tips for using Windows Azure for Grid Computing Scenarios I found the implementation of a render farm using Windows Azure a fairly simple scenario to implement. I was impressed by ease of scalability that Azure provides, and by the short time that the application took to scale from 16 to 256 worker role instances. In this case it was around 13 minutes, in other tests it took between 10 and 20 minutes. The following tips may be useful when implementing a grid computing project in Windows Azure. · Using an Azure Storage queue to load-balance the units of work across multiple worker roles is simple and very effective. The design I have used in this scenario could easily scale to many thousands of worker role instances. · Windows Azure accounts are typically limited to 20 cores. If you need to use more than this, a call to support and a credit card check will be required. · Be aware of how the billing model works. You will be charged for worker role instances for the full clock our in which the instance is deployed. Schedule the workload to start just after the clock hour has started. · Monitor the utilization of the resources you are provisioning, ensure that you are not paying for worker roles that are idle. · If you are deploying third party applications to worker roles, you may well run into licensing issues. Purchasing software licenses on a per-processor basis when using hundreds of processors for a short time period would not be cost effective. · Third party software may also require installation onto the worker roles, which can be accomplished using start-up tasks. Bear in mind that adding a startup task and possible re-boot will add to the time required for the worker role instance to start and activate. An alternative may be to use a prepared VM and use VM roles. · Consider using the Windows Azure Autoscaling Application Block (WASABi) to autoscale the worker roles in your application. When using a large number of worker roles, the utilization must be carefully monitored, if the scaling algorithms are not optimal it could get very expensive!

Read the article

Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article

Optimizing MySQL for small VPS

- by Chris M

I'm trying to optimize my MySQL config for a verrry small VPS. The VPS is also running NGINX/PHP-FPM and Magento; all with a limit of 250MB of RAM. This is an output of MySQL Tuner... -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.1.41-3ubuntu12.8 [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 1M (Tables: 14) [--] Data in InnoDB tables: 29M (Tables: 301) [--] Data in MEMORY tables: 1M (Tables: 17) [!!] Total fragmented tables: 301 -------- Security Recommendations ------------------------------------------- [OK] All database users have passwords assigned -------- Performance Metrics ------------------------------------------------- [--] Up for: 2d 11h 14m 58s (1M q [8.038 qps], 33K conn, TX: 2B, RX: 618M) [--] Reads / Writes: 83% / 17% [--] Total buffers: 122.0M global + 8.6M per thread (100 max threads) [!!] Maximum possible memory usage: 978.2M (404% of installed RAM) [OK] Slow queries: 0% (37/1M) [OK] Highest usage of available connections: 6% (6/100) [OK] Key buffer size / total MyISAM indexes: 32.0M/282.0K [OK] Key buffer hit rate: 99.7% (358K cached / 1K reads) [OK] Query cache efficiency: 83.4% (1M cached / 1M selects) [!!] Query cache prunes per day: 48301 [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 144K sorts) [OK] Temporary tables created on disk: 13% (27K on disk / 203K total) [OK] Thread cache hit rate: 99% (6 created / 33K connections) [!!] Table cache hit rate: 0% (32 open / 51K opened) [OK] Open file limit used: 1% (20/1K) [OK] Table locks acquired immediately: 99% (1M immediate / 1M locks) [!!] InnoDB data size / buffer pool: 29.2M/8.0M -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance Reduce your overall MySQL memory footprint for system stability Enable the slow query log to troubleshoot bad queries Increase table_cache gradually to avoid file descriptor limits Variables to adjust: *** MySQL's maximum memory usage is dangerously high *** *** Add RAM before increasing MySQL buffer variables *** query_cache_size (> 64M) table_cache (> 32) innodb_buffer_pool_size (>= 29M) and this is the config. # # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. bind-address = 127.0.0.1 # # * Fine Tuning # key_buffer = 32M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 sort_buffer_size = 4M read_buffer_size = 4M myisam_sort_buffer_size = 16M # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP max_connections = 100 table_cache = 32 tmp_table_size = 128M #thread_concurrency = 10 # # * Query Cache Configuration # #query_cache_limit = 1M query_cache_type = 1 query_cache_size = 64M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. # As of 5.1 you can enable the log at runtime! #general_log_file = /var/log/mysql/mysql.log #general_log = 1 log_error = /var/log/mysql/error.log # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log expire_logs_days = 10 max_binlog_size = 100M #binlog_do_db = include_database_name #binlog_ignore_db = include_database_name # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/ The site contains 1 wordpress site,so lots of MYISAM but mostly static content as its not changing all that often (A wordpress cache plugin deals with this). And the Magento Site which consists of a lot of InnoDB tables, some MyISAM and some INMEMORY. The "read" side seems to be running pretty well with a mass of optimizations I've used on Magento, the NGINX setup and PHP-FPM + XCACHE. I'd love to have a kick in the right direction with the MySQL config so I'm not blindly altering it based on the MySQLTuner without understanding what I'm changing. Thanks

Read the article

Server Memory with Magento

- by Mohamed Elgharabawy

I have a cloud server with the following specifications: 2vCPUs 4G RAM 160GB Disk Space Network 400Mb/s System Image: Ubuntu 12.04 LTS I am only running Magento CE 1.7.0.2 on this server. Nothing else. Usually, the server has a loading time of 4-5 seconds. Recently, this has dropped to over 30 seconds and sometimes the server just goes away and I get HTTP error reports to my email stating that HTTP requests took more than 20000ms. Running top command and sorting them returns the following: top - 15:29:07 up 3:40, 1 user, load average: 28.59, 25.95, 22.91 Tasks: 112 total, 30 running, 82 sleeping, 0 stopped, 0 zombie Cpu(s): 90.2%us, 9.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.2%st PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 31901 www-data 20 0 360m 71m 5840 R 7 1.8 1:39.51 apache2 32084 www-data 20 0 362m 72m 5548 R 7 1.8 1:31.56 apache2 32089 www-data 20 0 348m 59m 5660 R 7 1.5 1:41.74 apache2 32295 www-data 20 0 343m 54m 5532 R 7 1.4 2:00.78 apache2 32303 www-data 20 0 354m 65m 5260 R 7 1.6 1:38.76 apache2 32304 www-data 20 0 346m 56m 5544 R 7 1.4 1:41.26 apache2 32305 www-data 20 0 348m 59m 5640 R 7 1.5 1:50.11 apache2 32291 www-data 20 0 358m 69m 5256 R 6 1.7 1:44.26 apache2 32517 www-data 20 0 345m 56m 5532 R 6 1.4 1:45.56 apache2 30473 www-data 20 0 355m 66m 5680 R 6 1.7 2:00.05 apache2 32093 www-data 20 0 352m 63m 5848 R 6 1.6 1:53.23 apache2 32302 www-data 20 0 345m 56m 5512 R 6 1.4 1:55.87 apache2 32433 www-data 20 0 346m 57m 5500 S 6 1.4 1:31.58 apache2 32638 www-data 20 0 354m 65m 5508 R 6 1.6 1:36.59 apache2 32230 www-data 20 0 347m 57m 5524 R 6 1.4 1:33.96 apache2 32231 www-data 20 0 355m 66m 5512 R 6 1.7 1:37.47 apache2 32233 www-data 20 0 354m 64m 6032 R 6 1.6 1:59.74 apache2 32300 www-data 20 0 355m 66m 5672 R 6 1.7 1:43.76 apache2 32510 www-data 20 0 347m 58m 5512 R 6 1.5 1:42.54 apache2 32521 www-data 20 0 348m 59m 5508 R 6 1.5 1:47.99 apache2 32639 www-data 20 0 344m 55m 5512 R 6 1.4 1:34.25 apache2 32083 www-data 20 0 345m 56m 5696 R 5 1.4 1:59.42 apache2 32085 www-data 20 0 347m 58m 5692 R 5 1.5 1:42.29 apache2 32293 www-data 20 0 353m 64m 5676 R 5 1.6 1:52.73 apache2 32301 www-data 20 0 348m 59m 5564 R 5 1.5 1:49.63 apache2 32528 www-data 20 0 351m 62m 5520 R 5 1.6 1:36.11 apache2 31523 mysql 20 0 3460m 576m 8288 S 5 14.4 2:06.91 mysqld 32002 www-data 20 0 345m 55m 5512 R 5 1.4 2:01.88 apache2 32080 www-data 20 0 357m 68m 5512 S 5 1.7 1:31.30 apache2 32163 www-data 20 0 347m 58m 5512 S 5 1.5 1:58.68 apache2 32509 www-data 20 0 345m 56m 5504 R 5 1.4 1:49.54 apache2 32306 www-data 20 0 358m 68m 5504 S 4 1.7 1:53.29 apache2 32165 www-data 20 0 344m 55m 5524 S 4 1.4 1:40.71 apache2 32640 www-data 20 0 345m 56m 5528 R 4 1.4 1:36.49 apache2 31888 www-data 20 0 359m 70m 5664 R 4 1.8 1:57.07 apache2 32511 www-data 20 0 357m 67m 5512 S 3 1.7 1:47.00 apache2 32054 www-data 20 0 357m 68m 5660 S 2 1.7 1:53.10 apache2 1 root 20 0 24452 2276 1232 S 0 0.1 0:01.58 init Moreover, running free -m returns the following: total used free shared buffers cached Mem: 4003 3919 83 0 118 901 -/+ buffers/cache: 2899 1103 Swap: 0 0 0 To investigate this further, I have installed apache buddy, it recommeneded that I need to reduce the maxclient connections. Which I did. I also installed MysqlTuner and it suggests that I need to set my innodb_buffer_pool_size to = 3.0G. However, I cannot do that, since the whole memory is 4G. Here is the output from apache buddy: ### GENERAL REPORT ### Settings considered for this report: Your server's physical RAM: 4003MB Apache's MaxClients directive: 40 Apache MPM Model: prefork Largest Apache process (by memory): 73.77MB [ OK ] Your MaxClients setting is within an acceptable range. Max potential memory usage: 2950.8 MB Percentage of RAM allocated to Apache 73.72 % And this is the output of MySQLTuner: -------- Performance Metrics ------------------------------------------------- [--] Up for: 47m 22s (675K q [237.552 qps], 12K conn, TX: 1B, RX: 300M) [--] Reads / Writes: 45% / 55% [--] Total buffers: 2.1G global + 2.7M per thread (151 max threads) [OK] Maximum possible memory usage: 2.5G (64% of installed RAM) [OK] Slow queries: 0% (0/675K) [OK] Highest usage of available connections: 26% (40/151) [OK] Key buffer size / total MyISAM indexes: 36.0M/18.7M [OK] Key buffer hit rate: 100.0% (245K cached / 105 reads) [OK] Query cache efficiency: 92.5% (500K cached / 541K selects) [!!] Query cache prunes per day: 302886 [OK] Sorts requiring temporary tables: 0% (1 temp sorts / 15K sorts) [!!] Joins performed without indexes: 12135 [OK] Temporary tables created on disk: 25% (8K on disk / 32K total) [OK] Thread cache hit rate: 90% (1K created / 12K connections) [!!] Table cache hit rate: 17% (400 open / 2K opened) [OK] Open file limit used: 12% (123/1K) [OK] Table locks acquired immediately: 100% (196K immediate / 196K locks) [!!] InnoDB buffer pool / data size: 2.0G/3.5G [OK] InnoDB log waits: 0 -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance MySQL started within last 24 hours - recommendations may be inaccurate Enable the slow query log to troubleshoot bad queries Adjust your join queries to always utilize indexes Increase table_cache gradually to avoid file descriptor limits Read this before increasing table_cache over 64: http://bit.ly/1mi7c4C Variables to adjust: query_cache_size ( 64M) join_buffer_size ( 128.0K, or always use indexes with joins) table_cache ( 400) innodb_buffer_pool_size (= 3G) Last but not least, the server still has more than 60% of free disk space. Now, based on the above, I have few questions: Are these numbers normal? Do they make sense? Do I need to upgrade the server? If I don't need to upgrade and my configuration is not correct, how do I optimize it?

Read the article

CodePlex Daily Summary for Sunday, April 15, 2012

CodePlex Daily Summary for Sunday, April 15, 2012Popular ReleasesAssaultCube Reloaded: 2.4.1 Valor: POSSIBLE KNIFE CRASH FIX Codename Valor as suggested by LMFAO! on the forums Weapon tweaks Linux has Ubuntu 11.10 32-bit precompiled binaries and Ubuntu 10.10 64-bit precompiled binaries, but you can compile your own as it also contains the source. If you are using Mac or other operating systems, download the Linux package. The server pack is ready for both Windows and Linux, but you might need to compile your own for linux (source included)callisto: callisto 2.0.25: removed 2 ip ranges from hotspot shield black list.KBCsv: KBCsv V1.4.0.0: #11872 (skipping records with delimited text can break parser) #11873 (globalization support in CsvWriter) #11185 (more versatile constructor and method overloads)National Geographic Photo of the Day Wallpaper Changer: Photo of the Day Wallpaper Changer v2.0: National Geographic - Photo of the Day Wallpaper Changer v2.0 is an improved version. It has some new features like improved GUI, automatic update and date to date photo archiver etc. please check out the user guide for more information. Please copy the exe in a directory and run. Its that simple to use :).Coding4Fun Tools: Coding4Fun.Phone.Toolkit v1.5.6: Bug Fixes nuget was broken for Timespan and complete build due to how i did the target. Corrected and made all 3 match. Color Slider (and by default Color Picker) didn't respect view state for being disabled depending on how it was set. Test application now has test cases.Json.NET: Json.NET 4.5 Release 3: Change - DefaultContractResolver.IgnoreSerializableAttribute is now true by default Fix - Fixed MaxDepth on JsonReader recursively throwing an error Fix - Fixed SerializationBinder.BindToName not being called with full assembly namesVisual Studio Team Foundation Server Branching and Merging Guide: v2 - For Visual Studio 11: Welcome to the BETA of the Branching and Merging Guide preview As this is a BETA release and the quality bar for the final Release has not been achieved, we value your candid feedback and recommend that you do not use or deploy these BETA artifacts in a production environment. Quality-Bar Details Documentation has been reviewed by Visual Studio ALM Rangers Documentation has not been through an independent technical review Documentation has been reviewed by the quality and recording te...Media Companion: MC 3.435b Release: This release should be the last beta for 3.4xx. A handful of problems have been sorted out since last weeks release. If there are no major problems this time, it will upgraded to 3.500 Stable at the end of the week! General The .NET Framework has been modified to use the Client profile, as provided by normal Windows updates; no longer is there a requirement to download and install the Full profile! mc_com.exe has been worked on to mimic proper Media Companion output (a big thanks to vbat99...Wholemy.LinkedLists: wholemy.linkedlists.2012.04.12.38: libs and srcTHE NVL Maker: The NVL Maker Ver 3.12: SIM??????，TRA??????，ZIP????。 ????????????????，??????~（??????????????????） ??????? simpatch1440x900 trapatch1440x900 ?????1400x900??1440x900，?????????????Data.xp3。 ???? ?????3.12?EXE????????????????， ??????????????，??Tool/krkrconf.exe，??Editor.exe， ???????????????「??????」。 ?????Editor.exe??????。 ???? ???? http://etale.us/gameupload/THE_NVL_Maker_ver3.12_sim.zip ???? http://www.mediafire.com/?je51683g22bz8vo ??Infinite Creation?? http://bbs.etale.us/forum.php ?????? ???? 3.12 ??? ???、????...Quick Performance Monitor: Version 1.8.2: Version 1.8.2. Add the ability for qpmset files to also store the Window location/size so predefined 'sets' can be forced to always open on the same place of the screen.SnmpMessenger: 0.1.1.1: Project Description SnmpMessenger, a messenger. Using the SNMP protocol to exchange messages. It's developed in C#. SnmpMessenger For .Net 4.0, Mono 2.8. Support SNMP V1, V2, V3. Features Send get, set and other requests and get the response. Send and receive traps. Handle requests and return the response. Note This library is compliant with the Common Language Specification(CLS). The latest version is 0.1.1.1. It is only a messenger, does not involve VACM. Any problems, Please mailto: wa...Python Tools for Visual Studio: 1.1.1: We’re pleased to announce the release of Python Tools for Visual Studio 1.1.1. Python Tools for Visual Studio (PTVS) is an open-source plug-in for Visual Studio which supports programming with the Python language. PTVS supports a broad range of features including: • Supports CPython and IronPython • Python editor with advanced member and signature intellisense • Code navigation: “Find all refs”, goto definition, and object browser • Local and remote debugging • Profiling with multiple view...Supporting Guidance and Whitepapers: v1 - Team Foundation Service Whitepapers: Welcome to the BETA release of the Team Foundation Service Whitepapers preview As this is a BETA release and the quality bar for the final Release has not been achieved, we value your candid feedback and recommend that you do not use or deploy these BETA artifacts in a production environment. Quality-Bar Details Documentation has been reviewed by Visual Studio ALM Rangers Documentation has been through an independent technical review All critical bugs have been resolved Known Issue...LINQ to Twitter: LINQ to Twitter Beta v2.0.24: Supports .NET 3.5, .NET 4.0, Silverlight 4.0, Windows Phone 7.1, and Client Profile. 100% Twitter API coverage. Also available via NuGet.Kendo UI ASP.NET Sample Applications: Sample Applications (2012-04-11): Sample application(s) demonstrating the use of Kendo UI in ASP.NET applications.SCCM Client Actions Tool: SCCM Client Actions Tool v1.12: SCCM Client Actions Tool v1.12 is the latest version. It comes with following changes since last version: Improved WMI date conversion to be aware of timezone differences and DST. Fixed new version check. The tool is downloadable as a ZIP file that contains four files: ClientActionsTool.hta – The tool itself. Cmdkey.exe – command line tool for managing cached credentials. This is needed for alternate credentials feature when running the HTA on Windows XP. Cmdkey.exe is natively availab...Dual Browsing: Dual Browser: Please note the following: I setup the address bar temporarily to only accepts http:// .com addresses. Just type in the name of the website excluding: http://, www., and .com; (Ex: for www.youtube.com just type: youtube then click OK). The page splitter can be grabbed by holding down your left mouse button and move left or right. By right clicking on the page background, you can choose to refresh, go back a page and so on. Demo video: http://youtu.be/L7NTFVM3JUYLiberty: v3.2.0.1 Release 9th April 2012: Change Log-Fixed -Reach Fixed a bug where the object editor did not work on non-English operating systemsPath Copy Copy: 10.1: This release addresses the following work items: 11357 11358 11359 This release is a recommended upgrade, especially for users who didn't install the 10.0.1 version.New ProjectsADENA: This project consists in the development of a Graphical User Interface (GUI) for a proof assistant for Propositional Classical Logic based on Natural Deduction method.Alternity Warships Editor: This is a tool designed to easy the process of creating a new spaceship for Alternity using the Warship rules.BigBallz: Projeto de site de Bolões para campeonatos diversos. A princípio pensado para copa do mundo de futebol de 2010Crescent: bunch of mac scriptsDemoVasquez: Demo Proyecto 1Didrotuos: DidrotuosEat Out Advocate: Eat Out Advocate --------------------Kuttiflow: A simple implementation of Workflow for .NetLuan Van Cuoi Khoa: Lu?n van cu?i khoa ÐH Tây Nguyênmapaconwindowsphone: probar un mapa de bing mapsMovie Renamer: Is your movie collection a mess? No idea which movie is which? This tool easily renames movies based on the title and the year of the movie. Helps you sort out that movie collection! It will download from IMDB the closest matches based on the name of the file. Very simple written in WPF, so it will need the .NET 4 framework. At the moment you cannot set how you want to movie to be renamed, it will always be: "MovieTitle (Year).ext".Online Math Calculator: This is an Online Math Calculator. What this does is take your equation in the usual form and convert every variable to x, y, and every constant to a, b, c, d, e, etc... use wolfram|alpha and then convert back to the input format. This allows to input most equations.Orchard on Windows Azure with Dynamic Deploy: Dynamic Deploy is a cloud deployment platform. This project includes the Orchard source code that was used to create a Windows Azure build. The original source has not been modified. We have just added more themes and modules and modified web.config with a machine key. visit httppcvvpes: pcvvpesPesquisa de Satisfacao: PesquisaPit of Despair: An XNA 4.0 game in C# focused on learning to write overhead dungeon crawl games. Inpired from games such as Zelda, Wizardy, perhaps some original Final Fantasy.Prova Branquinho: Prova BranquinhoRibHat: RibHat is a framework for building websites, forums, blogs, and web-based information systems. It is a set of libraries that help the programmer to get rid from the immobility of CMS, obtaining a maximum level of customization.SetupWizard: a SetupWizard, install windows service, create IIS site, create database during installation.SharkOS: This operating system is the system Arkadia OS but with a GUI, it is created in c #, it will be fast and no lag.Shopping List: Shopping List is a simple WP7 application that enables tracking of items to buy when going for groceries.Sim Cricket: Cricket simulation game.. For cricket and C# fans.. Simple InterNET Daemon: Simple InterNET DaemonSteggy: Steganography project. SychevIgor Win8 Apps Source Code: SychevIgor Win8 Apps Source Codetest53768492156478: nothingTool to change monitor display frequency on HTPC: This is a small application that can change the monitors refresh-rate by simply running one of the applications for the desired refresh-rate. It was developed to make it easy to change the refresh rate, when launching external media players from XBMC and so on... And it was published here, so that others can see how easily it can be done.TravelSaver: Hajj Umrah USA, Travel SaverUpdateBot: UpdateBot is a GUI application that simplifies and automates the downloading and parsing of FileHippo.com's Update Checker result pages.

Read the article

Working with Timelines with LINQ to Twitter

- by Joe Mayo

When first working with the Twitter API, I thought that using SinceID would be an effective way to page through timelines. In practice it doesn’t work well for various reasons. To explain why, Twitter published an excellent document that is a must-read for anyone working with timelines: Twitter Documentation: Working with Timelines This post shows how to implement the recommended strategies in that document by using LINQ to Twitter. You should read the document in it’s entirety before moving on because my explanation will start at the bottom and work back up to the top in relation to the Twitter document. What follows is an explanation of SinceID, MaxID, and how they come together to help you efficiently work with Twitter timelines. The Role of SinceID Specifying SinceID says to Twitter, “Don’t return tweets earlier than this”. What you want to do is store this value after every timeline query set so that it can be reused on the next set of queries. The next section will explain what I mean by query set, but a quick explanation is that it’s a loop that gets all new tweets. The SinceID is a backstop to avoid retrieving tweets that you already have. Here’s some initialization code that includes a variable named sinceID that will be used to populate the SinceID property in subsequent queries: // last tweet processed on previous query set ulong sinceID = 210024053698867204; ulong maxID; const int Count = 10; var statusList = new List<status>(); Here, I’ve hard-coded the sinceID variable, but this is where you would initialize sinceID from whatever storage you choose (i.e. a database). The first time you ever run this code, you won’t have a value from a previous query set. Initially setting it to 0 might sound like a good idea, but what if you’re querying a timeline with lots of tweets? Because of the number of tweets and rate limits, your query set might take a very long time to run. A caveat might be that Twitter won’t return an entire timeline back to Tweet #0, but rather only go back a certain period of time, the limits of which are documented for individual Twitter timeline API resources. So, to initialize SinceID at too low of a number can result in a lot of initial tweets, yet there is a limit to how far you can go back. What you’re trying to accomplish in your application should guide you in how to initially set SinceID. I have more to say about SinceID later in this post. The other variables initialized above include the declaration for MaxID, Count, and statusList. The statusList variable is a holder for all the timeline tweets collected during this query set. You can set Count to any value you want as the largest number of tweets to retrieve, as defined by individual Twitter timeline API resources. To effectively page results, you’ll use the maxID variable to set the MaxID property in queries, which I’ll discuss next. Initializing MaxID On your first query of a query set, MaxID will be whatever the most recent tweet is that you get back. Further, you don’t know what MaxID is until after the initial query. The technique used in this post is to do an initial query and then use the results to figure out what the next MaxID will be. Here’s the code for the initial query: var userStatusResponse = (from tweet in twitterCtx.Status where tweet.Type == StatusType.User && tweet.ScreenName == "JoeMayo" && tweet.SinceID == sinceID && tweet.Count == Count select tweet) .ToList(); statusList.AddRange(userStatusResponse); // first tweet processed on current query maxID = userStatusResponse.Min( status => ulong.Parse(status.StatusID)) - 1; The query above sets both SinceID and Count properties. As explained earlier, Count is the largest number of tweets to return, but the number can be less. A couple reasons why the number of tweets that are returned could be less than Count include the fact that the user, specified by ScreenName, might not have tweeted Count times yet or might not have tweeted at least Count times within the maximum number of tweets that can be returned by the Twitter timeline API resource. Another reason could be because there aren’t Count tweets between now and the tweet ID specified by sinceID. Setting SinceID constrains the results to only those tweets that occurred after the specified Tweet ID, assigned via the sinceID variable in the query above. The statusList is an accumulator of all tweets receive during this query set. To simplify the code, I left out some logic to check whether there were no tweets returned. If the query above doesn’t return any tweets, you’ll receive an exception when trying to perform operations on an empty list. Yeah, I cheated again. Besides querying initial tweets, what’s important about this code is the final line that sets maxID. It retrieves the lowest numbered status ID in the results. Since the lowest numbered status ID is for a tweet we already have, the code decrements the result by one to keep from asking for that tweet again. Remember, SinceID is not inclusive, but MaxID is. The maxID variable is now set to the highest possible tweet ID that can be returned in the next query. The next section explains how to use MaxID to help get the remaining tweets in the query set. Retrieving Remaining Tweets Earlier in this post, I defined a term that I called a query set. Essentially, this is a group of requests to Twitter that you perform to get all new tweets. A single query might not be enough to get all new tweets, so you’ll have to start at the top of the list that Twitter returns and keep making requests until you have all new tweets. The previous section showed the first query of the query set. The code below is a loop that completes the query set: do { // now add sinceID and maxID userStatusResponse = (from tweet in twitterCtx.Status where tweet.Type == StatusType.User && tweet.ScreenName == "JoeMayo" && tweet.Count == Count && tweet.SinceID == sinceID && tweet.MaxID == maxID select tweet) .ToList(); if (userStatusResponse.Count > 0) { // first tweet processed on current query maxID = userStatusResponse.Min( status => ulong.Parse(status.StatusID)) - 1; statusList.AddRange(userStatusResponse); } } while (userStatusResponse.Count != 0 && statusList.Count < 30); Here we have another query, but this time it includes the MaxID property. The SinceID property prevents reading tweets that we’ve already read and Count specifies the largest number of tweets to return. Earlier, I mentioned how it was important to check how many tweets were returned because failing to do so will result in an exception when subsequent code runs on an empty list. The code above protects against this problem by only working with the results if Twitter actually returns tweets. Reasons why there wouldn’t be results include: if the first query got all the new tweets there wouldn’t be more to get and there might not have been any new tweets between the SinceID and MaxID settings of the most recent query. The code for loading the returned tweets into statusList and getting the maxID are the same as previously explained. The important point here is that MaxID is being reset, not SinceID. As explained in the Twitter documentation, paging occurs from the newest tweets to oldest, so setting MaxID lets us move from the most recent tweets down to the oldest as specified by SinceID. The two loop conditions cause the loop to continue as long as tweets are being read or a max number of tweets have been read. Logically, you want to stop reading when you’ve read all the tweets and that’s indicated by the fact that the most recent query did not return results. I put the check to stop after 30 tweets are reached to keep the demo from running too long – in the console the response scrolls past available buffer and I wanted you to be able to see the complete output. Yet, there’s another point to be made about constraining the number of items you return at one time. The Twitter API has rate limits and making too many queries per minute will result in an error from twitter that LINQ to Twitter raises as an exception. To use the API properly, you’ll have to ensure you don’t exceed this threshold. Looking at the statusList.Count as done above is rather primitive, but you can implement your own logic to properly manage your rate limit. Yeah, I cheated again. Summary Now you know how to use LINQ to Twitter to work with Twitter timelines. After reading this post, you have a better idea of the role of SinceID - the oldest tweet already received. You also know that MaxID is the largest tweet ID to retrieve in a query. Together, these settings allow you to page through results via one or more queries. You also understand what factors affect the number of tweets returned and considerations for potential error handling logic. The full example of the code for this post is included in the downloadable source code for LINQ to Twitter. @JoeMayo

Read the article

can anyone help how to get data from a plist, precisely inside the array

- by jix

Can anyone help me with getting data from this plist? I'm having trouble accessing the values of the three objects in the plist. i can see all the list of countries in my tableView, but i can't see the prices when i tap on a cell . any help please thanks MY PLIST <plist version="1.0"> <dict> <key>Afghanistan 3</key> <array> <string>RC $1.65</string> <string>CC $2.36</string> <string>EC 0</string> </array> <key>Albania 1</key> <array> <string>RC FREE</string> <string>CC $1.01</string> </array> <key>Algeria 2</key> <array> <string>RC $0.27</string> <string>CC $0.85</string> </array> <key>Andorra 2</key> <array> <string>RC FREE</string> <string>CC $0.93</string> also my code that i have implemented in xcode 4.5 . cc is the calling rate that is in item 0 in the plist rc is the receiving rate that is in item 1 in the plist ec is the extra rate that is in item 2 in the plist how can i see the cc ,rc, & ec each in a label when i click the cell in the next view controller ? MY CODE NSString *ratesFile = [[NSBundle mainBundle] pathForResource:@"rates" ofType:@"plist"]; rates = [[NSDictionary alloc]initWithContentsOfFile:ratesFile]; NSArray * dictionaryKeys = [rates allKeys]; name = [dictionaryKeys sortedArrayUsingSelector:@selector(compare:)]; cc = [rates objectForKey:@"Item 0"]; rc = [rates objectForKey:@"Item 1"]; ec = [rates objectForKey:@"Item 2"]; - (NSInteger)tableView:(UITableView *)tableView numberOfRowsInSection:(NSInteger)section { return [rates count]; } - (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath { static NSString *CellIdentifier = @"Cell"; UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:CellIdentifier]; if (cell == nil) { cell = [[UITableViewCell alloc] initWithStyle:UITableViewCellStyleDefault reuseIdentifier:CellIdentifier]; cell.accessoryType = UITableViewCellAccessoryDisclosureIndicator; } NSString *countryName = [name objectAtIndex:indexPath.row]; cell.textLabel.text = countryName; } - (void)tableView:(UITableView *)tableView didSelectRowAtIndexPath:(NSIndexPath *)indexPath { NSString *ccRate = [cc objectAtIndex:indexPath.row]; if (!self.detailViewController) { self.detailViewController = [[DetailViewController alloc] initWithNibName:@"DetailViewController" bundle:nil]; } self.detailViewController.detailItem = ccRate; [self.navigationController pushViewController:self.detailViewController animated:YES]; } thank in advance

Read the article

Replace conditional with polymorphism refactoring or similar?

- by Anders Svensson

Hi, I have tried to ask a variant of this question before. I got some helpful answers, but still nothing that felt quite right to me. It seems to me this shouldn't really be that hard a nut to crack, but I'm not able to find an elegant simple solution. (Here's my previous post, but please try to look at the problem stated here as procedural code first so as not to be influenced by the earlier explanation which seemed to lead to very complicated solutions: http://stackoverflow.com/questions/2772858/design-pattern-for-cost-calculator-app ) Basically, the problem is to create a calculator for hours needed for projects that can contain a number of services. In this case "writing" and "analysis". The hours are calculated differently for the different services: writing is calculated by multiplying a "per product" hour rate with the number of products, and the more products are included in the project, the lower the hour rate is, but the total number of hours is accumulated progressively (i.e. for a medium-sized project you take both the small range pricing and then add the medium range pricing up to the number of actual products). Whereas for analysis it's much simpler, it is just a bulk rate for each size range. How would you be able to refactor this into an elegant and preferably simple object-oriented version (please note that I would never write it like this in a purely procedural manner, this is just to show the problem in another way succinctly). I have been thinking in terms of factory, strategy and decorator patterns, but can't get any to work well. (I read Head First Design Patterns a while back, and both the decorator and factory patterns described have some similarities to this problem, but I have trouble seeing them as good solutions as stated there. The decorator example seems very complicated for just adding condiments, but maybe it could work better here, I don't know. And the factory pattern example with the pizza factory...well it just seems to create such a ridiculous explosion of classes, at least in their example. I have found good use for factory patterns before, but I can't see how I could use it here without getting a really complicated set of classes) The main goal would be to only have to change in one place (loose coupling etc) if I were to add a new parameter (say another size, like XSMALL, and/or another service, like "Administration"). Here's the procedural code example: public class Conditional { private int _numberOfManuals; private string _serviceType; private const int SMALL = 2; private const int MEDIUM = 8; public int GetHours() { if (_numberOfManuals <= SMALL) { if (_serviceType == "writing") return 30 * _numberOfManuals; if (_serviceType == "analysis") return 10; } else if (_numberOfManuals <= MEDIUM) { if (_serviceType == "writing") return (SMALL * 30) + (20 * _numberOfManuals - SMALL); if (_serviceType == "analysis") return 20; } else //i.e. LARGE { if (_serviceType == "writing") return (SMALL * 30) + (20 * (MEDIUM - SMALL)) + (10 * _numberOfManuals - MEDIUM); if (_serviceType == "analysis") return 30; } return 0; //Just a default fallback for this contrived example } } All replies are appreciated! I hope someone has a really elegant solution to this problem that I actually thought from the beginning would be really simple... Regards, Anders

Read the article

Why is there a Null Pointer Exception in this Java Code?

- by algorithmicCoder

This code takes in users and movies from two separate files and computes a user score for a movie. When i run the code I get the following error: Exception in thread "main" java.lang.NullPointerException at RecommenderSystem.makeRecommendation(RecommenderSystem.java:75) at RecommenderSystem.main(RecommenderSystem.java:24) I believe the null pointer exception is due to an error in this particular class but I can't spot it....any thoughts? import java.io.*; import java.lang.Math; public class RecommenderSystem { private Movie[] m_movies; private User[] m_users; /** Parse the movies and users files, and then run queries against them. */ public static void main(String[] argv) throws FileNotFoundException, ParseError, RecommendationError { FileReader movies_fr = new FileReader("C:\\workspace\\Recommender\\src\\IMDBTop10.txt"); FileReader users_fr = new FileReader("C:\\workspace\\Recommender\\src\\IMDBTop10-users.txt"); MovieParser mp = new MovieParser(movies_fr); UserParser up = new UserParser(users_fr); Movie[] movies = mp.getMovies(); User[] users = up.getUsers(); RecommenderSystem rs = new RecommenderSystem(movies, users); System.out.println("Alice would rate \"The Shawshank Redemption\" with at least a " + rs.makeRecommendation("The Shawshank Redemption", "asmith")); System.out.println("Carol would rate \"The Dark Knight\" with at least a " + rs.makeRecommendation("The Dark Knight", "cd0")); } /** Instantiate a recommender system. * * @param movies An array of Movie that will be copied into m_movies. * @param users An array of User that will be copied into m_users. */ public RecommenderSystem(Movie[] movies, User[] users) throws RecommendationError { m_movies = movies; m_users = users; } /** Suggest what the user with "username" would rate "movieTitle". * * @param movieTitle The movie for which a recommendation is made. * @param username The user for whom the recommendation is made. */ public double makeRecommendation(String movieTitle, String username) throws RecommendationError { int userNumber; int movieNumber; int j=0; double weightAvNum =0; double weightAvDen=0; for (userNumber = 0; userNumber < m_users.length; ++userNumber) { if (m_users[userNumber].getUsername().equals(username)) { break; } } for (movieNumber = 0; movieNumber < m_movies.length; ++movieNumber) { if (m_movies[movieNumber].getTitle().equals(movieTitle)) { break; } } // Use the weighted average algorithm here (don't forget to check for // errors). while(j<m_users.length){ if(j!=userNumber){ weightAvNum = weightAvNum + (m_users[j].getRating(movieNumber)- m_users[j].getAverageRating())*(m_users[userNumber].similarityTo(m_users[j])); weightAvDen = weightAvDen + (m_users[userNumber].similarityTo(m_users[j])); } j++; } return (m_users[userNumber].getAverageRating()+ (weightAvNum/weightAvDen)); } } class RecommendationError extends Exception { /** An error for when something goes wrong in the recommendation process. * * @param s A string describing the error. */ public RecommendationError(String s) { super(s); } }

Read the article

Python Class which checks input before passing to console (C) program

- by Joseph Melettukunnel

Hello, We are asked to write a web-frontend (in python) for a very complex (and old) console application, written in C. Since we have no access to the C Source Code, and we assume that there might be some unsafe methods, we'd like to check the input which will the passed to the console application. WebClient - Python Module - Console Application Do you have any suggestions or tips what we should check for? Right now we are only limiting the string length and filtering some (program specific) unallowed keywords. Thanks, Joseph EDIT: Will remove strings like %s because of format string attacks

Read the article

Does an app name have to be unique between the iPhone and iPad app store?

- by Owen

If there was an iPhone app called 'gears' could I release a app for iPad called 'gears'? Also, has anyone heard if Apple will move towards limiting unique names to a category so it would be possible e.g. 'gears' the game and 'gears' the productivity app? App naming seems like a dark art. Any pointers to resources on Apple's stance would also be handy.

Read the article

Background colour for H3 extendind more than the content

- by Aruna

hi, i am having a tag where i have added a Css for it as #jsn-maincontent_inner h3 { background-color:#BBB1A5; color:white; padding:3px 8px; text-transform:uppercase; } but the background color extends for the whole row of the line and it is not limiting upto the content . HOw to resolve this??

Read the article

Jquery add elipse (...) and truncate text based on height not width.

- by Jigs

I have found many ways to measure width and to truncate text using jquery, but I can't find one based on height. I am limited to two lines of text or it will knock my design out of canter. So does anyone know a method of limiting a paragraph to two lines high, tuncating it and adding an elipse ?

Read the article

Do you still limit line length in code?

- by Noldorin

This is a matter on which I would like to gauge the opinion of the community: Do you still limit the length of lines of code to a fixed maximum? This was certainly a convention of the past for many languages; one would typically cap the number of characters per line to a value such as 80 (and more recnetly 100 or 120 I believe). As far as I understand, the primary reasons for limiting line length are: Readability - You don't have to scroll over horizontally when you want to see the end of some lines. Printing - Admittedly (at least in my experience), most code that you are working on does not get printed out on paper, but by limiting the number of characters you can insure that formatting doesn't get messed up when printed. Past editors (?) - Not sure about this one, but I suspect that at some point in the distant past of programming, (at least some) text editors may have been based on a fixed-width buffer. I'm sure there are points that I am still missing out, so feel free to add to these... Now, when I tend to observe C or C# code nowadays, I often see a number of different styles, the main ones being: Line length capped to 80, 100, or even 120 characters. As far as I understand, 80 is the traditional length, but the longer ones of 100 and 120 have appeared because of the widespread use of high resolutions and widescreen monitors nowadays. No line length capping at all. This tends to be pretty horrible to read, and I don't see it too often, though it's certainly not too rare either. Inconsistent capping of line length. The length of some lines are limited to a fixed maximum (or even a maximum that changes depending on the file/location in code), while others (possibly comments) are not at all. My personal preference here (at least recently) has been to cap the line length to 100 in the Visual Studio editor. This means that in a decently sized window (on a non-widescreen monitor), the ends of lines are still fully visible. I can however see a few disadvantages in this, especially when you end up writing code that's indented 3 or 4 levels and then having to include a long string literal - though I often take this as a sign to refactor my code! In particular, I am curious what the C and C# coders (or anyone who uses Visual Studio for that matter) think about this point, though I would be interested in hearing anyone's thoughts on the subject. Edit Thanks for the all answers - I appreciate the variety of opinions here, all presenting sound reasons. Consensus does seem to be tipping in the direction of always (or almost always) limit the line length. Interestingly, it seems to be in various coding standards to limit the line length. Judging by some of the answers, both the Python and Google CPP guidelines set the limit at 80 chars. I haven't seen anything similar regarding C# or VB.NET, but I would be curious to see if there are ones anywhere.

Search Results

Search found 2899 results on 116 pages for 'rate limiting'.

Page 40/116 | < Previous Page | 36 37 38 39 40 41 42 43 44 45 46 47 | Next Page >

- by user888734

- by Nick

- by VxJasonxV

- by Carnotaurus

- by gunshor

- by Philipp

- by MrCranky

- by Richard

- by Luke

- by Junier

- by Junier

- by Alan Smith

- by user12620111

- by Chris M

- by Mohamed Elgharabawy

- by Joe Mayo

- by jix

- by Anders Svensson

- by algorithmicCoder

- by Joseph Melettukunnel

- by Owen

- by Aruna

- by Jigs

- by Noldorin

< Previous Page | 36 37 38 39 40 41 42 43 44 45 46 47 | Next Page >