Search Results

Search found 21331 results on 854 pages for 'require once'.

Page 82/854 | < Previous Page | 78 79 80 81 82 83 84 85 86 87 88 89 | Next Page >

RhinoMocks: how to test if method was called when using PartialMock

- by epitka

I have a clas that is something like this public class MyClass { public virtual string MethodA(Command cmd) { //some code here} public void MethodB(SomeType obj) { // do some work MethodA(command); } } I mocked MyClass as PartialMock (mocks.PartialMock<MyClass>) and I setup expectation for MethodA var cmd = new Command(); //set the cmd to expected state Expect.Call(MyClass.MethodA(cmd)).Repeat.Once(); Problem is that MethodB calls actual implementation of MethodA instead of mocking it up. I must be doing something wrong (not very experienced with RhinoMocks). How do I force it to mock MetdhodA? Here is the actual code: var cmd = new SetBaseProductCoInsuranceCommand(); cmd.BaseProduct = planBaseProduct; var insuredType = mocks.DynamicMock<InsuredType>(); Expect.Call(insuredType.Code).Return(InsuredTypeCode.AllInsureds); cmd.Values.Add(new SetBaseProductCoInsuranceCommand.CoInsuranceValues() { CoInsurancePercent = 0, InsuredType = insuredType, PolicySupplierType = ppProvider }); Expect.Call(() => service.SetCoInsurancePercentages(cmd)).Repeat.Once(); mocks.ReplayAll(); //act service.DefaultCoInsurancesFor(planBaseProduct); //assert service.AssertWasCalled(x => x.SetCoInsurancePercentages(cmd),x=>x.Repeat.Once());

Read the article
Redis Cookbook Chat Recipe

- by Tommy Kennedy

I am a new starter to Node.Js and Redis. I got the Redis cookbook and was trying out the Chat client & Server recipe. I was wondering if anybody got the code to work or if there is some bug in the code. I dont see where the sent messages from the client get invoked on the server. Any help would be great. Regards, Tom Client Code: <?php ?> <html> <head> <title></title> <script src="http://192.168.0.118:8000/socket.io/socket.io.js"></script> <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js"></script> <script> var socket = io.connect('192.168.0.118',{port:8000}); socket.on('message', function(data){ alert(data); //var li = new Element('li').insert(data); //$('messages').insert({top: li}); }); </script> </head> <body> <ul id="messages">  </ul> <form id="chatform" action=""> <input id="chattext" type="text" value="" /> <input type="submit" value="Send" /> </form> <script> $('#chatform').submit(function() { socket.emit('message', 'test'); //$('chattext').val()); $('chattext').val(""); // cleanup the field return false; }); </script> </body> </html> Server Code: var http = require('http'); io = require('socket.io'); redis = require('redis'); rc = redis.createClient(); //rc1 = redis.createClient(); rc.on("connect",function(){ rc.subscribe("chat"); console.log("In Chat Stream"); }); rc.on("message",function (channel,message){ console.log("Sending hope: " + message); //rc1.publish("chat","hope"); socketio.sockets.emit('message',message); }); server = http.createServer(function(req,res){ res.writeHead(200,{'content-type':'text/html'}); res.end('<h1>hello world</h1>'); }); server.listen(8000); var socketio = io.listen(server);

Read the article
Having trouble parsing XML with jQuery

- by Jack

Hi Guys, I'm trying to parse some XML data using jQuery, and as it stands I have extracted the 'ID' attribute of the required nodes and stored them in an array, and now I want to run a loop for each array member and eventually grab more attributes from the notes specific to each ID. The problem currently is that once I get to the 'for' loop, it isn't looping, and I think I may have written the xml path data incorrectly. It runs once and I recieve the 'alert(arrayIds.length);' only once, and it only loops the correct amount of times if I remove the subsequent xml path code. Here is my function: var arrayIds = new Array(); $(document).ready(function(){ $.ajax({ type: "GET", url: "question.xml", dataType: "xml", success: function(xml) { $(xml).find("C").each(function(){ $("#attr2").append($(this).attr('ID') + "<br />"); arrayIds.push($(this).attr('ID')); }); for (i=0; i<arrayIds.length; i++) { alert(arrayIds.length); $(xml).find("C[ID='arrayIds[i]']").(function(){ // pass values alert('test'); }); } } }); }); Any ideas?

Read the article
Facebook iFrame APP not working in IE, works on every other browser

- by Sean Ashmore

So im getting a blank page when loading this page within an iFrame on Internet explorer, every other browser works fine.. I have also tried using p3p headers as other people have suggested, but to no avail. <?php require ("connect.php"); require ("config.php"); require ("fb_config.php"); ?> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <title>Login handler</title> <meta http-equiv="content-type" content="text/html; charset=iso-8859-1"> <link rel="stylesheet" href="css/login.css" type="text/css"> </head> <body> <?//=$user?> <?php if($user == 0) { echo "You are not logged into facebook. Nice try."; }else{ $query = "SELECT id,fb_id,login_ip,login_count,activated,sitestate FROM login WHERE fb_id='".mysql_real_escape_string($user)."'"; $result = mysql_query($query) or die(mysql_error()); $row = mysql_fetch_array($result); if (mysql_num_rows($result) == 0) { $sql = "INSERT INTO login SET id = '', fb_id ='" .mysql_real_escape_string($user). "', name = '" .rand(10000000000000000,99999999999999999999). "', signup =NOW() , password = '" .mysql_real_escape_string($pass). "', state = '0', mail = '" .mysql_real_escape_string($_POST['mail']). "',location='".mysql_real_escape_string($randomlocation)."',location_start='".mysql_real_escape_string($randomlocation)."', signup_ip='".mysql_real_escape_string($_SERVER['REMOTE_ADDR'])."',ref='".mysql_real_escape_string($_POST['ref'])."', activation_id = '" .mysql_real_escape_string($activation_link). "',activated='2', killprotection = '$twodayprot',gender='" .mysql_real_escape_string($_POST["gender"]). "'"; $res = mysql_query($sql); } //if($row['fb_id'] != $user){ //echo "Your facebook ID: $user is NOT in the MW DB."; //exit(); //}else{ if(empty($row['login_ip'])){ $row['login_ip'] = $_SERVER['REMOTE_ADDR']; }else{ $ip_information = explode("-", $row['login_ip']); if (in_array($_SERVER['REMOTE_ADDR'], $ip_information)) { $row['login_ip'] = $row['login_ip']; }else{ $row['login_ip'] = $row['login_ip']."-".$_SERVER['REMOTE_ADDR']; } } $update_login = mysql_query("UPDATE login SET login_count=login_count+'1' WHERE name='".mysql_real_escape_string($_POST['username'])."'") or die(mysql_error()); $_SESSION['user_id'] = $row['id']; $result = mysql_query("UPDATE login SET userip='".mysql_real_escape_string($_SERVER['REMOTE_ADDR'])."',login_ip='".mysql_real_escape_string($row['login_ip'])."',login_count='0' WHERE id='".mysql_real_escape_string($_SESSION['user_id'])."'") or die(mysql_error()); if ($row['sitestate'] == 0){ header("location: home.php"); } elseif ($row['sitestate'] == 2) { header("location: killed.php?id={$row['id']}&encrypted={$row['password']}"); } else { header("location: banned.php?id={$row['id']}&encrypted={$row['password']}"); } }// id check. ?> </body> </html>

Read the article
Generated images fail to load in browser

- by notJim

I've got a page on a webapp that has about 13 images that are generated by my application, which is written in the Kohana PHP framework. The images are actually graphs. They are cached so they are only generated once, but the first time the user visits the page, and the images all have to be generated, about half of the images don't load in the browser. Once the page has been requested once and images are cached, they all load successfully. Doing some ad-hoc testing, if I load an individual image in the browser, it takes from 450-700 ms to load with an empty cache (I checked this using Google Chrome's resource tracking feature). For reference, it takes around 90-150 ms to load a cached image. Even if the image cache is empty, I have the data and some of the application's startup tasks cached, so that after the first request, none of that data needs to be fetched. My questions are: Why are the images failing to load? It seems like the browser just decides not to download the image after a certain point, rather than waiting for them all to finish loading. What can I do to get them to load the first time, with an empty cache? Obviously one option is to decrease the load times, and I could figure out how to do that by profiling the app, but are there other options? As I mentioned, the app is in the Kohana PHP framework, and it's running on Apache. As an aside, I've solved this problem for now by fetching the page as soon as the data is available (it comes from a batch process), so that the images are always cached by the time the user sees them. That feels like a kludgey solution to me, though, and I'm curious about what's actually going on.

Read the article
Logback: Logging with two loggers

- by gammay

I would like to use slf4j+logback for two purposes in my application - log and audit. For logging, I log the normal way: static final Logger logger = LoggerFactory.getLogger(Main.class); logger.debug("-> main()"); For Audit, I create a special named logger and log to it: static final Logger logger = LoggerFactory.getLogger("AUDIT_LOGGER"); Object[] params = { new Integer(1) /* TenantID */, new Integer(10) /* UserID */, msg}; logger.info("{}|{}|{}", params); logback configuration: <logger name="AUDIT_LOGGER" level="info"> <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender"> <encoder> <pattern>%d{HH:mm:ss.SSS}|%msg%n </pattern> </encoder> </appender> </logger> <root level="all"> <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender"> <encoder> <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n </pattern> </encoder> </appender> </root> Problem: Messages logged through audit logger appear twice - once under the AUDIT_LOGGER and once under the root logger. 14:41:57.975 [main] DEBUG com.gammay.example.Main - - main() 14:41:57.978|1|10|welcome to main 14:41:57.978 [main] INFO AUDIT_LOGGER - 1|10|welcome to main How can I make sure audit messages appear only once under the audit logger?

Read the article
Setting WCF service for multiple client calls

- by user348255

Hi all, I have made a WCF service which is defined like this: [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Multiple)] binding is done using netTcpBinding. We support 50+ clients that call the server from time to time. Each client opens a channel using channelfactory once it is loaded and uses that channel for all calls (creates the channel and proxy only once). we have built a small load tester that imitates the client by calling the server by 50 different threads at once (using 50 different channels). when we run this tester, after the 10th client tries to connect, all other client fail connecting. We have set throttling to 100. My questions are: 1. is it correct for each client to create a channel and use it through the client life time? or, do i need to use a using statement for each call to the server (create and distroy a new channel for each call). 2. does the service have a limit of channel connections to it? other then throttling? thanks alot, Guy.

Read the article
Browser timing out attempting to load images

- by notJim

I've got a page on a webapp that has about 13 images that are generated by my application, which is written in the Kohana PHP framework. The images are actually graphs. They are cached so they are only generated once, but the first time the user visits the page, and the images all have to be generated, about half of the images don't load in the browser. Once the page has been requested once and images are cached, they all load successfully. Doing some ad-hoc testing, if I load an individual image in the browser, it takes from 450-700 ms to load with an empty cache (I checked this using Google Chrome's resource tracking feature). For reference, it takes around 90-150 ms to load a cached image. Even if the image cache is empty, I have the data and some of the application's startup tasks cached, so that after the first request, none of that data needs to be fetched. My questions are: Why are the images failing to load? It seems like the browser just decides not to download the image after a certain point, rather than waiting for them all to finish loading. What can I do to get them to load the first time, with an empty cache? Obviously one option is to decrease the load times, and I could figure out how to do that by profiling the app, but are there other options? As I mentioned, the app is in the Kohana PHP framework, and it's running on Apache. As an aside, I've solved this problem for now by fetching the page as soon as the data is available (it comes from a batch process), so that the images are always cached by the time the user sees them. That feels like a kludgey solution to me, though, and I'm curious about what's actually going on.

Read the article
How do I efficiently parse a CSV file in Perl?

- by Mike

I'm working on a project that involves parsing a large csv formatted file in Perl and am looking to make things more efficient. My approach has been to split() the file by lines first, and then split() each line again by commas to get the fields. But this suboptimal since at least two passes on the data are required. (once to split by lines, then once again for each line). This is a very large file, so cutting processing in half would be a significant improvement to the entire application. My question is, what is the most time efficient means of parsing a large CSV file using only built in tools? note: Each line has a varying number of tokens, so we can't just ignore lines and split by commas only. Also we can assume fields will contain only alphanumeric ascii data (no special characters or other tricks). Also, i don't want to get into parallel processing, although it might work effectively. edit It can only involve built-in tools that ship with Perl 5.8. For bureaucratic reasons, I cannot use any third party modules (even if hosted on cpan) another edit Let's assume that our solution is only allowed to deal with the file data once it is entirely loaded into memory. yet another edit I just grasped how stupid this question is. Sorry for wasting your time. Voting to close.

Read the article
Error in Android's clearCheck() for RadioGroup?

- by Manuel R. Ciosici

I'm having an issue with RadioGroup's clearChecked(). I'm displaying a multiple choice question to the user and after the user selects an answer I check the answer, give him some feedback and then move to the next question. In the process of moving to the next question I clearCheck on the RadioGroup. Can anyone explain to me why the onCheckedChanged method is called 3 times? Once when the change actually occurs (with the user changes), once when I clearCheck(with -1 as the selected id) and once in between (with the user changes again)? As far as I could tell the second trigger is provoked by clearCheck. Code below: private void checkAnswer(RadioGroup group, int checkedId){ // this makes sure it doesn't blow up when the check is cleared // also we don't check the answer when there is no answer if (checkedId == -1) return; if (group.getCheckedRadioButtonId() == -1) return; // check if correct answer if (checkedId == validAnswerId){ score++; this.giveFeedBack(feedBackType.GOOD); } else { this.giveFeedBack(feedBackType.BAD); } // allow for user to see feedback and move to next question h.postDelayed(this, 800); } private void changeToQuestion(int questionNumber){ if (questionNumber >= this.questionSet.size()){ // means we are past the question set // we're going to the score activity this.goToScoreActivity(); return; } //clearing the check gr.clearCheck(); // give change the feedback back to question imgFeedback.setImageResource(R.drawable.question_mark); //OTHER CODE HERE } and the run method looks like this public void run() { questionNumber++; changeToQuestion(questionNumber); }

Read the article
Multiple instances of the same Async task (Windows Phone)

- by Bart Teunissen

After googeling for ages, and reading some stuff about async task in books. I made a my first program with an async task in it. Only to find out, that i can only start one task. I want to run the task more then once. This is where i found out that that doesn't seem to work. to be a little bit clearer, here are some parts of my code: InitFunction(var); This is the Task itself public async Task InitFunction(string var) { _VarHandle = await _AdsClient.GetSymhandleByNameAsync(var); _Data = await _AdsClient.ReadAsync<T>(_VarHandle); _AdsClient.AddNotificationAsync<T>(_VarHandle, AdsTransmissionMode.OnChange, 1000, this); } This works like a charm when i execute the task only once.. But is there a possibility to run it multiple times. Something like this? InitFunction(var1); InitFunction(var2); InitFunction(var3); Because if i do this now (multiple tasks at once), the task it wants to start is still running, and it throws an exeption. if someone could help me with this, that would be awesome! ~ Bart

Read the article
Keeping video viewing statistics breakdown by video time in a database

- by Septagram

I need to keep a number of statistics about the videos being watched, and one of them is what parts of the video are being watched most. The design I came up with is to split the video into 256 intervals and keep the floating-point number of views for each of them. I receive the data as a number of intervals the user watched continuously. The problem is how to store them. There are two solutions I see. Row per every video segment Let's have a database table like this: CREATE TABLE `video_heatmap` ( `id` int(11) NOT NULL AUTO_INCREMENT, `video_id` int(11) NOT NULL, `position` tinyint(3) unsigned NOT NULL, `views` float NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `idx_lookup` (`video_id`,`position`) ) ENGINE=MyISAM Then, whenever we have to process a number of views, make sure there are the respective database rows and add appropriate values to the views column. I found out it's a lot faster if the existence of rows is taken care of first (SELECT COUNT(*) of rows for a given video and INSERT IGNORE if they are lacking), and then a number of update queries is used like this: UPDATE video_heatmap SET views = views + ? WHERE video_id = ? AND position >= ? AND position < ? This seems, however, a little bloated. The other solution I came up with is Row per video, update in transactions A table will look (sort of) like this: CREATE TABLE video ( id INT NOT NULL AUTO_INCREMENT, heatmap BINARY (4 * 256) NOT NULL, ... ) ENGINE=InnoDB Then, upon every time a view needs to be stored, it will be done in a transaction with consistent snapshot, in a sequence like this: If the video doesn't exist in the database, it is created. A row is retrieved, heatmap, an array of floats stored in the binary form, is converted into a form more friendly for processing (in PHP). Values in the array are increased appropriately and the array is converted back. Row is changed via UPDATE query. So far the advantages can be summed up like this: First approach Stores data as floats, not as some magical binary array. Doesn't require transaction support, so doesn't require InnoDB, and we're using MyISAM for everything at the moment, so there won't be any need to mix storage engines. (only applies in my specific situation) Doesn't require a transaction WITH CONSISTENT SNAPSHOT. I don't know what are the performance penalties of those. I already implemented it and it works. (only applies in my specific situation) Second approach Is using a lot less storage space (the first approach is storing video ID 256 times and stores position for every segment of the video, not to mention primary key). Should scale better, because of InnoDB's per-row locking as opposed to MyISAM's table locking. Might generally work faster because there are a lot less requests being made. Easier to implement in code (although the other one is already implemented). So, what should I do? If it wasn't for the rest of our system using MyISAM consistently, I'd go with the second approach, but currently I'm leaning to the first one. But maybe there are some reasons to favour one approach or another?

Read the article
HTML5: Rendering absolute images using canvas

- by Mark

I am experimenting with canvas as part of my HTML5 introduction. This constitutes as assignment work, but I am not asking for any help on the actual coursework at all. I am trying to write a rendering engine, but having no luck because once the image is drawn on canvas it looks very distorted, and not at the right dimensions of the image itself. I have made a animation engine that loads images into an array, and then iterates through them at a certain speed. This is not the problem, and I assume is not causing the issue as this was happening when I drawn an image to the canvas. I think this is natural behaviour for images to be scaled/skewed when the window is resized, so I conquered that by simply redrawing the whole thing once the window is resized. The images I am using are isometric, and drawn at a pixel level. Would this cause the distortion? It seems setting the dimensions on the drawImage() function are not working are all. I am using JavaScript for the manipulation and rendering of the canvas. I would normally try and work it out myself, but I do not have any time to ponder why because I have no idea why it is even scaling/skewing the image once it is drawn on the canvas. I cannot share the code for obvious reasons. I should also mention, the canvas's dimension is the total width of the viewport, as I am developing a game. My question is: Has anyone encountered this and how would I correct it? Thanks for your help.

Read the article
Detecting TCP dropout over an unreliable network

- by yx

I am doing some experimentation over an unreliable radio network (home brewed) using very rudimentary java socket programming to transfer messages back and forth between the end nodes. The setup is as follows: Node A --- Relay Node --- Node B One problem I am constantly running into is that somehow the connection drops out and neither Node A or B knows that the link is dead, and yet continues to transmit data. The TCP connection does not time out either. I have added in a heartbeat message that causes a timeout after a while, but I still would like to know what is the underlying cause of why TCP does not time out. Here are the options I am enabling when setting up a socket: channel.socket().setKeepAlive(false); channel.socket().setTrafficClass(0x08); // for max throughput This behavior is strange since it is totally different than when I have a wired network. On a wired network, I can simulate a disconnected connection by pulling out the ethernet cord, however, once I plug the cord back in, the connection becomes restablished and messages begin to be passed through once more. On the radio network, the connection is never reestablished and once it silently dies, the messages never resume. Is there some other unknown java implentation or setting for a socket that I can use, also, why am I seeing this behavior in the first place? And yes, before anyone says anything, I know TCP is not the preffered choice over an unreliable network, but in this case I wanted to ensure no packet loss.

Read the article
Coding style in .NET: whether to refactor into new method or not?

- by Dione

Hi As you aware, in .NET code-behind style, we already use a lot of function to accommodate those _Click function, _SelectedIndexChanged function etc etc. In our team there are some developer that make a function in the middle of .NET function, for example: public void Button_Click(object sender, EventArgs e) { some logic here.. some logic there.. DoSomething(); DoSomethingThere(); another logic here.. DoOtherSomething(); } private void DoSomething() { } private void DoSomethingThere() { } private void DoOtherSomething() { } public void DropDown_SelectedIndexChanged() { } public void OtherButton_Click() { } and the function listed above is only used once in that function and not used anywhere else in the page, or called from other part of the solution. They said it make the code more tidier by grouping them and extract them into additional sub-function. I can understand if the sub-function is use over and over again in the code, but if it is only use once, then I think it is not really a good idea to extract them into sub-function, as the code getting bigger and bigger, when you look into the page and trying to understand the logic or to debug by skimming through line by line, it will make you confused by jumping from main function to the sub-function then to main function and to sub-function again. I know this kind of grouping by method is better when you writing old ASP or Cold fusion style, but I am not sure if this kind of style is better for .NET or not. Question is: which is better when you developing .NET, is grouping similar logic into a sub-method better (although they only use once), or just put them together inside main function and add //explanation here on the start of the logic is better? Hope my question is clear enough. Thanks.

Read the article
Doing a global count of an object type (like Users), best practice?

- by user246114

Hi, I know keeping global counters is frowned upon in app engine. I am interested in getting some stats though, like once every 24 hours. For example, I'd like to count the number of User objects in the system once every 24 hours. So how do we do this? Do we simply keep a set of admin tool functions which do something like: SELECT FROM com.me.project.server.User; and just see what the size of the returned List is? This is kind of a bummer because the datastore would have to deserialize every single User instance to create the returned list, right? I could optimize this possibly by asking for only the keys to be returned, so the whole User object doesn't have to be deserialized. Then again, a global counter for # of users probably would create too much contention, because there probably won't be hundreds of signups a minute for the service I'm creating. How should we go about doing this? Getting my total number of users once a day is probably a pretty typical operation? Thank you

Read the article
Quick guide to Oracle IRM 11g: Classification design

- by Simon Thorpe

Quick guide to Oracle IRM 11g indexThis is the final article in the quick guide to Oracle IRM. If you've followed everything prior you will now have a fully functional and tested Information Rights Management service. It doesn't matter if you've been following the 10g or 11g guide as this next article is common to both. ContentsWhy this is the most important part... Understanding the classification and standard rights model Identifying business use cases Creating an effective IRM classification modelOne single classification across the entire businessA context for each and every possible granular use caseWhat makes a good context? Deciding on the use of roles in the context Reviewing the features and security for context roles Summary Why this is the most important part...Now the real work begins, installing and getting an IRM system running is as simple as following instructions. However to actually have an IRM technology easily protecting your most sensitive information without interfering with your users existing daily work flows and be able to scale IRM across the entire business, requires thought into how confidential documents are created, used and distributed. This article is going to give you the information you need to ask the business the right questions so that you can deploy your IRM service successfully. The IRM team here at Oracle have over 10 years of experience in helping customers and it is important you understand the following to be successful in securing access to your most confidential information. Whatever you are trying to secure, be it mergers and acquisitions information, engineering intellectual property, health care documentation or financial reports. No matter what type of user is going to access the information, be they employees, contractors or customers, there are common goals you are always trying to achieve.Securing the content at the earliest point possible and do it automatically. Removing the dependency on the user to decide to secure the content reduces the risk of mistakes significantly and therefore results a more secure deployment. K.I.S.S. (Keep It Simple Stupid) Reduce complexity in the rights/classification model. Oracle IRM lets you make changes to access to documents even after they are secured which allows you to start with a simple model and then introduce complexity once you've understood how the technology is going to be used in the business. After an initial learning period you can review your implementation and start to make informed decisions based on user feedback and administration experience. Clearly communicate to the user, when appropriate, any changes to their existing work practice. You must make every effort to make the transition to sealed content as simple as possible. For external users you must help them understand why you are securing the documents and inform them the value of the technology to both your business and them. Before getting into the detail, I must pay homage to Martin White, Vice President of client services in SealedMedia, the company Oracle acquired and who created Oracle IRM. In the SealedMedia years Martin was involved with every single customer and was key to the design of certain aspects of the IRM technology, specifically the context model we will be discussing here. Listening carefully to customers and understanding the flexibility of the IRM technology, Martin taught me all the skills of helping customers build scalable, effective and simple to use IRM deployments. No matter how well the engineering department designed the software, badly designed and poorly executed projects can result in difficult to use and manage, and ultimately insecure solutions. The advice and information that follows was born with Martin and he's still delivering IRM consulting with customers and can be found at www.thinkers.co.uk. It is from Martin and others that Oracle not only has the most advanced, scalable and usable document security solution on the market, but Oracle and their partners have the most experience in delivering successful document security solutions. Understanding the classification and standard rights model The goal of any successful IRM deployment is to balance the increase in security the technology brings without over complicating the way people use secured content and avoid a significant increase in administration and maintenance. With Oracle it is possible to automate the protection of content, deploy the desktop software transparently and use authentication methods such that users can open newly secured content initially unaware the document is any different to an insecure one. That is until of course they attempt to do something for which they don't have any rights, such as copy and paste to an insecure application or try and print. Central to achieving this objective is creating a classification model that is simple to understand and use but also provides the right level of complexity to meet the business needs. In Oracle IRM the term used for each classification is a "context". A context defines the relationship between.A group of related documents The people that use the documents The roles that these people perform The rights that these people need to perform their role The context is the key to the success of Oracle IRM. It provides the separation of the role and rights of a user from the content itself. Documents are sealed to contexts but none of the rights, user or group information is stored within the content itself. Sealing only places information about the location of the IRM server that sealed it, the context applied to the document and a few other pieces of metadata that pertain only to the document. This important separation of rights from content means that millions of documents can be secured against a single classification and a user needs only one right assigned to be able to access all documents. If you have followed all the previous articles in this guide, you will be ready to start defining contexts to which your sensitive information will be protected. But before you even start with IRM, you need to understand how your own business uses and creates sensitive documents and emails. Identifying business use cases Oracle is able to support multiple classification systems, but usually there is one single initial need for the technology which drives a deployment. This need might be to protect sensitive mergers and acquisitions information, engineering intellectual property, financial documents. For this and every subsequent use case you must understand how users create and work with documents, to who they are distributed and how the recipients should interact with them. A successful IRM deployment should start with one well identified use case (we go through some examples towards the end of this article) and then after letting this use case play out in the business, you learn how your users work with content, how well your communication to the business worked and if the classification system you deployed delivered the right balance. It is at this point you can start rolling the technology out further. Creating an effective IRM classification model Once you have selected the initial use case you will address with IRM, you need to design a classification model that defines the access to secured documents within the use case. In Oracle IRM there is an inbuilt classification system called the "context" model. In Oracle IRM 11g it is possible to extend the server to support any rights classification model, but the majority of users who are not using an application integration (such as Oracle IRM within Oracle Beehive) are likely to be starting out with the built in context model. Before looking at creating a classification system with IRM, it is worth reviewing some recognized standards and methods for creating and implementing security policy. A very useful set of documents are the ISO 17799 guidelines and the SANS security policy templates. First task is to create a context against which documents are to be secured. A context consists of a group of related documents (all top secret engineering research), a list of roles (contributors and readers) which define how users can access documents and a list of users (research engineers) who have been given a role allowing them to interact with sealed content. Before even creating the first context it is wise to decide on a philosophy which will dictate the level of granularity, the question is, where do you start? At a department level? By project? By technology? First consider the two ends of the spectrum... One single classification across the entire business Imagine that instead of having separate contexts, one for engineering intellectual property, one for your financial data, one for human resources personally identifiable information, you create one context for all documents across the entire business. Whilst you may have immediate objections, there are some significant benefits in thinking about considering this. Document security classification decisions are simple. You only have one context to chose from! User provisioning is simple, just make sure everyone has a role in the only context in the business. Administration is very low, if you assign rights to groups from the business user repository you probably never have to touch IRM administration again. There are however some obvious downsides to this model.All users in have access to all IRM secured content. So potentially a sales person could access sensitive mergers and acquisition documents, if they can get their hands on a copy that is. You cannot delegate control of different documents to different parts of the business, this may not satisfy your regulatory requirements for the separation and delegation of duties. Changing a users role affects every single document ever secured. Even though it is very unlikely a business would ever use one single context to secure all their sensitive information, thinking about this scenario raises one very important point. Just having one single context and securing all confidential documents to it, whilst incurring some of the problems detailed above, has one huge value. Once secured, IRM protected content can ONLY be accessed by authorized users. Just think of all the sensitive documents in your business today, imagine if you could ensure that only everyone you trust could open them. Even if an employee lost a laptop or someone accidentally sent an email to the wrong recipient, only the right people could open that file. A context for each and every possible granular use case Now let's think about the total opposite of a single context design. What if you created a context for each and every single defined business need and created multiple contexts within this for each level of granularity? Let's take a use case where we need to protect engineering intellectual property. Imagine we have 6 different engineering groups, and in each we have a research department, a design department and manufacturing. The company information security policy defines 3 levels of information sensitivity... restricted, confidential and top secret. Then let's say that each group and department needs to define access to information from both internal and external users. Finally add into the mix that they want to review the rights model for each context every financial quarter. This would result in a huge amount of contexts. For example, lets just look at the resulting contexts for one engineering group. Q1FY2010 Restricted Internal - Engineering Group 1 - Research Q1FY2010 Restricted Internal - Engineering Group 1 - Design Q1FY2010 Restricted Internal - Engineering Group 1 - Manufacturing Q1FY2010 Restricted External- Engineering Group 1 - Research Q1FY2010 Restricted External - Engineering Group 1 - Design Q1FY2010 Restricted External - Engineering Group 1 - Manufacturing Q1FY2010 Confidential Internal - Engineering Group 1 - Research Q1FY2010 Confidential Internal - Engineering Group 1 - Design Q1FY2010 Confidential Internal - Engineering Group 1 - Manufacturing Q1FY2010 Confidential External - Engineering Group 1 - Research Q1FY2010 Confidential External - Engineering Group 1 - Design Q1FY2010 Confidential External - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret Internal - Engineering Group 1 - Research Q1FY2010 Top Secret Internal - Engineering Group 1 - Design Q1FY2010 Top Secret Internal - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret External - Engineering Group 1 - Research Q1FY2010 Top Secret External - Engineering Group 1 - Design Q1FY2010 Top Secret External - Engineering Group 1 - Manufacturing Now multiply the above by 6 for each engineering group, 18 contexts. You are then creating/reviewing another 18 every 3 months. After a year you've got 72 contexts. What would be the advantages of such a complex classification model? You can satisfy very granular rights requirements, for example only an authorized engineering group 1 researcher can create a top secret report for access internally, and his role will be reviewed on a very frequent basis. Your business may have very complex rights requirements and mapping this directly to IRM may be an obvious exercise. The disadvantages of such a classification model are significant...Huge administrative overhead. Someone in the business must manage, review and administrate each of these contexts. If the engineering group had a single administrator, they would have 72 classifications to reside over each year. From an end users perspective life will be very confusing. Imagine if a user has rights in just 6 of these contexts. They may be able to print content from one but not another, be able to edit content in 2 contexts but not the other 4. Such confusion at the end user level causes frustration and resistance to the use of the technology. Increased synchronization complexity. Imagine a user who after 3 years in the company ends up with over 300 rights in many different contexts across the business. This would result in long synchronization times as the client software updates all your offline rights. Hard to understand who can do what with what. Imagine being the VP of engineering and as part of an internal security audit you are asked the question, "What rights to researchers have to our top secret information?". In this complex model the answer is not simple, it would depend on many roles in many contexts. Of course this example is extreme, but it highlights that trying to build many barriers in your business can result in a nightmare of administration and confusion amongst users. In the real world what we need is a balance of the two. We need to seek an optimum number of contexts. Too many contexts are unmanageable and too few contexts does not give fine enough granularity. What makes a good context? Good context design derives mainly from how well you understand your business requirements to secure access to confidential information. Some customers I have worked with can tell me exactly the documents they wish to secure and know exactly who should be opening them. However there are some customers who know only of the government regulation that requires them to control access to certain types of information, they don't actually know where the documents are, how they are created or understand exactly who should have access. Therefore you need to know how to ask the business the right questions that lead to information which help you define a context. First ask these questions about a set of documentsWhat is the topic? Who are legitimate contributors on this topic? Who are the authorized readership? If the answer to any one of these is significantly different, then it probably merits a separate context. Remember that sealed documents are inherently secure and as such they cannot leak to your competitors, therefore it is better sealed to a broad context than not sealed at all. Simplicity is key here. Always revert to the first extreme example of a single classification, then work towards essential complexity. If there is any doubt, always prefer fewer contexts. Remember, Oracle IRM allows you to change your mind later on. You can implement a design now and continue to change and refine as you learn how the technology is used. It is easy to go from a simple model to a more complex one, it is much harder to take a complex model that is already embedded in the work practice of users and try to simplify it. It is also wise to take a single use case and address this first with the business. Don't try and tackle many different problems from the outset. Do one, learn from the process, refine it and then take what you have learned into the next use case, refine and continue. Once you have a good grasp of the technology and understand how your business will use it, you can then start rolling out the technology wider across the business. Deciding on the use of roles in the context Once you have decided on that first initial use case and a context to create let's look at the details you need to decide upon. For each context, identify; Administrative rolesBusiness owner, the person who makes decisions about who may or may not see content in this context. This is often the person who wanted to use IRM and drove the business purchase. They are the usually the person with the most at risk when sensitive information is lost. Point of contact, the person who will handle requests for access to content. Sometimes the same as the business owner, sometimes a trusted secretary or administrator. Context administrator, the person who will enact the decisions of the Business Owner. Sometimes the point of contact, sometimes a trusted IT person. Document related rolesContributors, the people who create and edit documents in this context. Reviewers, the people who are involved in reviewing documents but are not trusted to secure information to this classification. This role is not always necessary. (See later discussion on Published-work and Work-in-Progress) Readers, the people who read documents from this context. Some people may have several of the roles above, which is fine. What you are trying to do is understand and define how the business interacts with your sensitive information. These roles obviously map directly to roles available in Oracle IRM. Reviewing the features and security for context roles At this point we have decided on a classification of information, understand what roles people in the business will play when administrating this classification and how they will interact with content. The final piece of the puzzle in getting the information for our first context is to look at the permissions people will have to sealed documents. First think why are you protecting the documents in the first place? It is to prevent the loss of leaking of information to the wrong people. To control the information, making sure that people only access the latest versions of documents. You are not using Oracle IRM to prevent unauthorized people from doing legitimate work. This is an important point, with IRM you can erect many barriers to prevent access to content yet too many restrictions and authorized users will often find ways to circumvent using the technology and end up distributing unprotected originals. Because IRM is a security technology, it is easy to get carried away restricting different groups. However I would highly recommend starting with a simple solution with few restrictions. Ensure that everyone who reasonably needs to read documents can do so from the outset. Remember that with Oracle IRM you can change rights to content whenever you wish and tighten security. Always return to the fact that the greatest value IRM brings is that ONLY authorized users can access secured content, remember that simple "one context for the entire business" model. At the start of the deployment you really need to aim for user acceptance and therefore a simple model is more likely to succeed. As time passes and users understand how IRM works you can start to introduce more restrictions and complexity. Another key aspect to focus on is handling exceptions. If you decide on a context model where engineering can only access engineering information, and sales can only access sales data. Act quickly when a sales manager needs legitimate access to a set of engineering documents. Having a quick and effective process for permitting other people with legitimate needs to obtain appropriate access will be rewarded with acceptance from the user community. These use cases can often be satisfied by integrating IRM with a good Identity & Access Management technology which simplifies the process of assigning users the correct business roles. The big print issue... Printing is often an issue of contention, users love to print but the business wants to ensure sensitive information remains in the controlled digital world. There are many cases of physical document loss causing a business pain, it is often overlooked that IRM can help with this issue by limiting the ability to generate physical copies of digital content. However it can be hard to maintain a balance between security and usability when it comes to printing. Consider the following points when deciding about whether to give print rights. Oracle IRM sealed documents can contain watermarks that expose information about the user, time and location of access and the classification of the document. This information would reside in the printed copy making it easier to trace who printed it. Printed documents are slower to distribute in comparison to their digital counterparts, so time sensitive information in printed format may present a lower risk. Print activity is audited, therefore you can monitor and react to users abusing print rights. Summary In summary it is important to think carefully about the way you create your context model. As you ask the business these questions you may get a variety of different requirements. There may be special projects that require a context just for sensitive information created during the lifetime of the project. There may be a department that requires all information in the group is secured and you might have a few senior executives who wish to use IRM to exchange a small number of highly sensitive documents with a very small number of people. Oracle IRM, with its very flexible context classification system, can support all of these use cases. The trick is to introducing the complexity to deliver them at the right level. In another article i'm working on I will go through some examples of how Oracle IRM might map to existing business use cases. But for now, this article covers all the important questions you need to get your IRM service deployed and successfully protecting your most sensitive information.

Read the article
GuestPost: Unit Testing Entity Framework (v1) Dependent Code using TypeMock Isolator

- by Eric Nelson

Time for another guest post (check out others in the series), this time bringing together the world of mocking with the world of Entity Framework. A big thanks to Moses for agreeing to do this. Unit Testing Entity Framework Dependent Code using TypeMock Isolator by Muhammad Mosa Introduction Unit testing data access code in my opinion is a challenging thing. Let us consider unit tests and integration tests. In integration tests you are allowed to have environmental dependencies such as a physical database connection to insert, update, delete or retrieve your data. However when performing unit tests it is often much more efficient and productive to remove environmental dependencies. Instead you will need to fake these dependencies. Faking a database (also known as mocking) can be relatively straight forward but the version of Entity Framework released with .Net 3.5 SP1 has a number of implementation specifics which actually makes faking the existence of a database quite difficult. Faking Entity Framework As mentioned earlier, to effectively unit test you will need to fake/simulate Entity Framework calls to the database. There are many free open source mocking frameworks that can help you achieve this but it will require additional effort to overcome & workaround a number of limitations in those frameworks. Examples of these limitations include: Not able to fake calls to non virtual methods Not able to fake sealed classes Not able to fake LINQ to Entities queries (replace database calls with in-memory collection calls) There is a mocking framework which is flexible enough to handle limitations such as those above. The commercially available TypeMock Isolator can do the job for you with less code and ultimately more readable unit tests. I’m going to demonstrate tackling one of those limitations using MoQ as my mocking framework. Then I will tackle the same issue using TypeMock Isolator. Mocking Entity Framework with MoQ One basic need when faking Entity Framework is to fake the ObjectContext. This cannot be done by passing any connection string. You have to pass a correct Entity Framework connection string that specifies CSDL, SSDL and MSL locations along with a provider connection string. Assuming we are going to do that, we’ll explore another limitation. The limitation we are going to face now is related to not being able to fake calls to non-virtual/overridable members with MoQ. I have the following repository method that adds an EntityObject (instance of a Blog entity) to Blogs entity set in an ObjectContext. public override void Add(Blog blog) { if(BlogContext.Blogs.Any(b=>b.Name == blog.Name)) { throw new InvalidOperationException("Blog with same name already exists!"); } BlogContext.AddToBlogs(blog); } The method does a very simple check that the name of the new Blog entity instance doesn’t exist. This is done through the simple LINQ query above. If the blog doesn’t already exist it simply adds it to the current context to be saved when SaveChanges of the ObjectContext instance (e.g. BlogContext) is called. However, if a blog with the same name exits, and exception (InvalideOperationException) will be thrown. Let us now create a unit test for the Add method using MoQ. [TestMethod] [ExpectedException(typeof(InvalidOperationException))] public void Add_Should_Throw_InvalidOperationException_When_Blog_With_Same_Name_Already_Exits() { //(1) We shouldn't depend on configuration when doing unit tests! But, //its a workaround to fake the ObjectContext string connectionString = ConfigurationManager .ConnectionStrings["MyBlogConnString"] .ConnectionString; //(2) Arrange: Fake ObjectContext var fakeContext = new Mock<MyBlogContext>(connectionString); //(3) Next Line will pass, as ObjectContext now can be faked with proper connection string var repo = new BlogRepository(fakeContext.Object); //(4) Create fake ObjectQuery<Blog>. Will be used to substitute MyBlogContext.Blogs property var fakeObjectQuery = new Mock<ObjectQuery<Blog>>("[Blogs]", fakeContext.Object); //(5) Arrange: Set Expectations //Next line will throw an exception by MoQ: //System.ArgumentException: Invalid setup on a non-overridable member fakeContext.SetupGet(c=>c.Blogs).Returns(fakeObjectQuery.Object); fakeObjectQuery.Setup(q => q.Any(b => b.Name == "NewBlog")).Returns(true); //Act repo.Add(new Blog { Name = "NewBlog" }); } This test method is checking to see if the correct exception ([ExpectedException(typeof(InvalidOperationException))]) is thrown when a developer attempts to Add a blog with a name that’s already exists. On (1) a connection string is initialized from configuration file. To retrieve the full connection string. On (2) a fake ObjectContext is being created. The ObjectContext here is MyBlogContext and its being created using this var fakeContext = new Mock<MyBlogContext>(connectionString); This way a fake context is being created using MoQ. On (3) a BlogRepository instance is created. BlogRepository has dependency on generate Entity Framework ObjectContext, MyObjectContext. And so the fake context is passed to the constructor. var repo = new BlogRepository(fakeContext.Object); On (4) a fake instance of ObjectQuery<Blog> is being created to use as a substitute to MyObjectContext.Blogs property as we will see in (5). On (5) setup an expectation for calling Blogs property of MyBlogContext and substitute the return result with the fake ObjectQuery<Blog> instance created on (4). When you run this test it will fail with MoQ throwing an exception because of this line: fakeContext.SetupGet(c=>c.Blogs).Returns(fakeObjectQuery.Object); This happens because the generate property MyBlogContext.Blogs is not virtual/overridable. And assuming it is virtual or you managed to make it virtual it will fail at the following line throwing the same exception: fakeObjectQuery.Setup(q => q.Any(b => b.Name == "NewBlog")).Returns(true); This time the test will fail because the Any extension method is not virtual/overridable. You won’t be able to replace ObjectQuery<Blog> with fake in memory collection to test your LINQ to Entities queries. Now lets see how replacing MoQ with TypeMock Isolator can help. Mocking Entity Framework with TypeMock Isolator The following is the same test method we had above for MoQ but this time implemented using TypeMock Isolator: [TestMethod] [ExpectedException(typeof(InvalidOperationException))] public void Add_New_Blog_That_Already_Exists_Should_Throw_InvalidOperationException() { //(1) Create fake in memory collection of blogs var fakeInMemoryBlogs = new List<Blog> {new Blog {Name = "FakeBlog"}}; //(2) create fake context var fakeContext = Isolate.Fake.Instance<MyBlogContext>(); //(3) Setup expected call to MyBlogContext.Blogs property through the fake context Isolate.WhenCalled(() => fakeContext.Blogs) .WillReturnCollectionValuesOf(fakeInMemoryBlogs.AsQueryable()); //(4) Create new blog with a name that already exits in the fake in memory collection in (1) var blog = new Blog {Name = "FakeBlog"}; //(5) Instantiate instance of BlogRepository (Class under test) var repo = new BlogRepository(fakeContext); //(6) Acting by adding the newly created blog () repo.Add(blog); } When running the above test method it will pass as the Add method of BlogRepository is going to throw an InvalidOperationException which is the expected behaviour. Nothing prevents us from faking out the database interaction! Even faking ObjectContext at (2) didn’t require a connection string. On (3) Isolator sets up a faking result for MyBlogContext.Blogs when its being called through the fake instance fakeContext created on (2). The faking result is just an in-memory collection declared an initialized on (1). Finally at (6) action we call the Add method of BlogRepository passing a new Blog instance that has a name that’s already exists in the fake in-memory collection which we set up at (1). As expected the test will pass because it will throw the expected exception defined on top of the test method - InvalidOperationException. TypeMock Isolator succeeded in faking Entity Framework with ease. Conclusion We explored how to write a simple unit test using TypeMock Isolator for code which is using Entity Framework. We also explored a few of the limitations of other mocking frameworks which TypeMock is successfully able to handle. There are workarounds that you can use to overcome limitations when using MoQ or Rhino Mock, however the workarounds will require you to write more code and your tests will likely be more complex. For a comparison between different mocking frameworks take a look at this document produced by TypeMock. You might also want to check out this open source project to compare mocking frameworks. I hope you enjoyed this post Muhammad Mosa http://mosesofegypt.net/ http://twitter.com/mosessaur Screencast of unit testing Entity Framework Related Links GuestPost: Introduction to Mocking GuesPost: Typemock Isolator – Much more than an Isolation framework

Read the article
TOTD #166: Using NoSQL database in your Java EE 6 Applications on GlassFish - MongoDB for now!

- by arungupta

The Java EE 6 platform includes Java Persistence API to work with RDBMS. The JPA specification defines a comprehensive API that includes, but not restricted to, how a database table can be mapped to a POJO and vice versa, provides mechanisms how a PersistenceContext can be injected in a @Stateless bean and then be used for performing different operations on the database table and write typesafe queries. There are several well known advantages of RDBMS but the NoSQL movement has gained traction over past couple of years. The NoSQL databases are not intended to be a replacement for the mainstream RDBMS. As Philosophy of NoSQL explains, NoSQL database was designed for casual use where all the features typically provided by an RDBMS are not required. The name "NoSQL" is more of a category of databases that is more known for what it is not rather than what it is. The basic principles of NoSQL database are: No need to have a pre-defined schema and that makes them a schema-less database. Addition of new properties to existing objects is easy and does not require ALTER TABLE. The unstructured data gives flexibility to change the format of data any time without downtime or reduced service levels. Also there are no joins happening on the server because there is no structure and thus no relation between them. Scalability and performance is more important than the entire set of functionality typically provided by an RDBMS. This set of databases provide eventual consistency and/or transactions restricted to single items but more focus on CRUD. Not be restricted to SQL to access the information stored in the backing database. Designed to scale-out (horizontal) instead of scale-up (vertical). This is important knowing that databases, and everything else as well, is moving into the cloud. RBDMS can scale-out using sharding but requires complex management and not for the faint of heart. Unlike RBDMS which require a separate caching tier, most of the NoSQL databases comes with integrated caching. Designed for less management and simpler data models lead to lower administration as well. There are primarily three types of NoSQL databases: Key-Value stores (e.g. Cassandra and Riak) Document databases (MongoDB or CouchDB) Graph databases (Neo4J) You may think NoSQL is panacea but as I mentioned above they are not meant to replace the mainstream databases and here is why: RDBMS have been around for many years, very stable, and functionally rich. This is something CIOs and CTOs can bet their money on without much worry. There is a reason 98% of Fortune 100 companies run Oracle :-) NoSQL is cutting edge, brings excitement to developers, but enterprises are cautious about them. Commercial databases like Oracle are well supported by the backing enterprises in terms of providing support resources on a global scale. There is a full ecosystem built around these commercial databases providing training, performance tuning, architecture guidance, and everything else. NoSQL is fairly new and typically backed by a single company not able to meet the scale of these big enterprises. NoSQL databases are good for CRUDing operations but business intelligence is extremely important for enterprises to stay competitive. RDBMS provide extensive tooling to generate this data but that was not the original intention of NoSQL databases and is lacking in that area. Generating any meaningful information other than CRUDing require extensive programming. Not suited for complex transactions such as banking systems or other highly transactional applications requiring 2-phase commit. SQL cannot be used with NoSQL databases and writing simple queries can be involving. Enough talking, lets take a look at some code. This blog has published multiple blogs on how to access a RDBMS using JPA in a Java EE 6 application. This Tip Of The Day (TOTD) will show you can use MongoDB (a document-oriented database) with a typical 3-tier Java EE 6 application. Lets get started! The complete source code of this project can be downloaded here. Download MongoDB for your platform from here (1.8.2 as of this writing) and start the server as: arun@ArunUbuntu:~/tools/mongodb-linux-x86_64-1.8.2/bin$./mongod./mongod --help for help and startup optionsSun Jun 26 20:41:11 [initandlisten] MongoDB starting : pid=11210port=27017 dbpath=/data/db/ 64-bit Sun Jun 26 20:41:11 [initandlisten] db version v1.8.2, pdfile version4.5Sun Jun 26 20:41:11 [initandlisten] git version:433bbaa14aaba6860da15bd4de8edf600f56501bSun Jun 26 20:41:11 [initandlisten] build sys info: Linuxbs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 2017:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41Sun Jun 26 20:41:11 [initandlisten] waiting for connections on port 27017Sun Jun 26 20:41:11 [websvr] web admin interface listening on port 28017 The default directory for the database is /data/db and needs to be created as: sudo mkdir -p /data/db/sudo chown `id -u` /data/db You can specify a different directory using "--dbpath" option. Refer to Quickstart for your specific platform. Using NetBeans, create a Java EE 6 project and make sure to enable CDI and add JavaServer Faces framework. Download MongoDB Java Driver (2.6.3 of this writing) and add it to the project library by selecting "Properties", "LIbraries", "Add Library...", creating a new library by specifying the location of the JAR file, and adding the library to the created project. Edit the generated "index.xhtml" such that it looks like: <h1>Add a new movie</h1><h:form> Name: <h:inputText value="#{movie.name}" size="20"/><br/> Year: <h:inputText value="#{movie.year}" size="6"/><br/> Language: <h:inputText value="#{movie.language}" size="20"/><br/> <h:commandButton actionListener="#{movieSessionBean.createMovie}" action="show" title="Add" value="submit"/></h:form> This page has a simple HTML form with three text boxes and a submit button. The text boxes take name, year, and language of a movie and the submit button invokes the "createMovie" method of "movieSessionBean" and then render "show.xhtml". Create "show.xhtml" ("New" -> "Other..." -> "Other" -> "XHTML File") such that it looks like: <head> <title><h1>List of movies</h1></title> </head> <body> <h:form> <h:dataTable value="#{movieSessionBean.movies}" var="m" > <h:column><f:facet name="header">Name</f:facet>#{m.name}</h:column> <h:column><f:facet name="header">Year</f:facet>#{m.year}</h:column> <h:column><f:facet name="header">Language</f:facet>#{m.language}</h:column> </h:dataTable> </h:form> This page shows the name, year, and language of all movies stored in the database so far. The list of movies is returned by "movieSessionBean.movies" property. Now create the "Movie" class such that it looks like: import com.mongodb.BasicDBObject;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;import javax.enterprise.inject.Model;import javax.validation.constraints.Size;/** * @author arun */@Modelpublic class Movie { @Size(min=1, max=20) private String name; @Size(min=1, max=20) private String language; private int year; // getters and setters for "name", "year", "language" public BasicDBObject toDBObject() { BasicDBObject doc = new BasicDBObject(); doc.put("name", name); doc.put("year", year); doc.put("language", language); return doc; } public static Movie fromDBObject(DBObject doc) { Movie m = new Movie(); m.name = (String)doc.get("name"); m.year = (int)doc.get("year"); m.language = (String)doc.get("language"); return m; } @Override public String toString() { return name + ", " + year + ", " + language; }} Other than the usual boilerplate code, the key methods here are "toDBObject" and "fromDBObject". These methods provide a conversion from "Movie" -> "DBObject" and vice versa. The "DBObject" is a MongoDB class that comes as part of the mongo-2.6.3.jar file and which we added to our project earlier. The complete javadoc for 2.6.3 can be seen here. Notice, this class also uses Bean Validation constraints and will be honored by the JSF layer. Finally, create "MovieSessionBean" stateless EJB with all the business logic such that it looks like: package org.glassfish.samples;import com.mongodb.BasicDBObject;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.DBCursor;import com.mongodb.DBObject;import com.mongodb.Mongo;import java.net.UnknownHostException;import java.util.ArrayList;import java.util.List;import javax.annotation.PostConstruct;import javax.ejb.Stateless;import javax.inject.Inject;import javax.inject.Named;/** * @author arun */@Stateless@Namedpublic class MovieSessionBean { @Inject Movie movie; DBCollection movieColl; @PostConstruct private void initDB() throws UnknownHostException { Mongo m = new Mongo(); DB db = m.getDB("movieDB"); movieColl = db.getCollection("movies"); if (movieColl == null) { movieColl = db.createCollection("movies", null); } } public void createMovie() { BasicDBObject doc = movie.toDBObject(); movieColl.insert(doc); } public List<Movie> getMovies() { List<Movie> movies = new ArrayList(); DBCursor cur = movieColl.find(); System.out.println("getMovies: Found " + cur.size() + " movie(s)"); for (DBObject dbo : cur.toArray()) { movies.add(Movie.fromDBObject(dbo)); } return movies; }} The database is initialized in @PostConstruct. Instead of a working with a database table, NoSQL databases work with a schema-less document. The "Movie" class is the document in our case and stored in the collection "movies". The collection allows us to perform query functions on all movies. The "getMovies" method invokes "find" method on the collection which is equivalent to the SQL query "select * from movies" and then returns a List<Movie>. Also notice that there is no "persistence.xml" in the project. Right-click and run the project to see the output as: Enter some values in the text box and click on enter to see the result as: If you reached here then you've successfully used MongoDB in your Java EE 6 application, congratulations! Some food for thought and further play ... SQL to MongoDB mapping shows mapping between traditional SQL -> Mongo query language. Tutorial shows fun things you can do with MongoDB. Try the interactive online shell The cookbook provides common ways of using MongoDB In terms of this project, here are some tasks that can be tried: Encapsulate database management in a JPA persistence provider. Is it even worth it because the capabilities are going to be very different ? MongoDB uses "BSonObject" class for JSON representation, add @XmlRootElement on a POJO and how a compatible JSON representation can be generated. This will make the fromXXX and toXXX methods redundant.

Read the article
TOTD #166: Using NoSQL database in your Java EE 6 Applications on GlassFish - MongoDB for now!

- by arungupta

The Java EE 6 platform includes Java Persistence API to work with RDBMS. The JPA specification defines a comprehensive API that includes, but not restricted to, how a database table can be mapped to a POJO and vice versa, provides mechanisms how a PersistenceContext can be injected in a @Stateless bean and then be used for performing different operations on the database table and write typesafe queries. There are several well known advantages of RDBMS but the NoSQL movement has gained traction over past couple of years. The NoSQL databases are not intended to be a replacement for the mainstream RDBMS. As Philosophy of NoSQL explains, NoSQL database was designed for casual use where all the features typically provided by an RDBMS are not required. The name "NoSQL" is more of a category of databases that is more known for what it is not rather than what it is. The basic principles of NoSQL database are: No need to have a pre-defined schema and that makes them a schema-less database. Addition of new properties to existing objects is easy and does not require ALTER TABLE. The unstructured data gives flexibility to change the format of data any time without downtime or reduced service levels. Also there are no joins happening on the server because there is no structure and thus no relation between them. Scalability and performance is more important than the entire set of functionality typically provided by an RDBMS. This set of databases provide eventual consistency and/or transactions restricted to single items but more focus on CRUD. Not be restricted to SQL to access the information stored in the backing database. Designed to scale-out (horizontal) instead of scale-up (vertical). This is important knowing that databases, and everything else as well, is moving into the cloud. RBDMS can scale-out using sharding but requires complex management and not for the faint of heart. Unlike RBDMS which require a separate caching tier, most of the NoSQL databases comes with integrated caching. Designed for less management and simpler data models lead to lower administration as well. There are primarily three types of NoSQL databases: Key-Value stores (e.g. Cassandra and Riak) Document databases (MongoDB or CouchDB) Graph databases (Neo4J) You may think NoSQL is panacea but as I mentioned above they are not meant to replace the mainstream databases and here is why: RDBMS have been around for many years, very stable, and functionally rich. This is something CIOs and CTOs can bet their money on without much worry. There is a reason 98% of Fortune 100 companies run Oracle :-) NoSQL is cutting edge, brings excitement to developers, but enterprises are cautious about them. Commercial databases like Oracle are well supported by the backing enterprises in terms of providing support resources on a global scale. There is a full ecosystem built around these commercial databases providing training, performance tuning, architecture guidance, and everything else. NoSQL is fairly new and typically backed by a single company not able to meet the scale of these big enterprises. NoSQL databases are good for CRUDing operations but business intelligence is extremely important for enterprises to stay competitive. RDBMS provide extensive tooling to generate this data but that was not the original intention of NoSQL databases and is lacking in that area. Generating any meaningful information other than CRUDing require extensive programming. Not suited for complex transactions such as banking systems or other highly transactional applications requiring 2-phase commit. SQL cannot be used with NoSQL databases and writing simple queries can be involving. Enough talking, lets take a look at some code. This blog has published multiple blogs on how to access a RDBMS using JPA in a Java EE 6 application. This Tip Of The Day (TOTD) will show you can use MongoDB (a document-oriented database) with a typical 3-tier Java EE 6 application. Lets get started! The complete source code of this project can be downloaded here. Download MongoDB for your platform from here (1.8.2 as of this writing) and start the server as: arun@ArunUbuntu:~/tools/mongodb-linux-x86_64-1.8.2/bin$./mongod./mongod --help for help and startup optionsSun Jun 26 20:41:11 [initandlisten] MongoDB starting : pid=11210port=27017 dbpath=/data/db/ 64-bit Sun Jun 26 20:41:11 [initandlisten] db version v1.8.2, pdfile version4.5Sun Jun 26 20:41:11 [initandlisten] git version:433bbaa14aaba6860da15bd4de8edf600f56501bSun Jun 26 20:41:11 [initandlisten] build sys info: Linuxbs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 2017:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41Sun Jun 26 20:41:11 [initandlisten] waiting for connections on port 27017Sun Jun 26 20:41:11 [websvr] web admin interface listening on port 28017 The default directory for the database is /data/db and needs to be created as: sudo mkdir -p /data/db/sudo chown `id -u` /data/db You can specify a different directory using "--dbpath" option. Refer to Quickstart for your specific platform. Using NetBeans, create a Java EE 6 project and make sure to enable CDI and add JavaServer Faces framework. Download MongoDB Java Driver (2.6.3 of this writing) and add it to the project library by selecting "Properties", "LIbraries", "Add Library...", creating a new library by specifying the location of the JAR file, and adding the library to the created project. Edit the generated "index.xhtml" such that it looks like: <h1>Add a new movie</h1><h:form> Name: <h:inputText value="#{movie.name}" size="20"/><br/> Year: <h:inputText value="#{movie.year}" size="6"/><br/> Language: <h:inputText value="#{movie.language}" size="20"/><br/> <h:commandButton actionListener="#{movieSessionBean.createMovie}" action="show" title="Add" value="submit"/></h:form> This page has a simple HTML form with three text boxes and a submit button. The text boxes take name, year, and language of a movie and the submit button invokes the "createMovie" method of "movieSessionBean" and then render "show.xhtml". Create "show.xhtml" ("New" -> "Other..." -> "Other" -> "XHTML File") such that it looks like: <head> <title><h1>List of movies</h1></title> </head> <body> <h:form> <h:dataTable value="#{movieSessionBean.movies}" var="m" > <h:column><f:facet name="header">Name</f:facet>#{m.name}</h:column> <h:column><f:facet name="header">Year</f:facet>#{m.year}</h:column> <h:column><f:facet name="header">Language</f:facet>#{m.language}</h:column> </h:dataTable> </h:form> This page shows the name, year, and language of all movies stored in the database so far. The list of movies is returned by "movieSessionBean.movies" property. Now create the "Movie" class such that it looks like: import com.mongodb.BasicDBObject;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;import javax.enterprise.inject.Model;import javax.validation.constraints.Size;/** * @author arun */@Modelpublic class Movie { @Size(min=1, max=20) private String name; @Size(min=1, max=20) private String language; private int year; // getters and setters for "name", "year", "language" public BasicDBObject toDBObject() { BasicDBObject doc = new BasicDBObject(); doc.put("name", name); doc.put("year", year); doc.put("language", language); return doc; } public static Movie fromDBObject(DBObject doc) { Movie m = new Movie(); m.name = (String)doc.get("name"); m.year = (int)doc.get("year"); m.language = (String)doc.get("language"); return m; } @Override public String toString() { return name + ", " + year + ", " + language; }} Other than the usual boilerplate code, the key methods here are "toDBObject" and "fromDBObject". These methods provide a conversion from "Movie" -> "DBObject" and vice versa. The "DBObject" is a MongoDB class that comes as part of the mongo-2.6.3.jar file and which we added to our project earlier. The complete javadoc for 2.6.3 can be seen here. Notice, this class also uses Bean Validation constraints and will be honored by the JSF layer. Finally, create "MovieSessionBean" stateless EJB with all the business logic such that it looks like: package org.glassfish.samples;import com.mongodb.BasicDBObject;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.DBCursor;import com.mongodb.DBObject;import com.mongodb.Mongo;import java.net.UnknownHostException;import java.util.ArrayList;import java.util.List;import javax.annotation.PostConstruct;import javax.ejb.Stateless;import javax.inject.Inject;import javax.inject.Named;/** * @author arun */@Stateless@Namedpublic class MovieSessionBean { @Inject Movie movie; DBCollection movieColl; @PostConstruct private void initDB() throws UnknownHostException { Mongo m = new Mongo(); DB db = m.getDB("movieDB"); movieColl = db.getCollection("movies"); if (movieColl == null) { movieColl = db.createCollection("movies", null); } } public void createMovie() { BasicDBObject doc = movie.toDBObject(); movieColl.insert(doc); } public List<Movie> getMovies() { List<Movie> movies = new ArrayList(); DBCursor cur = movieColl.find(); System.out.println("getMovies: Found " + cur.size() + " movie(s)"); for (DBObject dbo : cur.toArray()) { movies.add(Movie.fromDBObject(dbo)); } return movies; }} The database is initialized in @PostConstruct. Instead of a working with a database table, NoSQL databases work with a schema-less document. The "Movie" class is the document in our case and stored in the collection "movies". The collection allows us to perform query functions on all movies. The "getMovies" method invokes "find" method on the collection which is equivalent to the SQL query "select * from movies" and then returns a List<Movie>. Also notice that there is no "persistence.xml" in the project. Right-click and run the project to see the output as: Enter some values in the text box and click on enter to see the result as: If you reached here then you've successfully used MongoDB in your Java EE 6 application, congratulations! Some food for thought and further play ... SQL to MongoDB mapping shows mapping between traditional SQL -> Mongo query language. Tutorial shows fun things you can do with MongoDB. Try the interactive online shell The cookbook provides common ways of using MongoDB In terms of this project, here are some tasks that can be tried: Encapsulate database management in a JPA persistence provider. Is it even worth it because the capabilities are going to be very different ? MongoDB uses "BSonObject" class for JSON representation, add @XmlRootElement on a POJO and how a compatible JSON representation can be generated. This will make the fromXXX and toXXX methods redundant.

Read the article
CodePlex Daily Summary for Friday, June 29, 2012

CodePlex Daily Summary for Friday, June 29, 2012Popular ReleasesSupporting Guidance and Whitepapers: v1 - Supporting Media: Welcome to the Release Candidate (RC) release of the ALM Rangers Readiness supporting edia As this is a RC release and the quality bar for the final Release has not been achieved, we value your candid feedback and recommend that you do not use or deploy these RC artifacts in a production environment. Quality-Bar Details All critical bugs have been resolved Known Issues / Bugs Practical Ruck training workshop not yet includedDesigning Windows 8 Applications with C# and XAML: Chapters 1 - 7 Release Preview: Source code for all examples from Chapters 1 - 7 for the Release PreviewDataBooster - Extension to ADO.NET Data Provider: DataBooster Library for Oracle + SQL Server Beta2: This is a derivative library of dbParallel project http://dbparallel.codeplex.com. All above binaries releases require .NET Framework 4.0 or later. SQL Server support is always build-in (can't be unplugged). The first download (DLL) also requires ODP.NET to connect Oracle; The second download (DLL) also requires DataDirect(3.5) to connect Oracle; The third download (DLL) doesn't support Oracle. Please download the source code if the provider need to be replaced by others. For example ODP.NE...Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.57: Fix for issue #18284: evaluating literal expressions in the pattern c1 * (x / c2) where c1/c2 is an integer value (as opposed to c2/c1 being the integer) caused the expression to be destroyed.Visual Studio ALM Quick Reference Guidance: v2 - Visual Studio 2010 (Japanese): Rex Tang (?? ??) http://blogs.msdn.com/b/willy-peter_schaub/archive/2011/12/08/introducing-the-visual-studio-alm-rangers-rex-tang.aspx, Takaho Yamaguchi (?? ??), Masashi Fujiwara (?? ??), localized and reviewed the Quick Reference Guidance for the Japanese communities, based on http://vsarquickguide.codeplex.com/releases/view/52402. The Japanese guidance is available in AllGuides and Everything packages. The AllGuides package contains guidances in PDF file format, while the Everything packag...Visual Studio Team Foundation Server Branching and Merging Guide: v1 - Visual Studio 2010 (Japanese): Rex Tang (?? ??) http://blogs.msdn.com/b/willy-peter_schaub/archive/2011/12/08/introducing-the-visual-studio-alm-rangers-rex-tang.aspx, Takaho Yamaguchi (?? ??), Hirokazu Higashino (?? ??), localized and reviewed the Branching Guidance for the Japanese communities, based on http://vsarbranchingguide.codeplex.com/releases/view/38849. The Japanese guidance is available in AllGuides and Everything packages. The AllGuides package contains guidances in PDF file format, while the Everything packag...SQL Server FineBuild: Version 3.1.0: Top SQL Server FineBuild Version 3.1.0This is the stable version of FineBuild for SQL Server 2012, 2008 R2, 2008 and 2005 Documentation FineBuild Wiki containing details of the FineBuild process Known Issues Limitations with this release FineBuild V3.1.0 Release Contents List of changes included in this release Please DonateFineBuild is free, but please donate what you think FineBuild is worth as everything goes to charity. Tearfund is one of the UK's leading relief and de...EasySL: RapidSL V2: Rewrite RapidSL UI Framework, Using Silverlight 5.0 EF4.1 Code First Ria Service SP2 + Lastest Silverlight Toolkit.NETDeob0: NETDeob 0.1.1: http://i.imgur.com/54C78.pngSOLID by example: All examples: All solid examplesSiteMap Editor for Microsoft Dynamics CRM 2011: SiteMap Editor (1.1.1726.406): Use of new version of connection controls for a full support of OSDP authentication mechanism for CRM Online.Umbraco CMS: Umbraco CMS 5.2: Development on Umbraco v5 discontinued After much discussion and consultation with leaders from the Umbraco community it was decided that work on the v5 branch would be discontinued with efforts being refocused on the stable and feature rich v4 branch. For full details as to why this decision was made please watch the CodeGarden 12 Keynote. What about all that hard work?!?? We are not binning everything and it does not mean that all work done on 5 is lost! we are taking all of the best and m...CodeGenerate: CodeGenerate Alpha: The Project can auto generate C# code. Include BLL Layer、Domain Layer、IDAL Layer、DAL Layer. Support SqlServer And Oracle This is a alpha program,but which can run and generate code. Generate database table info into MS WordXDA ROM HUB: XDA ROM HUB v0.9: Kernel listing added -- Thanks to iONEx Added scripts installer button. Added "Nandroid On The Go" -- Perform a Nandroid backup without a PC! Added official Android app!ExtAspNet: ExtAspNet v3.1.8.2: +2012-06-24 v3.1.8 +????Grid???????（???????ExpandUnusedSpace????????）（??）。 -????MinColumnWidth（??????）。 -????AutoExpandColumn，???????????????（ColumnID）（?????ForceFitFirstTime??ForceFitAllTime，??????）。 -????AutoExpandColumnMax?AutoExpandColumnMin。 -????ForceFitFirstTime，????????????，??????????（????????????）。 -????ForceFitAllTime，????????????，??????????（??????????????????）。 -????VerticalScrollWidth，????????（??????????，0?????????????）。 -????grid/grid_forcefit.aspx。 -???????????En...AJAX Control Toolkit: June 2012 Release: AJAX Control Toolkit Release Notes - June 2012 Release Version 60623June 2012 release of the AJAX Control Toolkit. AJAX Control Toolkit .NET 4 – AJAX Control Toolkit for .NET 4 and sample site (Recommended). AJAX Control Toolkit .NET 3.5 – AJAX Control Toolkit for .NET 3.5 and sample site (Recommended). Notes: - The current version of the AJAX Control Toolkit is not compatible with ASP.NET 2.0. The latest version that is compatible with ASP.NET 2.0 can be found here: 11121. - Pages using ...WPF Application Framework (WAF): WPF Application Framework (WAF) 2.5.0.5: Version: 2.5.0.5 (Milestone 5): This release contains the source code of the WPF Application Framework (WAF) and the sample applications. Requirements .NET Framework 4.0 (The package contains a solution file for Visual Studio 2010) The unit test projects require Visual Studio 2010 Professional Changelog Legend: [B] Breaking change; [O] Marked member as obsolete WAF: Add IsInDesignMode property to the WafConfiguration class. WAF: Introduce the IModuleController interface. WAF: Add ...Windows 8 Metro RSS Reader: Metro RSS Reader.v7: Updated for Windows 8 Release Preview Changed background and foreground colors Used VariableSizeGrid layout to wrap blog posts with images Sort items with Images first, text-only last Enabled Caching to improve navigation between framesConfuser: Confuser 1.9: Change log: * Stable output (i.e. given the same seed & input assemblies, you'll get the same output assemblies) + Generate debug symbols, now it is possible to debug the output under a debugger! (Of course without enabling anti debug) + Generating obfuscation database, most of the obfuscation data in stored in it. + Two tools utilizing the obfuscation database (Database viewer & Stack trace decoder) * Change the protection scheme -----Please read Bug Report before you report a bug-----...XDesigner.Development: First release: First releaseNew ProjectsArcGIS Server Rest Catalog: jquery widget that display all MapServer service of ESRI ArcGIS Server in accordion control. Require jQuery and JsRender. BugsBox.Lib: BugsBoxlibCodeStudy: The project includes all my code written for .NET studyCommunity Tfs Team Tools: Community TFS Team Tools is a community project based on the example code from ALM Rangers - Quick Response Sample Command line utility to manage TFS TeamsDiveBaseManager: DiveBaseManagerEclipse App: Aplicacion para android para el envio de coordenaads a un servidorJQMdotNET: JQMdotNet is an early attempt to make a series of MVC HTML helpers to quickly render JQuery Mobile pages.MathBuilderFramework: Math Builder FX is a framework for math problems. The idea of this framework is to create real exercises of mathematics throught base classes.MDS MODELING WORKBOOK: MDS Modeling Workbook is a modeling tool and a solution accelerator for Microsoft Master Data Services. Minesweeper: a clean old minesweeperMultithreaded Port Scanner Utility: Mulithreaded Port Scanner Utility is a very simple port scanner that take advantage of the new System.Threading.Task namespace in .Net 4.0Nonocast.Data: Nonocast.Data is a free, open source developer focused object persistence for small and medium software.On{X} Scripts: This project is to share scripts I create or modify from available free/opensource scripts (off-course with due mentions & links to original developer).PCS MAP: ladPowerShell Module for Mayhem: A module for Mayhem with a reaction which executes a PowerShell script.PunkBuster™ Screenshot Viewer: PunkBuster™ Screenshot Viewer shows screenshots of all games protected with PunkBuster™.Random reminder: Objective: To create a simple Windows application that can schedule and automatically reschedule a random timer that falls within a defined interval.Remembrall: Some useful stuff (at least for me) ! Javascript plugins jQuery plugins C# utilities ASP.NET MVC helper extensionsrepotfs: Repositório de exemplosSharepoint Designer: Sharepoint ExplororSPFluid: Simple modularity and data access framework.SPManager: Helper classSQL Server : xp_fixeddrives2: C++ Extended Stored Procedure Get volumes InfoSWORD Extensions for Sharepoint: Push documents in and out of Sharepoint using the SWORD protocol. ***incomplete***Test projects: My testing projecttestdd06282012git01: xzc testdd06282012git1: sadtestdd06282012tfs01: xzctestddhg06282012: xctestddtfs062820121: ,lm;.testhg06282012dd1: zxTlrk: NATotal_Mayhem: Add-on for Mayhem ( http://mayhem.codeplex.com/ ) that includes networking events, power reactions, and more.WallSwitch: An application to cycle your desktop wallpaper.??????: ????????

Read the article
Syncing Data with a Server using Silverlight and HTTP Polling Duplex

- by dwahlin

Many applications have the need to stay in-sync with data provided by a service. Although web applications typically rely on standard polling techniques to check if data has changed, Silverlight provides several interesting options for keeping an application in-sync that rely on server “push” technologies. A few years back I wrote several blog posts covering different “push” technologies available in Silverlight that rely on sockets or HTTP Polling Duplex. We recently had a project that looked like it could benefit from pushing data from a server to one or more clients so I thought I’d revisit the subject and provide some updates to the original code posted. If you’ve worked with AJAX before in Web applications then you know that until browsers fully support web sockets or other duplex (bi-directional communication) technologies that it’s difficult to keep applications in-sync with a server without relying on polling. The problem with polling is that you have to check for changes on the server on a timed-basis which can often be wasteful and take up unnecessary resources. With server “push” technologies, data can be pushed from the server to the client as it changes. Once the data is received, the client can update the user interface as appropriate. Using “push” technologies allows the client to listen for changes from the data but stay 100% focused on client activities as opposed to worrying about polling and asking the server if anything has changed. Silverlight provides several options for pushing data from a server to a client including sockets, TCP bindings and HTTP Polling Duplex. Each has its own strengths and weaknesses as far as performance and setup work with HTTP Polling Duplex arguably being the easiest to setup and get going. In this article I’ll demonstrate how HTTP Polling Duplex can be used in Silverlight 4 applications to push data and show how you can create a WCF server that provides an HTTP Polling Duplex binding that a Silverlight client can consume. What is HTTP Polling Duplex? Technologies that allow data to be pushed from a server to a client rely on duplex functionality. Duplex (or bi-directional) communication allows data to be passed in both directions. A client can call a service and the server can call the client. HTTP Polling Duplex (as its name implies) allows a server to communicate with a client without forcing the client to constantly poll the server. It has the benefit of being able to run on port 80 making setup a breeze compared to the other options which require specific ports to be used and cross-domain policy files to be exposed on port 943 (as with sockets and TCP bindings). Having said that, if you’re looking for the best speed possible then sockets and TCP bindings are the way to go. But, they’re not the only game in town when it comes to duplex communication. The first time I heard about HTTP Polling Duplex (initially available in Silverlight 2) I wasn’t exactly sure how it was any better than standard polling used in AJAX applications. I read the Silverlight SDK, looked at various resources and generally found the following definition unhelpful as far as understanding the actual benefits that HTTP Polling Duplex provided: "The Silverlight client periodically polls the service on the network layer, and checks for any new messages that the service wants to send on the callback channel. The service queues all messages sent on the client callback channel and delivers them to the client when the client polls the service." Although the previous definition explained the overall process, it sounded as if standard polling was used. Fortunately, Microsoft’s Scott Guthrie provided me with a more clear definition several years back that explains the benefits provided by HTTP Polling Duplex quite well (used with his permission): "The [HTTP Polling Duplex] duplex support does use polling in the background to implement notifications – although the way it does it is different than manual polling. It initiates a network request, and then the request is effectively “put to sleep” waiting for the server to respond (it doesn’t come back immediately). The server then keeps the connection open but not active until it has something to send back (or the connection times out after 90 seconds – at which point the duplex client will connect again and wait). This way you are avoiding hitting the server repeatedly – but still get an immediate response when there is data to send." After hearing Scott’s definition the light bulb went on and it all made sense. A client makes a request to a server to check for changes, but instead of the request returning immediately, it parks itself on the server and waits for data. It’s kind of like waiting to pick up a pizza at the store. Instead of calling the store over and over to check the status, you sit in the store and wait until the pizza (the request data) is ready. Once it’s ready you take it back home (to the client). This technique provides a lot of efficiency gains over standard polling techniques even though it does use some polling of its own as a request is initially made from a client to a server. So how do you implement HTTP Polling Duplex in your Silverlight applications? Let’s take a look at the process by starting with the server. Creating an HTTP Polling Duplex WCF Service Creating a WCF service that exposes an HTTP Polling Duplex binding is straightforward as far as coding goes. Add some one way operations into an interface, create a client callback interface and you’re ready to go. The most challenging part comes into play when configuring the service to properly support the necessary binding and that’s more of a cut and paste operation once you know the configuration code to use. To create an HTTP Polling Duplex service you’ll need to expose server-side and client-side interfaces and reference the System.ServiceModel.PollingDuplex assembly (located at C:\Program Files (x86)\Microsoft SDKs\Silverlight\v4.0\Libraries\Server on my machine) in the server project. For the demo application I upgraded a basketball simulation service to support the latest polling duplex assemblies. The service simulates a simple basketball game using a Game class and pushes information about the game such as score, fouls, shots and more to the client as the game changes over time. Before jumping too far into the game push service, it’s important to discuss two interfaces used by the service to communicate in a bi-directional manner. The first is called IGameStreamService and defines the methods/operations that the client can call on the server (see Listing 1). The second is IGameStreamClient which defines the callback methods that a server can use to communicate with a client (see Listing 2). [ServiceContract(Namespace = "Silverlight", CallbackContract = typeof(IGameStreamClient))] public interface IGameStreamService { [OperationContract(IsOneWay = true)] void GetTeamData(); } Listing 1. The IGameStreamService interface defines server operations that can be called on the server. [ServiceContract] public interface IGameStreamClient { [OperationContract(IsOneWay = true)] void ReceiveTeamData(List<Team> teamData); [OperationContract(IsOneWay = true, AsyncPattern=true)] IAsyncResult BeginReceiveGameData(GameData gameData, AsyncCallback callback, object state); void EndReceiveGameData(IAsyncResult result); } Listing 2. The IGameStreamClient interfaces defines client operations that a server can call. The IGameStreamService interface is decorated with the standard ServiceContract attribute but also contains a value for the CallbackContract property. This property is used to define the interface that the client will expose (IGameStreamClient in this example) and use to receive data pushed from the service. Notice that each OperationContract attribute in both interfaces sets the IsOneWay property to true. This means that the operation can be called and passed data as appropriate, however, no data will be passed back. Instead, data will be pushed back to the client as it’s available. Looking through the IGameStreamService interface you can see that the client can request team data whereas the IGameStreamClient interface allows team and game data to be received by the client. One interesting point about the IGameStreamClient interface is the inclusion of the AsyncPattern property on the BeginReceiveGameData operation. I initially created this operation as a standard one way operation and it worked most of the time. However, as I disconnected clients and reconnected new ones game data wasn’t being passed properly. After researching the problem more I realized that because the service could take up to 7 seconds to return game data, things were getting hung up. By setting the AsyncPattern property to true on the BeginReceivedGameData operation and providing a corresponding EndReceiveGameData operation I was able to get around this problem and get everything running properly. I’ll provide more details on the implementation of these two methods later in this post. Once the interfaces were created I moved on to the game service class. The first order of business was to create a class that implemented the IGameStreamService interface. Since the service can be used by multiple clients wanting game data I added the ServiceBehavior attribute to the class definition so that I could set its InstanceContextMode to InstanceContextMode.Single (in effect creating a Singleton service object). Listing 3 shows the game service class as well as its fields and constructor. [ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple, InstanceContextMode = InstanceContextMode.Single)] public class GameStreamService : IGameStreamService { object _Key = new object(); Game _Game = null; Timer _Timer = null; Random _Random = null; Dictionary<string, IGameStreamClient> _ClientCallbacks = new Dictionary<string, IGameStreamClient>(); static AsyncCallback _ReceiveGameDataCompleted = new AsyncCallback(ReceiveGameDataCompleted); public GameStreamService() { _Game = new Game(); _Timer = new Timer { Enabled = false, Interval = 2000, AutoReset = true }; _Timer.Elapsed += new ElapsedEventHandler(_Timer_Elapsed); _Timer.Start(); _Random = new Random(); }} Listing 3. The GameStreamService implements the IGameStreamService interface which defines a callback contract that allows the service class to push data back to the client. By implementing the IGameStreamService interface, GameStreamService must supply a GetTeamData() method which is responsible for supplying information about the teams that are playing as well as individual players. GetTeamData() also acts as a client subscription method that tracks clients wanting to receive game data. Listing 4 shows the GetTeamData() method. public void GetTeamData() { //Get client callback channel var context = OperationContext.Current; var sessionID = context.SessionId; var currClient = context.GetCallbackChannel<IGameStreamClient>(); context.Channel.Faulted += Disconnect; context.Channel.Closed += Disconnect; IGameStreamClient client; if (!_ClientCallbacks.TryGetValue(sessionID, out client)) { lock (_Key) { _ClientCallbacks[sessionID] = currClient; } } currClient.ReceiveTeamData(_Game.GetTeamData()); //Start timer which when fired sends updated score information to client if (!_Timer.Enabled) { _Timer.Enabled = true; } } Listing 4. The GetTeamData() method subscribes a given client to the game service and returns. The key the line of code in the GetTeamData() method is the call to GetCallbackChannel<IGameStreamClient>(). This method is responsible for accessing the calling client’s callback channel. The callback channel is defined by the IGameStreamClient interface shown earlier in Listing 2 and used by the server to communicate with the client. Before passing team data back to the client, GetTeamData() grabs the client’s session ID and checks if it already exists in the _ClientCallbacks dictionary object used to track clients wanting callbacks from the server. If the client doesn’t exist it adds it into the collection. It then pushes team data from the Game class back to the client by calling ReceiveTeamData(). Since the service simulates a basketball game, a timer is then started if it’s not already enabled which is then used to randomly send data to the client. When the timer fires, game data is pushed down to the client. Listing 5 shows the _Timer_Elapsed() method that is called when the timer fires as well as the SendGameData() method used to send data to the client. void _Timer_Elapsed(object sender, ElapsedEventArgs e) { int interval = _Random.Next(3000, 7000); lock (_Key) { _Timer.Interval = interval; _Timer.Enabled = false; } SendGameData(_Game.GetGameData()); } private void SendGameData(GameData gameData) { var cbs = _ClientCallbacks.Where(cb => ((IContextChannel)cb.Value).State == CommunicationState.Opened); for (int i = 0; i < cbs.Count(); i++) { var cb = cbs.ElementAt(i).Value; try { cb.BeginReceiveGameData(gameData, _ReceiveGameDataCompleted, cb); } catch (TimeoutException texp) { //Log timeout error } catch (CommunicationException cexp) { //Log communication error } } lock (_Key) _Timer.Enabled = true; } private static void ReceiveGameDataCompleted(IAsyncResult result) { try { ((IGameStreamClient)(result.AsyncState)).EndReceiveGameData(result); } catch (CommunicationException) { // empty } catch (TimeoutException) { // empty } } LIsting 5. _Timer_Elapsed is used to simulate time in a basketball game. When _Timer_Elapsed() fires the SendGameData() method is called which iterates through the clients wanting to be notified of changes. As each client is identified, their respective BeginReceiveGameData() method is called which ultimately pushes game data down to the client. Recall that this method was defined in the client callback interface named IGameStreamClient shown earlier in Listing 2. Notice that BeginReceiveGameData() accepts _ReceiveGameDataCompleted as its second parameter (an AsyncCallback delegate defined in the service class) and passes the client callback as the third parameter. The initial version of the sample application had a standard ReceiveGameData() method in the client callback interface. However, sometimes the client callbacks would work properly and sometimes they wouldn’t which was a little baffling at first glance. After some investigation I realized that I needed to implement an asynchronous pattern for client callbacks to work properly since 3 – 7 second delays are occurring as a result of the timer. Once I added the BeginReceiveGameData() and ReceiveGameDataCompleted() methods everything worked properly since each call was handled in an asynchronous manner. The final task that had to be completed to get the server working properly with HTTP Polling Duplex was adding configuration code into web.config. In the interest of brevity I won’t post all of the code here since the sample application includes everything you need. However, Listing 6 shows the key configuration code to handle creating a custom binding named pollingDuplexBinding and associate it with the service’s endpoint. <bindings> <customBinding> <binding name="pollingDuplexBinding"> <binaryMessageEncoding /> <pollingDuplex maxPendingSessions="2147483647" maxPendingMessagesPerSession="2147483647" inactivityTimeout="02:00:00" serverPollTimeout="00:05:00"/> <httpTransport /> </binding> </customBinding> </bindings> <services> <service name="GameService.GameStreamService" behaviorConfiguration="GameStreamServiceBehavior"> <endpoint address="" binding="customBinding" bindingConfiguration="pollingDuplexBinding" contract="GameService.IGameStreamService"/> <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange" /> </service> </services> Listing 6. Configuring an HTTP Polling Duplex binding in web.config and associating an endpoint with it. Calling the Service and Receiving “Pushed” Data Calling the service and handling data that is pushed from the server is a simple and straightforward process in Silverlight. Since the service is configured with a MEX endpoint and exposes a WSDL file, you can right-click on the Silverlight project and select the standard Add Service Reference item. After the web service proxy is created you may notice that the ServiceReferences.ClientConfig file only contains an empty configuration element instead of the normal configuration elements created when creating a standard WCF proxy. You can certainly update the file if you want to read from it at runtime but for the sample application I fed the service URI directly to the service proxy as shown next: var address = new EndpointAddress("http://localhost.:5661/GameStreamService.svc"); var binding = new PollingDuplexHttpBinding(); _Proxy = new GameStreamServiceClient(binding, address); _Proxy.ReceiveTeamDataReceived += _Proxy_ReceiveTeamDataReceived; _Proxy.ReceiveGameDataReceived += _Proxy_ReceiveGameDataReceived; _Proxy.GetTeamDataAsync(); This code creates the proxy and passes the endpoint address and binding to use to its constructor. It then wires the different receive events to callback methods and calls GetTeamDataAsync(). Calling GetTeamDataAsync() causes the server to store the client in the server-side dictionary collection mentioned earlier so that it can receive data that is pushed. As the server-side timer fires and game data is pushed to the client, the user interface is updated as shown in Listing 7. Listing 8 shows the _Proxy_ReceiveGameDataReceived() method responsible for handling the data and calling UpdateGameData() to process it. Listing 7. The Silverlight interface. Game data is pushed from the server to the client using HTTP Polling Duplex. void _Proxy_ReceiveGameDataReceived(object sender, ReceiveGameDataReceivedEventArgs e) { UpdateGameData(e.gameData); } private void UpdateGameData(GameData gameData) { //Update Score this.tbTeam1Score.Text = gameData.Team1Score.ToString(); this.tbTeam2Score.Text = gameData.Team2Score.ToString(); //Update ball visibility if (gameData.Action != ActionsEnum.Foul) { if (tbTeam1.Text == gameData.TeamOnOffense) { AnimateBall(this.BB1, this.BB2); } else //Team 2 { AnimateBall(this.BB2, this.BB1); } } if (this.lbActions.Items.Count > 9) this.lbActions.Items.Clear(); this.lbActions.Items.Add(gameData.LastAction); if (this.lbActions.Visibility == Visibility.Collapsed) this.lbActions.Visibility = Visibility.Visible; } private void AnimateBall(Image onBall, Image offBall) { this.FadeIn.Stop(); Storyboard.SetTarget(this.FadeInAnimation, onBall); Storyboard.SetTarget(this.FadeOutAnimation, offBall); this.FadeIn.Begin(); } Listing 8. As the server pushes game data, the client’s _Proxy_ReceiveGameDataReceived() method is called to process the data. In a real-life application I’d go with a ViewModel class to handle retrieving team data, setup data bindings and handle data that is pushed from the server. However, for the sample application I wanted to focus on HTTP Polling Duplex and keep things as simple as possible. Summary Silverlight supports three options when duplex communication is required in an application including TCP bindins, sockets and HTTP Polling Duplex. In this post you’ve seen how HTTP Polling Duplex interfaces can be created and implemented on the server as well as how they can be consumed by a Silverlight client. HTTP Polling Duplex provides a nice way to “push” data from a server while still allowing the data to flow over port 80 or another port of your choice. Sample Application Download

Read the article
[GEEK SCHOOL] Network Security 2: Preventing Disaster with User Account Control

- by Ciprian Rusen

In this second lesson in our How-To Geek School about securing the Windows devices in your network, we will talk about User Account Control (UAC). Users encounter this feature each time they need to install desktop applications in Windows, when some applications need administrator permissions in order to work and when they have to change different system settings and files. UAC was introduced in Windows Vista as part of Microsoft’s “Trustworthy Computing” initiative. Basically, UAC is meant to act as a wedge between you and installing applications or making system changes. When you attempt to do either of these actions, UAC will pop up and interrupt you. You may either have to confirm you know what you’re doing, or even enter an administrator password if you don’t have those rights. Some users find UAC annoying and choose to disable it but this very important security feature of Windows (and we strongly caution against doing that). That’s why in this lesson, we will carefully explain what UAC is and everything it does. As you will see, this feature has an important role in keeping Windows safe from all kinds of security problems. In this lesson you will learn which activities may trigger a UAC prompt asking for permissions and how UAC can be set so that it strikes the best balance between usability and security. You will also learn what kind of information you can find in each UAC prompt. Last but not least, you will learn why you should never turn off this feature of Windows. By the time we’re done today, we think you will have a newly found appreciation for UAC, and will be able to find a happy medium between turning it off completely and letting it annoy you to distraction. What is UAC and How Does it Work? UAC or User Account Control is a security feature that helps prevent unauthorized system changes to your Windows computer or device. These changes can be made by users, applications, and sadly, malware (which is the biggest reason why UAC exists in the first place). When an important system change is initiated, Windows displays a UAC prompt asking for your permission to make the change. If you don’t give your approval, the change is not made. In Windows, you will encounter UAC prompts mostly when working with desktop applications that require administrative permissions. For example, in order to install an application, the installer (generally a setup.exe file) asks Windows for administrative permissions. UAC initiates an elevation prompt like the one shown earlier asking you whether it is okay to elevate permissions or not. If you say “Yes”, the installer starts as administrator and it is able to make the necessary system changes in order to install the application correctly. When the installer is closed, its administrator privileges are gone. If you run it again, the UAC prompt is shown again because your previous approval is not remembered. If you say “No”, the installer is not allowed to run and no system changes are made. If a system change is initiated from a user account that is not an administrator, e.g. the Guest account, the UAC prompt will also ask for the administrator password in order to give the necessary permissions. Without this password, the change won’t be made. Which Activities Trigger a UAC Prompt? There are many types of activities that may trigger a UAC prompt: Running a desktop application as an administrator Making changes to settings and files in the Windows and Program Files folders Installing or removing drivers and desktop applications Installing ActiveX controls Changing settings to Windows features like the Windows Firewall, UAC, Windows Update, Windows Defender, and others Adding, modifying, or removing user accounts Configuring Parental Controls in Windows 7 or Family Safety in Windows 8.x Running the Task Scheduler Restoring backed-up system files Viewing or changing the folders and files of another user account Changing the system date and time You will encounter UAC prompts during some or all of these activities, depending on how UAC is set on your Windows device. If this security feature is turned off, any user account or desktop application can make any of these changes without a prompt asking for permissions. In this scenario, the different forms of malware existing on the Internet will also have a higher chance of infecting and taking control of your system. In Windows 8.x operating systems you will never see a UAC prompt when working with apps from the Windows Store. That’s because these apps, by design, are not allowed to modify any system settings or files. You will encounter UAC prompts only when working with desktop programs. What You Can Learn from a UAC Prompt? When you see a UAC prompt on the screen, take time to read the information displayed so that you get a better understanding of what is going on. Each prompt first tells you the name of the program that wants to make system changes to your device, then you can see the verified publisher of that program. Dodgy software tends not to display this information and instead of a real company name, you will see an entry that says “Unknown”. If you have downloaded that program from a less than trustworthy source, then it might be better to select “No” in the UAC prompt. The prompt also shares the origin of the file that’s trying to make these changes. In most cases the file origin is “Hard drive on this computer”. You can learn more by pressing “Show details”. You will see an additional entry named “Program location” where you can see the physical location on your hard drive, for the file that’s trying to perform system changes. Make your choice based on the trust you have in the program you are trying to run and its publisher. If a less-known file from a suspicious location is requesting a UAC prompt, then you should seriously consider pressing “No”. What’s Different About Each UAC Level? Windows 7 and Windows 8.x have four UAC levels: Always notify – when this level is used, you are notified before desktop applications make changes that require administrator permissions or before you or another user account changes Windows settings like the ones mentioned earlier. When the UAC prompt is shown, the desktop is dimmed and you must choose “Yes” or “No” before you can do anything else. This is the most secure and also the most annoying way to set UAC because it triggers the most UAC prompts. Notify me only when programs/apps try to make changes to my computer (default) – Windows uses this as the default for UAC. When this level is used, you are notified before desktop applications make changes that require administrator permissions. If you are making system changes, UAC doesn’t show any prompts and it automatically gives you the necessary permissions for making the changes you desire. When a UAC prompt is shown, the desktop is dimmed and you must choose “Yes” or “No” before you can do anything else. This level is slightly less secure than the previous one because malicious programs can be created for simulating the keystrokes or mouse moves of a user and change system settings for you. If you have a good security solution in place, this scenario should never occur. Notify me only when programs/apps try to make changes to my computer (do not dim my desktop) – this level is different from the previous in in the fact that, when the UAC prompt is shown, the desktop is not dimmed. This decreases the security of your system because different kinds of desktop applications (including malware) might be able to interfere with the UAC prompt and approve changes that you might not want to be performed. Never notify – this level is the equivalent of turning off UAC. When using it, you have no protection against unauthorized system changes. Any desktop application and any user account can make system changes without your permission. How to Configure UAC If you would like to change the UAC level used by Windows, open the Control Panel, then go to “System and Security” and select “Action Center”. On the column on the left you will see an entry that says “Change User Account Control settings”. The “User Account Control Settings” window is now opened. Change the position of the UAC slider to the level you want applied then press “OK”. Depending on how UAC was initially set, you may receive a UAC prompt requiring you to confirm this change. Why You Should Never Turn Off UAC If you want to keep the security of your system at decent levels, you should never turn off UAC. When you disable it, everything and everyone can make system changes without your consent. This makes it easier for all kinds of malware to infect and take control of your system. It doesn’t matter whether you have a security suite or antivirus installed or third-party antivirus, basic common-sense measures like having UAC turned on make a big difference in keeping your devices safe from harm. We have noticed that some users disable UAC prior to setting up their Windows devices and installing third-party software on them. They keep it disabled while installing all the software they will use and enable it when done installing everything, so that they don’t have to deal with so many UAC prompts. Unfortunately this causes problems with some desktop applications. They may fail to work after you enable UAC. This happens because, when UAC is disabled, the virtualization techniques UAC uses for your applications are inactive. This means that certain user settings and files are installed in a different place and when you turn on UAC, applications stop working because they should be placed elsewhere. Therefore, whatever you do, do not turn off UAC completely! Coming up next … In the next lesson you will learn about Windows Defender, what this tool can do in Windows 7 and Windows 8.x, what’s different about it in these operating systems and how it can be used to increase the security of your system.

Read the article
Building Simple Workflows in Oozie

- by dan.mcclary

Introduction More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site. Hive Actions: Prepping for Pig In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in a handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie. I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me flexibility of Hive's query language when I want it, but let's me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code. CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column 1, column2) PARTITIONED BY (yr string) STORED AS ... LOCATION '/user/oracle/weather/historic'; As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow. Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of their long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing SerDe or a parser for transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access. ALTER TABLE historic_weather ADD IF NOT EXISTS PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2011'; INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history' SELECT w.stn, w.wban, w.weather_year, w.weather_month, w.weather_day, w.temp, w.dewp, w.weather FROM ( FROM historic_weather SELECT TRANSFORM(...) USING '/path/to/hive/filters/ncdc_parser.py' as stn, wban, weather_year, weather_month, weather_day, temp, dewp, weather ) w; Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive actions using what's somewhat obviously referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called weather_train.hql. Starting Our Workflow Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions of Workflow jobs, but they can be automatically started either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling. The bare minimum for workflow XML defines a name, a starting point, and an end point: <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1"> <start to="ParseNCDCData"/> <end name="end"/> </workflow-app> To this we need to add an action, and within that we'll specify the hive parameters Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure. <action name="ParseNCDCData"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <configuration> <property> <name>oozie.hive.defaults</name> <value>/user/oracle/weather_ooze/hive-default.xml</value> </property> </configuration> <script>ncdc_parse.hql</script> </hive> <ok to="WeatherMan"/> <error to="end"/> </action> There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode. I have to include a hive-default.xml file. I have to include a script file. The hive-default.xml and script file must be stored in HDFS That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-defaults.xml on different clusters (e.g. MySQL or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need to it as you add actions to workflow.xml. At this point, our local directory should contain: workflow.xml hive-defaults.xml (make sure this file contains your metastore connection data) ncdc_parse.hql Adding Pig to the Ooze Adding our Pig script as an action is slightly simpler from an XML standpoint. All we do is add an action to workflow.xml as follows: <action name="WeatherMan"> <pig> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <script>weather_train.pig</script> </pig> <ok to="end"/> <error to="end"/> </action> Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My pig script registers the Weka Jar and a chunk of jython. If those aren't also in HDFS, our action will fail from the outset -- but where do we put them? The Jython script goes into the working directory at the same level as the pig script, because pig attempts to load Jython files in the directory from which the script executes. However, that's not where our Weka jar goes. While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding Jars to the distributed cache from Oozie's Pig Cookbook. Making the Workflow Work We've got a workflow defined and have collected all the components we'll need to run. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server. In the same working directory, we'll make a file called job.properties as follows: nameNode=hdfs://localhost:8020 jobTracker=localhost:8021 queueName=default weatherRoot=weather_ooze mapreduce.jobtracker.kerberos.principal=foo dfs.namenode.kerberos.principal=foo oozie.libpath=${nameNode}/user/oozie/share/lib oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot} outputDir=weather-ooze While some of the pieces of the properties file are familiar (e.g., JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job. The oozie.libpath pieces is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of Jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed and write the output to the output directory. We're finally ready to submit our job! After all that work we only need to do a few more things: Validate our workflow.xml Copy our working directory to HDFS Submit our job to the Oozie server Run our workflow Let's do them in order. First validate the workflow: oozie validate workflow.xml Next, copy the working directory up to HDFS: hadoop fs -put working_dir /user/oracle/working_dir Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument. oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit We've submitted the job, but we don't see any activity on the JobTracker? All I got was this funny bit of output: 14-20120525161321-oozie-oracle This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job. oozie -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run." This will prep and run the workflow immediately. Takeaway So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.

Read the article
Install NPM Packages Automatically for Node.js on Windows Azure Web Site

- by Shaun

In one of my previous post I described and demonstrated how to use NPM packages in Node.js and Windows Azure Web Site (WAWS). In that post I used NPM command to install packages, and then use Git for Windows to commit my changes and sync them to WAWS git repository. Then WAWS will trigger a new deployment to host my Node.js application. Someone may notice that, a NPM package may contains many files and could be a little bit huge. For example, the “azure” package, which is the Windows Azure SDK for Node.js, is about 6MB. Another popular package “express”, which is a rich MVC framework for Node.js, is about 1MB. When I firstly push my codes to Windows Azure, all of them must be uploaded to the cloud. Is that possible to let Windows Azure download and install these packages for us? In this post, I will introduce how to make WAWS install all required packages for us when deploying. Let’s Start with Demo Demo is most straightforward. Let’s create a new WAWS and clone it to my local disk. Drag the folder into Git for Windows so that it can help us commit and push. Please refer to this post if you are not familiar with how to use Windows Azure Web Site, Git deployment, git clone and Git for Windows. And then open a command windows and install a package in our code folder. Let’s say I want to install “express”. And then created a new Node.js file named “server.js” and pasted the code as below. 1: var express = require("express"); 2: var app = express(); 3: 4: app.get("/", function(req, res) { 5: res.send("Hello Node.js and Express."); 6: }); 7: 8: console.log("Web application opened."); 9: app.listen(process.env.PORT); If we switch to Git for Windows right now we will find that it detected the changes we made, which includes the “server.js” and all files under “node_modules” folder. What we need to upload should only be our source code, but the huge package files also have to be uploaded as well. Now I will show you how to exclude them and let Windows Azure install the package on the cloud. First we need to add a special file named “.gitignore”. It seems cannot be done directly from the file explorer since this file only contains extension name. So we need to do it from command line. Navigate to the local repository folder and execute the command below to create an empty file named “.gitignore”. If the command windows asked for input just press Enter. 1: echo > .gitignore Now open this file and copy the content below and save. 1: node_modules Now if we switch to Git for Windows we will found that the packages under the “node_modules” were not in the change list. So now if we commit and push, the “express” packages will not be uploaded to Windows Azure. Second, let’s tell Windows Azure which packages it needs to install when deploying. Create another file named “package.json” and copy the content below into that file and save. 1: { 2: "name": "npmdemo", 3: "version": "1.0.0", 4: "dependencies": { 5: "express": "*" 6: } 7: } Now back to Git for Windows, commit our changes and push it to WAWS. Then let’s open the WAWS in developer portal, we will see that there’s a new deployment finished. Click the arrow right side of this deployment we can see how WAWS handle this deployment. Especially we can find WAWS executed NPM. And if we opened the log we can review what command WAWS executed to install the packages and the installation output messages. As you can see WAWS installed “express” for me from the cloud side, so that I don’t need to upload the whole bunch of the package to Azure. Open this website and we can see the result, which proved the “express” had been installed successfully. What’s Happened Under the Hood Now let’s explain a bit on what the “.gitignore” and “package.json” mean. The “.gitignore” is an ignore configuration file for git repository. All files and folders listed in the “.gitignore” will be skipped from git push. In the example below I copied “node_modules” into this file in my local repository. This means, do not track and upload all files under the “node_modules” folder. So by using “.gitignore” I skipped all packages from uploading to Windows Azure. “.gitignore” can contain files, folders. It can also contain the files and folders that we do NOT want to ignore. In the next section we will see how to use the un-ignore syntax to make the SQL package included. The “package.json” file is the package definition file for Node.js application. We can define the application name, version, description, author, etc. information in it in JSON format. And we can also put the dependent packages as well, to indicate which packages this Node.js application is needed. In WAWS, name and version is necessary. And when a deployment happened, WAWS will look into this file, find the dependent packages, execute the NPM command to install them one by one. So in the demo above I copied “express” into this file so that WAWS will install it for me automatically. I updated the dependencies section of the “package.json” file manually. But this can be done partially automatically. If we have a valid “package.json” in our local repository, then when we are going to install some packages we can specify “--save” parameter in “npm install” command, so that NPM will help us upgrade the dependencies part. For example, when I wanted to install “azure” package I should execute the command as below. Note that I added “--save” with the command. 1: npm install azure --save Once it finished my “package.json” will be updated automatically. Each dependent packages will be presented here. The JSON key is the package name while the value is the version range. Below is a brief list of the version range format. For more information about the “package.json” please refer here. Format Description Example version Must match the version exactly. "azure": "0.6.7" >=version Must be equal or great than the version. "azure": ">0.6.0" 1.2.x The version number must start with the supplied digits, but any digit may be used in place of the x. "azure": "0.6.x" ~version The version must be at least as high as the range, and it must be less than the next major revision above the range. "azure": "~0.6.7" * Matches any version. "azure": "*" And WAWS will install the proper version of the packages based on what you defined here. The process of WAWS git deployment and NPM installation would be like this. But Some Packages… As we know, when we specified the dependencies in “package.json” WAWS will download and install them on the cloud. For most of packages it works very well. But there are some special packages may not work. This means, if the package installation needs some special environment restraints it might be failed. For example, the SQL Server Driver for Node.js package needs “node-gyp”, Python and C++ 2010 installed on the target machine during the NPM installation. If we just put the “msnodesql” in “package.json” file and push it to WAWS, the deployment will be failed since there’s no “node-gyp”, Python and C++ 2010 in the WAWS virtual machine. For example, the “server.js” file. 1: var express = require("express"); 2: var app = express(); 3: 4: app.get("/", function(req, res) { 5: res.send("Hello Node.js and Express."); 6: }); 7: 8: var sql = require("msnodesql"); 9: var connectionString = "Driver={SQL Server Native Client 10.0};Server=tcp:tqy4c0isfr.database.windows.net,1433;Database=msteched2012;Uid=shaunxu@tqy4c0isfr;Pwd=P@ssw0rd123;Encrypt=yes;Connection Timeout=30;"; 10: app.get("/sql", function (req, res) { 11: sql.open(connectionString, function (err, conn) { 12: if (err) { 13: console.log(err); 14: res.send(500, "Cannot open connection."); 15: } 16: else { 17: conn.queryRaw("SELECT * FROM [Resource]", function (err, results) { 18: if (err) { 19: console.log(err); 20: res.send(500, "Cannot retrieve records."); 21: } 22: else { 23: res.json(results); 24: } 25: }); 26: } 27: }); 28: }); 29: 30: console.log("Web application opened."); 31: app.listen(process.env.PORT); The “package.json” file. 1: { 2: "name": "npmdemo", 3: "version": "1.0.0", 4: "dependencies": { 5: "express": "*", 6: "msnodesql": "*" 7: } 8: } And it failed to deploy to WAWS. From the NPM log we can see it’s because “msnodesql” cannot be installed on WAWS. The solution is, in “.gitignore” file we should ignore all packages except the “msnodesql”, and upload the package by ourselves. This can be done by use the content as below. We firstly un-ignored the “node_modules” folder. And then we ignored all sub folders but need git to check each sub folders. And then we un-ignore one of the sub folders named “msnodesql” which is the SQL Server Node.js Driver. 1: !node_modules/ 2: 3: node_modules/* 4: !node_modules/msnodesql For more information about the syntax of “.gitignore” please refer to this thread. Now if we go to Git for Windows we will find the “msnodesql” was included in the uncommitted set while “express” was not. I also need remove the dependency of “msnodesql” from “package.json”. Commit and push to WAWS. Now we can see the deployment successfully done. And then we can use the Windows Azure SQL Database from our Node.js application through the “msnodesql” package we uploaded. Summary In this post I demonstrated how to leverage the deployment process of Windows Azure Web Site to install NPM packages during the publish action. With the “.gitignore” and “package.json” file we can ignore the dependent packages from our Node.js and let Windows Azure Web Site download and install them while deployed. For some special packages that cannot be installed by Windows Azure Web Site, such as “msnodesql”, we can put them into the publish payload as well. With the combination of Windows Azure Web Site, Node.js and NPM it makes even more easy and quick for us to develop and deploy our Node.js application to the cloud. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

Search Results

Search found 21331 results on 854 pages for 'require once'.

Page 82/854 | < Previous Page | 78 79 80 81 82 83 84 85 86 87 88 89 | Next Page >

- by epitka

- by Tommy Kennedy

- by Jack

- by Sean Ashmore

- by notJim

- by gammay

- by user348255

- by notJim

- by Mike

- by Manuel R. Ciosici

- by Bart Teunissen

- by Septagram

- by Mark

- by yx

- by Dione

- by user246114

- by Simon Thorpe

- by Eric Nelson

- by arungupta

- by arungupta

- by dwahlin

- by Ciprian Rusen

- by dan.mcclary

- by Shaun

< Previous Page | 78 79 80 81 82 83 84 85 86 87 88 89 | Next Page >