spell checking - Page 216

Scripting with the Sun ZFS Storage 7000 Appliance

- by Geoff Ongley

The Sun ZFS Storage 7000 appliance has a user friendly and easy to understand graphical web based interface we call the "BUI" or "Browser User Interface".This interface is very useful for many tasks, but in some cases a script (or workflow) may be more appropriate, such as:Repetitive tasksTasks which work on (or obtain information about) a large number of shares or usersTasks which are triggered by an alert threshold (workflows)Tasks where you want a only very basic input, but a consistent output (workflows)The appliance scripting language is based on ECMAscript 3 (close to javascript). I'm not going to cover ECMAscript 3 in great depth (I'm far from an expert here), but I would like to show you some neat things you can do with the appliance, to get you started based on what I have found from my own playing around.I'm making the assumption you have some sort of programming background, and understand variables, arrays, functions to some extent - but of course if something is not clear, please let me know so I can fix it up or clarify it.Variable Declarations and ArraysVariablesECMAScript is a dynamically and weakly typed language. If you don't know what that means, google is your friend - but at a high level it means we can just declare variables with no specific type and on the fly.For example, I can declare a variable and use it straight away in the middle of my code, for example:projects=list();Which makes projects an array of values that are returned from the list(); function (which is usable in most contexts). With this kind of variable, I can do things like:projects.length (this property on array tells you how many objects are in it, good for for loops etc). Alternatively, I could say:projects=3;and now projects is just a simple number.Should we declare variables like this so loosely? In my opinion, the answer is no - I feel it is a better practice to declare variables you are going to use, before you use them - and given them an initial value. You can do so as follows:var myVariable=0;To demonstrate the ability to just randomly assign and change the type of variables, you can create a simple script at the cli as follows (bold for input):fishy10:> script("." to run)> run("cd /");("." to run)> run ("shares");("." to run)> var projects;("." to run)> projects=list();("." to run)> printf("Number of projects is: %d\n",projects.length);("." to run)> projects=152;("." to run)> printf("Value of the projects variable as an integer is now: %d\n",projects);("." to run)> .Number of projects is: 7Value of the projects variable as an integer is now: 152You can also confirm this behaviour by checking the typeof variable we are dealing with:fishy10:> script("." to run)> run("cd /");("." to run)> run ("shares");("." to run)> var projects;("." to run)> projects=list();("." to run)> printf("var projects is of type %s\n",typeof(projects));("." to run)> projects=152;("." to run)> printf("var projects is of type %s\n",typeof(projects));("." to run)> .var projects is of type objectvar projects is of type numberArraysSo you likely noticed that we have already touched on arrays, as the list(); (in the shares context) stored an array into the 'projects' variable.But what if you want to declare your own array? Easy! This is very similar to Java and other languages, we just instantiate a brand new "Array" object using the keyword new:var myArray = new Array();will create an array called "myArray".A quick example:fishy10:> script("." to run)> testArray = new Array();("." to run)> testArray[0]="This";("." to run)> testArray[1]="is";("." to run)> testArray[2]="just";("." to run)> testArray[3]="a";("." to run)> testArray[4]="test";("." to run)> for (i=0; i < testArray.length; i++)("." to run)> {("." to run)> printf("Array element %d is %s\n",i,testArray[i]);("." to run)> }("." to run)> .Array element 0 is ThisArray element 1 is isArray element 2 is justArray element 3 is aArray element 4 is testWorking With LoopsFor LoopFor loops are very similar to those you will see in C, java and several other languages. One of the key differences here is, as you were made aware earlier, we can be a bit more sloppy with our variable declarations.The general way you would likely use a for loop is as follows:for (variable; test-case; modifier for variable){}For example, you may wish to declare a variable i as 0; and a MAX_ITERATIONS variable to determine how many times this loop should repeat:var i=0;var MAX_ITERATIONS=10;And then, use this variable to be tested against some case existing (has i reached MAX_ITERATIONS? - if not, increment i using i++);for (i=0; i < MAX_ITERATIONS; i++){ // some work to do}So lets run something like this on the appliance:fishy10:> script("." to run)> var i=0;("." to run)> var MAX_ITERATIONS=10;("." to run)> for (i=0; i < MAX_ITERATIONS; i++)("." to run)> {("." to run)> printf("The number is %d\n",i);("." to run)> }("." to run)> .The number is 0The number is 1The number is 2The number is 3The number is 4The number is 5The number is 6The number is 7The number is 8The number is 9While LoopWhile loops again are very similar to other languages, we loop "while" a condition is met. For example:fishy10:> script("." to run)> var isTen=false;("." to run)> var counter=0;("." to run)> while(isTen==false)("." to run)> {("." to run)> if (counter==10) ("." to run)> { ("." to run)> isTen=true; ("." to run)> } ("." to run)> printf("Counter is %d\n",counter);("." to run)> counter++; ("." to run)> }("." to run)> printf("Loop has ended and Counter is %d\n",counter);("." to run)> .Counter is 0Counter is 1Counter is 2Counter is 3Counter is 4Counter is 5Counter is 6Counter is 7Counter is 8Counter is 9Counter is 10Loop has ended and Counter is 11So what do we notice here? Something has actually gone wrong - counter will technically be 11 once the loop completes... Why is this?Well, if we have a loop like this, where the 'while' condition that will end the loop may be set based on some other condition(s) existing (such as the counter has reached 10) - we must ensure that we terminate this iteration of the loop when the condition is met - otherwise the rest of the code will be followed which may not be desirable. In other words, like in other languages, we will only ever check the loop condition once we are ready to perform the next iteration, so any other code after we set "isTen" to be true, will still be executed as we can see it was above.We can avoid this by adding a break into our loop once we know we have set the condition - this will stop the rest of the logic being processed in this iteration (and as such, counter will not be incremented). So lets try that again:fishy10:> script("." to run)> var isTen=false;("." to run)> var counter=0;("." to run)> while(isTen==false)("." to run)> {("." to run)> if (counter==10) ("." to run)> { ("." to run)> isTen=true; ("." to run)> break;("." to run)> } ("." to run)> printf("Counter is %d\n",counter);("." to run)> counter++; ("." to run)> }("." to run)> printf("Loop has ended and Counter is %d\n", counter);("." to run)> .Counter is 0Counter is 1Counter is 2Counter is 3Counter is 4Counter is 5Counter is 6Counter is 7Counter is 8Counter is 9Loop has ended and Counter is 10Much better!Methods to Obtain and Manipulate DataGet MethodThe get method allows you to get simple properties from an object, for example a quota from a user. The syntax is fairly simple:var myVariable=get('property');An example of where you may wish to use this, is when you are getting a bunch of information about a user (such as quota information when in a shares context):var users=list();for(k=0; k < users.length; k++){ user=users[k]; run('select ' + user); var username=get('name'); var usage=get('usage'); var quota=get('quota');...Which you can then use to your advantage - to print or manipulate infomation (you could change a user's information with a set method, based on the information returned from the get method). The set method is explained next.Set MethodThe set method can be used in a simple manner, similar to get. The syntax for set is:set('property','value'); // where value is a string, if it was a number, you don't need quotesFor example, we could set the quota on a share as follows (first observing the initial value):fishy10:shares default/test-geoff> script("." to run)> var currentQuota=get('quota');("." to run)> printf("Current Quota is: %s\n",currentQuota);("." to run)> set('quota','30G');("." to run)> run('commit');("." to run)> currentQuota=get('quota');("." to run)> printf("Current Quota is: %s\n",currentQuota);("." to run)> .Current Quota is: 0Current Quota is: 32212254720This shows us using both the get and set methods as can be used in scripts, of course when only setting an individual share, the above is overkill - it would be much easier to set it manually at the cli using 'set quota=3G' and then 'commit'.List MethodThe list method can be very powerful, especially in more complex scripts which iterate over large amounts of data and manipulate it if so desired. The general way you will use list is as follows:var myVar=list();Which will make "myVar" an array, containing all the objects in the relevant context (this could be a list of users, shares, projects, etc). You can then gather or manipulate data very easily.We could list all the shares and mountpoints in a given project for example:fishy10:shares another-project> script("." to run)> var shares=list();("." to run)> for (i=0; i < shares.length; i++)("." to run)> {("." to run)> run('select ' + shares[i]);("." to run)> var mountpoint=get('mountpoint');("." to run)> printf("Share %s discovered, has mountpoint %s\n",shares[i],mountpoint);("." to run)> run('done');("." to run)> }("." to run)> .Share and-another discovered, has mountpoint /export/another-project/and-anotherShare another-share discovered, has mountpoint /export/another-project/another-shareShare bob discovered, has mountpoint /export/another-projectShare more-shares-for-all discovered, has mountpoint /export/another-project/more-shares-for-allShare yep discovered, has mountpoint /export/another-project/yepWriting More Complex and Re-Usable CodeFunctionsThe best way to be able to write more complex code is to use functions to split up repeatable or reusable sections of your code. This also makes your more complex code easier to read and understand for other programmers.We write functions as follows:function functionName(variable1,variable2,...,variableN){}For example, we could have a function that takes a project name as input, and lists shares for that project (assuming we're already in the 'project' context - context is important!):function getShares(proj){ run('select ' + proj); shares=list(); printf("Project: %s\n", proj); for(j=0; j < shares.length; j++) { printf("Discovered share: %s\n",shares[i]); } run('done'); // exit selected project}Commenting your CodeLike any other language, a large part of making it readable and understandable is to comment it. You can use the same comment style as in C and Java amongst other languages.In other words, sngle line comments use://at the beginning of the comment.Multi line comments use:/*at the beginning, and:*/ at the end.For example, here we will use both:fishy10:> script("." to run)> // This is a test comment("." to run)> printf("doing some work...\n");("." to run)> /* This is a multi-line("." to run)> comment which I will span across("." to run)> three lines in total */("." to run)> printf("doing some more work...\n");("." to run)> .doing some work...doing some more work...Your comments do not have to be on their own, they can begin (particularly with single line comments this is handy) at the end of a statement, for examplevar projects=list(); // The variable projects is an array containing all projects on the system.Try and Catch StatementsYou may be used to using try and catch statements in other languages, and they can (and should) be utilised in your code to catch expected or unexpected error conditions, that you do NOT wish to stop your code from executing (if you do not catch these errors, your script will exit!):try{ // do some work}catch(err) // Catch any error that could occur{ // do something here under the error condition}For example, you may wish to only execute some code if a context can be reached. If you can't perform certain actions under certain circumstances, that may be perfectly acceptable.For example if you want to test a condition that only makes sense when looking at a SMB/NFS share, but does not make sense when you hit an iscsi or FC LUN, you don't want to stop all processing of other shares you may not have covered yet.For example we may wish to obtain quota information on all shares for all users on a share (but this makes no sense for a LUN):function getShareQuota(shar) // Get quota for each user of this share{ run('select ' + shar); printf(" SHARE: %s\n", shar); try { run('users'); printf(" %20s %11s %11s %3s\n","Username","Usage(G)","Quota(G)","Quota(%)"); printf(" %20s %11s %11s %4s\n","--------","--------","--------","----"); users=list(); for(k=0; k < users.length; k++) { user=users[k]; getUserQuota(user); } run('done'); // exit user context } catch(err) { printf(" SKIPPING %s - This is NOT a NFS or CIFs share, not looking for users\n", shar); } run('done'); // done with this share}Running Scripts Remotely over SSHAs you have likely noticed, writing and running scripts for all but the simplest jobs directly on the appliance is not going to be a lot of fun.There's a couple of choices on what you can do here:Create scripts on a remote system and run them over sshCreate scripts, wrapping them in workflow code, so they are stored on the appliance and can be triggered under certain circumstances (like a threshold being reached)We'll cover the first one here, and then cover workflows later on (as these are for the most part just scripts with some wrapper information around them).Creating a SSH Public/Private SSH Key PairLog on to your handy Solaris box (You wouldn't be using any other OS, right? :P) and use ssh-keygen to create a pair of ssh keys. I'm storing this separate to my normal key:[geoff@lightning ~] ssh-keygen -t rsa -b 1024Generating public/private rsa key pair.Enter file in which to save the key (/export/home/geoff/.ssh/id_rsa): /export/home/geoff/.ssh/nas_key_rsaEnter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /export/home/geoff/.ssh/nas_key_rsa.Your public key has been saved in /export/home/geoff/.ssh/nas_key_rsa.pub.The key fingerprint is:7f:3d:53:f0:2a:5e:8b:2d:94:2a:55:77:66:5c:9b:14 geoff@lightningInstalling the Public Key on the ApplianceOn your Solaris host, observe the public key:[geoff@lightning ~] cat .ssh/nas_key_rsa.pub ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAvYfK3RIaAYmMHBOvyhKM41NaSmcgUMC3igPN5gUKJQvSnYmjuWG6CBr1CkF5UcDji7v19jG3qAD5lAMFn+L0CxgRr8TNaAU+hA4/tpAGkjm+dKYSyJgEdMIURweyyfUFXoerweR8AWW5xlovGKEWZTAfvJX9Zqvh8oMQ5UJLUUc= geoff@lightningNow, copy and paste everything after "ssh-rsa" and before "user@hostname" - in this case, geoff@lightning. That is, this bit:AAAAB3NzaC1yc2EAAAABIwAAAIEAvYfK3RIaAYmMHBOvyhKM41NaSmcgUMC3igPN5gUKJQvSnYmjuWG6CBr1CkF5UcDji7v19jG3qAD5lAMFn+L0CxgRr8TNaAU+hA4/tpAGkjm+dKYSyJgEdMIURweyyfUFXoerweR8AWW5xlovGKEWZTAfvJX9Zqvh8oMQ5UJLUUc=Logon to your appliance and get into the preferences -> keys area for this user (root):[geoff@lightning ~] ssh [email protected]: Last login: Mon Dec 6 17:13:28 2010 from 192.168.0.2fishy10:> configuration usersfishy10:configuration users> select rootfishy10:configuration users root> preferences fishy10:configuration users root preferences> keysOR do it all in one hit:fishy10:> configuration users select root preferences keysNow, we create a new public key that will be accepted for this user and set the type to RSA:fishy10:configuration users root preferences keys> createfishy10:configuration users root preferences key (uncommitted)> set type=RSASet the key itself using the string copied previously (between ssh-rsa and user@host), and set the key ensuring you put double quotes around it (eg. set key="<key>"):fishy10:configuration users root preferences key (uncommitted)> set key="AAAAB3NzaC1yc2EAAAABIwAAAIEAvYfK3RIaAYmMHBOvyhKM41NaSmcgUMC3igPN5gUKJQvSnYmjuWG6CBr1CkF5UcDji7v19jG3qAD5lAMFn+L0CxgRr8TNaAU+hA4/tpAGkjm+dKYSyJgEdMIURweyyfUFXoerweR8AWW5xlovGKEWZTAfvJX9Zqvh8oMQ5UJLUUc="Now set the comment for this key (do not use spaces):fishy10:configuration users root preferences key (uncommitted)> set comment="LightningRSAKey" Commit the new key:fishy10:configuration users root preferences key (uncommitted)> commitVerify the key is there:fishy10:configuration users root preferences keys> lsKeys:NAME MODIFIED TYPE COMMENT key-000 2010-10-25 20:56:42 RSA cycloneRSAKey key-001 2010-12-6 17:44:53 RSA LightningRSAKey As you can see, we now have my new key, and a previous key I have created on this appliance.Running your Script over SSH from a Remote SystemHere I have created a basic test script, and saved it as test.ecma3:[geoff@lightning ~] cat test.ecma3 script// This is a test script, By Geoff Ongley 2010.printf("Testing script remotely over ssh\n");.Now, we can run this script remotely with our keyless login:[geoff@lightning ~] ssh -i .ssh/nas_key_rsa root@fishy10 < test.ecma3Pseudo-terminal will not be allocated because stdin is not a terminal.Testing script remotely over sshPutting it Together - An Example Completed Quota Gathering ScriptSo now we have a lot of the basics to creating a script, let us do something useful, like, find out how much every user is using, on every share on the system (you will recognise some of the code from my previous examples): script/************************************** Quick and Dirty Quota Check script ** Written By Geoff Ongley ** 25 October 2010 **************************************/function getUserQuota(usr){ run('select ' + usr); var username=get('name'); var usage=get('usage'); var quota=get('quota'); var usage_g=usage / 1073741824; // convert bytes to gigabytes var quota_g=quota / 1073741824; // as above var quota_percent=0 if (quota > 0) { quota_percent=(usage / quota)*(100/1); } printf(" %20s %8.2f %8.2f %d%%\n",username,usage_g,quota_g,quota_percent); run('done'); // done with this selected user}function getShareQuota(shar){ //printf("DEBUG: selecting share %s\n", shar); run('select ' + shar); printf(" SHARE: %s\n", shar); try { run('users'); printf(" %20s %11s %11s %3s\n","Username","Usage(G)","Quota(G)","Quota(%)"); printf(" %20s %11s %11s %4s\n","--------","--------","--------","--------"); users=list(); for(k=0; k < users.length; k++) { user=users[k]; getUserQuota(user); } run('done'); // exit user context } catch(err) { printf(" SKIPPING %s - This is NOT a NFS or CIFs share, not looking for users\n", shar); } run('done'); // done with this share}function getShares(proj){ //printf("DEBUG: selecting project %s\n",proj); run('select ' + proj); shares=list(); printf("Project: %s\n", proj); for(j=0; j < shares.length; j++) { share=shares[j]; getShareQuota(share); } run('done'); // exit selected project}function getProjects(){ run('cd /'); run('shares'); projects=list(); for (i=0; i < projects.length; i++) { var project=projects[i]; getShares(project); } run('done'); // exit context for all projects}getProjects();.Which can be run as follows, and will print information like this:[geoff@lightning ~/FISHWORKS_SCRIPTS] ssh -i ~/.ssh/nas_key_rsa root@fishy10 < get_quota_utilisation.ecma3Pseudo-terminal will not be allocated because stdin is not a terminal.Project: another-project SHARE: and-another Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- nobody 0.00 0.00 0% geoffro 0.05 0.00 0% Billy 0.10 0.00 0% root 0.00 0.00 0% testing-user 0.05 0.00 0% SHARE: another-share Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- root 0.00 0.00 0% nobody 0.00 0.00 0% geoffro 0.05 0.49 9% testing-user 0.05 0.02 249% Billy 0.10 0.29 33% SHARE: bob Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- nobody 0.00 0.00 0% root 0.00 0.00 0% SHARE: more-shares-for-all Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- Billy 0.10 0.00 0% testing-user 0.05 0.00 0% nobody 0.00 0.00 0% root 0.00 0.00 0% geoffro 0.05 0.00 0% SHARE: yep Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- root 0.00 0.00 0% nobody 0.00 0.00 0% Billy 0.10 0.01 999% testing-user 0.05 0.49 9% geoffro 0.05 0.00 0%Project: default SHARE: Test-LUN SKIPPING Test-LUN - This is NOT a NFS or CIFs share, not looking for users SHARE: test-geoff Username Usage(G) Quota(G) Quota(%) -------- -------- -------- -------- geoffro 0.05 0.00 0% root 3.18 10.00 31% uucp 0.00 0.00 0% nobody 0.59 0.49 119%^CKilled by signal 2.Creating a WorkflowWorkflows are scripts that we store on the appliance, and can have the script execute either on request (even from the BUI), or on an event such as a threshold being met.Workflow BasicsA workflow allows you to create a simple process that can be executed either via the BUI interface interactively, or by an alert being raised (for some threshold being reached, for example).The basics parameters you will have to set for your "workflow object" (notice you're creating a variable, that embodies ECMAScript) are as follows (parameters is optional):name: A name for this workflowdescription: A Description for the workflowparameters: A set of input parameters (useful when you need user input to execute the workflow)execute: The code, the script itself to execute, which will be function (parameters)With parameters, you can specify things like this (slightly modified sample taken from the System Administration Guide): ...parameters: variableParam1: { label: 'Name of Share', type: 'String' }, variableParam2 { label: 'Share Size', type: 'size' },execute: ....}; Note the commas separating the sections of name, parameters, execute, and so on. This is important!Also - there is plenty of properties you can set on the parameters for your workflow, these are described in the Sun ZFS Storage System Administration Guide.Creating a Basic Workflow from a Basic ScriptTo make a basic script into a basic workflow, you need to wrap the following around your script to create a 'workflow' object:var workflow = {name: 'Get User Quotas',description: 'Displays Quota Utilisation for each user on each share',execute: function() {// (basic script goes here, minus the "script" at the beginning, and "." at the end)}};However, it appears (at least in my experience to date) that the workflow object may only be happy with one function in the execute parameter - either that or I'm doing something wrong. As far as I can tell, after execute: you should only have a basic one function context like so:execute: function(){}To deal with this, and to give an example similar to our script earlier, I have created another simple quota check, to show the same basic functionality, but in a workflow format:var workflow = {name: 'Get User Quotas',description: 'Displays Quota Utilisation for each user on each share',execute: function () { run('cd /'); run('shares'); projects=list(); for (i=0; i < projects.length; i++) { run('select ' + projects[i]); shares=list('filesystem'); printf("Project: %s\n", projects[i]); for(j=0; j < shares.length; j++) { run('select ' +shares[j]); try { run('users'); printf(" SHARE: %s\n", shares[j]); printf(" %20s %11s %11s %3s\n","Username","Usage(G)","Quota(G)","Quota(%)"); printf(" %20s %11s %11s %4s\n","--------","--------","--------","-------"); users=list(); for(k=0; k < users.length; k++) { run('select ' + users[k]); username=get('name'); usage=get('usage'); quota=get('quota'); usage_g=usage / 1073741824; // convert bytes to gigabytes quota_g=quota / 1073741824; // as above quota_percent=0 if (quota > 0) { quota_percent=(usage / quota)*(100/1); } printf(" %20s %8.2f %8.2f %d%%\n",username,usage_g,quota_g,quota_percent); run('done'); } run('done'); // exit user context } catch(err) { // printf(" %s is a LUN, Not looking for users\n", shares[j]); } run('done'); // exit selected share context } run('done'); // exit project context } }};SummaryThe Sun ZFS Storage 7000 Appliance offers lots of different and interesting features to Sun/Oracle customers, including the world renowned Analytics. Hopefully the above will help you to think of new creative things you could be doing by taking advantage of one of the other neat features, the internal scripting engine!Some references are below to help you continue learning more, I'll update this post as I do the same! Enjoy...More information on ECMAScript 3A complete reference to ECMAScript 3 which will help you learn more of the details you may be interested in, can be found here:http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdfMore Information on Administering the Sun ZFS Storage 7000The Sun ZFS Storage 7000 System Administration guide can be a useful reference point, and can be found here:http://wikis.sun.com/download/attachments/186238602/2010_Q3_2_ADMIN.pdf

Read the article

What are good design practices when working with Entity Framework

- by AD

This will apply mostly for an asp.net application where the data is not accessed via soa. Meaning that you get access to the objects loaded from the framework, not Transfer Objects, although some recommendation still apply. This is a community post, so please add to it as you see fit. Applies to: Entity Framework 1.0 shipped with Visual Studio 2008 sp1. Why pick EF in the first place? Considering it is a young technology with plenty of problems (see below), it may be a hard sell to get on the EF bandwagon for your project. However, it is the technology Microsoft is pushing (at the expense of Linq2Sql, which is a subset of EF). In addition, you may not be satisfied with NHibernate or other solutions out there. Whatever the reasons, there are people out there (including me) working with EF and life is not bad.make you think. EF and inheritance The first big subject is inheritance. EF does support mapping for inherited classes that are persisted in 2 ways: table per class and table the hierarchy. The modeling is easy and there are no programming issues with that part. (The following applies to table per class model as I don't have experience with table per hierarchy, which is, anyway, limited.) The real problem comes when you are trying to run queries that include one or many objects that are part of an inheritance tree: the generated sql is incredibly awful, takes a long time to get parsed by the EF and takes a long time to execute as well. This is a real show stopper. Enough that EF should probably not be used with inheritance or as little as possible. Here is an example of how bad it was. My EF model had ~30 classes, ~10 of which were part of an inheritance tree. On running a query to get one item from the Base class, something as simple as Base.Get(id), the generated SQL was over 50,000 characters. Then when you are trying to return some Associations, it degenerates even more, going as far as throwing SQL exceptions about not being able to query more than 256 tables at once. Ok, this is bad, EF concept is to allow you to create your object structure without (or with as little as possible) consideration on the actual database implementation of your table. It completely fails at this. So, recommendations? Avoid inheritance if you can, the performance will be so much better. Use it sparingly where you have to. In my opinion, this makes EF a glorified sql-generation tool for querying, but there are still advantages to using it. And ways to implement mechanism that are similar to inheritance. Bypassing inheritance with Interfaces First thing to know with trying to get some kind of inheritance going with EF is that you cannot assign a non-EF-modeled class a base class. Don't even try it, it will get overwritten by the modeler. So what to do? You can use interfaces to enforce that classes implement some functionality. For example here is a IEntity interface that allow you to define Associations between EF entities where you don't know at design time what the type of the entity would be. public enum EntityTypes{ Unknown = -1, Dog = 0, Cat } public interface IEntity { int EntityID { get; } string Name { get; } Type EntityType { get; } } public partial class Dog : IEntity { // implement EntityID and Name which could actually be fields // from your EF model Type EntityType{ get{ return EntityTypes.Dog; } } } Using this IEntity, you can then work with undefined associations in other classes // lets take a class that you defined in your model. // that class has a mapping to the columns: PetID, PetType public partial class Person { public IEntity GetPet() { return IEntityController.Get(PetID,PetType); } } which makes use of some extension functions: public class IEntityController { static public IEntity Get(int id, EntityTypes type) { switch (type) { case EntityTypes.Dog: return Dog.Get(id); case EntityTypes.Cat: return Cat.Get(id); default: throw new Exception("Invalid EntityType"); } } } Not as neat as having plain inheritance, particularly considering you have to store the PetType in an extra database field, but considering the performance gains, I would not look back. It also cannot model one-to-many, many-to-many relationship, but with creative uses of 'Union' it could be made to work. Finally, it creates the side effet of loading data in a property/function of the object, which you need to be careful about. Using a clear naming convention like GetXYZ() helps in that regards. Compiled Queries Entity Framework performance is not as good as direct database access with ADO (obviously) or Linq2SQL. There are ways to improve it however, one of which is compiling your queries. The performance of a compiled query is similar to Linq2Sql. What is a compiled query? It is simply a query for which you tell the framework to keep the parsed tree in memory so it doesn't need to be regenerated the next time you run it. So the next run, you will save the time it takes to parse the tree. Do not discount that as it is a very costly operation that gets even worse with more complex queries. There are 2 ways to compile a query: creating an ObjectQuery with EntitySQL and using CompiledQuery.Compile() function. (Note that by using an EntityDataSource in your page, you will in fact be using ObjectQuery with EntitySQL, so that gets compiled and cached). An aside here in case you don't know what EntitySQL is. It is a string-based way of writing queries against the EF. Here is an example: "select value dog from Entities.DogSet as dog where dog.ID = @ID". The syntax is pretty similar to SQL syntax. You can also do pretty complex object manipulation, which is well explained [here][1]. Ok, so here is how to do it using ObjectQuery< string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); The first time you run this query, the framework will generate the expression tree and keep it in memory. So the next time it gets executed, you will save on that costly step. In that example EnablePlanCaching = true, which is unnecessary since that is the default option. The other way to compile a query for later use is the CompiledQuery.Compile method. This uses a delegate: static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => ctx.DogSet.FirstOrDefault(it => it.ID == id)); or using linq static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => (from dog in ctx.DogSet where dog.ID == id select dog).FirstOrDefault()); to call the query: query_GetDog.Invoke( YourContext, id ); The advantage of CompiledQuery is that the syntax of your query is checked at compile time, where as EntitySQL is not. However, there are other consideration... Includes Lets say you want to have the data for the dog owner to be returned by the query to avoid making 2 calls to the database. Easy to do, right? EntitySQL string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)).Include("Owner"); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); CompiledQuery static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => (from dog in ctx.DogSet.Include("Owner") where dog.ID == id select dog).FirstOrDefault()); Now, what if you want to have the Include parametrized? What I mean is that you want to have a single Get() function that is called from different pages that care about different relationships for the dog. One cares about the Owner, another about his FavoriteFood, another about his FavotireToy and so on. Basicly, you want to tell the query which associations to load. It is easy to do with EntitySQL public Dog Get(int id, string include) { string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)) .IncludeMany(include); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); } The include simply uses the passed string. Easy enough. Note that it is possible to improve on the Include(string) function (that accepts only a single path) with an IncludeMany(string) that will let you pass a string of comma-separated associations to load. Look further in the extension section for this function. If we try to do it with CompiledQuery however, we run into numerous problems: The obvious static readonly Func<Entities, int, string, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) => (from dog in ctx.DogSet.Include(include) where dog.ID == id select dog).FirstOrDefault()); will choke when called with: query_GetDog.Invoke( YourContext, id, "Owner,FavoriteFood" ); Because, as mentionned above, Include() only wants to see a single path in the string and here we are giving it 2: "Owner" and "FavoriteFood" (which is not to be confused with "Owner.FavoriteFood"!). Then, let's use IncludeMany(), which is an extension function static readonly Func<Entities, int, string, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) => (from dog in ctx.DogSet.IncludeMany(include) where dog.ID == id select dog).FirstOrDefault()); Wrong again, this time it is because the EF cannot parse IncludeMany because it is not part of the functions that is recognizes: it is an extension. Ok, so you want to pass an arbitrary number of paths to your function and Includes() only takes a single one. What to do? You could decide that you will never ever need more than, say 20 Includes, and pass each separated strings in a struct to CompiledQuery. But now the query looks like this: from dog in ctx.DogSet.Include(include1).Include(include2).Include(include3) .Include(include4).Include(include5).Include(include6) .[...].Include(include19).Include(include20) where dog.ID == id select dog which is awful as well. Ok, then, but wait a minute. Can't we return an ObjectQuery< with CompiledQuery? Then set the includes on that? Well, that what I would have thought so as well: static readonly Func<Entities, int, ObjectQuery<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, ObjectQuery<Dog>>((ctx, id) => (ObjectQuery<Dog>)(from dog in ctx.DogSet where dog.ID == id select dog)); public Dog GetDog( int id, string include ) { ObjectQuery<Dog> oQuery = query_GetDog(id); oQuery = oQuery.IncludeMany(include); return oQuery.FirstOrDefault; } That should have worked, except that when you call IncludeMany (or Include, Where, OrderBy...) you invalidate the cached compiled query because it is an entirely new one now! So, the expression tree needs to be reparsed and you get that performance hit again. So what is the solution? You simply cannot use CompiledQueries with parametrized Includes. Use EntitySQL instead. This doesn't mean that there aren't uses for CompiledQueries. It is great for localized queries that will always be called in the same context. Ideally CompiledQuery should always be used because the syntax is checked at compile time, but due to limitation, that's not possible. An example of use would be: you may want to have a page that queries which two dogs have the same favorite food, which is a bit narrow for a BusinessLayer function, so you put it in your page and know exactly what type of includes are required. Passing more than 3 parameters to a CompiledQuery Func is limited to 5 parameters, of which the last one is the return type and the first one is your Entities object from the model. So that leaves you with 3 parameters. A pitance, but it can be improved on very easily. public struct MyParams { public string param1; public int param2; public DateTime param3; } static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) => from dog in ctx.DogSet where dog.Age == myParams.param2 && dog.Name == myParams.param1 and dog.BirthDate > myParams.param3 select dog); public List<Dog> GetSomeDogs( int age, string Name, DateTime birthDate ) { MyParams myParams = new MyParams(); myParams.param1 = name; myParams.param2 = age; myParams.param3 = birthDate; return query_GetDog(YourContext,myParams).ToList(); } Return Types (this does not apply to EntitySQL queries as they aren't compiled at the same time during execution as the CompiledQuery method) Working with Linq, you usually don't force the execution of the query until the very last moment, in case some other functions downstream wants to change the query in some way: static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) => from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog); public IEnumerable<Dog> GetSomeDogs( int age, string name ) { return query_GetDog(YourContext,age,name); } public void DataBindStuff() { IEnumerable<Dog> dogs = GetSomeDogs(4,"Bud"); // but I want the dogs ordered by BirthDate gridView.DataSource = dogs.OrderBy( it => it.BirthDate ); } What is going to happen here? By still playing with the original ObjectQuery (that is the actual return type of the Linq statement, which implements IEnumerable), it will invalidate the compiled query and be force to re-parse. So, the rule of thumb is to return a List< of objects instead. static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) => from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog); public List<Dog> GetSomeDogs( int age, string name ) { return query_GetDog(YourContext,age,name).ToList(); //<== change here } public void DataBindStuff() { List<Dog> dogs = GetSomeDogs(4,"Bud"); // but I want the dogs ordered by BirthDate gridView.DataSource = dogs.OrderBy( it => it.BirthDate ); } When you call ToList(), the query gets executed as per the compiled query and then, later, the OrderBy is executed against the objects in memory. It may be a little bit slower, but I'm not even sure. One sure thing is that you have no worries about mis-handling the ObjectQuery and invalidating the compiled query plan. Once again, that is not a blanket statement. ToList() is a defensive programming trick, but if you have a valid reason not to use ToList(), go ahead. There are many cases in which you would want to refine the query before executing it. Performance What is the performance impact of compiling a query? It can actually be fairly large. A rule of thumb is that compiling and caching the query for reuse takes at least double the time of simply executing it without caching. For complex queries (read inherirante), I have seen upwards to 10 seconds. So, the first time a pre-compiled query gets called, you get a performance hit. After that first hit, performance is noticeably better than the same non-pre-compiled query. Practically the same as Linq2Sql When you load a page with pre-compiled queries the first time you will get a hit. It will load in maybe 5-15 seconds (obviously more than one pre-compiled queries will end up being called), while subsequent loads will take less than 300ms. Dramatic difference, and it is up to you to decide if it is ok for your first user to take a hit or you want a script to call your pages to force a compilation of the queries. Can this query be cached? { Dog dog = from dog in YourContext.DogSet where dog.ID == id select dog; } No, ad-hoc Linq queries are not cached and you will incur the cost of generating the tree every single time you call it. Parametrized Queries Most search capabilities involve heavily parametrized queries. There are even libraries available that will let you build a parametrized query out of lamba expressions. The problem is that you cannot use pre-compiled queries with those. One way around that is to map out all the possible criteria in the query and flag which one you want to use: public struct MyParams { public string name; public bool checkName; public int age; public bool checkAge; } static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) => from dog in ctx.DogSet where (myParams.checkAge == true && dog.Age == myParams.age) && (myParams.checkName == true && dog.Name == myParams.name ) select dog); protected List<Dog> GetSomeDogs() { MyParams myParams = new MyParams(); myParams.name = "Bud"; myParams.checkName = true; myParams.age = 0; myParams.checkAge = false; return query_GetDog(YourContext,myParams).ToList(); } The advantage here is that you get all the benifits of a pre-compiled quert. The disadvantages are that you most likely will end up with a where clause that is pretty difficult to maintain, that you will incur a bigger penalty for pre-compiling the query and that each query you run is not as efficient as it could be (particularly with joins thrown in). Another way is to build an EntitySQL query piece by piece, like we all did with SQL. protected List<Dod> GetSomeDogs( string name, int age) { string query = "select value dog from Entities.DogSet where 1 = 1 "; if( !String.IsNullOrEmpty(name) ) query = query + " and dog.Name == @Name "; if( age > 0 ) query = query + " and dog.Age == @Age "; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext ); if( !String.IsNullOrEmpty(name) ) oQuery.Parameters.Add( new ObjectParameter( "Name", name ) ); if( age > 0 ) oQuery.Parameters.Add( new ObjectParameter( "Age", age ) ); return oQuery.ToList(); } Here the problems are: - there is no syntax checking during compilation - each different combination of parameters generate a different query which will need to be pre-compiled when it is first run. In this case, there are only 4 different possible queries (no params, age-only, name-only and both params), but you can see that there can be way more with a normal world search. - Noone likes to concatenate strings! Another option is to query a large subset of the data and then narrow it down in memory. This is particularly useful if you are working with a definite subset of the data, like all the dogs in a city. You know there are a lot but you also know there aren't that many... so your CityDog search page can load all the dogs for the city in memory, which is a single pre-compiled query and then refine the results protected List<Dod> GetSomeDogs( string name, int age, string city) { string query = "select value dog from Entities.DogSet where dog.Owner.Address.City == @City "; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext ); oQuery.Parameters.Add( new ObjectParameter( "City", city ) ); List<Dog> dogs = oQuery.ToList(); if( !String.IsNullOrEmpty(name) ) dogs = dogs.Where( it => it.Name == name ); if( age > 0 ) dogs = dogs.Where( it => it.Age == age ); return dogs; } It is particularly useful when you start displaying all the data then allow for filtering. Problems: - Could lead to serious data transfer if you are not careful about your subset. - You can only filter on the data that you returned. It means that if you don't return the Dog.Owner association, you will not be able to filter on the Dog.Owner.Name So what is the best solution? There isn't any. You need to pick the solution that works best for you and your problem: - Use lambda-based query building when you don't care about pre-compiling your queries. - Use fully-defined pre-compiled Linq query when your object structure is not too complex. - Use EntitySQL/string concatenation when the structure could be complex and when the possible number of different resulting queries are small (which means fewer pre-compilation hits). - Use in-memory filtering when you are working with a smallish subset of the data or when you had to fetch all of the data on the data at first anyway (if the performance is fine with all the data, then filtering in memory will not cause any time to be spent in the db). Singleton access The best way to deal with your context and entities accross all your pages is to use the singleton pattern: public sealed class YourContext { private const string instanceKey = "On3GoModelKey"; YourContext(){} public static YourEntities Instance { get { HttpContext context = HttpContext.Current; if( context == null ) return Nested.instance; if (context.Items[instanceKey] == null) { On3GoEntities entity = new On3GoEntities(); context.Items[instanceKey] = entity; } return (YourEntities)context.Items[instanceKey]; } } class Nested { // Explicit static constructor to tell C# compiler // not to mark type as beforefieldinit static Nested() { } internal static readonly YourEntities instance = new YourEntities(); } } NoTracking, is it worth it? When executing a query, you can tell the framework to track the objects it will return or not. What does it mean? With tracking enabled (the default option), the framework will track what is going on with the object (has it been modified? Created? Deleted?) and will also link objects together, when further queries are made from the database, which is what is of interest here. For example, lets assume that Dog with ID == 2 has an owner which ID == 10. Dog dog = (from dog in YourContext.DogSet where dog.ID == 2 select dog).FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; Person owner = (from o in YourContext.PersonSet where o.ID == 10 select dog).FirstOrDefault(); //dog.OwnerReference.IsLoaded == true; If we were to do the same with no tracking, the result would be different. ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>) (from dog in YourContext.DogSet where dog.ID == 2 select dog); oDogQuery.MergeOption = MergeOption.NoTracking; Dog dog = oDogQuery.FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>) (from o in YourContext.PersonSet where o.ID == 10 select o); oPersonQuery.MergeOption = MergeOption.NoTracking; Owner owner = oPersonQuery.FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; Tracking is very useful and in a perfect world without performance issue, it would always be on. But in this world, there is a price for it, in terms of performance. So, should you use NoTracking to speed things up? It depends on what you are planning to use the data for. Is there any chance that the data your query with NoTracking can be used to make update/insert/delete in the database? If so, don't use NoTracking because associations are not tracked and will causes exceptions to be thrown. In a page where there are absolutly no updates to the database, you can use NoTracking. Mixing tracking and NoTracking is possible, but it requires you to be extra careful with updates/inserts/deletes. The problem is that if you mix then you risk having the framework trying to Attach() a NoTracking object to the context where another copy of the same object exist with tracking on. Basicly, what I am saying is that Dog dog1 = (from dog in YourContext.DogSet where dog.ID == 2).FirstOrDefault(); ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>) (from dog in YourContext.DogSet where dog.ID == 2 select dog); oDogQuery.MergeOption = MergeOption.NoTracking; Dog dog2 = oDogQuery.FirstOrDefault(); dog1 and dog2 are 2 different objects, one tracked and one not. Using the detached object in an update/insert will force an Attach() that will say "Wait a minute, I do already have an object here with the same database key. Fail". And when you Attach() one object, all of its hierarchy gets attached as well, causing problems everywhere. Be extra careful. How much faster is it with NoTracking It depends on the queries. Some are much more succeptible to tracking than other. I don't have a fast an easy rule for it, but it helps. So I should use NoTracking everywhere then? Not exactly. There are some advantages to tracking object. The first one is that the object is cached, so subsequent call for that object will not hit the database. That cache is only valid for the lifetime of the YourEntities object, which, if you use the singleton code above, is the same as the page lifetime. One page request == one YourEntity object. So for multiple calls for the same object, it will load only once per page request. (Other caching mechanism could extend that). What happens when you are using NoTracking and try to load the same object multiple times? The database will be queried each time, so there is an impact there. How often do/should you call for the same object during a single page request? As little as possible of course, but it does happens. Also remember the piece above about having the associations connected automatically for your? You don't have that with NoTracking, so if you load your data in multiple batches, you will not have a link to between them: ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)(from dog in YourContext.DogSet select dog); oDogQuery.MergeOption = MergeOption.NoTracking; List<Dog> dogs = oDogQuery.ToList(); ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)(from o in YourContext.PersonSet select o); oPersonQuery.MergeOption = MergeOption.NoTracking; List<Person> owners = oPersonQuery.ToList(); In this case, no dog will have its .Owner property set. Some things to keep in mind when you are trying to optimize the performance. No lazy loading, what am I to do? This can be seen as a blessing in disguise. Of course it is annoying to load everything manually. However, it decreases the number of calls to the db and forces you to think about when you should load data. The more you can load in one database call the better. That was always true, but it is enforced now with this 'feature' of EF. Of course, you can call if( !ObjectReference.IsLoaded ) ObjectReference.Load(); if you want to, but a better practice is to force the framework to load the objects you know you will need in one shot. This is where the discussion about parametrized Includes begins to make sense. Lets say you have you Dog object public class Dog { public Dog Get(int id) { return YourContext.DogSet.FirstOrDefault(it => it.ID == id ); } } This is the type of function you work with all the time. It gets called from all over the place and once you have that Dog object, you will do very different things to it in different functions. First, it should be pre-compiled, because you will call that very often. Second, each different pages will want to have access to a different subset of the Dog data. Some will want the Owner, some the FavoriteToy, etc. Of course, you could call Load() for each reference you need anytime you need one. But that will generate a call to the database each time. Bad idea. So instead, each page will ask for the data it wants to see when it first request for the Dog object: static public Dog Get(int id) { return GetDog(entity,"");} static public Dog Get(int id, string includePath) { string query = "select value o " + " from YourEntities.DogSet as o " +

Read the article

socket operation on nonsocket or bad file descriptor

- by Magn3s1um

I'm writing a pthread server which takes requests from clients and sends them back a bunch of .ppm files. Everything seems to go well, but sometimes when I have just 1 client connected, when trying to read from the file descriptor (for the file), it says Bad file Descriptor. This doesn't make sense, since my int fd isn't -1, and the file most certainly exists. Other times, I get this "Socket operation on nonsocket" error. This is weird because other times, it doesn't give me this error and everything works fine. When trying to connect multiple clients, for some reason, it will only send correctly to one, and then the other client gets the bad file descriptor or "nonsocket" error, even though both threads are processing the same messages and do the same routines. Anyone have an idea why? Here's the code that is giving me that error: while(mqueue.head != mqueue.tail && count < dis_m){ printf("Sending to client %s: %s\n", pointer->id, pointer->message); int fd; fd = open(pointer->message, O_RDONLY); char buf[58368]; int bytesRead; printf("This is fd %d\n", fd); bytesRead=read(fd,buf,58368); send(pointer->socket,buf,bytesRead,0); perror("Error:\n"); fflush(stdout); close(fd); mqueue.mcount--; mqueue.head = mqueue.head->next; free(pointer->message); free(pointer); pointer = mqueue.head; count++; } printf("Sending %s\n", pointer->message); int fd; fd = open(pointer->message, O_RDONLY); printf("This is fd %d\n", fd); printf("I am hhere2\n"); char buf[58368]; int bytesRead; bytesRead=read(fd,buf,58368); send(pointer->socket,buf,bytesRead,0); perror("Error:\n"); close(fd); mqueue.mcount--; if(mqueue.head != mqueue.tail){ mqueue.head = mqueue.head->next; } else{ mqueue.head->next = malloc(sizeof(struct message)); mqueue.head = mqueue.head->next; mqueue.head->next = malloc(sizeof(struct message)); mqueue.tail = mqueue.head->next; mqueue.head->message = NULL; } free(pointer->message); free(pointer); pthread_mutex_unlock(&numm); pthread_mutex_unlock(&circ); pthread_mutex_unlock(&slots); The messages for both threads are the same, being of the form ./path/imageXX.ppm where XX is the number that should go to the client. The file size of each image is 58368 bytes. Sometimes, this code hangs on the read, and stops execution. I don't know this would be either, because the file descriptor comes back as valid. Thanks in advanced. Edit: Here's some sample output: Sending to client a: ./support/images/sw90.ppm This is fd 4 Error: : Socket operation on non-socket Sending to client a: ./support/images/sw91.ppm This is fd 4 Error: : Socket operation on non-socket Sending ./support/images/sw92.ppm This is fd 4 I am hhere2 Error: : Socket operation on non-socket My dispatcher has defeated evil Sample with 2 clients (client b was serviced first) Sending to client b: ./support/images/sw87.ppm This is fd 6 Error: : Success Sending to client b: ./support/images/sw88.ppm This is fd 6 Error: : Success Sending to client b: ./support/images/sw89.ppm This is fd 6 Error: : Success This is fd 6 Error: : Bad file descriptor Sending to client a: ./support/images/sw85.ppm This is fd 6 Error: As you can see, who ever is serviced first in this instance can open the files, but not the 2nd person. Edit2: Full code. Sorry, its pretty long and terribly formatted. #include <netinet/in.h> #include <netinet/in.h> #include <netdb.h> #include <arpa/inet.h> #include <sys/types.h> #include <sys/socket.h> #include <errno.h> #include <stdio.h> #include <unistd.h> #include <pthread.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include "ring.h" /* Version 1 Here is what is implemented so far: The threads are created from the arguments specified (number of threads that is) The server will lock and update variables based on how many clients are in the system and such. The socket that is opened when a new client connects, must be passed to the threads. To do this, we need some sort of global array. I did this by specifying an int client and main_pool_busy, and two pointers poolsockets and nonpoolsockets. My thinking on this was that when a new client enters the system, the server thread increments the variable client. When a thread is finished with this client (after it sends it the data), the thread will decrement client and close the socket. HTTP servers act this way sometimes (they terminate the socket as soon as one transmission is sent). *Note down at bottom After the server portion increments the client counter, we must open up a new socket (denoted by new_sd) and get this value to the appropriate thread. To do this, I created global array poolsockets, which will hold all the socket descriptors for our pooled threads. The server portion gets the new socket descriptor, and places the value in the first spot of the array that has a 0. We only place a value in this array IF: 1. The variable main_pool_busy < worknum (If we have more clients in the system than in our pool, it doesn't mean we should always create a new thread. At the end of this, the server signals on the condition variable clientin that a new client has arrived. In our pooled thread, we then must walk this array and check the array until we hit our first non-zero value. This is the socket we will give to that thread. The thread then changes the array to have a zero here. What if our all threads in our pool our busy? If this is the case, then we will know it because our threads in this pool will increment main_pool_busy by one when they are working on a request and decrement it when they are done. If main_pool_busy >= worknum, then we must dynamically create a new thread. Then, we must realloc the size of our nonpoolsockets array by 1 int. We then add the new socket descriptor to our pool. Here's what we need to figure out: NOTE* Each worker should generate 100 messages which specify the worker thread ID, client socket descriptor and a copy of the client message. Additionally, each message should include a message number, starting from 0 and incrementing for each subsequent message sent to the same client. I don't know how to keep track of how many messages were to the same client. Maybe we shouldn't close the socket descriptor, but rather keep an array of structs for each socket that includes how many messages they have been sent. Then, the server adds the struct, the threads remove it, then the threads add it back once they've serviced one request (unless the count is 100). ------------------------------------------------------------- CHANGES Version 1 ---------- NONE: this is the first version. */ #define MAXSLOTS 30 #define dis_m 15 //problems with dis_m ==1 //Function prototypes void inc_clients(); void init_mutex_stuff(pthread_t*, pthread_t*); void *threadpool(void *); void server(int); void add_to_socket_pool(int); void inc_busy(); void dec_busy(); void *dispatcher(); void create_message(long, int, int, char *, char *); void init_ring(); void add_to_ring(char *, char *, int, int, int); int socket_from_string(char *); void add_to_head(char *); void add_to_tail(char *); struct message * reorder(struct message *, struct message *, int); int get_threadid(char *); void delete_socket_messages(int); struct message * merge(struct message *, struct message *, int); int get_request(char *, char *, char*); ///////////////////// //Global mutexes and condition variables pthread_mutex_t startservice; pthread_mutex_t numclients; pthread_mutex_t pool_sockets; pthread_mutex_t nonpool_sockets; pthread_mutex_t m_pool_busy; pthread_mutex_t slots; pthread_mutex_t numm; pthread_mutex_t circ; pthread_cond_t clientin; pthread_cond_t m; /////////////////////////////////////// //Global variables int clients; int main_pool_busy; int * poolsockets, nonpoolsockets; int worknum; struct ring mqueue; /////////////////////////////////////// int main(int argc, char ** argv){ //error handling if not enough arguments to program if(argc != 3){ printf("Not enough arguments to server: ./server portnum NumThreadsinPool\n"); _exit(-1); } //Convert arguments from strings to integer values int port = atoi(argv[1]); worknum = atoi(argv[2]); //Start server portion server(port); } /////////////////////////////////////////////////////////////////////////////////////////////// //The listen server thread///////////////////////////////////////////////////////////////////// /////////////////////////////////////////////////////////////////////////////////////////////// void server(int port){ int sd, new_sd; struct sockaddr_in name, cli_name; int sock_opt_val = 1; int cli_len; pthread_t threads[worknum]; //create our pthread id array pthread_t dis[1]; //create our dispatcher array (necessary to create thread) init_mutex_stuff(threads, dis); //initialize mutexes and stuff //Server setup /////////////////////////////////////////////////////// if ((sd = socket (AF_INET, SOCK_STREAM, 0)) < 0) { perror("(servConn): socket() error"); _exit (-1); } if (setsockopt (sd, SOL_SOCKET, SO_REUSEADDR, (char *) &sock_opt_val, sizeof(sock_opt_val)) < 0) { perror ("(servConn): Failed to set SO_REUSEADDR on INET socket"); _exit (-1); } name.sin_family = AF_INET; name.sin_port = htons (port); name.sin_addr.s_addr = htonl(INADDR_ANY); if (bind (sd, (struct sockaddr *)&name, sizeof(name)) < 0) { perror ("(servConn): bind() error"); _exit (-1); } listen (sd, 5); //End of server Setup ////////////////////////////////////////////////// for (;;) { cli_len = sizeof (cli_name); new_sd = accept (sd, (struct sockaddr *) &cli_name, &cli_len); printf ("Assigning new socket descriptor: %d\n", new_sd); inc_clients(); //New client has come in, increment clients add_to_socket_pool(new_sd); //Add client to the pool of sockets if (new_sd < 0) { perror ("(servConn): accept() error"); _exit (-1); } } pthread_exit(NULL); //Quit } //Adds the new socket to the array designated for pthreads in the pool void add_to_socket_pool(int socket){ pthread_mutex_lock(&m_pool_busy); //Lock so that we can check main_pool_busy int i; //If not all our main pool is busy, then allocate to one of them if(main_pool_busy < worknum){ pthread_mutex_unlock(&m_pool_busy); //unlock busy, we no longer need to hold it pthread_mutex_lock(&pool_sockets); //Lock the socket pool array so that we can edit it without worry for(i = 0; i < worknum; i++){ //Find a poolsocket that is -1; then we should put the real socket there. This value will be changed back to -1 when the thread grabs the sockfd if(poolsockets[i] == -1){ poolsockets[i] = socket; pthread_mutex_unlock(&pool_sockets); //unlock our pool array, we don't need it anymore inc_busy(); //Incrememnt busy (locks the mutex itself) pthread_cond_signal(&clientin); //Signal first thread waiting on a client that a client needs to be serviced break; } } } else{ //Dynamic thread creation goes here pthread_mutex_unlock(&m_pool_busy); } } //Increments the client number. If client number goes over worknum, we must dynamically create new pthreads void inc_clients(){ pthread_mutex_lock(&numclients); clients++; pthread_mutex_unlock(&numclients); } //Increments busy void inc_busy(){ pthread_mutex_lock(&m_pool_busy); main_pool_busy++; pthread_mutex_unlock(&m_pool_busy); } //Initialize all of our mutexes at the beginning and create our pthreads void init_mutex_stuff(pthread_t * threads, pthread_t * dis){ pthread_mutex_init(&startservice, NULL); pthread_mutex_init(&numclients, NULL); pthread_mutex_init(&pool_sockets, NULL); pthread_mutex_init(&nonpool_sockets, NULL); pthread_mutex_init(&m_pool_busy, NULL); pthread_mutex_init(&circ, NULL); pthread_cond_init (&clientin, NULL); main_pool_busy = 0; poolsockets = malloc(sizeof(int)*worknum); int threadreturn; //error checking variables long i = 0; //Loop and create pthreads for(i; i < worknum; i++){ threadreturn = pthread_create(&threads[i], NULL, threadpool, (void *) i); poolsockets[i] = -1; if(threadreturn){ perror("Thread pool created unsuccessfully"); _exit(-1); } } pthread_create(&dis[0], NULL, dispatcher, NULL); } ////////////////////////////////////////////////////////////////////////////////////////// /////////Main pool routines ///////////////////////////////////////////////////////////////////////////////////////// void dec_busy(){ pthread_mutex_lock(&m_pool_busy); main_pool_busy--; pthread_mutex_unlock(&m_pool_busy); } void dec_clients(){ pthread_mutex_lock(&numclients); clients--; pthread_mutex_unlock(&numclients); } //This is what our threadpool pthreads will be running. void *threadpool(void * threadid){ long id = (long) threadid; //Id of this thread int i; int socket; int counter = 0; //Try and gain access to the next client that comes in and wait until server signals that a client as arrived while(1){ pthread_mutex_lock(&startservice); //lock start service (required for cond wait) pthread_cond_wait(&clientin, &startservice); //wait for signal from server that client exists pthread_mutex_unlock(&startservice); //unlock mutex. pthread_mutex_lock(&pool_sockets); //Lock the pool socket so we can get the socket fd unhindered/interrupted for(i = 0; i < worknum; i++){ if(poolsockets[i] != -1){ socket = poolsockets[i]; poolsockets[i] = -1; pthread_mutex_unlock(&pool_sockets); } } printf("Thread #%d is past getting the socket\n", id); int incoming = 1; while(counter < 100 && incoming != 0){ char buffer[512]; bzero(buffer,512); int startcounter = 0; incoming = read(socket, buffer, 512); if(buffer[0] != 0){ //client ID:priority:request:arguments char id[100]; long prior; char request[100]; char arg1[100]; char message[100]; char arg2[100]; char * point; point = strtok(buffer, ":"); strcpy(id, point); point = strtok(NULL, ":"); prior = atoi(point); point = strtok(NULL, ":"); strcpy(request, point); point = strtok(NULL, ":"); strcpy(arg1, point); point = strtok(NULL, ":"); if(point != NULL){ strcpy(arg2, point); } int fd; if(strcmp(request, "start_movie") == 0){ int count = 1; while(count <= 100){ char temp[10]; snprintf(temp, 50, "%d\0", count); strcpy(message, "./support/images/"); strcat(message, arg1); strcat(message, temp); strcat(message, ".ppm"); printf("This is message %s to %s\n", message, id); count++; add_to_ring(message, id, prior, counter, socket); //Adds our created message to the ring counter++; } printf("I'm out of the loop\n"); } else if(strcmp(request, "seek_movie") == 0){ int count = atoi(arg2); while(count <= 100){ char temp[10]; snprintf(temp, 10, "%d\0", count); strcpy(message, "./support/images/"); strcat(message, arg1); strcat(message, temp); strcat(message, ".ppm"); printf("This is message %s\n", message); count++; } } //create_message(id, socket, counter, buffer, message); //Creates our message from the input from the client. Stores it in buffer } else{ delete_socket_messages(socket); break; } } counter = 0; close(socket);//Zero out counter again } dec_clients(); //client serviced, decrement clients dec_busy(); //thread finished, decrement busy } //Creates a message void create_message(long threadid, int socket, int counter, char * buffer, char * message){ snprintf(message, strlen(buffer)+15, "%d:%d:%d:%s", threadid, socket, counter, buffer); } //Gets the socket from the message string (maybe I should just pass in the socket to another method) int socket_from_string(char * message){ char * substr1 = strstr(message, ":"); char * substr2 = substr1; substr2++; int occurance = strcspn(substr2, ":"); char sock[10]; strncpy(sock, substr2, occurance); return atoi(sock); } //Adds message to our ring buffer's head void add_to_head(char * message){ printf("Adding to head of ring\n"); mqueue.head->message = malloc(strlen(message)+1); //Allocate space for message strcpy(mqueue.head->message, message); //copy bytes into allocated space } //Adds our message to our ring buffer's tail void add_to_tail(char * message){ printf("Adding to tail of ring\n"); mqueue.tail->message = malloc(strlen(message)+1); //allocate space for message strcpy(mqueue.tail->message, message); //copy bytes into allocated space mqueue.tail->next = malloc(sizeof(struct message)); //allocate space for the next message struct } //Adds a message to our ring void add_to_ring(char * message, char * id, int prior, int mnum, int socket){ //printf("This is message %s:" , message); pthread_mutex_lock(&circ); //Lock the ring buffer pthread_mutex_lock(&numm); //Lock the message count (will need this to make sure we can't fill the buffer over the max slots) if(mqueue.head->message == NULL){ add_to_head(message); //Adds it to head mqueue.head->socket = socket; //Set message socket mqueue.head->priority = prior; //Set its priority (thread id) mqueue.head->mnum = mnum; //Set its message number (used for sorting) mqueue.head->id = malloc(sizeof(id)); strcpy(mqueue.head->id, id); } else if(mqueue.tail->message == NULL){ //This is the problem for dis_m 1 I'm pretty sure add_to_tail(message); mqueue.tail->socket = socket; mqueue.tail->priority = prior; mqueue.tail->mnum = mnum; mqueue.tail->id = malloc(sizeof(id)); strcpy(mqueue.tail->id, id); } else{ mqueue.tail->next = malloc(sizeof(struct message)); mqueue.tail = mqueue.tail->next; add_to_tail(message); mqueue.tail->socket = socket; mqueue.tail->priority = prior; mqueue.tail->mnum = mnum; mqueue.tail->id = malloc(sizeof(id)); strcpy(mqueue.tail->id, id); } mqueue.mcount++; pthread_mutex_unlock(&circ); if(mqueue.mcount >= dis_m){ pthread_mutex_unlock(&numm); pthread_cond_signal(&m); } else{ pthread_mutex_unlock(&numm); } printf("out of add to ring\n"); fflush(stdout); } ////////////////////////////////// //Dispatcher routines ///////////////////////////////// void *dispatcher(){ init_ring(); while(1){ pthread_mutex_lock(&slots); pthread_cond_wait(&m, &slots); pthread_mutex_lock(&numm); pthread_mutex_lock(&circ); printf("Dispatcher to the rescue!\n"); mqueue.head = reorder(mqueue.head, mqueue.tail, mqueue.mcount); //printf("This is the head %s\n", mqueue.head->message); //printf("This is the tail %s\n", mqueue.head->message); fflush(stdout); struct message * pointer = mqueue.head; int count = 0; while(mqueue.head != mqueue.tail && count < dis_m){ printf("Sending to client %s: %s\n", pointer->id, pointer->message); int fd; fd = open(pointer->message, O_RDONLY); char buf[58368]; int bytesRead; printf("This is fd %d\n", fd); bytesRead=read(fd,buf,58368); send(pointer->socket,buf,bytesRead,0); perror("Error:\n"); fflush(stdout); close(fd); mqueue.mcount--; mqueue.head = mqueue.head->next; free(pointer->message); free(pointer); pointer = mqueue.head; count++; } printf("Sending %s\n", pointer->message); int fd; fd = open(pointer->message, O_RDONLY); printf("This is fd %d\n", fd); printf("I am hhere2\n"); char buf[58368]; int bytesRead; bytesRead=read(fd,buf,58368); send(pointer->socket,buf,bytesRead,0); perror("Error:\n"); close(fd); mqueue.mcount--; if(mqueue.head != mqueue.tail){ mqueue.head = mqueue.head->next; } else{ mqueue.head->next = malloc(sizeof(struct message)); mqueue.head = mqueue.head->next; mqueue.head->next = malloc(sizeof(struct message)); mqueue.tail = mqueue.head->next; mqueue.head->message = NULL; } free(pointer->message); free(pointer); pthread_mutex_unlock(&numm); pthread_mutex_unlock(&circ); pthread_mutex_unlock(&slots); printf("My dispatcher has defeated evil\n"); } } void init_ring(){ mqueue.head = malloc(sizeof(struct message)); mqueue.head->next = malloc(sizeof(struct message)); mqueue.tail = mqueue.head->next; mqueue.mcount = 0; } struct message * reorder(struct message * begin, struct message * end, int num){ //printf("I am reordering for size %d\n", num); fflush(stdout); int i; if(num == 1){ //printf("Begin: %s\n", begin->message); begin->next = NULL; return begin; } else{ struct message * left = begin; struct message * right; int middle = num/2; for(i = 1; i < middle; i++){ left = left->next; } right = left -> next; left -> next = NULL; //printf("Begin: %s\nLeft: %s\nright: %s\nend:%s\n", begin->message, left->message, right->message, end->message); left = reorder(begin, left, middle); if(num%2 != 0){ right = reorder(right, end, middle+1); } else{ right = reorder(right, end, middle); } return merge(left, right, num); } } struct message * merge(struct message * left, struct message * right, int num){ //printf("I am merginging! left: %s %d, right: %s %dnum: %d\n", left->message,left->priority, right->message, right->priority, num); struct message * start, * point; int lenL= 0; int lenR = 0; int flagL = 0; int flagR = 0; int count = 0; int middle1 = num/2; int middle2; if(num%2 != 0){ middle2 = middle1+1; } else{ middle2 = middle1; } while(lenL < middle1 && lenR < middle2){ count++; //printf("In here for count %d\n", count); if(lenL == 0 && lenR == 0){ if(left->priority < right->priority){ start = left; //Set the start point point = left; //set our enum; left = left->next; //move the left pointer point->next = NULL; //Set the next node to NULL lenL++; } else if(left->priority > right->priority){ start = right; point = right; right = right->next; point->next = NULL; lenR++; } else{ if(left->mnum < right->mnum){ ////printf("This is where we are\n"); start = left; //Set the start point point = left; //set our enum; left = left->next; //move the left pointer point->next = NULL; //Set the next node to NULL lenL++; } else{ start = right; point = right; right = right->next; point->next = NULL; lenR++; } } } else{ if(left->priority < right->priority){ point->next = left; left = left->next; //move the left pointer point = point->next; point->next = NULL; //Set the next node to NULL lenL++; } else if(left->priority > right->priority){ point->next = right; right = right->next; point = point->next; point->next = NULL; lenR++; } else{ if(left->mnum < right->mnum){ point->next = left; //set our enum; left = left->next; point = point->next;//move the left pointer point->next = NULL; //Set the next node to NULL lenL++; } else{ point->next = right; right = right->next; point = point->next; point->next = NULL; lenR++; } } } if(lenL == middle1){ flagL = 1; break; } if(lenR == middle2){ flagR = 1; break; } } if(flagL == 1){ point->next = right; point = point->next; for(lenR; lenR< middle2-1; lenR++){ point = point->next; } point->next = NULL; mqueue.tail = point; } else{ point->next = left; point = point->next; for(lenL; lenL< middle1-1; lenL++){ point = point->next; } point->next = NULL; mqueue.tail = point; } //printf("This is the start %s\n", start->message); //printf("This is mqueue.tail %s\n", mqueue.tail->message); return start; } void delete_socket_messages(int a){ }

Developer IT