Search Results

Search found 395 results on 16 pages for 'truth'.

Page 16/16 | < Previous Page | 12 13 14 15 16

Building Simple Workflows in Oozie

- by dan.mcclary

Introduction More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site. Hive Actions: Prepping for Pig In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in a handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie. I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me flexibility of Hive's query language when I want it, but let's me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code. CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column 1, column2) PARTITIONED BY (yr string) STORED AS ... LOCATION '/user/oracle/weather/historic'; As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow. Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of their long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing SerDe or a parser for transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access. ALTER TABLE historic_weather ADD IF NOT EXISTS PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2011'; INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history' SELECT w.stn, w.wban, w.weather_year, w.weather_month, w.weather_day, w.temp, w.dewp, w.weather FROM ( FROM historic_weather SELECT TRANSFORM(...) USING '/path/to/hive/filters/ncdc_parser.py' as stn, wban, weather_year, weather_month, weather_day, temp, dewp, weather ) w; Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive actions using what's somewhat obviously referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called weather_train.hql. Starting Our Workflow Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions of Workflow jobs, but they can be automatically started either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling. The bare minimum for workflow XML defines a name, a starting point, and an end point: <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1"> <start to="ParseNCDCData"/> <end name="end"/> </workflow-app> To this we need to add an action, and within that we'll specify the hive parameters Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure. <action name="ParseNCDCData"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <configuration> <property> <name>oozie.hive.defaults</name> <value>/user/oracle/weather_ooze/hive-default.xml</value> </property> </configuration> <script>ncdc_parse.hql</script> </hive> <ok to="WeatherMan"/> <error to="end"/> </action> There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode. I have to include a hive-default.xml file. I have to include a script file. The hive-default.xml and script file must be stored in HDFS That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-defaults.xml on different clusters (e.g. MySQL or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need to it as you add actions to workflow.xml. At this point, our local directory should contain: workflow.xml hive-defaults.xml (make sure this file contains your metastore connection data) ncdc_parse.hql Adding Pig to the Ooze Adding our Pig script as an action is slightly simpler from an XML standpoint. All we do is add an action to workflow.xml as follows: <action name="WeatherMan"> <pig> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <script>weather_train.pig</script> </pig> <ok to="end"/> <error to="end"/> </action> Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My pig script registers the Weka Jar and a chunk of jython. If those aren't also in HDFS, our action will fail from the outset -- but where do we put them? The Jython script goes into the working directory at the same level as the pig script, because pig attempts to load Jython files in the directory from which the script executes. However, that's not where our Weka jar goes. While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding Jars to the distributed cache from Oozie's Pig Cookbook. Making the Workflow Work We've got a workflow defined and have collected all the components we'll need to run. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server. In the same working directory, we'll make a file called job.properties as follows: nameNode=hdfs://localhost:8020 jobTracker=localhost:8021 queueName=default weatherRoot=weather_ooze mapreduce.jobtracker.kerberos.principal=foo dfs.namenode.kerberos.principal=foo oozie.libpath=${nameNode}/user/oozie/share/lib oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot} outputDir=weather-ooze While some of the pieces of the properties file are familiar (e.g., JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job. The oozie.libpath pieces is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of Jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed and write the output to the output directory. We're finally ready to submit our job! After all that work we only need to do a few more things: Validate our workflow.xml Copy our working directory to HDFS Submit our job to the Oozie server Run our workflow Let's do them in order. First validate the workflow: oozie validate workflow.xml Next, copy the working directory up to HDFS: hadoop fs -put working_dir /user/oracle/working_dir Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument. oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit We've submitted the job, but we don't see any activity on the JobTracker? All I got was this funny bit of output: 14-20120525161321-oozie-oracle This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job. oozie -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run." This will prep and run the workflow immediately. Takeaway So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.

Read the article
Launching mysql server: same permissions for root and for user

- by toinbis

Hi folks, have been directed here from stackoverflow here, am reposting the question and adding my.cnf at the end of a post. so far in my 10+ years experience with linux, all the permission problems I've ever encountered, have been successfully solved with chmod -R 777 /path/where/the/problem/has/occured (every lie has a grain of truth in it :) This time the trick doesn't work, so I'm turning to you for help. I'm compiling mysql server from scratch with zc.buildout (www . buildout . org). I do launch it by executing /home/toinbis/.../parts/mysql/bin/mysqld_safe, this works. The thing is that i'll be launching this from within supervisor (supervisord . org) script, and when used on the deployment server, it'll need it to be launched with root permissions(so that nginx server, launched with the same script, would have access to 80 port). The problem is that sudo /home/toinbis/.../parts/mysql/bin/mysqld_safe, fails, generating the error, posted bellow, in mysql error log (apache and nginx works as expected). http://lists.mysql.com/mysql/216045 suggests, that "there are two errors: A missing table and a file system that mysqld doesn't have access to". Mysqldatadir and all the mysql server binary files has 777 permissions, talbe mysql.plugin does exist and has 777 permissions (why Can't open the mysql.plugin table?), "sudo touch mysql_datadir/tmp/file" does create file (why Can't create/write to file /home/toinbis/.../runtime/mysql_datadir/tmp/ib4e9Huz?). chgrp -R mysql mysql_datadir and adding "root, toinbis, mysql" users to mysql group ( cat /etc/group | grep mysql outputs mysql:x:124:root,toinbis,mysql) has no effect - when i launch it as a casual user, it starts, when as a root - it fails. Does mysql server, even started as root, tries to operate as other, let's say, 'mysql' user? but even in that case, adding mysql user to mysql group and making all the mysql_datadirs files belong to mysql group should make things work smoothly. I do know that it might be a better idea to simply to launch one the nginx as root and mysql - as just a user, but this error irritated me enough so to devote enough energy so not to only "make things work", but to also make things work exactly as i wanted it initially, so to have a proof of concept that it's possible. and this is the generated error: 091213 20:02:55 mysqld_safe Starting mysqld daemon with databases from /home/toinbis/.../runtime/mysql_datadir /home/toinbis/.../parts/mysql/libexec/mysqld: Table 'plugin' is read only 091213 20:02:55 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. /home/toinbis/.../parts/mysql/libexec/mysqld: Can't create/write to file '/home/toinbis/.../runtime/mysql_datadir/tmp/ib4e9Huz' (Errcode: 13) 091213 20:02:55 InnoDB: Error: unable to create temporary file; errno: 13 091213 20:02:55 [ERROR] Plugin 'InnoDB' init function returned error. 091213 20:02:55 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 091213 20:02:55 [ERROR] Can't start server : Bind on unix socket: Permission denied 091213 20:02:55 [ERROR] Do you already have another mysqld server running on socket: /home/toinbis/.../runtime/var/pids/mysql.sock ? 091213 20:02:55 [ERROR] Aborting 091213 20:02:55 [Note] /home/toinbis/.../parts/mysql/libexec/mysqld: Shutdown complete 091213 20:02:55 mysqld_safe mysqld from pid file /home/toinbis/.../runtime/var/pids/mysql.pid ended My my.cnf (the basedir and datadir(including tempdir) have chmod -R 777 permissions) : [client] socket = /home/toinbis/.../runtime/var/pids/mysql.sock port = 8002 [mysqld_safe] socket = /home/toinbis/.../runtime/var/pids/mysql.sock nice = 0 [mysqld] # # * Basic Settings # socket = /home/toinbis/.../runtime/var/pids/mysql.sock port = 8002 pid-file = /home/toinbis/.../runtime/var/pids/mysql.pid basedir = /home/toinbis/.../parts/mysql datadir = /home/toinbis/.../runtime/mysql_datadir tmpdir = /home/toinbis/.../runtime/mysql_datadir/tmp skip-external-locking bind-address = 127.0.0.1 log-error =/home/toinbis/.../runtime/logs/mysql_errorlog # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 32M thread_stack = 128K thread_cache_size = 8 myisam-recover = BACKUP #max_connections = 100 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 16M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. #log = /home/toinbis/.../runtime/logs/mysql_logs/mysql.log # # Error logging goes to syslog. This is a Debian improvement :) # # Here you can see queries with especially long duration #log_slow_queries = /home/toinbis/.../runtime/logs/mysql_logs/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. #server-id = 1 #log_bin = /home/toinbis/.../runtime/mysql_datadir/mysql-bin.log #binlog_format = ROW #read_only = 0 #expire_logs_days = 10 #max_binlog_size = 100M #sync_binlog = 1 #binlog_do_db = include_database_name #binlog_ignore_db = include_database_name # # * InnoDB # innodb_data_file_path = ibdata1:10M:autoextend innodb_buffer_pool_size=64M innodb_log_file_size=16M innodb_log_buffer_size=8M innodb_flush_log_at_trx_commit=1 innodb_file_per_table innodb_locks_unsafe_for_binlog=1 [mysqldump] quick quote-names max_allowed_packet = 32M [mysql] #no-auto-rehash # faster start of mysql but no tab completion [isamchk] key_buffer = 16M Any ideas much appreciated! regards, to P.S. sorry for messy hyperlinks, it's my first post and anti-spam feature of SF doesn't allow to post them properly :)

Read the article
Software development is (mostly) a trade, and what to do about it

- by Jeff

(This is another cross-post from my personal blog. I don’t even remember when I first started to write it, but I feel like my opinion is well enough baked to share.) I've been sitting on this for a long time, particularly as my opinion has changed dramatically over the last few years. That I've encountered more crappy code than maintainable, quality code in my career as a software developer only reinforces what I'm about to say. Software development is just a trade for most, and not a huge academic endeavor. For those of you with computer science degrees readying your pitchforks and collecting your algorithm interview questions, let me explain. This is not an assault on your way of life, and if you've been around, you know I'm right about the quality problem. You also know the HR problem is very real, or we wouldn't be paying top dollar for mediocre developers and importing people from all over the world to fill the jobs we can't fill. I'm going to try and outline what I see as some of the problems, and hopefully offer my views on how to address them. The recruiting problem I think a lot of companies are doing it wrong. Over the years, I've had two kinds of interview experiences. The first, and right, kind of experience involves talking about real life achievements, followed by some variation on white boarding in pseudo-code, drafting some basic system architecture, or even sitting down at a comprooder and pecking out some basic code to tackle a real problem. I can honestly say that I've had a job offer for every interview like this, save for one, because the task was to debug something and they didn't like me asking where to look ("everyone else in the company died in a plane crash"). The other interview experience, the wrong one, involves the classic torture test designed to make the candidate feel stupid and do things they never have, and never will do in their job. First they will question you about obscure academic material you've never seen, or don't care to remember. Then they'll ask you to white board some ridiculous algorithm involving prime numbers or some kind of string manipulation no one would ever do. In fact, if you had to do something like this, you'd Google for a solution instead of waste time on a solved problem. Some will tell you that the academic gauntlet interview is useful to see how people respond to pressure, how they engage in complex logic, etc. That might be true, unless of course you have someone who brushed up on the solutions to the silly puzzles, and they're playing you. But here's the real reason why the second experience is wrong: You're evaluating for things that aren't the job. These might have been useful tactics when you had to hire people to write machine language or C++, but in a world dominated by managed code in C#, or Java, people aren't managing memory or trying to be smarter than the compilers. They're using well known design patterns and techniques to deliver software. More to the point, these puzzle gauntlets don't evaluate things that really matter. They don't get into code design, issues of loose coupling and testability, knowledge of the basics around HTTP, or anything else that relates to building supportable and maintainable software. The first situation, involving real life problems, gives you an immediate idea of how the candidate will work out. One of my favorite experiences as an interviewee was with a guy who literally brought his work from that day and asked me how to deal with his problem. I had to demonstrate how I would design a class, make sure the unit testing coverage was solid, etc. I worked at that company for two years. So stop looking for algorithm puzzle crunchers, because a guy who can crush a Fibonacci sequence might also be a guy who writes a class with 5,000 lines of untestable code. Fashion your interview process on ways to reveal a developer who can write supportable and maintainable code. I would even go so far as to let them use the Google. If they want to cut-and-paste code, pass on them, but if they're looking for context or straight class references, hire them, because they're going to be life-long learners. The contractor problem I doubt anyone has ever worked in a place where contractors weren't used. The use of contractors seems like an obvious way to control costs. You can hire someone for just as long as you need them and then let them go. You can even give them the work that no one else wants to do. In practice, most places I've worked have retained and budgeted for the contractor year-round, meaning that the $90+ per hour they're paying (of which half goes to the person) would have been better spent on a full-time person with a $100k salary and benefits. But it's not even the cost that is an issue. It's the quality of work delivered. The accountability of a contractor is totally transient. They only need to deliver for as long as you keep them around, and chances are they'll never again touch the code. There's no incentive for them to get things right, there's little incentive to understand your system or learn anything. At the risk of making an unfair generalization, craftsmanship doesn't matter to most contractors. The education problem I don't know what they teach in college CS courses. I've believed for most of my adult life that a college degree was an essential part of being successful. Of course I would hold that bias, since I did it, and have the paper to show for it in a box somewhere in the basement. My first clue that maybe this wasn't a fully qualified opinion comes from the fact that I double-majored in journalism and radio/TV, not computer science. Eventually I worked with people who skipped college entirely, many of them at Microsoft. Then I worked with people who had a masters degree who sucked at writing code, next to the high school diploma types that rock it every day. I still think there's a lot to be said for the social development of someone who has the on-campus experience, but for software developers, college might not matter. As I mentioned before, most of us are not writing compilers, and we never will. It's actually surprising to find how many people are self-taught in the art of software development, and that should reveal some interesting truths about how we learn. The first truth is that we learn largely out of necessity. There's something that we want to achieve, so we do what I call just-in-time learning to meet those goals. We acquire knowledge when we need it. So what about the gaps in our knowledge? That's where the most valuable education occurs, via our mentors. They're the people we work next to and the people who write blogs. They are critical to our professional development. They don't need to be an encyclopedia of jargon, but they understand the craft. Even at this stage of my career, I probably can't tell you what SOLID stands for, but you can bet that I practice the principles behind that acronym every day. That comes from experience, augmented by my peers. I'm hell bent on passing that experience to others. Process issues If you're a manager type and don't do much in the way of writing code these days (shame on you for not messing around at least), then your job is to isolate your tradespeople from nonsense, while bringing your business into the realm of modern software development. That doesn't mean you slap up a white board with sticky notes and start calling yourself agile, it means getting all of your stakeholders to understand that frequent delivery of quality software is the best way to deal with change and evolving expectations. It also means that you have to play technical overlord to make sure the education and quality issues are dealt with. That's why I make the crack about sticky notes, because without the right technique being practiced among your code monkeys, you're just a guy with sticky notes. You're asking your business to accept frequent and iterative delivery, now make sure that the folks writing the code can handle the same thing. This means unit testing, the right instrumentation, integration tests, automated builds and deployments... all of the stuff that makes it easy to see when change breaks stuff. The prognosis I strongly believe that education is the most important part of what we do. I'm encouraged by things like The Starter League, and it's the kind of thing I'd love to see more of. I would go as far as to say I'd love to start something like this internally at an existing company. Most of all though, I can't emphasize enough how important it is that we mentor each other and share our knowledge. If you have people on your staff who don't want to learn, fire them. Seriously, get rid of them. A few months working with someone really good, who understands the craftsmanship required to build supportable and maintainable code, will change that person forever and increase their value immeasurably.

Read the article
AIDL based two way communication

- by sshasan

I have two apps between which I want some data exchanged. As they are running in different processes, so, I am using AIDL to communicate between them. Now, everything is happening really great in one direction (say my apps are A and B) i.e. data is being sent from A to B but, now I need to send some data from B to A. I noticed that we need to include the app with the AIDL in the build path of app where the AIDL method will be called. So in my case A includes B in its build path. For B to be able to send something to A, by that logic, B would need A in its build path. This would create a cycle. I am stuck at this point. And I cannot think of a work around this loop. Any help would be greatly appreciated :) . Thanks! ----EDIT---- So, I following the advice mentioned in one of the comments below, I have the following code In the IPCAIDL project the AIDL file resides, its contents are package ipc.android.aidl; interface Iaidl{ boolean pushBoolean(boolean flag); } This project is being used as a library in both the IPCServer and the IPC Client. The IPCServer Project has the service which defines what happens with the AIDL method. The file is booleanService.java package ipc.android.server; import ipc.android.aidl.Iaidl; import android.app.Service; import android.content.Intent; import android.os.IBinder; import android.os.RemoteException; import android.util.Log; public class booleanService extends Service { @Override public IBinder onBind(Intent intent) { return new Iaidl.Stub() { @Override public boolean pushBoolean(boolean arg0) throws RemoteException { Log.i("SERVER(IPC AIDL)", "Truth Value:"+arg0); return arg0; } }; } } The IPCClient file which calls this method is package ipc.android.client2; import ipc.android.aidl.Iaidl; import android.app.Activity; import android.content.ComponentName; import android.content.Context; import android.content.Intent; import android.content.ServiceConnection; import android.os.Bundle; import android.os.IBinder; import android.os.RemoteException; import android.view.View; import android.widget.Button; public class IPCClient2Activity extends Activity { Button b1; Iaidl iAIDL; boolean k = false; /** Called when the activity is first created. */ @Override public void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.main); bindService(new Intent("ipc.android.server.booleanService"), conn, Context.BIND_AUTO_CREATE); startService(new Intent("ipc.android.server.booleanService")); b1 = (Button) findViewById(R.id.button1); b1.setOnClickListener(new View.OnClickListener() { @Override public void onClick(View v) { if(k){ k = false; } else{ k = true; } try { iAIDL.pushBoolean(k); } catch (RemoteException e) { // TODO Auto-generated catch block e.printStackTrace(); } } }); } private ServiceConnection conn = new ServiceConnection() { @Override public void onServiceDisconnected(ComponentName name) { // TODO Auto-generated method stub } @Override public void onServiceConnected(ComponentName name, IBinder service) { iAIDL = Iaidl.Stub.asInterface(service); } }; } The manifest file for IPCServer includes the declaration of the service.

Read the article
MDI WinForm application and duplicate child form memory leak

- by Steve

This is a WinForm MDI application problem (.net framework 3.0). It will be described in C#. Sorry it is a bit long because I try to make things as clear as possible. I have a MDI application. At some point I find that one MDI child form is never released. There is a menu that creates the MDI child form and show it. When the MDI child form is closed, it is supposed to be destroyed and the memory taken by it should be given back to .net. But to my surprise, this is not true. All the MDI child form instances are kept in memory. This is obviously a "memory leak". Well, it is not a real leak in .net. It is just that I think the closed form should be dead but somehow there is at least one unknown reference from outside world that still connect with the closed form. I read some articles on the Web. Some says that when the MDI child form is closing, I should unwire all the event handlers, otherwise some event handlers may keep my form alive. Some says that DataBindings should be cleaned before the form is closing otherwise the DataBindings will add references to some global Hashtable and thus keep my form alive. My form contains quite a lot things. Many event handlers and many DataBindings and many BindingSources and few suspected controls containing user control and HelpProvider. I create a big method that unwires all the event handlers from all the relevant controls, clear all the DataBindings and DataSources. The HelpProvider and user controls are disposed carefully. At the end, I find that, I don't have to clear DataBindings and DataSources. Event handlers are definitely causing the problem. And MDI form structure also contributes to something. During my experiments, I find that, if you create a MDI child form, even if you close it, there will still be one instance in the memory. The reference is from PropertyStore of the main form. This means, unless the main form is closed (application ends), there will always be one instance of MDI child form in the memory. The good news is that, no matter how many times you open and close the child form, there will be only one instance, not a big "leak". When it comes to event handlers, things become more tricky. I have to address that, all the event handlers on my form are anonymous event handlers. Here is an example code: //On MDI child form's design code... Button btnSave = new Button(); btnSave.Click += new System.EventHandler(btnSave_Click); Where btnSave_Click is also a method in MDI child form. The above is always the case for various controls and various types of event. To me, this is a bi-directional circular reference. btnSave keeps a reference of MDI child form via the event handler. MDI child form keeps a reference of btnSave instance. To me again, such bi-directional circular reference should not cause any problem for .net's garbage collector. This means that I do not have to explicitly unwire the event when the form is being disposed: btnSave.Click -= btnSave_Click; But the truth is not so. For some event handlers, they are safe. Ignoring them do not cause any duplicate instance. For some other event handlers, they will cause one instance remaining in the memory (similar effect as the MDI form structure, but this time caused by the hanging event handlers). For some other event handlers, they will cause every instance opened in the memory. I am totally confused about the differences between these three types of event handlers. The controls are created in the same way and the event is attached in the same way. What is the difference? (Don't tell me it is the event handle methods that make difference.) Anyone has experience of this wired scenario and has an answer for me? Thanks a lot. So now, for safety issue, I will have to unwire all the event handlers when the form is being disposed. That will be a long list of similar code for each control. Is there a general way of removing events from controls in recursive way using reflection? What about performance issue? That's the end of my story and I am still in the middle of my problem. For any help, I thank you.

Read the article
eBooks on iPad vs. Kindle: More Debate than Smackdown

- by andrewbrust

When the iPad was presented at its San Francisco launch event on January 28th, Steve Jobs spent a significant amount of time explaining how well the device would serve as an eBook reader. He showed the iBooks reader application and iBookstore and laid down the gauntlet before Amazon and its beloved Kindle device. Almost immediately afterwards, criticism came rushing forth that the iPad could never beat the Kindle for book reading. The curious part of that criticism is that virtually no one offering it had actually used the iPad yet. A few weeks later, on April 3rd, the iPad was released for sale in the United States. I bought one on that day and in the few additional weeks that have elapsed, I’ve given quite a workout to most of its capabilities, including its eBook features. I’ve also spent some time with the Kindle, albeit a first-generation model, to see how it actually compares to the iPad. I had some expectations going in, but I came away with conclusions about each device that were more scenario-based than absolute. I present my findings to you here. Vital Statistics Let’s start with an inventory of each device’s underlying technology. The iPad has a color, backlit LCD screen and an on-screen keyboard. It has a battery which, on a full charge, lasts anywhere from 6-10 hours. The Kindle offers a monochrome, reflective E Ink display, a physical keyboard and a battery that on my first gen loaner unit can go up to a week between charges (Amazon claims the battery on the Kindle 2 can last up to 2 weeks on a single charge). The Kindle connects to Amazon’s Kindle Store using a 3G modem (the technology and network vary depending on the model) that incurs no airtime service charges whatsoever. The iPad units that are on-sale today work over WiFi only. 3G-equipped models will be on sale shortly and will command a $130 premium over their WiFi-only counterparts. 3G service on the iPad, in the U.S. from AT&T, will be fee-based, with a 250MB plan at $14.99 per month and an unlimited plan at $29.99. No contract is required for 3G service. All these tech specs aside, I think a more useful observation is that the iPad is a multi-purpose Internet-connected entertainment device, while the Kindle is a dedicated reading device. The question is whether those differences in design and intended use create a clear-cut winner for reading electronic publications. Let’s take a look at each device, in isolation, now. Kindle To me, what’s most innovative about the Kindle is its E Ink display. E Ink really looks like ink on a sheet of paper. It requires no backlight, it’s fully visible in direct sunlight and it causes almost none of the eyestrain that LCD-based computer display technology (like that used on the iPad) does. It’s really versatile in an all-around way. Forgive me if this sounds precious, but reading on it is really a joy. In fact, it’s a genuinely relaxing experience. Through the Kindle Store, Amazon allows users to download books (including audio books), magazines, newspapers and blog feeds. Books and magazines can be purchased either on a single-issue basis or as an annual subscription. Books, of course, are purchased singly. Oddly, blogs are not free, but instead carry a monthly subscription fee, typically $1.99. To me this is ludicrous, but I suppose the free 3G service is partially to blame. Books and magazine issues download quickly. Magazine and blog subscriptions cause new issues or posts to be pushed to your device on an automated basis. Available blogs include 9000-odd feeds that Amazon offers on the Kindle Store; unless I missed something, arbitrary RSS feeds are not supported (though there are third party workarounds to this limitation). The shopping experience is integrated well, has an huge selection, and offers certain graphical perks. For example, magazine and newspaper logos are displayed in menus, and book cover thumbnails appear as well. A simple search mechanism is provided and text entry through the physical keyboard is relatively painless. It’s very easy and straightforward to enter the store, find something you like and start reading it quickly. If you know what you’re looking for, it’s even faster. Given Kindle’s high portability, very reliable battery, instant-on capability and highly integrated content acquisition, it makes reading on whim, and in random spurts of downtime, very attractive. The Kindle’s home screen lists all of your publications, and easily lets you select one, then start reading it. Once opened, publications display in crisp, attractive text that is adjustable in size. “Turning” pages is achieved through buttons dedicated to the task. Notes can be recorded, bookmarks can be saved and pages can be saved as clippings. I am not an avid book reader, and yet I found the Kindle made it really fun, convenient and soothing to read. There’s something about the easy access to the material and the simplicity of the display that makes the Kindle seduce you into chilling out and reading page after page. On the other hand, the Kindle has an awkward navigation interface. While menus are displayed clearly on the screen, the method of selecting menu items is tricky: alongside the right-hand edge of the main display is a thin column that acts as a second display. It has a white background, and a scrollable silver cursor that is moved up or down through the use of the device’s scrollwheel. Picking a menu item on the main display involves scrolling the silver cursor to a position parallel to that menu item and pushing the scrollwheel in. This navigation technique creates a disconnect, literally. You don’t really click on a selection so much as you gesture toward it. I got used to this technique quickly, but I didn’t love it. It definitely created a kind of anxiety in me, making me feel the need to speed through menus and get to my destination document quickly. Once there, I could calm down and relax. Books are great on the Kindle. Magazines and newspapers much less so. I found the rendering of photographs, and even illustrations, to be unacceptably crude. For this reason, I expect that reading textbooks on the Kindle may leave students wanting. I found that the original flow and layout of any publication was sacrificed on the Kindle. In effect, browsing a magazine or newspaper was almost impossible. Reading the text of individual articles was enjoyable, but having to read this way made the whole experience much more “a la carte” than cohesive and thematic between articles. I imagine that for academic journals this is ideal, but for consumer publications it imposes a stripped-down, low-fidelity experience that evokes a sense of deprivation. In general, the Kindle is great for reading text. For just about anything else, especially activity that involves exploratory browsing, meandering and short-attention-span reading, it presents a real barrier to entry and adoption. Avid book readers will enjoy the Kindle (if they’re not already). It’s a great device for losing oneself in a book over long sittings. Multitaskers who are more interested in periodicals, be they online or off, will like it much less, as they will find compromise, and even sacrifice, to be palpable. iPad The iPad is a very different device from the Kindle. While the Kindle is oriented to pages of text, the iPad orbits around applications and their interfaces. Be it the pinch and zoom experience in the browser, the rich media features that augment content on news and weather sites, or the ability to interact with social networking services like Twitter, the iPad is versatile. While it shares a slate-like form factor with the Kindle, it’s effectively an elegant personal computer. One of its many features is the iBook application and integration of the iBookstore. But it’s a multi-purpose device. That turns out to be good and bad, depending on what you’re reading. The iBookstore is great for browsing. It’s color, rich animation-laden user interface make it possible to shop for books, rather than merely search and acquire them. Unfortunately, its selection is rather sparse at the moment. If you’re looking for a New York Times bestseller, or other popular titles, you should be OK. If you want to read something more specialized, it’s much harder. Unlike the awkward navigation interface of the Kindle, the iPad offers a nearly flawless touch-screen interface that seduces the user into tinkering and kibitzing every bit as much as the Kindle lulls you into a deep, concentrated read. It’s a dynamic and interactive device, whereas the Kindle is static and passive. The iBook reader is slick and fun. Use the iPad in landscape mode and you can read the book in 2-up (left/right 2-page) display; use it in portrait mode and you can read one page at a time. Rather than clicking a hardware button to turn pages, you simply drag and wipe from right-to-left to flip the single or right-hand page. The page actually travels through an animated path as it would in a physical book. The intuitiveness of the interface is uncanny. The reader also accommodates saving of bookmarks, searching of the text, and the ability to highlight a word and look it up in a dictionary. Pages display brightly and clearly. They’re easy to read. But the backlight and the glare made me less comfortable than I was with the Kindle. The knowledge that completely different applications (including the Web and email and Twitter) were just a few taps away made me antsy and very tempted to task-switch. The knowledge that battery life is an issue created subtle discomfort. If the Kindle makes you feel like you’re in a library reading room, then the iPad makes you feel, at best, like you’re under fluorescent lights at a Barnes and Noble or Borders store. If you’re lucky, you’d be on a couch or at a reading table in the store, but you might also be standing up, in the aisles. Clearly, I didn’t find this conducive to focused and sustained reading. But that may have more to do with my own tendency to read periodicals far more than books, and my neurotic . And, truth be known, the book reading experience, when not explicitly compared to Kindle’s, was still pleasant. It is also important to point out that Kindle Store-sourced books can be read on the iPad through a Kindle reader application, from Amazon, specific to the device. This offered a less rich experience than the iBooks reader, but it was completely adequate. Despite the Kindle brand of the reader, however, it offered little in terms of simulating the reading experience on its namesake device. When it comes to periodicals, the iPad wins hands down. Magazines, even if merely scanned images of their print editions, read on the iPad in a way that felt similar to reading hard copy. The full color display, touch navigation and even the ability to render advertisements in their full glory makes the iPad a great way to read through any piece of work that is measured in pages, rather than chapters. There are many ways to get magazines and newspapers onto the iPad, including the Zinio reader, and publication-specific applications like the Wall Street Journal’s and Popular Science’s. The New York Times’ free Editors’ Choice application offers a Times Reader-like interface to a subset of the Gray Lady’s daily content. The completely Web-based but iPad-optimized Times Skimmer site (at www.nytimes.com/timesskimmer) works well too. Even conventional Web sites themselves can be read much like magazines, given the iPad’s ability to zoom in on the text and crop out advertisements on the margins. While the Kindle does have an experimental Web browser, it reminded me a lot of early mobile phone browsers, only in a larger size. For text-heavy sites with simple layout, it works fine. For just about anything else, it becomes more trouble than it’s worth. And given the way magazine articles make me think of things I want to look up online, I think that’s a real liability for the Kindle. Summing Up What I came to realize is that the Kindle isn’t so much a computer or even an Internet device as it is a printer. While it doesn’t use physical paper, it still renders its content a page at a time, just like a laser printer does, and its output appears strikingly similar. You can read the rendered text, but you can’t interact with it in any way. That’s why the navigation requires a separate cursor display area. And because of the page-oriented rendering behavior, turning pages causes a flash on the display and requires a sometimes long pause before the next page is rendered. The good side of this is that once the page is generated, no battery power is required to display it. That makes for great battery life, optimal viewing under most lighting conditions (as long as there is some light) and low-eyestrain text-centric display of content. The Kindle is highly portable, has an excellent selection in its store and is refreshingly distraction-free. All of this is ideal for reading books. And iPad doesn’t offer any of it. What iPad does offer is versatility, variety, richness and luxury. It’s flush with accoutrements even if it’s low on focused, sustained text display. That makes it inferior to the Kindle for book reading. But that also makes it better than the Kindle for almost everything else. As such, and given that its book reading experience is still decent (even if not superior), I think the iPad will give Kindle a run for its money. True book lovers, and people on a budget, will want the Kindle. People with a robust amount of discretionary income may want both devices. Everyone else who is interested in a slate form factor e-reading device, especially if they also wish to have leisure-friendly Internet access, will likely choose the iPad exclusively. One thing is for sure: iPad has reduced Kindle’s market, and may have shifted its mass market potential to a mere niche play. If Amazon is smart, it will improve its iPad-based Kindle reader app significantly. It can then leverage the iPad channel as a significant market for the Kindle Store. After all, selling the eBooks themselves is what Amazon should care most about.

Read the article
SQLAuthority News – Job Interviewing the Right Way (and for the Right Reasons) – Guest Post by Feodor Georgiev

- by pinaldave

Feodor Georgiev is a SQL Server database specialist with extensive experience of thinking both within and outside the box. He has wide experience of different systems and solutions in the fields of architecture, scalability, performance, etc. Feodor has experience with SQL Server 2000 and later versions, and is certified in SQL Server 2008. Feodor has written excellent article on Job Interviewing the Right Way. Here is his article in his own language. A while back I was thinking to start a blog post series on interviewing and employing IT personnel. At that time I had just read the ‘Smart and gets things done’ book (http://www.joelonsoftware.com/items/2007/06/05.html) and I was hyped up on some debatable topics regarding finding and employing the best people in the branch. I have no problem with hiring the best of the best; it’s just the definition of ‘the best of the best’ that makes things a bit more complicated. One of the fundamental books one can read on the topic of interviewing is the one mentioned above. If you have not read it, then you must do so; not because it contains the ultimate truth, and not because it gives the answers to most questions on the subject, but because the book contains an extensive set of questions about interviewing and employing people. Of course, a big part of these questions have different answers, depending on location, culture, available funds and so on. (What works in the US may not necessarily work in the Nordic countries or India, or it may work in a different way). The only thing that is valid regardless of any external factor is this: curiosity. In my belief there are two kinds of people – curious and not-so-curious; regardless of profession. Think about it – professional success is directly proportional to the individual’s curiosity + time of active experience in the field. (I say ‘active experience’ because vacations and any distractions do not count as experience :) ) So, curiosity is the factor which will distinguish a good employee from the not-so-good one. But let’s shift our attention to something else for now: a few tips and tricks for successful interviews. Tip and trick #1: get your priorities straight. Your status usually dictates your priorities; for example, if the person looking for a job has just relocated to a new country, they might tend to ignore some of their priorities and overload others. In other words, setting priorities straight means to define the personal criteria by which the interview process is lead. For example, similar to the following questions can help define the criteria for someone looking for a job: How badly do I need a (any) job? Is it more important to work in a clean and quiet environment or is it important to get paid well (or both, if possible)? And so on… Furthermore, before going to the interview, the candidate should have a list of priorities, sorted by the most importance: e.g. I want a quiet environment, x amount of money, great helping boss, a desk next to a window and so on. Also it is a good idea to be prepared and know which factors can be compromised and to what extent. Tip and trick #2: the interview is a two-way street. A job candidate should not forget that the interview process is not a one-way street. What I mean by this is that while the employer is interviewing the potential candidate, the job seeker should not miss the chance to interview the employer. Usually, the employer and the candidate will meet for an interview and talk about a variety of topics. In a quality interview the candidate will be presented to key members of the team and will have the opportunity to ask them questions. By asking the right questions both parties will define their opinion about each other. For example, if the candidate talks to one of the potential bosses during the interview process and they notice that the potential manager has a hard time formulating a question, then it is up to the candidate to decide whether working with such person is a red flag for them. There are as many interview processes out there as there are companies and each one is different. Some bigger companies and corporates can afford pre-selection processes, 3 or even 4 stages of interviews, small companies usually settle with one interview. Some companies even give cognitive tests on the interview. Why not? In his book Joel suggests that a good candidate should be pampered and spoiled beyond belief with a week-long vacation in New York, fancy hotels, food and who knows what. For all I can imagine, an interview might even take place at the top of the Eifel tower (right, Mr. Joel, right?) I doubt, however, that this is the optimal way to capture the attention of a good employee. The ‘curiosity’ topic What I have learned so far in my professional experience is that opinions can be subjective. Plus, opinions on technology subjects can also be subjective. According to Joel, only hiring the best of the best is worth it. If you ask me, there is no such thing as best of the best, simply because human nature (well, aside from some physical limitations, like putting your pants on through your head :) ) has no boundaries. And why would it have boundaries? I have seen many curious and interesting people, naturally good at technology, though uninterested in it as one can possibly be; I have also seen plenty of people interested in technology, who (in an ideal world) should have stayed far from it. At any rate, all of this sums up at the end to the ‘supply and demand’ factor. The interview process big-bang boils down to this: If there is a mutual benefit for both the employer and the potential employee to work together, then it all sorts out nicely. If there is no benefit, then it is much harder to get to a common place. Tip and trick #3: word-of-mouth is worth a thousand words Here I would just mention that the best thing a job candidate can get during the interview process is access to future team members or other employees of the new company. Nowadays the world has become quite small and everyone knows everyone. Look at LinkedIn, look at other professional networks and you will realize how small the world really is. Knowing people is a good way to become more approachable and to approach them. Tip and trick #4: Be confident. It is true that for some people confidence is as natural as breathing and others have to work hard to express it. Confidence is, however, a key factor in convincing the other side (potential employer or employee) that there is a great chance for success by working together. But it cannot get you very far if it’s not backed up by talent, curiosity and knowledge. Tip and trick #5: The right reasons What really bothers me in Sweden (and I am sure that there are similar situations in other countries) is that there is a tendency to fill quotas and to filter out candidates by criteria different from their skill and knowledge. In job ads I see quite often the phrases ‘positive thinker’, ‘team player’ and many similar hints about personality features. So my guess here is that discrimination has evolved to a new level. Let me clear up the definition of discrimination: ‘unfair treatment of a person or group on the basis of prejudice’. And prejudice is the ‘partiality that prevents objective consideration of an issue or situation’. In other words, there is not much difference whether a job candidate is filtered out by race, gender or by personality features – it is all a bad habit. And in reality, there is no proven correlation between the technology knowledge paired with skills and the personal features (gender, race, age, optimism). It is true that a significantly greater number of Darwin awards were given to men than to women, but I am sure that somewhere there is a paper or theory explaining the genetics behind this. J This topic actually brings to mind one of my favorite work related stories. A while back I was working for a big company with many teams involved in their processes. One of the teams was occupying 2 rooms – one had the team members and was full of light, colorful posters, chit-chats and giggles, whereas the other room was dark, lighted only by a single monitor with a quiet person in front of it. Later on I realized that the ‘dark room’ person was the guru and the ultimate problem-solving-brain who did not like the chats and giggles and hence was in a separate room. In reality, all severe problems which the chatty and cheerful team members could not solve and all emergencies were directed to ‘the dark room’. And thus all worked out well. The moral of the story: Personality has nothing to do with technology knowledge and skills. End of story. Summary: I’d like to stress the fact that there is no ultimately perfect candidate for a job, and there is no such thing as ‘best-of-the-best’. From my personal experience, the main criteria by which I measure people (co-workers and bosses) is the curiosity factor; I know from experience that the more curious and inventive a person is, the better chances there are for great achievements in their field. Related stories: (for extra credit) 1) Get your priorities straight. A while back as a consultant I was working for a few days at a time at different offices and for different clients, and so I was able to compare and analyze the work environments. There were two different places which I compared and recently I asked a friend of mine the following question: “Which one would you prefer as a work environment: a noisy office full of people, or a quiet office full of faulty smells because the office is rarely cleaned?” My friend was puzzled for a while, thought about it and said: “Hmm, you are talking about two different kinds of pollution… I will probably choose the second, since I can clean the workplace myself a bit…” 2) The interview is a two-way street. One time, during a job interview, I met a potential boss that had a hard time phrasing a question. At that particular time it was clear to me that I would not have liked to work under this person. According to my work religion, the properly asked question contains at least half of the answer. And if I work with someone who cannot ask a question… then I’d be doing double or triple work. At another interview, after the technical part with the team leader of the department, I was introduced to one of the team members and we were left alone for 5 minutes. I immediately jumped on the occasion and asked the blunt question: ‘What have you learned here for the past year and how do you like your job?’ The team member looked at me and said ‘Nothing really. I like playing with my cats at home, so I am out of here at 5pm and I don’t have time for much.’ I was disappointed at the time and I did not take the job offer. I wasn’t that shocked a few months later when the company went bankrupt. 3) The right reasons to take a job: personality check. A while back I was asked to serve as a job reference for a coworker. I agreed, and after some weeks I got a phone call from the company where my colleague was applying for a job. The conversation started with the manager’s question about my colleague’s personality and about their social skills. (You can probably guess what my internal reaction was… J ) So, after 30 minutes of pouring common sense into the interviewer’s head, we finally agreed on the fact that a shy or quiet personality has nothing to do with work skills and knowledge. Some years down the road my former colleague is taking the manager’s position as the manager is demoted to a different department. Reference: Feodor Georgiev, Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Down Tools Week Cometh: Kissing Goodbye to CVs/Resumes and Cover Letters

- by Bart Read

I haven't blogged about what I'm doing in my (not so new) temporary role as Red Gate's technical recruiter, mostly because it's been routine, business as usual stuff, and because I've been trying to understand the role by doing it. I think now though the time has come to get a little more radical, so I'm going to tell you why I want to largely eliminate CVs/resumes and cover letters from the application process for some of our technical roles, and why I think that might be a good thing for candidates (and for us). I have a terrible confession to make, or at least it's a terrible confession for a recruiter: I don't really like CV sifting, or reading cover letters, and, unless I've misread the mood around here, neither does anybody else. It's dull, it's time-consuming, and it's somewhat soul destroying because, when all is said and done, you're being paid to be incredibly judgemental about people based on relatively little information. I feel like I've dirtied myself by saying that - I mean, after all, it's a core part of my job - but it sucks, it really does. (And, of course, the truth is I'm still a software engineer at heart, and I'm always looking for ways to do things better.) On the flip side, I've never met anyone who likes writing their CV. It takes hours and hours of faffing around and massaging it into shape, and the whole process is beset by a gnawing anxiety, frustration, and insecurity. All you really want is a chance to demonstrate your skills - not just talk about them - and how do you do that in a CV or cover letter? Often the best candidates will include samples of their work (a portfolio, screenshots, links to websites, product downloads, etc.), but sometimes this isn't possible, or may not be appropriate, or you just don't think you're allowed because of what your school/university careers service has told you (more commonly an issue with grads, obviously). And what are we actually trying to find out about people with all of this? I think the common criteria are actually pretty basic: Smart Gets things done (thanks for these two Joel) Not an a55hole* (sorry, have to get around Simple Talk's swear filter - and thanks to Professor Robert I. Sutton for this one) *Of course, everyone has off days, and I don't honestly think we're too worried about somebody being a bit grumpy every now and again. We can do a bit better than this in the context of the roles I'm talking about: we can be more specific about what "gets things done" means, at least in part. For software engineers and interns, the non-exhaustive meaning of "gets things done" is: Excellent coder For test engineers, the non-exhaustive meaning of "gets things done" is: Good at finding problems in software Competent coder Team player, etc., to me, are covered by "not an a55hole". I don't expect people to be the life and soul of the party, or a wild extrovert - that's not what team player means, and it's not what "not an a55hole" means. Some of our best technical staff are quiet, introverted types, but they're still pleasant to work with. My problem is that I don't think the initial sift really helps us find out whether people are smart and get things done with any great efficacy. It's better than nothing, for sure, but it's not as good as it could be. It's also contentious, and potentially unfair/inequitable - if you want to get an idea of what I mean by this, check out the background information section at the bottom. Before I go any further, let's look at the Red Gate recruitment process for technical staff* as it stands now: (LOTS of) People apply for jobs. All these applications go through a brutal process of manual sifting, which eliminates between 75 and 90% of them, depending upon the role, and the time of year**. Depending upon the role, those who pass the sift will be sent an assessment or telescreened. For the purposes of this blog post I'm only interested in those that are sent some sort of programming assessment, or bug hunt. This means software engineers, test engineers, and software interns, which are the roles for which I receive the most applications. The telescreen tends to be reserved for project or product managers. Those that pass the assessment are invited in for first interview. This interview is mostly about assessing their technical skills***, although we're obviously on the look out for cultural fit red flags as well. If the first interview goes well we'll invite candidates back for a second interview. This is where team/cultural fit is really scoped out. We also use this interview to dive more deeply into certain areas of their skillset, and explore any concerns that may have come out of the first interview (these obviously won't have been serious or obvious enough to cause a rejection at that point, but are things we do need to look into before we'd consider making an offer). We might subsequently invite them in for lunch before we make them an offer. This tends to happen when we're recruiting somebody for a specific team and we'd like them to meet all the people they'll be working with directly. It's not an interview per se, but can prove pivotal if they don't gel with the team. Anyone who's made it this far will receive an offer from us. *We have a slightly quirky definition of "technical staff" as it relates to the technical recruiter role here. It includes software engineers, test engineers, software interns, user experience specialists, technical authors, project managers, product managers, and development managers, but does not include product support or information systems roles. **For example, the quality of graduate applicants overall noticeably drops as the academic year wears on, which is not to say that by now there aren't still stars in there, just that they're fewer and further between. ***Some organisations prefer to assess for team fit first, but I think assessing technical skills is a more effective initial filter - if they're the nicest person in the world, but can't cut a line of code they're not going to work out. Now, as I suggested in the title, Red Gate's Down Tools Week is upon us once again - next week in fact - and I had proposed as a project that we refactor and automate the first stage of marking our programming assessments. Marking assessments, and in fact organising the marking of them, is a somewhat time-consuming process, and we receive many assessment solutions that just don't make the cut, for whatever reason. Whilst I don't think it's possible to fully automate marking, I do think it ought to be possible to run a suite of automated tests over each candidate's solution to see whether or not it behaves correctly and, if it does, move on to a manual stage where we examine the code for structure, decomposition, style, readability, maintainability, etc. Obviously it's possible to use tools to generate potentially helpful metrics for some of these indices as well. This would obviously reduce the marking workload, and would provide candidates with quicker feedback about whether they've been successful - though I do wonder if waiting a tactful interval before sending a (nicely written) rejection might be wise. I duly scrawled out a picture of my ideal process, which looked like this: The problem is, as soon as I'd roughed it out, I realised that fundamentally it wasn't an ideal process at all, which explained the gnawing feeling of cognitive dissonance I'd been wrestling with all week, whilst I'd been trying to find time to do this. Here's what I mean. Automated assessment marking, and the associated infrastructure around that, makes it much easier for us to deal with large numbers of assessments. This means we can be much more permissive about who we send assessments out to or, in other words, we can give more candidates the opportunity to really demonstrate their skills to us. And this leads to a question: why not give everyone the opportunity to demonstrate their skills, to show that they're smart and can get things done? (Two or three of us even discussed this in the down tools week hustings earlier this week.) And isn't this a lot simpler than the alternative we'd been considering? (FYI, this was automated CV/cover letter sifting by some form of textual analysis to ideally eliminate the worst 50% or so of applications based on an analysis of the 20,000 or so historical applications we've received since 2007 - definitely not the basic keyword analysis beloved of recruitment agencies, since this would eliminate hardly anyone who was awful, but definitely would eliminate stellar Oxbridge candidates - #fail - or some nightmarishly complex Google-like system where we profile all our currently employees, only to realise that we're never going to get representative results because we don't have a statistically significant sample size in any given role - also #fail.) No, I think the new way is better. We let people self-select. We make them the masters (or mistresses) of their own destiny. We give applicants the power - we put their fate in their hands - by giving them the chance to demonstrate their skills, which is what they really want anyway, instead of requiring that they spend hours and hours creating a CV and cover letter that I'm going to evaluate for suitability, and make a value judgement about, in approximately 1 minute (give or take). It doesn't matter what university you attended, it doesn't matter if you had a bad year when you took your A-levels - here's your chance to shine, so take it and run with it. (As a side benefit, we cut the number of applications we have to sift by something like two thirds.) WIN! OK, yeah, sounds good, but will it actually work? That's an excellent question. My gut feeling is yes, and I'll justify why below (and hopefully have gone some way towards doing that above as well), but what I'm proposing here is really that we run an experiment for a period of time - probably a couple of months or so - and measure the outcomes we see: How many people apply? (Wouldn't be surprised or alarmed to see this cut by a factor of ten.) How many of them submit a good assessment? (More/less than at present?) How much overhead is there for us in dealing with these assessments compared to now? What are the success and failure rates at each interview stage compared to now? How many people are we hiring at the end of it compared to now? I think it'll work because I hypothesize that, amongst other things: It self-selects for people who really want to work at Red Gate which, at the moment, is something I have to try and assess based on their CV and cover letter - but if you're not that bothered about working here, why would you complete the assessment? Candidates who would submit a shoddy application probably won't feel motivated to do the assessment. Candidates who would demonstrate good attention to detail in their CV/cover letter will demonstrate good attention to detail in the assessment. In general, only the better candidates will complete and submit the assessment. Marking assessments is much less work so we'll be able to deal with any increase that we see (hopefully we will see). There are obviously other questions as well: Is plagiarism going to be a problem? Is there any way we can detect/discourage potential plagiarism? How do we assess candidates' education and experience? What about their ability to communicate in writing? Do we still want them to submit a CV afterwards if they pass assessment? Do we want to offer them the opportunity to tell us a bit about why they'd like the job when they submit their assessment? How does this affect our relationship with recruitment agencies we might use to hire for these roles? So, what's the objective for next week's Down Tools Week? Pretty simple really - we want to implement this process for the Graduate Software Engineer and Software Engineer positions that you can find on our website. I will be joined by a crack team of our best developers (Kevin Boyle, and new Red-Gater, Sam Blackburn), and recruiting hostess with the mostest Laura McQuillen, and hopefully a couple of others as well - if I can successfully twist more arms before Monday.* Hopefully by next Friday our experiment will be up and running, and we may have changed the way Red Gate recruits software engineers for good! Stay tuned and we'll let you know how it goes! *I'm going to play dirty by offering them beer and chocolate during meetings. Some background information: how agonising over the initial CV/cover letter sift helped lead us to bin it off entirely The other day I was agonising about the new university/good degree grade versus poor A-level results issue, and decided to canvas for other opinions to see if there was something I could do that was fairer than my current approach, which is almost always to reject. This generated quite an involved discussion on our Yammer site: I'm sure you can glean a pretty good impression of my own educational prejudices from that discussion as well, although I'm very open to changing my opinion - hopefully you've already figured that out from reading the rest of this post. Hopefully you can also trace a logical path from agonising about sifting to, "Uh, hang on, why on earth are we doing this anyway?!?" Technorati Tags: recruitment,hr,developers,testers,red gate,cv,resume,cover letter,assessment,sea change

Read the article
Custom validation works in development but not in unit test

- by Geolev

I want to validate that at least one of two columns have a value in my model. I found somewhere on the web that I could create a custom validator as follows: # Check for the presence of one or another field: # :validates_presence_of_at_least_one_field :last_name, :company_name - would require either last_name or company_name to be filled in # also works with arrays # :validates_presence_of_at_least_one_field :email, [:name, :address, :city, :state] - would require email or a mailing type address module ActiveRecord module Validations module ClassMethods def validates_presence_of_at_least_one_field(*attr_names) msg = attr_names.collect {|a| a.is_a?(Array) ? " ( #{a.join(", ")} ) " : a.to_s}.join(", ") + "can't all be blank. At least one field must be filled in." configuration = { :on => :save, :message => msg } configuration.update(attr_names.extract_options!) send(validation_method(configuration[:on]), configuration) do |record| found = false attr_names.each do |a| a = [a] unless a.is_a?(Array) found = true a.each do |attr| value = record.respond_to?(attr.to_s) ? record.send(attr.to_s) : record[attr.to_s] found = !value.blank? end break if found end record.errors.add_to_base(configuration[:message]) unless found end end end end end I put this in a file called lib/acs_validator.rb in my project and added "require 'acs_validator'" to my environment.rb. This does exactly what I want. It works perfectly when I manually test it in the development environment but when I write a unit test it breaks my test environment. This is my unit test: require 'test_helper' class CustomerTest < ActiveSupport::TestCase # Replace this with your real tests. test "the truth" do assert true end test "customer not valid" do puts "customer not valid" customer = Customer.new assert !customer.valid? assert customer.errors.invalid?(:subdomain) assert_equal "Company Name and Last Name can't both be blank.", customer.errors.on(:contact_lname) end end This is my model: class Customer < ActiveRecord::Base validates_presence_of :subdomain validates_presence_of_at_least_one_field :customer_company_name, :contact_lname, :message => "Company Name and Last Name can't both be blank." has_one :service_plan end When I run the unit test, I get the following error: DEPRECATION WARNING: Rake tasks in vendor/plugins/admin_data/tasks, vendor/plugins/admin_data/tasks, and vendor/plugins/admin_data/tasks are deprecated. Use lib/tasks instead. (called from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/tasks/rails.rb:10) Couldn't drop acs_test : #<ActiveRecord::StatementInvalid: PGError: ERROR: database "acs_test" is being accessed by other users DETAIL: There are 1 other session(s) using the database. : DROP DATABASE IF EXISTS "acs_test"> acs_test already exists NOTICE: CREATE TABLE will create implicit sequence "customers_id_seq" for serial column "customers.id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "customers_pkey" for table "customers" NOTICE: CREATE TABLE will create implicit sequence "service_plans_id_seq" for serial column "service_plans.id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "service_plans_pkey" for table "service_plans" /usr/bin/ruby1.8 -I"lib:test" "/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake/rake_test_loader.rb" "test/unit/customer_test.rb" "test/unit/service_plan_test.rb" "test/unit/helpers/dashboard_helper_test.rb" "test/unit/helpers/customers_helper_test.rb" "test/unit/helpers/service_plans_helper_test.rb" /usr/lib/ruby/gems/1.8/gems/activerecord-2.3.8/lib/active_record/base.rb:1994:in `method_missing_without_paginate': undefined method `validates_presence_of_at_least_one_field' for #<Class:0xb7076bd0> (NoMethodError) from /usr/lib/ruby/gems/1.8/gems/will_paginate-2.3.12/lib/will_paginate/finder.rb:170:in `method_missing' from /home/george/projects/advancedcomfortcs/app/models/customer.rb:3 from /usr/local/lib/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require' from /usr/local/lib/site_ruby/1.8/rubygems/custom_require.rb:31:in `require' from /usr/lib/ruby/gems/1.8/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:158:in `require' from /usr/lib/ruby/gems/1.8/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:265:in `require_or_load' from /usr/lib/ruby/gems/1.8/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:224:in `depend_on' from /usr/lib/ruby/gems/1.8/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:136:in `require_dependency' from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/initializer.rb:414:in `load_application_classes' from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/initializer.rb:413:in `each' from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/initializer.rb:413:in `load_application_classes' from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/initializer.rb:411:in `each' from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/initializer.rb:411:in `load_application_classes' from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/initializer.rb:197:in `process' from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/initializer.rb:113:in `send' from /usr/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/initializer.rb:113:in `run' from /home/george/projects/advancedcomfortcs/config/environment.rb:9 from ./test/test_helper.rb:2:in `require' from ./test/test_helper.rb:2 from ./test/unit/customer_test.rb:1:in `require' from ./test/unit/customer_test.rb:1 from /usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5:in `load' from /usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5 from /usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5:in `each' from /usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5 rake aborted! Command failed with status (1): [/usr/bin/ruby1.8 -I"lib:test" "/usr/lib/ru...] (See full trace by running task with --trace) It seems to have stepped on will_paginate somehow. Does anyone have any suggestions? Is there another way to do the validation I'm attempting to do? Thanks, George

Read the article
Replacing objects, handling clones, dealing with write logs

- by Alix

Hi everyone, I'm dealing with a problem I can't figure out how to solve, and I'd love to hear some suggestions. [NOTE: I realise I'm asking several questions; however, answers need to take into account all of the issues, so I cannot split this into several questions] Here's the deal: I'm implementing a system that underlies user applications and that protect shared objects from concurrent accesses. The application programmer (whose application will run on top of my system) defines such shared objects like this: public class MyAtomicObject { // These are just examples of fields you may want to have in your class. public virtual int x { get; set; } public virtual List<int> list { get; set; } public virtual MyClassA objA { get; set; } public virtual MyClassB objB { get; set; } } As you can see they declare the fields of their class as auto-generated properties (auto-generated means they don't need to implement get and set). This is so that I can go in and extend their class and implement each get and set myself in order to handle possible concurrent accesses, etc. This is all well and good, but now it starts to get ugly: the application threads run transactions, like this: The thread signals it's starting a transaction. This means we now need to monitor its accesses to the fields of the atomic objects. The thread runs its code, possibly accessing fields for reading or writing. If there are accesses for writing, we'll hide them from the other transactions (other threads), and only make them visible in step 3. This is because the transaction may fail and have to roll back (undo) its updates, and in that case we don't want other threads to see its "dirty" data. The thread signals it wants to commit the transaction. If the commit is successful, the updates it made will now become visible to everyone else. Otherwise, the transaction will abort, the updates will remain invisible, and no one will ever know the transaction was there. So basically the concept of transaction is a series of accesses that appear to have happened atomically, that is, all at the same time, in the same instant, which would be the moment of successful commit. (This is as opposed to its updates becoming visible as it makes them) In order to hide the write accesses in step 2, I clone the accessed field (let's say it's the field list) and put it in the transaction's write log. After that, any time the transaction accesses list, it will actually be accessing the clone in its write log, and not the global copy everyone else sees. Like this, any changes it makes will be done to the (invisible) clone, not to the global copy. If in step 3 the commit is successful, the transaction should replace the global copy with the updated list it has in its write log, and then the changes become visible for everyone else at once. It would be something like this: myAtomicObject.list = updatedCloneOfListInTheWriteLog; Problem #1: possible references to the list. Let's say someone puts a reference to the global list in a dictionary. When I do... myAtomicObject.list = updatedCloneOfListInTheWriteLog; ...I'm just replacing the reference in the field list, but not the real object (I'm not overwriting the data), so in the dictionary we'll still have a reference to the old version of the list. A possible solution would be to overwrite the data (in the case of a list, empty the global list and add all the elements of the clone). More generically, I would need to copy the fields of one list to the other. I can do this with reflection, but that's not very pretty. Is there any other way to do it? Problem #2: even if problem #1 is solved, I still have a similar problem with the clone: the application programmer doesn't know I'm giving him a clone and not the global copy. What if he puts the clone in a dictionary? Then at commit there will be some references to the global copy and some to the clone, when in truth they should all point to the same object. I thought about providing a wrapper object that contains both the cloned list and a pointer to the global copy, but the programmer doesn't know about this wrapper, so they're not going to use the pointer at all. The wrapper would be like this: public class Wrapper<T> : T { // This would be the pointer to the global copy. The local data is contained in whatever fields the wrapper inherits from T. private T thisPtr; } I do need this wrapper for comparisons: if I have a dictionary that has an entry with the global copy as key, if I look it up with the clone, like this: dictionary[updatedCloneOfListInTheWriteLog] I need it to return the entry, that is, to think that updatedCloneOfListInTheWriteLog and the global copy are the same thing. For this, I can just override Equals, GetHashCode, operator== and operator!=, no problem. However I still don't know how to solve the case in which the programmer unknowingly inserts a reference to the clone in a dictionary. Problem #3: the wrapper must extend the class of the object it wraps (if it's wrapping MyClassA, it must extend MyClassA) so that it's accepted wherever an object of that class (MyClass) would be accepted. However, that class (MyClassA) may be final. This is pretty horrible :$. Any suggestions? I don't need to use a wrapper, anything you can think of is fine. What I cannot change is the write log (I need to have a write log) and the fact that the programmer doesn't know about the clone. I hope I've made some sense. Feel free to ask for more info if something needs some clearing up. Thanks so much!

Read the article
Backing up my data causes my server to crash using Symantec Backup Exec 12, or How I Came to Loathe

- by Kyle Noland

I have a Dell PowerEdge 2850 running Windows Server 2003. It is the primary file server for one of my clients. I have another server also running Windows Server 2003 that acts as the core media server for Symantec Backup Exec 12. I recently upgraded from Backup Exec 11d to 12. This upgrade was necessary because we also just upgraded from Exchange 2003 to Exchange 2007. After the upgrade I had to push-install the new version 12 Backup Exec Remote Agents to each of the servers I am backing up (about 6 total). 5 of my servers are doing just fine, faithfully completing backups every night. My file server routinely crashes. Observations: When the server crashes, it does not blue screen, it just locks up completely. Even the mouse is unresponsive. If you leave the server locked up long enough, it will eventually reboot itself and hang on the Windows splash screen. There is absolutely zero useful Event Viewer evidence of a problem. The logs go from routine logging to an Unexplained Shutdown Event the next morning when I have to hard reset the server to get it to boot. 90% of the time the server does not boot cleanly, it hangs on the Windows splash screen. I don't have any light to shed here. When the server hangs all I can do is hard reset it and try again. Even after a successful boot and chkdsk /r operation, if you reboot the machine, you have a 90% chance it won't back up again cleanly. The back story: This server started crashing during nightly backups about a month ago. I tried everything I could think of to troubleshoot the problem and eventually had to give up because I could not keep coming to the office at 4 AM to try to get the server back online. One Friday I got lucky and the server stayed up for its entire full backup. I took this opportunity to restore the full backup to a temporary server I set up and switched all my users to the temporary. Then I reloaded the ailing file server. I kept all my users on the temporary file server for about 3 weeks. I installed the same Backup Exec Remote Agent and Trend Micro A/V client on the temporary server that I was using on the regular file server. During this time, I had absolutely no problems backing up the temporary server. I tested the reloaded file server extensively. I rebooted the server once an hour every day for 3 weeks trying to make it fail. It never did. I felt confident that the reload was the answer to my problems. I moved all of the data from the temporary server back to the regular server. I got 3 nightly backups out of it before it locked up again and started the familiar failure to boot cleanly behavior. This weekend I decided to monitor the file server through the entire backup job. I RDPd into the file server and also into the server running Backup Exec. On the file server I opened the Task Manager so I could view the processes and watch CPU and memory usage. Everything was running smoothly for about 60GB worth of backup. Then I noticed that the byte count of the backup job in Backup Exec had stopped progressing. I looked back over at my RDP session into the file server, and I was getting real time updates about CPU and memory usage still - both nearly 0%, which is unusual. Backups usually hover around 40% usage for the duration of the backup job. Let me reiterate this point: The screen was refreshing and I was getting real time Task Manager updates - until I clicked on the Start menu. The screen went black and the server locked up. In truth, I think the server had already locked up, the video card just hadn't figured it out yet. I went back into my bag of trick: driving to the office and hard reseting the server over and over again when it hangs up at the Windows splash screen. I did this for 2 hours without getting a successful boot. I started panicking because I did not have a decent backup to use to get everything back onto the working temporary file server. Once I exhausted everything I knew to do, I took a deep breath, booted to the Windows Server 2003 CD and performed a repair installation of Windows. The server came back up fine, with all of my data intact. I can now reboot the server at will and it will come back up cleanly. The problem is that I'm afraid as soon as I try to back that data up again I will back at square one. So let me sum things up: Here is what I've done so far to troubleshoot this server: Deleted and recreated the RAID 5 sets. Initialized the drives. Reloaded the server with a fresh Server 2003 install. Confirmed with Dell that I have installed the latest, Dell approved BIOS and NIC drivers. Uninstalled / reinstalled the Backup Exec Remote Agent. Uninstalled the Trend Micro A/V client. Configured the server not to reboot itself after a blue screen so I can see any stop error. I used to think the server was blue screening, but since I enabled this setting I now know that the server just completely locks up. Run chkdsk /r from the Windows Recovery Console. Several errors were found and corrected, but did not help my problem. Help confirm or deny the following assumptions: There are two problems at work here. Why the server is locking up in the first place, and why the server won't boot cleanly after a lockup. This is ultimately a software problem. The server works fine and can be rebooted cleanly all day long - until the first lockup - following a fresh OS load or even a Repair installation. This is not a problem with Backup Exec in general. All of my other servers back up just fine. For the record, all of the other servers run Server 2003, and some of them house more data than the file server in question here. Any help is appreciated. The irony is almost too much to bear. Backing up my data is what is jeopardizing it.

Read the article
Does anyone really understand how HFSC scheduling in Linux/BSD works?

- by Mecki

I read the original SIGCOMM '97 PostScript paper about HFSC, it is very technically, but I understand the basic concept. Instead of giving a linear service curve (as with pretty much every other scheduling algorithm), you can specify a convex or concave service curve and thus it is possible to decouple bandwidth and delay. However, even though this paper mentions to kind of scheduling algorithms being used (real-time and link-share), it always only mentions ONE curve per scheduling class (the decoupling is done by specifying this curve, only one curve is needed for that). Now HFSC has been implemented for BSD (OpenBSD, FreeBSD, etc.) using the ALTQ scheduling framework and it has been implemented Linux using the TC scheduling framework (part of iproute2). Both implementations added two additional service curves, that were NOT in the original paper! A real-time service curve and an upper-limit service curve. Again, please note that the original paper mentions two scheduling algorithms (real-time and link-share), but in that paper both work with one single service curve. There never have been two independent service curves for either one as you currently find in BSD and Linux. Even worse, some version of ALTQ seems to add an additional queue priority to HSFC (there is no such thing as priority in the original paper either). I found several BSD HowTo's mentioning this priority setting (even though the man page of the latest ALTQ release knows no such parameter for HSFC, so officially it does not even exist). This all makes the HFSC scheduling even more complex than the algorithm described in the original paper and there are tons of tutorials on the Internet that often contradict each other, one claiming the opposite of the other one. This is probably the main reason why nobody really seems to understand how HFSC scheduling really works. Before I can ask my questions, we need a sample setup of some kind. I'll use a very simple one as seen in the image below: Here are some questions I cannot answer because the tutorials contradict each other: What for do I need a real-time curve at all? Assuming A1, A2, B1, B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s of course), right? Why would I additionally give A1 and B1 a real-time curve with 128 kbit/s? What would this be good for? To give those two a higher priority? According to original paper I can give them a higher priority by using a curve, that's what HFSC is all about after all. By giving both classes a curve of [256kbit/s 20ms 128kbit/s] both have twice the priority than A2 and B2 automatically (still only getting 128 kbit/s on average) Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64kbit/s real-time and 64kbit/s link-share bandwidth, does that mean once they are served 64kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but lets ignore that for a second) or does that mean they get another 64 kbit/s via link-share? So does each class has a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (current link-share requirement equals specified link-share requirement minus real-time bandwidth already provided to this class)? Is upper limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other way. Some even claim upper-limit is the maximum for real-time bandwidth + link-share bandwidth? What is the truth? Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference? If I use the seperate real-time curve to increase priorities of classes, why would I need "curves" at all? Why is not real-time a flat value and link-share also a flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit) what for do I still need curves on each one? Why would I want the curves shape (not average bandwidth, but their slopes) to be different for real-time and link-share traffic? According to the little documentation available, real-time curve values are totally ignored for inner classes (class A and B), they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for 3.3 Sample configuration) set real-time curves on inner classes and claims that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (note: pshare sets the link-share curve in ALTQ and grate the real-time curve; you can see this in the paragraph above the sample configuration). Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed, others say it must not be higher than 70% of the line speed. Which one is right or are they maybe both wrong? One tutorial said you shall forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. link-share is the bandwidth that this class wants to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see question above, the percentage varies). Question: Is this more or less accurate or a total misunderstanding of HSFC? And if assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and an maybe an upper-limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope or each a different one and how to find out the right slope? I still haven't lost hope that there exists at least a hand full of people in this world that really understood HFSC and are able to answer all these questions accurately. And doing so without contradicting each other in the answers would be really nice ;-)

Read the article
Does anyone really understand how HFSC scheduling in Linux/BSD works?

- by Mecki

I read the original SIGCOMM '97 PostScript paper about HFSC, it is very technically, but I understand the basic concept. Instead of giving a linear service curve (as with pretty much every other scheduling algorithm), you can specify a convex or concave service curve and thus it is possible to decouple bandwidth and delay. However, even though this paper mentions to kind of scheduling algorithms being used (real-time and link-share), it always only mentions ONE curve per scheduling class (the decoupling is done by specifying this curve, only one curve is needed for that). Now HFSC has been implemented for BSD (OpenBSD, FreeBSD, etc.) using the ALTQ scheduling framework and it has been implemented Linux using the TC scheduling framework (part of iproute2). Both implementations added two additional service curves, that were NOT in the original paper! A real-time service curve and an upper-limit service curve. Again, please note that the original paper mentions two scheduling algorithms (real-time and link-share), but in that paper both work with one single service curve. There never have been two independent service curves for either one as you currently find in BSD and Linux. Even worse, some version of ALTQ seems to add an additional queue priority to HSFC (there is no such thing as priority in the original paper either). I found several BSD HowTo's mentioning this priority setting (even though the man page of the latest ALTQ release knows no such parameter for HSFC, so officially it does not even exist). This all makes the HFSC scheduling even more complex than the algorithm described in the original paper and there are tons of tutorials on the Internet that often contradict each other, one claiming the opposite of the other one. This is probably the main reason why nobody really seems to understand how HFSC scheduling really works. Before I can ask my questions, we need a sample setup of some kind. I'll use a very simple one as seen in the image below: Here are some questions I cannot answer because the tutorials contradict each other: What for do I need a real-time curve at all? Assuming A1, A2, B1, B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s of course), right? Why would I additionally give A1 and B1 a real-time curve with 128 kbit/s? What would this be good for? To give those two a higher priority? According to original paper I can give them a higher priority by using a curve, that's what HFSC is all about after all. By giving both classes a curve of [256kbit/s 20ms 128kbit/s] both have twice the priority than A2 and B2 automatically (still only getting 128 kbit/s on average) Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64kbit/s real-time and 64kbit/s link-share bandwidth, does that mean once they are served 64kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but lets ignore that for a second) or does that mean they get another 64 kbit/s via link-share? So does each class has a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (current link-share requirement equals specified link-share requirement minus real-time bandwidth already provided to this class)? Is upper limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other way. Some even claim upper-limit is the maximum for real-time bandwidth + link-share bandwidth? What is the truth? Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference? If I use the seperate real-time curve to increase priorities of classes, why would I need "curves" at all? Why is not real-time a flat value and link-share also a flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit) what for do I still need curves on each one? Why would I want the curves shape (not average bandwidth, but their slopes) to be different for real-time and link-share traffic? According to the little documentation available, real-time curve values are totally ignored for inner classes (class A and B), they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for 3.3 Sample configuration) set real-time curves on inner classes and claims that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (note: pshare sets the link-share curve in ALTQ and grate the real-time curve; you can see this in the paragraph above the sample configuration). Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed, others say it must not be higher than 70% of the line speed. Which one is right or are they maybe both wrong? One tutorial said you shall forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. link-share is the bandwidth that this class wants to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see question above, the percentage varies). Question: Is this more or less accurate or a total misunderstanding of HSFC? And if assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and an maybe an upper-limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope or each a different one and how to find out the right slope? I still haven't lost hope that there exists at least a hand full of people in this world that really understood HFSC and are able to answer all these questions accurately. And doing so without contradicting each other in the answers would be really nice ;-)

Read the article
Backing up my data causes my server to crash using Symantec Backup Exec 12, or How I Came to Loathe Irony

- by Kyle Noland

I have a Dell PowerEdge 2850 running Windows Server 2003. It is the primary file server for one of my clients. I have another server also running Windows Server 2003 that acts as the core media server for Symantec Backup Exec 12. I recently upgraded from Backup Exec 11d to 12. This upgrade was necessary because we also just upgraded from Exchange 2003 to Exchange 2007. After the upgrade I had to push-install the new version 12 Backup Exec Remote Agents to each of the servers I am backing up (about 6 total). 5 of my servers are doing just fine, faithfully completing backups every night. My file server routinely crashes. Observations: When the server crashes, it does not blue screen, it just locks up completely. Even the mouse is unresponsive. If you leave the server locked up long enough, it will eventually reboot itself and hang on the Windows splash screen. There is absolutely zero useful Event Viewer evidence of a problem. The logs go from routine logging to an Unexplained Shutdown Event the next morning when I have to hard reset the server to get it to boot. 90% of the time the server does not boot cleanly, it hangs on the Windows splash screen. I don't have any light to shed here. When the server hangs all I can do is hard reset it and try again. Even after a successful boot and chkdsk /r operation, if you reboot the machine, you have a 90% chance it won't back up again cleanly. The back story: This server started crashing during nightly backups about a month ago. I tried everything I could think of to troubleshoot the problem and eventually had to give up because I could not keep coming to the office at 4 AM to try to get the server back online. One Friday I got lucky and the server stayed up for its entire full backup. I took this opportunity to restore the full backup to a temporary server I set up and switched all my users to the temporary. Then I reloaded the ailing file server. I kept all my users on the temporary file server for about 3 weeks. I installed the same Backup Exec Remote Agent and Trend Micro A/V client on the temporary server that I was using on the regular file server. During this time, I had absolutely no problems backing up the temporary server. I tested the reloaded file server extensively. I rebooted the server once an hour every day for 3 weeks trying to make it fail. It never did. I felt confident that the reload was the answer to my problems. I moved all of the data from the temporary server back to the regular server. I got 3 nightly backups out of it before it locked up again and started the familiar failure to boot cleanly behavior. This weekend I decided to monitor the file server through the entire backup job. I RDPd into the file server and also into the server running Backup Exec. On the file server I opened the Task Manager so I could view the processes and watch CPU and memory usage. Everything was running smoothly for about 60GB worth of backup. Then I noticed that the byte count of the backup job in Backup Exec had stopped progressing. I looked back over at my RDP session into the file server, and I was getting real time updates about CPU and memory usage still - both nearly 0%, which is unusual. Backups usually hover around 40% usage for the duration of the backup job. Let me reiterate this point: The screen was refreshing and I was getting real time Task Manager updates - until I clicked on the Start menu. The screen went black and the server locked up. In truth, I think the server had already locked up, the video card just hadn't figured it out yet. I went back into my bag of trick: driving to the office and hard reseting the server over and over again when it hangs up at the Windows splash screen. I did this for 2 hours without getting a successful boot. I started panicking because I did not have a decent backup to use to get everything back onto the working temporary file server. Once I exhausted everything I knew to do, I took a deep breath, booted to the Windows Server 2003 CD and performed a repair installation of Windows. The server came back up fine, with all of my data intact. I can now reboot the server at will and it will come back up cleanly. The problem is that I'm afraid as soon as I try to back that data up again I will back at square one. So let me sum things up: Here is what I've done so far to troubleshoot this server: Deleted and recreated the RAID 5 sets. Initialized the drives. Reloaded the server with a fresh Server 2003 install. Confirmed with Dell that I have installed the latest, Dell approved BIOS and NIC drivers. Uninstalled / reinstalled the Backup Exec Remote Agent. Uninstalled the Trend Micro A/V client. Configured the server not to reboot itself after a blue screen so I can see any stop error. I used to think the server was blue screening, but since I enabled this setting I now know that the server just completely locks up. Run chkdsk /r from the Windows Recovery Console. Several errors were found and corrected, but did not help my problem. Help confirm or deny the following assumptions: There are two problems at work here. Why the server is locking up in the first place, and why the server won't boot cleanly after a lockup. This is ultimately a software problem. The server works fine and can be rebooted cleanly all day long - until the first lockup - following a fresh OS load or even a Repair installation. This is not a problem with Backup Exec in general. All of my other servers back up just fine. For the record, all of the other servers run Server 2003, and some of them house more data than the file server in question here. Any help is appreciated. The irony is almost too much to bear. Backing up my data is what is jeopardizing it.

Read the article
Towards Database Continuous Delivery – What Next after Continuous Integration? A Checklist

- by Ben Rees

.dbd-banner p{ font-size:0.75em; padding:0 0 10px; margin:0 } .dbd-banner p span{ color:#675C6D; } .dbd-banner p:last-child{ padding:0; } @media ALL and (max-width:640px){ .dbd-banner{ background:#f0f0f0; padding:5px; color:#333; margin-top: 5px; } } -- Database delivery patterns & practices STAGE 4 AUTOMATED DEPLOYMENT If you’ve been fortunate enough to get to the stage where you’ve implemented some sort of continuous integration process for your database updates, then hopefully you’re seeing the benefits of that investment – constant feedback on changes your devs are making, advanced warning of data loss (prior to the production release on Saturday night!), a nice suite of automated tests to check business logic, so you know it’s going to work when it goes live, and so on. But what next? What can you do to improve your delivery process further, moving towards a full continuous delivery process for your database? In this article I describe some of the issues you might need to tackle on the next stage of this journey, and how to plan to overcome those obstacles before they appear. Our Database Delivery Learning Program consists of four stages, really three – source controlling a database, running continuous integration processes, then how to set up automated deployment (the middle stage is split in two – basic and advanced continuous integration, making four stages in total). If you’ve managed to work through the first three of these stages – source control, basic, then advanced CI, then you should have a solid change management process set up where, every time one of your team checks in a change to your database (whether schema or static reference data), this change gets fully tested automatically by your CI server. But this is only part of the story. Great, we know that our updates work, that the upgrade process works, that the upgrade isn’t going to wipe our 4Tb of production data with a single DROP TABLE. But – how do you get this (fully tested) release live? Continuous delivery means being always ready to release your software at any point in time. There’s a significant gap between your latest version being tested, and it being easily releasable. Just a quick note on terminology – there’s a nice piece here from Atlassian on the difference between continuous integration, continuous delivery and continuous deployment. This piece also gives a nice description of the benefits of continuous delivery. These benefits have been summed up by Jez Humble at Thoughtworks as: “Continuous delivery is a set of principles and practices to reduce the cost, time, and risk of delivering incremental changes to users” There’s another really useful piece here on Simple-Talk about the need for continuous delivery and how it applies to the database written by Phil Factor – specifically the extra needs and complexities of implementing a full CD solution for the database (compared to just implementing CD for, say, a web app). So, hopefully you’re convinced of moving on the the next stage! The next step after CI is to get some sort of automated deployment (or “release management”) process set up. But what should I do next? What do I need to plan and think about for getting my automated database deployment process set up? Can’t I just install one of the many release management tools available and hey presto, I’m ready! If only it were that simple. Below I list some of the areas that it’s worth spending a little time on, where a little planning and prep could go a long way. It’s also worth pointing out, that this should really be an evolving process. Depending on your starting point of course, it can be a long journey from your current setup to a full continuous delivery pipeline. If you’ve got a CI mechanism in place, you’re certainly a long way down that path. Nevertheless, we’d recommend evolving your process incrementally. Pages 157 and 129-141 of the book on Continuous Delivery (by Jez Humble and Dave Farley) have some great guidance on building up a pipeline incrementally: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912 For now, in this post, we’ll look at the following areas for your checklist: You and Your Team Environments The Deployment Process Rollback and Recovery Development Practices You and Your Team It’s a cliché in the DevOps community that “It’s not all about processes and tools, really it’s all about a culture”. As stated in this DevOps report from Puppet Labs: “DevOps processes and tooling contribute to high performance, but these practices alone aren’t enough to achieve organizational success. The most common barriers to DevOps adoption are cultural: lack of manager or team buy-in, or the value of DevOps isn’t understood outside of a specific group”. Like most clichés, there’s truth in there – if you want to set up a database continuous delivery process, you need to get your boss, your department, your company (if relevant) onside. Why? Because it’s an investment with the benefits coming way down the line. But the benefits are huge – for HP, in the book A Practical Approach to Large-Scale Agile Development: How HP Transformed LaserJet FutureSmart Firmware, these are summarized as: -2008 to present: overall development costs reduced by 40% -Number of programs under development increased by 140% -Development costs per program down 78% -Firmware resources now driving innovation increased by a factor of 8 (from 5% working on new features to 40% But what does this mean? It means that, when moving to the next stage, to make that extra investment in automating your deployment process, it helps a lot if everyone is convinced that this is a good thing. That they understand the benefits of automated deployment and are willing to make the effort to transform to a new way of working. Incidentally, if you’re ever struggling to convince someone of the value I’d strongly recommend just buying them a copy of this book – a great read, and a very practical guide to how it can really work at a large org. I’ve spoken to many customers who have implemented database CI who describe their deployment process as “The point where automation breaks down. Up to that point, the CI process runs, untouched by human hand, but as soon as that’s finished we revert to manual.” This deployment process can involve, for example, a DBA manually comparing an environment (say, QA) to production, creating the upgrade scripts, reading through them, checking them against an Excel document emailed to him/her the night before, turning to page 29 in his/her notebook to double-check how replication is switched off and on for deployments, and so on and so on. Painful, error-prone and lengthy. But the point is, if this is something like your deployment process, telling your DBA “We’re changing everything you do and your toolset next week, to automate most of your role – that’s okay isn’t it?” isn’t likely to go down well. There’s some work here to bring him/her onside – to explain what you’re doing, why there will still be control of the deployment process and so on. Or of course, if you’re the DBA looking after this process, you have to do a similar job in reverse. You may have researched and worked out how you’d like to change your methodology to start automating your painful release process, but do the dev team know this? What if they have to start producing different artifacts for you? Will they be happy with this? Worth talking to them, to find out. As well as talking to your DBA/dev team, the other group to get involved before implementation is your manager. And possibly your manager’s manager too. As mentioned, unless there’s buy-in “from the top”, you’re going to hit problems when the implementation starts to get rocky (and what tool/process implementations don’t get rocky?!). You need to have support from someone senior in your organisation – someone you can turn to when you need help with a delayed implementation, lack of resources or lack of progress. Actions: Get your DBA involved (or whoever looks after live deployments) and discuss what you’re planning to do or, if you’re the DBA yourself, get the dev team up-to-speed with your plans, Get your boss involved too and make sure he/she is bought in to the investment. Environments Where are you going to deploy to? And really this question is – what environments do you want set up for your deployment pipeline? Assume everyone has “Production”, but do you have a QA environment? Dedicated development environments for each dev? Proper pre-production? I’ve seen every setup under the sun, and there is often a big difference between “What we want, to do continuous delivery properly” and “What we’re currently stuck with”. Some of these differences are: What we want What we’ve got Each developer with their own dedicated database environment A single shared “development” environment, used by everyone at once An Integration box used to test the integration of all check-ins via the CI process, along with a full suite of unit-tests running on that machine In fact if you have a CI process running, you’re likely to have some sort of integration server running (even if you don’t call it that!). Whether you have a full suite of unit tests running is a different question… Separate QA environment used explicitly for manual testing prior to release “We just test on the dev environments, or maybe pre-production” A proper pre-production (or “staging”) box that matches production as closely as possible Hopefully a pre-production box of some sort. But does it match production closely!? A production environment reproducible from source control A production box which has drifted significantly from anything in source control The big question is – how much time and effort are you going to invest in fixing these issues? In reality this just involves figuring out which new databases you’re going to create and where they’ll be hosted – VMs? Cloud-based? What about size/data issues – what data are you going to include on dev environments? Does it need to be masked to protect access to production data? And often the amount of work here really depends on whether you’re working on a new, greenfield project, or trying to update an existing, brownfield application. There’s a world if difference between starting from scratch with 4 or 5 clean environments (reproducible from source control of course!), and trying to re-purpose and tweak a set of existing databases, with all of their surrounding processes and quirks. But for a proper release management process, ideally you have: Dedicated development databases, An Integration server used for testing continuous integration and running unit tests. [NB: This is the point at which deployments are automatic, without human intervention. Each deployment after this point is a one-click (but human) action], QA – QA engineers use a one-click deployment process to automatically* deploy chosen releases to QA for testing, Pre-production. The environment you use to test the production release process, Production. * A note on the use of the word “automatic” – when carrying out automated deployments this does not mean that the deployment is happening without human intervention (i.e. that something is just deploying over and over again). It means that the process of carrying out the deployment is automatic in that it’s not a person manually running through a checklist or set of actions. The deployment still requires a single-click from a user. Actions: Get your environments set up and ready, Set access permissions appropriately, Make sure everyone understands what the environments will be used for (it’s not a “free-for-all” with all environments to be accessed, played with and changed by development). The Deployment Process As described earlier, most existing database deployment processes are pretty manual. The following is a description of a process we hear very often when we ask customers “How do your database changes get live? How does your manual process work?” Check pre-production matches production (use a schema compare tool, like SQL Compare). Sometimes done by taking a backup from production and restoring in to pre-prod, Again, use a schema compare tool to find the differences between the latest version of the database ready to go live (i.e. what the team have been developing). This generates a script, User (generally, the DBA), reviews the script. This often involves manually checking updates against a spreadsheet or similar, Run the script on pre-production, and check there are no errors (i.e. it upgrades pre-production to what you hoped), If all working, run the script on production.* * this assumes there’s no problem with production drifting away from pre-production in the interim time period (i.e. someone has hacked something in to the production box without going through the proper change management process). This difference could undermine the validity of your pre-production deployment test. Red Gate is currently working on a free tool to detect this problem – sign up here at www.sqllighthouse.com, if you’re interested in testing early versions. There are several variations on this process – some better, some much worse! How do you automate this? In particular, step 3 – surely you can’t automate a DBA checking through a script, that everything is in order!? The key point here is to plan what you want in your new deployment process. There are so many options. At one extreme, pure continuous deployment – whenever a dev checks something in to source control, the CI process runs (including extensive and thorough testing!), before the deployment process keys in and automatically deploys that change to the live box. Not for the faint hearted – and really not something we recommend. At the other extreme, you might be more comfortable with a semi-automated process – the pre-production/production matching process is automated (with an error thrown if these environments don’t match), followed by a manual intervention, allowing for script approval by the DBA. One he/she clicks “Okay, I’m happy for that to go live”, the latter stages automatically take the script through to live. And anything in between of course – and other variations. But we’d strongly recommended sitting down with a whiteboard and your team, and spending a couple of hours mapping out “What do we do now?”, “What do we actually want?”, “What will satisfy our needs for continuous delivery, but still maintaining some sort of continuous control over the process?” NB: Most of what we’re discussing here is about production deployments. It’s important to note that you will also need to map out a deployment process for earlier environments (for example QA). However, these are likely to be less onerous, and many customers opt for a much more automated process for these boxes. Actions: Sit down with your team and a whiteboard, and draw out the answers to the questions above for your production deployments – “What do we do now?”, “What do we actually want?”, “What will satisfy our needs for continuous delivery, but still maintaining some sort of continuous control over the process?” Repeat for earlier environments (QA and so on). Rollback and Recovery If only every deployment went according to plan! Unfortunately they don’t – and when things go wrong, you need a rollback or recovery plan for what you’re going to do in that situation. Once you move in to a more automated database deployment process, you’re far more likely to be deploying more frequently than before. No longer once every 6 months, maybe now once per week, or even daily. Hence the need for a quick rollback or recovery process becomes paramount, and should be planned for. NB: These are mainly scenarios for handling rollbacks after the transaction has been committed. If a failure is detected during the transaction, the whole transaction can just be rolled back, no problem. There are various options, which we’ll explore in subsequent articles, things like: Immediately restore from backup, Have a pre-tested rollback script (remembering that really this is a “roll-forward” script – there’s not really such a thing as a rollback script for a database!) Have fallback environments – for example, using a blue-green deployment pattern. Different options have pros and cons – some are easier to set up, some require more investment in infrastructure; and of course some work better than others (the key issue with using backups, is loss of the interim transaction data that has been added between the failed deployment and the restore). The best mechanism will be primarily dependent on how your application works and how much you need a cast-iron failsafe mechanism. Actions: Work out an appropriate rollback strategy based on how your application and business works, your appetite for investment and requirements for a completely failsafe process. Development Practices This is perhaps the more difficult area for people to tackle. The process by which you can deploy database updates is actually intrinsically linked with the patterns and practices used to develop that database and linked application. So you need to decide whether you want to implement some changes to the way your developers actually develop the database (particularly schema changes) to make the deployment process easier. A good example is the pattern “Branch by abstraction”. Explained nicely here, by Martin Fowler, this is a process that can be used to make significant database changes (e.g. splitting a table) in a step-wise manner so that you can always roll back, without data loss – by making incremental updates to the database backward compatible. Slides 103-108 of the following slidedeck, from Niek Bartholomeus explain the process: https://speakerdeck.com/niekbartho/orchestration-in-meatspace As these slides show, by making a significant schema change in multiple steps – where each step can be rolled back without any loss of new data – this affords the release team the opportunity to have zero-downtime deployments with considerably less stress (because if an increment goes wrong, they can roll back easily). There are plenty more great patterns that can be implemented – the book Refactoring Databases, by Scott Ambler and Pramod Sadalage is a great read, if this is a direction you want to go in: http://www.amazon.com/Refactoring-Databases-Evolutionary-paperback-Addison-Wesley/dp/0321774515 But the question is – how much of this investment are you willing to make? How often are you making significant schema changes that would require these best practices? Again, there’s a difference here between migrating old projects and starting afresh – with the latter it’s much easier to instigate best practice from the start. Actions: For your business, work out how far down the path you want to go, amending your database development patterns to “best practice”. It’s a trade-off between implementing quality processes, and the necessity to do so (depending on how often you make complex changes). Socialise these changes with your development group. No-one likes having “best practice” changes imposed on them, so good to introduce these ideas and the rationale behind them early. Summary The next stages of implementing a continuous delivery pipeline for your database changes (once you have CI up and running) require a little pre-planning, if you want to get the most out of the work, and for the implementation to go smoothly. We’ve covered some of the checklist of areas to consider – mainly in the areas of “Getting the team ready for the changes that are coming” and “Planning our your pipeline, environments, patterns and practices for development”, though there will be more detail, depending on where you’re coming from – and where you want to get to. This article is part of our database delivery patterns & practices series on Simple Talk. Find more articles for version control, automated testing, continuous integration & deployment.

Read the article
Part 2–Load Testing In The Cloud

- by Tarun Arora

Welcome to Part 2, In Part 1 we discussed the advantages of creating a Test Rig in the cloud, the Azure edge and the Test Rig Topology we want to get to. In Part 2, Let’s start by understanding the components of Azure we’ll be making use of followed by manually putting them together to create the test rig, so… let’s get down dirty start setting up the Test Rig. What Components of Azure will I be using for building the Test Rig in the Cloud? To run the Test Agents we’ll make use of Windows Azure Compute and to enable communication between Test Controller and Test Agents we’ll make use of Windows Azure Connect. Azure Connect The Test Controller is on premise and the Test Agents are in the cloud (How will they talk?). To enable communication between the two, we’ll make use of Windows Azure Connect. With Windows Azure Connect, you can use a simple user interface to configure IPsec protected connections between computers or virtual machines (VMs) in your organization’s network, and roles running in Windows Azure. With this you can now join Windows Azure role instances to your domain, so that you can use your existing methods for domain authentication, name resolution, or other domain-wide maintenance actions. For more details refer to an overview of Windows Azure connect. A very useful video explaining everything you wanted to know about Windows Azure connect. Azure Compute Windows Azure compute provides developers a platform to host and manage applications in Microsoft’s data centres across the globe. A Windows Azure application is built from one or more components called ‘roles.’ Roles come in three different types: Web role, Worker role, and Virtual Machine (VM) role, we’ll be using the Worker role to set up the Test Agents. A very nice blog post discussing the difference between the 3 role types. Developers are free to use the .NET framework or other software that runs on Windows with the Worker role or Web role. Developers can also create applications using languages such as PHP and Java. More on Windows Azure Compute. Each Windows Azure compute instance represents a virtual server... Virtual Machine Size CPU Cores Memory Cost Per Hour Extra Small Shared 768 MB $0.04 Small 1 1.75 GB $0.12 Medium 2 3.50 GB $0.24 Large 4 7.00 GB $0.48 Extra Large 8 14.00 GB $0.96 You might want to review the Windows Azure Pricing FAQ. Let’s Get Started building the Test Rig… Configuration Machine Role Comments VM – 1 Domain Controller for Playpit.com On Premise VM – 2 TFS, Test Controller On Premise VM – 3 Test Agent Cloud In this blog post I would assume that you have the domain, Team Foundation Server and Test Controller Installed and set up already. If not, please refer to the TFS 2010 Installation Guide and this walkthrough on MSDN to set up your Test Controller. You can also download a preconfigured TFS 2010 VM from Brian Keller's blog, Brian also has some great hands on Labs on TFS 2010 that you may want to explore. I. Lets start building VM – 3: The Test Agent Download the Windows Azure SDK and Tools Open Visual Studio and create a new Windows Azure Project using the Cloud Template Choose the Worker Role for reasons explained in the earlier post The WorkerRole.cs implements the Run() and OnStart() methods, no code changes required. You should be able to compile the project and run it in the compute emulator (The compute emulator should have been installed as part of the Windows Azure Toolkit) on your local machine. We will only be making changes to WindowsAzureProject, open ServiceDefinition.csdef. Ensure that the vmsize is small (remember the cost chart above). Import the “Connect” module. I am importing the Connect module because I need to join the Worker role VM to the Playpit domain. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="WindowsAzureProject2" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition"> <WorkerRole name="WorkerRole1" vmsize="Small"> <Imports> <Import moduleName="Diagnostics" /> <Import moduleName="Connect"/> </Imports> </WorkerRole> </ServiceDefinition> Go to the ServiceConfiguration.Cloud.cscfg and note that settings with key ‘Microsoft.WindowsAzure.Plugins.Connect.%%%%’ have been added to the configuration file. This is because you decided to import the connect module. See the config below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="WindowsAzureProject2" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="WorkerRole1"> <Instances count="1" /> <ConfigurationSettings> <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="UseDevelopmentStorage=true" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.ActivationToken" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Refresh" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.WaitForConnectivity" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Upgrade" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.EnableDomainJoin" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainFQDN" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainControllerFQDN" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainAccountName" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainPassword" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainOU" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Administrators" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainSiteName" value="" /> </ConfigurationSettings> </Role> </ServiceConfiguration> Let’s go step by step and understand all the highlighted parameters and where you can find the values for them. osFamily – By default this is set to 1 (Windows Server 2008 SP2). Change this to 2 if you want the Windows Server 2008 R2 operating system. The Advantage of using osFamily = “2” is that you get Powershell 2.0 rather than Powershell 1.0. In Powershell 2.0 you could simply use “powershell -ExecutionPolicy Unrestricted ./myscript.ps1” and it will work while in Powershell 1.0 you will have to change the registry key by including the following in your command file “reg add HKLM\Software\Microsoft\PowerShell\1\ShellIds\Microsoft.PowerShell /v ExecutionPolicy /d Unrestricted /f” before you can execute any power shell. The other reason you might want to move to os2 is if you wanted IIS 7.5. Activation Token – To enable communication between the on premise machine and the Windows Azure Worker role VM both need to have the same token. Log on to Windows Azure Management Portal, click on Connect, click on Get Activation Token, this should give you the activation token, copy the activation token to the clipboard and paste it in the configuration file. Note – Later in the blog I’ll be showing you how to install connect on the on premise machine. EnableDomainJoin – Set the value to true, ofcourse we want to join the on windows azure worker role VM to the domain. DomainFQDN, DomainControllerFQDN, DomainAccountName, DomainPassword, DomainOU, Administrators – This information is specific to your domain. I have extracted this information from the ‘service manager’ and ‘Active Directory Users and Computers’. Also, i created a new Domain-OU namely ‘CloudInstances’ so all my cloud instances joined to my domain show up here, this is optional. You can encrypt the DomainPassword – refer to the instructions here. Or hold fire, I’ll be covering that when i come to certificates and encryption in the coming section. Now once you have filled all this information up, the configuration file should look something like below, <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="WindowsAzureProject2" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="2" osVersion="*"> <Role name="WorkerRole1"> <Instances count="1" /> <ConfigurationSettings> <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="UseDevelopmentStorage=true" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.ActivationToken" value="45f55fea-f194-4fbc-b36e-25604faac784" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Refresh" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.WaitForConnectivity" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Upgrade" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.EnableDomainJoin" value="true" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainFQDN" value="play.pit.com" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainControllerFQDN" value="WIN-KUDQMQFGQOL.play.pit.com" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainAccountName" value="playpit\Administrator" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainPassword" value="************************" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainOU" value="OU=CloudInstances, DC=Play, DC=Pit, DC=com" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Administrators" value="Playpit\Administrator" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainSiteName" value="" /> </ConfigurationSettings> </Role> </ServiceConfiguration> Next we will be enabling the Remote Desktop module in to the ServiceDefinition.csdef, we could make changes manually or allow a beautiful wizard to help us make changes. I prefer the second option. So right click on the Windows Azure project and choose Publish Now once you get the publish wizard, if you haven’t already you would be asked to import your Windows Azure subscription, this is simply the Msdn subscription activation key xml. Once you have done click Next to go to the Settings page and check ‘Enable Remote Desktop for all roles’. As soon as you do that you get another pop up asking you the details for the user that you would be logging in with (make sure you enter a reasonable expiry date, you do not want the user account to expire today). Notice the more information tag at the bottom, click that to get access to the certificate section. See screen shot below. From the drop down select the option to create a new certificate In the pop up window enter the friendly name for your certificate. In my case I entered ‘WAC – Test Rig’ and click ok. This will create a new certificate for you. Click on the view button to see the certificate details. Do you see the Thumbprint, this is the value that will go in the config file (very important). Now click on the Copy to File button to copy the certificate, we will need to import the certificate to the windows Azure Management portal later. So, make sure you save it a safe location. Click Finish and enter details of the user you would like to create with permissions for remote desktop access, once you have entered the details on the ‘Remote desktop configuration’ screen click on Ok. From the Publish Windows Azure Wizard screen press Cancel. Cancel because we don’t want to publish the role just yet and Yes because we want to save all the changes in the config file. Now if you go to the ServiceDefinition.csdef file you will see that the RemoteAccess and RemoteForwarder roles have been imported for you. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="WindowsAzureProject2" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition"> <WorkerRole name="WorkerRole1" vmsize="Small"> <Imports> <Import moduleName="Diagnostics" /> <Import moduleName="Connect" /> <Import moduleName="RemoteAccess" /> <Import moduleName="RemoteForwarder" /> </Imports> </WorkerRole> </ServiceDefinition> Now go to the ServiceConfiguration.Cloud.cscfg file and you see a whole bunch for setting “Microsoft.WindowsAzure.Plugins.RemoteAccess.%%%” values added for you. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="WindowsAzureProject2" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="2" osVersion="*"> <Role name="WorkerRole1"> <Instances count="1" /> <ConfigurationSettings> <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="UseDevelopmentStorage=true" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.ActivationToken" value="45f55fea-f194-4fbc-b36e-25604faac784" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Refresh" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.WaitForConnectivity" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Upgrade" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.EnableDomainJoin" value="true" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainFQDN" value="play.pit.com" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainControllerFQDN" value="WIN-KUDQMQFGQOL.play.pit.com" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainAccountName" value="playpit\Administrator" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainPassword" value="************************" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainOU" value="OU=CloudInstances, DC=Play, DC=Pit, DC=com" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.Administrators" value="Playpit\Administrator" /> <Setting name="Microsoft.WindowsAzure.Plugins.Connect.DomainSiteName" value="" /> <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.Enabled" value="true" /> <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountUsername" value="Administrator" /> <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountEncryptedPassword" value="MIIBnQYJKoZIhvcNAQcDoIIBjjCCAYoCAQAxggFOMIIBSgIBADAyMB4xHDAaBgNVBAMME1dpbmRvd 3MgQXp1cmUgVG9vbHMCEGa+B46voeO5T305N7TSG9QwDQYJKoZIhvcNAQEBBQAEggEABg4ol5Xol66Ip6QKLbAPWdmD4ae ADZ7aKj6fg4D+ATr0DXBllZHG5Umwf+84Sj2nsPeCyrg3ZDQuxrfhSbdnJwuChKV6ukXdGjX0hlowJu/4dfH4jTJC7sBWS AKaEFU7CxvqYEAL1Hf9VPL5fW6HZVmq1z+qmm4ecGKSTOJ20Fptb463wcXgR8CWGa+1w9xqJ7UmmfGeGeCHQ4QGW0IDSBU6ccg vzF2ug8/FY60K1vrWaCYOhKkxD3YBs8U9X/kOB0yQm2Git0d5tFlIPCBT2AC57bgsAYncXfHvPesI0qs7VZyghk8LVa9g5IqaM Cp6cQ7rmY/dLsKBMkDcdBHuCTAzBgkqhkiG9w0BBwEwFAYIKoZIhvcNAwcECDRVifSXbA43gBApNrp40L1VTVZ1iGag+3O1" /> <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountExpiration" value="2012-11-27T23:59:59.0000000+00:00" /> <Setting name="Microsoft.WindowsAzure.Plugins.RemoteForwarder.Enabled" value="true" /> </ConfigurationSettings> <Certificates> <Certificate name="Microsoft.WindowsAzure.Plugins.RemoteAccess.PasswordEncryption" thumbprint="AA23016CF0BDFC344400B5B82706B608B92E4217" thumbprintAlgorithm="sha1" /> </Certificates> </Role> </ServiceConfiguration> Okay let’s look at them one at a time, Enabled - Yes, we would like to enable Remote Access. AccountUserName – This is the user name you entered while you were on the publish windows azure role screen, as detailed above. AccountEncrytedPassword – Try and decode that, the certificate is used to encrypt the password you specified for the user account. Remember earlier i said, either use the instructions or wait and i’ll be showing you encryption, now the user account i am using for rdp has the same password as my domain password, so i can simply copy the value of the AccountEncryptedPassword to the DomainPassword as well. AccountExpiration – This is the expiration as you specified in the wizard earlier, make sure your account does not expire today. Remote Forwarder – Check out the documentation, below is how I understand it, -- One role in an application that implements a remote desktop connection must import the RemoteForwarder module. The two modules work together to enable the remote desktop connections to role instances. -- If you have multiple roles defined in the service model, it does not matter which role you add the RemoteForwarder module to, but you must add it to only one of the role definitions. Certificate – Remember the certificate thumbprint from the wizard, the on premise machine and windows azure role machine that need to speak to each other must have the same thumbprint. More on that when we install Windows Azure connect Endpoints on the on premise machine. As i said earlier, in this blog post, I’ll be showing you the manual process so i won’t be scripting any star up tasks to install the test agent or register the test agent with the TFS Server. I’ll be showing you all this cool stuff in the next blog post, that’s because it’s important to understand the manual side of it, it becomes easier for you to troubleshoot in case something fails. Having said that, the changes we have made are sufficient to spin up the Windows Azure Worker Role aka Test Agent VM, have it connected with the play.pit.com domain and have remote access enabled on it. Before we deploy the Test Agent VM we need to set up Windows Azure Connect on the TFS Server. II. Windows Azure Connect: Setting up Connect on VM – 2 i.e. TFS & Test Controller Glad you made it so far, now to enable communication between the on premise TFS/Test Controller and Azure-ed Test Agent we need to enable communication. We have set up the Azure connect module in the Test Agent configuration, now the connect end points need to be enabled on the on premise machines, let’s have a look at how we can do this. Log on to VM – 2 running the TFS Server and Test Controller Log on to the Windows Azure Management Portal and click on Virtual Network Click on Virtual Network, if you already have a subscription you should see the below screen shot, if not, you would be asked to complete the subscription first Click on Install Local Endpoints from the top left on the panel and you get a url appended with a token id in it, remember the token i showed you earlier, in theory the token you get here should match the token you added to the Test Agent config file. Copy the url to the clip board and paste it in IE explorer (important, the installation at present only works out of IE and you need to have cookies enabled in order to complete the installation). As stated in the pop up, you can NOT download and run the software later, you need to run it as is, since it contains a token. Once the installation completes you should see the Windows Azure connect icon in the system tray. Right click the Azure Connect icon, choose Diagnostics and refer to this link for diagnostic detail terminology. NOTE – Unfortunately I could not see the Windows Azure connect icon in the system tray, a bit of binging with Google revealed that the azure connect icon is only shown when the ‘Windows Azure Connect Endpoint’ Service is started. So go to services.msc and make sure that the service is started, if not start it, unfortunately again, the service did not start for me on a manual start and i realised that one of the dependant services was disabled, you can look at the service dependencies and start them and then start windows azure connect. Bottom line, you need to start Windows Azure connect service before you can proceed. Please refer here on MSDN for more on Troubleshooting Windows Azure connect. (Follow the next step as well) Now go back to the Windows Azure Management Portal and from Groups and Roles create a new group, lets call it ‘Test Rig’. Make sure you add the VM – 2 (the TFS Server VM where you just installed the endpoint). Now if you go back to the Azure Connect icon in the system tray and click ‘Refresh Policy’ you will notice that the disconnected status of the icon should change to ready for connection. III. Importing Certificate in to Windows Azure Management Portal But before that you need to import the certificate you created in Step I in to the Windows Azure Management Portal. Log on to the Windows Azure Management Portal and click on ‘Hosted Services, Storage Accounts & CDN’ and then ‘Management Certificates’ followed by Add Certificates as shown in the screen shot below Browse to the location where you saved the certificate earlier, remember… Refer to Step I in case you forgot. Now you should be able to see the imported certificate here, make sure the thumbprint of the certificate matches the one you inserted in the config files IV. Publish Windows Azure Worker Role aka Test Agent Having completed I, II and III, you are ready to publish the Test Agent VM – 3 to the cloud. Go to Visual Studio and right click the Windows Azure project and select Publish. Verify the infomration in the wizard, from the advanced settings tab, you can also enabled capture of intellitrace or profiling information. Click Next and Click Publish! From the view menu bar select the Windows Azure Activity Log window. Now you should be able to see the deployment progress in real time. In the Windows Azure Management Portal, you should also be able to see the progress of creation of a new Worker Role. Once the deployment is complete you should be able to RDP (go to run prompt type mstsc and in the pop up the machine name) in to the Test Agent Worker Role VM from the Playpit network using the domain admin user account. In case you are unable to log in to the Test Agent using the domain admin user account it means the process of joining the Test Agent to the domain has failed! But the good news is, because you imported the connect module, you can connect to the Test Agent machine using Windows Azure Management Portal and troubleshoot the reason for failure, you will be able to log in with the user name and password you specified in the config file for the keys ‘RemoteAccess.AccountUsername, RemoteAccess.EncryptedPassword (just that enter the password unencrypted)’, fix it or manually join the machine to the domain. Once you have managed to Join the Test Agent VM to the Domain move to the next step. So, log in to the Test Agent Worker Role VM with the Playpit Domain Administrator and verify that you can log in, the machine is connected to the domain and the connect service is successfully running. If yes, give your self a pat on the back, you are 80% mission accomplished! Go to the Windows Azure Management Portal and click on Virtual Network, click on Groups and Roles and click on Test Rig, click Edit Group, the edit the Test Rig group you created earlier. In the Connect to section, click on Add to select the worker role you have just deployed. Also, check the ‘Allow connections between endpoints in the group’ with this you will enable to communication between test controller and test agents and test agents/test agents. Click Save. Now, you are ready to deploy the Test Agent software on the Worker Role Test Agent VM and configure it to work with the Test Controller. V. Configuring VM – 3: Installing Test Agent and Associating Test Agent to Controller Log in to the Worker Role Test Agent VM that you have just successfully deployed, make sure you log in with the domain administrator account. Download the All Agents software from MSDN, ‘en_visual_studio_agents_2010_x86_x64_dvd_509679.iso’, extract the iso and navigate to where you have extracted the iso. In my case, i have extracted the iso to “C:\Resources\Temp\VsAgentSetup”. Open the Test Agent folder and double click on setup.exe. Once you have installed the Test Agent you should reach the configuration window. If you face any issues installing TFS Test Agent on the VM, refer to the walkthrough on MSDN. Once you have successfully installed the Test Agent software you will need to configure the test agent. Right click the test agent configuration tool and run as a different user. i.e. an Administrator. This is really to run the configuration wizard with elevated privileges (you might have UAC block something's otherwise). In the run options, you can select ‘service’ you do not need to run the agent as interactive un less you are running coded UI tests. I have specified the domain administrator to connect to the TFS Test Controller. In real life, i would never do that, i would create a separate test user service account for this purpose. But for the blog post, we are using the most powerful user so that any policies or restrictions don’t block you. Click the Apply Settings button and you should be all green! If not, the summary usually gives helpful error messages that you can resolve and proceed. As per my experience, you may run in to either a permission or a firewall blocking communication issue. And now the moment of truth! Go to VM –2 open up Visual Studio and from the Test Menu select Manage Test Controller Mission Accomplished! You should be able to see the Test Agent that you have just configured here, VI. Creating and Running Load Tests on your brand new Azure-ed Test Rig I have various blog posts on Performance Testing with Visual Studio Ultimate, you can follow the links and videos below, Blog Posts: - Part 1 – Performance Testing using Visual Studio 2010 Ultimate - Part 2 – Performance Testing using Visual Studio 2010 Ultimate - Part 3 – Performance Testing using Visual Studio 2010 Ultimate Videos: - Test Tools Configuration & Settings in Visual Studio - Why & How to Record Web Performance Tests in Visual Studio Ultimate - Goal Driven Load Testing using Visual Studio Ultimate Now that you have created your load tests, there is one last change you need to make before you can run the tests on your Azure Test Rig, create a new Test settings file, and change the Test Execution method to ‘Remote Execution’ and select the test controller you have configured the Worker Role Test Agent against in our case VM – 2 So, go on, fire off a test run and see the results of the test being executed on the Azur-ed Test Rig. Review and What’s next? A quick recap of the benefits of running the Test Rig in the cloud and what i will be covering in the next blog post AND I would love to hear your feedback! Advantages Utilizing the power of Azure compute to run a heavy virtual user load. Benefiting from the Azure flexibility, destroy Test Agents when not in use, takes < 25 minutes to spin up a new Test Agent. Most important test Network Latency, (network latency and speed of connection are two different things – usually network latency is very hard to test), by placing the Test Agents in Microsoft Data centres around the globe, one can actually test the lag in transferring the bytes not because of a slow connection but because the page has been requested from the other side of the globe. Next Steps The process of spinning up the Test Agents in windows Azure is not 100% automated. I am working on the Worker process and power shell scripts to make the role deployment, unattended install of test agent software and registration of the test agent to the test controller automated. In the next blog post I will show you how to make the complete process unattended and automated. Remember to subscribe to http://feeds.feedburner.com/TarunArora. Hope you enjoyed this post, I would love to hear your feedback! If you have any recommendations on things that I should consider or any questions or feedback, feel free to leave a comment. See you in Part III. Share this post : CodeProject

Read the article
How to find an entry-level job after you already have a graduate degree?

- by Uri

Note: I asked this question in early 2009. A couple of months later, I found a great job. I've previously updated this question with some tips for whoever ends up in a similar situation, and now cleaned it up a little for the benefit of the fresh batch of graduates. Original post: In my early 20s I abandoned a great C++ development career path in a major company to go to graduate school and get a research masters (3 years). I did another year in industrial research, and then moved to the US to attend graduate school again, getting another masters and a Ph.D in software engineering from a top school (another 6 years down the drain). I was coding the whole way throughout my degrees (core Java and Eclipse plug-ins) and working on research related to software engineering (usability of APIs). I ended up graduating the year of the recession, with a son on the way and the prospects of no healthcare. Academic jobs and industrial research jobs are quite scarce. Initially, I was naive, thinking that with my background, I could easily find a coding job. Big mistake. It turns out that I'm in a complicated position. Entry level positions are usually offered to college undergraduates. I attended my school's career fairs, but you could immediately see signs of Ph.D. aversion and overqualification issues. Some of the recruiters I spoke with explicitly told me that they wanted 20 year olds with clean slates, and some were looking for interns since they are in various forms of hiring freezes. I managed to get a couple of interviews from these career fairs and through recruiters. However, since I've been out of school for a long time and programming primarily in Java, I am also no longer proficient in C/C++ and the usual range of college-level interview questions that everyone uses. I had no problems with this when I was 19 and interviewing for my first job since a lot of what you do in C is manipulate pointers and I was coding C++ for fun and for school. Later I was routinely doing pointer manipulation on the job, and during my first masters taught college courses with data structures and C++. But even though I remember many properties of C++ well, it's been close to ten years since I regularly used C++ and pointers. As a Java developer I rarely had to work at this level, but experience in OOD and in writing good maintainable code is meaningless for C++ interviews. Reading books as a refresh and looking at sample code did not do the trick. I also looked at mid-to-senior level Java positions, but most of them focused on J2EE APIs rather than on core Java and required a certain number of years in industrial positions. Coding research tools and prior C++ experience doesn't count. So that sends me back to entry-level jobs that are posted through job-boards, and these are not common (mostly they are Monster junk), and small companies are even less likely to answer a Ph.D. compared to the giants who participate in top-10 career fairs. Even worse, in many companies initial screening is done by HR folks who really don't want to deal with anything anomalous like a Ph.D. Any tips on how I should approach this intractable position? For example, what should I write in cover letters? Note that while immigration is not an issue for me, I cannot go freelance as I need the benefits (and in particular group health insurance). During my studies I had no time to contribute to open-source projects or maintain a popular blog, so even if I invested in that now there would be no immediate benefit. Updates: In the two months after posting this I received several offers to work as a core Java developer in the financial industry and accepted one from a firm where I am working to this day. For those who find themselves in similar situations, here are my tips: Give up on trying to find an entry level positions. You can't undo time. Accept the fact that there is Ph.D. discrimination in the job market (some might say rightfully so). It is legal to discriminate based on education. No point fighting it. The most important tip is to focus on the language you are comfortable with. The sad truth about programming in a particular language is that it is not like riding a bike. If you haven't used a language in the last few years, and can't actually apply it routinely (not just as a refresher) before you start your search, it is going to be very difficult to do well in an interview. Now that I'm interviewing others, I routinely see it in folks with a mixed C++/Java background. We maintain "a shadow" of the old language but end up with a weird mix that makes it hard to interview on either. Entry-level folks are at an advantage here since they usually have one language. Memory can help you do great in a screening interview, but without recent day-to-day experience, code tests will be difficult. Despite the supposed relation, core Java programming and J2EE programming are two different things with different skillsets. If you come from academia, you likely have very little J2EE experience and may find it hard to get accepted for a J2EE job. J2EE jobs seem to have a larger list of acronyms in their requirements. In addition, from interviewing J2EE developers it seems that for many there is a focus on mastering specific APIs and architectures, whereas core Java development tends to be secondary. In the same way that I can no longer manipulate pointers well, a J2EE developer may have difficulties doing low level Java manipulation. This puts you at a relative advantage in competing for core Java jobs! If you are able to work for startups (in terms of family life and stability) or migrate to startup-rich areas such as the west coast, you can find many exciting opportunities where advanced degrees are a benefit. I've since been approached by several startups, although I had to decline. Work through a recruiter if possible. They have direct contacts with the hiring parties, allowing you to "stand out". It is better to get a clear yes/no confirmation from a recruiter on whether a company might be interested in interviewing you, than it is to send your resume and hope that someone will ever see it. Recruiters are also a great way of bypassing HR. However, also beware of recruiters. They have a vested interest and will go to various shady practices and pressure tactics. To find a good recruiter, talk to a friend who declined a job offer he got through a recruiter. A good recruiter, to me, is measured in how they handle that. Interview for the jobs that require your core strength. If you're rusty or entirely unfamiliar with a technology around which the job revolves, you're probably not a good match. Yes, you probably have the talent to master them, but most companies would want "instant gratification". I got my offers from companies that wanted core Java developer. I didn't do well on places that wanted advance C++ because I am too rusty and not up to date on recent libraries. I also didn't hear from companies that wanted lots of J2EE experience, and that's ok. Finding companies that want core Java without web is harder, but exists in specific industries (e.g., finance, defense). This requires a lot more legwork in terms of search, but these jobs do exist. There are different interview styles. Some companies focus on puzzles, some companies focus on algorithms, and some companies focus on design and coding skills. I had the most success in places where the questions were the most related to the function I would have been performing. Pick companies accordingly as well.

Read the article
Avoid Jquery Plugin Conflict

- by user1511579

on the same page i'm using this plugin: $g=jQuery.noConflict(); $g(function() { /* number of fieldsets */ var fieldsetCount = $g('#formElem').children().length; /* current position of fieldset / navigation link */ var current = 1; /* sum and save the widths of each one of the fieldsets set the final sum as the total width of the steps element */ var stepsWidth = 0; var widths = new Array(); $g('#steps .step').each(function(i){ var $step = $g(this); widths[i] = stepsWidth; stepsWidth += $step.width(); }); $g('#steps').width(stepsWidth); /* to avoid problems in IE, focus the first input of the form */ $g('#formElem').children(':first').find(':input:first').focus(); /* show the navigation bar */ $g('#navigation_form').show(); /* when clicking on a navigation link the form slides to the corresponding fieldset */ $g('#navigation_form a').bind('click',function(e){ var $this = $g(this); var prev = current; $this.closest('ul').find('li').removeClass('selected'); $this.parent().addClass('selected'); /* we store the position of the link in the current variable */ current = $this.parent().index() + 1; /* animate / slide to the next or to the corresponding fieldset. The order of the links in the navigation is the order of the fieldsets. Also, after sliding, we trigger the focus on the first input element of the new fieldset If we clicked on the last link (confirmation), then we validate all the fieldsets, otherwise we validate the previous one before the form slided */ $g('#steps').stop().animate({ marginLeft: '-' + widths[current-1] + 'px' },500,function(){ if(current == fieldsetCount) validateSteps(); else validateStep(prev); $g('#formElem').children(':nth-child('+ parseInt(current) +')').find(':input:first').focus(); }); e.preventDefault(); }); /* clicking on the tab (on the last input of each fieldset), makes the form slide to the next step */ $g('#formElem > fieldset').each(function(){ var $fieldset = $g(this); $fieldset.children(':last').find(':input').keydown(function(e){ if (e.which == 9){ $g('#navigation_form li:nth-child(' + (parseInt(current)+1) + ') a').click(); /* force the blur for validation */ $g(this).blur(); e.preventDefault(); } }); }); /* validates errors on all the fieldsets records if the Form has errors in $('#formElem').data() */ function validateSteps(){ var FormErrors = false; for(var i = 1; i < fieldsetCount; ++i){ var error = validateStep(i); if(error == -1) FormErrors = true; } $g('#formElem').data('errors',FormErrors); } /* validates one fieldset and returns -1 if errors found, or 1 if not */ function validateStep(step){ if(step == fieldsetCount) return; var error = 1; var hasError = false; $g('#formElem').children(':nth-child('+ parseInt(step) +')').find(':input:not(button)').each(function(){ var $this = $g(this); var valueLength = jQuery.trim($this.val()).length; if(valueLength == ''){ hasError = true; $this.css('background-color','#FFEDEF'); } else $this.css('background-color','#FFFFFF'); }); var $link = $g('#navigation_form li:nth-child(' + parseInt(step) + ') a'); $link.parent().find('.error,.checked').remove(); var valclass = 'checked'; if(hasError){ error = -1; valclass = 'error'; } $g('<span class="'+valclass+'"></span>').insertAfter($link); return error; } /* if there are errors don't allow the user to submit */ $g('#registerButton').bind('click',function(){ if($g('#formElem').data('errors')){ alert('Please correct the errors in the Form'); return false; } }); }); and this one: (function($){ $countCursos = 1; $countFormsA = 1; $countFormsB = 1; $.fn.addForms = function(idform){ var adicionar_curso = "<p>"+ " <label for='nome_curso'>Nome do Curso</label>"+ " <input id='nome_curso' name='nome_curso["+$countCursos+"]' type='text' />"+ " </p>"; var myform2 = "<table>"+ " <tr>"+ " <td>Field C</td>"+ " <td><input type='text' name='fieldc["+$countFormsA+"]'></td>"+ " <td>Field D ("+$countFormsA+"):</td>"+ " <td><textarea name='fieldd["+$countFormsA+"]'></textarea></td>"+ " <td><button>remove</button></td>"+ " </tr>"+ "</table>"; var myform3 = "<table>"+ " <tr>"+ " <td>Field C</td>"+ " <td><input type='text' name='fieldc["+$countFormsB+"]'></td>"+ " <td>Field D ("+$countFormsB+"):</td>"+ " <td><textarea name='fieldd["+$countFormsB+"]'></textarea></td>"+ " <td><button>remove</button></td>"+ " </tr>"+ "</table>"; if(idform=='novo_curso'){ alert(idform); adicionar_curso = $("<div>"+adicionar_curso+"</div>"); $("button", $(adicionar_curso)).click(function(){ $(this).parent().parent().remove(); }); $(this).append(adicionar_curso); $countCursos++; } if(idform=='mybutton1'){ alert(idform); myform2 = $("<div>"+myform2+"</div>"); $("button", $(myform2)).click(function(){ $(this).parent().parent().remove(); }); $(this).append(myform2); $countFormsA++; } if(idform=='mybutton2'){ alert(idform); myform3 = $("<div>"+myform3+"</div>"); $("button", $(myform3)).click(function(){ $(this).parent().parent().remove(); }); $(this).append(myform3); $countFormsB++; } }; })(jQuery); $(function(){ $("#mybutton1").bind("click", function(e){ e.preventDefault(); var idform=this.id; if($countFormsA<3){ $("#container1").addForms(idform); } }); }); $(function(){ $("#novo_curso").bind("click", function(e){ e.preventDefault(); var idform=this.id; alert(idform); if($countCursos<3){ $("#outro_curso").addForms(idform); } }); }); $(function(){ $("#mybutton2").bind("click", function(e){ e.preventDefault(); var idform=this.id; if($countFormsB<3){ $("#container2").addForms(idform); } }); }); My problem is the two are making conflict: I added previously the $g on the first to avoid conflict, but the truth is they don't work together, any hint how can i configure the second one to avoid this? Thanks in advance!

Read the article
Sendmail Failing to Forward Locally Addressed Mail to Exchange Server

- by DomainSoil

I've recently gained employment as a web developer with a small company. What they neglected to tell me upon hire was that I would be administrating the server along with my other daily duties. Now, truth be told, I'm not clueless when it comes to these things, but this is my first rodeo working with a rack server/console.. However, I'm confident that I will be able to work through any solutions you provide. Short Description: When a customer places an order via our (Magento CE 1.8.1.0) website, a copy of said order is supposed to be BCC'd to our sales manager. I say supposed because this was a working feature before the old administrator left. Long Description: Shortly after I started, we had a server crash which required a server restart. After restart, we noticed a few features on our site weren't working, but all those have been cleaned up except this one. I had to create an account on our server for root access. When a customer places an order, our sites software (Magento CE 1.8.1.0) is configured to BCC the customers order email to our sales manager. We use a Microsoft Exchange 2007 Server for our mail, which is hosted on a different machine (in-house) that I don't have access to ATM, but I'm sure I could if needed. As far as I can tell, all other external emails work.. Only INTERNAL email addresses fail to deliver. I know this because I've also tested my own internal address via our website. I set up an account with an internal email, made a test order, and never received the email. I changed my email for the account to an external GMail account, and received emails as expected. Let's dive into the logs and config's. For privacy/security reasons, names have been changed to the following: domain.com = Our Top Level Domain. email.local = Our Exchange Server. example.com = ANY other TLD. OLDadmin = Our previous Server Administrator. NEWadmin = Me. SALES@ = Our Sales Manager. Customer# = A Customer. Here's a list of the programs and config files used that hold relevant for this issue: Server: > [root@www ~]# cat /etc/centos-release CentOS release 6.3 (final) Sendmail: > [root@www ~]# sendmail -d0.1 -bt < /dev/null Version 8.14.4 ========SYSTEM IDENTITY (after readcf)======== (short domain name) $w = domain (canonical domain name) $j = domain.com (subdomain name) $m = com (node name) $k = www.domain.com > [root@www ~]# rpm -qa | grep -i sendmail sendmail-cf-8.14.4-8.e16.noarch sendmail-8.14-4-8.e16.x86_64 nslookup: > [root@www ~]# nslookup email.local Name: email.local Address: 192.168.1.50 hostname: > [root@www ~]# hostname www.domain.com /etc/mail/access: > [root@www ~]# vi /etc/mail/access Connect:localhost.localdomain RELAY Connect:localhost RELAY Connect:127.0.0.1 RELAY /etc/mail/domaintable: > [root@www ~]# vi /etc/mail/domaintable # /etc/mail/local-host-names: > [root@www ~]# vi /etc/mail/local-host-names # /etc/mail/mailertable: > [root@www ~]# vi /etc/mail/mailertable # /etc/mail/sendmail.cf: > [root@www ~]# vi /etc/mail/sendmail.cf ###################################################################### ##### ##### DO NOT EDIT THIS FILE! Only edit the source .mc file. ##### ###################################################################### ###################################################################### ##### $Id: cfhead.m4,v 8.120 2009/01/23 22:39:21 ca Exp $ ##### ##### $Id: cf.m4,v 8.32 1999/02/07 07:26:14 gshapiro Exp $ ##### ##### setup for linux ##### ##### $Id: linux.m4,v 8.13 2000/09/17 17:30:00 gshapiro Exp $ ##### ##### $Id: local_procmail.m4,v 8.22 2002/11/17 04:24:19 ca Exp $ ##### ##### $Id: no_default_msa.m4,v 8.2 2001/02/14 05:03:22 gshapiro Exp $ ##### ##### $Id: smrsh.m4,v 8.14 1999/11/18 05:06:23 ca Exp $ ##### ##### $Id: mailertable.m4,v 8.25 2002/06/27 23:23:57 gshapiro Exp $ ##### ##### $Id: virtusertable.m4,v 8.23 2002/06/27 23:23:57 gshapiro Exp $ ##### ##### $Id: redirect.m4,v 8.15 1999/08/06 01:47:36 gshapiro Exp $ ##### ##### $Id: always_add_domain.m4,v 8.11 2000/09/12 22:00:53 ca Exp $ ##### ##### $Id: use_cw_file.m4,v 8.11 2001/08/26 20:58:57 gshapiro Exp $ ##### ##### $Id: use_ct_file.m4,v 8.11 2001/08/26 20:58:57 gshapiro Exp $ ##### ##### $Id: local_procmail.m4,v 8.22 2002/11/17 04:24:19 ca Exp $ ##### ##### $Id: access_db.m4,v 8.27 2006/07/06 21:10:10 ca Exp $ ##### ##### $Id: blacklist_recipients.m4,v 8.13 1999/04/02 02:25:13 gshapiro Exp $ ##### ##### $Id: accept_unresolvable_domains.m4,v 8.10 1999/02/07 07:26:07 gshapiro Exp $ ##### ##### $Id: masquerade_envelope.m4,v 8.9 1999/02/07 07:26:10 gshapiro Exp $ ##### ##### $Id: masquerade_entire_domain.m4,v 8.9 1999/02/07 07:26:10 gshapiro Exp $ ##### ##### $Id: proto.m4,v 8.741 2009/12/11 00:04:53 ca Exp $ ##### # level 10 config file format V10/Berkeley # override file safeties - setting this option compromises system security, # addressing the actual file configuration problem is preferred # need to set this before any file actions are encountered in the cf file #O DontBlameSendmail=safe # default LDAP map specification # need to set this now before any LDAP maps are defined #O LDAPDefaultSpec=-h localhost ################## # local info # ################## # my LDAP cluster # need to set this before any LDAP lookups are done (including classes) #D{sendmailMTACluster}$m Cwlocalhost # file containing names of hosts for which we receive email Fw/etc/mail/local-host-names # my official domain name # ... define this only if sendmail cannot automatically determine your domain #Dj$w.Foo.COM # host/domain names ending with a token in class P are canonical CP. # "Smart" relay host (may be null) DSemail.local # operators that cannot be in local usernames (i.e., network indicators) CO @ % ! # a class with just dot (for identifying canonical names) C.. # a class with just a left bracket (for identifying domain literals) C[[ # access_db acceptance class C{Accept}OK RELAY C{ResOk}OKR # Hosts for which relaying is permitted ($=R) FR-o /etc/mail/relay-domains # arithmetic map Karith arith # macro storage map Kmacro macro # possible values for TLS_connection in access map C{Tls}VERIFY ENCR # who I send unqualified names to if FEATURE(stickyhost) is used # (null means deliver locally) DRemail.local. # who gets all local email traffic # ($R has precedence for unqualified names if FEATURE(stickyhost) is used) DHemail.local. # dequoting map Kdequote dequote # class E: names that should be exposed as from this host, even if we masquerade # class L: names that should be delivered locally, even if we have a relay # class M: domains that should be converted to $M # class N: domains that should not be converted to $M #CL root C{E}root C{w}localhost.localdomain C{M}domain.com # who I masquerade as (null for no masquerading) (see also $=M) DMdomain.com # my name for error messages DnMAILER-DAEMON # Mailer table (overriding domains) Kmailertable hash -o /etc/mail/mailertable.db # Virtual user table (maps incoming users) Kvirtuser hash -o /etc/mail/virtusertable.db CPREDIRECT # Access list database (for spam stomping) Kaccess hash -T<TMPF> -o /etc/mail/access.db # Configuration version number DZ8.14.4 /etc/mail/sendmail.mc: > [root@www ~]# vi /etc/mail/sendmail.mc divert(-1)dnl dnl # dnl # This is the sendmail macro config file for m4. If you make changes to dnl # /etc/mail/sendmail.mc, you will need to regenerate the dnl # /etc/mail/sendmail.cf file by confirming that the sendmail-cf package is dnl # installed and then performing a dnl # dnl # /etc/mail/make dnl # include(`/usr/share/sendmail-cf/m4/cf.m4')dnl VERSIONID(`setup for linux')dnl OSTYPE(`linux')dnl dnl # dnl # Do not advertize sendmail version. dnl # dnl define(`confSMTP_LOGIN_MSG', `$j Sendmail; $b')dnl dnl # dnl # default logging level is 9, you might want to set it higher to dnl # debug the configuration dnl # dnl define(`confLOG_LEVEL', `9')dnl dnl # dnl # Uncomment and edit the following line if your outgoing mail needs to dnl # be sent out through an external mail server: dnl # define(`SMART_HOST', `email.local')dnl dnl # define(`confDEF_USER_ID', ``8:12'')dnl dnl define(`confAUTO_REBUILD')dnl define(`confTO_CONNECT', `1m')dnl define(`confTRY_NULL_MX_LIST', `True')dnl define(`confDONT_PROBE_INTERFACES', `True')dnl define(`PROCMAIL_MAILER_PATH', `/usr/bin/procmail')dnl define(`ALIAS_FILE', `/etc/aliases')dnl define(`STATUS_FILE', `/var/log/mail/statistics')dnl define(`UUCP_MAILER_MAX', `2000000')dnl define(`confUSERDB_SPEC', `/etc/mail/userdb.db')dnl define(`confPRIVACY_FLAGS', `authwarnings,novrfy,noexpn,restrictqrun')dnl define(`confAUTH_OPTIONS', `A')dnl dnl # dnl # The following allows relaying if the user authenticates, and disallows dnl # plaintext authentication (PLAIN/LOGIN) on non-TLS links dnl # dnl define(`confAUTH_OPTIONS', `A p')dnl dnl # dnl # PLAIN is the preferred plaintext authentication method and used by dnl # Mozilla Mail and Evolution, though Outlook Express and other MUAs do dnl # use LOGIN. Other mechanisms should be used if the connection is not dnl # guaranteed secure. dnl # Please remember that saslauthd needs to be running for AUTH. dnl # dnl TRUST_AUTH_MECH(`EXTERNAL DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl dnl define(`confAUTH_MECHANISMS', `EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl dnl # dnl # Rudimentary information on creating certificates for sendmail TLS: dnl # cd /etc/pki/tls/certs; make sendmail.pem dnl # Complete usage: dnl # make -C /etc/pki/tls/certs usage dnl # dnl define(`confCACERT_PATH', `/etc/pki/tls/certs')dnl dnl define(`confCACERT', `/etc/pki/tls/certs/ca-bundle.crt')dnl dnl define(`confSERVER_CERT', `/etc/pki/tls/certs/sendmail.pem')dnl dnl define(`confSERVER_KEY', `/etc/pki/tls/certs/sendmail.pem')dnl dnl # dnl # This allows sendmail to use a keyfile that is shared with OpenLDAP's dnl # slapd, which requires the file to be readble by group ldap dnl # dnl define(`confDONT_BLAME_SENDMAIL', `groupreadablekeyfile')dnl dnl # dnl define(`confTO_QUEUEWARN', `4h')dnl dnl define(`confTO_QUEUERETURN', `5d')dnl dnl define(`confQUEUE_LA', `12')dnl dnl define(`confREFUSE_LA', `18')dnl define(`confTO_IDENT', `0')dnl dnl FEATURE(delay_checks)dnl FEATURE(`no_default_msa', `dnl')dnl FEATURE(`smrsh', `/usr/sbin/smrsh')dnl FEATURE(`mailertable', `hash -o /etc/mail/mailertable.db')dnl FEATURE(`virtusertable', `hash -o /etc/mail/virtusertable.db')dnl FEATURE(redirect)dnl FEATURE(always_add_domain)dnl FEATURE(use_cw_file)dnl FEATURE(use_ct_file)dnl dnl # dnl # The following limits the number of processes sendmail can fork to accept dnl # incoming messages or process its message queues to 20.) sendmail refuses dnl # to accept connections once it has reached its quota of child processes. dnl # dnl define(`confMAX_DAEMON_CHILDREN', `20')dnl dnl # dnl # Limits the number of new connections per second. This caps the overhead dnl # incurred due to forking new sendmail processes. May be useful against dnl # DoS attacks or barrages of spam. (As mentioned below, a per-IP address dnl # limit would be useful but is not available as an option at this writing.) dnl # dnl define(`confCONNECTION_RATE_THROTTLE', `3')dnl dnl # dnl # The -t option will retry delivery if e.g. the user runs over his quota. dnl # FEATURE(local_procmail, `', `procmail -t -Y -a $h -d $u')dnl FEATURE(`access_db', `hash -T<TMPF> -o /etc/mail/access.db')dnl FEATURE(`blacklist_recipients')dnl EXPOSED_USER(`root')dnl dnl # dnl # For using Cyrus-IMAPd as POP3/IMAP server through LMTP delivery uncomment dnl # the following 2 definitions and activate below in the MAILER section the dnl # cyrusv2 mailer. dnl # dnl define(`confLOCAL_MAILER', `cyrusv2')dnl dnl define(`CYRUSV2_MAILER_ARGS', `FILE /var/lib/imap/socket/lmtp')dnl dnl # dnl # The following causes sendmail to only listen on the IPv4 loopback address dnl # 127.0.0.1 and not on any other network devices. Remove the loopback dnl # address restriction to accept email from the internet or intranet. dnl # DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA')dnl dnl # dnl # The following causes sendmail to additionally listen to port 587 for dnl # mail from MUAs that authenticate. Roaming users who can't reach their dnl # preferred sendmail daemon due to port 25 being blocked or redirected find dnl # this useful. dnl # dnl DAEMON_OPTIONS(`Port=submission, Name=MSA, M=Ea')dnl dnl # dnl # The following causes sendmail to additionally listen to port 465, but dnl # starting immediately in TLS mode upon connecting. Port 25 or 587 followed dnl # by STARTTLS is preferred, but roaming clients using Outlook Express can't dnl # do STARTTLS on ports other than 25. Mozilla Mail can ONLY use STARTTLS dnl # and doesn't support the deprecated smtps; Evolution <1.1.1 uses smtps dnl # when SSL is enabled-- STARTTLS support is available in version 1.1.1. dnl # dnl # For this to work your OpenSSL certificates must be configured. dnl # dnl DAEMON_OPTIONS(`Port=smtps, Name=TLSMTA, M=s')dnl dnl # dnl # The following causes sendmail to additionally listen on the IPv6 loopback dnl # device. Remove the loopback address restriction listen to the network. dnl # dnl DAEMON_OPTIONS(`port=smtp,Addr=::1, Name=MTA-v6, Family=inet6')dnl dnl # dnl # enable both ipv6 and ipv4 in sendmail: dnl # dnl DAEMON_OPTIONS(`Name=MTA-v4, Family=inet, Name=MTA-v6, Family=inet6') dnl # dnl # We strongly recommend not accepting unresolvable domains if you want to dnl # protect yourself from spam. However, the laptop and users on computers dnl # that do not have 24x7 DNS do need this. dnl # FEATURE(`accept_unresolvable_domains')dnl dnl # dnl FEATURE(`relay_based_on_MX')dnl dnl # dnl # Also accept email sent to "localhost.localdomain" as local email. dnl # LOCAL_DOMAIN(`localhost.localdomain')dnl dnl # dnl # The following example makes mail from this host and any additional dnl # specified domains appear to be sent from mydomain.com dnl # MASQUERADE_AS(`domain.com')dnl dnl # dnl # masquerade not just the headers, but the envelope as well dnl FEATURE(masquerade_envelope)dnl dnl # dnl # masquerade not just @mydomainalias.com, but @*.mydomainalias.com as well dnl # FEATURE(masquerade_entire_domain)dnl dnl # MASQUERADE_DOMAIN(domain.com)dnl dnl MASQUERADE_DOMAIN(localhost.localdomain)dnl dnl MASQUERADE_DOMAIN(mydomainalias.com)dnl dnl MASQUERADE_DOMAIN(mydomain.lan)dnl MAILER(smtp)dnl MAILER(procmail)dnl dnl MAILER(cyrusv2)dnl /etc/mail/trusted-users: > [root@www ~]# vi /etc/mail/trusted-users # /etc/mail/virtusertable: > [root@www ~]# vi /etc/mail/virtusertable [email protected] [email protected] [email protected] [email protected] /etc/hosts: > [root@www ~]# vi /etc/hosts 127.0.0.1 localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 192.168.1.50 email.local I've only included the "local info" part of sendmail.cf, to save space. If there are any files that I've missed, please advise so I may produce them. Now that that's out of the way, lets look at some entries from /var/log/maillog. The first entry is from an order BEFORE the crash, when the site was working as expected. ##Order 200005374 Aug 5, 2014 7:06:38 AM## Aug 5 07:06:39 www sendmail[26149]: s75C6dqB026149: from=OLDadmin, size=11091, class=0, nrcpts=2, msgid=<[email protected]>, relay=OLDadmin@localhost Aug 5 07:06:39 www sendmail[26150]: s75C6dXe026150: from=<[email protected]>, size=11257, class=0, nrcpts=2, msgid=<[email protected]>, proto=ESMTP, daemon=MTA, relay=localhost.localdomain [127.0.0.1] Aug 5 07:06:39 www sendmail[26149]: s75C6dqB026149: [email protected],=?utf-8?B?dGhvbWFzICBHaWxsZXNwaWU=?= <[email protected]>, ctladdr=OLDadmin (501/501), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=71091, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s75C6dXe026150 Message accepted for delivery) Aug 5 07:06:40 www sendmail[26152]: s75C6dXe026150: to=<[email protected]>,<[email protected]>, delay=00:00:01, xdelay=00:00:01, mailer=relay, pri=161257, relay=email.local. [192.168.1.50], dsn=2.0.0, stat=Sent ( <[email protected]> Queued mail for delivery) This next entry from maillog is from an order AFTER the crash. ##Order 200005375 Aug 5, 2014 9:45:25 AM## Aug 5 09:45:26 www sendmail[30021]: s75EjQ4O030021: from=OLDadmin, size=11344, class=0, nrcpts=2, msgid=<[email protected]>, relay=OLDadmin@localhost Aug 5 09:45:26 www sendmail[30022]: s75EjQm1030022: <[email protected]>... User unknown Aug 5 09:45:26 www sendmail[30021]: s75EjQ4O030021: [email protected], ctladdr=OLDadmin (501/501), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=71344, relay=[127.0.0.1] [127.0.0.1], dsn=5.1.1, stat=User unknown Aug 5 09:45:26 www sendmail[30022]: s75EjQm1030022: from=<[email protected]>, size=11500, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA, relay=localhost.localdomain [127.0.0.1] Aug 5 09:45:26 www sendmail[30021]: s75EjQ4O030021: to==?utf-8?B?S2VubmV0aCBCaWViZXI=?= <[email protected]>, ctladdr=OLDadmin (501/501), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=71344, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s75EjQm1030022 Message accepted for delivery) Aug 5 09:45:26 www sendmail[30021]: s75EjQ4O030021: s75EjQ4P030021: DSN: User unknown Aug 5 09:45:26 www sendmail[30022]: s75EjQm3030022: <[email protected]>... User unknown Aug 5 09:45:26 www sendmail[30021]: s75EjQ4P030021: to=OLDadmin, delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=42368, relay=[127.0.0.1] [127.0.0.1], dsn=5.1.1, stat=User unknown Aug 5 09:45:26 www sendmail[30022]: s75EjQm3030022: from=<>, size=12368, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=localhost.localdomain [127.0.0.1] Aug 5 09:45:26 www sendmail[30021]: s75EjQ4P030021: s75EjQ4Q030021: return to sender: User unknown Aug 5 09:45:26 www sendmail[30022]: s75EjQm5030022: from=<>, size=14845, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA, relay=localhost.localdomain [127.0.0.1] Aug 5 09:45:26 www sendmail[30021]: s75EjQ4Q030021: to=postmaster, delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=43392, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s75EjQm5030022 Message accepted for delivery) Aug 5 09:45:26 www sendmail[30025]: s75EjQm5030022: to=root, delay=00:00:00, xdelay=00:00:00, mailer=local, pri=45053, dsn=2.0.0, stat=Sent Aug 5 09:45:27 www sendmail[30024]: s75EjQm1030022: to=<[email protected]>, delay=00:00:01, xdelay=00:00:01, mailer=relay, pri=131500, relay=email.local. [192.168.1.50], dsn=2.0.0, stat=Sent ( <[email protected]> Queued mail for delivery) To add a little more, I think I've pinpointed the actual crash event. ##THE CRASH## Aug 5 09:39:46 www sendmail[3251]: restarting /usr/sbin/sendmail due to signal Aug 5 09:39:46 www sm-msp-queue[3260]: restarting /usr/sbin/sendmail due to signal Aug 5 09:39:46 www sm-msp-queue[29370]: starting daemon (8.14.4): queueing@01:00:00 Aug 5 09:39:47 www sendmail[29372]: starting daemon (8.14.4): SMTP+queueing@01:00:00 Aug 5 09:40:02 www sendmail[29465]: s75Ee2vT029465: Authentication-Warning: www.domain.com: OLDadmin set sender to root using -f Aug 5 09:40:02 www sendmail[29464]: s75Ee2IF029464: Authentication-Warning: www.domain.com: OLDadmin set sender to root using -f Aug 5 09:40:02 www sendmail[29465]: s75Ee2vT029465: from=root, size=1426, class=0, nrcpts=1, msgid=<[email protected]>, relay=OLDadmin@localhost Aug 5 09:40:02 www sendmail[29464]: s75Ee2IF029464: from=root, size=1426, class=0, nrcpts=1, msgid=<[email protected]>, relay=OLDadmin@localhost Aug 5 09:40:02 www sendmail[29466]: s75Ee23t029466: from=<[email protected]>, size=1784, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA, relay=localhost.localdomain [127.0.0.1] Aug 5 09:40:02 www sendmail[29466]: s75Ee23t029466: to=<[email protected]>, delay=00:00:00, mailer=local, pri=31784, dsn=4.4.3, stat=queued Aug 5 09:40:02 www sendmail[29467]: s75Ee2wh029467: from=<[email protected]>, size=1784, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA, relay=localhost.localdomain [127.0.0.1] Aug 5 09:40:02 www sendmail[29467]: s75Ee2wh029467: to=<[email protected]>, delay=00:00:00, mailer=local, pri=31784, dsn=4.4.3, stat=queued Aug 5 09:40:02 www sendmail[29464]: s75Ee2IF029464: to=OLDadmin, ctladdr=root (0/0), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=31426, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s75Ee23t029466 Message accepted for delivery) Aug 5 09:40:02 www sendmail[29465]: s75Ee2vT029465: to=OLDadmin, ctladdr=root (0/0), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=31426, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s75Ee2wh029467 Message accepted for delivery) Aug 5 09:40:06 www sm-msp-queue[29370]: restarting /usr/sbin/sendmail due to signal Aug 5 09:40:06 www sendmail[29372]: restarting /usr/sbin/sendmail due to signal Aug 5 09:40:06 www sm-msp-queue[29888]: starting daemon (8.14.4): queueing@01:00:00 Aug 5 09:40:06 www sendmail[29890]: starting daemon (8.14.4): SMTP+queueing@01:00:00 Aug 5 09:40:06 www sendmail[29891]: s75Ee23t029466: to=<[email protected]>, delay=00:00:04, mailer=local, pri=121784, dsn=5.1.1, stat=User unknown Aug 5 09:40:06 www sendmail[29891]: s75Ee23t029466: s75Ee6xY029891: DSN: User unknown Aug 5 09:40:06 www sendmail[29891]: s75Ee6xY029891: to=<[email protected]>, delay=00:00:00, xdelay=00:00:00, mailer=local, pri=33035, dsn=2.0.0, stat=Sent Aug 5 09:40:06 www sendmail[29891]: s75Ee2wh029467: to=<[email protected]>, delay=00:00:04, mailer=local, pri=121784, dsn=5.1.1, stat=User unknown Aug 5 09:40:06 www sendmail[29891]: s75Ee2wh029467: s75Ee6xZ029891: DSN: User unknown Aug 5 09:40:06 www sendmail[29891]: s75Ee6xZ029891: to=<[email protected]>, delay=00:00:00, xdelay=00:00:00, mailer=local, pri=33035, dsn=2.0.0, stat=Sent Something to note about the maillog's: Before the crash, the msgid included localhost.localdomain; after the crash it's been domain.com. Thanks to all who take the time to read and look into this issue. I appreciate it and look forward to tackling this issue together.

Read the article
Zen and the Art of File and Folder Organization

- by Mark Virtue

Is your desk a paragon of neatness, or does it look like a paper-bomb has gone off? If you’ve been putting off getting organized because the task is too huge or daunting, or you don’t know where to start, we’ve got 40 tips to get you on the path to zen mastery of your filing system. For all those readers who would like to get their files and folders organized, or, if they’re already organized, better organized—we have compiled a complete guide to getting organized and staying organized, a comprehensive article that will hopefully cover every possible tip you could want. Signs that Your Computer is Poorly Organized If your computer is a mess, you’re probably already aware of it. But just in case you’re not, here are some tell-tale signs: Your Desktop has over 40 icons on it “My Documents” contains over 300 files and 60 folders, including MP3s and digital photos You use the Windows’ built-in search facility whenever you need to find a file You can’t find programs in the out-of-control list of programs in your Start Menu You save all your Word documents in one folder, all your spreadsheets in a second folder, etc Any given file that you’re looking for may be in any one of four different sets of folders But before we start, here are some quick notes: We’re going to assume you know what files and folders are, and how to create, save, rename, copy and delete them The organization principles described in this article apply equally to all computer systems. However, the screenshots here will reflect how things look on Windows (usually Windows 7). We will also mention some useful features of Windows that can help you get organized. Everyone has their own favorite methodology of organizing and filing, and it’s all too easy to get into “My Way is Better than Your Way” arguments. The reality is that there is no perfect way of getting things organized. When I wrote this article, I tried to keep a generalist and objective viewpoint. I consider myself to be unusually well organized (to the point of obsession, truth be told), and I’ve had 25 years experience in collecting and organizing files on computers. So I’ve got a lot to say on the subject. But the tips I have described here are only one way of doing it. Hopefully some of these tips will work for you too, but please don’t read this as any sort of “right” way to do it. At the end of the article we’ll be asking you, the reader, for your own organization tips. Why Bother Organizing At All? For some, the answer to this question is self-evident. And yet, in this era of powerful desktop search software (the search capabilities built into the Windows Vista and Windows 7 Start Menus, and third-party programs like Google Desktop Search), the question does need to be asked, and answered. I have a friend who puts every file he ever creates, receives or downloads into his My Documents folder and doesn’t bother filing them into subfolders at all. He relies on the search functionality built into his Windows operating system to help him find whatever he’s looking for. And he always finds it. He’s a Search Samurai. For him, filing is a waste of valuable time that could be spent enjoying life! It’s tempting to follow suit. On the face of it, why would anyone bother to take the time to organize their hard disk when such excellent search software is available? Well, if all you ever want to do with the files you own is to locate and open them individually (for listening, editing, etc), then there’s no reason to ever bother doing one scrap of organization. But consider these common tasks that are not achievable with desktop search software: Find files manually. Often it’s not convenient, speedy or even possible to utilize your desktop search software to find what you want. It doesn’t work 100% of the time, or you may not even have it installed. Sometimes its just plain faster to go straight to the file you want, if you know it’s in a particular sub-folder, rather than trawling through hundreds of search results. Find groups of similar files (e.g. all your “work” files, all the photos of your Europe holiday in 2008, all your music videos, all the MP3s from Dark Side of the Moon, all your letters you wrote to your wife, all your tax returns). Clever naming of the files will only get you so far. Sometimes it’s the date the file was created that’s important, other times it’s the file format, and other times it’s the purpose of the file. How do you name a collection of files so that they’re easy to isolate based on any of the above criteria? Short answer, you can’t. Move files to a new computer. It’s time to upgrade your computer. How do you quickly grab all the files that are important to you? Or you decide to have two computers now – one for home and one for work. How do you quickly isolate only the work-related files to move them to the work computer? Synchronize files to other computers. If you have more than one computer, and you need to mirror some of your files onto the other computer (e.g. your music collection), then you need a way to quickly determine which files are to be synced and which are not. Surely you don’t want to synchronize everything? Choose which files to back up. If your backup regime calls for multiple backups, or requires speedy backups, then you’ll need to be able to specify which files are to be backed up, and which are not. This is not possible if they’re all in the same folder. Finally, if you’re simply someone who takes pleasure in being organized, tidy and ordered (me! me!), then you don’t even need a reason. Being disorganized is simply unthinkable. Tips on Getting Organized Here we present our 40 best tips on how to get organized. Or, if you’re already organized, to get better organized. Tip #1. Choose Your Organization System Carefully The reason that most people are not organized is that it takes time. And the first thing that takes time is deciding upon a system of organization. This is always a matter of personal preference, and is not something that a geek on a website can tell you. You should always choose your own system, based on how your own brain is organized (which makes the assumption that your brain is, in fact, organized). We can’t instruct you, but we can make suggestions: You may want to start off with a system based on the users of the computer. i.e. “My Files”, “My Wife’s Files”, My Son’s Files”, etc. Inside “My Files”, you might then break it down into “Personal” and “Business”. You may then realize that there are overlaps. For example, everyone may want to share access to the music library, or the photos from the school play. So you may create another folder called “Family”, for the “common” files. You may decide that the highest-level breakdown of your files is based on the “source” of each file. In other words, who created the files. You could have “Files created by ME (business or personal)”, “Files created by people I know (family, friends, etc)”, and finally “Files created by the rest of the world (MP3 music files, downloaded or ripped movies or TV shows, software installation files, gorgeous desktop wallpaper images you’ve collected, etc).” This system happens to be the one I use myself. See below: Mark is for files created by meVC is for files created by my company (Virtual Creations)Others is for files created by my friends and familyData is the rest of the worldAlso, Settings is where I store the configuration files and other program data files for my installed software (more on this in tip #34, below). Each folder will present its own particular set of requirements for further sub-organization. For example, you may decide to organize your music collection into sub-folders based on the artist’s name, while your digital photos might get organized based on the date they were taken. It can be different for every sub-folder! Another strategy would be based on “currentness”. Files you have yet to open and look at live in one folder. Ones that have been looked at but not yet filed live in another place. Current, active projects live in yet another place. All other files (your “archive”, if you like) would live in a fourth folder. (And of course, within that last folder you’d need to create a further sub-system based on one of the previous bullet points). Put some thought into this – changing it when it proves incomplete can be a big hassle! Before you go to the trouble of implementing any system you come up with, examine a wide cross-section of the files you own and see if they will all be able to find a nice logical place to sit within your system. Tip #2. When You Decide on Your System, Stick to It! There’s nothing more pointless than going to all the trouble of creating a system and filing all your files, and then whenever you create, receive or download a new file, you simply dump it onto your Desktop. You need to be disciplined – forever! Every new file you get, spend those extra few seconds to file it where it belongs! Otherwise, in just a month or two, you’ll be worse off than before – half your files will be organized and half will be disorganized – and you won’t know which is which! Tip #3. Choose the Root Folder of Your Structure Carefully Every data file (document, photo, music file, etc) that you create, own or is important to you, no matter where it came from, should be found within one single folder, and that one single folder should be located at the root of your C: drive (as a sub-folder of C:\). In other words, do not base your folder structure in standard folders like “My Documents”. If you do, then you’re leaving it up to the operating system engineers to decide what folder structure is best for you. And every operating system has a different system! In Windows 7 your files are found in C:\Users\YourName, whilst on Windows XP it was C:\Documents and Settings\YourName\My Documents. In UNIX systems it’s often /home/YourName. These standard default folders tend to fill up with junk files and folders that are not at all important to you. “My Documents” is the worst offender. Every second piece of software you install, it seems, likes to create its own folder in the “My Documents” folder. These folders usually don’t fit within your organizational structure, so don’t use them! In fact, don’t even use the “My Documents” folder at all. Allow it to fill up with junk, and then simply ignore it. It sounds heretical, but: Don’t ever visit your “My Documents” folder! Remove your icons/links to “My Documents” and replace them with links to the folders you created and you care about! Create your own file system from scratch! Probably the best place to put it would be on your D: drive – if you have one. This way, all your files live on one drive, while all the operating system and software component files live on the C: drive – simply and elegantly separated. The benefits of that are profound. Not only are there obvious organizational benefits (see tip #10, below), but when it comes to migrate your data to a new computer, you can (sometimes) simply unplug your D: drive and plug it in as the D: drive of your new computer (this implies that the D: drive is actually a separate physical disk, and not a partition on the same disk as C:). You also get a slight speed improvement (again, only if your C: and D: drives are on separate physical disks). Warning: From tip #12, below, you will see that it’s actually a good idea to have exactly the same file system structure – including the drive it’s filed on – on all of the computers you own. So if you decide to use the D: drive as the storage system for your own files, make sure you are able to use the D: drive on all the computers you own. If you can’t ensure that, then you can still use a clever geeky trick to store your files on the D: drive, but still access them all via the C: drive (see tip #17, below). If you only have one hard disk (C:), then create a dedicated folder that will contain all your files – something like C:\Files. The name of the folder is not important, but make it a single, brief word. There are several reasons for this: When creating a backup regime, it’s easy to decide what files should be backed up – they’re all in the one folder! If you ever decide to trade in your computer for a new one, you know exactly which files to migrate You will always know where to begin a search for any file If you synchronize files with other computers, it makes your synchronization routines very simple. It also causes all your shortcuts to continue to work on the other machines (more about this in tip #24, below). Once you’ve decided where your files should go, then put all your files in there – Everything! Completely disregard the standard, default folders that are created for you by the operating system (“My Music”, “My Pictures”, etc). In fact, you can actually relocate many of those folders into your own structure (more about that below, in tip #6). The more completely you get all your data files (documents, photos, music, etc) and all your configuration settings into that one folder, then the easier it will be to perform all of the above tasks. Once this has been done, and all your files live in one folder, all the other folders in C:\ can be thought of as “operating system” folders, and therefore of little day-to-day interest for us. Here’s a screenshot of a nicely organized C: drive, where all user files are located within the \Files folder: Tip #4. Use Sub-Folders This would be our simplest and most obvious tip. It almost goes without saying. Any organizational system you decide upon (see tip #1) will require that you create sub-folders for your files. Get used to creating folders on a regular basis. Tip #5. Don’t be Shy About Depth Create as many levels of sub-folders as you need. Don’t be scared to do so. Every time you notice an opportunity to group a set of related files into a sub-folder, do so. Examples might include: All the MP3s from one music CD, all the photos from one holiday, or all the documents from one client. It’s perfectly okay to put files into a folder called C:\Files\Me\From Others\Services\WestCo Bank\Statements\2009. That’s only seven levels deep. Ten levels is not uncommon. Of course, it’s possible to take this too far. If you notice yourself creating a sub-folder to hold only one file, then you’ve probably become a little over-zealous. On the other hand, if you simply create a structure with only two levels (for example C:\Files\Work) then you really haven’t achieved any level of organization at all (unless you own only six files!). Your “Work” folder will have become a dumping ground, just like your Desktop was, with most likely hundreds of files in it. Tip #6. Move the Standard User Folders into Your Own Folder Structure Most operating systems, including Windows, create a set of standard folders for each of its users. These folders then become the default location for files such as documents, music files, digital photos and downloaded Internet files. In Windows 7, the full list is shown below: Some of these folders you may never use nor care about (for example, the Favorites folder, if you’re not using Internet Explorer as your browser). Those ones you can leave where they are. But you may be using some of the other folders to store files that are important to you. Even if you’re not using them, Windows will still often treat them as the default storage location for many types of files. When you go to save a standard file type, it can become annoying to be automatically prompted to save it in a folder that’s not part of your own file structure. But there’s a simple solution: Move the folders you care about into your own folder structure! If you do, then the next time you go to save a file of the corresponding type, Windows will prompt you to save it in the new, moved location. Moving the folders is easy. Simply drag-and-drop them to the new location. Here’s a screenshot of the default My Music folder being moved to my custom personal folder (Mark): Tip #7. Name Files and Folders Intelligently This is another one that almost goes without saying, but we’ll say it anyway: Do not allow files to be created that have meaningless names like Document1.doc, or folders called New Folder (2). Take that extra 20 seconds and come up with a meaningful name for the file/folder – one that accurately divulges its contents without repeating the entire contents in the name. Tip #8. Watch Out for Long Filenames Another way to tell if you have not yet created enough depth to your folder hierarchy is that your files often require really long names. If you need to call a file Johnson Sales Figures March 2009.xls (which might happen to live in the same folder as Abercrombie Budget Report 2008.xls), then you might want to create some sub-folders so that the first file could be simply called March.xls, and living in the Clients\Johnson\Sales Figures\2009 folder. A well-placed file needs only a brief filename! Tip #9. Use Shortcuts! Everywhere! This is probably the single most useful and important tip we can offer. A shortcut allows a file to be in two places at once. Why would you want that? Well, the file and folder structure of every popular operating system on the market today is hierarchical. This means that all objects (files and folders) always live within exactly one parent folder. It’s a bit like a tree. A tree has branches (folders) and leaves (files). Each leaf, and each branch, is supported by exactly one parent branch, all the way back to the root of the tree (which, incidentally, is exactly why C:\ is called the “root folder” of the C: drive). That hard disks are structured this way may seem obvious and even necessary, but it’s only one way of organizing data. There are others: Relational databases, for example, organize structured data entirely differently. The main limitation of hierarchical filing structures is that a file can only ever be in one branch of the tree – in only one folder – at a time. Why is this a problem? Well, there are two main reasons why this limitation is a problem for computer users: The “correct” place for a file, according to our organizational rationale, is very often a very inconvenient place for that file to be located. Just because it’s correctly filed doesn’t mean it’s easy to get to. Your file may be “correctly” buried six levels deep in your sub-folder structure, but you may need regular and speedy access to this file every day. You could always move it to a more convenient location, but that would mean that you would need to re-file back to its “correct” location it every time you’d finished working on it. Most unsatisfactory. A file may simply “belong” in two or more different locations within your file structure. For example, say you’re an accountant and you have just completed the 2009 tax return for John Smith. It might make sense to you to call this file 2009 Tax Return.doc and file it under Clients\John Smith. But it may also be important to you to have the 2009 tax returns from all your clients together in the one place. So you might also want to call the file John Smith.doc and file it under Tax Returns\2009. The problem is, in a purely hierarchical filing system, you can’t put it in both places. Grrrrr! Fortunately, Windows (and most other operating systems) offers a way for you to do exactly that: It’s called a “shortcut” (also known as an “alias” on Macs and a “symbolic link” on UNIX systems). Shortcuts allow a file to exist in one place, and an icon that represents the file to be created and put anywhere else you please. In fact, you can create a dozen such icons and scatter them all over your hard disk. Double-clicking on one of these icons/shortcuts opens up the original file, just as if you had double-clicked on the original file itself. Consider the following two icons: The one on the left is the actual Word document, while the one on the right is a shortcut that represents the Word document. Double-clicking on either icon will open the same file. There are two main visual differences between the icons: The shortcut will have a small arrow in the lower-left-hand corner (on Windows, anyway) The shortcut is allowed to have a name that does not include the file extension (the “.docx” part, in this case) You can delete the shortcut at any time without losing any actual data. The original is still intact. All you lose is the ability to get to that data from wherever the shortcut was. So why are shortcuts so great? Because they allow us to easily overcome the main limitation of hierarchical file systems, and put a file in two (or more) places at the same time. You will always have files that don’t play nice with your organizational rationale, and can’t be filed in only one place. They demand to exist in two places. Shortcuts allow this! Furthermore, they allow you to collect your most often-opened files and folders together in one spot for convenient access. The cool part is that the original files stay where they are, safe forever in their perfectly organized location. So your collection of most often-opened files can – and should – become a collection of shortcuts! If you’re still not convinced of the utility of shortcuts, consider the following well-known areas of a typical Windows computer: The Start Menu (and all the programs that live within it) The Quick Launch bar (or the Superbar in Windows 7) The “Favorite folders” area in the top-left corner of the Windows Explorer window (in Windows Vista or Windows 7) Your Internet Explorer Favorites or Firefox Bookmarks Each item in each of these areas is a shortcut! Each of those areas exist for one purpose only: For convenience – to provide you with a collection of the files and folders you access most often. It should be easy to see by now that shortcuts are designed for one single purpose: To make accessing your files more convenient. Each time you double-click on a shortcut, you are saved the hassle of locating the file (or folder, or program, or drive, or control panel icon) that it represents. Shortcuts allow us to invent a golden rule of file and folder organization: “Only ever have one copy of a file – never have two copies of the same file. Use a shortcut instead” (this rule doesn’t apply to copies created for backup purposes, of course!) There are also lesser rules, like “don’t move a file into your work area – create a shortcut there instead”, and “any time you find yourself frustrated with how long it takes to locate a file, create a shortcut to it and place that shortcut in a convenient location.” So how to we create these massively useful shortcuts? There are two main ways: “Copy” the original file or folder (click on it and type Ctrl-C, or right-click on it and select Copy): Then right-click in an empty area of the destination folder (the place where you want the shortcut to go) and select Paste shortcut: Right-drag (drag with the right mouse button) the file from the source folder to the destination folder. When you let go of the mouse button at the destination folder, a menu pops up: Select Create shortcuts here. Note that when shortcuts are created, they are often named something like Shortcut to Budget Detail.doc (windows XP) or Budget Detail – Shortcut.doc (Windows 7). If you don’t like those extra words, you can easily rename the shortcuts after they’re created, or you can configure Windows to never insert the extra words in the first place (see our article on how to do this). And of course, you can create shortcuts to folders too, not just to files! Bottom line: Whenever you have a file that you’d like to access from somewhere else (whether it’s convenience you’re after, or because the file simply belongs in two places), create a shortcut to the original file in the new location. Tip #10. Separate Application Files from Data Files Any digital organization guru will drum this rule into you. Application files are the components of the software you’ve installed (e.g. Microsoft Word, Adobe Photoshop or Internet Explorer). Data files are the files that you’ve created for yourself using that software (e.g. Word Documents, digital photos, emails or playlists). Software gets installed, uninstalled and upgraded all the time. Hopefully you always have the original installation media (or downloaded set-up file) kept somewhere safe, and can thus reinstall your software at any time. This means that the software component files are of little importance. Whereas the files you have created with that software is, by definition, important. It’s a good rule to always separate unimportant files from important files. So when your software prompts you to save a file you’ve just created, take a moment and check out where it’s suggesting that you save the file. If it’s suggesting that you save the file into the same folder as the software itself, then definitely don’t follow that suggestion. File it in your own folder! In fact, see if you can find the program’s configuration option that determines where files are saved by default (if it has one), and change it. Tip #11. Organize Files Based on Purpose, Not on File Type If you have, for example a folder called Work\Clients\Johnson, and within that folder you have two sub-folders, Word Documents and Spreadsheets (in other words, you’re separating “.doc” files from “.xls” files), then chances are that you’re not optimally organized. It makes little sense to organize your files based on the program that created them. Instead, create your sub-folders based on the purpose of the file. For example, it would make more sense to create sub-folders called Correspondence and Financials. It may well be that all the files in a given sub-folder are of the same file-type, but this should be more of a coincidence and less of a design feature of your organization system. Tip #12. Maintain the Same Folder Structure on All Your Computers In other words, whatever organizational system you create, apply it to every computer that you can. There are several benefits to this: There’s less to remember. No matter where you are, you always know where to look for your files If you copy or synchronize files from one computer to another, then setting up the synchronization job becomes very simple Shortcuts can be copied or moved from one computer to another with ease (assuming the original files are also copied/moved). There’s no need to find the target of the shortcut all over again on the second computer Ditto for linked files (e.g Word documents that link to data in a separate Excel file), playlists, and any files that reference the exact file locations of other files. This applies even to the drive that your files are stored on. If your files are stored on C: on one computer, make sure they’re stored on C: on all your computers. Otherwise all your shortcuts, playlists and linked files will stop working! Tip #13. Create an “Inbox” Folder Create yourself a folder where you store all files that you’re currently working on, or that you haven’t gotten around to filing yet. You can think of this folder as your “to-do” list. You can call it “Inbox” (making it the same metaphor as your email system), or “Work”, or “To-Do”, or “Scratch”, or whatever name makes sense to you. It doesn’t matter what you call it – just make sure you have one! Once you have finished working on a file, you then move it from the “Inbox” to its correct location within your organizational structure. You may want to use your Desktop as this “Inbox” folder. Rightly or wrongly, most people do. It’s not a bad place to put such files, but be careful: If you do decide that your Desktop represents your “to-do” list, then make sure that no other files find their way there. In other words, make sure that your “Inbox”, wherever it is, Desktop or otherwise, is kept free of junk – stray files that don’t belong there. So where should you put this folder, which, almost by definition, lives outside the structure of the rest of your filing system? Well, first and foremost, it has to be somewhere handy. This will be one of your most-visited folders, so convenience is key. Putting it on the Desktop is a great option – especially if you don’t have any other folders on your Desktop: the folder then becomes supremely easy to find in Windows Explorer: You would then create shortcuts to this folder in convenient spots all over your computer (“Favorite Links”, “Quick Launch”, etc). Tip #14. Ensure You have Only One “Inbox” Folder Once you’ve created your “Inbox” folder, don’t use any other folder location as your “to-do list”. Throw every incoming or created file into the Inbox folder as you create/receive it. This keeps the rest of your computer pristine and free of randomly created or downloaded junk. The last thing you want to be doing is checking multiple folders to see all your current tasks and projects. Gather them all together into one folder. Here are some tips to help ensure you only have one Inbox: Set the default “save” location of all your programs to this folder. Set the default “download” location for your browser to this folder. If this folder is not your desktop (recommended) then also see if you can make a point of not putting “to-do” files on your desktop. This keeps your desktop uncluttered and Zen-like: (the Inbox folder is in the bottom-right corner) Tip #15. Be Vigilant about Clearing Your “Inbox” Folder This is one of the keys to staying organized. If you let your “Inbox” overflow (i.e. allow there to be more than, say, 30 files or folders in there), then you’re probably going to start feeling like you’re overwhelmed: You’re not keeping up with your to-do list. Once your Inbox gets beyond a certain point (around 30 files, studies have shown), then you’ll simply start to avoid it. You may continue to put files in there, but you’ll be scared to look at it, fearing the “out of control” feeling that all overworked, chaotic or just plain disorganized people regularly feel. So, here’s what you can do: Visit your Inbox/to-do folder regularly (at least five times per day). Scan the folder regularly for files that you have completed working on and are ready for filing. File them immediately. Make it a source of pride to keep the number of files in this folder as small as possible. If you value peace of mind, then make the emptiness of this folder one of your highest (computer) priorities If you know that a particular file has been in the folder for more than, say, six weeks, then admit that you’re not actually going to get around to processing it, and move it to its final resting place. Tip #16. File Everything Immediately, and Use Shortcuts for Your Active Projects As soon as you create, receive or download a new file, store it away in its “correct” folder immediately. Then, whenever you need to work on it (possibly straight away), create a shortcut to it in your “Inbox” (“to-do”) folder or your desktop. That way, all your files are always in their “correct” locations, yet you still have immediate, convenient access to your current, active files. When you finish working on a file, simply delete the shortcut. Ideally, your “Inbox” folder – and your Desktop – should contain no actual files or folders. They should simply contain shortcuts. Tip #17. Use Directory Symbolic Links (or Junctions) to Maintain One Unified Folder Structure Using this tip, we can get around a potential hiccup that we can run into when creating our organizational structure – the issue of having more than one drive on our computer (C:, D:, etc). We might have files we need to store on the D: drive for space reasons, and yet want to base our organized folder structure on the C: drive (or vice-versa). Your chosen organizational structure may dictate that all your files must be accessed from the C: drive (for example, the root folder of all your files may be something like C:\Files). And yet you may still have a D: drive and wish to take advantage of the hundreds of spare Gigabytes that it offers. Did you know that it’s actually possible to store your files on the D: drive and yet access them as if they were on the C: drive? And no, we’re not talking about shortcuts here (although the concept is very similar). By using the shell command mklink, you can essentially take a folder that lives on one drive and create an alias for it on a different drive (you can do lots more than that with mklink – for a full rundown on this programs capabilities, see our dedicated article). These aliases are called directory symbolic links (and used to be known as junctions). You can think of them as “virtual” folders. They function exactly like regular folders, except they’re physically located somewhere else. For example, you may decide that your entire D: drive contains your complete organizational file structure, but that you need to reference all those files as if they were on the C: drive, under C:\Files. If that was the case you could create C:\Files as a directory symbolic link – a link to D:, as follows: mklink /d c:\files d:\ Or it may be that the only files you wish to store on the D: drive are your movie collection. You could locate all your movie files in the root of your D: drive, and then link it to C:\Files\Media\Movies, as follows: mklink /d c:\files\media\movies d:\ (Needless to say, you must run these commands from a command prompt – click the Start button, type cmd and press Enter) Tip #18. Customize Your Folder Icons This is not strictly speaking an organizational tip, but having unique icons for each folder does allow you to more quickly visually identify which folder is which, and thus saves you time when you’re finding files. An example is below (from my folder that contains all files downloaded from the Internet): To learn how to change your folder icons, please refer to our dedicated article on the subject. Tip #19. Tidy Your Start Menu The Windows Start Menu is usually one of the messiest parts of any Windows computer. Every program you install seems to adopt a completely different approach to placing icons in this menu. Some simply put a single program icon. Others create a folder based on the name of the software. And others create a folder based on the name of the software manufacturer. It’s chaos, and can make it hard to find the software you want to run. Thankfully we can avoid this chaos with useful operating system features like Quick Launch, the Superbar or pinned start menu items. Even so, it would make a lot of sense to get into the guts of the Start Menu itself and give it a good once-over. All you really need to decide is how you’re going to organize your applications. A structure based on the purpose of the application is an obvious candidate. Below is an example of one such structure: In this structure, Utilities means software whose job it is to keep the computer itself running smoothly (configuration tools, backup software, Zip programs, etc). Applications refers to any productivity software that doesn’t fit under the headings Multimedia, Graphics, Internet, etc. In case you’re not aware, every icon in your Start Menu is a shortcut and can be manipulated like any other shortcut (copied, moved, deleted, etc). With the Windows Start Menu (all version of Windows), Microsoft has decided that there be two parallel folder structures to store your Start Menu shortcuts. One for you (the logged-in user of the computer) and one for all users of the computer. Having two parallel structures can often be redundant: If you are the only user of the computer, then having two parallel structures is totally redundant. Even if you have several users that regularly log into the computer, most of your installed software will need to be made available to all users, and should thus be moved out of the “just you” version of the Start Menu and into the “all users” area. To take control of your Start Menu, so you can start organizing it, you’ll need to know how to access the actual folders and shortcut files that make up the Start Menu (both versions of it). To find these folders and files, click the Start button and then right-click on the All Programs text (Windows XP users should right-click on the Start button itself): The Open option refers to the “just you” version of the Start Menu, while the Open All Users option refers to the “all users” version. Click on the one you want to organize. A Windows Explorer window then opens with your chosen version of the Start Menu selected. From there it’s easy. Double-click on the Programs folder and you’ll see all your folders and shortcuts. Now you can delete/rename/move until it’s just the way you want it. Note: When you’re reorganizing your Start Menu, you may want to have two Explorer windows open at the same time – one showing the “just you” version and one showing the “all users” version. You can drag-and-drop between the windows. Tip #20. Keep Your Start Menu Tidy Once you have a perfectly organized Start Menu, try to be a little vigilant about keeping it that way. Every time you install a new piece of software, the icons that get created will almost certainly violate your organizational structure. So to keep your Start Menu pristine and organized, make sure you do the following whenever you install a new piece of software: Check whether the software was installed into the “just you” area of the Start Menu, or the “all users” area, and then move it to the correct area. Remove all the unnecessary icons (like the “Read me” icon, the “Help” icon (you can always open the help from within the software itself when it’s running), the “Uninstall” icon, the link(s)to the manufacturer’s website, etc) Rename the main icon(s) of the software to something brief that makes sense to you. For example, you might like to rename Microsoft Office Word 2010 to simply Word Move the icon(s) into the correct folder based on your Start Menu organizational structure And don’t forget: when you uninstall a piece of software, the software’s uninstall routine is no longer going to be able to remove the software’s icon from the Start Menu (because you moved and/or renamed it), so you’ll need to remove that icon manually. Tip #21. Tidy C:\ The root of your C: drive (C:\) is a common dumping ground for files and folders – both by the users of your computer and by the software that you install on your computer. It can become a mess. There’s almost no software these days that requires itself to be installed in C:\. 99% of the time it can and should be installed into C:\Program Files. And as for your own files, well, it’s clear that they can (and almost always should) be stored somewhere else. In an ideal world, your C:\ folder should look like this (on Windows 7): Note that there are some system files and folders in C:\ that are usually and deliberately “hidden” (such as the Windows virtual memory file pagefile.sys, the boot loader file bootmgr, and the System Volume Information folder). Hiding these files and folders is a good idea, as they need to stay where they are and are almost never needed to be opened or even seen by you, the user. Hiding them prevents you from accidentally messing with them, and enhances your sense of order and well-being when you look at your C: drive folder. Tip #22. Tidy Your Desktop The Desktop is probably the most abused part of a Windows computer (from an organization point of view). It usually serves as a dumping ground for all incoming files, as well as holding icons to oft-used applications, plus some regularly opened files and folders. It often ends up becoming an uncontrolled mess. See if you can avoid this. Here’s why… Application icons (Word, Internet Explorer, etc) are often found on the Desktop, but it’s unlikely that this is the optimum place for them. The “Quick Launch” bar (or the Superbar in Windows 7) is always visible and so represents a perfect location to put your icons. You’ll only be able to see the icons on your Desktop when all your programs are minimized. It might be time to get your application icons off your desktop… You may have decided that the Inbox/To-do folder on your computer (see tip #13, above) should be your Desktop. If so, then enough said. Simply be vigilant about clearing it and preventing it from being polluted by junk files (see tip #15, above). On the other hand, if your Desktop is not acting as your “Inbox” folder, then there’s no reason for it to have any data files or folders on it at all, except perhaps a couple of shortcuts to often-opened files and folders (either ongoing or current projects). Everything else should be moved to your “Inbox” folder. In an ideal world, it might look like this: Tip #23. Move Permanent Items on Your Desktop Away from the Top-Left Corner When files/folders are dragged onto your desktop in a Windows Explorer window, or when shortcuts are created on your Desktop from Internet Explorer, those icons are always placed in the top-left corner – or as close as they can get. If you have other files, folders or shortcuts that you keep on the Desktop permanently, then it’s a good idea to separate these permanent icons from the transient ones, so that you can quickly identify which ones the transients are. An easy way to do this is to move all your permanent icons to the right-hand side of your Desktop. That should keep them separated from incoming items. Tip #24. Synchronize If you have more than one computer, you’ll almost certainly want to share files between them. If the computers are permanently attached to the same local network, then there’s no need to store multiple copies of any one file or folder – shortcuts will suffice. However, if the computers are not always on the same network, then you will at some point need to copy files between them. For files that need to permanently live on both computers, the ideal way to do this is to synchronize the files, as opposed to simply copying them. We only have room here to write a brief summary of synchronization, not a full article. In short, there are several different types of synchronization: Where the contents of one folder are accessible anywhere, such as with Dropbox Where the contents of any number of folders are accessible anywhere, such as with Windows Live Mesh Where any files or folders from anywhere on your computer are synchronized with exactly one other computer, such as with the Windows “Briefcase”, Microsoft SyncToy, or (much more powerful, yet still free) SyncBack from 2BrightSparks. This only works when both computers are on the same local network, at least temporarily. A great advantage of synchronization solutions is that once you’ve got it configured the way you want it, then the sync process happens automatically, every time. Click a button (or schedule it to happen automatically) and all your files are automagically put where they’re supposed to be. If you maintain the same file and folder structure on both computers, then you can also sync files depend upon the correct location of other files, like shortcuts, playlists and office documents that link to other office documents, and the synchronized files still work on the other computer! Tip #25. Hide Files You Never Need to See If you have your files well organized, you will often be able to tell if a file is out of place just by glancing at the contents of a folder (for example, it should be pretty obvious if you look in a folder that contains all the MP3s from one music CD and see a Word document in there). This is a good thing – it allows you to determine if there are files out of place with a quick glance. Yet sometimes there are files in a folder that seem out of place but actually need to be there, such as the “folder art” JPEGs in music folders, and various files in the root of the C: drive. If such files never need to be opened by you, then a good idea is to simply hide them. Then, the next time you glance at the folder, you won’t have to remember whether that file was supposed to be there or not, because you won’t see it at all! To hide a file, simply right-click on it and choose Properties: Then simply tick the Hidden tick-box: Tip #26. Keep Every Setup File These days most software is downloaded from the Internet. Whenever you download a piece of software, keep it. You’ll never know when you need to reinstall the software. Further, keep with it an Internet shortcut that links back to the website where you originally downloaded it, in case you ever need to check for updates. See tip #33 below for a full description of the excellence of organizing your setup files. Tip #27. Try to Minimize the Number of Folders that Contain Both Files and Sub-folders Some of the folders in your organizational structure will contain only files. Others will contain only sub-folders. And you will also have some folders that contain both files and sub-folders. You will notice slight improvements in how long it takes you to locate a file if you try to avoid this third type of folder. It’s not always possible, of course – you’ll always have some of these folders, but see if you can avoid it. One way of doing this is to take all the leftover files that didn’t end up getting stored in a sub-folder and create a special “Miscellaneous” or “Other” folder for them. Tip #28. Starting a Filename with an Underscore Brings it to the Top of a List Further to the previous tip, if you name that “Miscellaneous” or “Other” folder in such a way that its name begins with an underscore “_”, then it will appear at the top of the list of files/folders. The screenshot below is an example of this. Each folder in the list contains a set of digital photos. The folder at the top of the list, _Misc, contains random photos that didn’t deserve their own dedicated folder: Tip #29. Clean Up those CD-ROMs and (shudder!) Floppy Disks Have you got a pile of CD-ROMs stacked on a shelf of your office? Old photos, or files you archived off onto CD-ROM (or even worse, floppy disks!) because you didn’t have enough disk space at the time? In the meantime have you upgraded your computer and now have 500 Gigabytes of space you don’t know what to do with? If so, isn’t it time you tidied up that stack of disks and filed them into your gorgeous new folder structure? So what are you waiting for? Bite the bullet, copy them all back onto your computer, file them in their appropriate folders, and then back the whole lot up onto a shiny new 1000Gig external hard drive! Useful Folders to Create This next section suggests some useful folders that you might want to create within your folder structure. I’ve personally found them to be indispensable. The first three are all about convenience – handy folders to create and then put somewhere that you can always access instantly. For each one, it’s not so important where the actual folder is located, but it’s very important where you put the shortcut(s) to the folder. You might want to locate the shortcuts: On your Desktop In your “Quick Launch” area (or pinned to your Windows 7 Superbar) In your Windows Explorer “Favorite Links” area Tip #30. Create an “Inbox” (“To-Do”) Folder This has already been mentioned in depth (see tip #13), but we wanted to reiterate its importance here. This folder contains all the recently created, received or downloaded files that you have not yet had a chance to file away properly, and it also may contain files that you have yet to process. In effect, it becomes a sort of “to-do list”. It doesn’t have to be called “Inbox” – you can call it whatever you want. Tip #31. Create a Folder where Your Current Projects are Collected Rather than going hunting for them all the time, or dumping them all on your desktop, create a special folder where you put links (or work folders) for each of the projects you’re currently working on. You can locate this folder in your “Inbox” folder, on your desktop, or anywhere at all – just so long as there’s a way of getting to it quickly, such as putting a link to it in Windows Explorer’s “Favorite Links” area: Tip #32. Create a Folder for Files and Folders that You Regularly Open You will always have a few files that you open regularly, whether it be a spreadsheet of your current accounts, or a favorite playlist. These are not necessarily “current projects”, rather they’re simply files that you always find yourself opening. Typically such files would be located on your desktop (or even better, shortcuts to those files). Why not collect all such shortcuts together and put them in their own special folder? As with the “Current Projects” folder (above), you would want to locate that folder somewhere convenient. Below is an example of a folder called “Quick links”, with about seven files (shortcuts) in it, that is accessible through the Windows Quick Launch bar: See tip #37 below for a full explanation of the power of the Quick Launch bar. Tip #33. Create a “Set-ups” Folder A typical computer has dozens of applications installed on it. For each piece of software, there are often many different pieces of information you need to keep track of, including: The original installation setup file(s). This can be anything from a simple 100Kb setup.exe file you downloaded from a website, all the way up to a 4Gig ISO file that you copied from a DVD-ROM that you purchased. The home page of the software manufacturer (in case you need to look up something on their support pages, their forum or their online help) The page containing the download link for your actual file (in case you need to re-download it, or download an upgraded version) The serial number Your proof-of-purchase documentation Any other template files, plug-ins, themes, etc that also need to get installed For each piece of software, it’s a great idea to gather all of these files together and put them in a single folder. The folder can be the name of the software (plus possibly a very brief description of what it’s for – in case you can’t remember what the software does based in its name). Then you would gather all of these folders together into one place, and call it something like “Software” or “Setups”. If you have enough of these folders (I have several hundred, being a geek, collected over 20 years), then you may want to further categorize them. My own categorization structure is based on “platform” (operating system): The last seven folders each represents one platform/operating system, while _Operating Systems contains set-up files for installing the operating systems themselves. _Hardware contains ROMs for hardware I own, such as routers. Within the Windows folder (above), you can see the beginnings of the vast library of software I’ve compiled over the years: An example of a typical application folder looks like this: Tip #34. Have a “Settings” Folder We all know that our documents are important. So are our photos and music files. We save all of these files into folders, and then locate them afterwards and double-click on them to open them. But there are many files that are important to us that can’t be saved into folders, and then searched for and double-clicked later on. These files certainly contain important information that we need, but are often created internally by an application, and saved wherever that application feels is appropriate. A good example of this is the “PST” file that Outlook creates for us and uses to store all our emails, contacts, appointments and so forth. Another example would be the collection of Bookmarks that Firefox stores on your behalf. And yet another example would be the customized settings and configuration files of our all our software. Granted, most Windows programs store their configuration in the Registry, but there are still many programs that use configuration files to store their settings. Imagine if you lost all of the above files! And yet, when people are backing up their computers, they typically only back up the files they know about – those that are stored in the “My Documents” folder, etc. If they had a hard disk failure or their computer was lost or stolen, their backup files would not include some of the most vital files they owned. Also, when migrating to a new computer, it’s vital to ensure that these files make the journey. It can be a very useful idea to create yourself a folder to store all your “settings” – files that are important to you but which you never actually search for by name and double-click on to open them. Otherwise, next time you go to set up a new computer just the way you want it, you’ll need to spend hours recreating the configuration of your previous computer! So how to we get our important files into this folder? Well, we have a few options: Some programs (such as Outlook and its PST files) allow you to place these files wherever you want. If you delve into the program’s options, you will find a setting somewhere that controls the location of the important settings files (or “personal storage” – PST – when it comes to Outlook) Some programs do not allow you to change such locations in any easy way, but if you get into the Registry, you can sometimes find a registry key that refers to the location of the file(s). Simply move the file into your Settings folder and adjust the registry key to refer to the new location. Some programs stubbornly refuse to allow their settings files to be placed anywhere other then where they stipulate. When faced with programs like these, you have three choices: (1) You can ignore those files, (2) You can copy the files into your Settings folder (let’s face it – settings don’t change very often), or (3) you can use synchronization software, such as the Windows Briefcase, to make synchronized copies of all your files in your Settings folder. All you then have to do is to remember to run your sync software periodically (perhaps just before you run your backup software!). There are some other things you may decide to locate inside this new “Settings” folder: Exports of registry keys (from the many applications that store their configurations in the Registry). This is useful for backup purposes or for migrating to a new computer Notes you’ve made about all the specific customizations you have made to a particular piece of software (so that you’ll know how to do it all again on your next computer) Shortcuts to webpages that detail how to tweak certain aspects of your operating system or applications so they are just the way you like them (such as how to remove the words “Shortcut to” from the beginning of newly created shortcuts). In other words, you’d want to create shortcuts to half the pages on the How-To Geek website! Here’s an example of a “Settings” folder: Windows Features that Help with Organization This section details some of the features of Microsoft Windows that are a boon to anyone hoping to stay optimally organized. Tip #35. Use the “Favorite Links” Area to Access Oft-Used Folders Once you’ve created your great new filing system, work out which folders you access most regularly, or which serve as great starting points for locating the rest of the files in your folder structure, and then put links to those folders in your “Favorite Links” area of the left-hand side of the Windows Explorer window (simply called “Favorites” in Windows 7): Some ideas for folders you might want to add there include: Your “Inbox” folder (or whatever you’ve called it) – most important! The base of your filing structure (e.g. C:\Files) A folder containing shortcuts to often-accessed folders on other computers around the network (shown above as Network Folders) A folder containing shortcuts to your current projects (unless that folder is in your “Inbox” folder) Getting folders into this area is very simple – just locate the folder you’re interested in and drag it there! Tip #36. Customize the Places Bar in the File/Open and File/Save Boxes Consider the screenshot below: The highlighted icons (collectively known as the “Places Bar”) can be customized to refer to any folder location you want, allowing instant access to any part of your organizational structure. Note: These File/Open and File/Save boxes have been superseded by new versions that use the Windows Vista/Windows 7 “Favorite Links”, but the older versions (shown above) are still used by a surprisingly large number of applications. The easiest way to customize these icons is to use the Group Policy Editor, but not everyone has access to this program. If you do, open it up and navigate to: User Configuration > Administrative Templates > Windows Components > Windows Explorer > Common Open File Dialog If you don’t have access to the Group Policy Editor, then you’ll need to get into the Registry. Navigate to: HKEY_CURRENT_USER \ Software \ Microsoft \ Windows \ CurrentVersion \ Policies \ comdlg32 \ Placesbar It should then be easy to make the desired changes. Log off and log on again to allow the changes to take effect. Tip #37. Use the Quick Launch Bar as a Application and File Launcher That Quick Launch bar (to the right of the Start button) is a lot more useful than people give it credit for. Most people simply have half a dozen icons in it, and use it to start just those programs. But it can actually be used to instantly access just about anything in your filing system: For complete instructions on how to set this up, visit our dedicated article on this topic. Tip #38. Put a Shortcut to Windows Explorer into Your Quick Launch Bar This is only necessary in Windows Vista and Windows XP. The Microsoft boffins finally got wise and added it to the Windows 7 Superbar by default. Windows Explorer – the program used for managing your files and folders – is one of the most useful programs in Windows. Anyone who considers themselves serious about being organized needs instant access to this program at any time. A great place to create a shortcut to this program is in the Windows XP and Windows Vista “Quick Launch” bar: To get it there, locate it in your Start Menu (usually under “Accessories”) and then right-drag it down into your Quick Launch bar (and create a copy). Tip #39. Customize the Starting Folder for Your Windows 7 Explorer Superbar Icon If you’re on Windows 7, your Superbar will include a Windows Explorer icon. Clicking on the icon will launch Windows Explorer (of course), and will start you off in your “Libraries” folder. Libraries may be fine as a starting point, but if you have created yourself an “Inbox” folder, then it would probably make more sense to start off in this folder every time you launch Windows Explorer. To change this default/starting folder location, then first right-click the Explorer icon in the Superbar, and then right-click Properties:Then, in Target field of the Windows Explorer Properties box that appears, type %windir%\explorer.exe followed by the path of the folder you wish to start in. For example: %windir%\explorer.exe C:\Files If that folder happened to be on the Desktop (and called, say, “Inbox”), then you would use the following cleverness: %windir%\explorer.exe shell:desktop\Inbox Then click OK and test it out. Tip #40. Ummmmm…. No, that’s it. I can’t think of another one. That’s all of the tips I can come up with. I only created this one because 40 is such a nice round number… Case Study – An Organized PC To finish off the article, I have included a few screenshots of my (main) computer (running Vista). The aim here is twofold: To give you a sense of what it looks like when the above, sometimes abstract, tips are applied to a real-life computer, and To offer some ideas about folders and structure that you may want to steal to use on your own PC. Let’s start with the C: drive itself. Very minimal. All my files are contained within C:\Files. I’ll confine the rest of the case study to this folder: That folder contains the following: Mark: My personal files VC: My business (Virtual Creations, Australia) Others contains files created by friends and family Data contains files from the rest of the world (can be thought of as “public” files, usually downloaded from the Net) Settings is described above in tip #34 The Data folder contains the following sub-folders: Audio: Radio plays, audio books, podcasts, etc Development: Programmer and developer resources, sample source code, etc (see below) Humour: Jokes, funnies (those emails that we all receive) Movies: Downloaded and ripped movies (all legal, of course!), their scripts, DVD covers, etc. Music: (see below) Setups: Installation files for software (explained in full in tip #33) System: (see below) TV: Downloaded TV shows Writings: Books, instruction manuals, etc (see below) The Music folder contains the following sub-folders: Album covers: JPEG scans Guitar tabs: Text files of guitar sheet music Lists: e.g. “Top 1000 songs of all time” Lyrics: Text files MIDI: Electronic music files MP3 (representing 99% of the Music folder): MP3s, either ripped from CDs or downloaded, sorted by artist/album name Music Video: Video clips Sheet Music: usually PDFs The Data\Writings folder contains the following sub-folders: (all pretty self-explanatory) The Data\Development folder contains the following sub-folders: Again, all pretty self-explanatory (if you’re a geek) The Data\System folder contains the following sub-folders: These are usually themes, plug-ins and other downloadable program-specific resources. The Mark folder contains the following sub-folders: From Others: Usually letters that other people (friends, family, etc) have written to me For Others: Letters and other things I have created for other people Green Book: None of your business Playlists: M3U files that I have compiled of my favorite songs (plus one M3U playlist file for every album I own) Writing: Fiction, philosophy and other musings of mine Mark Docs: Shortcut to C:\Users\Mark Settings: Shortcut to C:\Files\Settings\Mark The Others folder contains the following sub-folders: The VC (Virtual Creations, my business – I develop websites) folder contains the following sub-folders: And again, all of those are pretty self-explanatory. Conclusion These tips have saved my sanity and helped keep me a productive geek, but what about you? What tips and tricks do you have to keep your files organized? Please share them with us in the comments. Come on, don’t be shy… Similar Articles Productive Geek Tips Fix For When Windows Explorer in Vista Stops Showing File NamesWhy Did Windows Vista’s Music Folder Icon Turn Yellow?Print or Create a Text File List of the Contents in a Directory the Easy WayCustomize the Windows 7 or Vista Send To MenuAdd Copy To / Move To on Windows 7 or Vista Right-Click Menu TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips Acronis Online Backup DVDFab 6 Revo Uninstaller Pro Registry Mechanic 9 for Windows Track Daily Goals With 42Goals Video Toolbox is a Superb Online Video Editor Fun with 47 charts and graphs Tomorrow is Mother’s Day Check the Average Speed of YouTube Videos You’ve Watched OutlookStatView Scans and Displays General Usage Statistics

Read the article

Search Results

Search found 395 results on 16 pages for 'truth'.

Page 16/16 | < Previous Page | 12 13 14 15 16

- by dan.mcclary

- by toinbis

- by Jeff

- by sshasan

- by Steve

- by andrewbrust

- by pinaldave

- by Bart Read

- by Geolev

- by Alix

- by Kyle Noland

- by Mecki

- by Mecki

- by Kyle Noland

- by Ben Rees

- by Tarun Arora

- by Uri

- by user1511579

- by DomainSoil

- by Mark Virtue

< Previous Page | 12 13 14 15 16